MyArxiv
Robotics
Bab_Sak Robotic Intubation System (BRIS): A Learning-Enabled Control Framework for Safe Fiberoptic Endotracheal Intubation
Endotracheal intubation is a critical yet technically demanding procedure, with failure or improper tube placement leading to severe complications. Existing robotic and teleoperated intubation systems primarily focus on airway navigation and do not provide integrated control of endotracheal tube advancement or objective verification of tube depth relative to the carina. This paper presents the Robotic Intubation System (BRIS), a compact, human-in-the-loop platform designed to assist fiberoptic-guided intubation while enabling real-time, objective depth awareness. BRIS integrates a four-way steerable fiberoptic bronchoscope, an independent endotracheal tube advancement mechanism, and a camera-augmented mouthpiece compatible with standard clinical workflows. A learning-enabled closed-loop control framework leverages real-time shape sensing to map joystick inputs to distal bronchoscope tip motion in Cartesian space, providing stable and intuitive teleoperation under tendon nonlinearities and airway contact. Monocular endoscopic depth estimation is used to classify airway regions and provide interpretable, anatomy-aware guidance for safe tube positioning relative to the carina. The system is validated on high-fidelity airway mannequins under standard and difficult airway configurations, demonstrating reliable navigation and controlled tube placement. These results highlight BRIS as a step toward safer, more consistent, and clinically compatible robotic airway management.
StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
Stereo cameras closely mimic human binocular vision, providing rich spatial cues critical for precise robotic manipulation. Despite their advantage, the adoption of stereo vision in vision-language-action models (VLAs) remains underexplored. In this work, we present StereoVLA, a VLA model that leverages rich geometric cues from stereo vision. We propose a novel Geometric-Semantic Feature Extraction module that utilizes vision foundation models to extract and fuse two key features: 1) geometric features from subtle stereo-view differences for spatial perception; 2) semantic-rich features from the monocular view for instruction following. Additionally, we propose an auxiliary Interaction-Region Depth Estimation task to further enhance spatial perception and accelerate model convergence. Extensive experiments show that our approach outperforms baselines by a large margin in diverse tasks under the stereo setting and demonstrates strong robustness to camera pose variations.
Flexible Multitask Learning with Factorized Diffusion Policy
Multitask learning poses significant challenges due to the highly multimodal and diverse nature of robot action distributions. Effectively fitting policies to these complex distributions is difficult, and existing monolithic models often underfit the action distribution and lack the flexibility required for efficient adaptation. We introduce a novel modular diffusion policy framework that factorizes complex action distributions into a composition of specialized diffusion models, each capturing a distinct sub-mode of the behavior space for a more effective overall policy. In addition, this modular structure enables flexible policy adaptation to new tasks by adding or fine-tuning components, which inherently mitigates catastrophic forgetting. Empirically, across both simulation and real-world robotic manipulation settings, we illustrate how our method consistently outperforms strong modular and monolithic baselines.
Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space
Unmanned aerial vehicles (UAVs) have emerged as powerful embodied agents. One of the core abilities is autonomous navigation in large-scale three-dimensional environments. Existing navigation policies, however, are typically optimized for low-level objectives such as obstacle avoidance and trajectory smoothness, lacking the ability to incorporate high-level semantics into planning. To bridge this gap, we propose ANWM, an aerial navigation world model that predicts future visual observations conditioned on past frames and actions, thereby enabling agents to rank candidate trajectories by their semantic plausibility and navigational utility. ANWM is trained on 4-DoF UAV trajectories and introduces a physics-inspired module: Future Frame Projection (FFP), which projects past frames into future viewpoints to provide coarse geometric priors. This module mitigates representational uncertainty in long-distance visual generation and captures the mapping between 3D trajectories and egocentric observations. Empirical results demonstrate that ANWM significantly outperforms existing world models in long-distance visual forecasting and improves UAV navigation success rates in large-scale environments.
Online Inertia Parameter Estimation for Unknown Objects Grasped by a Manipulator Towards Space Applications
Knowing the inertia parameters of a grasped object is crucial for dynamics-aware manipulation, especially in space robotics with free-floating bases. This work addresses the problem of estimating the inertia parameters of an unknown target object during manipulation. We apply and extend an existing online identification method by incorporating momentum conservation, enabling its use for floating-base robots. The proposed method is validated through numerical simulations, and the estimated parameters are compared with ground-truth values. Results demonstrate accurate identification in the tested scenarios, highlighting the method's applicability to on-orbit servicing and other space missions.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
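The identification idea above can be illustrated with a much-simplified example. The sketch below estimates only the mass of a grasped object by least squares from simulated wrist wrench and acceleration samples, assuming the rigid-body relation F = m(a - g); it is not the paper's momentum-conservation formulation, and all variable names and values are illustrative.

```python
import numpy as np

# Hypothetical wrist-frame measurements: forces F_i and linear accelerations a_i
# of the grasped object, with gravity expressed in the same frame.
g = np.array([0.0, 0.0, -9.81])
rng = np.random.default_rng(0)
true_mass = 1.7
accels = rng.normal(0.0, 2.0, size=(200, 3))                        # sampled accelerations
forces = true_mass * (accels - g) + rng.normal(0, 0.05, (200, 3))   # noisy F = m(a - g)

# Stack F = m (a - g) into a scalar least-squares problem: phi * m ~= y
phi = (accels - g).reshape(-1)      # regressor
y = forces.reshape(-1)              # measurements
m_hat = float(phi @ y / (phi @ phi))
print(f"estimated mass: {m_hat:.3f} kg (true {true_mass} kg)")
```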
Optimal Trajectory Planning for Orbital Robot Rendezvous and Docking
Approaching a tumbling target safely is a critical challenge in space debris removal missions utilizing robotic manipulators onboard servicing satellites. In this work, we propose a trajectory planning method based on nonlinear optimization for a close-range rendezvous to bring a free-floating, rotating debris object in a two-dimensional plane into the manipulator's workspace, as a preliminary step for its capture. The proposed method introduces a dynamic keep-out sphere that adapts depending on the approach conditions, allowing for closer and safer access to the target. Furthermore, a control strategy is developed to reproduce the optimized trajectory using discrete ON/OFF thrusters, considering practical implementation constraints.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
MoonBot: Modular and On-Demand Reconfigurable Robot Toward Moon Base Construction
The allure of lunar surface exploration and development has recently captured widespread global attention. Robots have proved to be indispensable for exploring uncharted terrains, uncovering and leveraging local resources, and facilitating the construction of future human habitats. In this article, we introduce the modular and on-demand reconfigurable robot (MoonBot), a robotic system engineered to maximize functionality while operating within the stringent mass constraints of lunar payloads and adapting to varying environmental conditions and task requirements. This article details the design and development of MoonBot and presents a preliminary field demonstration that validates the proof of concept through the execution of milestone tasks simulating the establishment of lunar infrastructure. These tasks include essential civil engineering operations, infrastructural component transportation and deployment, and assistive operations with inflatable modules. Furthermore, we systematically summarize the lessons learned during testing, focusing on the connector design and providing valuable insights for the advancement of modular robotic systems in future lunar missions.
comment: This is the authors' version of a paper accepted for publication in IEEE Transactions on Field Robotics, (c) IEEE. The final published version is available at https://doi.org/10.1109/TFR.2025.3624346
RoboCade: Gamifying Robot Data Collection
Imitation learning from human demonstrations has become a dominant approach for training autonomous robot policies. However, collecting demonstration datasets is costly: it often requires access to robots and sustained effort throughout a tedious, lengthy process. These factors limit the scale of data available for training policies. We aim to address this scalability challenge by involving a broader audience in a gamified data collection experience that is both accessible and motivating. Specifically, we develop a gamified remote teleoperation platform, RoboCade, to engage general users in collecting data that is beneficial for downstream policy training. To do this, we embed gamification strategies into the design of the system interface and data collection tasks. In the system interface, we include components such as visual feedback, sound effects, goal visualizations, progress bars, leaderboards, and badges. We additionally propose principles for constructing gamified tasks that have overlapping structure with useful downstream target tasks. We instantiate RoboCade on three manipulation tasks -- including spatial arrangement, scanning, and insertion. To illustrate the viability of gamified robot data collection, we collect a demonstration dataset through our platform, and show that co-training robot policies with this data can improve success rate on non-gamified target tasks (+16-56%). Further, we conduct a user study to validate that novice users find the gamified platform significantly more enjoyable than a standard non-gamified platform (+24%). These results highlight the promise of gamified data collection as a scalable, accessible, and engaging method for collecting demonstration data.
comment: 10 pages, 9 figures
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoints, robot initial states, language instructions, lighting conditions, background textures, and sensor noise. We comprehensively analyze multiple state-of-the-art models and reveal consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.
Transformer Driven Visual Servoing for Fabric Texture Matching Using Dual-Arm Manipulator
In this paper, we propose a method to align and place a fabric piece on top of another using a dual-arm manipulator and a grayscale camera, so that their surface textures are accurately matched. We propose a novel control scheme that combines Transformer-driven visual servoing with dual-arm impedance control. This approach enables the system to simultaneously control the pose of the fabric piece and place it onto the underlying one while applying tension to keep the fabric piece flat. Our Transformer-based network incorporates pretrained backbones and a newly introduced Difference Extraction Attention Module (DEAM), which significantly enhances pose difference prediction accuracy. Trained entirely on synthetic images generated using rendering software, the network enables zero-shot deployment in real-world scenarios without requiring prior training on specific fabric textures. Real-world experiments demonstrate that the proposed system accurately aligns fabric pieces with different textures.
comment: 8 pages, 11 figures. Accepted to IEEE Robotics and Automation Letters (RA-L)
Super-LIO: A Robust and Efficient LiDAR-Inertial Odometry System with a Compact Mapping Strategy
LiDAR-Inertial Odometry (LIO) is a foundational technique for autonomous systems, yet its deployment on resource-constrained platforms remains challenging due to computational and memory limitations. We propose Super-LIO, a robust LIO system designed for applications that demand both high performance and accuracy, such as aerial robots and mobile autonomous systems. At the core of Super-LIO is a compact octo-voxel-based map structure, termed OctVox, that limits each voxel to eight fused subvoxels, enabling strict point density control and incremental denoising during map updates. This design enables a simple yet efficient and accurate map structure, which can be easily integrated into existing LIO frameworks. Additionally, Super-LIO designs a heuristic-guided KNN strategy (HKNN) that accelerates the correspondence search by leveraging spatial locality, further reducing runtime overhead. We evaluated the proposed system using four publicly available datasets and several self-collected datasets, totaling more than 30 sequences. Extensive testing on both X86 and ARM platforms confirms that Super-LIO offers superior efficiency and robustness, while maintaining competitive accuracy. Super-LIO processes each frame approximately 73% faster than SOTA, while consuming fewer CPU resources. The system is fully open-source and plug-and-play compatible with a wide range of LiDAR sensors and platforms. The implementation is available at: https://github.com/Liansheng-Wang/Super-LIO.git
comment: 8 pages, 5 figures
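To make the OctVox idea concrete, here is a minimal sketch of a voxel map that caps each voxel at eight octant subvoxels fused by a running mean, assuming a hash map keyed by integer voxel indices; the actual Super-LIO data structure and parameters may differ.

```python
import numpy as np
from collections import defaultdict

VOXEL = 0.5  # voxel edge length [m]; illustrative value

# map: voxel key -> {octant (0..7): (centroid, count)}
grid = defaultdict(dict)

def insert_point(p):
    """Fuse point p into its voxel's octant centroid (at most 8 subvoxels per voxel)."""
    idx = np.floor(p / VOXEL).astype(int)            # integer voxel index
    frac = p / VOXEL - idx                           # position inside the voxel, in [0, 1)
    octant = int(frac[0] >= 0.5) | (int(frac[1] >= 0.5) << 1) | (int(frac[2] >= 0.5) << 2)
    key = tuple(idx)
    if octant in grid[key]:
        c, n = grid[key][octant]
        grid[key][octant] = ((c * n + p) / (n + 1), n + 1)   # running-mean fusion (denoising)
    else:
        grid[key][octant] = (p.copy(), 1)

for p in np.random.default_rng(1).uniform(0, 2, size=(1000, 3)):
    insert_point(p)
print(sum(len(v) for v in grid.values()), "subvoxels stored")
```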
Learning collision risk proactively from naturalistic driving data at scale
Accurately and proactively alerting drivers or automated systems to emerging collisions is crucial for road safety, particularly in highly interactive and complex urban environments. Existing methods either require labour-intensive annotation of sparse risk, struggle to consider varying contextual factors, or are tailored to limited scenarios. Here we present the Generalised Surrogate Safety Measure (GSSM), a data-driven approach that learns collision risk from naturalistic driving without the need for crash or risk labels. Trained over multiple datasets and evaluated on 2,591 real-world crashes and near-crashes, a basic GSSM using only instantaneous motion kinematics achieves an area under the precision-recall curve of 0.9, and secures a median time advance of 2.6 seconds to prevent potential collisions. Incorporating additional interaction patterns and contextual factors provides further performance gains. Across interaction scenarios such as rear-end, merging, and turning, GSSM consistently outperforms existing baselines in accuracy and timeliness. These results establish GSSM as a scalable, context-aware, and generalisable foundation to identify risky interactions before they become unavoidable, supporting proactive safety in autonomous driving systems and traffic incident management. Code and experiment data are openly accessible at https://github.com/Yiru-Jiao/GSSM.
comment: Equation (15) in the previous versions was wrong, which has been corrected since v4
Think, Act, Learn: A Framework for Autonomous Robotic Agents using Closed-Loop Large Language Models
The integration of Large Language Models (LLMs) into robotics has unlocked unprecedented capabilities in high-level task planning. However, most current systems operate in an open-loop fashion, where LLMs act as one-shot planners, rendering them brittle and unable to adapt to unforeseen circumstances in dynamic physical environments. To overcome this limitation, this paper introduces the "Think, Act, Learn" (T-A-L) framework, a novel architecture that enables an embodied agent to autonomously learn and refine its policies through continuous interaction. Our framework establishes a closed-loop cycle where an LLM first "thinks" by decomposing high-level commands into actionable plans. The robot then "acts" by executing these plans while gathering rich, multimodal sensory feedback. Critically, the "learn" module processes this feedback to facilitate LLM-driven self-reflection, allowing the agent to perform causal analysis on its failures and generate corrective strategies. These insights are stored in an experiential memory to guide future planning cycles. We demonstrate through extensive experiments in both simulation and the real world that our T-A-L agent significantly outperforms baseline methods, including open-loop LLMs, Behavioral Cloning, and traditional Reinforcement Learning. Our framework achieves over a 97% success rate on complex, long-horizon tasks, converges to a stable policy in an average of just 9 trials, and exhibits remarkable generalization to unseen tasks. This work presents a significant step towards developing more robust, adaptive, and truly autonomous robotic agents.
comment: 13 pages, 7 figures
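A hedged sketch of the closed-loop cycle described above is given below; llm_plan, execute, and llm_reflect are hypothetical stand-ins for the LLM planner, robot execution, and self-reflection modules, not the paper's actual interfaces.

```python
import random

def llm_plan(command, memory):
    """Hypothetical LLM call: decompose a high-level command into steps, using past lessons."""
    return [f"step {i} of '{command}'" for i in range(3)]

def execute(step):
    """Hypothetical robot execution: return (multimodal feedback, success flag)."""
    return {"log": f"executed {step}"}, random.random() > 0.3

def llm_reflect(step, feedback):
    """Hypothetical LLM self-reflection: causal analysis of a failure -> corrective lesson."""
    return f"avoid repeating the failure observed at {step}"

def think_act_learn(command, max_trials=9):
    memory = []                                   # experiential memory of lessons
    for trial in range(max_trials):
        plan = llm_plan(command, memory)          # "think": decompose the command into a plan
        for step in plan:
            feedback, ok = execute(step)          # "act": execute and gather sensory feedback
            if not ok:                            # "learn": reflect on the failure
                memory.append(llm_reflect(step, feedback))
                break
        else:
            return trial + 1                      # number of trials until success
    return None

print(think_act_learn("tidy the table"))
```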
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. We then propose a Forward Predictive Reasoning module that anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's multimodal observations. Together, these components form an adaptive, verifiable safety logic that is both interpretable and executable as code. Extensive experiments across multiple agents demonstrate that RoboSafe substantially reduces hazardous actions (-36.8% risk occurrence) compared with leading baselines, while maintaining near-original task performance. Real-world evaluations on physical robotic arms further confirm its practicality. Code will be released upon acceptance.
comment: 11 pages, 6 figures
Multiagent Systems
MASFIN: A Multi-Agent System for Decomposed Financial Reasoning and Forecasting NeurIPS 2025
Recent advances in large language models (LLMs) are transforming data-intensive domains, with finance representing a high-stakes environment where transparent and reproducible analysis of heterogeneous signals is essential. Traditional quantitative methods remain vulnerable to survivorship bias, while many AI-driven approaches struggle with signal integration, reproducibility, and computational efficiency. We introduce MASFIN, a modular multi-agent framework that integrates LLMs with structured financial metrics and unstructured news, while embedding explicit bias-mitigation protocols. The system leverages GPT-4.1-nano for reproducibility and cost-efficient inference and generates weekly portfolios of 15-30 equities with allocation weights optimized for short-term performance. In an eight-week evaluation, MASFIN delivered a 7.33% cumulative return, outperforming the S&P 500, NASDAQ-100, and Dow Jones benchmarks in six of eight weeks, albeit with higher volatility. These findings demonstrate the promise of bias-aware, generative AI frameworks for financial forecasting and highlight opportunities for modular multi-agent design to advance practical, transparent, and reproducible approaches in quantitative finance.
comment: Accepted to the NeurIPS 2025 Workshop on Generative AI in Finance
Analyzing Code Injection Attacks on LLM-based Multi-Agent Systems in Software Development
Agentic AI and Multi-Agent Systems are poised to dominate industry and society imminently. Powered by goal-driven autonomy, they represent a powerful form of generative AI, marking a transition from reactive content generation into proactive multitasking capabilities. As an exemplar, we propose an architecture of a multi-agent system for the implementation phase of the software engineering process. We also present a comprehensive threat model for the proposed system. We demonstrate that while such systems can generate code quite accurately, they are vulnerable to attacks, including code injection. Due to their autonomous design and lack of humans in the loop, these systems cannot identify and respond to attacks by themselves. This paper analyzes the vulnerability of multi-agent systems and concludes that the coder-reviewer-tester architecture is more resilient than both the coder and coder-tester architectures, but is less efficient at writing code. We find that by adding a security analysis agent, we mitigate the loss in efficiency while achieving even better resiliency. We conclude by demonstrating that the security analysis agent is vulnerable to advanced code injection attacks, showing that embedding poisonous few-shot examples in the injected code can increase the attack success rate from 0% to 71.95%.
AI Urban Scientist: Multi-Agent Collaborative Automation for Urban Research
Urban research aims to understand how cities operate and evolve as complex adaptive systems. With the rapid growth of urban data and analytical methodologies, the central challenge of the field has shifted from data availability to the integration of heterogeneous data into coherent, verifiable urban knowledge through multidisciplinary approaches. Recent advances in AI, particularly the emergence of large language models (LLMs), have enabled the development of AI scientists capable of autonomous reasoning, hypothesis generation, and data-driven experimentation, demonstrating substantial potential for autonomous urban research. However, most general-purpose AI systems remain misaligned with the domain-specific knowledge, methodological conventions, and inferential standards required in urban studies. Here, we introduce the AI Urban Scientist, a knowledge-driven multi-agent framework designed to support autonomous urban research. Grounded in hypotheses, peer-review feedback, datasets, and research methodologies distilled from large-scale prior studies, the system constructs structured domain knowledge that guides LLM-based agents to automatically generate hypotheses, identify and integrate multi-source urban datasets, conduct empirical analyses and simulations, and iteratively refine analytical methods. Through this process, the framework synthesizes new insights in urban science and accelerates the urban research lifecycle.
Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity
Coopetition refers to simultaneous cooperation and competition among actors who "cooperate to grow the pie and compete to split it up." Modern socio-technical systems are characterized by strategic coopetition in which actors concomitantly cooperate to create value and compete to capture it. While conceptual modeling languages such as i* provide rich qualitative representations of strategic dependencies, they lack mechanisms for quantitative analysis of dynamic trade-offs. Conversely, classical game theory offers mathematical rigor but strips away contextual richness. This technical report bridges this gap by developing computational foundations that formalize two critical dimensions of coopetition: interdependence and complementarity. We ground interdependence in i* structural dependency analysis, translating depender-dependee-dependum relationships into quantitative interdependence coefficients through a structured translation framework. We formalize complementarity following Brandenburger and Nalebuff's Added Value concept, modeling synergistic value creation with validated parameterization. We integrate structural dependencies with bargaining power in value appropriation and introduce a game-theoretic formulation where Nash Equilibrium incorporates structural interdependence. Validation combines comprehensive experimental testing comprising over 22,000 trials across power and logarithmic value function specifications, demonstrating functional form robustness, with empirical application to the Samsung-Sony S-LCD joint venture (2004-2011). This technical report serves as the foundational reference for a coordinated research program examining strategic coopetition in multi-agent systems, with companion work addressing trust dynamics, collective action, and reciprocity mechanisms.
comment: 39 pages, 9 figures, Validation artifacts including source code available at https://github.com/vikpant/strategic-coopetition/tree/master/TR_validation/TR1_foundations
Systems and Control (EESS)
Optimal Placement of Data Centers to Support Power Distribution Networks Using Intelligent Algorithms with Economic Indicators
Data centers are among the fastest growing electricity consumers and can impose severe voltage drops and feeder losses when connected to weak distribution networks. This paper formulates a techno-economic siting problem in which each candidate data center site is mapped to a bus of the distribution network and is assumed to deploy on-site renewable generation and power electronic interfaces, resulting in a controllable net active power injection equivalent to distributed generation. A mixed-integer nonlinear optimization model is developed to jointly select the connection bus and size the DG capacity while respecting network operating limits. The objective combines three normalized terms including active power losses, a voltage deviation index capturing profile quality, and investment cost derived from location-dependent land price and unit DG cost. To address the discrete-continuous search space, an intelligent genetic algorithm is embedded in a multi-scenario decision framework with adaptive weight tuning. Three stakeholder scenarios prioritize losses, voltage quality, or techno-economic balance, and additional balanced scenarios are generated automatically until the optimal bus decision converges. A case study on the IEEE 33-bus radial system demonstrates the effectiveness of the approach. The converged design selects bus 14 with 1.10 MW DG, reducing total losses from 202.67 kW to 129.37 kW while improving the minimum bus voltage to 0.933 per unit at a moderate investment cost of 1.33 MUSD. The proposed framework provides an interpretable pathway to integrate economic indicators into distribution-aware data center siting.
comment: 7 pages, 4 figures, 4 tables
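A minimal sketch of the weighted objective described above, combining normalized active power losses, a voltage deviation index, and investment cost; the normalization bases, weights, and the example voltage deviation value are illustrative assumptions, not the paper's exact formulation.

```python
def siting_fitness(losses_kw, vdi, cost_musd,
                   base=(202.67, 1.0, 2.0),       # illustrative normalization bases
                   weights=(0.4, 0.3, 0.3)):      # scenario-dependent weights (illustrative)
    """Weighted sum of normalized active power losses, voltage deviation index,
    and investment cost; lower is better."""
    terms = (losses_kw / base[0], vdi / base[1], cost_musd / base[2])
    return sum(w * t for w, t in zip(weights, terms))

# Example: the converged bus-14 design reported in the abstract
# (the voltage deviation index value here is purely illustrative).
print(siting_fitness(losses_kw=129.37, vdi=0.25, cost_musd=1.33))
```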
Optimal Trajectory Planning for Orbital Robot Rendezvous and Docking
Approaching a tumbling target safely is a critical challenge in space debris removal missions utilizing robotic manipulators onboard servicing satellites. In this work, we propose a trajectory planning method based on nonlinear optimization for a close-range rendezvous to bring a free-floating, rotating debris object in a two-dimensional plane into the manipulator's workspace, as a preliminary step for its capture. The proposed method introduces a dynamic keep-out sphere that adapts depending on the approach conditions, allowing for closer and safer access to the target. Furthermore, a control strategy is developed to reproduce the optimized trajectory using discrete ON/OFF thrusters, considering practical implementation constraints.
comment: Author's version of a manuscript accepted at the International Conference on Space Robotics 2025 (iSpaRo 2025). (c) IEEE
Low-cost Pyranometer-Based ANN Approach for MPPT in Solar PV Systems
This article presents a study on the application of artificial neural networks (ANNs) for maximum power point tracking (MPPT) in photovoltaic (PV) systems using low-cost pyranometer sensors. The proposed approach integrates pyranometers, temperature sensors, and an ANN to estimate the duty cycle of a DC/DC converter, enabling the system to consistently operate at its maximum power point. The strategy was implemented in the local control of a Cuk converter and experimentally validated against the conventional Perturb and Observe (P&O) method. Results demonstrate that the ANN-based technique, leveraging affordable sensor technology, achieves accurate MPPT performance with reduced fluctuations, enhancing the responsiveness and efficiency of PV tracking systems.
comment: License corrected. Content unchanged
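As a hedged illustration of the approach described above (not the authors' trained network or data), a small regressor mapping irradiance and cell temperature to a DC/DC duty cycle could be set up as follows; the synthetic target relation stands in for MPP data obtained offline.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic training set: irradiance [W/m^2], temperature [C] -> duty cycle in (0, 1).
# The target relation is purely illustrative, standing in for maximum-power-point data.
G = rng.uniform(100, 1000, 500)
T = rng.uniform(15, 60, 500)
duty = np.clip(0.35 + 0.0003 * (G - 500) - 0.002 * (T - 25), 0.1, 0.9)

ann = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
ann.fit(np.column_stack([G, T]), duty)

# Runtime: read the low-cost pyranometer and temperature sensor, command the DC/DC converter.
print(ann.predict([[800.0, 30.0]]))   # estimated duty cycle
```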
Client-Aided Secure Two-Party Computation of Dynamic Controllers
In this paper, we propose a secure two-party computation protocol for dynamic controllers using a secret sharing scheme. The proposed protocol realizes outsourcing of controller computation to two servers, while controller parameters, states, inputs, and outputs are kept secret against the servers. Unlike previous encrypted controls in a single-server setting, the proposed method can operate a dynamic controller for an infinite time horizon without controller state decryption or input re-encryption. We show that the control performance achievable by the proposed protocol can be made arbitrarily close to that attained by the unencrypted controller. Furthermore, system-theoretic and cryptographic modifications of the protocol are presented to improve the communication complexity. The feasibility of the protocol is demonstrated through numerical examples of PID and observer-based controls.
comment: 12 pages, 4 figures
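The protocol above builds on a secret sharing scheme. The sketch below shows only the basic two-party additive-sharing primitive over a public modulus, where each server can locally apply public linear operations to its share; it omits the client-aided machinery and the secret-by-secret multiplications that the paper's full protocol requires.

```python
import secrets

Q = 2**61 - 1  # public prime modulus (illustrative choice)

def share(x):
    """Split integer x into two additive shares: x = (s1 + s2) mod Q."""
    s1 = secrets.randbelow(Q)
    return s1, (x - s1) % Q

def reconstruct(s1, s2):
    return (s1 + s2) % Q

# Each server holds one share of the controller state and can apply a *public*
# linear map locally; products of two secrets need extra interaction (omitted here).
state = 123456
a, b = share(state)
gain = 3  # public coefficient for illustration
assert reconstruct((gain * a) % Q, (gain * b) % Q) == (gain * state) % Q
```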
Differentially Private Gradient-Tracking-Based Distributed Stochastic Optimization over Directed Graphs
This paper proposes a differentially private gradient-tracking-based distributed stochastic optimization algorithm over directed graphs. In particular, privacy noises are incorporated into each agent's state and tracking variable to mitigate information leakage, after which the perturbed states and tracking variables are transmitted to neighbors. We design two novel schemes for the step-sizes and the sampling number within the algorithm. The sampling parameter-controlled subsampling method employed by both schemes enhances the differential privacy level, and ensures a finite cumulative privacy budget even over infinite iterations. The algorithm achieves both almost sure and mean square convergence for nonconvex objectives. Furthermore, when nonconvex objectives satisfy the Polyak-Lojasiewicz condition, Scheme (S1) achieves a polynomial mean square convergence rate, and Scheme (S2) achieves an exponential mean square convergence rate. The trade-off between privacy and convergence is presented. The effectiveness of the algorithm and its superior performance compared to existing works are illustrated through numerical examples of distributed training on the benchmark datasets "MNIST" and "CIFAR-10".
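A minimal sketch of the perturbation step described above: each agent adds zero-mean Gaussian privacy noise to its state and tracking variable before transmission to neighbors. The noise level here is illustrative; the paper's step-size and subsampling schemes are not reproduced.

```python
import numpy as np

def perturb_for_neighbors(value, sigma, rng):
    """Add zero-mean Gaussian privacy noise to a local state or tracking variable
    before transmitting it to neighboring agents."""
    return value + rng.normal(0.0, sigma, size=value.shape)

rng = np.random.default_rng(0)
x_local = np.array([0.2, -1.3, 0.7])          # local optimization state
x_sent = perturb_for_neighbors(x_local, sigma=0.1, rng=rng)
print(x_sent)
```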
Robotics
HELP: Hierarchical Embodied Language Planner for Household Tasks
Embodied agents tasked with complex scenarios, whether in real or simulated environments, rely heavily on robust planning capabilities. When instructions are formulated in natural language, large language models (LLMs) equipped with extensive linguistic knowledge can play this role. However, to effectively exploit the ability of such models to handle linguistic ambiguity, retrieve information from the environment, and ground plans in an agent's available skills, an appropriate architecture must be designed. We propose a Hierarchical Embodied Language Planner, called HELP, consisting of a set of LLM-based agents, each dedicated to solving a different subtask. We evaluate the proposed approach on a household task and perform real-world experiments with an embodied agent. We also focus on the use of open-source LLMs with a relatively small number of parameters, to enable autonomous deployment.
MAction-SocialNav: Multi-Action Socially Compliant Navigation via Reasoning-enhanced Prompt Tuning
Socially compliant navigation requires robots to move safely and appropriately in human-centered environments by respecting social norms. However, social norms are often ambiguous, and in a single scenario, multiple actions may be equally acceptable. Most existing methods simplify this problem by assuming a single correct action, which limits their ability to handle real-world social uncertainty. In this work, we propose MAction-SocialNav, an efficient vision-language model for socially compliant navigation that explicitly addresses action ambiguity, enabling the generation of multiple plausible actions within one scenario. To enhance the model's reasoning capability, we introduce a novel meta-cognitive prompt (MCP) method. Furthermore, to evaluate the proposed method, we curate a multi-action socially compliant navigation dataset that accounts for diverse conditions, including crowd density, indoor and outdoor environments, and dual human annotations. The dataset contains 789 samples, each with a three-turn conversation, split into 710 training samples and 79 test samples through random selection. We also design five evaluation metrics to assess high-level decision precision, safety, and diversity. Extensive experiments demonstrate that the proposed MAction-SocialNav achieves strong social reasoning performance while maintaining high efficiency, highlighting its potential for real-world human-robot navigation. Compared with zero-shot GPT-4o and Claude, our model achieves substantially higher decision quality (APG: 0.595 vs. 0.000/0.025) and safety alignment (ER: 0.264 vs. 0.642/0.668), while maintaining real-time efficiency (1.524 FPS, over 3x faster).
Structural Induced Exploration for Balanced and Scalable Multi-Robot Path Planning
Multi-robot path planning is a fundamental yet challenging problem due to its combinatorial complexity and the need to balance global efficiency with fair task allocation among robots. Traditional swarm intelligence methods, although effective on small instances, often converge prematurely and struggle to scale to complex environments. In this work, we present a structure-induced exploration framework that integrates structural priors into the search process of the ant colony optimization (ACO). The approach leverages the spatial distribution of the task to induce a structural prior at initialization, thereby constraining the search space. The pheromone update rule is then designed to emphasize structurally meaningful connections and incorporates a load-aware objective to reconcile the total travel distance with individual robot workload. An explicit overlap suppression strategy further ensures that tasks remain distinct and balanced across the team. The proposed framework was validated on diverse benchmark scenarios covering a wide range of instance sizes and robot team configurations. The results demonstrate consistent improvements in route compactness, stability, and workload distribution compared to representative metaheuristic baselines. Beyond performance gains, the method also provides a scalable and interpretable framework that can be readily applied to logistics, surveillance, and search-and-rescue applications where reliable large-scale coordination is essential.
comment: 20 pages, 6 figures
AstraNav-Memory: Contexts Compression for Long Memory
Lifelong embodied navigation requires agents to accumulate, retain, and exploit spatial-semantic experience across tasks, enabling efficient exploration in novel environments and rapid goal reaching in familiar ones. While object-centric memory is interpretable, it depends on detection and reconstruction pipelines that limit robustness and scalability. We propose an image-centric memory framework that achieves long-term implicit memory via an efficient visual context compression module end-to-end coupled with a Qwen2.5-VL-based navigation policy. Built atop a ViT backbone with frozen DINOv3 features and lightweight PixelUnshuffle+Conv blocks, our visual tokenizer supports configurable compression rates; for example, under a representative 16$\times$ compression setting, each image is encoded with about 30 tokens, expanding the effective context capacity from tens to hundreds of images. Experimental results on GOAT-Bench and HM3D-OVON show that our method achieves state-of-the-art navigation performance, improving exploration in unfamiliar environments and shortening paths in familiar ones. Ablation studies further reveal that moderate compression provides the best balance between efficiency and accuracy. These findings position compressed image-centric memory as a practical and scalable interface for lifelong embodied agents, enabling them to reason over long visual histories and navigate with human-like efficiency.
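The compression module described above can be sketched in PyTorch as a PixelUnshuffle followed by a 1x1 convolution that projects the folded features to the token dimension; the channel sizes and downscale factor below are illustrative assumptions rather than the AstraNav-Memory configuration.

```python
import torch
import torch.nn as nn

class CompressTokens(nn.Module):
    """Spatially fold a feature map with PixelUnshuffle, then project to the token dimension."""
    def __init__(self, in_ch=768, token_dim=1024, downscale=4):
        super().__init__()
        self.unshuffle = nn.PixelUnshuffle(downscale)              # (C,H,W) -> (C*d^2, H/d, W/d)
        self.proj = nn.Conv2d(in_ch * downscale**2, token_dim, kernel_size=1)

    def forward(self, feats):                                      # feats: (B, C, H, W)
        x = self.proj(self.unshuffle(feats))
        return x.flatten(2).transpose(1, 2)                        # (B, num_tokens, token_dim)

feats = torch.randn(1, 768, 24, 24)        # e.g. frozen backbone features reshaped to a grid
tokens = CompressTokens()(feats)
print(tokens.shape)                        # 16x fewer spatial tokens: (1, 36, 1024)
```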
SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration
High-fidelity and controllable 3D simulation is essential for addressing the long-tail data scarcity in Autonomous Driving (AD), yet existing methods struggle to simultaneously achieve photorealistic rendering and interactive traffic editing. Current approaches often falter in large-angle novel view synthesis and suffer from geometric or lighting artifacts during asset manipulation. To address these challenges, we propose SymDrive, a unified diffusion-based framework capable of joint high-quality rendering and scene editing. We introduce a Symmetric Auto-regressive Online Restoration paradigm, which constructs paired symmetric views to recover fine-grained details via a ground-truth-guided dual-view formulation and utilizes an auto-regressive strategy for consistent lateral view generation. Furthermore, we leverage this restoration capability to enable a training-free harmonization mechanism, treating vehicle insertion as context-aware inpainting to ensure seamless lighting and shadow consistency. Extensive experiments demonstrate that SymDrive achieves state-of-the-art performance in both novel-view enhancement and realistic 3D vehicle insertion.
World-Coordinate Human Motion Retargeting via SAM 3D Body
Recovering world-coordinate human motion from monocular videos with humanoid robot retargeting is significant for embodied intelligence and robotics. To avoid complex SLAM pipelines or heavy temporal models, we propose a lightweight, engineering-oriented framework that leverages SAM 3D Body (3DB) as a frozen perception backbone and uses the Momentum HumanRig (MHR) representation as a robot-friendly intermediate. Our method (i) locks the identity and skeleton-scale parameters of each tracked subject to enforce temporally consistent bone lengths, (ii) smooths per-frame predictions via efficient sliding-window optimization in the low-dimensional MHR latent space, and (iii) recovers physically plausible global root trajectories with a differentiable soft foot-ground contact model and contact-aware global optimization. Finally, we retarget the reconstructed motion to the Unitree G1 humanoid using a kinematics-aware two-stage inverse kinematics pipeline. Results on real monocular videos show that our method has stable world trajectories and reliable robot retargeting, indicating that structured human representations with lightweight physical constraints can yield robot-ready motion from monocular input.
A Novel Robotic Variable Stiffness Mechanism Based on Helically Wound Structured Electrostatic Layer Jamming
This paper introduces a novel variable stiffness mechanism termed Helically Wound Structured Electrostatic Layer Jamming (HWS-ELJ) and systematically investigates its potential applications in variable stiffness robotic finger design. The proposed method utilizes electrostatic attraction to enhance interlayer friction, thereby suppressing relative sliding and enabling tunable stiffness. Compared with conventional planar ELJ, the helical configuration of HWS-ELJ provides exponentially increasing stiffness adjustment with winding angle, achieving significantly greater stiffness enhancement for the same electrode contact area while reducing the required footprint under equivalent stiffness conditions. Considering the practical advantage of voltage-based control, a series of experimental tests under different initial force conditions were conducted to evaluate the stiffness modulation characteristics of HWS-ELJ. The results demonstrated its rational design and efficacy, with outcomes following the deduced theoretical trends. Furthermore, a robotic finger prototype integrating HWS-ELJ was developed, demonstrating voltage-driven stiffness modulation and confirming the feasibility of the proposed robotic variable stiffness mechanism.
Spatiotemporal Tubes for Probabilistic Temporal Reach-Avoid-Stay Task in Uncertain Dynamic Environment
In this work, we extend the Spatiotemporal Tube (STT) framework to address Probabilistic Temporal Reach-Avoid-Stay (PrT-RAS) tasks in dynamic environments with uncertain obstacles. We develop a real-time tube synthesis procedure that explicitly accounts for time-varying uncertain obstacles and provides formal probabilistic safety guarantees. The STT is formulated as a time-varying ball in the state space whose center and radius evolve online based on uncertain sensory information. We derive a closed-form, approximation-free control law that confines the system trajectory within the tube, ensuring both probabilistic safety and task satisfaction. Our method offers a formal guarantee for probabilistic avoidance and finite-time task completion. The resulting controller is model-free, approximation-free, and optimization-free, enabling efficient real-time execution while guaranteeing convergence to the target. The effectiveness and scalability of the framework are demonstrated through simulation studies and hardware experiments on mobile robots, a UAV, and a 7-DOF manipulator navigating in cluttered and uncertain environments.
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by 64% on four challenging tasks (book flipping, belt assembly, cable routing, and gear insertion) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io
Planar Juggling of a Devil-Stick using Discrete VHCs
Planar juggling of a devil-stick using impulsive inputs is addressed using the concept of discrete virtual holonomic constraints (DVHC). The location of the center-of-mass of the devil-stick is specified in terms of its orientation at the discrete instants when impulsive control inputs are applied. The discrete zero dynamics (DZD) resulting from the choice of DVHC provides conditions for stable juggling. A control design that enforces the DVHC and an orbit stabilizing controller are presented. The approach is validated in simulation.
comment: 9 pages, 7 figures; this is an extended version of the article published in the IEEE Control Systems Letters
How Robot Dogs See the Unseeable: Improving Visual Interpretability via Peering for Exploratory Robots
Occlusion from obstacles, such as foliage, can severely obstruct a robot's sensors, impairing scene understanding. We show that "peering", a characteristic side-to-side movement used by insects to overcome their visual limitations, can also allow robots to markedly improve visual reasoning under partial occlusion. This is accomplished by applying core signal processing principles, specifically optical synthetic aperture sensing, together with the vision reasoning capabilities of modern large multimodal models. Peering enables real-time, high-resolution, and wavelength-independent perception, which is crucial for vision-based scene understanding across a wide range of applications. The approach is low-cost and immediately deployable on any camera-equipped robot. We investigated different peering motions and occlusion masking strategies, demonstrating that, unlike peering, state-of-the-art multi-view 3D vision techniques fail in these conditions due to their high susceptibility to occlusion. Robots that see through occlusion will gain superior perception abilities - including enhanced scene understanding, situational awareness, camouflage breaking, and advanced navigation.
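The optical synthetic aperture sensing principle mentioned above can be sketched as a shift-and-average over views collected during a peering sweep: each image is registered to the chosen focal plane and the stack is averaged, so occluders off that plane blur out. The NumPy sketch below assumes the per-view integer disparities are already known and uses toy data.

```python
import numpy as np

def synthetic_aperture_integral(images, shifts):
    """Shift each view to register the target plane, then average.
    images: list of HxW arrays from a lateral (peering) sweep.
    shifts: per-image integer (dy, dx) disparities for the chosen focal depth."""
    acc = np.zeros_like(images[0], dtype=np.float64)
    for img, (dy, dx) in zip(images, shifts):
        acc += np.roll(img, shift=(dy, dx), axis=(0, 1))   # register this view
    return acc / len(images)                                # off-plane occluders blur out

# Toy usage with random frames and linearly spaced disparities (illustrative only)
frames = [np.random.rand(64, 64) for _ in range(9)]
disparities = [(0, d) for d in range(-4, 5)]
focused = synthetic_aperture_integral(frames, disparities)
```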
UniTacHand: Unified Spatio-Tactile Representation for Human to Robotic Hand Skill Transfer
Tactile sensing is crucial for robotic hands to achieve human-level dexterous manipulation, especially in scenarios with visual occlusion. However, its application is often hindered by the difficulty of collecting large-scale real-world robotic tactile data. In this study, we propose to collect low-cost human manipulation data using haptic gloves for tactile-based robotic policy learning. The misalignment between human and robotic tactile data makes it challenging to transfer policies learned from human data to robots. To bridge this gap, we propose UniTacHand, a unified representation to align robotic tactile information captured by dexterous hands with human hand touch obtained from gloves. First, we project tactile signals from both human hands and robotic hands onto a morphologically consistent 2D surface space of the MANO hand model. This unification standardizes the heterogeneous data structures and inherently embeds the tactile signals with spatial context. Then, we introduce a contrastive learning method to align them into a unified latent space, trained on only 10 minutes of paired data from our data collection system. Our approach enables zero-shot tactile-based policy transfer from humans to a real robot, generalizing to objects unseen in the pre-training data. We also demonstrate that co-training on mixed data, including both human and robotic demonstrations via UniTacHand, yields better performance and data efficiency compared with using only robotic data. UniTacHand paves a path toward general, scalable, and data-efficient learning for tactile-based dexterous hands.
FAR-AVIO: Fast and Robust Schur-Complement Based Acoustic-Visual-Inertial Fusion Odometry with Sensor Calibration
Underwater environments impose severe challenges to visual-inertial odometry systems, as strong light attenuation, marine snow and turbidity, together with weakly exciting motions, degrade inertial observability and cause frequent tracking failures over long-term operation. While tightly coupled acoustic-visual-inertial fusion, typically implemented through an acoustic Doppler Velocity Log (DVL) integrated with visual-inertial measurements, can provide accurate state estimation, the associated graph-based optimization is often computationally prohibitive for real-time deployment on resource-constrained platforms. Here we present FAR-AVIO, a Schur-complement-based, tightly coupled acoustic-visual-inertial odometry framework tailored for underwater robots. FAR-AVIO embeds a Schur complement formulation into an Extended Kalman Filter (EKF), enabling joint pose-landmark optimization for accuracy while maintaining constant-time updates by efficiently marginalizing landmark states. On top of this backbone, we introduce Adaptive Weight Adjustment and Reliability Evaluation (AWARE), an online sensor health module that continuously assesses the reliability of visual, inertial and DVL measurements and adaptively regulates their sigma weights, and we develop an efficient online calibration scheme that jointly estimates DVL-IMU extrinsics, without dedicated calibration manoeuvres. Numerical simulations and real-world underwater experiments consistently show that FAR-AVIO outperforms state-of-the-art underwater SLAM baselines in both localization accuracy and computational efficiency, enabling robust operation on low-power embedded platforms. Our implementation has been released as open source software at https://far-vido.gitbook.io/far-vido-docs.
VTAO-BiManip: Masked Visual-Tactile-Action Pre-training with Object Understanding for Bimanual Dexterous Manipulation
Bimanual dexterous manipulation remains a significant challenge in robotics due to the high DoFs of each hand and the need to coordinate them. Existing single-hand manipulation techniques often leverage human demonstrations to guide RL methods but fail to generalize to complex bimanual tasks involving multiple sub-skills. In this paper, we introduce VTAO-BiManip, a novel framework that combines visual-tactile-action pretraining with object understanding to facilitate curriculum RL, enabling human-like bimanual manipulation. We improve prior learning by incorporating hand motion data, providing more effective guidance for dual-hand coordination than binary tactile feedback. Our pretraining model predicts future actions as well as object pose and size using masked multimodal inputs, facilitating cross-modal regularization. To address the multi-skill learning challenge, we introduce a two-stage curriculum RL approach to stabilize training. We evaluate our method on a bottle-cap unscrewing task, demonstrating its effectiveness in both simulated and real-world environments. Our approach achieves a success rate that surpasses existing visual-tactile pretraining methods by over 20%.
Motus: A Unified Latent Action World Model
While a general embodied agent must function as a unified system, current methods are built on isolated models for understanding, world modeling, and control. This fragmentation prevents unifying multimodal generative capabilities and hinders learning from large-scale, heterogeneous data. In this paper, we propose Motus, a unified latent action world model that leverages existing general pretrained models and rich, sharable motion information. Motus introduces a Mixture-of-Transformer (MoT) architecture to integrate three experts (i.e., understanding, video generation, and action) and adopts a UniDiffuser-style scheduler to enable flexible switching between different modeling modes (i.e., world models, vision-language-action models, inverse dynamics models, video generation models, and video-action joint prediction models). Motus further leverages optical flow to learn latent actions and adopts a recipe with a three-phase training pipeline and a six-layer data pyramid, thereby extracting pixel-level "delta action" and enabling large-scale action pretraining. Experiments show that Motus achieves superior performance against state-of-the-art methods in both simulation (a +15% improvement over X-VLA and a +45% improvement over Pi0.5) and real-world scenarios (improved by +11-48%), demonstrating that unified modeling of all functionalities and priors significantly benefits downstream robotic tasks.
PlanScope: Learning to Plan Within Decision Scope for Urban Autonomous Driving
In the context of urban autonomous driving, imitation learning-based methods have shown remarkable effectiveness, with a typical practice being to minimize the discrepancy between expert driving logs and predictive decision sequences. Expert driving logs natively contain short-term decisions made in response to future events, such as sudden obstacles or rapidly changing traffic signals. We believe that these unpredictable future events and the corresponding expert reactions can introduce reasoning disturbances, negatively affecting the convergence efficiency of planning models. At the same time, long-term decision information, such as maintaining a reference lane or avoiding stationary obstacles, is essential for guiding short-term decisions. Our preliminary experiments on shortening the planning horizon show a rise-and-fall trend in driving performance, supporting these hypotheses. Based on these insights, we present PlanScope, a sequential-decision-learning framework with novel techniques for separating short-term and long-term decisions in decision logs. To identify and extract each decision component, the Wavelet Transform on trajectory profiles is proposed. After that, to enhance the detail-generating ability of Neural Networks, extra Detail Decoders are proposed. Finally, to enable in-scope decision supervision across detail levels, Multi-Scope Supervision strategies are adopted during training. The proposed methods, especially the time-dependent normalization, outperform baseline models in closed-loop evaluations on the nuPlan dataset, offering a plug-and-play solution to enhance existing planning models.
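The wavelet-based separation of decision scopes described above can be illustrated with PyWavelets: a multi-level DWT of a 1D trajectory profile yields a coarse approximation (long-term component) and the residual detail (short-term component). The wavelet family, level, and toy profile below are illustrative choices, not the paper's settings.

```python
import numpy as np
import pywt

t = np.linspace(0.0, 8.0, 256)
x = 2.0 * t + 0.3 * np.sin(6.0 * t)           # toy longitudinal trajectory profile

coeffs = pywt.wavedec(x, "db4", level=3)      # multi-level DWT of the profile
approx_only = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
long_term = pywt.waverec(approx_only, "db4")[: len(x)]   # coarse / long-term decision component
short_term = x - long_term                                # fine detail / short-term component
```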
Design and Research of a Self-Propelled Pipeline Robot Based on Force Analysis and Dynamic Simulation
In pipeline inspection, traditional tethered inspection robots are severely constrained by cable length and weight, which greatly limit their travel range and accessibility. To address these issues, this paper proposes a self-propelled pipeline robot design based on force analysis and dynamic simulation, with a specific focus on solving core challenges including vertical climbing failure and poor passability in T-branch pipes. Adopting a wheeled configuration and modular design, the robot prioritizes the core demand of body motion control. Specifically, 3D modeling of the robot was first completed using SolidWorks. Subsequently, the model was imported into ADAMS for dynamic simulation, which provided a basis for optimizing the drive module and motion control strategy. To verify the robot's dynamic performance, an experimental platform with acrylic pipes was constructed. Through adjusting its body posture to surmount obstacles and select directions, the robot has demonstrated its ability to stably traverse various complex pipeline scenarios. Notably, this work offers a technical feasibility reference for the application of pipeline robots in the inspection of medium and low-pressure urban gas pipelines.
comment: The article needs further revision
Research on Dead Reckoning Algorithm for Self-Propelled Pipeline Robots in Three-Dimensional Complex Pipelines
In the field of gas pipeline location, existing pipeline location methods mostly rely on pipeline location instruments. However, when faced with complex and curved pipeline scenarios, these methods often fail due to problems such as cable entanglement and insufficient equipment flexibility. To address this pain point, we designed a self-propelled pipeline robot. This robot can autonomously complete the location work of complex and curved pipelines in complex pipe networks without external dragging. In terms of pipeline mapping technology, traditional visual mapping and laser mapping methods are easily affected by lighting conditions and insufficient features in the confined space of pipelines, resulting in mapping drift and divergence problems. In contrast, the pipeline location method that integrates inertial navigation and wheel odometers is less affected by pipeline environmental factors. Based on this, this paper proposes a pipeline robot location method based on extended Kalman filtering (EKF). Firstly, the body attitude angle is initially obtained through an inertial measurement unit (IMU). Then, the extended Kalman filtering algorithm is used to improve the accuracy of attitude angle estimation. Finally, high-precision pipeline location is achieved by combining wheel odometers. During the testing phase, the roll wheels of the pipeline robot needed to fit tightly against the pipe wall to reduce slippage. However, excessive tightness would reduce the flexibility of motion control due to excessive friction. Therefore, a balance needed to be struck between the robot's motion capability and positioning accuracy. Experiments were conducted using the self-propelled pipeline robot in a rectangular loop pipeline, and the results verified the effectiveness of the proposed dead reckoning algorithm.
comment: The paper needs further revision
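A minimal planar sketch of the fusion idea in the abstract above: a gyro-propagated heading is corrected by an absolute heading observation in an EKF-style update, and the corrected heading is combined with wheel-odometer increments for dead reckoning. This is not the paper's 3-D implementation; the planar simplification, the noise values, and the availability of a direct heading measurement are assumptions for illustration.

```python
import numpy as np

# State: heading theta (rad). Planar simplification of the full 3-D attitude problem.
theta, P = 0.0, 0.1          # heading estimate and its variance
Q, R = 1e-4, 1e-2            # assumed gyro (process) and measurement noise variances
x, y = 0.0, 0.0              # dead-reckoned position (m)

def ekf_step(theta, P, gyro_rate, dt, theta_meas):
    # Predict: propagate heading with the gyro rate.
    theta_pred = theta + gyro_rate * dt
    P_pred = P + Q
    # Update: correct with an absolute heading observation (H = 1).
    K = P_pred / (P_pred + R)
    theta_new = theta_pred + K * (theta_meas - theta_pred)
    P_new = (1.0 - K) * P_pred
    return theta_new, P_new

# Fake a short run: constant turn rate plus noisy gyro and heading observations.
rng = np.random.default_rng(0)
dt, gyro_rate = 0.1, 0.05
for k in range(100):
    true_theta = gyro_rate * dt * (k + 1)
    theta, P = ekf_step(theta, P, gyro_rate + rng.normal(0, 0.01), dt,
                        true_theta + rng.normal(0, 0.1))
    ds = 0.02 + rng.normal(0, 0.001)        # wheel-odometer increment (m)
    x += ds * np.cos(theta)                  # dead reckoning with the
    y += ds * np.sin(theta)                  # corrected heading

print(f"estimated position after 10 s: ({x:.3f}, {y:.3f}) m")
```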
Multiagent Systems
Multi-agent Adaptive Mechanism Design
We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Throughout the sequential game, the mechanism estimates agents' beliefs and iteratively updates a distributionally robust linear program with shrinking ambiguity sets to reduce payments while preserving truthfulness. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ cumulative regret, and we establish a matching lower bound showing that no truthful adaptive mechanism can asymptotically do better. The framework generalizes to plug-in estimators, supporting structured priors and delayed feedback. To our knowledge, this is the first adaptive mechanism under general settings that maintains truthfulness and achieves optimal regret when incentive constraints are unknown and must be learned.
PERELMAN: Pipeline for scientific literature meta-analysis. Technical report
We present PERELMAN (PipEline foR sciEntific Literature Meta-ANalysis), an agentic framework designed to extract specific information from a large corpus of scientific articles to support large-scale literature reviews and meta-analyses. Our central goal is to reliably transform heterogeneous article content into a unified, machine-readable representation. PERELMAN first elicits domain knowledge, including target variables, inclusion criteria, units, and normalization rules, through a structured dialogue with a subject-matter expert. This domain knowledge is then reused across multiple stages of the pipeline and guides coordinated agents in extracting evidence from narrative text, tables, and figures, enabling consistent aggregation across studies. In order to assess reproducibility and validate our implementation, we evaluate the system on the task of reproducing the meta-analysis of layered Li-ion cathode properties (NMC811 material). We describe our solution, which has the potential to reduce the time required to prepare meta-analyses from months to minutes.
Democratizing Drug Discovery with an Orchestrated, Knowledge-Driven Multi-Agent Team for User-Guided Therapeutic Design
Therapeutic discovery remains a formidable challenge, impeded by the fragmentation of specialized domains and the execution gap between computational design and physiological validation. Although generative AI offers promise, current models often function as passive assistants rather than as autonomous executors. Here, we introduce OrchestRA, a human-in-the-loop multi-agent platform that unifies biology, chemistry, and pharmacology into an autonomous discovery engine. Unlike static code generators, our agents actively execute simulations and reason over the results to drive iterative optimization. Governed by an Orchestrator, a Biologist Agent leverages deep reasoning over a massive knowledge graph (>10 million associations) to pinpoint high-confidence targets; a Chemist Agent autonomously detects structural pockets for de novo design or drug repositioning; and a Pharmacologist Agent evaluates candidates via rigorous physiologically based pharmacokinetic (PBPK) simulations. This architecture establishes a dynamic feedback loop where pharmacokinetic and toxicity profiles directly trigger structural reoptimization. By seamlessly integrating autonomous execution with human guidance, OrchestRA democratizes therapeutic design, transforming drug discovery from a stochastic search into a programmable, evidence-based engineering discipline.
comment: 51 pages, 4 figures (with supplementary information)
The AI Committee: A Multi-Agent Framework for Automated Validation and Remediation of Web-Sourced Data
Many research areas rely on data from the web to gain insights and test their methods. However, collecting comprehensive research datasets often demands manually reviewing many web pages to identify and record relevant data points, which is labor-intensive and susceptible to error. While the emergence of large language model (LLM)-powered web agents has begun to automate parts of this process, they often struggle to ensure the validity of the data they collect. Indeed, these agents exhibit several recurring failure modes, including hallucinating or omitting values, misinterpreting page semantics, and failing to detect invalid information, which are subtle and difficult to detect and correct manually. To address this, we introduce the AI Committee, a novel model-agnostic multi-agent system that automates the process of validating and remediating web-sourced datasets. Each agent is specialized in a distinct task in the data quality assurance pipeline, from source scrutiny and fact-checking to data remediation and integrity validation. The AI Committee leverages various LLM capabilities, including in-context learning for dataset adaptation, chain-of-thought reasoning for complex semantic validation, and a self-correction loop for data remediation, all without task-specific training. We demonstrate the effectiveness of our system by applying it to three real-world datasets, showing that it generalizes across LLMs and significantly outperforms baseline approaches, achieving data completeness up to 78.7% and precision up to 100%. We additionally conduct an ablation study demonstrating the contribution of each agent to the Committee's performance. This work is released as an open-source tool for the research community.
A Plan Reuse Mechanism for LLM-Driven Agent
Integrating large language models (LLMs) into personal assistants, like Xiao Ai and Blue Heart V, effectively enhances their ability to interact with humans, solve complex tasks, and manage IoT devices. Such assistants are also termed LLM-driven agents. Upon receiving a user request, the LLM-driven agent generates a plan using an LLM, executes the plan through various tools, and then returns the response to the user. During this process, the latency of generating a plan with an LLM can reach tens of seconds, significantly degrading the user experience. Real-world dataset analysis shows that about 30% of the requests received by LLM-driven agents are identical or similar, which allows previously generated plans to be reused to reduce latency. However, it is difficult to accurately assess the similarity between requests by directly comparing their original texts, and the diverse expressions of natural language together with the unstructured format of plan texts make plan reuse challenging to implement. To address these issues, we present and implement AgentReuse, a plan reuse mechanism for LLM-driven agents. AgentReuse leverages the semantic similarities and differences among requests and uses intent classification to evaluate request similarity and enable plan reuse. Experimental results on a real-world dataset demonstrate that AgentReuse achieves a 93% effective plan reuse rate, an F1 score of 0.9718, and an accuracy of 0.9459 in evaluating request similarities, reducing latency by 93.12% compared with baselines that do not use the reuse mechanism.
comment: This paper is an English version of A Plan Reuse Mechanism for LLM-Driven Agent published in 2024 in the Journal of Computer Research and Development
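A minimal sketch of the reuse idea described above: requests are mapped to a coarse intent label, and a cached plan for that intent is returned when the classifier is confident enough. The `classify_intent` and `generate_plan` stubs, the plan format, and the confidence threshold are hypothetical placeholders; AgentReuse's actual classifier and LLM calls are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

class PlanCache:
    """Reuse previously generated plans keyed by a coarse intent label."""

    def __init__(self, classify_intent: Callable[[str], Tuple[str, float]],
                 generate_plan: Callable[[str], List[str]],
                 min_confidence: float = 0.8):
        self.classify_intent = classify_intent  # request -> (intent, confidence)
        self.generate_plan = generate_plan      # slow LLM call, used on cache miss
        self.min_confidence = min_confidence
        self.cache: Dict[str, List[str]] = {}

    def handle(self, request: str) -> List[str]:
        intent, confidence = self.classify_intent(request)
        if confidence >= self.min_confidence and intent in self.cache:
            return self.cache[intent]           # reuse: skip plan generation
        plan = self.generate_plan(request)      # miss: pay the LLM latency once
        if confidence >= self.min_confidence:
            self.cache[intent] = plan
        return plan

# Hypothetical stand-ins for the real intent classifier and LLM planner.
def toy_classifier(request: str) -> Tuple[str, float]:
    return ("turn_on_light", 0.95) if "light" in request else ("unknown", 0.0)

def toy_planner(request: str) -> List[str]:
    return ["locate_device", "send_on_command", "confirm_state"]

agent = PlanCache(toy_classifier, toy_planner)
agent.handle("please turn on the living room light")   # generated and cached
agent.handle("switch the light on")                     # served from the cache
```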
Systems and Control (EESS)
Smart IoT-Based Leak Forecasting and Detection for Energy-Efficient Liquid Cooling in AI Data Centers
AI data centers, which are GPU-centric, have adopted liquid cooling to handle extreme heat loads, but coolant leaks result in substantial energy loss through unplanned shutdowns and extended repair periods. We present a proof-of-concept smart IoT monitoring system combining LSTM neural networks for probabilistic leak forecasting with Random Forest classifiers for instant detection. Tested on synthetic data aligned with ASHRAE 2021 standards, our approach achieves 96.5% detection accuracy and 87% forecasting accuracy at 90% probability within plus or minus 30-minute windows. Analysis demonstrates that humidity, pressure, and flow rate deliver strong predictive signals, while temperature exhibits minimal immediate response due to thermal inertia in server hardware. The system employs MQTT streaming, InfluxDB storage, and Streamlit dashboards, forecasting leaks 2-4 hours ahead while identifying sudden events within 1 minute. For a typical 47-rack facility, this approach could prevent roughly 1,500 kWh of annual energy waste through proactive maintenance rather than reactive emergency procedures. While validation remains synthetic-only, the results establish feasibility for future operational deployment in sustainable data center operations.
comment: 7 pages, 6 figures, IEEE conference format
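A minimal sketch of the instant-detection half of the pipeline described above: a Random Forest classifier over the sensor features the abstract identifies as informative (humidity, pressure, flow rate, temperature). The synthetic data below is an assumption for illustration only; it does not reproduce the ASHRAE-aligned dataset, the thresholds, or the LSTM forecaster.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000

# Columns: humidity (%), pressure (kPa), flow rate (L/min), temperature (C).
normal = np.column_stack([
    rng.normal(45, 3, n), rng.normal(200, 5, n),
    rng.normal(30, 1, n), rng.normal(35, 0.5, n)])
# Assumed leak signature: higher humidity, lower pressure and flow; temperature
# barely moves because of thermal inertia, mirroring the abstract's observation.
leak = np.column_stack([
    rng.normal(60, 4, n), rng.normal(185, 6, n),
    rng.normal(24, 2, n), rng.normal(35.2, 0.5, n)])

X = np.vstack([normal, leak])
y = np.concatenate([np.zeros(n), np.ones(n)])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
print("feature importances (hum, press, flow, temp):", clf.feature_importances_)
```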
Economic and Reliability Value of Improved Offshore Wind Forecasting in Bulk Power Grid Operation: A Case Study of The New York Power Grid
This study investigates the economic and reliability benefits of improved offshore wind forecasting for grid operations along the U.S. East Coast. We introduce and evaluate a state-of-the-art, machine-learning-based offshore wind forecasting model tailored for this region by integrating its improved forecasts into a dynamic reserve procurement framework aligned with New York Independent System Operator (NYISO) practices to evaluate their economic value. To determine system-wide reserve needs, plant-specific reserves are aggregated. However, conventional methods overlook spatial correlation across sites, often leading to over-procurement. To address this, we propose a risk-based reserve aggregation technique that leverages spatial diversification. Additionally, we evaluate the reliability improvements enabled by the enhanced offshore wind forecast. To evaluate the operational impact, we propose an operational resource adequacy framework that captures uncertainty from forecast errors and grid conditions. Using this framework, we quantify key reliability metrics under different offshore wind forecast scenarios. Using New York State as a case study, we find that the improved forecast enables more accurate reserve estimation, reducing procurement costs by 5.53% in the 2035 scenario compared to a well-validated numerical weather prediction model. Applying the risk-based aggregation further reduces total production costs by 7.21%. From a reliability perspective, the improved forecasts lower the system Loss of Load Probability (LOLP) by approximately 19% in the 2035 scenario, highlighting their potential to enhance system reliability during real-time grid operations.
comment: Submitted to Applied Energy
Asynchronous Averaging on Dynamic Graphs with Selective Neighborhood Contraction
We study a discrete-time consensus model in which agents iteratively update their states through interactions on a dynamic social network. At each step, a single agent is selected asynchronously and averages the values of its current neighbors. A distinctive feature of our model is that an agent's neighborhood may contract following an update, while non-selected agents may add or remove neighbors independently. This creates a time-varying communication structure with endogenous contraction. We show that under mild assumptions--specifically, that the evolving graph is connected infinitely often--the system reaches consensus almost surely. Our results extend classical consensus theory on time-varying graphs and asynchronous updates by introducing selective neighborhood contraction, offering new insights into agreement dynamics in evolving social systems.
comment: 10 pages, 12 figures
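A minimal simulation of the update rule described above: at each step one agent is chosen at random and replaces its value with the average over its current neighborhood, while the graph keeps changing between updates. The random graph model, the inclusion of the agent's own value in the average, and the step count are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
x = rng.uniform(0.0, 1.0, n)               # initial agent states

def random_ring_plus_edges(n, extra=15):
    """A ring (keeping the graph connected) plus a few random extra edges."""
    A = np.zeros((n, n), dtype=bool)
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = True
    for _ in range(extra):
        i, j = rng.integers(0, n, 2)
        if i != j:
            A[i, j] = A[j, i] = True
    return A

for step in range(3000):
    A = random_ring_plus_edges(n)           # the graph changes every step
    i = rng.integers(0, n)                  # one agent selected asynchronously
    group = np.append(np.flatnonzero(A[i]), i)
    x[i] = x[group].mean()                  # average over the current neighborhood

print("spread after 3000 updates:", float(x.max() - x.min()))
```

Because each update is a convex combination of current values and the simulated graph stays connected, the spread of the states contracts toward a common value, which is the qualitative behavior the paper establishes under much weaker connectivity assumptions.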
Towards Learning-Based Formula 1 Race Strategies
This paper presents two complementary frameworks to optimize Formula 1 race strategies, jointly accounting for energy allocation, tire wear, and pit stop timing. First, the race scenario is modeled using lap time maps and a dynamic tire wear model capturing the main trade-offs arising during a race. Then, we solve the problem by means of a mixed-integer nonlinear program that handles the integer nature of the pit stop decisions. The same race scenario is embedded into a reinforcement learning environment, on which an agent is trained. Providing fast inference at runtime, this method is suited to improve human decision-making during real races. The learned policy's suboptimality is assessed with respect to the optimal solution, both in a nominal scenario and with an unforeseen disturbance. In both cases, the agent achieves approximately 5 s of suboptimality over 1.5 h of race time, mainly attributable to the different energy allocation strategy. This work lays the foundations for learning-based race strategies and provides a benchmark for future developments.
Dynamic Cooperative Strategies in Search Engine Advertising Market: With and Without Retail Competition
In the search engine advertising (SEA) market, where competition among retailers is intense and multifaceted, channel coordination between retailers and manufacturers emerges as a critical factor that significantly influences the effectiveness of advertising strategies. This research attempts to provide managerial guidelines for cooperative advertising in the SEA context by modeling two cooperative advertising decision scenarios. Scenario I defines a simple cooperative channel consisting of one manufacturer and one retailer. In Scenario II, we consider a more general setting where an independent retailer competes with the Manufacturer-Retailer alliance of Scenario I. We propose a novel cooperative advertising optimization model, wherein a manufacturer can advertise products directly through SEA campaigns and indirectly by subsidizing its retailer. To highlight the distinctive features of SEA, our model incorporates dynamic quality scores and focuses on a finite time horizon. In each scenario, we provide a feasible equilibrium solution of optimal policies for all members. Subsequently, we conduct numerical experiments to perform sensitivity analysis for both the quality score and the gross margin. Additionally, we explore the impact of the initial market share of the competing retailer in Scenario II. Finally, we investigate how retail competition affects the cooperative alliance's optimal strategy and channel performance. The properties derived from the equilibrium and numerical analyses offer crucial insights for participants engaged in cooperative advertising within the SEA market.
comment: 60 pages, 17 figures, 6 tables
Spatiotemporal Tubes for Probabilistic Temporal Reach-Avoid-Stay Task in Uncertain Dynamic Environment
In this work, we extend the Spatiotemporal Tube (STT) framework to address Probabilistic Temporal Reach-Avoid-Stay (PrT-RAS) tasks in dynamic environments with uncertain obstacles. We develop a real-time tube synthesis procedure that explicitly accounts for time-varying uncertain obstacles and provides formal probabilistic safety guarantees. The STT is formulated as a time-varying ball in the state space whose center and radius evolve online based on uncertain sensory information. We derive a closed-form, approximation-free control law that confines the system trajectory within the tube, ensuring both probabilistic safety and task satisfaction. Our method offers a formal guarantee for probabilistic avoidance and finite-time task completion. The resulting controller is model-free, approximation-free, and optimization-free, enabling efficient real-time execution while guaranteeing convergence to the target. The effectiveness and scalability of the framework are demonstrated through simulation studies and hardware experiments on mobile robots, a UAV, and a 7-DOF manipulator navigating in cluttered and uncertain environments.
Convergence Analysis of Natural Power Method and Its Applications to Control
This paper analyzes the discrete-time natural power method, demonstrating its convergence to the dominant $r$-dimensional subspace corresponding to the $r$ eigenvalues with the largest absolute values. This contrasts with the Oja flow, which targets eigenvalues with the largest real parts. We leverage this property to develop methods for model order reduction and low-rank controller synthesis for discrete-time LTI systems, proving preservation of key system properties. We also extend the low-rank control framework to slowly-varying LTV systems, showing its utility for tracking time-varying dominant subspaces.
comment: 6 pages. Submitted
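The convergence behavior analyzed above can be illustrated with a plain block power (orthogonal) iteration in NumPy, which converges to the invariant subspace associated with the r eigenvalues of largest absolute value. The test matrix, spectrum, and iteration count below are assumptions; the paper's natural power method and its control applications are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3

# Random diagonalizable test matrix with a known dominant invariant subspace.
eigvals = np.array([2.0, -1.8, 1.5, 0.6, 0.5, -0.4, 0.3, 0.1])
V = rng.normal(size=(n, n))
A = V @ np.diag(eigvals) @ np.linalg.inv(V)

# Block power (orthogonal) iteration: multiply by A, re-orthonormalize with QR.
Q, _ = np.linalg.qr(rng.normal(size=(n, r)))
for _ in range(200):
    Q, _ = np.linalg.qr(A @ Q)

# The dominant subspace is spanned by the eigenvectors of the r largest |eigenvalue|
# (2.0, -1.8, 1.5 here); compare orthogonal projectors onto the two subspaces.
target = V[:, :r]
P_est = Q @ Q.T
P_true = target @ np.linalg.pinv(target)
print("projector gap:", np.linalg.norm(P_est - P_true))
```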
Planar Juggling of a Devil-Stick using Discrete VHCs
Planar juggling of a devil-stick using impulsive inputs is addressed using the concept of discrete virtual holonomic constraints (DVHC). The location of the center-of-mass of the devil-stick is specified in terms of its orientation at the discrete instants when impulsive control inputs are applied. The discrete zero dynamics (DZD) resulting from the choice of DVHC provides conditions for stable juggling. A control design that enforces the DVHC and an orbit stabilizing controller are presented. The approach is validated in simulation.
comment: 9 pages, 7 figures; this is an extended version of the article published in the IEEE Control Systems Letters
Robotics
Quadruped-Legged Robot Movement Plan Generation using Large Language Model
Traditional control interfaces for quadruped robots often impose a high barrier to entry, requiring specialized technical knowledge for effective operation. To address this, this paper presents a novel control framework that integrates Large Language Models (LLMs) to enable intuitive, natural language-based navigation. We propose a distributed architecture where high-level instruction processing is offloaded to an external server to overcome the onboard computational constraints of the DeepRobotics Jueying Lite 3 platform. The system grounds LLM-generated plans into executable ROS navigation commands using real-time sensor fusion (LiDAR, IMU, and Odometry). Experimental validation was conducted in a structured indoor environment across four distinct scenarios, ranging from single-room tasks to complex cross-zone navigation. The results demonstrate the system's robustness, achieving an aggregate success rate of over 90% across all scenarios, validating the feasibility of offloaded LLM-based planning for autonomous quadruped deployment in real-world settings.
LookPlanGraph: Embodied Instruction Following Method with VLM Graph Augmentation
Methods that use Large Language Models (LLM) as planners for embodied instruction following tasks have become widespread. To successfully complete tasks, the LLM must be grounded in the environment in which the robot operates. One solution is to use a scene graph that contains all the necessary information. Modern methods rely on prebuilt scene graphs and assume that all task-relevant information is available at the start of planning. However, these approaches do not account for changes in the environment that may occur between graph construction and task execution. We propose LookPlanGraph, a method that leverages a scene graph composed of static assets and object priors. During plan execution, LookPlanGraph continuously updates the graph with relevant objects, either by verifying existing priors or discovering new entities. This is achieved by processing the agent's egocentric camera view using a Vision Language Model. We conducted experiments with changed object positions in the VirtualHome and OmniGibson simulated environments, demonstrating that LookPlanGraph outperforms methods based on predefined static scene graphs. To demonstrate the practical applicability of our approach, we also conducted experiments in a real-world setting. Additionally, we introduce the GraSIF (Graph Scenes for Instruction Following) dataset with an automated validation framework, comprising 514 tasks drawn from SayPlan Office, BEHAVIOR-1K, and VirtualHome RobotHow. Project page available at https://lookplangraph.github.io.
RoboCade: Gamifying Robot Data Collection
Imitation learning from human demonstrations has become a dominant approach for training autonomous robot policies. However, collecting demonstration datasets is costly: it often requires access to robots and needs sustained effort in a tedious, long process. These factors limit the scale of data available for training policies. We aim to address this scalability challenge by involving a broader audience in a gamified data collection experience that is both accessible and motivating. Specifically, we develop a gamified remote teleoperation platform, RoboCade, to engage general users in collecting data that is beneficial for downstream policy training. To do this, we embed gamification strategies into the design of the system interface and data collection tasks. In the system interface, we include components such as visual feedback, sound effects, goal visualizations, progress bars, leaderboards, and badges. We additionally propose principles for constructing gamified tasks that have overlapping structure with useful downstream target tasks. We instantiate RoboCade on three manipulation tasks -- including spatial arrangement, scanning, and insertion. To illustrate the viability of gamified robot data collection, we collect a demonstration dataset through our platform, and show that co-training robot policies with this data can improve success rate on non-gamified target tasks (+16-56%). Further, we conduct a user study to validate that novice users find the gamified platform significantly more enjoyable than a standard non-gamified platform (+24%). These results highlight the promise of gamified data collection as a scalable, accessible, and engaging method for collecting demonstration data.
comment: 10 pages, 9 figures
UniTacHand: Unified Spatio-Tactile Representation for Human to Robotic Hand Skill Transfer
Tactile sensing is crucial for robotic hands to achieve human-level dexterous manipulation, especially in scenarios with visual occlusion. However, its application is often hindered by the difficulty of collecting large-scale real-world robotic tactile data. In this study, we propose to collect low-cost human manipulation data using haptic gloves for tactile-based robotic policy learning. The misalignment between human and robotic tactile data makes it challenging to transfer policies learned from human data to robots. To bridge this gap, we propose UniTacHand, a unified representation to align robotic tactile information captured by dexterous hands with human hand touch obtained from gloves. First, we project tactile signals from both human hands and robotic hands onto a morphologically consistent 2D surface space of the MANO hand model. This unification standardizes the heterogeneous data structures and inherently embeds the tactile signals with spatial context. Then, we introduce a contrastive learning method to align them into a unified latent space, trained on only 10 minutes of paired data from our data collection system. Our approach enables zero-shot tactile-based policy transfer from humans to a real robot, generalizing to objects unseen in the pre-training data. We also demonstrate that co-training on mixed data, including both human and robotic demonstrations via UniTacHand, yields better performance and data efficiency compared with using only robotic data. UniTacHand paves a path toward general, scalable, and data-efficient learning for tactile-based dexterous hands.
Relative Localization System Design for SnailBot: A Modular Self-reconfigurable Robot
This paper presents the design and implementation of a relative localization system for SnailBot, a modular self-reconfigurable robot. The system integrates ArUco marker recognition, optical flow analysis, and IMU data processing into a unified fusion framework, enabling robust and accurate relative positioning for collaborative robotic tasks. Experimental validation demonstrates the effectiveness of the system in real-time operation, with a rule-based fusion strategy ensuring reliability across dynamic scenarios. The results highlight the potential for scalable deployment in modular robotic systems.
comment: 7 pages, 7 figures, 4 algorithms
RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
Embodied agents powered by vision-language models (VLMs) are increasingly capable of executing complex real-world tasks, yet they remain vulnerable to hazardous instructions that may trigger unsafe behaviors. Runtime safety guardrails, which intercept hazardous actions during task execution, offer a promising solution due to their flexibility. However, existing defenses often rely on static rule filters or prompt-level control, which struggle to address implicit risks arising in dynamic, temporally dependent, and context-rich environments. To address this, we propose RoboSafe, a hybrid reasoning runtime safeguard for embodied agents through executable predicate-based safety logic. RoboSafe integrates two complementary reasoning processes on a Hybrid Long-Short Safety Memory. We first propose a Backward Reflective Reasoning module that continuously revisits recent trajectories in short-term memory to infer temporal safety predicates and proactively triggers replanning when violations are detected. We then propose a Forward Predictive Reasoning module that anticipates upcoming risks by generating context-aware safety predicates from the long-term safety memory and the agent's multimodal observations. Together, these components form an adaptive, verifiable safety logic that is both interpretable and executable as code. Extensive experiments across multiple agents demonstrate that RoboSafe substantially reduces hazardous actions (-36.8% risk occurrence) compared with leading baselines, while maintaining near-original task performance. Real-world evaluations on physical robotic arms further confirm its practicality. Code will be released upon acceptance.
comment: 11 pages, 6 figures
Wireless Center of Pressure Feedback System for Humanoid Robot Balance Control using ESP32-C3
Maintaining stability during the single-support phase is a fundamental challenge in humanoid robotics, particularly in dance robots that require complex maneuvers and high mechanical freedom. Traditional tethered sensor configurations often restrict joint movement and introduce mechanical noise. This study proposes a wireless embedded balance system designed to maintain stability on uneven surfaces. The system utilizes a custom-designed foot unit integrated with four load cells and an ESP32-C3 microcontroller to estimate the Center of Pressure (CoP) in real time. The CoP data are transmitted wirelessly to the main controller to minimize the wiring complexity of the 29-DoF VI-ROSE humanoid robot. A PID control strategy is implemented to adjust the torso, hip, and ankle roll joints based on CoP feedback. Experimental characterization demonstrated high sensor precision, with an average measurement error of 14.8 g. Furthermore, the proposed control system achieved a 100% success rate in maintaining balance during single-leg lifting tasks at a 3-degree inclination with optimized PID parameters (Kp=0.10, Kd=0.005). These results validate the efficacy of wireless CoP feedback in enhancing the postural stability of humanoid robots, without compromising their mechanical flexibility.
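A minimal sketch of the feedback path described above: the CoP is computed as the load-weighted average of the four cell positions on the foot sole, and a PID loop on the CoP error produces a roll correction. The foot geometry, the sample period, the integral gain, and the mapping to specific joints are illustrative assumptions; only the quoted Kp and Kd come from the abstract.

```python
import numpy as np

# Assumed positions of the four load cells on the foot sole (m), corners of an 8 x 12 cm foot.
CELL_XY = np.array([[-0.04, -0.06], [0.04, -0.06], [-0.04, 0.06], [0.04, 0.06]])

def center_of_pressure(loads_g):
    """CoP as the load-weighted mean of the cell positions; loads in grams."""
    w = np.asarray(loads_g, dtype=float)
    return (w[:, None] * CELL_XY).sum(axis=0) / w.sum()

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev = 0.0, 0.0
    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev) / self.dt
        self.prev = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Gains from the abstract (Kp=0.10, Kd=0.005); Ki=0 and dt=0.02 s are assumptions.
roll_pid = PID(kp=0.10, ki=0.0, kd=0.005, dt=0.02)

loads = [510, 430, 520, 440]                  # grams, an example wireless reading
cop_x, cop_y = center_of_pressure(loads)
roll_correction = roll_pid.step(0.0 - cop_x)  # drive the lateral CoP toward zero
print(f"CoP = ({cop_x*1000:.1f}, {cop_y*1000:.1f}) mm, roll command = {roll_correction:.4f}")
```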
Schrödinger's Navigator: Imagining an Ensemble of Futures for Zero-Shot Object Navigation
Zero-shot object navigation (ZSON) requires a robot to locate a target object in a previously unseen environment without relying on pre-built maps or task-specific training. However, existing ZSON methods often struggle in realistic and cluttered environments, particularly when the scene contains heavy occlusions, unknown risks, or dynamically moving target objects. To address these challenges, we propose Schrödinger's Navigator, a navigation framework inspired by Schrödinger's thought experiment on uncertainty. The framework treats unobserved space as a set of plausible future worlds and reasons over them before acting. Conditioned on egocentric visual inputs and three candidate trajectories, a trajectory-conditioned 3D world model imagines future observations along each path. This enables the agent to see beyond occlusions and anticipate risks in unseen regions without requiring extra detours or dense global mapping. The imagined 3D observations are fused into the navigation map and used to update a value map. These updates guide the policy toward trajectories that avoid occlusions, reduce exposure to uncertain space, and better track moving targets. Experiments on a Go2 quadruped robot across three challenging scenarios, including severe static occlusions, unknown risks, and dynamically moving targets, show that Schrödinger's Navigator consistently outperforms strong ZSON baselines in self-localization, object localization, and overall Success Rate in occlusion-heavy environments. These results demonstrate the effectiveness of trajectory-conditioned 3D imagination in enabling robust zero-shot object navigation.
Flocking phase transition and threat responses in bio-inspired autonomous drone swarms
Collective motion inspired by animal groups offers powerful design principles for autonomous aerial swarms. We present a bio-inspired 3D flocking algorithm in which each drone interacts only with a minimal set of influential neighbors, relying solely on local alignment and attraction cues. By systematically tuning these two interaction gains, we map a phase diagram revealing sharp transitions between swarming and schooling, as well as a critical region where susceptibility, polarization fluctuations, and reorganization capacity peak. Outdoor experiments with a swarm of ten drones, combined with simulations using a calibrated flight-dynamics model, show that operating near this transition enhances responsiveness to external disturbances. When confronted with an intruder, the swarm performs rapid collective turns, transient expansions, and reliably recovers high alignment within seconds. These results demonstrate that minimal local-interaction rules are sufficient to generate multiple collective phases and that simple gain modulation offers an efficient mechanism to adjust stability, flexibility, and resilience in drone swarms.
SparScene: Efficient Traffic Scene Representation via Sparse Graph Learning for Large-Scale Trajectory Generation
Multi-agent trajectory generation is a core problem for autonomous driving and intelligent transportation systems. However, efficiently modeling the dynamic interactions between numerous road users and infrastructures in complex scenes remains an open problem. Existing methods typically employ distance-based or fully connected dense graph structures to capture interaction information, which not only introduces a large number of redundant edges but also requires complex and heavily parameterized networks for encoding, thereby resulting in low training and inference efficiency, limiting scalability to large and complex traffic scenes. To overcome the limitations of existing methods, we propose SparScene, a sparse graph learning framework designed for efficient and scalable traffic scene representation. Instead of relying on distance thresholds, SparScene leverages the lane graph topology to construct structure-aware sparse connections between agents and lanes, enabling efficient yet informative scene graph representation. SparScene adopts a lightweight graph encoder that efficiently aggregates agent-map and agent-agent interactions, yielding compact scene representations with substantially improved efficiency and scalability. On the motion prediction benchmark of the Waymo Open Motion Dataset (WOMD), SparScene achieves competitive performance with remarkable efficiency. It generates trajectories for more than 200 agents in a scene within 5 ms and scales to more than 5,000 agents and 17,000 lanes with merely 54 ms of inference time with a GPU memory of 2.9 GB, highlighting its superior scalability for large-scale traffic scenes.
comment: 13 pages, 7 figures, 5 tables
Robust and Efficient MuJoCo-based Model Predictive Control via Web of Affine Spaces Derivatives ICRA 2026
MuJoCo is a powerful and efficient physics simulator widely used in robotics. One common way it is applied in practice is through Model Predictive Control (MPC), which uses repeated rollouts of the simulator to optimize future actions and generate responsive control policies in real time. To make this process more accessible, the open source library MuJoCo MPC (MJPC) provides ready-to-use MPC algorithms and implementations built directly on top of the MuJoCo simulator. However, MJPC relies on finite differencing (FD) to compute derivatives through the underlying MuJoCo simulator, which is often a key bottleneck that can make it prohibitively costly for time-sensitive tasks, especially in high-DOF systems or complex scenes. In this paper, we introduce the use of Web of Affine Spaces (WASP) derivatives within MJPC as a drop-in replacement for FD. WASP is a recently developed approach for efficiently computing sequences of accurate derivative approximations. By reusing information from prior, related derivative calculations, WASP accelerates and stabilizes the computation of new derivatives, making it especially well suited for MPC's iterative, fine-grained updates over time. We evaluate WASP across a diverse suite of MJPC tasks spanning multiple robot embodiments. Our results suggest that WASP derivatives are particularly effective in MJPC: they integrate seamlessly across tasks, deliver consistently robust performance, and achieve up to a 2x speedup compared to an FD backend when used with derivative-based planners, such as iLQG. In addition, WASP-based MPC outperforms MJPC's stochastic sampling-based planners on our evaluation tasks, offering both greater efficiency and reliability. To support adoption and future research, we release an open-source implementation of MJPC with WASP derivatives fully integrated.
comment: Submitted to 2026 IEEE International Conference on Robotics & Automation (ICRA 2026)
Global End-Effector Pose Control of an Underactuated Aerial Manipulator via Reinforcement Learning
Aerial manipulators, which combine robotic arms with multi-rotor drones, face strict constraints on arm weight and mechanical complexity. In this work, we study a lightweight 2-degree-of-freedom (DoF) arm mounted on a quadrotor via a differential mechanism, capable of full six-DoF end-effector pose control. While the minimal design enables simplicity and reduced payload, it also introduces challenges such as underactuation and sensitivity to external disturbances, including manipulation of heavy loads and pushing tasks. To address these, we employ reinforcement learning, training a Proximal Policy Optimization (PPO) agent in simulation to generate feedforward commands for quadrotor acceleration and body rates, along with joint angle targets. These commands are tracked by an incremental nonlinear dynamic inversion (INDI) attitude controller and a PID joint controller, respectively. Flight experiments demonstrate centimeter-level position accuracy and degree-level orientation precision, with robust performance under external force disturbances. The results highlight the potential of learning-based control strategies for enabling contact-rich aerial manipulation using simple, lightweight platforms.
comment: 8 pages, 6 figures
Language-Guided Grasp Detection with Coarse-to-Fine Learning for Robotic Manipulation
Grasping is one of the most fundamental and challenging capabilities in robotic manipulation, especially in unstructured, cluttered, and semantically diverse environments. Recent research has increasingly explored language-guided manipulation, where robots not only perceive the scene but also interpret task-relevant natural language instructions. However, existing language-conditioned grasping methods typically rely on shallow fusion strategies, leading to limited semantic grounding and weak alignment between linguistic intent and visual grasp reasoning. In this work, we propose Language-Guided Grasp Detection (LGGD) with a coarse-to-fine learning paradigm for robotic manipulation. LGGD leverages CLIP-based visual and textual embeddings within a hierarchical cross-modal fusion pipeline, progressively injecting linguistic cues into the visual feature reconstruction process. This design enables fine-grained visual-semantic alignment and improves the feasibility of the predicted grasps with respect to task instructions. In addition, we introduce a language-conditioned dynamic convolution head (LDCH) that mixes multiple convolution experts based on sentence-level features, enabling instruction-adaptive coarse mask and grasp predictions. A final refinement module further enhances grasp consistency and robustness in complex scenes. Experiments on the OCID-VLG and Grasp-Anything++ datasets show that LGGD surpasses existing language-guided grasping methods, exhibiting strong generalization to unseen objects and diverse language queries. Moreover, deployment on a real robotic platform demonstrates the practical effectiveness of our approach in executing accurate, instruction-conditioned grasp actions. The code will be released publicly upon acceptance.
comment: Submitted to IEEE Journal
Tracing Energy Flow: Learning Tactile-based Grasping Force Control to Prevent Slippage in Dynamic Object Interaction
Regulating grasping force to reduce slippage during dynamic object interaction remains a fundamental challenge in robotic manipulation, especially when objects are manipulated by multiple rolling contacts, have unknown properties (such as mass or surface conditions), and when external sensing is unreliable. In contrast, humans can quickly regulate grasping force by touch, even without visual cues. Inspired by this ability, we aim to enable robotic hands to rapidly explore objects and learn tactile-driven grasping force control under motion and limited sensing. We propose a physics-informed energy abstraction that models the object as a virtual energy container. The inconsistency between the fingers' applied power and the object's retained energy provides a physically grounded signal for inferring slip-aware stability. Building on this abstraction, we employ model-based learning and planning to efficiently model energy dynamics from tactile sensing and perform real-time grasping force optimization. Experiments in both simulation and hardware demonstrate that our method can learn grasping force control from scratch within minutes, effectively reduce slippage, and extend grasp duration across diverse motion-object pairs, all without relying on external sensing or prior object knowledge.
comment: 8 pages. Accepted by IEEE Robotics and Automation Letters (RA-L)
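A minimal sketch of the energy-container abstraction described above: integrate the power applied through the contacts and compare it with the change in the object's retained mechanical energy; a growing residual indicates energy being dissipated at the contact interfaces, which can serve as a slip-aware signal. The signals below are synthetic placeholders; the paper's tactile-based estimation, learning, and planning are not reproduced.

```python
import numpy as np

def energy_residual(contact_forces, contact_vels, obj_vels, obj_heights,
                    mass, dt, g=9.81):
    """Cumulative (applied work) - (change in the object's kinetic + potential energy)."""
    applied_power = np.einsum("tij,tij->t", contact_forces, contact_vels)
    work = np.cumsum(applied_power) * dt
    kinetic = 0.5 * mass * np.sum(obj_vels ** 2, axis=1)
    potential = mass * g * obj_heights
    retained = (kinetic + potential) - (kinetic[0] + potential[0])
    return work - retained     # grows when energy is lost, e.g. through slipping

# Synthetic example: two fingertip contacts lifting a 0.3 kg object for 1 s
# (forces, velocities, and mass are illustrative assumptions).
T, dt, m = 100, 0.01, 0.3
t = np.arange(T) * dt
forces = np.tile([[0.0, 0.0, 2.0]], (T, 2, 1))          # N, per contact
vels = np.tile([[0.0, 0.0, 0.05]], (T, 2, 1))           # m/s, per contact
obj_v = np.tile([[0.0, 0.0, 0.05]], (T, 1))             # object velocity
obj_h = 0.05 * t                                         # object height

r = energy_residual(forces, vels, obj_v, obj_h, m, dt)
print("final residual (J):", float(r[-1]))
```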
Multimodal Sensing for Robot-Assisted Sub-Tissue Feature Detection in Physiotherapy Palpation
Robotic palpation relies on force sensing, but force signals in soft-tissue environments are variable and cannot reliably reveal subtle subsurface features. We present a compact multimodal sensor that integrates high-resolution vision-based tactile imaging with a 6-axis force-torque sensor. In experiments on silicone phantoms with diverse subsurface tendon geometries, force signals alone frequently produce ambiguous responses, while tactile images reveal clear structural differences in presence, diameter, depth, crossings, and multiplicity. Yet accurate force tracking remains essential for maintaining safe, consistent contact during physiotherapeutic interaction. Preliminary results show that combining tactile and force modalities enables robust subsurface feature detection and controlled robotic palpation.
comment: 6 pages, 9 figures, submitted to DMD2026
Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
Bayesian Reinforcement Learning (BRL) provides a framework for generalisation of Reinforcement Learning (RL) problems through its use of Bayesian task parameters in the transition and reward models. However, classical BRL methods assume known forms of transition and reward models, reducing their applicability in real-world problems. As a result, recent deep BRL methods have started to incorporate model learning, though the use of neural networks directly on the joint data and task parameters requires optimising the Evidence Lower Bound (ELBO). ELBOs are difficult to optimise and may result in indistinctive task parameters, hence compromised BRL policies. To this end, we introduce a novel deep BRL method, Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL), that enables efficient and accurate learning of transition and reward models, with fully tractable marginal likelihood and Bayesian inference on task parameters and model noises. On the challenging MetaWorld ML10/45 benchmarks, GLiBRL improves the success rate of one of the state-of-the-art deep BRL methods, VariBAD, by up to 2.7x. Compared against representative or recent deep BRL / Meta-RL methods, such as MAML, RL2, SDVT, TrMRL and ECET, GLiBRL also consistently demonstrates low-variance, competitive performance.
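A minimal sketch of the tractable core that generalised linear models with basis functions offer: with fixed features and Gaussian assumptions, the posterior over weights and the log marginal likelihood are available in closed form, so no ELBO is needed. The radial basis features, the hyperparameters alpha and beta, and the toy regression target are illustrative assumptions; GLiBRL's learnable basis functions and task-parameter inference are not reproduced here.

```python
import numpy as np

def rbf_features(x, centers, width=0.5):
    # Fixed radial basis functions standing in for learned features.
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def bayes_linreg(Phi, y, alpha=1.0, beta=25.0):
    """Closed-form posterior and log marginal likelihood (Gaussian prior and noise)."""
    n, d = Phi.shape
    S_inv = alpha * np.eye(d) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y                       # posterior mean of the weights
    # Log evidence, as in Bishop, PRML, eq. (3.86).
    E = 0.5 * beta * np.sum((y - Phi @ m) ** 2) + 0.5 * alpha * m @ m
    log_evidence = (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta) - E
                    - 0.5 * np.linalg.slogdet(S_inv)[1] - 0.5 * n * np.log(2 * np.pi))
    return m, S, log_evidence

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
y = np.sin(3 * x) + rng.normal(0, 0.2, 40)         # toy 1-D transition target
centers = np.linspace(-1, 1, 9)
m, S, ev = bayes_linreg(rbf_features(x, centers), y)
print("log marginal likelihood:", ev)
```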
From Human Bias to Robot Choice: How Occupational Contexts and Racial Priming Shape Robot Selection
As artificial agents increasingly integrate into professional environments, fundamental questions have emerged about how societal biases influence human-robot selection decisions. We conducted two comprehensive experiments (N = 1,038) examining how occupational contexts and stereotype activation shape robotic agent choices across construction, healthcare, educational, and athletic domains. Participants made selections from artificial agents that varied systematically in skin tone and anthropomorphic characteristics. Our study revealed distinct context-dependent patterns. Healthcare and educational scenarios demonstrated strong favoritism toward lighter-skinned artificial agents, while construction and athletic contexts showed greater acceptance of darker-toned alternatives. Participant race was associated with systematic differences in selection patterns across professional domains. The second experiment demonstrated that exposure to human professionals from specific racial backgrounds systematically shifted later robotic agent preferences in stereotype-consistent directions. These findings show that occupational biases and color-based discrimination transfer directly from human-human to human-robot evaluation contexts. The results highlight mechanisms through which robotic deployment may unintentionally perpetuate existing social inequalities.
comment: HRI '26
ETP-R1: Evolving Topological Planning with Reinforcement Fine-tuning for Vision-Language Navigation in Continuous Environments
Vision-Language Navigation in Continuous Environments (VLN-CE) requires an embodied agent to navigate towards target in continuous environments, following natural language instructions. While current graph-based methods offer an efficient, structured approach by abstracting the environment into a topological map and simplifying the action space to waypoint selection, they lag behind methods based on Large Vision-Language Models (LVLMs) in leveraging large-scale data and advanced training paradigms. In this paper, we try to bridge this gap by introducing ETP-R1, a framework that applies the paradigm of scaling up data and Reinforcement Fine-Tuning (RFT) to a graph-based VLN-CE model. To build a strong foundation, we first construct a high-quality, large-scale pretraining dataset using the Gemini API. This dataset consists of diverse, low-hallucination instructions for topological trajectories, providing rich supervision for our graph-based policy to map language to topological paths. This foundation is further strengthened by unifying data from both R2R and RxR tasks for joint pretraining. Building on this, we introduce a three-stage training paradigm, which culminates in the first application of closed-loop, online RFT to a graph-based VLN-CE model, powered by the Group Relative Policy Optimization (GRPO) algorithm. Extensive experiments demonstrate that our approach is highly effective, establishing new state-of-the-art performance across all major metrics on both the R2R-CE and RxR-CE benchmarks. Our code is available at https://github.com/Cepillar/ETP-R1.
comment: 8 pages, 6 figures
Certifiable Alignment of GNSS and Local Frames via Lagrangian Duality
Estimating the absolute orientation of a local system relative to a global navigation satellite system (GNSS) reference often suffers from local minima and high dependency on satellite availability. Existing methods for this alignment task rely on abundant satellites unavailable in GNSS-degraded environments, or use local optimization methods which cannot guarantee the optimality of a solution. This work introduces a globally optimal solver that transforms raw pseudo-range or Doppler measurements into a convexly relaxed problem. The proposed method is certifiable, meaning it can numerically verify the correctness of the result, filling a gap where existing local optimizers fail. We first formulate the original frame alignment problem as a nonconvex quadratically constrained quadratic program (QCQP) problem and relax the QCQP problem to a concave Lagrangian dual problem that provides a lower cost bound for the original problem. Then we perform relaxation tightness and observability analysis to derive criteria for certifiable optimality of the solution. Finally, simulation and real world experiments are conducted to evaluate the proposed method. The experiments show that our method provides certifiably optimal solutions even with only 2 satellites with Doppler measurements and 2D vehicle motion, while the traditional velocity-based VOBA method and the advanced GVINS alignment technique may fail or converge to local optima without notice. To support the development of GNSS-based navigation techniques in robotics, all code and data are open-sourced at https://github.com/Baoshan-Song/Certifiable-Doppler-alignment.
Stretchable and High-Precision Optical Tactile Sensor for Trajectory Tracking of Parallel Mechanisms IROS
Stretchable sensors hold promise for soft robotics, medical devices, and human-machine interaction due to the high compliance of soft materials. Discrete sensing strategies, including sensor arrays and distributed sensors, are broadly involved in tactile sensors across versatile applications. However, it remains a challenge for stretchable tactile sensors to achieve high spatial resolution with self-decoupled capacity and insensitivity to other off-axis stimuli. Herein, we develop a stretchable tactile sensor based on the proposed continuous spectral-filtering principle, allowing very high resolution for applied stimuli. The proposed sensor enables a highly linear spatial response (0.996) even during stretching and bending, and high continuous spatial (7 μm) and force (5 mN) resolutions, with design scalability and interaction robustness to survive piercing and cutting. We further demonstrate the sensors' performance by integrating them into a planar parallel mechanism for precise trajectory tracking (rotational resolution: 0.02°) in real time.
comment: Accepted by 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Proprioception Enhances Vision Language Model in Generating Captions and Subtask Segmentations for Robot Task
From the perspective of future developments in robotics, it is crucial to verify whether foundation models trained exclusively on offline data, such as images and language, can understand robot motion. In particular, since Vision Language Models (VLMs) do not include low-level motion information from robots in their training datasets, video understanding that includes trajectory information remains a significant challenge. In this study, we assess two capabilities of VLMs through a video captioning task with low-level robot motion information: (1) automatic captioning of robot tasks and (2) segmentation of a series of tasks. Both capabilities are expected to enhance the efficiency of robot imitation learning by linking language and motion, and to serve as a measure of the foundation model's performance. The proposed method generates multiple "scene" captions using image captions and trajectory data from robot tasks. The full task caption is then generated by summarizing these individual captions. Additionally, the method performs subtask segmentation by comparing the similarity between text embeddings of image captions. In both captioning tasks, the proposed method aims to improve performance by providing the robot's motion data (joint and end-effector states) as input to the VLM. Simulator experiments were conducted to validate the effectiveness of the proposed method.
Early warning signals for loss of control
Maintaining stability in feedback systems, from aircraft and autonomous robots to biological and physiological systems, relies on monitoring their behavior and continuously adjusting their inputs. Incremental damage can make such control fragile. This tends to go unnoticed until a small perturbation induces instability (i.e., loss of control). Traditional methods in the field of engineering rely on accurate system models to compute a safe set of operating instructions, which become invalid when the, possibly damaged, system diverges from its model. Here we demonstrate that the approach of such a feedback system towards instability can nonetheless be monitored through dynamical indicators of resilience. This holistic system safety monitor does not rely on a system model and is based on the generic phenomenon of critical slowing down, shown to occur in the climate, biology, and other complex nonlinear systems approaching criticality. Our findings for engineered devices open up a wide range of applications involving real-time early warning systems as well as empirical guidance for resilient system design exploration, or "tinkering". While we demonstrate the validity using drones, the generic nature of the underlying principles suggests that these indicators could apply across a wider class of controlled systems, including reactors, aircraft, and self-driving cars.
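A minimal sketch of the model-free indicators the abstract refers to: rolling-window variance and lag-1 autocorrelation of a monitored signal (for example, a control error), both of which are expected to rise as a feedback system approaches instability via critical slowing down. The AR(1) toy signal with a drifting pole and the window length are assumptions for illustration, not the paper's drone data.

```python
import numpy as np

def rolling_indicators(signal, window):
    """Rolling variance and lag-1 autocorrelation: classic resilience indicators."""
    var, ac1 = [], []
    for end in range(window, len(signal) + 1):
        w = signal[end - window:end]
        var.append(np.var(w))
        ac1.append(np.corrcoef(w[:-1], w[1:])[0, 1])
    return np.array(var), np.array(ac1)

# Toy closed-loop error signal: an AR(1) process whose pole drifts toward 1,
# mimicking a controller gradually losing margin as the system degrades.
rng = np.random.default_rng(0)
T = 4000
phi = np.linspace(0.5, 0.99, T)        # slowly eroding stability margin
e = np.zeros(T)
for k in range(1, T):
    e[k] = phi[k] * e[k - 1] + rng.normal(0, 0.05)

var, ac1 = rolling_indicators(e, window=300)
print("variance early vs late:", var[0], var[-1])
print("lag-1 autocorrelation early vs late:", ac1[0], ac1[-1])
```

Both indicators increase toward the end of the run, which is the kind of early warning signal the paper proposes to monitor in real time without a system model.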
Planetary Terrain Datasets and Benchmarks for Rover Path Planning
Planetary rover exploration is attracting renewed interest with several upcoming space missions to the Moon and Mars. However, a substantial amount of data from prior missions remains underutilized for path planning and autonomous navigation research. As a result, there is a lack of space-mission-based planetary datasets, standardized benchmarks, and evaluation protocols. In this paper, we take a step towards coordinating these three research directions in the context of planetary rover path planning. We propose the first two large planar benchmark datasets, MarsPlanBench and MoonPlanBench, derived from high-resolution digital terrain images of Mars and the Moon. In addition, we set up classical and learned path planning algorithms in a unified framework and evaluate them on our proposed datasets and on a popular planning benchmark. Through comprehensive experiments, we report new insights on the performance of representative path planning algorithms on planetary terrains, for the first time to the best of our knowledge. Our results show that classical algorithms can achieve up to 100% global path planning success rates on average across challenging terrains such as the Moon's north and south poles, which helps explain why such algorithms are used in practice by NASA. Conversely, learning-based models, although showing promising results in less complex environments, still struggle to generalize to planetary domains. To serve as a starting point for fundamental path planning research, our code and datasets will be released at: https://github.com/mchancan/PlanetaryPathBench.
EVE: A Generator-Verifier System for Generative Policies
Visuomotor policies based on generative architectures such as diffusion and flow-based matching have shown strong performance but degrade under distribution shifts, demonstrating limited recovery capabilities without costly finetuning. In the language modeling domain, test-time compute scaling has revolutionized reasoning capabilities of modern LLMs by leveraging additional inference-time compute for candidate solution refinement. These methods typically leverage foundation models as verification modules in a zero-shot manner to synthesize improved candidate solutions. In this work, we hypothesize that generative policies can similarly benefit from additional inference-time compute that employs zero-shot VLM-based verifiers. A systematic analysis of improving policy performance through the generation-verification framework remains relatively underexplored in the current literature. To this end, we introduce EVE - a modular, generator-verifier interaction framework - that boosts the performance of pretrained generative policies at test time, with no additional training. EVE wraps a frozen base policy with multiple zero-shot, VLM-based verifier agents. Each verifier proposes action refinements to the base policy candidate actions, while an action incorporator fuses the aggregated verifier output into the base policy action prediction to produce the final executed action. We study design choices for generator-verifier information interfacing across a system of verifiers with distinct capabilities. Across a diverse suite of manipulation tasks, EVE consistently improves task success rates without any additional policy training. Through extensive ablations, we isolate the contribution of verifier capabilities and action incorporator strategies, offering practical guidelines to build scalable, modular generator-verifier systems for embodied control.
Developing a Fundamental Diagram for Urban Air Mobility Based on Physical Experiments
Urban Air Mobility (UAM) is an emerging application of unmanned aerial vehicles (UAVs) that promises to reduce travel time and alleviate congestion in urban transportation systems. As drone density increases, UAM operations are expected to experience congestion similar to that in ground traffic. However, the fundamental characteristics of UAM traffic flow, particularly under real-world operating conditions, remain poorly understood. This study proposes a general framework for constructing the fundamental diagram (FD) of UAM traffic by integrating theoretical analysis with physical experiments. To the best of our knowledge, this is the first study to derive a UAM FD using real-world physical test data. On the theoretical side, we design two drone control laws for collision avoidance and develop simulation-based traffic generation methods to produce diverse UAM traffic scenarios. Based on Edie's definition, traffic flow theory is then applied to construct the FD and characterize the macroscopic properties of UAM traffic. To account for real-world disturbances and modeling uncertainties, we further conduct physical experiments on a reduced-scale testbed using Bitcraze Crazyflie drones. Both simulation and physical test trajectory data are collected and organized into the UAMTra2Flow dataset, which is analyzed using the proposed framework. Preliminary results indicate that classical FD structures for ground transportation are also applicable to UAM systems. Notably, FD curves obtained from physical experiments exhibit deviations from simulation-based results, highlighting the importance of experimental validation. Finally, results from the reduced-scale testbed are scaled to realistic operating conditions to provide practical insights for future UAM traffic systems. The dataset and code for this paper are publicly available at https://github.com/CATS-Lab/UAM-FD.
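A minimal sketch of Edie's generalized definitions mentioned above, which underlie the fundamental diagram construction: over a space-time region of length L and duration T, flow is the total distance traveled divided by L*T and density is the total time spent divided by L*T. The trajectory segments below are synthetic placeholders, not the UAMTra2Flow data.

```python
import numpy as np

def edie_flow_density(segments, L, T):
    """Edie's generalized flow and density over a space-time region of area L*T.

    segments: list of (distance_traveled, time_spent) per drone inside the
    region, in meters and seconds.
    """
    total_distance = sum(d for d, _ in segments)
    total_time = sum(t for _, t in segments)
    area = L * T
    q = total_distance / area      # flow (vehicles per second)
    k = total_time / area          # density (vehicles per meter)
    return q, k

# Synthetic example: 12 drones crossing a 100 m corridor during a 60 s window.
rng = np.random.default_rng(0)
segments = [(float(rng.uniform(60, 100)), float(rng.uniform(10, 25)))
            for _ in range(12)]
q, k = edie_flow_density(segments, L=100.0, T=60.0)
print(f"flow q = {q:.4f} veh/s, density k = {k:.4f} veh/m, mean speed = {q/k:.2f} m/s")
```

Sweeping such flow-density pairs across windows of varying congestion is what traces out the fundamental diagram; the space-mean speed falls out as q/k.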
Fast Navigation Through Occluded Spaces via Language-Conditioned Map Prediction
In cluttered environments, motion planners often face a trade-off between safety and speed due to uncertainty caused by occlusions and limited sensor range. In this work, we investigate whether co-pilot instructions can help robots plan more decisively while remaining safe. We introduce PaceForecaster, an approach that incorporates such co-pilot instructions into local planners. PaceForecaster takes the robot's local sensor footprint (Level-1) and the provided co-pilot instructions as input and predicts (i) a forecasted map of all regions visible from Level-1 (Level-2) and (ii) an instruction-conditioned subgoal within Level-2. The subgoal provides the planner with explicit guidance to exploit the forecasted environment in a goal-directed manner. We integrate PaceForecaster with a Log-MPPI controller and demonstrate that using language-conditioned forecasts and goals improves navigation performance by 36% over a local-map-only baseline in polygonal environments.
Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks
Unmanned Aerial Vehicle (UAV) spectral remote sensing technology is widely used in water quality monitoring. However, in dynamic environments, varying illumination conditions, such as shadows and specular reflection (sun glint), can cause severe spectral distortion, thereby reducing data availability. To maximize the acquisition of high-quality data while ensuring flight safety, this paper proposes an active path planning method for dynamic light and shadow disturbance avoidance. First, a dynamic prediction model is constructed to transform the time-varying light and shadow disturbance areas into three-dimensional virtual obstacles. Second, an improved Interfered Fluid Dynamical System (IFDS) algorithm is introduced, which generates a smooth initial obstacle avoidance path by building a repulsive force field. Subsequently, a Model Predictive Control (MPC) framework is employed for rolling-horizon path optimization to handle flight dynamics constraints and achieve real-time trajectory tracking. Furthermore, a Dynamic Flight Altitude Adjustment (DFAA) mechanism is designed to actively reduce the flight altitude when the observable area is narrow, thereby enhancing spatial resolution. Simulation results show that, compared with traditional PID and single obstacle avoidance algorithms, the proposed method achieves an obstacle avoidance success rate of 98% in densely disturbed scenarios, significantly improves path smoothness, and increases the volume of effective observation data by approximately 27%. This research provides an effective engineering solution for precise UAV water quality monitoring in complex illumination environments.
NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
Learning to navigate in dynamic and complex open-world environments is a critical yet challenging capability for autonomous robots. Existing approaches often rely on cascaded modular frameworks, which require extensive hyperparameter tuning or learning from limited real-world demonstration data. In this paper, we propose Navigation Diffusion Policy (NavDP), an end-to-end network trained solely in simulation that enables zero-shot sim-to-real transfer across diverse environments and robot embodiments. The core of NavDP is a unified transformer-based architecture that jointly learns trajectory generation and trajectory evaluation, both conditioned solely on local RGB-D observation. By learning to predict critic values for contrastive trajectory samples, our proposed approach effectively leverages supervision from privileged information available in simulation, thereby fostering accurate spatial understanding and enabling the distinction between safe and dangerous behaviors. To support this, we develop an efficient data generation pipeline in simulation and construct a large-scale dataset encompassing over one million meters of navigation experience across 3,000 scenes. Empirical experiments in both simulated and real-world environments demonstrate that NavDP significantly outperforms prior state-of-the-art methods. Furthermore, we identify key factors influencing the generalization performance of NavDP. The dataset and code are publicly available at https://wzcai99.github.io/navigation-diffusion-policy.github.io.
comment: Project Page: https://wzcai99.github.io/navigation-diffusion-policy.github.io/
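A minimal sketch of the generate-then-evaluate pattern described above, with random stand-ins for the diffusion sampler and the learned critic (not NavDP's actual networks):

```python
# Sample candidate trajectories, score them with a critic, execute the best.
import numpy as np

def sample_trajectories(rng, num=16, horizon=8, dim=2):
    """Placeholder for diffusion-based trajectory generation from RGB-D input."""
    return rng.normal(size=(num, horizon, dim)).cumsum(axis=1)

def critic_value(trajectory):
    """Placeholder critic: prefers short, smooth trajectories."""
    steps = np.diff(trajectory, axis=0, prepend=trajectory[:1])
    return -np.linalg.norm(steps, axis=1).sum()

rng = np.random.default_rng(0)
candidates = sample_trajectories(rng)
scores = np.array([critic_value(t) for t in candidates])
best = candidates[scores.argmax()]
print(best.shape)
```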
State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning
We study visual domain transfer for end-to-end imitation learning in a realistic and challenging setting where target-domain data are strictly off-policy, expert-free, and scarce. We first provide a theoretical analysis showing that the target-domain imitation loss can be upper bounded by the source-domain loss plus a state-conditional latent KL divergence between source and target observation models. Guided by this result, we propose State-Conditional Adversarial Learning (SCAL), an off-policy adversarial framework that aligns latent distributions conditioned on the system state using a discriminator-based estimator of the conditional KL term. Experiments on visually diverse autonomous driving environments built on the BARC-CARLA simulator demonstrate that SCAL achieves robust transfer and strong sample efficiency.
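The stated upper bound has the flavor of a domain-adaptation result; a schematic form, with notation assumed here rather than taken from the paper, is
\[
\mathbb{E}_{\text{tgt}}\big[\ell(\pi)\big] \;\le\; \mathbb{E}_{\text{src}}\big[\ell(\pi)\big] \;+\; C\,\mathbb{E}_{s}\Big[ D_{\mathrm{KL}}\big( p_{\text{src}}(z \mid s) \,\big\|\, p_{\text{tgt}}(z \mid s) \big) \Big],
\]
where $z$ denotes the latent encoding of observations, $s$ the system state, and $C$ a problem-dependent constant; the adversarial discriminator then serves as an estimator of the conditional KL term.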
RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation
Humanoid robots exhibit significant potential in executing diverse human-level skills. However, current research predominantly relies on data-driven approaches that necessitate extensive training datasets to achieve robust multimodal decision-making capabilities and generalizable visuomotor control. These methods raise concerns due to the neglect of geometric reasoning in unseen scenarios and the inefficient modeling of robot-target relationships within the training data, resulting in significant waste of training resources. To address these limitations, we present the Recurrent Geometric-prior Multimodal Policy (RGMP), an end-to-end framework that unifies geometric-semantic skill reasoning with data-efficient visuomotor control. For perception capabilities, we propose the Geometric-prior Skill Selector, which infuses geometric inductive biases into a vision language model, producing adaptive skill sequences for unseen scenes with minimal spatial common sense tuning. To achieve data-efficient robotic motion synthesis, we introduce the Adaptive Recursive Gaussian Network, which parameterizes robot-object interactions as a compact hierarchy of Gaussian processes that recursively encode multi-scale spatial relationships, yielding dexterous, data-efficient motion synthesis even from sparse demonstrations. Evaluated on both our humanoid robot and a desktop dual-arm robot, the RGMP framework achieves 87% task success in generalization tests and exhibits 5x greater data efficiency than the state-of-the-art model. This performance underscores its superior cross-domain generalization, enabled by geometric-semantic reasoning and recursive-Gaussian adaptation.
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Preference-based Reinforcement Learning (PbRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human preferences, PbRL suffers from low query efficiency, as policy bias limits trajectory diversity and reduces the number of discriminable queries available for learning preferences. This paper identifies preference discriminability, which quantifies how easily a human can judge which trajectory is closer to their ideal behavior, as a key metric for improving query efficiency. To address this, we move beyond comparisons within a single policy and instead generate queries by comparing trajectories from multiple policies, as training them from scratch promotes diversity without policy bias. We propose Discriminability-Aware Policy-to-Policy Preference-Based Efficient Reinforcement Learning (DAPPER), which integrates preference discriminability with trajectory diversification achieved by multiple policies. DAPPER trains new policies from scratch after each reward update and employs a discriminator that learns to estimate preference discriminability, enabling the prioritized sampling of more discriminable queries. During training, it jointly maximizes the preference reward and preference discriminability score, encouraging the discovery of highly rewarding and easily distinguishable policies. Experiments in simulated and real-world legged robot environments demonstrate that DAPPER outperforms previous methods in query efficiency, particularly under challenging preference discriminability conditions.
comment: Accepted for IEEE Robotics & Automation Magazine (RAM)
Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, labeling data for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neuro-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning over logical rules, physical principles, geometric analysis, and related knowledge. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
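As a rough schematic of the bilevel structure (generic symbols, not the paper's exact formulation):
\[
\min_{\theta}\; U\big(f_\theta(x),\, \mu^{\star}(\theta)\big)
\quad \text{s.t.} \quad
\mu^{\star}(\theta) = \arg\min_{\mu}\; L\big(f_\theta(x),\, \mu\big),
\]
where $f_\theta$ is the neural module, the lower-level problem $L$ encodes the reasoning engine (e.g., a planner or geometric solver), and the upper-level loss $U$ couples the two so that learning in each level informs the other without explicit labels.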
TacMan-Turbo: Proactive Tactile Control for Robust and Efficient Articulated Object Manipulation
Adept manipulation of articulated objects is essential for robots to operate successfully in human environments. Such manipulation requires both effectiveness--reliable operation despite uncertain object structures--and efficiency--swift execution with minimal redundant steps and smooth actions. Existing approaches struggle to achieve both objectives simultaneously: methods relying on predefined kinematic models lack effectiveness when encountering structural variations, while tactile-informed approaches achieve robust manipulation without kinematic priors but compromise efficiency through reactive, step-by-step exploration-compensation cycles. This paper introduces TacMan-Turbo, a novel proactive tactile control framework for articulated object manipulation that mitigates this fundamental trade-off. Unlike previous approaches that treat tactile contact deviations merely as error signals requiring compensation, our method interprets these deviations as rich sources of local kinematic information. This new perspective enables our controller to predict optimal future interactions and make proactive adjustments, significantly enhancing manipulation efficiency. In comprehensive evaluations across 200 diverse simulated articulated objects and real-world experiments, our approach maintains a 100% success rate while significantly outperforming the previous tactile-informed method in time efficiency, action efficiency, and trajectory smoothness (all p-values < 0.0001). These results demonstrate that the long-standing trade-off between effectiveness and efficiency in articulated object manipulation can be successfully resolved without relying on prior kinematic knowledge.
Analyzing Key Objectives in Human-to-Robot Retargeting for Dexterous Manipulation
Kinematic retargeting from human hands to robot hands is essential for transferring dexterity from humans to robots in manipulation teleoperation and imitation learning. However, due to mechanical differences between human and robot hands, completely reproducing human motions on robot hands is impossible. Existing works on retargeting incorporate various optimization objectives, focusing on different aspects of hand configuration. However, the lack of experimental comparative studies leaves the significance and effectiveness of these objectives unclear. This work aims to analyze these retargeting objectives for dexterous manipulation through extensive real-world comparative experiments. Specifically, we propose a comprehensive retargeting objective formulation that integrates intuitively crucial factors appearing in recent approaches. The significance of each factor is evaluated through experimental ablation studies on the full objective in kinematic posture retargeting and real-world teleoperated manipulation tasks. Experimental results and conclusions provide valuable insights for designing more accurate and effective retargeting algorithms for real-world dexterous manipulation.
comment: v2: Extended the main text with additional analysis and implementation details
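One common composite form of such a retargeting objective, written generically here (weights and terms vary across the works being compared), is
\[
\min_{q_t}\; \sum_{i} \big\| \alpha\, v_i^{\mathrm{human}} - f_i(q_t) \big\|^2 \;+\; \beta\, \| q_t - q_{t-1} \|^2,
\]
where $q_t$ are the robot-hand joint angles, $f_i(\cdot)$ maps them to the $i$-th keypoint vector via forward kinematics, $v_i^{\mathrm{human}}$ is the corresponding human-hand vector, $\alpha$ is a hand-scale factor, and the second term penalizes temporal jitter.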
CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models
Autonomous navigation for Unmanned Aerial Vehicles faces key challenges from limited onboard computational resources, which restrict deployed deep neural networks to shallow architectures incapable of handling complex environments. Offloading tasks to remote edge servers introduces high latency, creating an inherent trade-off in system design. To address these limitations, we propose CoDrone - the first cloud-edge-end collaborative computing framework integrating foundation models into autonomous UAV cruising scenarios - effectively leveraging foundation models to enhance performance of resource-constrained unmanned aerial vehicle platforms. To reduce onboard computation and data transmission overhead, CoDrone employs grayscale imagery for the navigation model. When enhanced environmental perception is required, CoDrone leverages the edge-assisted foundation model Depth Anything V2 for depth estimation and introduces a novel one-dimensional occupancy grid-based navigation method - enabling fine-grained scene understanding while advancing efficiency and representational simplicity of autonomous navigation. A key component of CoDrone is a Deep Reinforcement Learning-based neural scheduler that seamlessly integrates depth estimation with autonomous navigation decisions, enabling real-time adaptation to dynamic environments. Furthermore, the framework introduces a UAV-specific vision language interaction module incorporating domain-tailored low-level flight primitives to enable effective interaction between the cloud foundation model and the UAV. The introduction of VLM enhances open-set reasoning capabilities in complex unseen scenarios. Experimental results show CoDrone outperforms baseline methods under varying flight speeds and network conditions, achieving a 40% increase in average flight distance and a 5% improvement in average Quality of Navigation.
comment: This paper is accepted by the IEEE Internet of Things Journal (IoT-J) for publication in the Special Issue on "Augmented Edge Sensing Intelligence for Low-Altitude IoT Systems"
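The one-dimensional occupancy idea can be illustrated by collapsing a depth map into per-sector nearest-obstacle distances; the sketch below uses an illustrative horizon band, bin count, and threshold rather than CoDrone's actual parameters.

```python
# Reduce a dense depth map to a 1D occupancy grid by taking the closest
# obstacle in each angular sector of a horizontal band.
import numpy as np

def depth_to_1d_grid(depth_map, num_bins=16, obstacle_threshold_m=2.0):
    """depth_map: (H, W) metric depth; returns a binary occupancy vector."""
    h, w = depth_map.shape
    band = depth_map[h // 3: 2 * h // 3, :]           # band near the horizon
    columns = np.array_split(band, num_bins, axis=1)  # angular sectors
    min_depth = np.array([np.nanmin(c) for c in columns])
    return (min_depth < obstacle_threshold_m).astype(np.uint8)

depth = np.random.default_rng(0).uniform(0.5, 10.0, size=(240, 320))
print(depth_to_1d_grid(depth))
```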
Nonholonomic Robot Parking by Feedback -- Part I: Modular Strict CLF Designs
It has been known in the robotics literature since about 1995 that, in polar coordinates, the nonholonomic unicycle is asymptotically stabilizable by smooth feedback, even globally. We introduce a modular design framework that selects the forward velocity to decouple the radial coordinate, allowing the steering subsystem to be stabilized independently. Within this structure, we develop families of feedback laws using passivity, backstepping, and integrator forwarding. Each law is accompanied by a strict control Lyapunov function, including barrier variants that enforce angular constraints. These strict CLFs provide constructive class KL convergence estimates and enable eigenvalue assignment at the target equilibrium. The framework generalizes and extends prior modular and nonmodular approaches, while preparing the ground for inverse optimal and adaptive redesigns in the sequel paper.
comment: arXiv admin note: text overlap with arXiv:2509.25575
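For reference, one commonly used polar-coordinate form of the unicycle kinematics (conventions vary; this is an assumed textbook form, not necessarily the paper's) is
\[
\dot{\rho} = -v\cos\alpha, \qquad \dot{\alpha} = -\omega + \frac{v\sin\alpha}{\rho}, \qquad \dot{\varphi} = \frac{v\sin\alpha}{\rho},
\]
with $\rho$ the distance to the target, $\alpha$ the heading error toward it, $\varphi$ the target bearing, and $(v,\omega)$ the forward and angular velocity inputs. A classical modular choice such as $v = k\rho\cos\alpha$ gives $\dot{\rho} = -k\rho\cos^2\alpha \le 0$ regardless of the steering input, which is the kind of radial decoupling the abstract refers to.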
Multiagent Systems
A Plan Reuse Mechanism for LLM-Driven Agent
Integrating large language models (LLMs) into personal assistants, like Xiao Ai and Blue Heart V, effectively enhances their ability to interact with humans, solve complex tasks, and manage IoT devices. Such assistants are also termed LLM-driven agents. Upon receiving user requests, the LLM-driven agent generates plans using an LLM, executes these plans through various tools, and then returns the response to the user. During this process, the latency for generating a plan with an LLM can reach tens of seconds, significantly degrading user experience. Real-world dataset analysis shows that about 30% of the requests received by LLM-driven agents are identical or similar, which allows the reuse of previously generated plans to reduce latency. However, it is difficult to accurately define the similarity between the request texts received by the LLM-driven agent through directly evaluating the original request texts. Moreover, the diverse expressions of natural language and the unstructured format of plan texts make implementing plan reuse challenging. To address these issues, we present and implement a plan reuse mechanism for LLM-driven agents called AgentReuse. AgentReuse leverages the similarities and differences among requests' semantics and uses intent classification to evaluate the similarities between requests and enable the reuse of plans. Experimental results based on a real-world dataset demonstrate that AgentReuse achieves a 93% effective plan reuse rate, an F1 score of 0.9718, and an accuracy of 0.9459 in evaluating request similarities, reducing latency by 93.12% compared with baselines without using the reuse mechanism.
comment: This paper is an English version of A Plan Reuse Mechanism for LLM-Driven Agent published in 2024 in the Journal of Computer Research and Development
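The reuse idea can be sketched as a cache keyed by a request's classified intent; classify_intent and generate_plan_with_llm below are hypothetical stand-ins for AgentReuse's intent classifier and the slow LLM planning call.

```python
# Minimal sketch of plan reuse keyed by classified intent.
plan_cache = {}

def classify_intent(request_text):
    """Stand-in intent classifier; a real system would use a trained model."""
    if "light" in request_text.lower():
        return "control_lights"
    return "general_query"

def generate_plan_with_llm(request_text):
    """Stand-in for the expensive LLM planning call."""
    return [f"step for: {request_text}"]

def get_plan(request_text):
    intent = classify_intent(request_text)
    if intent in plan_cache:                 # reuse a previously generated plan
        return plan_cache[intent], True
    plan = generate_plan_with_llm(request_text)
    plan_cache[intent] = plan
    return plan, False

print(get_plan("Turn on the living room light"))
print(get_plan("Switch the bedroom light off"))  # second call hits the cache
```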
CoTDeceptor: Adversarial Code Obfuscation Against CoT-Enhanced LLM Code Agents
LLM-based code agents (e.g., ChatGPT Codex) are increasingly deployed as detectors for code review and security auditing tasks. Although CoT-enhanced LLM vulnerability detectors are believed to provide improved robustness against obfuscated malicious code, we find that their reasoning chains and semantic abstraction processes exhibit exploitable systematic weaknesses. This allows attackers to covertly embed malicious logic, bypass code review, and propagate backdoored components throughout real-world software supply chains. To investigate this issue, we present CoTDeceptor, the first adversarial code obfuscation framework targeting CoT-enhanced LLM detectors. CoTDeceptor autonomously constructs evolving, hard-to-reverse multi-stage obfuscation strategy chains that effectively disrupt CoT-driven detection logic. Using malicious code provided by a security enterprise, experimental results demonstrate that CoTDeceptor achieves stable and transferable evasion performance against state-of-the-art LLMs and vulnerability detection agents. CoTDeceptor bypasses 14 out of 15 vulnerability categories, compared to only 2 bypassed by prior methods. Our findings highlight potential risks in real-world software supply chains and underscore the need for more robust and interpretable LLM-powered security analysis systems.
FinAgent: An Agentic AI Framework Integrating Personal Finance and Nutrition Planning
Limited household budgets and nutritional demands remain a persistent challenge, especially in middle-income settings where food prices fluctuate. This paper introduces a price-aware agentic AI system that combines personal finance management with diet optimization. Given household income and fixed expenditures, medical and well-being status, and real-time food costs, the system creates nutritionally sufficient meal plans at comparatively reasonable prices that automatically adjust to market changes. The framework is implemented in a modular multi-agent architecture with dedicated agents for budgeting, nutrition, price monitoring, and health personalization. These agents share a knowledge base and use a substitution graph to ensure that nutritional quality is maintained at minimum cost. Simulations with a representative Saudi household case study show a steady 12-18% reduction in costs relative to a static weekly menu, nutrient adequacy of over 95%, and robust performance under price changes of 20-30%. The findings indicate that the framework can combine affordability with nutritional adequacy locally and provide a viable avenue for capacity building towards sustainable and fair diet planning in line with the Sustainable Development Goals on Zero Hunger and Good Health.
comment: This paper was presented at the IEEE International Conference on Computing and Applications (ICCA 2025), Bahrain
A Blockchain-Monitored Agentic AI Architecture for Trusted Perception-Reasoning-Action Pipelines
The application of agentic AI systems in autonomous decision-making is growing in healthcare, smart cities, digital forensics, and supply chain management. Even though these systems are flexible and offer real-time reasoning, they also raise concerns about trust, oversight, and the integrity of the information and activities upon which they are founded. This paper proposes a unified architecture comprising a LangChain-based multi-agent system and a permissioned blockchain to guarantee continuous monitoring, policy enforcement, and immutable auditability of agentic actions. The framework couples the perception-reasoning-action cycle to a blockchain governance layer that verifies inputs, evaluates recommended actions, and documents execution outcomes. A Hyperledger Fabric-based system, MCP-integrated action executors, and a LangChain agent are implemented, and experiments on smart inventory management, traffic-signal control, and healthcare monitoring are conducted. The results suggest that blockchain-backed verification is effective in preventing unauthorized actions, offers traceability throughout the decision-making process, and keeps operational latency within reasonable ranges. The proposed framework offers a general approach for implementing high-impact agentic AI applications that are autonomous yet accountable.
comment: This paper was presented at the IEEE International Conference on Computing and Applications (ICCA 2025), Bahrain
DAO-Agent: Zero Knowledge-Verified Incentives for Decentralized Multi-Agent Coordination
Autonomous Large Language Model (LLM)-based multi-agent systems have emerged as a promising paradigm for facilitating cross-application and cross-organization collaborations. These autonomous agents often operate in trustless environments, where centralized coordination faces significant challenges, such as the inability to ensure transparent contribution measurement and equitable incentive distribution. While blockchain is frequently proposed as a decentralized coordination platform, it inherently introduces high on-chain computation costs and risks exposing sensitive execution information of the agents. Consequently, the core challenge lies in enabling auditable task execution and fair incentive distribution for autonomous LLM agents in trustless environments, while simultaneously preserving their strategic privacy and minimizing on-chain costs. To address this challenge, we propose DAO-Agent, a novel framework that integrates three key technical innovations: (1) an on-chain decentralized autonomous organization (DAO) governance mechanism for transparent coordination and immutable logging; (2) a ZKP mechanism approach that enables Shapley-based contribution measurement off-chain, and (3) a hybrid on-chain/off-chain architecture that verifies ZKP-validated contribution measurements on-chain with minimal computational overhead. We implement DAO-Agent and conduct end-to-end experiments using a crypto trading task as a case study. Experimental results demonstrate that DAO-Agent achieves up to 99.9% reduction in verification gas costs compared to naive on-chain alternatives, with constant-time verification complexity that remains stable as coalition size increases, thereby establishing a scalable foundation for agent coordination in decentralized environments.
comment: 10 pages, 1 figure
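For context, the Shapley value referenced above averages each agent's marginal contribution over all join orders; the toy computation below uses a made-up coalition value function, and a real deployment would prove the result in zero knowledge rather than expose it.

```python
# Exact Shapley values for a small coalition of agents (illustrative values).
from itertools import permutations

def shapley_values(agents, value_fn):
    """Average each agent's marginal contribution over all join orders."""
    totals = {a: 0.0 for a in agents}
    orders = list(permutations(agents))
    for order in orders:
        coalition = []
        for a in order:
            before = value_fn(frozenset(coalition))
            coalition.append(a)
            after = value_fn(frozenset(coalition))
            totals[a] += after - before
    return {a: totals[a] / len(orders) for a in agents}

profits = {frozenset(): 0.0, frozenset({"A"}): 2.0, frozenset({"B"}): 3.0,
           frozenset({"A", "B"}): 7.0}
print(shapley_values(["A", "B"], lambda c: profits[c]))  # {'A': 3.0, 'B': 4.0}
```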
Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning
Spatial reasoning in 3D scenes requires precise geometric calculations that challenge vision-language models. Visual programming addresses this by decomposing problems into steps calling specialized tools, yet existing methods rely on either fixed toolsets or speculative tool induction before solving problems, resulting in suboptimal programs and poor utilization of induced tools. We present Transductive Visual Programming (TVP), a novel framework that builds new tools from its own experience rather than speculation. TVP first solves problems using basic tools while accumulating experiential solutions into an Example Library, then abstracts recurring patterns from these programs into reusable higher-level tools for an evolving Tool Library. This allows TVP to tackle new problems with increasingly powerful tools learned from experience. On Omni3D-Bench, TVP achieves state-of-the-art performance, outperforming GPT-4o by 22% and the previous best visual programming system by 11%. Our transductively learned tools are used 5x more frequently as core program dependency than inductively created ones, demonstrating more effective tool discovery and reuse. The evolved tools also show strong generalization to unseen spatial tasks, achieving superior performance on benchmarks from SpatialScore-Hard collection without any testset-specific modification. Our work establishes experience-driven transductive tool creation as a powerful paradigm for building self-evolving visual programming agents that effectively tackle challenging spatial reasoning tasks. We release our code at https://transductive-visualprogram.github.io/.
comment: Project Website: https://transductive-visualprogram.github.io/
Developing a Fundamental Diagram for Urban Air Mobility Based on Physical Experiments
Urban Air Mobility (UAM) is an emerging application of unmanned aerial vehicles (UAVs) that promises to reduce travel time and alleviate congestion in urban transportation systems. As drone density increases, UAM operations are expected to experience congestion similar to that in ground traffic. However, the fundamental characteristics of UAM traffic flow, particularly under real-world operating conditions, remain poorly understood. This study proposes a general framework for constructing the fundamental diagram (FD) of UAM traffic by integrating theoretical analysis with physical experiments. To the best of our knowledge, this is the first study to derive a UAM FD using real-world physical test data. On the theoretical side, we design two drone control laws for collision avoidance and develop simulation-based traffic generation methods to produce diverse UAM traffic scenarios. Based on Edie's definition, traffic flow theory is then applied to construct the FD and characterize the macroscopic properties of UAM traffic. To account for real-world disturbances and modeling uncertainties, we further conduct physical experiments on a reduced-scale testbed using Bitcraze Crazyflie drones. Both simulation and physical test trajectory data are collected and organized into the UAMTra2Flow dataset, which is analyzed using the proposed framework. Preliminary results indicate that classical FD structures for ground transportation are also applicable to UAM systems. Notably, FD curves obtained from physical experiments exhibit deviations from simulation-based results, highlighting the importance of experimental validation. Finally, results from the reduced-scale testbed are scaled to realistic operating conditions to provide practical insights for future UAM traffic systems. The dataset and code for this paper are publicly available at https://github.com/CATS-Lab/UAM-FD.
Electromagnetic Formation Flying Using Alternating Magnetic Field Forces and Control Barrier Functions for State and Input Constraints
This article presents a feedback control algorithm for electromagnetic formation flying with constraints on the satellites' states and control inputs. The algorithm combines several key techniques. First, we use alternating magnetic field forces to decouple the electromagnetic forces between each pair of satellites in the formation. Each satellite's electromagnetic actuation system is driven by a sum of amplitude-modulated sinusoids, where the amplitudes are controlled in order to prescribe the time-averaged force between each pair of satellites. Next, the desired time-averaged force is computed from an optimal control law that satisfies state constraints (i.e., no collisions and an upper limit on intersatellite speeds) and input constraints (i.e., not exceeding each satellite's apparent power capability). The optimal time-averaged force is computed using a single relaxed control barrier function that is obtained by composing multiple control barrier functions that are designed to enforce each state and input constraint. Finally, we demonstrate the satellite formation control method in numerical simulations.
comment: Preprint submitted to IEEE Transactions on Aerospace and Electronic Systems (TAES). arXiv admin note: substantial text overlap with arXiv:2411.16908
Computational Foundations for Strategic Coopetition: Formalizing Interdependence and Complementarity
Coopetition refers to simultaneous cooperation and competition among actors wherein actors 'cooperate to grow the pie and compete to split it up.' Modern socio-technical systems are characterized by strategic coopetition wherein actors concomitantly cooperate to create value and compete to capture it. While conceptual modeling languages such as i* provide rich qualitative representations of strategic dependencies, they lack mechanisms for quantitative analysis of dynamic trade-offs. Conversely, classical game theory offers mathematical rigor but strips away contextual richness. This report bridges this gap by developing computational foundations that formalize two critical dimensions of coopetition: interdependence and complementarity. We ground interdependence in i* structural dependency analysis, translating depender-dependee-dependum relationships into quantitative interdependence coefficients via a structured translation framework. We formalize complementarity following Brandenburger and Nalebuff's Added Value concept, modeling synergistic value creation with validated parameterization. We integrate structural dependencies with bargaining power in value appropriation and introduce a game-theoretic formulation where Nash Equilibrium incorporates structural interdependence. Validation combines over 22,000 experimental trials across power and logarithmic specifications with the Samsung-Sony S-LCD joint venture (2004-2011). Under strict historical alignment scoring, logarithmic specifications achieve 58/60 compared to power functions (46/60), producing realistic 41% cooperation increases aligning with documented S-LCD patterns while power functions produce 166% increases exceeding realistic bounds. Statistical significance confirmed at p < 0.001, Cohen's d > 9.
comment: 39 pages, 9 figures, This technical report serves as the foundational reference for a coordinated research program examining strategic coopetition in requirements engineering and multi-agent systems, with companion work addressing trust dynamics, team production, and reciprocity mechanisms
Consistent Opponent Modeling in Imperfect-Information Games
The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may obtain a strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents utilizing available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy under standard Bayesian identifiability and visitation assumptions, given observations from gameplay and possibly additional historical data if it is available.
Systems and Control (EESS)
A Lyapunov-Based Small-Gain Theorem for Fixed-Time ISS: Theory, Optimization, and Games
We develop a Lyapunov-based small-gain theorem for establishing fixed-time input-to-state stability (FxT-ISS) guarantees in interconnected nonlinear dynamical systems. The proposed framework considers interconnections in which each subsystem admits a FxT-ISS Lyapunov function, providing robustness with respect to external inputs. We show that, under an appropriate nonlinear small-gain condition, the overall interconnected system inherits the FxT-ISS property. In this sense, the proposed result complements existing Lyapunov-based small-gain theorems for asymptotic and finite-time stability, and enables a systematic analysis of interconnection structures exhibiting fixed-time stability. To illustrate the applicability of the theory, we study feedback-based optimization problems with time-varying cost functions, and Nash-equilibrium seeking for noncooperative games with nonlinear dynamical plants in the loop. For both problems, we present a class of non-smooth gradient or pseudogradient-based controllers that achieve fixed-time convergence without requiring time-scale separation and using real-time feedback. Numerical examples are provided to validate the theoretical findings.
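One common Lyapunov characterization of fixed-time convergence for a single subsystem, stated schematically here (the paper's exact FxT-ISS conditions may differ), requires that whenever $V(x) \ge \rho(\|u\|)$ for some class-$\mathcal{K}$ function $\rho$,
\[
\dot{V} \le -a V^{p} - b V^{q}, \qquad a, b > 0,\; p > 1,\; 0 < q < 1,
\]
which yields a settling-time bound $T \le \frac{1}{a(p-1)} + \frac{1}{b(1-q)}$ that is uniform in the initial condition.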
Enhancing Grid Resilience for Giga-Watt Scale Data Centers Using High Voltage Circuit Breaker Operated Braking Resistors
As hyperscale and co-located data centers scale, the electric grid sees an increase in large, voltage-sensitive IT loads, with data center plant sizes ranging between 500 MW and 2 GW. A sudden loss of these loads as they switch to onsite UPS during grid voltage excursion events causes a grid frequency rise from the generation-load imbalance, and a voltage rise because less power flows through the network. This paper proposes and theoretically demonstrates the use of high voltage circuit breaker operated braking resistors at data center transmission substations as an effective strategy for enhancing grid resilience under such large load loss scenarios. We developed a test bed to illustrate the dynamic behavior of the system with resistive braking on a gigawatt scale data center load cluster connected to a 345 kV network. The braking resistor(s), which in the case of an inverter-rich system come in a multi-stage configuration, are connected or disconnected via high-speed circuit breaker(s). Results show that insertion for 0.25 to 0.85 seconds sufficiently reduces the rate of change of frequency and provides time for primary governor response and capacitor switching to restore steady state. Sensitivity across different synchronous machine and inverter-based resource mixes is tested and confirms robustness. We conclude that circuit breaker controlled resistive braking is a practical means to enhance Bulk Electric System (BES) resilience for gigawatt scale data centers. The approach integrates with protection, needs no generator changes, and can be scaled with cluster size or growth of the data center facility load.
comment: Provisionally accepted for publication in the 2025 IEEE International Conference on Energy Technologies for Future Grids (ETFG) conference proceedings
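The frequency-rise mechanism can be seen from the aggregate swing equation, written here in per-unit as a generic illustration rather than the paper's model:
\[
\frac{2H}{f_0}\,\frac{d\,\Delta f}{dt} = P_m - P_e,
\]
where $H$ is the aggregate inertia constant, $f_0$ the nominal frequency, and $P_m - P_e$ the generation-load imbalance. Inserting a braking resistor adds its absorbed power to $P_e$, directly shrinking the imbalance and hence the rate of change of frequency while slower governor and capacitor-switching actions take effect.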
$\mathcal{K}$-Lorentzian Polynomials, Semipositive Cones, and Cone-Stable EVI Systems
Lorentzian and completely log-concave polynomials have recently emerged as a unifying framework for negative dependence, log-concavity, and convexity in combinatorics and probability. We extend this theory to variational analysis and cone-constrained dynamics by studying $K$-Lorentzian and $K$-completely log-concave polynomials over a proper convex cone $K\subset\mathbb{R}^n$. For a $K$-Lorentzian form $f$ and $v\in\operatorname{int}K$, we define an open cone $K^\circ(f,v)$ and a closed cone $K(f,v)$ via directional derivatives along $v$, recovering the usual hyperbolicity cone when $f$ is hyperbolic. We prove that $K^\circ(f,v)$ is a proper cone and equals $\operatorname{int}K(f,v)$. If $f$ is $K(f,v)$-Lorentzian, then $K(f,v)$ is convex and maximal among convex cones on which $f$ is Lorentzian. Using the Rayleigh matrix $M_f(x)=\nabla f(x)\nabla f(x)^T - f(x)\nabla^2 f(x)$, we obtain cone-restricted Rayleigh inequalities and show that two-direction Rayleigh inequalities on $K$ are equivalent to an acuteness condition for the bilinear form $v^T M_f(x) w$. This yields a cone-restricted negative-dependence interpretation linking the curvature of $\log f$ to covariance properties of associated Gibbs measures. For determinantal generating polynomials, we identify the intersection of the hyperbolicity cone with the nonnegative orthant as the classical semipositive cone, and we extend this construction to general proper cones via $K$-semipositive cones. Finally, for linear evolution variational inequality (LEVI) systems, we show that if $q(x)=x^T A x$ is (strictly) $K$-Lorentzian, then $A$ is (strictly) $K$-copositive and yields Lyapunov (semi-)stability on $K$, giving new Lyapunov criteria for cone-constrained dynamics.
comment: 23 pages, 5 figures
ARX-Implementation of encrypted nonlinear dynamic controllers using observer form
While computation-enabled cryptosystems applied to control systems have improved security and privacy, a major issue is that the number of recursive operations on encrypted data is limited to a finite number of times in most cases, especially where fast computation is required. To allow for nonlinear dynamic control under this constraint, a method for representing a state-space system model as an auto-regressive model with exogenous inputs (ARX model) is proposed. With the input as well as the output of the plant encrypted and transmitted to the controller, the reformulated ARX form can compute each output using only a finite number of operations, from its several previous inputs and outputs. Existence of a stable observer for the controller is a key condition for the proposed representation. The representation replaces the controller with an observer form and applies a method similar to finite-impulse-response approximation. It is verified that the approximation error and its effect can be made arbitrarily small by an appropriate choice of a parameter, under stability of the observer and the closed-loop system. Simulation results demonstrate the effectiveness of the proposed method.
comment: 5 pages, 2 figures
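The idea can be sketched in generic notation (assumed here, not the paper's): given a controller realization $x_{k+1} = A x_k + B y_k$, $u_k = C x_k + D y_k$ and an observer gain $L$ with $A - LC$ stable, rewriting the state recursion as $\hat{x}_{k+1} = (A-LC)\hat{x}_k + (B-LD)y_k + L u_k$ and truncating after $N$ steps gives an ARX-type form
\[
u_k \approx \sum_{i=1}^{N} F_i\, u_{k-i} + \sum_{i=0}^{N} G_i\, y_{k-i},
\]
so each output is computed from finitely many past (encrypted) inputs and outputs, with the truncation error shrinking as the observer dynamics contract.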
Relative Localization System Design for SnailBot: A Modular Self-reconfigurable Robot
This paper presents the design and implementation of a relative localization system for SnailBot, a modular self-reconfigurable robot. The system integrates ArUco marker recognition, optical flow analysis, and IMU data processing into a unified fusion framework, enabling robust and accurate relative positioning for collaborative robotic tasks. Experimental validation demonstrates the effectiveness of the system in real-time operation, with a rule-based fusion strategy ensuring reliability across dynamic scenarios. The results highlight the potential for scalable deployment in modular robotic systems.
comment: 7 pages, 7 figures, 4 algorithms
Wireless Center of Pressure Feedback System for Humanoid Robot Balance Control using ESP32-C3
Maintaining stability during the single-support phase is a fundamental challenge in humanoid robotics, particularly in dance robots that require complex maneuvers and high mechanical freedom. Traditional tethered sensor configurations often restrict joint movement and introduce mechanical noises. This study proposes a wireless embedded balance system designed to maintain stability on uneven surfaces. The system utilizes a custom-designed foot unit integrated with four load cells and an ESP32-C3 microcontroller to estimate the Center of Pressure (CoP) in real time. The CoP data were transmitted wirelessly to the main controller to minimize the wiring complexity of the 29-DoF VI-ROSE humanoid robot. A PID control strategy is implemented to adjust the torso, hip, and ankle roll joints based on CoP feedback. Experimental characterization demonstrated high sensor precision with an average measurement error of 14.8 g. Furthermore, the proposed control system achieved a 100% success rate in maintaining balance during single-leg lifting tasks at a 3-degree inclination with optimized PID parameters (Kp=0.10, Kd=0.005). These results validate the efficacy of wireless CoP feedback in enhancing the postural stability of humanoid robots, without compromising their mechanical flexibility.
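The CoP estimate itself is a force-weighted average of the load-cell positions; a minimal sketch, with illustrative sensor coordinates and readings rather than the VI-ROSE foot geometry, is shown below.

```python
# Center-of-pressure estimation from four load cells at known foot coordinates.
import numpy as np

def center_of_pressure(forces, positions):
    """forces: (4,) load-cell readings; positions: (4, 2) sensor x-y locations (m)."""
    forces = np.asarray(forces, dtype=float)
    positions = np.asarray(positions, dtype=float)
    total = forces.sum()
    if total <= 0:
        return None  # foot not loaded
    return (forces[:, None] * positions).sum(axis=0) / total  # weighted average

corners = [(-0.05, -0.08), (0.05, -0.08), (-0.05, 0.08), (0.05, 0.08)]
print(center_of_pressure([1.2, 1.0, 0.8, 1.0], corners))
```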
A Multimodal Human-Centered Framework for Assessing Pedestrian Well-Being in the Wild
Pedestrian well-being is a critical yet rarely measured component of sustainable urban mobility and livable city design. Existing approaches to evaluating pedestrian environments often rely on static, infrastructure-based indices or retrospective surveys, which overlook the dynamic, subjective, and psychophysiological dimensions of everyday walking experience. This paper introduces a multimodal, human-centered framework for assessing pedestrian well-being in the wild by integrating three complementary data streams: continuous physiological sensing, geospatial tracking, and momentary self-reports collected using the Experience Sampling Method. The framework conceptualizes pedestrian experience as a triangulation enabling a holistic understanding of how urban environments influence well-being. The utility of our framework is then demonstrated through a naturalistic case study conducted in the Greater Philadelphia region, in which participants wore research-grade wearable sensors and carried GPS-enabled smartphones during their regular daily activities. Physiological indicators of autonomic nervous system activity, including heart rate variability and electrodermal activity, were synchronized with spatial trajectories and in situ self-reports of stress, affect, and perceived infrastructure conditions. Results illustrate substantial inter- and intra-individual variability in both subjective experience and physiological response, as well as context-dependent patterns associated with traffic exposure, pedestrian infrastructure quality, and environmental enclosure. The findings also suggest that commonly used walkability indices may not fully capture experiential dimensions of pedestrian well-being. By enabling real-world, multimodal measurement of pedestrian experience, the proposed framework offers a scalable and transferable approach for advancing human-centered urban analytics.
Safe Navigation with Zonotopic Tubes: An Elastic Tube-based MPC Framework
This paper presents an elastic tube-based model predictive control (MPC) framework for unknown discrete-time linear systems subject to disturbances. Unlike most existing elastic tube-based MPC methods, we do not assume perfect knowledge of the system model or disturbance realization bounds. Instead, a conservative zonotopic disturbance set is initialized and iteratively refined using data and prior knowledge: data are used to identify matrix zonotope model sets for the system dynamics, while prior physical knowledge is employed to discard models and disturbances inconsistent with known constraints. This process yields constrained matrix zonotopes representing disturbance realizations and dynamics that enable a principled fusion of offline information with limited online data, improving MPC feasibility and performance. The proposed design leverages closed-loop system characterization to learn and refine control gains that maintain a small tube size. By separating open-loop model mismatch from closed-loop effects in the error dynamics, the method avoids dependence on the size of the state and input operating regions, thereby reducing conservatism. An adaptive co-design of the tube and ancillary feedback ensures $\lambda$-contractive zonotopic tubes, guaranteeing robust positive invariance, improved feasibility margins, and enhanced disturbance tolerance. We establish recursive feasibility conditions and introduce a polyhedral Lyapunov candidate for the error tube, proving exponential stability of the closed-loop error dynamics under the adaptive tube-gain updates. Simulations demonstrate improved robustness, enlarged feasibility regions, and safe closed-loop performance using only a small amount of online data.
Flocking phase transition and threat responses in bio-inspired autonomous drone swarms
Collective motion inspired by animal groups offers powerful design principles for autonomous aerial swarms. We present a bio-inspired 3D flocking algorithm in which each drone interacts only with a minimal set of influential neighbors, relying solely on local alignment and attraction cues. By systematically tuning these two interaction gains, we map a phase diagram revealing sharp transitions between swarming and schooling, as well as a critical region where susceptibility, polarization fluctuations, and reorganization capacity peak. Outdoor experiments with a swarm of ten drones, combined with simulations using a calibrated flight-dynamics model, show that operating near this transition enhances responsiveness to external disturbances. When confronted with an intruder, the swarm performs rapid collective turns, transient expansions, and reliably recovers high alignment within seconds. These results demonstrate that minimal local-interaction rules are sufficient to generate multiple collective phases and that simple gain modulation offers an efficient mechanism to adjust stability, flexibility, and resilience in drone swarms.
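A minimal alignment-plus-attraction update with a fixed number of influential neighbors can be sketched as below; the gains, neighbor count, and time step are illustrative, not the calibrated values used in the experiments.

```python
# Local alignment + attraction flocking update over a fixed neighbor count.
import numpy as np

def flocking_step(positions, velocities, k_align, k_attract, num_neighbors=3, dt=0.1):
    n = len(positions)
    new_v = velocities.copy()
    for i in range(n):
        dists = np.linalg.norm(positions - positions[i], axis=1)
        neighbors = np.argsort(dists)[1:num_neighbors + 1]   # skip self
        align = velocities[neighbors].mean(axis=0) - velocities[i]
        attract = positions[neighbors].mean(axis=0) - positions[i]
        new_v[i] = velocities[i] + dt * (k_align * align + k_attract * attract)
    return positions + dt * new_v, new_v

rng = np.random.default_rng(0)
p, v = rng.uniform(-5, 5, (10, 3)), rng.normal(0, 1, (10, 3))
p, v = flocking_step(p, v, k_align=1.0, k_attract=0.5)
print(p.shape, v.shape)
```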
Dyna-Style Reinforcement Learning Modeling and Control of Non-linear Dynamics
Controlling systems with complex, nonlinear dynamics poses a significant challenge, particularly in achieving efficient and robust control. In this paper, we propose a Dyna-Style Reinforcement Learning control framework that integrates Sparse Identification of Nonlinear Dynamics (SINDy) with Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning. SINDy is used to identify a data-driven model of the system, capturing its key dynamics without requiring an explicit physical model. This identified model is used to generate synthetic rollouts that are periodically injected into the reinforcement learning replay buffer during training on the real environment, enabling efficient policy learning with limited data available. By leveraging this hybrid approach, we mitigate the sample inefficiency of traditional model-free reinforcement learning methods while ensuring accurate control of nonlinear systems. To demonstrate the effectiveness of this framework, we apply it to a bi-rotor system as a case study, evaluating its performance in stabilization and trajectory tracking. The results show that our SINDy-TD3 approach achieves superior accuracy and robustness compared to direct reinforcement learning techniques, highlighting the potential of combining data-driven modeling with reinforcement learning for complex dynamical systems.
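The Dyna-style loop can be sketched as: fit a sparse model to observed transitions, roll it forward under the current policy, and append the synthetic transitions to the replay buffer. The hard-thresholded least-squares fit below stands in for SINDy and the buffer is a plain list; all names and constants are illustrative.

```python
# Schematic Dyna-style data augmentation with a SINDy-like sparse model.
import numpy as np

def fit_sparse_model(X, U, X_next, threshold=0.05):
    """Least-squares fit of x_{t+1} on a polynomial library, then hard-threshold."""
    lib = np.hstack([X, U, X**2, np.ones((len(X), 1))])      # simple feature library
    W, *_ = np.linalg.lstsq(lib, X_next, rcond=None)
    W[np.abs(W) < threshold] = 0.0                            # promote sparsity
    return W

def synthetic_rollout(W, x0, policy, steps=20):
    """Roll the identified model forward under the current policy."""
    transitions, x = [], x0
    for _ in range(steps):
        u = policy(x)
        lib = np.hstack([x, u, x**2, [1.0]])
        x_next = lib @ W
        transitions.append((x, u, x_next))
        x = x_next
    return transitions

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)); U = rng.normal(size=(200, 1))
X_next = 0.9 * X + 0.1 * U + 0.01 * rng.normal(size=(200, 2))  # toy "real" data
W = fit_sparse_model(X, U, X_next)
replay_buffer = list(zip(X, U, X_next))                        # real transitions
replay_buffer += synthetic_rollout(W, X[0], policy=lambda x: np.array([-0.5 * x[0]]))
print(len(replay_buffer))
```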
LSTM-Based Modeling and Reinforcement Learning Control of a Magnetically Actuated Catheter
Autonomous magnetic catheter systems are emerging as a promising approach for the future of minimally invasive interventions. This study presents a novel approach that begins by modeling the nonlinear and hysteretic dynamics of a magnetically actuated catheter system, which consists of a magnetic catheter manipulated by servo-controlled magnetic fields generated by two external permanent magnets; its complex behavior is captured using a Long Short-Term Memory (LSTM) neural network. The model is validated against data from the experimental setup with a root mean square error (RMSE) of 0.42 mm and 99.8% coverage within 3 mm, establishing it as a reliable surrogate model. This LSTM enables the training of Reinforcement Learning (RL) agents for controlling the system while avoiding damage to the real setup, with the potential for subsequent fine-tuning on the physical system. We implemented Deep Q-Network (DQN) and actor-critic RL controllers, comparing these two agents first for regulation and subsequently for path following along linear and half-sinusoidal paths for the catheter tip. The actor-critic agent outperforms DQN, offering greater accuracy, faster performance with less error, and smoother trajectories at a 10 Hz sampling rate in both regulation and path following. This performance, owing to the continuous action space, suits dynamic navigation tasks such as navigating curved vascular structures in practical applications.
comment: Presented at the 13th RSI International Conference on Robotics and Mechatronics (ICRoM 2025), Dec. 16-18, 2025, Tehran, Iran
Energy-Gain Control of Time-Varying Systems: Receding Horizon Approximation
Standard formulations of prescribed worst-case disturbance energy-gain control policies for linear time-varying systems depend on all forward model data. In a discrete-time setting, this dependence arises through a backward Riccati recursion. The aim herein is to consider the infinite-horizon $\ell_2$ gain performance of state feedback policies with only finite receding-horizon preview of the model parameters. The proposed synthesis of controllers subject to such a constraint leverages the strict contraction of lifted Riccati operators under uniform controllability and observability. The main approximation result establishes a sufficient number of preview steps for the performance loss to remain below any set tolerance, relative to the baseline gain bound of the associated infinite-preview controller. Aspects of the main result are explored in the context of a numerical example.
comment: Accepted to appear in IEEE TAC
Partitioned robustness analysis of networks with uncertain links
An input-output model for networks with link uncertainty is developed. The main result presents a set of integral quadratic constraints (IQCs) that collectively imply robust stability of the uncertain network dynamics. The model dependency of each IQC is localized according to an edge-based partition of the network graph. The class of admissible network partitions affords scope for trading-off scalability against conservativeness. This is illustrated by numerical example.
comment: Submitted
Universal Transient Stability Analysis: A Large Language Model-Enabled Dynamics Prediction Framework
Existing dynamics prediction frameworks for transient stability analysis (TSA) fail to achieve multi-scenario "universality"--the inherent ability of a single, pre-trained architecture to generalize across diverse operating conditions, unseen faults, and heterogeneous systems. To address this, this paper proposes TSA-LLM, a large language model (LLM)-based universal framework that models multi-variate transient dynamics prediction as a univariate generative task with three key innovations: First, a novel data processing pipeline featuring channel independence decomposition to resolve dimensional heterogeneity, sample-wise normalization to eliminate separate stable or unstable pipelines, and temporal patching for efficient long-sequence modeling; Second, a parameter-efficient freeze-and-finetune strategy that augments the LLM's architecture with dedicated input embedding and output projection layers while freezing core transformer blocks to preserve generic feature extraction capabilities; Third, a two-stage fine-tuning scheme that combines teacher forcing, which feeds the model ground-truth data during initial training, with scheduled sampling, which gradually shifts to leveraging model-generated predictions, to mitigate cumulative errors in long-horizon iterative prediction. Comprehensive testing demonstrates the framework's universality, as TSA-LLM trained solely on the New England 39-bus system achieves zero-shot generalization to mixed stability conditions and unseen faults, and matches expert performance on the larger Iceland 189-bus system with only 5% fine-tuning data. This multi-scenario versatility validates a universal framework that eliminates scenario-specific retraining and achieves scalability via large-scale parameters and cross-scenario training data.
Decentralized water-level balancing for irrigation channels in storage critical operations
A feedback control system is proposed for balancing the deviations of water levels from set-points along open channels subject to uncertain supply-demand mismatch that exceeds individual pool capacity. Decentralized controllers adjust the gate flows between pools to regulate potentially weighted differences between neighbouring water-level errors to zero in steady state. A sequential SISO loop-shaping procedure is developed for the design of each local flow controller based on distributed parameter transfer function models of the channel dynamics. Recursive feasibility of the procedure for relevant performance specifications, and stability of the resulting MIMO closed-loop, are verified by supporting analysis. Both numerical simulations and field trial results are presented.
comment: Accepted to appear in IEEE Transactions on Control Systems Technology
Accelerating Underground Pumped Hydro Energy Storage Scheduling with Decision-Focused Learning
Underground pumped hydro energy storage (UPHES) systems play a critical role in grid-scale energy storage for renewable integration, yet optimal day-ahead scheduling remains computationally prohibitive due to nonlinear turbine performance characteristics and discrete operational modes. This paper presents a decision-focused learning (DFL) framework that addresses the computational-accuracy trade-off in UPHES day-ahead scheduling. The proposed methodology employs neural networks to predict penalty weights that guide recursive linearization, transforming the intractable MINLP into a sequence of convex quadratic programs trained end-to-end via differentiable optimization layers. Case studies across 19 representative Belgian electricity market scenarios demonstrate that the DFL framework effectively navigates the trade-off between solution quality and computation time. As a refinement tool, the framework improves profit by 1.1% over piecewise MIQP baselines. Alternatively, as a real-time scheduler initialized with linear approximations, it achieves a 300-fold speedup (3.87s vs 1205.79s) while maintaining profitability within 3.6% of the piecewise MIQP benchmark. Thus, the presented DFL framework enables flexible prioritization between profit maximization and real-time responsiveness.
Systemization of Knowledge: Resilience and Fault Tolerance in Cyber-Physical Systems
Cyber-Physical Systems (CPS) now support critical infrastructure spanning transportation, energy, manufacturing, medical devices, and autonomous robotics. Their defining characteristic is the tight coupling between digital computation and continuous physical dynamics which enables sophisticated autonomy but also creates highly non-linear failure modes. Small disturbances at sensors, firmware, networks, or physical interfaces can propagate through estimation and control pipelines, producing cascading instabilities that defy traditional single-layer reasoning. This Systematization of Knowledge (SoK) unifies nearly two decades of CPS resilience research into a structured Origin-Layer-Effect (OLE) taxonomy. This taxonomy provides a cross-layer lens for understanding how faults arise, how they propagate, and why unrelated CPS failures often share deep structural similarities. By mapping representative systems including RockDrone, MAYDAY, M2MON, HACMS, Byzantine fault-tolerant control, and learning-based recovery mechanisms onto the taxonomy, we reveal patterns of coverage, persistent blind spots, and recurring pathways of fault amplification. Our analysis identifies four structural gaps that span multiple CPS domains: (1) physical-model manipulation, (2) ML-enabled control without stability guarantees, (3) semantic inconsistencies between formal models and firmware, and (4) inadequate forensic visibility across cyber and physical layers. These insights motivate new directions for resilient CPS design, integrating robust control, runtime monitoring, formal assurance, and system-level visibility.
comment: Systemization of knowledge paper. Approximately 13 pages, 3 figures, 3 tables
Early warning signals for loss of control
Maintaining stability in feedback systems, from aircraft and autonomous robots to biological and physiological systems, relies on monitoring their behavior and continuously adjusting their inputs. Incremental damage can make such control fragile. This tends to go unnoticed until a small perturbation induces instability (i.e. loss of control). Traditional methods in the field of engineering rely on accurate system models to compute a safe set of operating instructions, which become invalid when the, possibly damaged, system diverges from its model. Here we demonstrate that the approach of such a feedback system towards instability can nonetheless be monitored through dynamical indicators of resilience. This holistic system safety monitor does not rely on a system model and is based on the generic phenomenon of critical slowing down, shown to occur in the climate, biology and other complex nonlinear systems approaching criticality. Our findings for engineered devices open up a wide range of applications involving real-time early warning systems, as well as empirical guidance for resilient system design exploration, or "tinkering". While we demonstrate the validity using drones, the generic nature of the underlying principles suggests that these indicators could apply across a wider class of controlled systems, including reactors, aircraft, and self-driving cars.
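A minimal sketch of the kind of model-free indicators the abstract refers to, computed here on a synthetic AR(1) signal whose damping slowly erodes; the drone data and the exact estimators used in the paper are not reproduced.
    import numpy as np

    # Simulate a controlled system whose effective damping slowly erodes, a toy
    # stand-in for "incremental damage" (illustrative only, not the drone data).
    rng = np.random.default_rng(1)
    n, x = 20000, 0.0
    xs = np.empty(n)
    for k in range(n):
        a = 0.90 + 0.099 * k / n          # AR(1) coefficient creeping toward 1
        x = a * x + rng.normal(0, 0.01)
        xs[k] = x

    def rolling_indicators(sig, win=1000):
        """Windowed variance and lag-1 autocorrelation (critical slowing down)."""
        var, ac1 = [], []
        for i in range(win, len(sig)):
            w = sig[i - win:i]
            var.append(w.var())
            ac1.append(np.corrcoef(w[:-1], w[1:])[0, 1])
        return np.array(var), np.array(ac1)

    var, ac1 = rolling_indicators(xs)
    print("variance  early vs late:", var[:100].mean(), var[-100:].mean())
    print("lag-1 AC  early vs late:", ac1[:100].mean(), ac1[-100:].mean())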
Robustness Certificates for Neural Networks against Adversarial Attacks
The increasing use of machine learning in safety-critical domains amplifies the risk of adversarial threats, especially data poisoning attacks that corrupt training data to degrade performance or induce unsafe behavior. Most existing defenses lack formal guarantees or rely on restrictive assumptions about the model class, attack type, extent of poisoning, or point-wise certification, limiting their practical reliability. This paper introduces a principled formal robustness certification framework that models gradient-based training as a discrete-time dynamical system (dt-DS) and formulates poisoning robustness as a formal safety verification problem. By adapting the concept of barrier certificates (BCs) from control theory, we introduce sufficient conditions to certify a robust radius ensuring that the terminal model remains safe under worst-case ${\ell}_p$-norm based poisoning. To make this practical, we parameterize BCs as neural networks trained on finite sets of poisoned trajectories. We further derive probably approximately correct (PAC) bounds by solving a scenario convex program (SCP), which yields a confidence lower bound on the certified robustness radius generalizing beyond the training set. Importantly, our framework also extends to certification against test-time attacks, making it the first unified framework to provide formal guarantees in both training and test-time attack settings. Experiments on MNIST, SVHN, and CIFAR-10 show that our approach certifies non-trivial perturbation budgets while being model-agnostic and requiring no prior knowledge of the attack or contamination level.
Lyapunov-Based Kolmogorov-Arnold Network (KAN) Adaptive Control
Recent advancements in Lyapunov-based Deep Neural Networks (Lb-DNNs) have demonstrated improved performance over shallow NNs and traditional adaptive control for nonlinear systems with uncertain dynamics. Existing Lb-DNNs rely on multi-layer perceptrons (MLPs), which lack interpretable insights. As a first step towards embedding interpretable insights in the control architecture, this paper develops the first Lyapunov-based Kolmogorov-Arnold Networks (Lb-KAN) adaptive control method for uncertain nonlinear systems. Unlike MLPs with deep-layer matrix multiplications, KANs provide interpretable insights by direct functional decomposition. In this framework, KANs are employed to approximate uncertain dynamics and embedded into the control law, enabling visualizable functional decomposition. The analytical update laws are constructed from a Lyapunov-based analysis for real-time learning without prior data in a KAN architecture. The analysis uses the distinct KAN approximation theorem to formally bound the reconstruction error and its effect on the performance. The update law is derived by incorporating the KAN's learnable parameters into a Jacobian matrix, enabling stable, analytical, real-time adaptation and ensuring asymptotic convergence of tracking errors. Moreover, the Lb-KAN provides a foundation for interpretability characteristics by achieving visualizable functional decomposition. Simulation results demonstrate that the Lb-KAN controller reduces the function approximation error by 20.2% and 18.0% over the baseline Lb-LSTM and Lb-DNN methods, respectively.
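The sketch below illustrates a generic Lyapunov-based adaptive update law of the form theta_dot = Gamma * phi(x) * e on a scalar plant, using a radial-basis expansion purely as a stand-in for the KAN; the plant, gains, and reference trajectory are hypothetical.
    import numpy as np

    # First-order plant xdot = f(x) + u with unknown f; the controller uses an
    # adaptive approximator fhat(x) = theta @ phi(x). A radial-basis expansion
    # stands in for the KAN here purely for illustration.
    def f_true(x):
        return -0.5 * x + 2.0 * np.sin(x)

    centers = np.linspace(-4, 4, 15)
    def phi(x):
        return np.exp(-0.5 * (x - centers) ** 2)

    dt, k, Gamma = 1e-3, 5.0, 20.0
    x, theta = 2.0, np.zeros_like(centers)
    xd = lambda t: np.sin(t)                      # reference trajectory

    for i in range(20000):
        t = i * dt
        e = x - xd(t)
        u = -k * e - theta @ phi(x) + np.cos(t)   # cancel estimate, track xd
        # Lyapunov-based update: V = e^2/2 + ||theta_err||^2/(2*Gamma) suggests
        # theta_dot = Gamma * phi(x) * e  (real-time adaptation, no prior data)
        theta += dt * Gamma * phi(x) * e
        x += dt * (f_true(x) + u)

    print("final tracking error:", x - xd(20000 * dt))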
A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning
The age of information (AoI) has become a central measure of data freshness in modern wireless systems, yet existing surveys either focus on classical AoI formulations or provide broad discussions of reinforcement learning (RL) in wireless networks without addressing freshness as a unified learning problem. Motivated by this gap, this survey examines RL specifically through the lens of AoI and generalized freshness optimization. We organize AoI and its variants into native, function-based, and application-oriented families, providing a clearer view of how freshness should be modeled in B5G and 6G systems. Building on this foundation, we introduce a policy-centric taxonomy that reflects the decisions most relevant to freshness, consisting of update-control RL, medium-access RL, risk-sensitive RL, and multi-agent RL. This structure provides a coherent framework for understanding how learning can support sampling, scheduling, trajectory planning, medium access, and distributed coordination. We further synthesize recent progress in RL-driven freshness control and highlight open challenges related to delayed decision processes, stochastic variability, and cross-layer design. The goal is to establish a unified foundation for learning-based freshness optimization in next-generation wireless networks.
Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks
Unmanned Aerial Vehicle (UAV) spectral remote sensing technology is widely used in water quality monitoring. However, in dynamic environments, varying illumination conditions, such as shadows and specular reflection (sun glint), can cause severe spectral distortion, thereby reducing data availability. To maximize the acquisition of high-quality data while ensuring flight safety, this paper proposes an active path planning method for dynamic light and shadow disturbance avoidance. First, a dynamic prediction model is constructed to transform the time-varying light and shadow disturbance areas into three-dimensional virtual obstacles. Second, an improved Interfered Fluid Dynamical System (IFDS) algorithm is introduced, which generates a smooth initial obstacle avoidance path by building a repulsive force field. Subsequently, a Model Predictive Control (MPC) framework is employed for rolling-horizon path optimization to handle flight dynamics constraints and achieve real-time trajectory tracking. Furthermore, a Dynamic Flight Altitude Adjustment (DFAA) mechanism is designed to actively reduce the flight altitude when the observable area is narrow, thereby enhancing spatial resolution. Simulation results show that, compared with traditional PID and single obstacle avoidance algorithms, the proposed method achieves an obstacle avoidance success rate of 98% in densely disturbed scenarios, significantly improves path smoothness, and increases the volume of effective observation data by approximately 27%. This research provides an effective engineering solution for precise UAV water quality monitoring in complex illumination environments.
Electromagnetic Formation Flying Using Alternating Magnetic Field Forces and Control Barrier Functions for State and Input Constraints
This article presents a feedback control algorithm for electromagnetic formation flying with constraints on the satellites' states and control inputs. The algorithm combines several key techniques. First, we use alternating magnetic field forces to decouple the electromagnetic forces between each pair of satellites in the formation. Each satellite's electromagnetic actuation system is driven by a sum of amplitude-modulated sinusoids, where the amplitudes are controlled in order to prescribe the time-averaged force between each pair of satellites. Next, the desired time-averaged force is computed from an optimal control law that satisfies state constraints (i.e., no collisions and an upper limit on intersatellite speeds) and input constraints (i.e., not exceeding each satellite's apparent power capability). The optimal time-averaged force is computed using a single relaxed control barrier function that is obtained by composing multiple control barrier functions designed to enforce each state and input constraint. Finally, we demonstrate the satellite formation control method in numerical simulations.
comment: Preprint submitted to IEEE Transactions on Aerospace and Electronic Systems (TAES). arXiv admin note: substantial text overlap with arXiv:2411.16908
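For intuition only, here is a toy relaxed CBF quadratic program on planar double-integrator relative dynamics: a higher-order collision-avoidance CBF, a speed CBF, and an input bound are composed in a single QP with a shared slack variable. The dynamics, gains, and limits are invented and far simpler than the satellite model.
    import numpy as np
    import cvxpy as cp

    # Toy relative dynamics between two satellites (double integrator), with a
    # collision CBF h1 = ||r||^2 - d_min^2 (handled as a higher-order CBF) and a
    # speed CBF h2 = v_max^2 - ||v||^2, composed in one relaxed QP.
    dt, d_min, v_max, u_max, alpha = 0.1, 1.0, 0.5, 0.2, 1.0
    r, v = np.array([3.0, 0.5]), np.array([-0.4, 0.0])

    for _ in range(300):
        u = cp.Variable(2)
        delta = cp.Variable(nonneg=True)                 # shared relaxation variable
        u_des = -0.2 * (r - np.array([0.0, 2.0])) - 0.4 * v   # nominal regulator
        h1, h1dot = r @ r - d_min**2, 2 * r @ v
        psi1 = h1dot + alpha * h1                        # higher-order CBF for h1
        h2 = v_max**2 - v @ v
        cons = [2 * v @ v + 2 * r @ u + alpha * h1dot + alpha * psi1 + delta >= 0,
                -2 * v @ u + alpha * h2 + delta >= 0,    # keeps ||v|| below v_max
                cp.norm_inf(u) <= u_max]                 # input (power) limit
        cp.Problem(cp.Minimize(cp.sum_squares(u - u_des) + 1e3 * delta), cons).solve()
        r, v = r + dt * v, v + dt * u.value

    print("final separation:", np.linalg.norm(r), "speed:", np.linalg.norm(v))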
Improving Action Smoothness for a Cascaded Online Learning Flight Control System
This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
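A small stand-alone example of the two ingredients mentioned above, applied to a synthetic action signal rather than the flight controller: a first-order low-pass filter and an FFT-based check of high-frequency content.
    import numpy as np

    # A noisy, oscillatory "policy action" sequence and a first-order low-pass
    # filter, with an FFT check of the resulting spectra. Purely illustrative;
    # the controller, cutoff, and gains are not from the paper.
    rng = np.random.default_rng(0)
    t = np.arange(0, 10, 0.01)
    raw = (np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.sin(2 * np.pi * 8 * t)
           + 0.1 * rng.normal(size=t.size))

    def low_pass(x, alpha=0.1):
        y = np.empty_like(x); y[0] = x[0]
        for i in range(1, len(x)):
            y[i] = (1 - alpha) * y[i - 1] + alpha * x[i]
        return y

    smooth = low_pass(raw)

    def high_freq_power_fraction(x, fs=100.0, f_cut=2.0):
        spec = np.abs(np.fft.rfft(x)) ** 2
        freqs = np.fft.rfftfreq(len(x), 1 / fs)
        return spec[freqs > f_cut].sum() / spec.sum()

    print("high-frequency power, raw:     ", high_freq_power_fraction(raw))
    print("high-frequency power, filtered:", high_freq_power_fraction(smooth))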
Differentially Private Gradient-Tracking-Based Distributed Stochastic Optimization over Directed Graphs
This paper proposes a differentially private gradient-tracking-based distributed stochastic optimization algorithm over directed graphs. In particular, privacy noises are incorporated into each agent's state and tracking variable to mitigate information leakage, after which the perturbed states and tracking variables are transmitted to neighbors. We design two novel schemes for the step-sizes and the sampling number within the algorithm. The sampling parameter-controlled subsampling method employed by both schemes enhances the differential privacy level, and ensures a finite cumulative privacy budget even over infinite iterations. The algorithm achieves both almost sure and mean square convergence for nonconvex objectives. Furthermore, when nonconvex objectives satisfy the Polyak-Lojasiewicz condition, Scheme (S1) achieves a polynomial mean square convergence rate, and Scheme (S2) achieves an exponential mean square convergence rate. The trade-off between privacy and convergence is presented. The effectiveness of the algorithm and its superior performance compared to existing works are illustrated through numerical examples of distributed training on the benchmark datasets "MNIST" and "CIFAR-10".
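A stripped-down sketch of gradient tracking with privacy noise on the transmitted variables; for brevity it uses a doubly stochastic complete-graph mixing matrix and omits the directed-graph weights, the subsampling mechanism, and the paper's two step-size schemes (all constants are illustrative).
    import numpy as np

    # Each agent i minimizes ||x - t_i||^2; the network goal is the average target.
    rng = np.random.default_rng(0)
    n, d = 5, 2
    targets = rng.normal(0, 1, (n, d))

    def grad(i, x):                                  # local stochastic gradient
        return 2 * (x - targets[i]) + 0.05 * rng.normal(size=d)

    W = np.full((n, n), 1 / n)                       # complete-graph mixing weights
    x = rng.normal(0, 1, (n, d))
    y = np.array([grad(i, x[i]) for i in range(n)])  # gradient-tracking variables
    g_prev = y.copy()

    for k in range(1, 501):
        step = 1.0 / (k + 10)                        # decaying step size
        sigma = 0.5 / k**0.6                         # decaying privacy-noise scale
        x_noisy = x + rng.normal(0, sigma, x.shape)  # what is actually transmitted
        y_noisy = y + rng.normal(0, sigma, y.shape)
        x_new = W @ x_noisy - step * y
        g_new = np.array([grad(i, x_new[i]) for i in range(n)])
        y = W @ y_noisy + g_new - g_prev             # track the average gradient
        x, g_prev = x_new, g_new

    print("consensus error:", np.linalg.norm(x - x.mean(0)))
    print("distance to optimum:", np.linalg.norm(x.mean(0) - targets.mean(0)))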
Turbulent Multiple-Scattering Channel Modeling for Ultraviolet Communications: A Monte-Carlo Integration Approach
Modeling of multiple-scattering channels in atmospheric turbulence is essential for the performance analysis of long-distance non-line-of-sight (NLOS) ultraviolet (UV) communications. Existing works on turbulent channel modeling for NLOS UV communications either focus on single-scattering cases or estimate the turbulent fluctuation effect unreliably via the Monte-Carlo simulation (MCS) approach. In this paper, we establish a comprehensive turbulent multiple-scattering channel model for NLOS UV communications by using a more efficient Monte-Carlo integration (MCI) approach, where the scattering, absorption, and turbulence effects are all considered. Compared with the MCS approach, the MCI approach is more interpretable for estimating the turbulent fluctuation. To achieve this, we first introduce the scattering, absorption, and turbulence effects for NLOS UV communications in turbulent channels. Then we propose MCI-based estimation methods for both the turbulent fluctuation and the distribution of the turbulent fading coefficient. Numerical results demonstrate that the turbulence-induced scattering effect can always be ignored for typical UV communication scenarios. Besides, the turbulent fluctuation increases as either the communication distance increases or the zenith angle decreases, which is consistent with both existing experimental results and our own. Moreover, we demonstrate numerically that the distribution of the turbulent fading coefficient for UV multiple-scattering channels under all turbulent conditions can be approximated as a log-normal distribution; we also demonstrate both numerically and experimentally that the turbulent fading can be approximated as a Gaussian distribution under weak turbulence.
comment: 28 pages, 9 figures
Closed-Loop Control Law for Low Thrust Orbit Transfer with Guaranteed Stability
Electric propulsion is used to maximize payload capacity in communication satellites. These orbit raising maneuvers span several months and hundreds of revolutions, making trajectory design a complex challenge. The literature typically addresses this problem using feedback laws, with Q-law being one of the most prominent approaches. However, Q-law suffers from closed-loop stability issues, limiting its suitability for real-time on-board implementation. In this work, we focus on closed-loop orbit raising rather than offline trajectory planning and address the stability limitations of the Q-law through a Lyapunov-based control design. A Lyapunov-guided modification of the classical Q-law is proposed to ensure closed-loop stability and enable real-time implementation. The effectiveness of the proposed method is demonstrated through closed-loop orbit transfers across various scenarios, including co-planar transfers, equatorial to polar orbit transfers, and geostationary transfer orbit (GTO) to geostationary earth orbit (GEO) transfers.
comment: 6 pages, 5 figures, 3 tables -- Accepted for publication in Indian Control Conference 2025
Powered Descent Trajectory Optimization of Chandrayaan-3 using Radau Collocation and Controllable Sets
India achieved a significant milestone on August $23^{\text{rd}}$ 2023, becoming the fourth country to accomplish a soft landing on the Moon. This paper presents the powered descent trajectory design for the Chandrayaan-3 mission. The optimization framework is based on pseudospectral Radau collocation, and controllability-based waypoint refinement is employed to further enhance the robustness of the trajectory against state and control perturbations. Furthermore, the trade-off between fuel consumption and robustness is explicitly quantified, providing insights into the practical considerations of mission planning.
comment: 6 pages, 6 figures, Accepted for publication in Indian Control Conference 2025
Reaction-diffusion systems derived from kinetic theory for Multiple Sclerosis
We present a mathematical study of the development of Multiple Sclerosis in which a spatio-temporal kinetic theory model describes, at the mesoscopic level, the dynamics of a high number of interacting agents. We consider both interactions among different populations of human cells and the motion of immune cells, stimulated by cytokines. Moreover, we reproduce the consumption of myelin sheath due to anomalously activated lymphocytes and its restoration by oligodendrocytes. Subsequently, we fix a small time parameter and assume that the considered processes occur at different scales. This allows us to perform a formal limit, obtaining macroscopic reaction-diffusion equations for the number densities with a chemotaxis term. A natural step is then to study the system, inquiring about the formation of spatial patterns through a Turing instability analysis of the problem, basing the discussion on the microscopic parameters of the model. In particular, we obtain spatial patterns oscillating in time that may reproduce brain lesions characteristic of different phases of the pathology.
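As a generic illustration of the Turing instability analysis step (with an invented two-species Jacobian and diffusivities, not the Multiple Sclerosis model), one can scan the dispersion relation numerically:
    import numpy as np

    # Turing-type dispersion check for a generic two-species reaction-diffusion
    # system u_t = D Laplacian u + f(u): the homogeneous steady state is unstable
    # to spatial modes with wavenumber k whenever max Re(eig(J - k^2 D)) > 0.
    J = np.array([[0.5, -1.0],
                  [1.2, -0.8]])          # reaction Jacobian at the steady state
    D = np.diag([0.01, 1.0])             # strongly unequal diffusivities

    ks = np.linspace(0, 10, 400)
    growth = [np.max(np.linalg.eigvals(J - k**2 * D).real) for k in ks]
    unstable = [k for k, g in zip(ks, growth) if g > 0]
    if unstable:
        print(f"unstable band: k in [{min(unstable):.2f}, {max(unstable):.2f}]")
    else:
        print("no Turing instability for these parameters")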
Linear Stability Analysis of a Constant Quaternion Difference Attitude Controller
It is quite often claimed, and correctly so, that linear methods cannot achieve global stability results for attitude control, and conversely that nonlinear control is essential in order to achieve (almost) globally stable tracking of general attitude trajectories. On account of this definitive result, and also because of the existence of powerful nonlinear control techniques, there has been relatively very little work analyzing the limits and performance of linear attitude control. It is the purpose of this paper to provide a characterization of the stability achievable for one class of linear attitude control problems, namely those leading to a constant quaternion difference. In this paper, we analytically derive a critical error angle below which linearized dynamics lead to natural marginal stability for such a system, and above which the system is unstable. The dynamics are then used to derive a locally stable linear attitude controller whose performance is validated using simulations.
Towards Hierarchical Multi-Agent Decision-Making for Uncertainty-Aware EV Charging
Recent advances in bidirectional EV charging and discharging systems have spurred interest in workplace applications. However, real-world deployments face various dynamic factors, such as fluctuating electricity prices and uncertain EV departure times, that hinder effective energy management. To address these issues and minimize building electricity costs while meeting EV charging requirements, we design a hierarchical multi-agent structure in which a high-level agent coordinates overall charge or discharge decisions based on real-time pricing, while multiple low-level agents manage individual power levels accordingly. For uncertain EV departure times, we propose a novel uncertainty-aware critic augmentation mechanism for low-level agents that improves the evaluation of power-level decisions and ensures robust control under such uncertainty. Building upon these two key designs, we introduce HUCA, a real-time charging control framework that coordinates energy supply between the building and EVs. Experiments on real-world electricity datasets show that HUCA significantly reduces electricity costs and maintains competitive performance in meeting EV charging requirements under both simulated certain and uncertain departure scenarios. The results further highlight the importance of hierarchical control and the proposed critic augmentation under the uncertain departure scenario. A case study illustrates HUCA's capability to allocate energy between the building and EVs in real time, underscoring its potential for practical use.
Nonholonomic Robot Parking by Feedback -- Part II: Nonmodular, Inverse Optimal, Adaptive, Prescribed/Fixed-Time and Safe Designs
For the unicycle system, we provide constructive methods for the design of feedback laws that have one or more of the following properties: being nonmodular and globally exponentially stabilizing, inverse optimal, robust to arbitrary decrease or increase of input coefficients, adaptive, prescribed/fixed-time stabilizing, and safe (ensuring the satisfaction of state constraints). Our nonmodular backstepping feedbacks are implementable with either unidirectional or bidirectional velocity actuation. Thanks to constructing families of strict CLFs for the unicycle, we introduce a general design framework and families of feedback laws for the unicycle, which are inverse optimal with respect to meaningful costs. These inverse optimal feedback laws are endowed with robustness to actuator uncertainty and arbitrarily low input saturation due to the unicycle's driftlessness. Besides ensuring robustness to unknown input coefficients, we also develop adaptive laws for these unknown coefficients, enabling the achievement of good transient performance with lower initial control effort. Additionally, we develop controllers that achieve stabilization within a user-specified time horizon using two systematic methods: time-dilated prescribed-time design with smooth-in-time convergence to zero of both the states and the inputs and homogeneity-based fixed-time control that provides an explicit bound on the settling time. Finally, with a nonovershooting design we guarantee strictly forward motion without curb violation. This article, along with its Part I, lays a broad constructive design foundation for stabilization of the nonholonomic unicycle.
comment: 16 pages. arXiv admin note: text overlap with arXiv:2509.25563
Adaptive Trajectory Bundle Method for Roll-to-Roll Manufacturing Systems
Roll-to-roll (R2R) manufacturing requires precise tension and velocity control under operational constraints. Model predictive control demands gradient computation, while sampling-based methods like MPPI struggle with hard constraint satisfaction. This paper presents an adaptive trajectory bundle method that achieves rigorous constraint handling through derivative-free sequential convex programming. The approach approximates nonlinear dynamics and costs via interpolated sample bundles, replacing Taylor-series linearization with function-value interpolation. Adaptive trust region and penalty mechanisms automatically adjust based on constraint violation metrics, eliminating manual tuning. We establish convergence guarantees proving finite-time feasibility and convergence to stationary points of the constrained problem. Simulations on a six-zone R2R system demonstrate that the adaptive method achieves 4.3% lower tension RMSE than gradient-based MPC and an 11.1% improvement over the baseline TBM in velocity transients, with superior constraint satisfaction compared to MPPI variants. Experimental validation on an R2R dry transfer system confirms faster settling and reduced overshoot relative to LQR and non-adaptive TBM.
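The core "sample bundle instead of Taylor series" idea can be sketched on a tiny root-finding problem: fit an affine surrogate from sampled function values, take a trust-region step against it, and adapt the radius based on improvement. Everything below is a toy stand-in, not the paper's TBM or the R2R model.
    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):                                   # "black-box" dynamics/cost residual
        return np.array([np.sin(x[0]) + x[1]**2, x[0] * x[1] - 0.5])

    x, radius = np.array([1.0, 1.0]), 0.5
    for it in range(40):
        # Sample a bundle and fit an affine surrogate f(x + d) ~ f(x) + A d
        D = radius * rng.normal(size=(8, 2))
        F = np.array([f(x + d) - f(x) for d in D])
        A, *_ = np.linalg.lstsq(D, F, rcond=None)          # least-squares "Jacobian"
        A = A.T
        # Gauss-Newton-style step restricted to the trust region
        d = np.linalg.lstsq(A, -f(x), rcond=None)[0]
        d = d * min(1.0, radius / (np.linalg.norm(d) + 1e-12))
        # Adaptive trust region: expand on improvement, shrink otherwise
        if np.linalg.norm(f(x + d)) < np.linalg.norm(f(x)):
            x, radius = x + d, min(2 * radius, 1.0)
        else:
            radius *= 0.5

    print("residual norm:", np.linalg.norm(f(x)), "at x =", x)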
Optimal Control with Natural Images: Efficient Reinforcement Learning using Overcomplete Sparse Codes
Optimal control and sequential decision making are widely used in many complex tasks. Optimal control over a sequence of natural images is a first step towards understanding the role of vision in control. Here, we formalize this problem as a reinforcement learning task, and derive general conditions under which an image includes enough information to implement an optimal policy. Reinforcement learning is shown to provide a computationally efficient method for finding optimal policies when natural images are encoded into "efficient" image representations. This is demonstrated by introducing a new reinforcement learning benchmark that easily scales to large numbers of states and long horizons. In particular, by representing each image as an overcomplete sparse code, we are able to efficiently solve an optimal control task that is orders of magnitude larger than those tasks solvable using complete codes. Theoretical justification for this behaviour is provided. This work also demonstrates that deep learning is not necessary for efficient optimal control with natural images.
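A toy version of the pipeline, with random "images", an ISTA sparse coder over a random overcomplete dictionary, and linear Q-learning on a small chain MDP; none of these choices come from the paper's benchmark.
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, obs_dim, code_dim = 10, 32, 128            # overcomplete: 128 > 32
    images = rng.normal(0, 1, (n_states, obs_dim))       # stand-in "natural images"
    Dict = rng.normal(0, 1, (obs_dim, code_dim))
    Dict /= np.linalg.norm(Dict, axis=0)

    def sparse_code(x, lam=0.1, iters=100):
        """ISTA: minimize 0.5||x - D z||^2 + lam ||z||_1."""
        L = np.linalg.norm(Dict, 2) ** 2                  # Lipschitz constant
        z = np.zeros(code_dim)
        for _ in range(iters):
            z = z - Dict.T @ (Dict @ z - x) / L
            z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)
        return z

    codes = np.array([sparse_code(im) for im in images])

    # Simple chain MDP: actions move left/right, reward 1 at the right end.
    w = np.zeros((2, code_dim))                           # linear Q(s, a) = w[a] @ code
    gamma, alpha, eps = 0.95, 0.05, 0.1
    s = 0
    for step in range(20000):
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(w @ codes[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        target = r + (0.0 if s_next == n_states - 1 else gamma * np.max(w @ codes[s_next]))
        td = target - w[a] @ codes[s]
        w[a] += alpha * td * codes[s]
        s = 0 if s_next == n_states - 1 else s_next

    print("greedy action per state:", [int(np.argmax(w @ c)) for c in codes])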
Nonholonomic Robot Parking by Feedback -- Part I: Modular Strict CLF Designs
It has been known in the robotics literature since about 1995 that, in polar coordinates, the nonholonomic unicycle is asymptotically stabilizable by smooth feedback, even globally. We introduce a modular design framework that selects the forward velocity to decouple the radial coordinate, allowing the steering subsystem to be stabilized independently. Within this structure, we develop families of feedback laws using passivity, backstepping, and integrator forwarding. Each law is accompanied by a strict control Lyapunov function, including barrier variants that enforce angular constraints. These strict CLFs provide constructive class KL convergence estimates and enable eigenvalue assignment at the target equilibrium. The framework generalizes and extends prior modular and nonmodular approaches, while preparing the ground for inverse optimal and adaptive redesigns in the sequel paper.
comment: arXiv admin note: text overlap with arXiv:2509.25575
Safe Control of Multi-Agent Systems with Minimal Communication
In many multi-agent systems, communication is limited by bandwidth, latency, and energy constraints. Designing controllers that achieve coordination and safety with minimal communication is critical for scalable and reliable deployment. This paper presents a method for designing controllers that minimize inter-agent communication in multi-agent systems while satisfying safety and coordination requirements and conforming to communication delay constraints. The control synthesis problem is cast as a rank minimization problem, for which a convex relaxation is obtained via system level synthesis. Simulation results on various tasks, including trajectory tracking with relative and heterogeneous sensing, demonstrate that the proposed method significantly reduces inter-agent transmission compared to baseline approaches.
comment: to appear at 2025 IEEE Conference on Decision and Control (CDC)
Robotics
LightTact: A Visual-Tactile Fingertip Sensor for Deformation-Independent Contact Sensing
Contact often occurs without macroscopic surface deformation, such as during interaction with liquids, semi-liquids, or ultra-soft materials. Most existing tactile sensors rely on deformation to infer contact, making such light-contact interactions difficult to perceive robustly. To address this, we present LightTact, a visual-tactile fingertip sensor that makes contact directly visible via a deformation-independent, optics-based principle. LightTact uses an ambient-blocking optical configuration that suppresses both external light and internal illumination at non-contact regions, while transmitting only the diffuse light generated at true contacts. As a result, LightTact produces high-contrast raw images in which non-contact pixels remain near-black (mean gray value < 3) and contact pixels preserve the natural appearance of the contacting surface. Built on this, LightTact achieves accurate pixel-level contact segmentation that is robust to material properties, contact force, surface appearance, and environmental lighting. We further integrate LightTact on a robotic arm and demonstrate manipulation behaviors driven by extremely light contact, including water spreading, facial-cream dipping, and thin-film interaction. Finally, we show that LightTact's spatially aligned visual-tactile images can be directly interpreted by existing vision-language models, enabling resistor value reasoning for robotic sorting.
LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving
Simulators can generate virtually unlimited driving data, yet imitation learning policies in simulation still struggle to achieve robust closed-loop performance. Motivated by this gap, we empirically study how misalignment between privileged expert demonstrations and sensor-based student observations can limit the effectiveness of imitation learning. More precisely, experts have significantly higher visibility (e.g., ignoring occlusions) and far lower uncertainty (e.g., knowing other vehicles' actions), making them difficult to imitate reliably. Furthermore, navigational intent (i.e., the route to follow) is under-specified in student models at test time via only a single target point. We demonstrate that these asymmetries can measurably limit driving performance in CARLA and offer practical interventions to address them. After careful modifications to narrow the gaps between expert and student, our TransFuser v6 (TFv6) student policy achieves a new state of the art on all major publicly available CARLA closed-loop benchmarks, reaching 95 DS on Bench2Drive and more than doubling prior performance on Longest6 v2 and Town13. Additionally, by integrating perception supervision from our dataset into a shared sim-to-real pipeline, we show consistent gains on the NAVSIM and Waymo Vision-Based End-to-End driving benchmarks. Our code, data, and models are publicly available at https://github.com/autonomousvision/lead.
Drift-Corrected Monocular VIO and Perception-Aware Planning for Autonomous Drone Racing
The Abu Dhabi Autonomous Racing League (A2RL) x Drone Champions League (DCL) competition requires teams to perform high-speed autonomous drone racing using only a single camera and a low-quality inertial measurement unit -- a minimal sensor set that mirrors expert human drone racing pilots. This sensor limitation makes the system susceptible to drift from Visual-Inertial Odometry (VIO), particularly during long and fast flights with aggressive maneuvers. This paper presents the system developed for the championship, which achieved a competitive performance. Our approach corrected VIO drift by fusing its output with global position measurements derived from a YOLO-based gate detector using a Kalman filter. A perception-aware planner generated trajectories that balance speed with the need to keep gates visible for the perception system. The system demonstrated high performance, securing podium finishes across multiple categories: third place in the AI Grand Challenge with a top speed of 43.2 km/h, second place in the AI Drag Race with over 59 km/h, and second place in the AI Multi-Drone Race. We detail the complete architecture and present a performance analysis based on experimental data from the competition, contributing our insights on building a successful system for monocular vision-based autonomous drone flight.
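A one-dimensional sketch of the drift-correction idea: a Kalman filter estimates the slowly varying VIO drift from sparse global position fixes (here synthetic "gate detections"); noise levels, rates, and the scalar state are made up for illustration.
    import numpy as np

    rng = np.random.default_rng(2)
    n, dt = 2000, 0.01
    true_pos = np.cumsum(0.5 * dt * np.ones(n))             # drone moving forward
    drift = np.cumsum(rng.normal(0, 0.002, n))              # slowly growing VIO drift
    vio = true_pos + drift

    # State: d = VIO drift. Model: d_{k+1} = d_k + w. Pseudo-measurement at gate
    # detections: vio - z ~ d + v, where z is the global position fix.
    d_hat, P, Q, R = 0.0, 1.0, 1e-5, 0.05**2
    corrected = np.empty(n)
    for k in range(n):
        P += Q                                               # predict (random-walk drift)
        if k % 100 == 0:                                     # sparse gate detection
            z = true_pos[k] + rng.normal(0, 0.05)            # global position fix
            innov = (vio[k] - d_hat) - z                     # predicted pos minus fix
            K = P / (P + R)
            d_hat += K * innov
            P *= (1 - K)
        corrected[k] = vio[k] - d_hat

    print("final raw VIO error:  ", abs(vio[-1] - true_pos[-1]))
    print("final corrected error:", abs(corrected[-1] - true_pos[-1]))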
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots in swarm operations remains an open challenge. Assuming a decentralized architecture, each robot is responsible for making its own control decisions, including motion planning. To this end, most existing approaches rely on some form of (wireless) communication between the agents of the swarm. In reality, however, communication is brittle. It may be affected by latency, further delays and packet losses, transmission faults, and is subject to adversarial attacks, such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC) as a communicationless alternative. It follows the implicit cooperation paradigm, under which the design of the robots is based on consensual (offline) rules, similar to traffic rules. They include the definition of a contingency trajectory for each robot, and a method for construction of mutual collision avoidance constraints. The setup is shown to guarantee recursive feasibility and collision avoidance between all swarm members in closed-loop operation. Moreover, CMC naturally satisfies the Plug & Play paradigm, i.e., for new robots entering the swarm. Two numerical examples demonstrate that the collision avoidance guarantee is intact and that the robot swarm operates smoothly under the CMC regime.
FAR-AVIO: Fast and Robust Schur-Complement Based Acoustic-Visual-Inertial Fusion Odometry with Sensor Calibration
Underwater environments impose severe challenges to visual-inertial odometry systems, as strong light attenuation, marine snow and turbidity, together with weakly exciting motions, degrade inertial observability and cause frequent tracking failures over long-term operation. While tightly coupled acoustic-visual-inertial fusion, typically implemented through an acoustic Doppler Velocity Log (DVL) integrated with visual-inertial measurements, can provide accurate state estimation, the associated graph-based optimization is often computationally prohibitive for real-time deployment on resource-constrained platforms. Here we present FAR-AVIO, a Schur-Complement based, tightly coupled acoustic-visual-inertial odometry framework tailored for underwater robots. FAR-AVIO embeds a Schur complement formulation into an Extended Kalman Filter (EKF), enabling joint pose-landmark optimization for accuracy while maintaining constant-time updates by efficiently marginalizing landmark states. On top of this backbone, we introduce Adaptive Weight Adjustment and Reliability Evaluation (AWARE), an online sensor health module that continuously assesses the reliability of visual, inertial and DVL measurements and adaptively regulates their sigma weights, and we develop an efficient online calibration scheme that jointly estimates DVL-IMU extrinsics, without dedicated calibration manoeuvres. Numerical simulations and real-world underwater experiments consistently show that FAR-AVIO outperforms state-of-the-art underwater SLAM baselines in both localization accuracy and computational efficiency, enabling robust operation on low-power embedded platforms. Our implementation has been released as open source software at https://far-vido.gitbook.io/far-vido-docs.
Design and Modeling of a Simple-Structured Continuously Variable Transmission Utilizing Shape Memory Alloy Superelasticity for Twisted String Actuator
Twisted String Actuators (TSAs) are widely used in robotics but suffer from a limited range of Transmission Ratio (TR) variation, restricting their efficiency under varying loads. To overcome this, we propose a novel lightweight, simple-structured Continuously Variable Transmission (CVT) mechanism for TSA utilizing Shape Memory Alloy (SMA) superelasticity. The CVT mechanism consists solely of a pair of highly lightweight superelastic SMA rods connecting the ends of twisted strings. These rods deform under external loads, adjusting the inter-string distance to enable continuous TR variation. We develop a comprehensive theoretical model that integrates three critical nonlinearities
comment: 15 pages, 14 figures
Pneumatic bladder links with wide range of motion joints for articulated inflatable robots IROS2024
Exploration of various applications is at the frontier of research on inflatable robots. We propose an articulated robot consisting of multiple pneumatic bladder links connected by rolling contact joints called Hillberry joints. The bladder link is made of a double-layered structure of tarpaulin sheet and polyurethane sheet, which is both airtight and flexible in shape. The integration of the Hillberry joint into an inflatable robot is also a new approach. The rolling contact joint allows a wide range of motion of $\pm 150 ^{\circ}$, the largest among conventional inflatable joints. Using the proposed mechanism for inflatable robots, we demonstrated moving a 500 g payload with a 3-DoF arm and lifting 3.4 kg and 5 kg payloads with 2-DoF and 1-DoF arms, respectively. We also experimented with a single 3-DoF inflatable leg attached to a dolly to show that the proposed structure worked for legged locomotion.
comment: Accepted at IROS2024 (IEEE/RSJ International Conference on Intelligent Robots and Systems)
KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System
Visual-language reasoning, driving knowledge, and value alignment are essential for advanced autonomous driving systems. However, existing approaches largely rely on data-driven learning, making it difficult to capture the complex logic underlying decision-making through imitation or limited reinforcement rewards. To address this, we propose KnowVal, a new autonomous driving system that enables visual-language reasoning through the synergistic integration of open-world perception and knowledge retrieval. Specifically, we construct a comprehensive driving knowledge graph that encodes traffic laws, defensive driving principles, and ethical norms, complemented by an efficient LLM-based retrieval mechanism tailored for driving scenarios. Furthermore, we develop a human-preference dataset and train a Value Model to guide interpretable, value-aligned trajectory assessment. Experimental results show that our method substantially improves planning performance while remaining compatible with existing architectures. Notably, KnowVal achieves the lowest collision rate on nuScenes and state-of-the-art results on Bench2Drive.
ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge
Vision-Language-Action (VLA) models have emerged as a unified paradigm for robotic perception and control, enabling emergent generalization and long-horizon task execution. However, their deployment in dynamic, real-world environments is severely hindered by high inference latency. While smooth robotic interaction requires control frequencies of 20 to 30 Hz, current VLA models typically operate at only 3-5 Hz on edge devices due to the memory-bound nature of autoregressive decoding. Existing optimizations often require extensive retraining or compromise model accuracy. To bridge this gap, we introduce ActionFlow, a system-level inference framework tailored for resource-constrained edge platforms. At the core of ActionFlow is a Cross-Request Pipelining strategy, a novel scheduler that redefines VLA inference as a macro-pipeline of micro-requests. The strategy intelligently batches memory-bound Decode phases with compute-bound Prefill phases across continuous time steps to maximize hardware utilization. Furthermore, to support this scheduling, we propose a Cross-Request State Packed Forward operator and a Unified KV Ring Buffer, which fuse fragmented memory operations into efficient dense computations. Experimental results demonstrate that ActionFlow achieves a 2.55x improvement in FPS on the OpenVLA-7B model without retraining, enabling real-time dynamic manipulation on edge hardware. Our work is available at https://anonymous.4open.science/r/ActionFlow-1D47.
Finite-Time Control Based on Differential Flatness for Wheeled Mobile Robots with Experimental Validation
A robust tracking control strategy is designed to empower wheeled mobile robots (WMRs) to track predetermined routes while operating in diverse fields and encountering disturbances like strong winds or uneven path conditions, which affect tracking performance. Ensuring the applicability of this tracking method in real-world scenarios is essential. To accomplish this, the WMR model is initially transformed into a linear canonical form by leveraging the differential flatness of its kinematic model, facilitating controller design. Subsequently, a novel integral nonlinear hyperplane-based sliding mode control (INH-SMC) technique is proposed for WMR under disturbances. The stability of the technique is analyzed and verified. Finally, its practical viability is demonstrated through a comparative real-world indoor experiment on a TurtleBot3 WMR subjected to disturbances, confirming the feasibility and efficacy of the proposed approach.
UrbanV2X: A Multisensory Vehicle-Infrastructure Dataset for Cooperative Navigation in Urban Areas SC 2025
Due to the limitations of a single autonomous vehicle, Cellular Vehicle-to-Everything (C-V2X) technology opens a new window for achieving fully autonomous driving through sensor information sharing. However, real-world datasets supporting vehicle-infrastructure cooperative navigation in complex urban environments remain rare. To address this gap, we present UrbanV2X, a comprehensive multisensory dataset collected from vehicles and roadside infrastructure in the Hong Kong C-V2X testbed, designed to support research on smart mobility applications in dense urban areas. Our onboard platform provides synchronized data from multiple industrial cameras, LiDARs, 4D radar, ultra-wideband (UWB), IMU, and high-precision GNSS-RTK/INS navigation systems. Meanwhile, our roadside infrastructure provides LiDAR, GNSS, and UWB measurements. The entire vehicle-infrastructure platform is synchronized using the Precision Time Protocol (PTP), with sensor calibration data provided. We also benchmark various navigation algorithms to evaluate the collected cooperative data. The dataset is publicly available at https://polyu-taslab.github.io/UrbanV2X/.
comment: 8 pages, 9 figures, IEEE ITSC 2025
Asynchronous Fast-Slow Vision-Language-Action Policies for Whole-Body Robotic Manipulation
Most Vision-Language-Action (VLA) systems integrate a Vision-Language Model (VLM) for semantic reasoning with an action expert generating continuous action signals, yet both typically run at a single unified frequency. As a result, policy performance is constrained by the low inference speed of large VLMs. This mandatory synchronous execution severely limits control stability and real-time performance in whole-body robotic manipulation, which involves more joints, larger motion spaces, and dynamically changing views. We introduce a truly asynchronous Fast-Slow VLA framework (DuoCore-FS), organizing the system into a fast pathway for high-frequency action generation and a slow pathway for rich VLM reasoning. The system is characterized by two key features. First, a latent representation buffer bridges the slow and fast systems. It stores instruction semantics and action-reasoning representation aligned with the scene-instruction context, providing high-level guidance to the fast pathway. Second, a whole-body action tokenizer provides a compact, unified representation of whole-body actions. Importantly, the VLM and action expert are still jointly trained end-to-end, preserving unified policy learning while enabling asynchronous execution. DuoCore-FS supports a 3B-parameter VLM while achieving 30 Hz whole-body action-chunk generation, approximately three times as fast as prior VLA models with comparable model sizes. Real-world whole-body manipulation experiments demonstrate improved task success rates and significantly enhanced responsiveness compared to synchronous Fast-Slow VLA baselines. The implementation of DuoCore-FS, including training, inference, and deployment, is provided to commercial users by Astribot as part of the Astribot robotic platform.
LoLA: Long Horizon Latent Action Learning for General Robot Manipulation
The capability of performing long-horizon, language-guided robotic manipulation tasks critically relies on leveraging historical information and generating coherent action sequences. However, such capabilities are often overlooked by existing Vision-Language-Action (VLA) models. To solve this challenge, we propose LoLA (Long Horizon Latent Action Learning), a framework designed for robot manipulation that integrates long-term multi-view observations and robot proprioception to enable multi-step reasoning and action generation. We first employ Vision-Language Models to encode rich contextual features from historical sequences and multi-view observations. We further introduce a key module, State-Aware Latent Re-representation, which transforms visual inputs and language commands into an actionable robot motion space. Unlike existing VLA approaches that merely concatenate robot proprioception (e.g., joint angles) with VL embeddings, this module leverages such robot states to explicitly ground VL representations in physical scale through a learnable "embodiment-anchored" latent space. We trained LoLA on diverse robotic pre-training datasets and conducted extensive evaluations on simulation benchmarks (SIMPLER and LIBERO), as well as two real-world tasks on Franka and Bi-Manual Aloha robots. Results show that LoLA significantly outperforms prior state-of-the-art methods (e.g., pi0), particularly in long-horizon manipulation tasks.
Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS)
Automating tasks in orchards is challenging because of the large amount of variation in the environment and occlusions. One of the challenges is apple pose estimation, where key points, such as the calyx, are often occluded. Recently developed pose estimation methods no longer rely on these key points, but still require them for annotations, making annotating challenging and time-consuming. Due to the abovementioned occlusions, there can be conflicting and missing annotations of the same fruit between different images. Novel 3D reconstruction methods can be used to simplify annotating and enlarge datasets. We propose a novel pipeline consisting of 3D Gaussian Splatting to reconstruct an orchard scene, simplified annotations, automated projection of the annotations to images, and the training and evaluation of a pose estimation method. Using our pipeline, 105 manual annotations were required to obtain 28,191 training labels, a reduction of 99.6%. Experimental results indicated that training with labels of fruits that are $\leq95\%$ occluded resulted in the best performance, with a neutral F1 score of 0.927 on the original images and 0.970 on the rendered images. Adjusting the size of the training dataset had small effects on the model performance in terms of F1 score and pose estimation accuracy. It was found that the least occluded fruits had the best position estimation, which worsened as the fruits became more occluded. It was also found that the tested pose estimation method was unable to correctly learn the orientation estimation of apples.
comment: 33 pages, excluding appendices. 17 figures
Detecting Non-Optimal Decisions of Embodied Agents via Diversity-Guided Metamorphic Testing
As embodied agents advance toward real-world deployment, ensuring optimal decisions becomes critical for resource-constrained applications. Current evaluation methods focus primarily on functional correctness, overlooking the non-functional optimality of generated plans. This gap can lead to significant performance degradation and resource waste. We identify and formalize the problem of Non-optimal Decisions (NoDs), where agents complete tasks successfully but inefficiently. We present NoD-DGMT, a systematic framework for detecting NoDs in embodied agent task planning via diversity-guided metamorphic testing. Our key insight is that optimal planners should exhibit invariant behavioral properties under specific transformations. We design four novel metamorphic relations capturing fundamental optimality properties: position detour suboptimality, action optimality completeness, condition refinement monotonicity, and scene perturbation invariance. To maximize detection efficiency, we introduce a diversity-guided selection strategy that actively selects test cases exploring different violation categories, avoiding redundant evaluations while ensuring comprehensive diversity coverage. Extensive experiments on the AI2-THOR simulator with four state-of-the-art planning models demonstrate that NoD-DGMT achieves violation detection rates of 31.9% on average, with our diversity-guided filter improving rates by 4.3% and diversity scores by 3.3 on average. NoD-DGMT significantly outperforms six baseline methods, with 16.8% relative improvement over the best baseline, and demonstrates consistent superiority across different model architectures and task complexities.
Learning Skills from Action-Free Videos
Learning from videos offers a promising path toward generalist robots by providing rich visual and temporal priors beyond what real robot datasets contain. While existing video generative models produce impressive visual predictions, they are difficult to translate into low-level actions. Conversely, latent-action models better align videos with actions, but they typically operate at the single-step level and lack high-level planning capabilities. We bridge this gap by introducing Skill Abstraction from Optical Flow (SOF), a framework that learns latent skills from large collections of action-free videos. Our key idea is to learn a latent skill space through an intermediate representation based on optical flow that captures motion information aligned with both video dynamics and robot actions. By learning skills in this flow-based latent space, SOF enables high-level planning over video-derived skills and allows for easier translation of these skills into actions. Experiments show that our approach consistently improves performance in both multitask and long-horizon settings, demonstrating the ability to acquire and compose skills directly from raw visual data.
Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting
While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup", where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific object unseen during training using only a few reference images. To address this challenge, we propose Visual Attentive Prompting (VAP), a simple-yet-effective training-free perceptual adapter that equips frozen VLAs with top-down selective attention. VAP treats the reference images as a non-parametric visual memory, grounds the personal object in the scene through open-vocabulary detection and embedding-based matching, and then injects this grounding as a visual prompt by highlighting the object and rewriting the instruction. We construct two simulation benchmarks, Personalized-SIMPLER and Personalized-VLABench, and a real-world tabletop benchmark to evaluate personalized manipulation across multiple robots and tasks. Experiments show that VAP consistently outperforms generic policies and token-learning baselines in both success rate and correct-object manipulation, helping to bridge the gap between semantic understanding and instance-level control.
YCB-Handovers Dataset: Analyzing Object Weight Impact on Human Handovers to Adapt Robotic Handover Motion
This paper introduces the YCB-Handovers dataset, capturing motion data of 2771 human-human handovers with varying object weights. The dataset aims to bridge a gap in human-robot collaboration research, providing insights into the impact of object weight in human handovers and readiness cues for intuitive robotic motion planning. The underlying dataset for object recognition and tracking is the YCB (Yale-CMU-Berkeley) dataset, which is an established standard dataset used in algorithms for robotic manipulation, including grasping and carrying objects. The YCB-Handovers dataset incorporates human motion patterns in handovers, making it applicable for data-driven, human-inspired models aimed at weight-sensitive motion planning and adaptive robotic behaviors. This dataset covers an extensive range of weights, allowing for a more robust study of handover behavior and weight variation. Some objects also require careful handovers, highlighting contrasts with standard handovers. We also provide a detailed analysis of the object's weight impact on the human reaching motion in these handovers.
comment: Paper presented at the IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2025
Towards Optimal Performance and Action Consistency Guarantees in Dec-POMDPs with Inconsistent Beliefs and Limited Communication
Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. In many real-world scenarios, each agent maintains its own belief over the environment and must plan actions accordingly. However, most existing approaches assume that all agents have identical beliefs at planning time, implying these beliefs are conditioned on the same data. Such an assumption is often impractical due to limited communication. In reality, agents frequently operate with inconsistent beliefs, which can lead to poor coordination and suboptimal, potentially unsafe, performance. In this paper, we address this critical challenge by introducing a novel decentralized framework for optimal joint action selection that explicitly accounts for belief inconsistencies. Our approach provides probabilistic guarantees for both action consistency and performance with respect to open-loop multi-agent POMDP (which assumes all data is always communicated), and selectively triggers communication only when needed. Furthermore, we address another key aspect of whether, given a chosen joint action, the agents should share data to improve expected performance in inference. Simulation results show our approach outperforms state-of-the-art algorithms.
comment: 9 pages, 3 figures, 2 tables
A General Purpose Method for Robotic Interception of Non-Cooperative Dynamic Targets
This paper presents a general purpose framework for autonomous, vision-based interception of dynamic, non-cooperative targets, validated across three distinct mobility platforms: an unmanned aerial vehicle (UAV), a four-wheeled ground rover, and an air-thruster spacecraft testbed. The approach relies solely on a monocular camera with fiducials for target tracking and operates entirely in the local observer frame without the need for global information. The core contribution of this work is a streamlined and general approach to autonomous interception that can be adapted across robots with varying dynamics, as well as our comprehensive study of the robot interception problem across heterogeneous mobility systems under limited observability and no global localization. Our method integrates (1) an Extended Kalman Filter for relative pose estimation amid intermittent measurements, (2) a history-conditioned motion predictor for dynamic target trajectory propagation, and (3) a receding-horizon planner solving a constrained convex program in real time to ensure time-efficient and kinematically feasible interception paths. Our operating regime assumes that observability is restricted by partial fields of view, sensor dropouts, and target occlusions. Experiments are performed in these conditions and include autonomous UAV landing on dynamic targets, rover rendezvous and leader-follower tasks, and spacecraft proximity operations. Results from simulated and physical experiments demonstrate robust performance with low interception errors (both during station-keeping and upon scenario completion), high success rates under deterministic and stochastic target motion profiles, and real-time execution on embedded processors such as the Jetson Orin, VOXL2, and Raspberry Pi 5. These results highlight the framework's generalizability, robustness, and computational efficiency.
comment: 10 pages, 11 figures, 5 tables. Accepted to IEEE Aerospace Conference 2026
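A compact receding-horizon sketch in the spirit of component (3): at each replan a constant-velocity target prediction feeds a small convex program for a feasible chase trajectory, and only the first input is applied. Dynamics, limits, and weights are toy values, not the paper's formulation.
    import numpy as np
    import cvxpy as cp

    dt, H, a_max, v_max = 0.2, 15, 2.0, 3.0
    p, v = np.array([0.0, 0.0]), np.array([0.0, 0.0])            # chaser state
    pt, vt = np.array([8.0, 5.0]), np.array([-0.5, 0.2])          # target state

    for k in range(60):
        target_pred = pt + vt * dt * np.arange(1, H + 1)[:, None]   # CV prediction
        P = cp.Variable((H, 2)); V = cp.Variable((H, 2)); A = cp.Variable((H, 2))
        cons = [P[0] == p + dt * v, V[0] == v + dt * A[0]]
        for i in range(1, H):
            cons += [P[i] == P[i-1] + dt * V[i-1], V[i] == V[i-1] + dt * A[i]]
        cons += [cp.norm(A, axis=1) <= a_max, cp.norm(V, axis=1) <= v_max]
        cost = cp.sum_squares(P - target_pred) + 0.1 * cp.sum_squares(A)
        cp.Problem(cp.Minimize(cost), cons).solve()
        a0 = A.value[0]                                  # apply only the first input
        p, v = p + dt * v, v + dt * a0
        pt = pt + dt * vt                                # target keeps moving
        if np.linalg.norm(p - pt) < 0.2:
            print(f"intercepted at step {k}"); break
    print("final miss distance:", np.linalg.norm(p - pt))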
Fixed-time control with prescribed performance for path following of underwater gliders
Underwater gliders are increasingly deployed in challenging missions - such as hurricane-season observations and long-endurance environmental monitoring - where strong currents and turbulence pose significant risks to navigation safety. To address these practical challenges, this paper presents a fixed-time prescribed performance control scheme for the 3D path following of underwater gliders subject to model uncertainties and environmental disturbances. The primary contribution is the integration of a finite-time performance function within a fixed-time control framework. This synthesis ensures that the tracking errors are constrained within prescribed performance bounds and converge to a compact set within a fixed time, independent of initial conditions. A second key contribution is the development of a fixed-time sliding mode disturbance observer that provides accurate finite-time estimation of lumped disturbances, enhancing the system's robustness. Integrated with an iLOS guidance law, the proposed controller enables precise and safe waypoint following. Numerical simulations demonstrate that the proposed method outperforms conventional sliding mode and prescribed performance controllers in tracking accuracy, convergence speed, and control effort smoothness, validating its efficacy for robust underwater navigation.
comment: 22 pages, 13 figures, 2 tables, Submitted to Ocean Engineering
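A scalar toy example of prescribed-performance tracking: the error is kept inside a shrinking funnel rho(t) by a control that grows as |e|/rho approaches 1. The funnel, gains, and disturbance are illustrative and unrelated to the glider dynamics or the fixed-time observer.
    import numpy as np

    # Scalar system xdot = u + d(t) tracking xd(t) with |e(t)| < rho(t).
    dt, T = 1e-3, 10.0
    rho0, rho_inf, decay, k = 1.0, 0.05, 1.0, 2.0
    x = 0.5
    xd = lambda t: np.sin(0.5 * t)
    xd_dot = lambda t: 0.5 * np.cos(0.5 * t)

    inside = []
    for i in range(int(T / dt)):
        t = i * dt
        rho = (rho0 - rho_inf) * np.exp(-decay * t) + rho_inf    # performance funnel
        e = x - xd(t)
        xi = np.clip(e / rho, -0.999, 0.999)                     # normalized error
        u = xd_dot(t) - k * np.arctanh(xi)                       # transformed-error law
        d = 0.2 * np.sin(2 * t)                                  # bounded disturbance
        x += dt * (u + d)
        inside.append(abs(e) < rho)

    print("error stayed inside the prescribed funnel:", all(inside))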
Anytime Metaheuristic Framework for Global Route Optimization in Expected-Time Mobile Search
Expected-time mobile search (ETS) is a fundamental robotics task where a mobile sensor navigates an environment to minimize the expected time required to locate a hidden object. Global route optimization for ETS in static 2D continuous environments remains largely underexplored due to the intractability of objective evaluation, stemming from the continuous nature of the environment and the interplay of motion and visibility constraints. Prior work has addressed this through partial discretization, leading to discrete-sensing formulations tackled via utility-greedy heuristics. Others have taken an indirect approach by heuristically approximating the objective using minimum latency problems on fixed graphs, enabling global route optimization via efficient metaheuristics. This paper builds on and significantly extends the latter by introducing Milaps (Minimum latency problems), a model-based solution framework for ETS. Milaps integrates novel auxiliary objectives and adapts a recent anytime metaheuristic for the traveling deliveryman problem, chosen for its strong performance under tight runtime constraints. Evaluations on a novel large-scale dataset demonstrate superior trade-offs between solution quality and runtime compared to state-of-the-art baselines. The best-performing strategy rapidly generates a preliminary solution, assigns static weights to sensing configurations, and optimizes global costs metaheuristically. Additionally, a qualitative study highlights the framework's flexibility across diverse scenarios.
comment: 20 pages, 42 figures (including subfigures); submitted to IEEE Transactions on Robotics (T-RO) in February 2025
Learning Safe Autonomous Driving Policies Using Predictive Safety Representations
Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective creates a fundamental tension, as overly conservative policies limit driving efficiency while aggressive exploration risks safety violations. The Safety Representations for Safer Policy Learning (SRPL) framework addresses this challenge by equipping agents with a predictive model of future constraint violations and has shown promise in controlled environments. This paper investigates whether SRPL extends to real-world autonomous driving scenarios. Systematic experiments on the Waymo Open Motion Dataset (WOMD) and NuPlan demonstrate that SRPL can improve the reward-safety tradeoff, achieving statistically significant improvements in success rate (effect sizes r = 0.65-0.86) and cost reduction (effect sizes r = 0.70-0.83), with p < 0.05 for observed improvements. However, its effectiveness depends on the underlying policy optimizer and the dataset distribution. The results further show that predictive safety representations play a critical role in improving robustness to observation noise. Additionally, in zero-shot cross-dataset evaluation, SRPL-augmented agents demonstrate improved generalization compared to non-SRPL methods. These findings collectively demonstrate the potential of predictive safety representations to strengthen SafeRL for autonomous driving.
comment: 8 pages, 4 figures
Categorical Equivariant Deep Learning: Category-Equivariant Neural Networks and Universal Approximation Theorems
We develop a theory of category-equivariant neural networks (CENNs) that unifies group/groupoid-equivariant networks, poset/lattice-equivariant networks, graph and sheaf neural networks. Equivariance is formulated as naturality in a topological category with Radon measures. Formulating linear and nonlinear layers in the categorical setup, we prove the equivariant universal approximation theorem in the general setting: the class of finite-depth CENNs is dense in the space of continuous equivariant transformations. We instantiate the framework for groups/groupoids, posets/lattices, graphs and cellular sheaves, deriving universal approximation theorems for them in a systematic manner. Categorical equivariant deep learning thus allows us to expand the horizons of equivariant deep learning beyond group actions, encompassing not only geometric symmetries but also contextual and compositional symmetries.
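For readers unfamiliar with the categorical phrasing, equivariance-as-naturality is the usual commuting-square condition: a layer assigns to every object $X$ a map $L_X$ between feature spaces given by functors $F$ (input) and $G$ (output), such that for every morphism $f\colon X\to Y$

$$L_Y \circ F(f) \;=\; G(f) \circ L_X .$$

Specializing the base category to a one-object groupoid recovers ordinary group equivariance; this reading is offered only as orientation and is not a restatement of the paper's exact setup.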
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Verifiers--functions assigning rewards to agent behavior--have been key for AI progress in domains like math and code. However, extending gains to domains without clear-cut success criteria (e.g., computer use) remains a challenge: while humans can recognize desired outcomes, translating this intuition into scalable rules is nontrivial. Multimodal Large Language Models (MLLMs) emerge as a promising solution, given their world knowledge, human-preference alignment, and reasoning skills. We evaluate MLLMs as verifiers across web navigation, computer use, and robotic manipulation, and identify a critical limitation: a strong tendency to over-validate agent behavior, a phenomenon we term agreement bias. This bias is pervasive across models, resilient to test-time scaling, and poses risks to existing methods relying on MLLM evaluations. We discuss methods to evaluate and improve MLLM verifiers and introduce Self-Grounded Verification (SGV), a lightweight method that harnesses MLLMs' own sampling mechanisms by modulating (un)conditional generation to better leverage their knowledge, alignment, and reasoning. SGV operates in two steps: first, the MLLM is elicited to generate broad priors about desired behavior, independent of the data under evaluation. Then, conditioned on self-generated priors, it reasons over and evaluates a candidate trajectory. SGV yields more human-aligned evaluations with gains of up to 25pp in failure detection, 14pp in accuracy, and benefits extending to downstream applications. In self-refinement and online supervision, SGV boosts task completion of a GUI specialist in OSWorld, a diffusion policy in robomimic, and a ReAct agent in VisualWebArena--setting a new state of the art, surpassing the previous best by 20pp. We release an updated version of VisualWebArena featuring more human-aligned evaluators, high-fidelity environment parallelism, and speedups of over 10x.
comment: Our code, models, and data are publicly available at https://mshalimay.github.io/agreement-bias-sgv/
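The two-step recipe above is essentially two chained generation calls; a minimal sketch follows, where `generate` is a stand-in for whatever chat-completion interface the MLLM exposes (a hypothetical placeholder, not the released code at the project page).

```python
def self_grounded_verify(generate, task_description, trajectory):
    """Two-step Self-Grounded Verification (illustrative sketch).

    generate(prompt, images=None) -> str is a placeholder for an MLLM call.
    """
    # Step 1: elicit broad priors about desired behavior, *without* showing
    # the trajectory under evaluation (unconditional w.r.t. the candidate).
    priors = generate(
        f"Task: {task_description}\n"
        "List the conditions a successful trajectory must satisfy and the "
        "common failure modes to watch for. Do not assume success."
    )

    # Step 2: condition on the self-generated priors, then reason over the
    # candidate trajectory and produce a verdict.
    verdict = generate(
        f"Task: {task_description}\n"
        f"Evaluation criteria (written before seeing the attempt):\n{priors}\n"
        "Now check the attempt against each criterion and answer "
        "SUCCESS or FAILURE with a one-line justification.",
        images=trajectory,   # e.g. a list of screenshots / frames
    )
    return priors, verdict
```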
Learning Stack-of-Tasks Management for Redundant Robots
This paper presents a novel framework for automatically learning complete Stack-of-Tasks (SoT) controllers for redundant robotic systems, including task priorities, activation logic, and control parameters. Unlike classical SoT pipelines-where task hierarchies are manually defined and tuned-our approach optimizes the full SoT structure directly from a user-specified cost function encoding intuitive preferences such as safety, precision, manipulability, or execution speed. The method combines Genetic Programming with simulation-based evaluation to explore both discrete (priority order, task activation) and continuous (gains, trajectory durations) components of the controller. We validate the framework on a dual-arm mobile manipulator (the ABB mobile-YuMi research platform), demonstrating robust convergence across multiple cost definitions, automatic suppression of irrelevant tasks, and strong resilience to distractors. Learned SoTs exhibit expert-like hierarchical structure and adapt naturally to multi-objective trade-offs. Crucially, all controllers transfer from Gazebo simulation to the real robot, achieving safe and precise motion without additional tuning. Experiments in static and dynamic environments show reliable obstacle avoidance, high tracking accuracy, and predictable behavior in the presence of humans. The proposed method provides an interpretable and scalable alternative to manual SoT design, enabling rapid, user-driven generation of task execution hierarchies for complex robotic systems.
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning AAAI 2026
Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action execution trajectories through continuous co-learning across vision, language, and proprioceptive inputs (e.g., robot internal states). Meanwhile, we anchor language semantics to visuomotor representations by a bidirectional cross-attention to learn contextual information for action generation, successfully overcoming the problem of semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL's generalization under unseen and noisy object states.
comment: Accepted at AAAI 2026, the Project website is available at https://qhemu.github.io/CCoL/
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository under https://github.com/kdjebko/lelar-in-orbit-data
DRAE: Dynamic Retrieval-Augmented Expert Networks for Lifelong Learning and Task Adaptation in Robotics ACL 2025
We introduce Dynamic Retrieval-Augmented Expert Networks (DRAE), a groundbreaking architecture that addresses the challenges of lifelong learning, catastrophic forgetting, and task adaptation by combining the dynamic routing capabilities of Mixture-of-Experts (MoE); leveraging the knowledge-enhancement power of Retrieval-Augmented Generation (RAG); incorporating a novel hierarchical reinforcement learning (RL) framework; and coordinating through ReflexNet-SchemaPlanner-HyperOptima (RSHO). DRAE dynamically routes expert models via a sparse MoE gating mechanism, enabling efficient resource allocation while leveraging external knowledge through parametric retrieval (P-RAG) to augment the learning process. We propose a new RL framework with ReflexNet for low-level task execution, SchemaPlanner for symbolic reasoning, and HyperOptima for long-term context modeling, ensuring continuous adaptation and memory retention. Experimental results show that DRAE significantly outperforms baseline approaches in long-term task retention and knowledge reuse, achieving an average task success rate of 82.5% across a set of dynamic robotic manipulation tasks, compared to 74.2% for traditional MoE models. Furthermore, DRAE maintains an extremely low forgetting rate, outperforming state-of-the-art methods in catastrophic forgetting mitigation. These results demonstrate the effectiveness of our approach in enabling flexible, scalable, and efficient lifelong learning for robotics.
comment: Accepted to the main conference of the Annual Meeting of the Association for Computational Linguistics (ACL 2025)
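The sparse MoE gating mentioned above is, at its core, a top-$k$ softmax over expert scores; a minimal, framework-agnostic sketch (illustrative only, not the DRAE code) is:

```python
import numpy as np

def sparse_moe(x, experts, gate_W, k=2):
    """Route input x through the top-k experts selected by a linear gate.

    x       : (d,) input feature vector
    experts : list of callables, each mapping (d,) -> (d_out,)
    gate_W  : (num_experts, d) gating weights
    """
    logits = gate_W @ x                         # one score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                    # softmax over the selected k
    # Only the selected experts are evaluated -> sparse compute.
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```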
Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors
Interactive reinforcement learning (IRL) has shown promise in enabling autonomous agents and robots to learn complex behaviours from human teachers, yet the dynamics of teacher selection remain poorly understood. This paper reveals an unexpected phenomenon in IRL: when given a choice between teachers with different reward structures, learning agents overwhelmingly prefer conservative, low-reward teachers (93.16% selection rate) over those offering 20x higher rewards. Through 1,250 experimental runs in navigation tasks with multiple expert teachers, we discovered: (1) Conservative bias dominates teacher selection: agents systematically choose the lowest-reward teacher, prioritising consistency over optimality; (2) Critical performance thresholds exist at teacher availability $\rho \geq 0.6$ and accuracy $\omega \geq 0.6$, below which the framework fails catastrophically; (3) The framework achieves 159% improvement over baseline Q-learning under concept drift. These findings challenge fundamental assumptions about optimal teaching in RL and suggest potential implications for human-robot collaboration, where human preferences for safety and consistency may align with the observed agent selection behaviour, potentially informing training paradigms for safety-critical robotic applications.
comment: 10 pages, 5 figures. Accepted at ACRA 2025 (Australasian Conference on Robotics and Automation)
Tactile-based Object Retrieval From Granular Media
We introduce GEOTACT, the first robotic system capable of grasping and retrieving objects of potentially unknown shapes buried in a granular environment. While important in many applications, ranging from mining and exploration to search and rescue, this type of interaction with granular media is difficult due to the uncertainty stemming from visual occlusion and noisy contact signals. To address these challenges, we use a learning method relying exclusively on touch feedback, trained end-to-end with simulated sensor noise. We show that our problem formulation leads to the natural emergence of learned pushing behaviors that the manipulator uses to reduce uncertainty and funnel the object to a stable grasp despite spurious and noisy tactile readings. We introduce a training curriculum that bootstraps learning in simulated granular environments, enabling zero-shot transfer to real hardware. Despite being trained only on seven objects with primitive shapes, our method is shown to successfully retrieve 35 different objects, including rigid, deformable, and articulated objects with complex shapes. Videos and additional information can be found at https://jxu.ai/geotact.
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models have been made publicly available at https://steinate.github.io/logoplanner.github.io.
comment: Project page:https://steinate.github.io/logoplanner.github.io/
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Assuming the optimality of human demonstrations is core to existing VLA policies. However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations by reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress, filters the demonstration trajectories, and only keeps the transitions that contribute positively to the progress. Specifically, we show that by directly applying offline RL with sparse reward, the resulting $Q$-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation that greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets with an 83.3% success rate, a task requiring long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
Interaction Dataset of Autonomous Vehicles with Traffic Lights and Signs
This paper presents the development of a comprehensive dataset capturing interactions between Autonomous Vehicles (AVs) and traffic control devices, specifically traffic lights and stop signs. Derived from the Waymo Motion dataset, our work addresses a critical gap in the existing literature by providing real-world trajectory data on how AVs navigate these traffic control devices. We propose a methodology for identifying and extracting relevant interaction trajectory data from the Waymo Motion dataset, incorporating over 37,000 instances with traffic lights and 44,000 with stop signs. Our methodology includes defining rules to identify various interaction types, extracting trajectory data, and applying a wavelet-based denoising method to smooth the acceleration and speed profiles and eliminate anomalous values, thereby enhancing the trajectory quality. Quality assessment metrics indicate that trajectories obtained in this study have anomaly proportions in acceleration and jerk profiles reduced to near-zero levels across all interaction categories. By making this dataset publicly available, we aim to address the current gap in datasets containing AV interaction behaviors with traffic lights and signs. Based on the organized and published dataset, we can gain a more in-depth understanding of AVs' behavior when interacting with traffic lights and signs. This will facilitate research on AV integration into existing transportation infrastructures and networks, supporting the development of more accurate behavioral models and simulation tools.
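As an illustration of the wavelet-based smoothing step described above (the wavelet family, decomposition level, and thresholding rule here are assumptions, not the paper's settings), a standard soft-thresholding pass with PyWavelets looks like:

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Soft-threshold wavelet denoising of a 1-D speed/acceleration profile."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Universal threshold estimated from the finest detail coefficients.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]
```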
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), leading to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties. This motivates progressive stage optimization. Thereby, we present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable, and stage-aligned reinforcement signals. Integrating STARE into TPO and PPO, we yield Stage-Aware TPO (STA-TPO) and Stage-Aware PPO (STA-PPO) for offline stage-wise preference and online intra-stage interaction, respectively. Further building on supervised fine-tuning as initialization, we propose the Imitation -> Preference -> Interaction (IPI), a serial fine-tuning pipeline for improving action accuracy in VLA models. Experiments on SimplerEnv and ManiSkill3 demonstrate substantial gains, achieving state-of-the-art success rates of 98.0 percent on SimplerEnv and 96.4 percent on ManiSkill3 tasks.
Towards Autonomous Navigation in Endovascular Interventions
Cardiovascular diseases remain the leading cause of global mortality, with minimally invasive treatment options offered through endovascular interventions. However, the precision and adaptability of current robotic systems for endovascular navigation are limited by heuristic control, low autonomy, and the absence of haptic feedback. This thesis presents an integrated AI-driven framework for autonomous guidewire navigation in complex vascular environments, addressing key challenges in data availability, simulation fidelity, and navigational accuracy. A high-fidelity, real-time simulation platform, CathSim, is introduced for reinforcement learning based catheter navigation, featuring anatomically accurate vascular models and contact dynamics. Building on CathSim, the Expert Navigation Network is developed, a policy that fuses visual, kinematic, and force feedback for autonomous tool control. To mitigate data scarcity, the open-source, bi-planar fluoroscopic dataset Guide3D is proposed, comprising more than 8,700 annotated images for 3D guidewire reconstruction. Finally, SplineFormer, a transformer-based model, is introduced to directly predict guidewire geometry as continuous B-spline parameters, enabling interpretable, real-time navigation. The findings show that combining high-fidelity simulation, multimodal sensory fusion, and geometric modelling substantially improves autonomous endovascular navigation and supports safer, more precise minimally invasive procedures.
Multiagent Systems
LongVideoAgent: Multi-Agent Reasoning with Long Videos
Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into lossy summaries or rely on limited toolsets, weakening temporal grounding and missing fine-grained cues. We propose a multi-agent framework in which a master LLM coordinates a grounding agent to localize question-relevant segments and a vision agent to extract targeted textual observations. The master agent plans with a step limit, and is trained with reinforcement learning to encourage concise, correct, and efficient multi-agent cooperation. This design helps the master agent focus on relevant clips via grounding, complements subtitles with visual detail, and yields interpretable trajectories. On our proposed LongTVQA and LongTVQA+ which are episode-level datasets aggregated from TVQA/TVQA+, our multi-agent system significantly outperforms strong non-agent baselines. Experiments also show reinforcement learning further strengthens reasoning and planning for the trained agent. Code and data will be shared at https://longvideoagent.github.io/.
When Natural Strategies Meet Fuzziness and Resource-Bounded Actions (Extended Version)
In formal strategic reasoning for Multi-Agent Systems (MAS), agents are typically assumed to (i) employ arbitrarily complex strategies, (ii) execute each move at zero cost, and (iii) operate over fully crisp game structures. These idealized assumptions stand in stark contrast with human decision making in real-world environments. The natural strategies framework, along with some of its recent variants, partially addresses this gap by restricting strategies to concise rules guarded by regular expressions. Yet, it still overlooks both the cost of each action and the uncertainty that often characterizes human perception of facts over time. In this work, we introduce HumanATLF, a logic that builds upon natural strategies and employs both fuzzy semantics and resource-bounded actions: each action carries a real-valued cost drawn from a non-refillable budget, and atomic conditions and goals have degrees in [0,1]. We give a formal syntax and semantics, and prove that model checking is in P when both the strategy complexity k and the resource budget b are fixed, NP-complete if just one strategic operator over Boolean objectives is allowed, and $\Delta^P_2$-complete when k and b vary. Moreover, we show that recall-based strategies can be decided in PSPACE. We implement our algorithms in VITAMIN, an open-source model-checking tool for MAS, and validate them on an adversarial, resource-aware drone rescue scenario.
Chain-of-Anomaly Thoughts with Large Vision-Language Models
Automated video surveillance with Large Vision-Language Models is limited by their inherent bias towards normality, often failing to detect crimes. While Chain-of-Thought reasoning strategies show significant potential for improving performance in language tasks, the lack of inductive anomaly biases in their reasoning further steers the models towards normal interpretations. To address this, we propose Chain-of-Anomaly-Thoughts (CoAT), a multi-agent reasoning framework that introduces inductive criminal bias in the reasoning process through a final, anomaly-focused classification layer. Our method significantly improves Anomaly Detection, boosting F1-score by 11.8 p.p. on challenging low-resolution footage and Anomaly Classification by 3.78 p.p. in high-resolution videos.
comment: 2 pages, 3 figures, 1 table. Accepted for RECPAD 2025
MAR: Multi-Agent Reflexion Improves Reasoning Abilities in LLMs
LLMs have shown the capacity to improve their performance on reasoning tasks by reflecting on their mistakes and acting with these reflections in mind. However, continual reflections of the same LLM onto itself exhibit degeneration of thought, where the LLM repeats the same errors again and again even with the knowledge that it is wrong. To address this problem, we instead introduce multi-agent, multi-persona debaters as the method for generating reflections. Through extensive experimentation, we find that this leads to better diversity in the reflections generated by the LLM agent. We demonstrate an accuracy of 47% EM on HotpotQA (question answering) and 82.7% on HumanEval (programming), both surpassing reflection with a single LLM.
Towards Optimal Performance and Action Consistency Guarantees in Dec-POMDPs with Inconsistent Beliefs and Limited Communication
Multi-agent decision-making under uncertainty is fundamental for effective and safe autonomous operation. In many real-world scenarios, each agent maintains its own belief over the environment and must plan actions accordingly. However, most existing approaches assume that all agents have identical beliefs at planning time, implying these beliefs are conditioned on the same data. Such an assumption is often impractical due to limited communication. In reality, agents frequently operate with inconsistent beliefs, which can lead to poor coordination and suboptimal, potentially unsafe, performance. In this paper, we address this critical challenge by introducing a novel decentralized framework for optimal joint action selection that explicitly accounts for belief inconsistencies. Our approach provides probabilistic guarantees for both action consistency and performance with respect to open-loop multi-agent POMDP (which assumes all data is always communicated), and selectively triggers communication only when needed. Furthermore, we address another key aspect of whether, given a chosen joint action, the agents should share data to improve expected performance in inference. Simulation results show our approach outperforms state-of-the-art algorithms.
comment: 9 pages, 3 figures, 2 tables
Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification
Verifiers--functions assigning rewards to agent behavior--have been key for AI progress in domains like math and code. However, extending gains to domains without clear-cut success criteria (e.g., computer use) remains a challenge: while humans can recognize desired outcomes, translating this intuition into scalable rules is nontrivial. Multimodal Large Language Models (MLLMs) emerge as a promising solution, given their world knowledge, human-preference alignment, and reasoning skills. We evaluate MLLMs as verifiers across web navigation, computer use, and robotic manipulation, and identify a critical limitation: a strong tendency to over-validate agent behavior, a phenomenon we term agreement bias. This bias is pervasive across models, resilient to test-time scaling, and poses risks to existing methods relying on MLLM evaluations. We discuss methods to evaluate and improve MLLM verifiers and introduce Self-Grounded Verification (SGV), a lightweight method that harnesses MLLMs' own sampling mechanisms by modulating (un)conditional generation to better leverage their knowledge, alignment, and reasoning. SGV operates in two steps: first, the MLLM is elicited to generate broad priors about desired behavior, independent of the data under evaluation. Then, conditioned on self-generated priors, it reasons over and evaluates a candidate trajectory. SGV yields more human-aligned evaluations with gains of up to 25pp in failure detection, 14pp in accuracy, and benefits extending to downstream applications. In self-refinement and online supervision, SGV boosts task completion of a GUI specialist in OSWorld, a diffusion policy in robomimic, and a ReAct agent in VisualWebArena--setting a new state of the art, surpassing the previous best by 20pp. We release an updated version of VisualWebArena featuring more human-aligned evaluators, high-fidelity environment parallelism, and speedups of over 10x.
comment: Our code, models, and data are publicly available at https://mshalimay.github.io/agreement-bias-sgv/
Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology
Multimodal clinical reasoning in the field of gastrointestinal (GI) oncology necessitates the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Despite the evident potential exhibited by Multimodal Large Language Models (MLLMs), they frequently encounter challenges such as context dilution and hallucination when confronted with intricate, heterogeneous medical histories. In order to address these limitations, a hierarchical Multi-Agent Framework is proposed, which emulates the collaborative workflow of a human Multidisciplinary Team (MDT). The system attained a composite expert evaluation score of 4.60/5.00, thereby demonstrating a substantial improvement over the monolithic baseline. It is noteworthy that the agent-based architecture yielded the most substantial enhancements in reasoning logic and medical accuracy. The findings indicate that mimetic, agent-based collaboration provides a scalable, interpretable, and clinically robust paradigm for automated decision support in oncology.
CardAIc-Agents: A Multimodal Framework with Hierarchical Adaptation for Cardiac Care Support
Cardiovascular diseases (CVDs) remain the foremost cause of mortality worldwide, a burden worsened by a severe deficit of healthcare workers. Artificial intelligence (AI) agents have shown potential to alleviate this gap through automated detection and proactive screening, yet their clinical application remains limited by: 1) rigid sequential workflows, whereas clinical care often requires adaptive reasoning that selects specific tests and, based on their results, guides personalised next steps; 2) reliance solely on intrinsic model capabilities to perform role assignment without domain-specific tool support; 3) general and static knowledge bases without continuous learning capability; and 4) fixed unimodal or bimodal inputs and lack of on-demand visual outputs when clinicians require visual clarification. In response, a multimodal framework, CardAIc-Agents, was proposed to augment models with external tools and adaptively support diverse cardiac tasks. First, a CardiacRAG agent generated task-aware plans from updatable cardiac knowledge, while the Chief agent integrated tools to autonomously execute these plans and deliver decisions. Second, to enable adaptive and case-specific customization, a stepwise update strategy was developed to dynamically refine plans based on preceding execution results, once the task was assessed as complex. Third, a multidisciplinary discussion team was proposed, which was automatically invoked to interpret challenging cases, thereby supporting further adaptation. In addition, visual review panels were provided to assist validation when clinicians raised concerns. Experiments across three datasets showed the efficiency of CardAIc-Agents compared to mainstream Vision-Language Models (VLMs) and state-of-the-art agentic systems.
A Game-Theoretic Framework for Distributed Load Balancing: Static and Dynamic Game Models
Motivated by applications in job scheduling, queuing networks, and load balancing in cyber-physical systems, we develop and analyze a game-theoretic framework to balance the load among servers in static and dynamic settings. In these applications, jobs/tasks are held by selfish entities that do not want to coordinate with each other, yet the goal is to balance the load among servers in a distributed manner. First, we provide a static game formulation in which each player holds a job with a specific processing requirement and wants to schedule it fractionally among a set of heterogeneous servers to minimize its average processing time. We show that this static game is a potential game with a pure Nash equilibrium (NE). In particular, the best-response dynamics converge to such an NE after $n$ iterations, where $n$ is the number of players. Additionally, we bound the price of anarchy (PoA) of the static game in terms of game parameters. We then extend our results to a dynamic game setting, where jobs arrive and get processed, and players observe the load on the servers to decide how to schedule their jobs. In this setting, we show that if the players update their strategies using dynamic best-response, the system eventually becomes fully load-balanced and the players' strategies converge to the pure NE of the static game. In particular, we show that the convergence time scales only polynomially with respect to the game parameters. Finally, we provide numerical results to evaluate the performance of our proposed algorithms.
comment: To appear in Automatica, vol. 185, 2026
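To make the best-response dynamics concrete, here is a small numerical sketch under an illustrative cost model (each player's cost is its job-weighted average completion time on heterogeneous servers; the exact cost function in the paper may differ):

```python
import numpy as np
from scipy.optimize import minimize

def best_response(i, X, w, s):
    """Player i re-splits its job w[i] across servers with speeds s,
    minimizing an average-processing-time cost given the others' loads."""
    others = X.sum(axis=0) - X[i]                      # load placed by other players
    def cost(x):
        # fraction of own job on server j, times that server's completion time
        return np.sum((x / w[i]) * (others + x) / s)
    cons = [{"type": "eq", "fun": lambda x: x.sum() - w[i]}]
    bounds = [(0, w[i])] * len(s)
    res = minimize(cost, X[i], bounds=bounds, constraints=cons, method="SLSQP")
    return res.x

# Example: 3 players, 2 heterogeneous servers; iterate best responses.
w, s = np.array([4.0, 2.0, 1.0]), np.array([3.0, 1.0])
X = np.outer(w, s / s.sum())                           # initial proportional split
for _ in range(20):                                    # best-response dynamics
    for i in range(len(w)):
        X[i] = best_response(i, X, w, s)
print("per-server load:", X.sum(axis=0))               # tends toward a balanced split
```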
Social learning moderates the tradeoffs between efficiency, stability, and equity in group foraging
Collective foragers, from animals to robotic swarms, must balance exploration and exploitation to locate sparse resources efficiently. While social learning is known to facilitate this balance, how the range of information sharing shapes group-level outcomes remains unclear. Here, we develop a minimal collective foraging model in which individuals combine independent exploration, local exploitation, and socially guided movement. We show that foraging efficiency is maximized at an intermediate social learning range, where groups exploit discovered resources without suppressing independent discovery. This optimal regime also minimizes temporal burstiness in resource intake, reducing starvation risk. Increasing social learning range further improves equity among individuals but degrades efficiency through redundant exploitation. Introducing risky (negative) targets shifts the optimal range upward; in contrast, when penalties are ignored, randomly distributed negative cues can further enhance efficiency by constraining unproductive exploration. Together, these results reveal how local information rules regulate a fundamental trade-off between efficiency, stability, and equity, providing design principles for biological foraging systems and engineered collectives.
comment: Code and data: https://github.com/LoneStar97/social-learning-search ; additional videos of agents' movement: https://www.youtube.com/playlist?list=PLgRFM9nAjJRwoZvCGBAdCIE-BYNgPmSuV
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or implementing a small feature. However, real-world software engineering is fundamentally a long-horizon endeavor: developers must interpret high-level requirements, plan coordinated changes across many files, and evolve codebases over multiple iterations while preserving existing functionality. We introduce SWE-EVO, a benchmark that evaluates agents on this long-horizon software evolution challenge. Constructed from release notes and version histories of seven mature open-source Python projects, SWE-EVO comprises 48 evolution tasks that require agents to implement multi-step modifications spanning an average of 21 files, validated against comprehensive test suites averaging 874 tests per instance. Experiments with state-of-the-art models reveal a striking capability gap: even GPT-5 with OpenHands achieves only a 21 percent resolution rate on SWE-EVO, compared to 65 percent on the single-issue SWE-Bench Verified. This demonstrates that current agents struggle with sustained, multi-file reasoning. We also propose Fix Rate, a fine-grained metric that captures partial progress toward solving these complex, long-horizon tasks.
Systems and Control (EESS)
Modeling Economic Systems as Multiport Networks
In this paper, we demonstrate how multiport network theory can be used as a powerful modeling tool in economics. The critical insight is using the port concept to pair the flow of goods (the electrical current) with the agent's incentive (the voltage) in an economic interaction. By building networks of agents interacting through ports, we create models with multiple levels of abstraction, from the macro level down to the micro level. We are thereby able to model complex macroeconomic systems whose dynamical behavior is emergent from the micro level. Using the LTSpice circuit simulator, we then design and analyze a series of example systems that range in complexity from the textbook Robinson Crusoe economy to a model of an entire economy.
comment: 29 pages, 16 figures, to be submitted to Journal of the Franklin Institute
Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information
As systems engineering (SE) objectives evolve from the design and operation of monolithic systems to complex System of Systems (SoS), the discipline of Mission Engineering (ME) has emerged, and it is increasingly being accepted as a new line of thinking for the SE community. Moreover, mission environments are uncertain and dynamic, and mission outcomes are a direct function of how the mission assets interact with this environment. This renders static architectures brittle and calls for analytically rigorous approaches to ME. To that end, this paper proposes an intelligent mission coordination methodology that integrates digital mission models with Reinforcement Learning (RL) and specifically addresses the need for adaptive task allocation and reconfiguration. More specifically, we leverage a Digital Engineering (DE) based infrastructure that is composed of a high-fidelity digital mission model and an agent-based simulation; we then formulate the mission tactics management problem as a Markov Decision Process (MDP) and employ an RL agent trained via Proximal Policy Optimization. By leveraging the simulation as a sandbox, we map system states to actions, refining the policy based on realized mission outcomes. The utility of the RL-based intelligent mission coordinator is demonstrated through an aerial firefighting case study. Our findings indicate that the RL-based intelligent mission coordinator not only surpasses baseline performance but also significantly reduces the variability in mission performance. Thus, this study serves as a proof of concept demonstrating that DE-enabled mission simulations combined with advanced analytical tools offer a mission-agnostic framework for improving ME practice, which can be extended to more complicated fleet design and selection problems in the future from a mission-first perspective.
Expected Revenue, Risk, and Grid Impact of Bitcoin Mining: A Decision-Theoretic Perspective
Most current assessments use ex post proxies that miss uncertainty and fail to consistently capture the rapid change in bitcoin mining. We introduce a unified, ex ante statistical model that derives expected return, downside risk, and upside profit potential from the first principles of mining: each hash is a Bernoulli trial with a success probability determined by the Bitcoin block difficulty. The model yields closed-form expected revenue per hash-rate unit, risk metrics in different scenarios, and upside-profit probabilities for different fleet sizes. Empirical calibration closely matches previously reported observations, yielding a unified, faithful quantification across hardware, pools, and operating conditions. This foundation enables more reliable analysis of mining impacts and behavior.
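The first-principles view above is easy to reproduce: with network difficulty $D$, each hash succeeds with probability $p = 1/(D \cdot 2^{32})$, so the number of blocks found by a fleet hashing at rate $h$ over a window of length $t$ is approximately Poisson with mean $p\,h\,t$. A small sketch (the difficulty and block reward below are placeholders, not calibrated values):

```python
import math

def mining_stats(hashrate, hours, difficulty, block_reward):
    """Expected blocks/revenue and downside risk for a mining fleet.

    hashrate in hashes/s, difficulty is the network difficulty,
    block_reward in BTC (placeholder values in the example below).
    """
    p = 1.0 / (difficulty * 2**32)          # success probability of one hash
    n = hashrate * hours * 3600             # Bernoulli trials in the window
    lam = p * n                             # expected blocks (Poisson mean)
    return {
        "expected_blocks": lam,
        "expected_btc": lam * block_reward,
        "p_zero_blocks": math.exp(-lam),    # downside: probability of earning nothing
        "p_at_least_2": 1 - math.exp(-lam) * (1 + lam),  # upside beyond the mean
    }

# Example: a 1 PH/s fleet over one month (illustrative difficulty / reward).
print(mining_stats(hashrate=1e15, hours=720, difficulty=1e14, block_reward=3.125))
```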
Fast Fixed-time Convergence in Nonlinear Dynamical Systems
A method for achieving fast convergence, within a fixed time, of the solutions of nonlinear dynamical systems is proposed, under special requirements on the derivative of a quadratic function calculated along the solutions of the system. Conditions under which the system solutions converge to zero, or to a given region, within a fixed time are obtained. To achieve fast convergence, a negative power is applied to the derivative of a quadratic function within a specific time interval during the evolution of the system. The application of the proposed results to the design of control laws for arbitrary-order linear plants using the backstepping method is considered. All the main results are accompanied by numerical modelling and a comparison of the proposed solutions with some existing ones.
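For orientation, the standard requirement on the derivative of a quadratic function $V(x)=x^\top P x$ that yields fixed-time convergence (the abstract's conditions refine this; the form below is the classical one, not necessarily the paper's) is

$$\dot V \le -\alpha V^{p} - \beta V^{q},\qquad \alpha,\beta>0,\quad 0<p<1<q,$$

which bounds the settling time by $T \le \frac{1}{\alpha(1-p)} + \frac{1}{\beta(q-1)}$, independently of the initial condition.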
Divergence Method to Stability Study of Andronov-Vyshnegradsky Problem. Hidden Oscillations
The classical Andronov-Vyshnegradsky problem, which deals with locating regions of stability and oscillations in control systems with a Watt regulator, is solved using a divergence method for studying the stability of dynamic systems. This system is studied both with and without the self-regulation effect. The exact value of the hidden boundary of the global stability region is obtained. The stability criteria for a system with a Watt regulator are also presented in the context of the solvability of a linear matrix inequality. Computer modelling shows that the system exhibits hidden oscillations when the self-regulation effect is present and when it is not. The conditions for computing the hidden boundary of global stability are determined by three parameters in the Watt regulator model.
Contingency Model-based Control (CMC) for Communicationless Cooperative Collision Avoidance in Robot Swarms
Cooperative collision avoidance between robots in swarm operations remains an open challenge. Assuming a decentralized architecture, each robot is responsible for making its own control decisions, including motion planning. To this end, most existing approaches rely on some form of (wireless) communication between the agents of the swarm. In reality, however, communication is brittle. It may be affected by latency, delays, packet losses, and transmission faults, and is subject to adversarial attacks such as jamming or spoofing. This paper proposes Contingency Model-based Control (CMC) as a communicationless alternative. It follows the implicit cooperation paradigm, under which the design of the robots is based on consensual (offline) rules, similar to traffic rules. They include the definition of a contingency trajectory for each robot, and a method for constructing mutual collision avoidance constraints. The setup is shown to guarantee recursive feasibility and collision avoidance between all swarm members in closed-loop operation. Moreover, CMC naturally satisfies the Plug & Play paradigm, i.e., for new robots entering the swarm. Two numerical examples demonstrate that the collision avoidance guarantee is intact and that the robot swarm operates smoothly under the CMC regime.
A Tutorial to Multirate Extended Kalman Filter Design for Monitoring of Agricultural Anaerobic Digestion Plants
In many applications of biotechnology, measurements are available at different sampling rates, e.g., due to online sensors and offline lab analysis. Offline measurements typically involve time delays that may be unknown a priori due to the underlying laboratory procedures. This multirate (MR) setting poses a challenge to Kalman filtering, where measurement data is conventionally assumed to be available on an equidistant time grid and without delays. The present study derives the MR version of an extended Kalman filter (EKF) based on sample state augmentation, and applies it to the anaerobic digestion (AD) process in a simulative agricultural setting. The performance of the MR-EKF is investigated for various scenarios, i.e., varying delay lengths, measurement noise levels, plant-model mismatch (PMM), and initial state error. Provided with adequate tuning, the MR-EKF was demonstrated to reliably estimate the process state, appropriately fuse delayed offline measurements, and smooth noisy online measurements. Because of the sample state augmentation approach, the delay length of offline measurements does not critically impair state estimation performance, provided observability is not lost during the delays. Poor state initialization and PMM affect convergence more than measurement noise levels. Further, selecting an appropriate tuning was found to be critically important for successful application of the MR-EKF, for which a systematic approach is presented. This study provides implementation guidance for practitioners aiming to successfully apply state estimation to multirate systems. It thereby contributes to developing demand-driven operation of biogas plants, which may aid in stabilizing a renewable electricity grid.
comment: extended appendix
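The sample-state-augmentation idea referenced above can be summarized as follows (a generic formulation, abstracted from the AD-specific model): at the offline sampling instant $t_s$ the current state is copied into the filter state, held constant during the delay, and linked to the measurement once it arrives,

$$x^a_k=\begin{bmatrix}x_k\\ x_{t_s}\end{bmatrix},\qquad x^a_{k+1}=\begin{bmatrix}f(x_k,u_k)\\ x_{t_s}\end{bmatrix}+w_k,\qquad y^{\text{off}}_{k}=h^{\text{off}}(x_{t_s})+v_k,$$

so the delayed laboratory measurement corrects the retained copy $x_{t_s}$, and the cross-covariance accumulated during propagation transfers that correction to the current state $x_k$.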
Data-based Moving Horizon Estimation under Irregularly Measured Data
In this work, we introduce a sample- and data-based moving horizon estimation framework for linear systems. We perform state estimation in a sample-based fashion in the sense that we assume to have only few, irregular output measurements available. This setting is encountered in applications where measuring is expensive or time-consuming. Furthermore, the state estimation framework does not rely on a standard mathematical model, but on an implicit system representation based on measured data. We prove sample-based practical robust exponential stability of the proposed estimator under mild assumptions. Furthermore, we apply the proposed scheme to estimate the states of a gastrointestinal tract absorption system.
comment: 8 pages
Inference in Latent Force Models Using Optimal State Estimation
Latent force models, a class of hybrid modeling approaches, integrate physical knowledge of system dynamics with a latent force - an unknown, unmeasurable input modeled as a Gaussian process. In this work, we introduce two optimal state estimation frameworks to reconstruct the latent forces and to estimate the states. In contrast to state-of-the-art approaches, the designed estimators enable the consideration of system-inherent constraints. Finally, the performance of the novel frameworks is investigated in several numerical examples. In particular, we demonstrate the performance of the new framework in a real-world biomedical example - the hypothalamic-pituitary-thyroid axis - using hormone measurements.
comment: 8 pages
Sliding Mode Control for a Parabolic-Elliptic PDE System with Boundary Perturbation
In this paper, we address the robustness of parabolic-elliptic systems under boundary control. A sliding mode control strategy is proposed to reject matched perturbations. The stability analysis establishes finite-time convergence of the sliding manifold and exponential stability of the closed-loop system. Since the closed-loop system is discontinuous, we also prove its well-posedness. A numerical example is provided to validate the effectiveness of the proposed approach.
Finite-Time Control Based on Differential Flatness for Wheeled Mobile Robots with Experimental Validation
A robust tracking control strategy is designed to empower wheeled mobile robots (WMRs) to track predetermined routes while operating in diverse fields and encountering disturbances like strong winds or uneven path conditions, which affect tracking performance. Ensuring the applicability of this tracking method in real-world scenarios is essential. To accomplish this, the WMR model is initially transformed into a linear canonical form by leveraging the differential flatness of its kinematic model, facilitating controller design. Subsequently, a novel integral nonlinear hyperplane-based sliding mode control (INH-SMC) technique is proposed for WMR under disturbances. The stability of the technique is analyzed and verified. Finally, its practical viability is demonstrated through a comparative real-world indoor experiment on a TurtleBot3 WMR subjected to disturbances, confirming the feasibility and efficacy of the proposed approach.
Robust safety design for strict-feedback nonlinear systems via observer-based linear time varying feedback
This paper develops a robust safety-critical control method for nonlinear strict-feedback systems with mismatched disturbances. Using a state transformation and a linear time-varying disturbance observer, the system is converted into a form that enables safe control design. The approach ensures forward invariance of the safety set and also applies to disturbance-free systems. Safety is proven for all cases, and a numerical example illustrates the results.
Joint Design of Embedded Index Coding and Beamforming for MIMO-based Distributed Computing via Multi-Agent Reinforcement Learning
In distributed computing systems, reducing the communication load during the data shuffling phase is a critical challenge, as excessive inter-node transmissions are a major performance bottleneck. One promising approach to alleviate this burden is Embedded Index Coding (EIC), which exploits cached data at user nodes to encode transmissions more efficiently. However, most prior work on EIC has focused on minimizing code length in wired, error-free environments-an objective often suboptimal for wireless multiple-input multiple-output (MIMO) systems, where channel conditions and spatial multiplexing gains must be considered. This paper investigates the joint design of EIC and transmit beamforming in MIMO systems to minimize total transmission time, an NP-hard problem. We first present a conventional optimization method that determines the optimal EIC via exhaustive search. To address its prohibitive complexity and adapt to dynamic wireless environments, we propose a novel, low-complexity multi-agent reinforcement learning (MARL) framework. The proposed framework enables decentralized agents to act on local observations while effectively managing the hybrid action space of discrete EIC selection and continuous beamforming design. Simulation results demonstrate that the proposed MARL-based approach achieves near-optimal performance with significantly reduced complexity, underscoring its effectiveness and practicality for real-world wireless systems.
From Dissipativity Property to Data-Driven GAS Certificate of Degree-One Homogeneous Networks with Unknown Topology
In this work, we propose a data-driven divide and conquer strategy for the stability analysis of interconnected homogeneous nonlinear networks of degree one with unknown models and a fully unknown topology. The proposed scheme leverages joint dissipativity-type properties of subsystems described by storage functions, while providing a stability certificate over unknown interconnected networks. In our data-driven framework, we begin by formulating the required conditions for constructing storage functions as a robust convex program (RCP). Given that unknown models of subsystems are integrated into one of the constraints of the RCP, we collect data from trajectories of each unknown subsystem and provide a scenario convex program (SCP) that aligns with the original RCP. We solve the SCP as a linear program and construct a storage function for each subsystem with unknown dynamics. Under some newly developed data-driven compositionality conditions, we then construct a Lyapunov function for the fully unknown interconnected network utilizing storage functions derived from data of individual subsystems. We show that our data-driven divide and conquer strategy provides correctness guarantees (as opposed to probabilistic confidence) while significantly mitigating the sample complexity problem existing in data-driven approaches. To illustrate the effectiveness of our proposed results, we apply our approaches to three different case studies involving interconnected homogeneous (nonlinear) networks with unknown models. We collect data from trajectories of unknown subsystems and verify the global asymptotic stability (GAS) of the interconnected system with a guaranteed confidence.
An Optimal Policy for Learning Controllable Dynamics by Exploration
Controllable Markov chains describe the dynamics of sequential decision making tasks and are the central component in optimal control and reinforcement learning. In this work, we give the general form of an optimal policy for learning controllable dynamics in an unknown environment by exploring over a limited time horizon. This policy is simple to implement and efficient to compute, and allows an agent to "learn by exploring" as it maximizes its information gain in a greedy fashion by selecting controls from a constraint set that changes over time during exploration. We give a simple parameterization for the set of controls, and present an algorithm for finding an optimal policy. The form of this policy stems from the existence of certain types of states that restrict control of the dynamics, such as transient states, absorbing states, and non-backtracking states. We show why the occurrence of these states makes a non-stationary policy essential for achieving optimal exploration. Six interesting examples of controllable dynamics are treated in detail. Policy optimality is demonstrated using counting arguments, comparing with suboptimal policies, and by making use of a sequential improvement property from dynamic programming.
$\mathscr{H}_2$ Model Reduction for Augmented Model of Linear Non-Markovian Quantum Systems
An augmented system model provides an effective way to model non-Markovian quantum systems, which is useful in filtering and control for this class of systems. However, since a large number of ancillary quantum oscillators representing internal modes of a non-Markovian environment directly interact with the principal system in these models, the dimension of the augmented system may be very large, causing significant computational burden in designing filters and controllers. In this context, this paper proposes an $\mathscr{H}_2$ model reduction method for the augmented model of linear non-Markovian quantum systems. We first establish necessary and sufficient conditions for the physical realizability of the augmented model of linear non-Markovian quantum systems, which are more stringent than those for Markovian quantum systems. However, these physical realizability conditions of the augmented system model pose non-convex constraints in the optimization problem of model reduction, which makes the problem different from the corresponding classical model reduction problem. To solve the problem, we derive necessary conditions for determining the input matrix in the reduced model, with which a theorem for designing the system matrix of the ancillary system in the reduced system is proved. Building on this, we convert the nonlinear equality constraints into inequality constraints so that a semidefinite programming algorithm can be developed to solve the optimization problem for model reduction. A numerical example of a two-mode linear quantum system driven by three internal modes of a non-Markovian environment validates the effectiveness of our method.
comment: 15 pages, 3 figures
HeylandCircle: A Computational Framework for the Geometric Reconstruction of the Heyland Circle Diagram
The Heyland circle diagram is a classical graphical tool for representing the steady-state behavior of induction machines using no-load and blocked-rotor test data. While widely used in alternating-current machinery texts, the diagram is typically presented as a hand-constructed aid and lacks a standardized computational formulation. This paper presents HeylandCircle, a computational framework that reconstructs the classical Heyland circle diagram directly from standard test parameters. The framework formalizes the traditional geometric construction as a deterministic, reproducible sequence of geometric operations, establishing a clear mapping between measured data, fixed geometric objects, and steady-state operating points. Quantities such as power factor, slip, output power, torque, and efficiency are obtained through explicit geometric relationships on the constructed diagram. Validation using a representative textbook example demonstrates close agreement with classical results. The framework provides a computational realization of the traditional Heyland diagram suitable for instruction, analysis, and systematic extension.
comment: 8 pages, 1 figure
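A minimal sketch of the circle construction is shown below, assuming the classical convention in which the no-load and blocked-rotor current phasors are plotted against the voltage reference and the circle's diameter is taken horizontal through the no-load point; all numbers and variable names are illustrative.

```python
# Sketch of the classical circle construction from the no-load and
# blocked-rotor test points (current phasors already referred to rated
# voltage). Conventions, numbers, and names are illustrative; the paper
# formalizes the full sequence of geometric operations.
import numpy as np

I0, phi0 = 4.0, np.deg2rad(75.0)     # no-load current (A) and power-factor angle
Isc, phisc = 30.0, np.deg2rad(65.0)  # blocked-rotor current referred to rated voltage

# Plot with the reactive component on the x-axis and the active component on
# the y-axis (voltage phasor along +y).
O_ = np.array([I0 * np.sin(phi0), I0 * np.cos(phi0)])      # no-load point
A = np.array([Isc * np.sin(phisc), Isc * np.cos(phisc)])   # blocked-rotor point

# Classical assumption: the circle's diameter is horizontal through O_,
# so its centre C = (cx, O_[1]) is equidistant from O_ and A.
dy = A[1] - O_[1]
cx = (A[0] ** 2 - O_[0] ** 2 + dy ** 2) / (2.0 * (A[0] - O_[0]))
C = np.array([cx, O_[1]])
radius = cx - O_[0]

print(f"centre = ({C[0]:.2f}, {C[1]:.2f}) A, radius = {radius:.2f} A")
# Operating quantities (power factor, slip, torque, efficiency) then follow
# from chords and perpendiculars drawn on this circle.
```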
Spatio-Temporal Graph Neural Networks for Dairy Farm Sustainability Forecasting and Counterfactual Policy Analysis
This study introduces a novel data-driven framework and the first county-scale application of Spatio-Temporal Graph Neural Networks (STGNN) to forecast composite sustainability indices from herd-level operational records. The methodology employs an end-to-end pipeline utilizing a Variational Autoencoder (VAE) to augment Irish Cattle Breeding Federation (ICBF) datasets, preserving joint distributions while mitigating sparsity. A pillar-based scoring formulation is derived via Principal Component Analysis, identifying four pillars (Reproductive Efficiency, Genetic Management, Herd Health, and Herd Management) that are used to construct weighted composite indices. These indices are modelled using an STGNN architecture that explicitly encodes geographic dependencies and non-linear temporal dynamics to generate multi-year forecasts for 2026-2030.
Extragradient methods with complexity guarantees for hierarchical variational inequalities
In the framework of a real Hilbert space we consider the problem of approaching solutions to a class of hierarchical variational inequality problems, subsuming several other problem classes including certain mathematical programs under equilibrium constraints, constrained min-max problems, hierarchical game problems, optimal control under VI constraints, and simple bilevel optimization problems. For this general problem formulation, we establish rates of convergence in terms of suitably constructed gap functions, measuring feasibility gaps and optimality gaps. We present worst-case iteration complexity results on both levels of the variational problem, as well as weak convergence under a geometric weak sharpness condition on the lower level solution set. Our results match and improve the state of the art in terms of their iteration complexity and the generality of the problem formulation.
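For readers unfamiliar with the base scheme, the sketch below shows a plain extragradient iteration for a monotone variational inequality on a box; the operator, feasible set, and step size are toy choices and do not reflect the hierarchical two-level structure analyzed above.

```python
# Plain extragradient iteration for a monotone variational inequality
# VI(F, X): find x* in X with <F(x*), x - x*> >= 0 for all x in X.
# The operator and box constraint are toy choices, not the hierarchical
# problem treated in the paper.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # skew-symmetric operator: monotone, not a gradient
b = np.array([1.0, -0.5])
F = lambda x: A @ x + b

proj = lambda x: np.clip(x, -2.0, 2.0)    # projection onto the box X = [-2, 2]^2
x, tau = np.zeros(2), 0.3                 # step size below 1/Lipschitz constant

for k in range(500):
    y = proj(x - tau * F(x))      # extrapolation (leading) step
    x = proj(x - tau * F(y))      # correction step using the extrapolated point

residual = np.linalg.norm(x - proj(x - F(x)))   # natural (fixed-point) residual
print("approximate solution:", x, " natural residual:", residual)
```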
X-GridAgent: An LLM-Powered Agentic AI System for Assisting Power Grid Analysis
The growing complexity of power system operations has created an urgent need for intelligent, automated tools to support reliable and efficient grid management. Conventional analysis tools often require significant domain expertise and manual effort, which limits their accessibility and adaptability. To address these challenges, this paper presents X-GridAgent, a novel large language model (LLM)-powered agentic AI system designed to automate complex power system analysis through natural language queries. The system integrates domain-specific tools and specialized databases under a three-layer hierarchical architecture comprising planning, coordination, and action layers. This architecture offers high flexibility and adaptability to previously unseen tasks, while providing a modular and extensible framework that can be readily expanded to incorporate new tools, data sources, or analytical capabilities. To further enhance performance, we introduce two novel algorithms: (1) LLM-driven prompt refinement with human feedback, and (2) schema-adaptive hybrid retrieval-augmented generation (RAG) for accurate information retrieval from large-scale structured grid datasets. Experimental evaluations across a variety of user queries and power grid cases demonstrate the effectiveness and reliability of X-GridAgent in automating interpretable and rigorous power system analysis.
Fixed-time control with prescribed performance for path following of underwater gliders
Underwater gliders are increasingly deployed in challenging missions - such as hurricane-season observations and long-endurance environmental monitoring - where strong currents and turbulence pose significant risks to navigation safety. To address these practical challenges, this paper presents a fixed-time prescribed performance control scheme for the 3D path following of underwater gliders subject to model uncertainties and environmental disturbances. The primary contribution is the integration of a finite-time performance function within a fixed-time control framework. This synthesis ensures that the tracking errors are constrained within prescribed performance bounds and converge to a compact set within a fixed time, independent of initial conditions. A second key contribution is the development of a fixed-time sliding mode disturbance observer that provides accurate finite-time estimation of lumped disturbances, enhancing the system's robustness. Integrated with an iLOS guidance law, the proposed controller enables precise and safe waypoint following. Numerical simulations demonstrate that the proposed method outperforms conventional sliding mode and prescribed performance controllers in tracking accuracy, convergence speed, and control effort smoothness, validating its efficacy for robust underwater navigation.
comment: 22 pages, 13 figures, 2 tables, Submitted to Ocean Engineering
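The prescribed-performance ingredient can be illustrated with the classic exponential envelope and artanh error transformation sketched below; the paper instead uses a finite-time performance function inside a fixed-time framework, so this is only the generic mechanism with made-up constants.

```python
# Classic prescribed-performance construction: a decaying envelope rho(t)
# bounds the tracking error, and an error transformation maps the constrained
# error into an unconstrained variable that the controller regulates.
import numpy as np

rho0, rho_inf, lam = 1.0, 0.05, 2.0

def rho(t):
    """Performance envelope: starts at rho0 and decays to rho_inf."""
    return (rho0 - rho_inf) * np.exp(-lam * t) + rho_inf

def transformed_error(e, t):
    """Map e in (-rho(t), rho(t)) to an unconstrained variable via artanh."""
    z = np.clip(e / rho(t), -0.999, 0.999)   # numerical guard near the envelope
    return np.arctanh(z)

t = np.linspace(0.0, 3.0, 7)
e = 0.6 * np.exp(-2.5 * t)                   # an illustrative decaying tracking error
for ti, ei in zip(t, e):
    print(f"t={ti:.1f}s  envelope={rho(ti):.3f}  error={ei:.3f}  "
          f"transformed={transformed_error(ei, ti):.3f}")
```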
Optimized Rolling Allocation of Outages for Damage Assessment
Natural disasters often inflict severe damage on distribution grids. Rapid, reliable damage assessment (DA) is essential for storm restoration, yet most optimization work targets repair dispatch after faults are identified. This paper presents a production, rolling-horizon DA crew allocation system deployed across multiple U.S. states in Eversource Energy's service territory and used during live storms. The method implements a sequential k-job assignment policy per available crew, executed on a fixed cadence and under operator control. The objective jointly prioritizes critical facilities and customer impact while controlling travel time on the actual road network via the Google Maps API. A key constraint is the absence of live crew GPS; we infer crew locations from the last confirmed DA site and robustify travel estimates for staleness, yielding stable recommendations without continuous tracking. The operator remains in the loop with controls to limit churn and to publish a feasible plan. Using data from the March 7 New Hampshire storm with 90 moderate outages and seven DA crews, we observe shorter time to first assessment, fewer revisits, and reduced distance traveled. To our knowledge, this is among the first multi-state, enterprise-integrated deployments to treat DA crews as a first-class optimized resource in storm restoration.
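The core rolling policy, rank outages by a weighted priority score and let each crew greedily take its k best "score per travel-minute" jobs, can be sketched as below; the scoring weights, travel-time matrix, and data are invented for illustration (the production system queries the Google Maps API and robustifies for stale crew positions).

```python
# Sketch of one sequential k-job assignment pass: score outages by a weighted
# combination of criticality and customers affected, then let each crew pick
# its k best score-per-travel-minute jobs from those still unassigned.
import numpy as np

rng = np.random.default_rng(3)
n_outages, n_crews, k = 12, 3, 2
critical = rng.integers(0, 2, n_outages)                 # 1 if a critical facility is affected
customers = rng.integers(10, 500, n_outages)              # customers interrupted
travel_min = rng.uniform(5, 60, (n_crews, n_outages))     # crew-to-outage travel times (min)

score = 1000 * critical + customers                       # critical facilities dominate
unassigned = set(range(n_outages))
plan = {c: [] for c in range(n_crews)}

for c in range(n_crews):                                  # one sequential pass per crew
    for _ in range(k):
        if not unassigned:
            break
        best = max(unassigned, key=lambda j: score[j] / travel_min[c, j])
        plan[c].append(best)
        unassigned.remove(best)

print(plan)
```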
Adaptive Real-Time Scheduling Algorithms for Embedded Systems
Embedded systems are increasingly required to operate in dynamic and uncertain environments while meeting strict real-time requirements. Conventional static scheduling models usually cannot cope with runtime changes in workload, resource availability, or system updates. This brief survey covers feedback-based control models (e.g., Feedback Control Scheduling) and models that exploit interdependence between tasks (e.g., Symbiotic Scheduling of Periodic Tasks). It also touches on predictive methods and power management, including techniques based on Dynamic Voltage and Frequency Scaling (DVFS). The paper briefly summarizes the key mechanisms, the trade-offs they introduce between adaptivity and predictability, typical evaluation metrics, and open problems, especially in safety-critical settings, providing a succinct and accessible introduction for researchers and practitioners who must deal with the evolving landscape of adaptive real-time systems.
An Equivalent and Unified Virtual Battery Modeling Framework for Flexibility Characterization of Building HVAC Systems
The heating, ventilation and air-conditioning (HVAC) system dominates a building's energy consumption and meanwhile exhibits substantial operational flexibility that can be exploited for providing grid services. However, this goal is largely hindered by the difficulty of characterizing the system's operating flexibility due to the complex building thermal dynamics, system operating limits and human comfort constraints. To address this challenge, this paper develops a unified virtual battery (VB) modeling framework for characterizing the operating flexibility of both single-zone and multi-zone building HVAC systems, enabling flexible buildings to function like virtual batteries. Specifically, a physically meaningful representation state is first identified to represent building thermal conditions under thermal comfort constraints, and a VB model is then established for characterizing the operating flexibility of single-zone HVAC systems. We subsequently extend the VB modeling framework to multi-zone HVAC systems and establish a set of zone-level VB models to characterize the building's zonal operating flexibility. We further develop a systematic method to aggregate the VB models into a low-order, low-complexity aggregated VB model, significantly reducing model and computational complexity. We demonstrate the VB model through demand response (DR) applications and conclude that it can well capture the operating flexibility of building HVAC systems and enable effective DR participation. The DR strategies obtained from the VB model can be efficiently decomposed to zone-level control inputs for maintaining human thermal comfort while achieving near-optimal operation cost.
comment: 13 pages, 6 figures
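At its simplest, a virtual battery abstracts a flexible load as a first-order energy state with power and energy limits; the sketch below shows that generic first-order form with made-up parameters, not the identified single-zone or aggregated multi-zone models developed in the paper.

```python
# Generic first-order virtual-battery abstraction of a flexible HVAC load:
# a "state of charge" x_t (stored thermal energy relative to baseline) that
# self-discharges and is driven by the power deviation u_t from baseline,
# subject to power and energy limits. All parameters are illustrative.
import numpy as np

dt = 0.25                 # hours per step
a = np.exp(-dt / 2.0)     # self-discharge from a 2 h thermal time constant
eta = 1.0                 # kW of power deviation -> kWh-equivalent per hour
x_min, x_max = -3.0, 3.0  # energy limits implied by comfort bounds (kWh-eq)
u_min, u_max = -2.0, 2.0  # power-deviation limits (kW)

def step(x, u):
    u = np.clip(u, u_min, u_max)
    x_next = a * x + eta * dt * u
    # A feasible VB dispatch must also keep the energy state inside its band.
    feasible = x_min <= x_next <= x_max
    return x_next, feasible

x = 0.0
for t, u in enumerate([2.0, 2.0, -1.5, 0.0, -2.0, 1.0]):   # a toy DR signal
    x, ok = step(x, u)
    print(f"t={t}: u={u:+.1f} kW  state={x:+.2f} kWh-eq  feasible={ok}")
```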
From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration
Background: The House-Tree-Person (HTP) drawing test, introduced by John Buck in 1948, remains a widely used projective technique in clinical psychology. However, it has long faced challenges such as heterogeneous scoring standards, reliance on examiners' subjective experience, and the lack of a unified quantitative coding system. Results: Quantitative experiments showed that the mean semantic similarity between Multimodal Large Language Model (MLLM) interpretations and human expert interpretations was approximately 0.75 (standard deviation about 0.05). In structurally oriented expert datasets, this similarity rose to 0.85, indicating expert-level baseline comprehension. Qualitative analyses demonstrated that the multi-agent system, by integrating social-psychological perspectives and destigmatizing narratives, effectively corrected visual hallucinations and produced psychological reports with high ecological validity and internal coherence. Conclusions: The findings confirm the potential of multimodal large models as standardized tools for projective assessment. The proposed multi-agent framework, by dividing roles, decouples feature recognition from psychological inference and offers a new paradigm for digital mental-health services. Keywords: House-Tree-Person test; multimodal large language model; multi-agent collaboration; cosine similarity; computational psychology; artificial intelligence
comment: 16 pages, 8 figures
Deep Reinforcement Learning Optimization for Uncertain Nonlinear Systems via Event-Triggered Robust Adaptive Dynamic Programming
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.
comment: 9 pages, 9 figures
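The event-triggering idea itself is compact: the learning update fires only when the state has drifted sufficiently far from its value at the last trigger. The skeleton below illustrates this on a toy linear plant with a placeholder update; the ESO, the value-iteration ADP update, and the exact trigger condition are as described in the paper, not here.

```python
# Event-triggered update skeleton: the (placeholder) learning update fires
# only when the state has drifted beyond a threshold since the last trigger,
# instead of at every sampling instant. Plant and threshold are toy choices.
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -0.8]])   # stable toy plant
dt, threshold = 0.01, 0.15

def learning_update(x):
    pass                      # placeholder for the ADP weight update

x = np.array([1.0, 0.0])
x_last_trigger = x.copy()
triggers = 0

for step in range(2000):
    x = x + dt * (A @ x)                       # plant propagation (explicit Euler)
    if np.linalg.norm(x - x_last_trigger) > threshold:
        learning_update(x)                     # update only on events
        x_last_trigger = x.copy()
        triggers += 1

print(f"learning updates: {triggers} out of 2000 sampling instants")
```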
Turbulent Multiple-Scattering Channel Modeling for Ultraviolet Communications: A Monte-Carlo Integration Approach
Modeling of multiple-scattering channels in atmospheric turbulence is essential for the performance analysis of long-distance non-line-of-sight (NLOS) ultraviolet (UV) communications. Existing works on turbulent channel modeling for NLOS UV communications either focus on single-scattering cases or estimate the turbulent fluctuation effect in an unreliable way based on the Monte-Carlo simulation (MCS) approach. In this paper, we establish a comprehensive turbulent multiple-scattering channel model by using a more efficient Monte-Carlo integration (MCI) approach for NLOS UV communications, where the scattering, absorption, and turbulence effects are all considered. Compared with the MCS approach, the MCI approach is more interpretable for estimating the turbulent fluctuation. To achieve this, we first introduce the scattering, absorption, and turbulence effects for NLOS UV communications in turbulent channels. Then we propose estimation methods based on the MCI approach for estimating both the turbulent fluctuation and the distribution of the turbulent fading coefficient. Numerical results demonstrate that the turbulence-induced scattering effect can always be ignored for typical UV communication scenarios. Besides, the turbulent fluctuation increases as either the communication distance increases or the zenith angle decreases, which is consistent with both existing experimental results and our own experiments. Moreover, we demonstrate numerically that the distribution of the turbulent fading coefficient for UV multiple-scattering channels under all turbulent conditions can be approximated as a log-normal distribution; and we also demonstrate both numerically and experimentally that the turbulent fading can be approximated as a Gaussian distribution under weak turbulence.
comment: 28 pages, 9 figures
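A generic Monte-Carlo integration estimator, which is the numerical core referred to above, is sketched below on a toy one-dimensional integral with an importance-sampling proposal; the integrand is a stand-in for the multiple-scattering path integrals of the actual channel model.

```python
# Generic Monte-Carlo integration (MCI): estimate an integral as the sample
# mean of f(x)/p(x) under a proposal p, with the sample standard deviation
# giving an interpretable error estimate. The integrand is a toy stand-in for
# the multiple-scattering path integrals treated in the paper.
import numpy as np

rng = np.random.default_rng(7)
N = 100_000

# Target: integral of exp(-x) * cos(x)^2 over [0, inf).
# Proposal: Exponential(1), so the importance weight is simply cos(x)^2.
x = rng.exponential(1.0, N)
w = np.cos(x) ** 2

estimate = w.mean()
std_err = w.std(ddof=1) / np.sqrt(N)
print(f"MCI estimate = {estimate:.4f} +/- {std_err:.4f}  (exact value 0.6)")
```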
Learning Stack-of-Tasks Management for Redundant Robots
This paper presents a novel framework for automatically learning complete Stack-of-Tasks (SoT) controllers for redundant robotic systems, including task priorities, activation logic, and control parameters. Unlike classical SoT pipelines, where task hierarchies are manually defined and tuned, our approach optimizes the full SoT structure directly from a user-specified cost function encoding intuitive preferences such as safety, precision, manipulability, or execution speed. The method combines Genetic Programming with simulation-based evaluation to explore both discrete (priority order, task activation) and continuous (gains, trajectory durations) components of the controller. We validate the framework on a dual-arm mobile manipulator (the ABB mobile-YuMi research platform), demonstrating robust convergence across multiple cost definitions, automatic suppression of irrelevant tasks, and strong resilience to distractors. Learned SoTs exhibit expert-like hierarchical structure and adapt naturally to multi-objective trade-offs. Crucially, all controllers transfer from Gazebo simulation to the real robot, achieving safe and precise motion without additional tuning. Experiments in static and dynamic environments show reliable obstacle avoidance, high tracking accuracy, and predictable behavior in the presence of humans. The proposed method provides an interpretable and scalable alternative to manual SoT design, enabling rapid, user-driven generation of task execution hierarchies for complex robotic systems.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository under https://github.com/kdjebko/lelar-in-orbit-data
Integrating Conductor Health into Dynamic Line Rating and Unit Commitment under Uncertainty
Dynamic line rating (DLR) enables greater utilization of existing transmission lines by leveraging real-time weather data. However, the elevated temperature operation (ETO) of conductors under DLR is often overlooked, despite its long-term impact on conductor health. This paper addresses this issue by 1) quantifying depreciation costs associated with ETO and 2) proposing a Conductor Health-Aware Unit Commitment (CHA-UC) that internalizes these costs in operational decisions. The CHA-UC incorporates a robust linear approximation of conductor temperature and integration of expected depreciation costs due to hourly ETO into the objective function. Case studies on the Texas 123-bus backbone test system using NOAA weather data demonstrate that the proposed CHA-UC model reduces the total cost by 0.8% and renewable curtailment by 84% compared to static line rating (SLR), while conventional DLR operation without risk consideration resulted in higher costs due to excessive ETO. Further analysis of the commitment decisions and the line temperature statistics confirms that the CHA-UC achieves safer line flows by shifting generator commitments. Finally, we examine the emergent correlation between wind generation and DLR forecast errors, and show that CHA-UC adaptively manages this effect by relaxing flows for risk-hedging conditions while tightening flows for risk-amplifying ones.
Learning Enhanced Ensemble Filters
The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. Here this shortcoming is addressed by using machine learning to map the joint predicted state and observation to the updated state estimate. Deriving methods from a mean-field formulation of the true filtering distribution suggests a single parametrization of the algorithm that can be deployed at different ensemble sizes, and we use a mean-field formulation of the ensemble Kalman filter as an inductive bias for our architecture. To develop this perspective, in which the mean-field limit of the algorithm and finite interacting ensemble particle approximations share a common set of parameters, a novel form of neural operator is introduced, taking probability distributions as input: a measure neural mapping (MNM). An MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined in both the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. In practice, fine-tuning of a small number of parameters, for specific ensemble sizes, further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root-mean-square-error performance relative to leading methods in filtering the Lorenz '96 and Kuramoto-Sivashinsky models.
comment: Accepted by the Journal of Computational Physics
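The Gaussian-ansatz baseline that the learned mapping generalizes is the standard ensemble Kalman analysis step; a minimal stochastic-EnKF update with perturbed observations and a linear observation operator is sketched below (dimensions and operators are illustrative).

```python
# Standard (stochastic) ensemble Kalman analysis step: the Gaussian-ansatz
# update that the learned measure-neural-mapping filter generalizes.
import numpy as np

rng = np.random.default_rng(0)
d, m, N = 8, 3, 40                      # state dim, obs dim, ensemble size
H = rng.normal(size=(m, d))             # linear observation operator
R = 0.1 * np.eye(m)                     # observation-noise covariance

X = rng.normal(size=(d, N))             # forecast ensemble (columns = members)
y = rng.normal(size=m)                  # observation

A = X - X.mean(axis=1, keepdims=True)   # state anomalies
Y = H @ X
B = Y - Y.mean(axis=1, keepdims=True)   # predicted-observation anomalies

# Ensemble Kalman gain K = C_xy (C_yy + R)^{-1} built from sample covariances.
K = (A @ B.T) @ np.linalg.inv(B @ B.T + (N - 1) * R)
eps = rng.multivariate_normal(np.zeros(m), R, size=N).T   # perturbed observations
X_analysis = X + K @ (y[:, None] + eps - H @ X)

print("analysis ensemble mean:", X_analysis.mean(axis=1))
```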
Safe Online Control-Informed Learning
This paper proposes a Safe Online Control-Informed Learning framework for safety-critical autonomous systems. The framework unifies optimal control, parameter estimation, and safety constraints into an online learning process. It employs an extended Kalman filter to incrementally update system parameters in real time, enabling robust and data-efficient adaptation under uncertainty. A softplus barrier function enforces constraint satisfaction during learning and control while eliminating the dependence on high-quality initial guesses. Theoretical analysis establishes convergence and safety guarantees, and the framework's effectiveness is demonstrated on cart-pole and robot-arm systems.
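The softplus barrier ingredient, a smooth and everywhere-finite penalty that approximates a constraint indicator and so removes the need for a strictly feasible initial guess, can be written in a few lines; the sharpness parameter below is a generic choice, not necessarily the paper's.

```python
# Softplus barrier: a smooth, finite penalty for a constraint g(x) <= 0.
# Unlike a log-barrier it stays defined at infeasible points, so the
# optimization does not require a strictly feasible initial guess.
import numpy as np

def softplus_barrier(g_val, beta=10.0):
    """Penalty ~0 when g_val << 0 and ~linear (slope 1) when g_val > 0."""
    # log(1 + exp(beta*g)) / beta, written stably for large arguments.
    z = beta * np.asarray(g_val)
    return (np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))) / beta

g = np.array([-1.0, -0.1, 0.0, 0.1, 1.0])    # sample constraint values g(x)
print(softplus_barrier(g))                   # penalty grows smoothly through g = 0
```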
Gait Transitions in Load-Pulling Quadrupeds: Insights from Sled Dogs and a Minimal SLIP Model
Quadrupedal animals employ diverse galloping strategies to optimize speed, stability, and energy efficiency. However, the biomechanical mechanisms that enable adaptive gait transitions during high-speed locomotion under load remain poorly understood. In this study, we present new empirical and modeling insights into the biomechanics of load-pulling quadrupeds, using sprint sled dogs as a model system. High-speed video and force recordings reveal that sled dogs often switch between rotary and transverse galloping gaits within just a few strides and without any observable changes in speed, stride duration, or terrain, providing clear evidence of locomotor multistability during high-speed load-pulling. To investigate the mechanical basis of these transitions, we develop a physics-based quadrupedal Spring-Loaded Inverted Pendulum model with hybrid dynamics and prescribed footfall sequences to reproduce the asymmetric galloping patterns observed in racing sled dogs. Through trajectory optimization, we replicate experimentally observed gait sequences and identify swing-leg stiffness modulation as a key control mechanism for inducing transitions. This work provides a much-needed biomechanical perspective on high-speed animal draft and establishes a modeling framework for studying locomotion in pulling quadrupeds, with implications for both biological understanding and the design of adaptive legged systems.
Network-Realised Model Predictive Control Part II: Distributed Constraint Management
A two-layer control architecture is proposed, which promotes scalable implementations for model predictive controllers. The top layer acts as both reference governor for the bottom layer, and as a feedback controller for the regulated network. By employing set-based methods, global theoretical guarantees are obtained by enforcing local constraints upon the network's variables and upon those of the first layer's implementation. The proposed technique offers recursive feasibility guarantees as one of its central features, and the expressions of the resulting predictive strategies bear a striking resemblance to classical formulations from model predictive control literature, allowing for flexible and easily customisable implementations.
comment: 20 pages, 9 figures, 4 tables
Demonstrating Integrative, Scalable and Extensible Modeling of Hydrological Systems with Model-Based Systems Engineering and Hetero-functional Graph Theory
Worsening global challenges demand solutions grounded in a systems-level understanding of coupled social and environmental dynamics. Existing environmental models encode extensive knowledge of individual systems, yet much of this information remains isolated within domain-specific formulations and data structures. This paper introduces a unified modeling framework that formalizes information from existing process models by asserting real-world physical relationships onto their underlying mathematical representations. By integrating Model-Based Systems Engineering (MBSE) with Hetero-functional Graph Theory (HFGT), the framework establishes a consistent ontology that explicitly defines system structure and behavior. Illustrative hydrological examples demonstrate implementation of the methodology, showing how relationships embedded in conventional process models can be made explicit and scalable. While simplified, these examples provide a first step toward applying the approach to complex environmental systems. More broadly, the methodology offers a foundation for future modeling of systems of systems within a shared computational architecture.
Modeling skier flows via Wardrop equilibrium in closed capacitated networks
We propose an equilibrium model of ski resorts where users are assigned to cycles in a closed network. As queues form on lifts with limited capacity, we derive an efficient way to find waiting times via convex optimization. The equilibrium problem is formulated as a variational inequality, and numerical experiments show that it can be solved using standard algorithms.
comment: Corrected the statement about the uniqueness of waiting times: lift waiting times are not necessarily unique, but cycle waiting times are
Robotics
Zero-shot Reconstruction of In-Scene Object Manipulation from Video
We build the first system to address the problem of reconstructing in-scene object manipulation from a monocular RGB video. It is challenging due to ill-posed scene reconstruction, ambiguous hand-object depth, and the need for physically plausible interactions. Existing methods operate in hand-centric coordinates and ignore the scene, hindering metric accuracy and practical use. In our method, we first use data-driven foundation models to initialize the core components, including the object mesh and poses, the scene point cloud, and the hand poses. We then apply a two-stage optimization that recovers a complete hand-object motion from grasping to interaction, which remains consistent with the scene information observed in the input video.
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry
Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across perception, localization, mapping, and planning modules. Recent end-to-end learning methods map raw visual observations directly to control signals or trajectories, promising greater performance and efficiency in open-world settings. However, most prior end-to-end approaches still rely on separate localization modules that depend on accurate sensor extrinsic calibration for self-state estimation, thereby limiting generalization across embodiments and environments. We introduce LoGoPlanner, a localization-grounded, end-to-end navigation framework that addresses these limitations by: (1) finetuning a long-horizon visual-geometry backbone to ground predictions with absolute metric scale, thereby providing implicit state estimation for accurate localization; (2) reconstructing surrounding scene geometry from historical observations to supply dense, fine-grained environmental awareness for reliable obstacle avoidance; and (3) conditioning the policy on implicit geometry bootstrapped by the aforementioned auxiliary tasks, thereby reducing error propagation. We evaluate LoGoPlanner in both simulation and real-world settings, where its fully end-to-end design reduces cumulative error while metric-aware geometry memory enhances planning consistency and obstacle avoidance, leading to more than a 27.3% improvement over oracle-localization baselines and strong generalization across embodiments and environments. The code and models have been made publicly available on the project page: https://steinate.github.io/logoplanner.github.io/.
comment: Project page:https://steinate.github.io/logoplanner.github.io/
Learning Generalizable Hand-Object Tracking from Synthetic Demonstrations
We present a system for learning generalizable hand-object tracking controllers purely from synthetic data, without requiring any human demonstrations. Our approach makes two key contributions: (1) HOP, a Hand-Object Planner, which can synthesize diverse hand-object trajectories; and (2) HOT, a Hand-Object Tracker that bridges synthetic-to-physical transfer through reinforcement learning and interaction imitation learning, delivering a generalizable controller conditioned on target hand-object states. Our method extends to diverse object shapes and hand morphologies. Through extensive evaluations, we show that our approach enables dexterous hands to track challenging, long-horizon sequences including object re-arrangement and agile in-hand reorientation. These results represent a significant step toward scalable foundation controllers for manipulation that can learn entirely from synthetic data, breaking the data bottleneck that has long constrained progress in dexterous manipulation.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository https://github.com/kdjebko/lelar-in-orbit-data
LIMOncello: Revisited IKFoM on the SGal(3) Manifold for Fast LiDAR-Inertial Odometry
This work introduces LIMOncello, a tightly coupled LiDAR-Inertial Odometry system that models 6-DoF motion on the $\mathrm{SGal}(3)$ manifold within an iterated error-state Kalman filter backend. Compared to state representations defined on $\mathrm{SO}(3)\times\mathbb{R}^6$, the use of $\mathrm{SGal}(3)$ provides a coherent and numerically stable discrete-time propagation model that helps limit drift in low-observability conditions. LIMOncello also includes a lightweight incremental i-Octree mapping backend that enables faster updates and substantially lower memory usage than incremental kd-tree style map structures, without relying on locality-restricted search heuristics. Experiments on multiple real-world datasets show that LIMOncello achieves competitive accuracy while improving robustness in geometrically sparse environments. The system maintains real-time performance with stable memory growth and is released as an extensible open-source implementation at https://github.com/CPerezRuiz335/LIMOncello.
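For reference, SGal(3) elements are commonly represented by 5x5 homogeneous matrices collecting attitude, velocity, position, and time, as recalled below; the filter's specific error-state parametrization is described in the paper.

```latex
% Common 5x5 homogeneous-matrix representation of an SGal(3) element
% (attitude R, velocity v, position p, time t).
\[
X \;=\;
\begin{bmatrix}
R & v & p \\
0_{1\times 3} & 1 & t \\
0_{1\times 3} & 0 & 1
\end{bmatrix},
\qquad
R \in \mathrm{SO}(3), \quad v,\, p \in \mathbb{R}^{3}, \quad t \in \mathbb{R}.
\]
```

Group composition is then ordinary matrix multiplication, with the time entry coupling velocity into the position update, which is what distinguishes this state from the decoupled $\mathrm{SO}(3)\times\mathbb{R}^6$ representation.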
Results of the 2024 CommonRoad Motion Planning Competition for Autonomous Vehicles
Over the past decade, a wide range of motion planning approaches for autonomous vehicles has been developed to handle increasingly complex traffic scenarios. However, these approaches are rarely compared on standardized benchmarks, limiting the assessment of relative strengths and weaknesses. To address this gap, we present the setup and results of the 4th CommonRoad Motion Planning Competition held in 2024, conducted using the CommonRoad benchmark suite. This annual competition provides an open-source and reproducible framework for benchmarking motion planning algorithms. The benchmark scenarios span highway and urban environments with diverse traffic participants, including passenger cars, buses, and bicycles. Planner performance is evaluated along four dimensions: efficiency, safety, comfort, and compliance with selected traffic rules. This report introduces the competition format and provides a comparison of representative high-performing planners from the 2023 and 2024 editions.
REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation
Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real-world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π_{0}, π_{0}-FAST, and GR00T N1.5 VLA models, showing that generalization and robustness remain an open challenge. More broadly, we also show that simulation gives us a valuable proxy for the real-world and allows us to systematically probe for and quantify the weaknesses and failure modes of VLAs. Project page: https://martin-sedlacek.com/realm
comment: 9 pages, 10 figures
MaP-AVR: A Meta-Action Planner for Agents Leveraging Vision Language Models and Retrieval-Augmented Generation
Embodied robotic AI systems designed to manage complex daily tasks rely on a task planner to understand and decompose high-level tasks. While most research focuses on enhancing the task-understanding abilities of LLMs/VLMs through fine-tuning or chain-of-thought prompting, this paper argues that defining the planned skill set is equally crucial. To handle the complexity of daily environments, the skill set should possess a high degree of generalization ability. Empirically, more abstract expressions tend to be more generalizable. Therefore, we propose to abstract the planned result as a set of meta-actions. Each meta-action comprises three components: {move/rotate, end-effector status change, relationship with the environment}. This abstraction replaces human-centric concepts, such as grasping or pushing, with the robot's intrinsic functionalities. As a result, the planned outcomes align seamlessly with the complete range of actions that the robot is capable of performing. Furthermore, to ensure that the LLM/VLM accurately produces the desired meta-action format, we employ the Retrieval-Augmented Generation (RAG) technique, which leverages a database of human-annotated planning demonstrations to facilitate in-context learning. As the system successfully completes more tasks, the database will self-augment to continue supporting diversity. The meta-action set and its integration with RAG are two novel contributions of our planner, denoted as MaP-AVR, the meta-action planner for agents composed of VLM and RAG. To validate its efficacy, we design experiments using GPT-4o as the pre-trained LLM/VLM model and OmniGibson as our robotic platform. Our approach demonstrates promising performance compared to the current state-of-the-art method. Project page: https://map-avr.github.io/.
comment: 8 pages, 10 figures, This work was completed in December 2024
Sign Language Recognition using Parallel Bidirectional Reservoir Computing
Sign language recognition (SLR) facilitates communication between deaf and hearing communities. Deep learning based SLR models are commonly used but require extensive computational resources, making them unsuitable for deployment on edge devices. To address these limitations, we propose a lightweight SLR system that combines parallel bidirectional reservoir computing (PBRC) with MediaPipe. MediaPipe enables real-time hand tracking and precise extraction of hand joint coordinates, which serve as input features for the PBRC architecture. The proposed PBRC architecture consists of two echo state network (ESN) based bidirectional reservoir computing (BRC) modules arranged in parallel to capture temporal dependencies, thereby creating a rich feature representation for classification. We trained our PBRC-based SLR system on the Word-Level American Sign Language (WLASL) video dataset, achieving top-1, top-5, and top-10 accuracies of 60.85%, 85.86%, and 91.74%, respectively. Training time was significantly reduced to 18.67 seconds due to the intrinsic properties of reservoir computing, compared to over 55 minutes for deep learning based methods such as Bi-GRU. This approach offers a lightweight, cost-effective solution for real-time SLR on edge devices.
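The reservoir itself is a fixed random recurrent network updated by a leaky tanh rule, with only a linear (ridge-regression) readout trained. The sketch below shows a single unidirectional echo state network on a toy one-step-ahead prediction task; the paper instead runs two bidirectional reservoirs in parallel on MediaPipe hand-keypoint sequences.

```python
# Minimal echo state network: a fixed random reservoir with a leaky tanh
# update, and ridge regression on the reservoir states as the only trained
# part. Toy one-dimensional signal for illustration.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, leak = 1, 200, 0.3

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))     # scale spectral radius to 0.9

def run_reservoir(u_seq):
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        pre = W_in @ np.atleast_1d(u) + W @ x
        x = (1 - leak) * x + leak * np.tanh(pre)    # leaky reservoir update
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave.
t = np.arange(0, 60, 0.1)
u = np.sin(t)
X = run_reservoir(u[:-1])
y = u[1:]
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)   # ridge readout
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```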
Mixed formulation and structure-preserving discretization of Cosserat rod dynamics in a port-Hamiltonian framework
An energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations is proposed. The mixed formulation features independent displacement, velocity and stress variables and is further objective and locking-free. Finite rotations are represented using a director formulation that avoids singularities and yields a constant mass matrix. This results in an infinite-dimensional nonlinear port-Hamiltonian (PH) system governed by partial differential-algebraic equations with a quadratic energy functional. Using a time-differentiated compliance form of the stress-strain relations allows for the imposition of kinematic constraints, such as inextensibility or shear-rigidity. A structure-preserving finite element discretization leads to a finite-dimensional system with PH structure, thus facilitating the design of an energy-momentum consistent integration scheme. Dissipative material behavior (via the generalized-Maxwell model) and non-standard actuation approaches (via pneumatic chambers or tendons) integrate naturally into the framework. As illustrated by selected numerical examples, the present framework establishes a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.
comment: 37 pages, 16 figures, currently under review
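The port-Hamiltonian structure referred to above has, in its finite-dimensional input-state-output form with algebraic constraints, the generic shape recalled below; the rod-specific operators and Hamiltonian are derived in the paper.

```latex
% Generic input-state-output port-Hamiltonian system with algebraic
% constraints; the rod-specific J, R, B, G and Hamiltonian H are derived
% in the paper.
\[
\dot{x} = \bigl(J(x) - R(x)\bigr)\nabla H(x) + B(x)\,u + G(x)\,\lambda,
\qquad
0 = G(x)^{\top}\nabla H(x),
\qquad
y = B(x)^{\top}\nabla H(x),
\]
\[
J = -J^{\top}, \qquad R = R^{\top} \succeq 0
\;\;\Longrightarrow\;\;
\dot{H} = -\nabla H^{\top} R\,\nabla H + u^{\top} y \;\le\; u^{\top} y .
\]
```

The power balance in the last line is what a structure-preserving discretization and an energy-momentum consistent integrator aim to retain at the discrete level.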
Real2Edit2Real: Generating Robotic Demonstrations via a 3D Control Interface
Recent progress in robot learning has been driven by large-scale datasets and powerful visuomotor policy architectures, yet policy robustness remains limited by the substantial cost of collecting diverse demonstrations, particularly for spatial generalization in manipulation tasks. To reduce repetitive data collection, we present Real2Edit2Real, a framework that generates new demonstrations by bridging 3D editability with 2D visual data through a 3D control interface. Our approach first reconstructs scene geometry from multi-view RGB observations with a metric-scale 3D reconstruction model. Based on the reconstructed geometry, we perform depth-reliable 3D editing on point clouds to generate new manipulation trajectories while geometrically correcting the robot poses to recover physically consistent depth, which serves as a reliable condition for synthesizing new demonstrations. Finally, we propose a multi-conditional video generation model guided by depth as the primary control signal, together with action, edge, and ray maps, to synthesize spatially augmented multi-view manipulation videos. Experiments on four real-world manipulation tasks demonstrate that policies trained on data generated from only 1-5 source demonstrations can match or outperform those trained on 50 real-world demonstrations, improving data efficiency by up to 10-50x. Moreover, experimental results on height and texture editing demonstrate the framework's flexibility and extensibility, indicating its potential to serve as a unified data generation framework.
TwinAligner: Visual-Dynamic Alignment Empowers Physics-aware Real2Sim2Real for Robotic Manipulation
The robotics field is evolving towards data-driven, end-to-end learning, inspired by multimodal large models. However, reliance on expensive real-world data limits progress. Simulators offer cost-effective alternatives, but the gap between simulation and reality challenges effective policy transfer. This paper introduces TwinAligner, a novel Real2Sim2Real system that addresses both visual and dynamic gaps. The visual alignment module achieves pixel-level alignment through SDF reconstruction and editable 3DGS rendering, while the dynamic alignment module ensures dynamic consistency by identifying rigid physics from robot-object interaction. TwinAligner improves robot learning by providing scalable data collection and establishing a trustworthy iterative cycle, accelerating algorithm development. Quantitative evaluations highlight TwinAligner's strong capabilities in visual and dynamic real-to-sim alignment. This system enables policies trained in simulation to achieve strong zero-shot generalization to the real world. The high consistency between real-world and simulated policy performance underscores TwinAligner's potential to advance scalable robot learning. Code and data will be released on https://twin-aligner.github.io
OMP: One-step Meanflow Policy with Directional Alignment
Robot manipulation, a key capability of embodied AI, has turned to data-driven generative policy frameworks, but mainstream approaches like Diffusion Models suffer from high inference latency and Flow-based Methods from increased architectural complexity. While simply applying MeanFlow to robotic tasks achieves single-step inference and outperforms FlowPolicy, it lacks few-shot generalization due to fixed temperature hyperparameters in its Dispersive Loss and misaligned predicted-true mean velocities. To solve these issues, this study proposes an improved MeanFlow-based policy: we introduce a lightweight Cosine Loss to align velocity directions and use the Differential Derivation Equation (DDE) to optimize the Jacobian-Vector Product (JVP) operator. Experiments on Adroit and Meta-World tasks show the proposed method outperforms MP1 and FlowPolicy in average success rate, especially in challenging Meta-World tasks, effectively enhancing few-shot generalization and trajectory accuracy of robot manipulation policies while maintaining real-time performance, offering a more robust solution for high-precision robotic manipulation.
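The directional-alignment term is essentially one minus the cosine similarity between predicted and target mean velocities, averaged over a batch, as in the minimal sketch below (shapes and names are illustrative; the full objective adds this to the MeanFlow regression loss).

```python
# Directional-alignment term: 1 - cosine similarity between predicted and
# target mean-velocity vectors, averaged over a batch. Shapes and names are
# illustrative stand-ins for the paper's training pipeline.
import torch

def cosine_alignment_loss(v_pred, v_true, eps=1e-8):
    """v_pred, v_true: (batch, action_dim) predicted / target velocities."""
    cos = torch.nn.functional.cosine_similarity(v_pred, v_true, dim=-1, eps=eps)
    return (1.0 - cos).mean()

v_pred = torch.randn(32, 7, requires_grad=True)   # toy predicted mean velocities
v_true = torch.randn(32, 7)                       # toy targets
loss = cosine_alignment_loss(v_pred, v_true)
loss.backward()                                   # differentiable, so it can be added to the main loss
print(float(loss))
```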
Comparison and Evaluation of Different Simulation Environments for Rigid Body Systems
Rigid body dynamics simulators are important tools for the design, analysis and optimization of mechanical systems in a variety of technical and scientific applications. This study examines four different simulation environments (Adams, Simscape, OpenModelica, and VEROSIM), focusing in particular on the comparison of the modeling methods, the numerical solvers, and the treatment of numerical problems that arise especially in closed-loop kinematics (esp. redundant boundary conditions and static equilibrium problem). A novel and complex crane boom of a real forestry machine serves as a practical benchmark application example. The direct comparison of the different solution approaches in the examined simulation tools supports the user in selecting the most suitable tool for his application.
comment: Accepted at the 10th MHI-Fachkolloquium
Are All Data Necessary? Efficient Data Pruning for Large-scale Autonomous Driving Dataset via Trajectory Entropy Maximization
Collecting large-scale naturalistic driving data is essential for training robust autonomous driving planners. However, real-world datasets often contain a substantial amount of repetitive and low-value samples, which lead to excessive storage costs and bring limited benefits to policy learning. To address this issue, we propose an information-theoretic data pruning method that effectively reduces the training data volume without compromising model performance. Our approach evaluates the trajectory distribution information entropy of driving data and iteratively selects high-value samples that preserve the statistical characteristics of the original dataset in a model-agnostic manner. From a theoretical perspective, we show that maximizing trajectory entropy effectively constrains the Kullback-Leibler divergence between the pruned subset and the original data distribution, thereby maintaining generalization ability. Comprehensive experiments on the NuPlan benchmark with a large-scale imitation learning framework demonstrate that the proposed method can reduce the dataset size by up to 40% while maintaining closed-loop performance. This work provides a lightweight and theoretically grounded approach for scalable data management and efficient policy learning in autonomous driving systems.
comment: 7 pages, 4 figures
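The selection principle, greedily keep the samples that most increase the entropy of the selected subset's trajectory distribution over some discretization, can be sketched with a simple histogram entropy as below; the binning and the toy "trajectory feature" are assumptions of this sketch, not the paper's exact scheme.

```python
# Greedy entropy-maximizing subset selection: discretize each trajectory into
# a bin (here, by final displacement), then repeatedly add a sample from the
# bin that most increases the Shannon entropy of the selected histogram.
import numpy as np

rng = np.random.default_rng(0)
n, keep = 1000, 600
# Toy "trajectory feature": final (x, y) displacement, skewed toward straight
# driving to mimic redundant real-world logs.
straight = rng.normal([10.0, 0.0], 0.5, (n, 2))
diverse = rng.normal([5.0, 5.0], 3.0, (n, 2))
feat = np.where(rng.random((n, 1)) < 0.8, straight, diverse)

labels = np.unique(np.floor(feat / 2.0), axis=0, return_inverse=True)[1].ravel()
n_bins = int(labels.max()) + 1
available = np.bincount(labels, minlength=n_bins).astype(float)  # samples left per bin

def entropy(c):
    p = c[c > 0] / c.sum()
    return -(p * np.log(p)).sum()

selected_counts = np.zeros(n_bins)
for _ in range(keep):
    best, best_gain = -1, -np.inf
    for b in range(n_bins):
        if available[b] == 0:
            continue
        trial = selected_counts.copy()
        trial[b] += 1
        gain = entropy(trial)
        if gain > best_gain:
            best, best_gain = b, gain
    selected_counts[best] += 1
    available[best] -= 1

print(f"kept {int(selected_counts.sum())}/{n} samples, "
      f"subset entropy {entropy(selected_counts):.3f} vs "
      f"full-set entropy {entropy(np.bincount(labels).astype(float)):.3f}")
```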
Translating Flow to Policy via Hindsight Online Imitation
Recent advances in hierarchical robot systems leverage a high-level planner to propose task plans and a low-level policy to generate robot actions. This design allows training the planner on action-free or even non-robot data sources (e.g., videos), providing transferable high-level guidance. Nevertheless, grounding these high-level plans into executable actions remains challenging, especially with the limited availability of high-quality robot data. To this end, we propose to improve the low-level policy through online interactions. Specifically, our approach collects online rollouts, retrospectively annotates the corresponding high-level goals from achieved outcomes, and aggregates these hindsight-relabeled experiences to update a goal-conditioned imitation policy. Our method, Hindsight Flow-conditioned Online Imitation (HinFlow), instantiates this idea with 2D point flows as the high-level planner. Across diverse manipulation tasks in both simulation and physical world, our method achieves more than $2\times$ performance improvement over the base policy, significantly outperforming the existing methods. Moreover, our framework enables policy acquisition from planners trained on cross-embodiment video data, demonstrating its potential for scalable and transferable robot learning.
Vision-Aided Relative State Estimation for Approach and Landing on a Moving Platform with Inertial Measurements
This paper tackles the problem of estimating the relative position, orientation, and velocity between a UAV and a planar platform undergoing arbitrary 3D motion during approach and landing. The estimation relies on measurements from Inertial Measurement Units (IMUs) mounted on both systems, assuming there is a suitable communication channel to exchange data, together with visual information provided by an onboard monocular camera, from which the bearing (line-of-sight direction) to the platform's center and the normal vector of its planar surface are extracted. We propose a cascade observer with a complementary filter on SO(3) to reconstruct the relative attitude, followed by a linear Riccati observer for relative position and velocity estimation. Convergence of both observers is established under persistently exciting conditions, and the cascade is shown to be almost globally asymptotically and locally exponentially stable. We further extend the design to the case where the platform's rotation is restricted to its normal axis and show that its measured linear acceleration can be exploited to recover the remaining unobservable rotation angle. A sufficient condition to ensure local exponential convergence in this setting is provided. The performance of the proposed observers is validated through extensive simulations.
comment: 13 pages, 4 figures. Submitted to IFAC World Congress 2026
Vision-Language-Policy Model for Dynamic Robot Task Planning
Bridging the gap between natural language commands and autonomous execution in unstructured environments remains an open challenge for robotics. This requires robots to perceive and reason over the current task scene through multiple modalities, and to plan their behaviors to achieve their intended goals. Traditional robotic task-planning approaches often struggle to bridge low-level execution with high-level task reasoning, and cannot dynamically update task strategies when instructions change during execution, which ultimately limits their versatility and adaptability to new tasks. In this work, we propose a novel language model-based framework for dynamic robot task planning. Our Vision-Language-Policy (VLP) model, based on a vision-language model fine-tuned on real-world data, can interpret semantic instructions and integrate reasoning over the current task scene to generate behavior policies that control the robot to accomplish the task. Moreover, it can dynamically adjust the task strategy in response to changes in the task, enabling flexible adaptation to evolving task requirements. Experiments conducted with different robots and a variety of real-world tasks show that the trained model can efficiently adapt to novel scenarios and dynamically update its policy, demonstrating strong planning autonomy and cross-embodiment generalization. Videos: https://robovlp.github.io/
comment: Manuscript under review
A Flexible Field-Based Policy Learning Framework for Diverse Robotic Systems and Sensors
We present a cross-robot visuomotor learning framework that integrates diffusion-policy-based control with 3D semantic scene representations from D3Fields to enable category-level generalization in manipulation. Its modular design supports diverse robot-camera configurations, including UR5 arms with Microsoft Azure Kinect arrays and bimanual manipulators with Intel RealSense sensors, through a low-latency control stack and intuitive teleoperation. A unified configuration layer enables seamless switching between setups for flexible data collection, training, and evaluation. In a grasp-and-lift block task, the framework achieved an 80 percent success rate after only 100 demonstration episodes, demonstrating robust skill transfer between platforms and sensing modalities. This design paves the way for scalable real-world studies in cross-robot generalization.
comment: 6 pages, 7 figures, conference: SII 2026. Cancun, Mexico
WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving AAAI 2026
Latent World Models enhance scene representation through temporal self-supervised learning, presenting a perception annotation-free paradigm for end-to-end autonomous driving. However, the reconstruction-oriented representation learning tangles perception with planning tasks, leading to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning via a hierarchical planning decomposition and local-aware interactive refinement mechanism, augmented by reinforcement learning fine-tuning (RFT) to enhance safety-critical policy performance. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and utilizes local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we introduce Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both open-loop nuScenes and closed-loop NavSim benchmarks. On nuScenes, it reduces collision rates by 83% (0.30% -> 0.05%). On NavSim, using camera-only sensors input, it attains competitive performance with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
comment: AAAI 2026, first version
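The group-relative part of GRPO amounts to standardizing each sampled trajectory's reward against the group it was sampled with; the sketch below shows that advantage computation with a toy collision-aware reward (the trajectory Gaussianization and the exact reward terms are paper-specific).

```python
# Group Relative Policy Optimization, advantage step: sample a group of
# candidate trajectories per scene, score each with a reward, and standardize
# rewards within the group to obtain advantages (no learned critic). The
# reward below is a toy stand-in for the paper's collision-aware reward.
import numpy as np

rng = np.random.default_rng(0)
group_size, horizon = 8, 6

# A group of candidate planned trajectories for one scene: (group, T, 2) xy points.
trajs = np.cumsum(rng.normal(0.0, 0.5, (group_size, horizon, 2)), axis=1)
obstacle, radius = np.array([1.0, 1.0]), 0.8

def reward(traj):
    progress = traj[-1, 0]                                   # forward-progress proxy
    clearance = np.linalg.norm(traj - obstacle, axis=-1).min()
    collision_penalty = 5.0 if clearance < radius else 0.0
    return progress - collision_penalty

r = np.array([reward(t) for t in trajs])
adv = (r - r.mean()) / (r.std() + 1e-8)      # group-relative advantages
print("rewards:", np.round(r, 2))
print("advantages:", np.round(adv, 2))
```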
CoDrone: Autonomous Drone Navigation Assisted by Edge and Cloud Foundation Models
Autonomous navigation for Unmanned Aerial Vehicles faces key challenges from limited onboard computational resources, which restrict deployed deep neural networks to shallow architectures incapable of handling complex environments. Offloading tasks to remote edge servers introduces high latency, creating an inherent trade-off in system design. To address these limitations, we propose CoDrone - the first cloud-edge-end collaborative computing framework integrating foundation models into autonomous UAV cruising scenarios - effectively leveraging foundation models to enhance performance of resource-constrained unmanned aerial vehicle platforms. To reduce onboard computation and data transmission overhead, CoDrone employs grayscale imagery for the navigation model. When enhanced environmental perception is required, CoDrone leverages the edge-assisted foundation model Depth Anything V2 for depth estimation and introduces a novel one-dimensional occupancy grid-based navigation method - enabling fine-grained scene understanding while advancing efficiency and representational simplicity of autonomous navigation. A key component of CoDrone is a Deep Reinforcement Learning-based neural scheduler that seamlessly integrates depth estimation with autonomous navigation decisions, enabling real-time adaptation to dynamic environments. Furthermore, the framework introduces a UAV-specific vision language interaction module incorporating domain-tailored low-level flight primitives to enable effective interaction between the cloud foundation model and the UAV. The introduction of VLM enhances open-set reasoning capabilities in complex unseen scenarios. Experimental results show CoDrone outperforms baseline methods under varying flight speeds and network conditions, achieving a 40% increase in average flight distance and a 5% improvement in average Quality of Navigation.
comment: This paper is accepted by the IEEE Internet of Things Journal (IoT-J) for publication in the Special Issue on "Augmented Edge Sensing Intelligence for Low-Altitude IoT Systems"
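The one-dimensional occupancy idea can be sketched by collapsing a depth image into a per-bearing minimum range, thresholding it, and steering toward the widest free gap, as below; the thresholds and the steering rule are illustrative choices, not CoDrone's exact method.

```python
# Collapse a depth image into a 1-D occupancy vector: one cell per image
# column (bearing), marked occupied if the nearest depth in that column is
# below a clearance threshold, then steer toward the widest free gap.
import numpy as np

rng = np.random.default_rng(0)
depth = rng.uniform(3.0, 10.0, (48, 64))      # toy metric depth map (m), H x W
depth[:, 20:30] = 1.5                         # an obstacle ahead-left

clearance = 2.5                               # minimum safe range (m)
column_min = depth.min(axis=0)                # nearest return per bearing
occupancy = (column_min < clearance).astype(int)   # 1-D occupancy grid

# Find the widest run of free cells and aim at its centre column.
free = np.flatnonzero(occupancy == 0)
runs = np.split(free, np.where(np.diff(free) > 1)[0] + 1)
gap = max(runs, key=len)
target_col = int(gap.mean())
print("occupancy:", occupancy)
print(f"steer toward column {target_col} of {depth.shape[1]}")
```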
EGM: Efficiently Learning General Motion Tracking Policy for High Dynamic Humanoid Whole-Body Control
Learning a general motion tracking policy from human motions shows great potential for versatile humanoid whole-body control. Conventional approaches are not only inefficient in data utilization and training processes but also exhibit limited performance when tracking highly dynamic motions. To address these challenges, we propose EGM, a framework that enables efficient learning of a general motion tracking policy. EGM integrates four core designs. Firstly, we introduce a Bin-based Cross-motion Curriculum Adaptive Sampling strategy to dynamically orchestrate the sampling probabilities based on the tracking error of each motion bin, efficiently balancing the training process across motions with varying difficulty and durations. The sampled data is then processed by our proposed Composite Decoupled Mixture-of-Experts (CDMoE) architecture, which efficiently enhances the ability to track motions from different distributions by grouping experts separately for the upper and lower body and decoupling orthogonal experts from shared experts to separately handle dedicated features and general features. Central to our approach is a key insight we identified: for training a general motion tracking policy, data quality and diversity are paramount. Building on these designs, we develop a three-stage curriculum training flow to progressively enhance the policy's robustness against disturbances. Despite training on only 4.08 hours of data, EGM generalized robustly across 49.25 hours of test motions, outperforming baselines on both routine and highly dynamic tasks.
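The adaptive-sampling ingredient can be sketched as a softmax over per-bin tracking errors, so that harder motion bins are drawn more often, as below; the temperature and error values are illustrative, not the paper's schedule.

```python
# Bin-based adaptive sampling sketch: motions are grouped into bins, each bin
# keeps a running tracking error, and training batches sample bins with
# probability given by a softmax over those errors, so harder motions are
# revisited more often. Values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_bins, temperature = 6, 0.5
tracking_error = np.array([0.05, 0.08, 0.12, 0.30, 0.45, 0.10])   # running error per bin

logits = tracking_error / temperature
probs = np.exp(logits - logits.max())      # numerically stable softmax
probs /= probs.sum()

batch_bins = rng.choice(n_bins, size=1000, p=probs)
print("sampling probabilities:", np.round(probs, 3))
print("draws per bin:", np.bincount(batch_bins, minlength=n_bins))
```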
IndoorUAV: Benchmarking Vision-Language UAV Navigation in Continuous Indoor Environments
Vision-Language Navigation (VLN) enables agents to navigate in complex environments by following natural language instructions grounded in visual observations. Although most existing work has focused on ground-based robots or outdoor Unmanned Aerial Vehicles (UAVs), indoor UAV-based VLN remains underexplored, despite its relevance to real-world applications such as inspection, delivery, and search-and-rescue in confined spaces. To bridge this gap, we introduce \textbf{IndoorUAV}, a novel benchmark and method specifically tailored for VLN with indoor UAVs. We begin by curating over 1,000 diverse and structurally rich 3D indoor scenes from the Habitat simulator. Within these environments, we simulate realistic UAV flight dynamics and manually collect diverse 3D navigation trajectories, which are further enriched through data augmentation techniques. Furthermore, we design an automated annotation pipeline to generate natural language instructions of varying granularity for each trajectory. This process yields over 16,000 high-quality trajectories, comprising the \textbf{IndoorUAV-VLN} subset, which focuses on long-horizon VLN. To support short-horizon planning, we segment long trajectories into sub-trajectories by selecting semantically salient keyframes and regenerating concise instructions, forming the \textbf{IndoorUAV-VLA} subset. Finally, we introduce \textbf{IndoorUAV-Agent}, a novel navigation model designed for our benchmark, leveraging task decomposition and multimodal reasoning. We hope IndoorUAV serves as a valuable resource to advance research on vision-language embodied AI in the indoor aerial navigation domain.
VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation
Despite remarkable progress in Vision-Language Navigation (VLN), existing benchmarks remain confined to fixed, small-scale datasets with naive physical simulation. These shortcomings limit the insight the benchmarks provide into sim-to-real generalization and create a significant research gap. Furthermore, task fragmentation prevents unified, shared progress in the area, while limited data scales fail to meet the demands of modern LLM-based pretraining. To overcome these limitations, we introduce VLNVerse: a new large-scale, extensible benchmark designed for Versatile, Embodied, Realistic Simulation and Evaluation. VLNVerse redefines VLN as a scalable, full-stack embodied AI problem. Its Versatile nature unifies previously fragmented tasks into a single framework and provides an extensible toolkit for researchers. Its Embodied design moves beyond intangible, teleporting "ghost" agents to support full-kinematics agents in a Realistic Simulation powered by a robust physics engine. We leverage the scale and diversity of VLNVerse to conduct a comprehensive Evaluation of existing methods, from classic models to MLLM-based agents. We also propose a novel unified multi-task model capable of addressing all tasks within the benchmark. VLNVerse aims to narrow the gap between simulated navigation and real-world generalization, providing the community with a vital tool to boost research towards scalable, general-purpose embodied locomotion agents.
PalpAid: Multimodal Pneumatic Tactile Sensor for Tissue Palpation
The tactile properties of tissue, such as elasticity and stiffness, often play an important role in surgical oncology when identifying tumors and pathological tissue boundaries. Though this tactile information is extremely valuable, robot-assisted surgery comes at the cost of reduced sensory information to the surgeon; typically, only vision is available. Sensors proposed to overcome this sensory desert are often bulky, complex, and incompatible with the surgical workflow. We present PalpAid, a multimodal pneumatic tactile sensor equipped with a microphone and pressure sensor that converts contact force into an internal pressure differential. The pressure sensor acts as an event detector, while the auditory signature captured by the microphone assists in tissue delineation. We show the design, fabrication, and assembly of the sensory units, with characterization tests demonstrating robustness to repeated use and inflation-deflation cycles, as well as integration with a robotic system. Finally, we show the sensor's ability to classify 3D-printed hard objects with varying infills and soft ex vivo tissues. Overall, PalpAid aims to fill the sensory gap intelligently and allow improved clinical decision-making.
DTCCL: Disengagement-Triggered Contrastive Continual Learning for Autonomous Bus Planners
Autonomous buses run on fixed routes but must operate in open, dynamic urban environments. Disengagement events on these routes are often geographically concentrated and typically arise from planner failures in highly interactive regions. Such policy-level failures are difficult to correct using conventional imitation learning, which easily overfits to sparse disengagement data. To address this issue, this paper presents a Disengagement-Triggered Contrastive Continual Learning (DTCCL) framework that enables autonomous buses to improve planning policies through real-world operation. Each disengagement triggers cloud-based data augmentation that generates positive and negative samples by perturbing surrounding agents while preserving route context. Contrastive learning refines policy representations to better distinguish safe and unsafe behaviors, and continual updates are applied in a cloud-edge loop without human supervision. Experiments on urban bus routes demonstrate that DTCCL improves overall planning performance by 48.6 percent compared with direct retraining, validating its effectiveness for scalable, closed-loop policy improvement in autonomous public transport.
Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation ICRA 2026
In this study, we address the problem of open-vocabulary mobile manipulation, where a robot is required to carry a wide range of objects to receptacles based on free-form natural language instructions. This task is challenging, as it involves understanding visual semantics and the affordance of manipulation actions. To tackle these challenges, we propose Affordance RAG, a zero-shot hierarchical multimodal retrieval framework that constructs Affordance-Aware Embodied Memory from pre-explored images. The model retrieves candidate targets based on regional and visual semantics and reranks them with affordance scores, allowing the robot to identify manipulation options that are likely to be executable in real-world environments. Our method outperformed existing approaches in retrieval performance for mobile manipulation instruction in large-scale indoor environments. Furthermore, in real-world experiments where the robot performed mobile manipulation in indoor environments based on free-form instructions, the proposed method achieved a task success rate of 85%, outperforming existing methods in both retrieval performance and overall task success.
comment: Accepted to IEEE RA-L, with presentation at ICRA 2026
A Framework for Deploying Learning-based Quadruped Loco-Manipulation
Quadruped mobile manipulators offer strong potential for agile loco-manipulation but remain difficult to control and transfer reliably from simulation to reality. Reinforcement learning (RL) shows promise for whole-body control, yet most frameworks are proprietary and hard to reproduce on real hardware. We present an open pipeline for training, benchmarking, and deploying RL-based controllers on the Unitree B1 quadruped with a Z1 arm. The framework unifies sim-to-sim and sim-to-real transfer through ROS, re-implementing a policy trained in Isaac Gym, extending it to MuJoCo via a hardware abstraction layer, and deploying the same controller on physical hardware. Sim-to-sim experiments expose discrepancies between Isaac Gym and MuJoCo contact models that influence policy behavior, while real-world teleoperated object-picking trials show that coordinated whole-body control extends reach and improves manipulation over floating-base baselines. The pipeline provides a transparent, reproducible foundation for developing and analyzing RL-based loco-manipulation controllers and will be released open source to support future research.
Point What You Mean: Visually Grounded Instruction Policy
Vision-Language-Action (VLA) models align vision and language with embodied control, but their object-referring ability remains limited when relying solely on text prompts, especially in cluttered or out-of-distribution (OOD) scenes. In this study, we introduce Point-VLA, a plug-and-play policy that augments language instructions with explicit visual cues (e.g., bounding boxes) to resolve referential ambiguity and enable precise object-level grounding. To efficiently scale visually grounded datasets, we further develop an automatic data annotation pipeline requiring minimal human effort. We evaluate Point-VLA on diverse real-world referring tasks and observe consistently stronger performance than text-only instruction VLAs, particularly in cluttered or unseen-object scenarios, with robust generalization. These results demonstrate that Point-VLA effectively resolves object-referring ambiguity through pixel-level visual grounding, achieving more generalizable embodied control.
A Time-efficient Prioritised Scheduling Algorithm to Optimise Initial Flock Formation of Drones
Drone applications continue to expand across various domains, with flocking offering enhanced cooperative capabilities but introducing significant challenges during initial formation. Existing flocking algorithms often struggle with efficiency and scalability, particularly when potential collisions force drones into suboptimal trajectories. This paper presents a time-efficient prioritised scheduling algorithm that improves the initial formation process of drone flocks. The method assigns each drone a priority based on its number of potential collisions and its likelihood of reaching its target position without permanently obstructing other drones. Using this hierarchy, each drone computes an appropriate delay to ensure a collision-free path. Simulation results show that the proposed algorithm successfully generates collision-free trajectories for flocks of up to 5000 drones and outperforms the coupling-degree-based heuristic prioritised planning method (CDH-PP) in both performance and computational efficiency.
comment: 35 pages
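A minimal sketch of the prioritised-scheduling idea, under the assumption that a drone's priority score combines its potential-collision count with a risk of blocking others and that start delays are assigned in priority order; the paper's actual scoring and delay computation are not reproduced here.

```python
import numpy as np

def assign_start_delays(collision_counts, blocking_risk, step_delay=1.0):
    """Assign start delays from a priority ranking (illustrative rule).

    Drones with more potential collisions and higher risk of blocking
    others are given higher priority (smaller delay); ties are broken by
    index order.
    """
    counts = np.asarray(collision_counts, dtype=float)
    risk = np.asarray(blocking_risk, dtype=float)
    score = counts + risk                 # higher score -> scheduled earlier
    order = np.argsort(-score, kind="stable")
    delays = np.empty(len(counts))
    delays[order] = np.arange(len(counts)) * step_delay
    return delays

if __name__ == "__main__":
    print(assign_start_delays(collision_counts=[3, 0, 1, 5],
                              blocking_risk=[0.2, 0.0, 0.9, 0.4]))
```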
Gaussian Variational Inference with Non-Gaussian Factors for State Estimation: A UWB Localization Case Study
This letter extends the exactly sparse Gaussian variational inference (ESGVI) algorithm for state estimation in two complementary directions. First, ESGVI is generalized to operate on matrix Lie groups, enabling the estimation of states with orientation components while respecting the underlying group structure. Second, factors are introduced to accommodate heavy-tailed and skewed noise distributions, as commonly encountered in ultra-wideband (UWB) localization due to non-line-of-sight (NLOS) and multipath effects. Both extensions are shown to integrate naturally within the ESGVI framework while preserving its sparse and derivative-free structure. The proposed approach is validated in a UWB localization experiment with NLOS-rich measurements, demonstrating improved accuracy and comparable consistency. Finally, a Python implementation within a factor-graph-based estimation framework is made open-source (https://github.com/decargroup/gvi_ws) to support broader research use.
A Class of Axis-Angle Attitude Control Laws for Rotational Systems
We introduce a new class of attitude control laws for rotational systems, which generalizes the use of the Euler axis-angle representation beyond quaternion-based formulations. Using basic Lyapunov stability theory and the notion of extended $K_{\infty}$ functions, we develop a method for determining and enforcing the global asymptotic stability of the single fixed point of the resulting closed-loop (CL) scheme. In contrast with traditional quaternion-based methods, the proposed generalized axis-angle approach enables greater flexibility in the design of the control law, which is of great utility when employed in combination with a switching scheme whose transition state depends on the angular velocity of the controlled rotational system. Through simulation and real-time experimental results, we demonstrate the effectiveness of the proposed approach. According to the recorded data, in the execution of high-speed tumble-recovery maneuvers, the new method consistently achieves shorter stabilization times and requires lower control effort than the quaternion-based and geometric-control methods used as benchmarks.
comment: 6 pages, 4 figures. Accepted for publication in IEEE Control Systems Letters
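For orientation, the sketch below shows a conventional axis-angle PD-style attitude law (torque along the Euler axis scaled by the error angle, plus velocity damping), which is the kind of baseline the proposed generalized class extends; it is not the paper's controller and omits the extended $K_{\infty}$ shaping and the switching scheme.

```python
import numpy as np

def rotation_to_axis_angle(R_err):
    """Extract the Euler axis and angle from a 3x3 rotation-error matrix."""
    angle = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-8:
        return np.zeros(3), 0.0
    axis = np.array([R_err[2, 1] - R_err[1, 2],
                     R_err[0, 2] - R_err[2, 0],
                     R_err[1, 0] - R_err[0, 1]]) / (2.0 * np.sin(angle))
    return axis, angle

def axis_angle_pd_torque(R, R_des, omega, kp=2.0, kd=0.5):
    """Baseline axis-angle PD attitude law: tau = kp*theta*axis - kd*omega."""
    R_err = R.T @ R_des          # rotation from current to desired attitude
    axis, angle = rotation_to_axis_angle(R_err)
    return kp * angle * axis - kd * omega

if __name__ == "__main__":
    R = np.eye(3)
    c, s = np.cos(0.3), np.sin(0.3)
    R_des = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # 0.3 rad yaw target
    print(axis_angle_pd_torque(R, R_des, omega=np.zeros(3)))
```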
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
Vision-Language-Action (VLA) models adapt large vision-language backbones to map images and instructions into robot actions. However, prevailing VLAs either generate actions auto-regressively in a fixed left-to-right order or attach separate MLP or diffusion heads outside the backbone, leading to fragmented information pathways and specialized training requirements that hinder a unified, scalable architecture. We present Discrete Diffusion VLA, a unified-transformer policy that models discretized action chunks with discrete diffusion. The design retains diffusion's progressive refinement paradigm while remaining natively compatible with the discrete token interface of VLMs. Our method achieves an adaptive decoding order that resolves easy action elements before harder ones and uses secondary re-masking to revisit uncertain predictions across refinement rounds, which improves consistency and enables robust error correction. This unified decoder preserves pre-trained vision-language priors, supports parallel decoding, breaks the autoregressive bottleneck, and reduces the number of function evaluations. Discrete Diffusion VLA achieves 96.3% avg. success rates on LIBERO, 71.2% visual matching on SimplerEnv-Fractal and 54.2% overall on SimplerEnv-Bridge. We also provide ablation study on vision-language ability retention on LIBERO-OOD (Out-of-Distribution) benchmark, with our method improving over autoregressive, MLP decoder and continuous diffusion baselines. These findings indicate that discrete-diffusion VLA supports precise action modeling and consistent training, laying groundwork for scaling VLA to larger models and datasets. Our code is available at https://github.com/Liang-ZX/DiscreteDiffusionVLA/tree/libero.
comment: New experiments on VL retention and new ablations. 18 pages
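A schematic of confidence-ordered parallel decoding with secondary re-masking, which is the decoding pattern described above: each round commits the most confident masked action tokens and re-masks a fraction of the least confident committed ones so later rounds can revise them. The commit schedule, re-masking fraction, and the stand-in predictor are illustrative assumptions rather than the released implementation.

```python
import numpy as np

def decode_action_tokens(predict_fn, seq_len, vocab_size, rounds=4,
                         remask_frac=0.25, mask_id=-1):
    """Schematic confidence-ordered decoding with secondary re-masking.

    predict_fn(tokens) must return a (seq_len, vocab_size) array of token
    probabilities given the partially masked sequence.
    """
    tokens = np.full(seq_len, mask_id, dtype=int)
    for r in range(rounds):
        probs = predict_fn(tokens)                     # (seq_len, vocab_size)
        conf = probs.max(axis=1)
        pred = probs.argmax(axis=1)
        masked = tokens == mask_id
        # Commit the most confident masked positions this round.
        n_commit = max(1, int(np.ceil(masked.sum() / (rounds - r))))
        commit_idx = np.argsort(-np.where(masked, conf, -np.inf))[:n_commit]
        tokens[commit_idx] = pred[commit_idx]
        # Secondary re-masking: revisit the least confident committed tokens.
        committed = np.flatnonzero(tokens != mask_id)
        n_remask = int(remask_frac * len(committed))
        if n_remask and r < rounds - 1:
            worst = committed[np.argsort(conf[committed])[:n_remask]]
            tokens[worst] = mask_id
    return tokens

if __name__ == "__main__":
    V, L = 16, 8
    rng = np.random.default_rng(1)
    table = rng.dirichlet(np.ones(V), size=L)          # stand-in for a trained model
    print(decode_action_tokens(lambda t: table, L, V))
```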
AsyMoE: Leveraging Modal Asymmetry for Enhanced Expert Specialization in Large Vision-Language Models
Large Vision-Language Models (LVLMs) have demonstrated impressive performance on multimodal tasks through scaled architectures and extensive training. However, existing Mixture of Experts (MoE) approaches face challenges due to the asymmetry between visual and linguistic processing. Visual information is spatially complete, while language requires maintaining sequential context. As a result, MoE models struggle to balance modality-specific features and cross-modal interactions. Through systematic analysis, we observe that language experts in deeper layers progressively lose contextual grounding and rely more on parametric knowledge rather than utilizing the provided visual and linguistic information. To address this, we propose AsyMoE, a novel architecture that models this asymmetry using three specialized expert groups. We design intra-modality experts for modality-specific processing, hyperbolic inter-modality experts for hierarchical cross-modal interactions, and evidence-priority language experts to suppress parametric biases and maintain contextual grounding. Extensive experiments demonstrate that AsyMoE achieves 26.58% and 15.45% accuracy improvements over vanilla MoE and modality-specific MoE respectively, with 25.45% fewer activated parameters than dense models.
comment: This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
Towards Senior-Robot Interaction: Reactive Robot Dog Gestures
As the global population ages, many seniors face loneliness. Companion robots offer a potential solution. However, current companion robots often lack advanced functionality, while task-oriented robots are not designed for social interaction, limiting their suitability and acceptance by seniors. Our work introduces a senior-oriented system for quadruped robots that allows for more intuitive user input and provides more socially expressive output. For user input, we implemented a MediaPipe-based module for hand gesture and head movement recognition, enabling control without a remote. For output, we designed and trained robotic dog gestures using curriculum-based reinforcement learning in Isaac Gym, progressing from simple standing to three-legged balancing, leg extensions, and other gestures. The final tests achieved over 95\% success on average in simulation, and we validated a key social gesture (the paw-lift) on a Unitree robot. Real-world tests demonstrated the feasibility and social expressiveness of this framework, while also revealing sim-to-real challenges in joint compliance, load distribution, and balance control. These contributions advance the development of practical quadruped robots as social companions for seniors, outline pathways for sim-to-real adaptation, and inform future user studies.
comment: Accepted at the Australasian Conference on Robotics and Automation (ACRA) 2025
VERDI: VLM-Embedded Reasoning for Autonomous Driving
While autonomous driving (AD) stacks struggle with decision making under partial observability and real-world complexity, human drivers are capable of commonsense reasoning to make near-optimal decisions with limited information. Recent work has attempted to leverage finetuned Vision-Language Models (VLMs) for trajectory planning at inference time to emulate human behavior. Despite their success in benchmark evaluations, these methods are often impractical to deploy (a 70B parameter VLM inference at merely 8 tokens per second requires more than 160G of memory), and their monolithic network structure prohibits safety decomposition. To bridge this gap, we propose VLM-Embedded Reasoning for autonomous Driving (VERDI), a training-time framework that distills the reasoning process and commonsense knowledge of VLMs into the AD stack. VERDI augments modular differentiable end-to-end (e2e) AD models by aligning intermediate module outputs at the perception, prediction, and planning stages with text features explaining the driving reasoning process produced by VLMs. By encouraging alignment in latent space, VERDI enables the modular AD stack to internalize structured reasoning, without incurring the inference-time costs of large VLMs. We validate VERDI in both open-loop (NuScenes and Bench2Drive benchmarks) and closed-loop (HugSim Simulator) settings. We find that VERDI outperforms existing e2e methods that do not embed reasoning by up to 11% in $\ell_{2}$ distance and 11% in driving performance, while maintaining real-time inference speed.
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning AAAI 2026
Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action execution trajectories through continuous co-learning across vision, language, and proprioceptive inputs (e.g., robot internal states). Meanwhile, we anchor language semantics to visuomotor representations by a bidirectional cross-attention to learn contextual information for action generation, successfully overcoming the problem of semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL's generalization under unseen and noisy object states.
comment: Accepted at AAAI 2026, the Project website is available at https://qhemu.github.io/CCoL/
BRIC: Bridging Kinematic Plans and Physical Control at Test Time AAAI'26
We propose BRIC, a novel test-time adaptation (TTA) framework that enables long-term human motion generation by resolving execution discrepancies between diffusion-based kinematic motion planners and reinforcement learning-based physics controllers. While diffusion models can generate diverse and expressive motions conditioned on text and scene context, they often produce physically implausible outputs, leading to execution drift during simulation. To address this, BRIC dynamically adapts the physics controller to noisy motion plans at test time, while preserving pre-trained skills via a loss function that mitigates catastrophic forgetting. In addition, BRIC introduces a lightweight test-time guidance mechanism that steers the diffusion model in the signal space without updating its parameters. By combining both adaptation strategies, BRIC ensures consistent and physically plausible long-term executions across diverse environments in an effective and efficient manner. We validate the effectiveness of BRIC on a variety of long-term tasks, including motion composition, obstacle avoidance, and human-scene interaction, achieving state-of-the-art performance across all tasks.
comment: Accepted to AAAI'26
Confidence Calibration in Vision-Language-Action Models
Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present a first-of-its-kind study of confidence calibration in vision-language-action (VLA) foundation models, which map visual observations and natural language instructions to low-level robot motor commands. We establish a confidence baseline for VLAs, examine how task success relates to calibration error and how calibration evolves over time, and introduce two lightweight techniques to remedy the miscalibration we observe: prompt ensembles and action-wise Platt scaling. Our aim in this study is to begin to develop the tools and conceptual understanding necessary to render VLAs both highly performant and highly trustworthy via reliable uncertainty quantification.
comment: 38 pages, 19 figures; additional experiments with VLA variants
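Platt scaling itself is standard: fit a logistic map from raw confidence to empirical success on held-out rollouts. The sketch below shows a single scaler; the action-wise variant described above would fit one such scaler per action dimension. It uses scikit-learn and synthetic data purely for illustration, not the paper's evaluation setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt_scaler(raw_confidence, success):
    """Fit Platt scaling: p(success) = sigmoid(a * z + b) on held-out rollouts."""
    z = np.asarray(raw_confidence, dtype=float).reshape(-1, 1)
    y = np.asarray(success, dtype=int)
    return LogisticRegression().fit(z, y)

def calibrated_confidence(scaler, raw_confidence):
    """Map raw confidence scores to calibrated success probabilities."""
    z = np.asarray(raw_confidence, dtype=float).reshape(-1, 1)
    return scaler.predict_proba(z)[:, 1]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.uniform(0, 1, size=500)                       # model's raw confidence
    success = rng.uniform(0, 1, 500) < 0.3 + 0.4 * raw      # synthetic, over-confident model
    scaler = fit_platt_scaler(raw, success.astype(int))
    print(calibrated_confidence(scaler, [0.2, 0.5, 0.9]))
```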
PRISM-Loc: a Lightweight Long-range LiDAR Localization in Urban Environments with Topological Maps ICRA 2026
We propose PRISM-Loc - a lightweight and robust approach for localization in large outdoor environments that combines a compact topological representation with a novel scan-matching and curb-detection module operating on raw LiDAR scans. The method is designed for resource-constrained platforms and emphasizes real-time performance and resilience to common urban sensing challenges. It provides accurate localization in compact topological maps using global place recognition and an original scan matching technique. Experiments on standard benchmarks and on an embedded platform demonstrate the effectiveness of our approach. Our method achieves a 99\% success rate on the large-scale ITLP-Campus dataset while running at 150 ms per localization and using a 20 MB map for localization. We highlight three main contributions: (1) a compact representation for city-scale localization; (2) a novel curb detection and scan matching pipeline operating directly on raw LiDAR points; (3) a thorough evaluation of our method with performance analysis.
comment: This version was submitted to ICRA 2026 conference
DeltaFlow: An Efficient Multi-frame Scene Flow Estimation Method NeurIPS 2025
Previous dominant methods for scene flow estimation focus mainly on input from two consecutive frames, neglecting valuable information in the temporal domain. While recent trends shift towards multi-frame reasoning, they suffer from rapidly escalating computational costs as the number of frames grows. To leverage temporal information more efficiently, we propose DeltaFlow ($Δ$Flow), a lightweight 3D framework that captures motion cues via a $Δ$ scheme, extracting temporal features with minimal computational cost, regardless of the number of frames. Additionally, scene flow estimation faces challenges such as imbalanced object class distributions and motion inconsistency. To tackle these issues, we introduce a Category-Balanced Loss to enhance learning across underrepresented classes and an Instance Consistency Loss to enforce coherent object motion, improving flow accuracy. Extensive evaluations on the Argoverse 2, Waymo and nuScenes datasets show that $Δ$Flow achieves state-of-the-art performance with up to 22% lower error and $2\times$ faster inference compared to the next-best multi-frame supervised method, while also demonstrating a strong cross-domain generalization ability. The code is open-sourced at https://github.com/Kin-Zhang/DeltaFlow along with trained model weights.
comment: NeurIPS 2025 Spotlight, 18 pages (10 main pages + 8 pages of supplementary material), 11 figures, code at https://github.com/Kin-Zhang/DeltaFlow
Geometric Data-Driven Multi-Jet Locomotion Inspired by Salps
Salps are marine animals consisting of chains of jellyfish-like units. Their efficient underwater locomotion, achieved by coordinating multi-jet propulsion, has aroused great interest in robotics. This paper presents a geometric mechanics framework for salp-inspired robots. We study a new class of geometric mechanics models inspired by salps, in which control inputs are not restricted to the shape axes, analyze nonlinear controllability, and develop motion planning and feedback control methods. We introduce the "LandSalp" robot, which serves as a physical realization of the reduced-order, drag-dominated model of salp swimming, enabling controlled evaluation of locomotion strategies without many of the confounding factors of underwater experiments. We extend least-squares- and inverse-dynamics-based system identification to learn the Riemannian metric of the drag-dominated model from experimental data using Lie group differentiation. With about three minutes of data, we identify an accurate model of LandSalp. Simulation and hardware experiments demonstrate omnidirectional locomotion, shape regulation, and bending maneuvers, providing a principled pathway toward more capable salp-inspired robots.
comment: 20 pages, 16 figures
GUIDEd Agents: Enhancing Navigation Policies through Task-Specific Uncertainty Abstraction in Localization-Limited Environments
Autonomous vehicles performing navigation tasks in complex environments face significant challenges due to uncertainty in state estimation. In many scenarios, such as stealth operations or resource-constrained settings, accessing high-precision localization comes at a significant cost, forcing robots to rely primarily on less precise state estimates. Our key observation is that different tasks require varying levels of precision in different regions: a robot navigating a crowded space might need precise localization near obstacles but can operate effectively with less precision elsewhere. In this paper, we present a planning method for integrating task-specific uncertainty requirements directly into navigation policies. We introduce Task-Specific Uncertainty Maps (TSUMs), which abstract the acceptable levels of state estimation uncertainty across different regions. TSUMs align task requirements and environmental features using a shared representation space, generated via a domain-adapted encoder. Using TSUMs, we propose Generalized Uncertainty Integration for Decision-Making and Execution (GUIDE), a policy conditioning framework that incorporates these uncertainty requirements into robot decision-making. We find that TSUMs provide an effective way to abstract task-specific uncertainty requirements, and conditioning policies on TSUMs enables the robot to reason about the context-dependent value of certainty and adapt its behavior accordingly. We show how integrating GUIDE into reinforcement learning frameworks allows the agent to learn navigation policies that effectively balance task completion and uncertainty management without explicit reward engineering. We evaluate GUIDE on various real-world robotic navigation tasks and find that it demonstrates significant improvement in task completion rates compared to baseline methods that do not explicitly consider task-specific uncertainty.
comment: Accepted for publication at RAL (Robotics and automation letters). Updated with the final version
TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data
Existing end-to-end autonomous driving methods typically rely on imitation learning (IL) but face a key challenge: the misalignment between open-loop training and closed-loop deployment. This misalignment often triggers driver-initiated takeovers and system disengagements during closed-loop execution. How to leverage the expert takeover data from these disengagement scenarios to effectively expand the IL policy's capability presents a valuable yet unexplored challenge. In this paper, we propose TakeAD, a novel preference-based post-optimization framework that fine-tunes the pre-trained IL policy with this disengagement data to enhance closed-loop driving performance. First, we design an efficient expert takeover data collection pipeline inspired by human takeover mechanisms in real-world autonomous driving systems. Then, the post-optimization framework integrates iterative Dataset Aggregation (DAgger) for imitation learning with Direct Preference Optimization (DPO) for preference alignment. The DAgger stage equips the policy with fundamental capabilities to handle disengagement states through direct imitation of expert interventions. Subsequently, the DPO stage refines the policy's behavior to better align with expert preferences in disengagement scenarios. Through multiple iterations, the policy progressively learns recovery strategies for disengagement states, thereby mitigating the open-loop gap. Experiments on the closed-loop Bench2Drive benchmark demonstrate our method's effectiveness compared with pure IL methods, with comprehensive ablations confirming the contribution of each component.
comment: This work has been accepted by IEEE RA-L. Manuscript submitted: July, 8, 2025; Accepted: November, 24, 2025
QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models
Spatial perception and reasoning are crucial for Vision-Language-Action (VLA) models to accomplish fine-grained manipulation tasks. However, existing approaches often lack the ability to understand and reason over the essential 3D structures necessary for precise control. To address this limitation, we propose QDepth-VLA, a general framework that augments VLA models with an auxiliary depth prediction task. A dedicated depth expert is designed to predict quantized latent tokens of depth maps obtained from a VQ-VAE encoder, enabling the model to learn depth-aware representations that capture critical geometric cues. Experimental results on the simulation benchmarks and real-world tasks demonstrate that QDepth-VLA yields strong spatial reasoning and competitive performance on manipulation tasks.
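The quantization target described here is the usual vector-quantization step of mapping continuous latents to their nearest codebook entries; the sketch below shows that step in isolation, with random arrays standing in for a trained VQ-VAE encoder and codebook.

```python
import numpy as np

def quantize_to_codebook(latents, codebook):
    """Map continuous latent vectors to nearest codebook indices (VQ step).

    latents:  (N, D) continuous encodings of depth-map patches
    codebook: (K, D) learned code vectors
    returns:  (N,) integer token ids usable as auxiliary prediction targets
    """
    # Squared Euclidean distance between every latent and every code vector.
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(64, 8))        # stand-in for a trained VQ-VAE codebook
    latents = rng.normal(size=(10, 8))         # stand-in for encoded depth patches
    print(quantize_to_codebook(latents, codebook))
```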
Deformable Cluster Manipulation via Whole-Arm Policy Learning
Manipulating clusters of deformable objects presents a substantial challenge with widespread applicability, but requires contact-rich whole-arm interactions. A potential solution must address the limited capacity for realistic model synthesis, high uncertainty in perception, and the lack of efficient spatial abstractions, among others. We propose a novel framework for learning model-free policies integrating two modalities: 3D point clouds and proprioceptive touch indicators, emphasising manipulation with full body contact awareness, going beyond traditional end-effector modes. Our reinforcement learning framework leverages a distributional state representation, aided by kernel mean embeddings, to achieve improved training efficiency and real-time inference. Furthermore, we propose a novel context-agnostic occlusion heuristic to clear deformables from a target region for exposure tasks. We deploy the framework in a power line clearance scenario and observe that the agent generates creative strategies leveraging multiple arm links for de-occlusion. Finally, we perform zero-shot sim-to-real policy transfer, allowing the arm to clear real branches with unknown occlusion patterns, unseen topology, and uncertain dynamics. Website: https://sites.google.com/view/dcmwap/
DRO-EDL-MPC: Evidential Deep Learning-Based Distributionally Robust Model Predictive Control for Safe Autonomous Driving
Safety is a critical concern in motion planning for autonomous vehicles. Modern autonomous vehicles rely on neural network-based perception, but making control decisions based on these inference results poses significant safety risks due to inherent uncertainties. To address this challenge, we present a distributionally robust optimization (DRO) framework that accounts for both aleatoric and epistemic perception uncertainties using evidential deep learning (EDL). Our approach introduces a novel ambiguity set formulation based on evidential distributions that dynamically adjusts the conservativeness according to perception confidence levels. We integrate this uncertainty-aware constraint into model predictive control (MPC), proposing the DRO-EDL-MPC algorithm with computational tractability for autonomous driving applications. Validation in the CARLA simulator demonstrates that our approach maintains efficiency under high perception confidence while enforcing conservative constraints under low confidence.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes ICCV 2025
Recent advances in deep learning-based point cloud registration have improved generalization, yet most methods still require retraining or manual parameter tuning for each new environment. In this paper, we identify three key factors limiting generalization: (a) reliance on environment-specific voxel size and search radius, (b) poor out-of-domain robustness of learning-based keypoint detectors, and (c) raw coordinate usage, which exacerbates scale discrepancies. To address these issues, we present a zero-shot registration pipeline called BUFFER-X by (a) adaptively determining voxel size/search radii, (b) using farthest point sampling to bypass learned detectors, and (c) leveraging patch-wise scale normalization for consistent coordinate bounds. In particular, we present a multi-scale patch-based descriptor generation and a hierarchical inlier search across scales to improve robustness in diverse scenes. We also propose a novel generalizability benchmark using 11 datasets that cover various indoor/outdoor scenarios and sensor modalities, demonstrating that BUFFER-X achieves substantial generalization without prior information or manual parameter tuning for the test datasets. Our code is available at https://github.com/MIT-SPARK/BUFFER-X.
comment: 20 pages, 14 figures. Accepted as a highlight paper at ICCV 2025
FORWARD: Dataset of a forwarder operating in rough terrain
We present FORWARD, a high-resolution multimodal dataset of a cut-to-length forwarder operating in rough terrain on two harvest sites in the middle part of Sweden. The forwarder is a large Komatsu model equipped with vehicle telematics sensors, including global positioning via satellite navigation, movement sensors, accelerometers, and engine sensors. The vehicle was additionally equipped with cameras, operator vibration sensors, and multiple IMUs. The data includes event time logs recorded at 5 Hz of driving speed, fuel consumption, vehicle position with centimeter accuracy, and crane use while the vehicle operates in forest areas, aerially laser-scanned with a resolution of around 1500 points per square meter. Production log files (StanForD standard) with time-stamped machine events, extensive video material, and terrain data in various formats are included as well. About 18 hours of regular wood extraction work during three days is annotated from 360-video material into individual work elements and included in the dataset. We also include scenario specifications of conducted experiments on forest roads and in terrain. Scenarios include repeatedly driving the same routes with and without steel tracks, different load weights, and different target driving speeds. The dataset is intended for developing models and algorithms for trafficability, perception, and autonomous control of forest machines using artificial intelligence, simulation, and experiments on physical testbeds. In part, we focus on forwarders traversing terrain, avoiding or handling obstacles, and loading or unloading logs, with consideration for efficiency, fuel consumption, safety, and environmental impact. Other benefits of the open dataset include the ability to explore auto-generation and calibration of forestry machine simulators and automation scenario descriptions using the data recorded in the field.
comment: 28 pages, 22 figures
Explainable deep learning improves human mental models of self-driving cars
Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging for the human behind the wheel to accurately anticipate when they will fail, with potentially catastrophic consequences. While research into interpreting these systems has surged, most of it is confined to simulations or toy setups due to the difficulty of real-world deployment, leaving the practical utility of such techniques unknown. Here, we introduce the Concept-Wrapper Network (CW-Net), a method for explaining the behavior of machine-learning-based planners by grounding their reasoning in human-interpretable concepts. We deploy CW-Net on a real self-driving car and show that the resulting explanations improve the human driver's mental model of the car, allowing them to better predict its behavior. To our knowledge, this is the first demonstration that explainable deep learning integrated into self-driving cars can be both understandable and useful in a realistic deployment setting. CW-Net accomplishes this level of intelligibility while providing explanations which are causally faithful and do not sacrifice driving performance. Overall, our study establishes a general pathway to interpretability for autonomous agents by way of concept-based explanations, which could help make them more transparent and safe.
comment: MST & JAS contributed equally to this work
Multiagent Systems
A Multi-Agent Retrieval-Augmented Framework for Work-in-Progress Prediction
Work-in-Progress (WiP) prediction is critical for predictive process monitoring, enabling accurate anticipation of workload fluctuations and optimized operational planning. This paper proposes a retrieval-augmented, multi-agent framework that combines retrieval-augmented generation (RAG) and collaborative multi-agent reasoning for WiP prediction. The narrative generation component transforms structured event logs into semantically rich natural language stories, which are embedded into a semantic vector-based process memory to facilitate dynamic retrieval of historical context during inference. The framework includes predictor agents that independently leverage retrieved historical contexts and a decision-making assistant agent that extracts high-level descriptive signals from recent events. A fusion agent then synthesizes predictions using ReAct-style reasoning over agent outputs and retrieved narratives. We evaluate our framework on two real-world benchmark datasets. Results show that the proposed retrieval-augmented multi-agent approach achieves competitive prediction accuracy, obtaining a Mean Absolute Percentage Error (MAPE) of 1.50\% on one dataset, and surpassing Temporal Convolutional Networks (TCN), Long Short-Term Memory (LSTM), and persistence baselines. The results highlight improved robustness, demonstrating the effectiveness of integrating retrieval mechanisms and multi-agent reasoning in WiP prediction.
comment: 15th International Conference on Digital Image Processing and Pattern Recognition (DPPR 2025), December 20 ~ 21, 2025, Sydney, Australia
Mechanism-Based Intelligence (MBI): Differentiable Incentives for Rational Coordination and Guaranteed Alignment in Multi-Agent Systems
Autonomous multi-agent systems are fundamentally fragile: they struggle to solve the Hayekian Information problem (eliciting dispersed private knowledge) and the Hurwiczian Incentive problem (aligning local actions with global objectives), making coordination computationally intractable. I introduce Mechanism-Based Intelligence (MBI), a paradigm that reconceptualizes intelligence as emergent from the coordination of multiple "brains", rather than a single one. At its core, the Differentiable Price Mechanism (DPM) computes the exact loss gradient $$ \mathbf{G}_i = - \frac{\partial \mathcal{L}}{\partial \mathbf{x}_i} $$ as a dynamic, VCG-equivalent incentive signal, guaranteeing Dominant Strategy Incentive Compatibility (DSIC) and convergence to the global optimum. A Bayesian extension ensures incentive compatibility under asymmetric information (BIC). The framework scales linearly ($\mathcal{O}(N)$) with the number of agents, bypassing the combinatorial complexity of Dec-POMDPs and is empirically 50x faster than Model-Free Reinforcement Learning. By structurally aligning agent self-interest with collective objectives, it provides a provably efficient, auditable and generalizable approach to coordinated, trustworthy and scalable multi-agent intelligence grounded in economic principles.
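The incentive signal $\mathbf{G}_i = -\partial \mathcal{L} / \partial \mathbf{x}_i$ can be illustrated on a toy quadratic coordination loss, where the gradient is available in closed form; the loss below is an assumption chosen for clarity, not the paper's mechanism-design setting.

```python
import numpy as np

def incentive_signals(x, target_total=10.0, local_cost=1.0):
    """Compute G_i = -dL/dx_i for a toy coordination loss.

    L(x) = 0.5 * (sum(x) - target_total)**2 + 0.5 * local_cost * sum(x**2)
    The first term is a shared objective, the second a private cost; the
    negative gradient is broadcast to each agent as its incentive signal.
    """
    x = np.asarray(x, dtype=float)
    shared_grad = x.sum() - target_total          # identical term for every agent
    return -(shared_grad + local_cost * x)        # G_i per agent

if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0])
    for _ in range(200):                          # agents follow their incentive signals
        x = x + 0.1 * incentive_signals(x)
    print(np.round(x, 3), round(float(x.sum()), 3))
```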
HyperAgent: Leveraging Hypergraphs for Topology Optimization in Multi-Agent Communication
Recent advances in large language model-powered multi-agent systems have demonstrated remarkable collective intelligence through effective communication. However, existing approaches face two primary challenges: (i) \textit{Ineffective group collaboration modeling}, as they rely on pairwise edge representations in graph structures, limiting their ability to capture relationships among multiple agents; and (ii) \textit{Limited task-adaptiveness in communication topology design}, leading to excessive communication cost for simple tasks and insufficient coordination for complex scenarios. These issues restrict the scalability and practical deployment of adaptive collaboration frameworks. To address these challenges, we propose \textbf{HyperAgent}, a hypergraph-based framework that optimizes communication topologies and effectively captures group collaboration patterns using direct hyperedge representations. Unlike edge-based approaches, HyperAgent uses hyperedges to link multiple agents within the same subtask and employs hypergraph convolutional layers to achieve one-step information aggregation in collaboration groups. Additionally, it incorporates a variational autoencoder framework with sparsity regularization to dynamically adjust hypergraph topologies based on task complexity. Experiments highlight the superiority of HyperAgent in both performance and efficiency. For instance, on GSM8K, HyperAgent achieves 95.07\% accuracy while reducing token consumption by 25.33\%, demonstrating the potential of hypergraph-based optimization for multi-agent communication.
comment: This submission has been withdrawn by the authors due to a fundamental error in the methodology that affects the validity of the main results
RadAgents: Multimodal Agentic Reasoning for Chest X-ray Interpretation with Radiologist-like Workflows ML4H'25
Agentic systems offer a potential path to solve complex clinical tasks through collaboration among specialized agents, augmented by tool use and external knowledge bases. Nevertheless, for chest X-ray (CXR) interpretation, prevailing methods remain limited: (i) reasoning is frequently neither clinically interpretable nor aligned with guidelines, reflecting mere aggregation of tool outputs; (ii) multimodal evidence is insufficiently fused, yielding text-only rationales that are not visually grounded; and (iii) systems rarely detect or resolve cross-tool inconsistencies and provide no principled verification mechanisms. To bridge the above gaps, we present RadAgents, a multi-agent framework that couples clinical priors with task-aware multimodal reasoning and encodes a radiologist-style workflow into a modular, auditable pipeline. In addition, we integrate grounding and multimodal retrieval-augmentation to verify and resolve context conflicts, resulting in outputs that are more reliable, transparent, and consistent with clinical practice.
comment: ML4H'25; Work in progress
Personalized and Resilient Distributed Learning Through Opinion Dynamics
In this paper, we address two practical challenges of distributed learning in multi-agent network systems, namely personalization and resilience. Personalization is the need of heterogeneous agents to learn local models tailored to their own data and tasks, while still generalizing well; on the other hand, the learning process must be resilient to cyberattacks or anomalous training data to avoid disruption. Motivated by a conceptual affinity between these two requirements, we devise a distributed learning algorithm that combines distributed gradient descent and the Friedkin-Johnsen model of opinion dynamics to fulfill both of them. We quantify its convergence speed and the neighborhood that contains the final learned models, which can be easily controlled by tuning the algorithm parameters to enforce a more personalized/resilient behavior. We numerically showcase the effectiveness of our algorithm on synthetic and real-world distributed learning tasks, where it achieves high global accuracy both for personalized models and with malicious agents compared to standard strategies.
comment: Published on IEEE Transactions on Control of Network Systems. Final accepted version
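A minimal sketch of how a Friedkin-Johnsen-style update can be combined with a distributed gradient step: each agent mixes its neighbours' models, stays anchored to a personal reference model, and descends its local loss. The mixing weights, stubbornness parameter, and quadratic local losses are illustrative assumptions; the paper's algorithm and its convergence analysis differ.

```python
import numpy as np

def fj_dgd_step(models, anchors, W, grads, step=0.1, stubbornness=0.3):
    """One illustrative update mixing consensus, personal anchoring, and a gradient step.

    models:       (N, d) current local models
    anchors:      (N, d) each agent's personal reference model (FJ 'prejudice')
    W:            (N, N) row-stochastic mixing matrix over the network
    grads:        (N, d) local loss gradients evaluated at `models`
    stubbornness: weight on the personal anchor (0 = plain consensus)
    """
    mixed = W @ models
    return (1 - stubbornness) * mixed + stubbornness * anchors - step * grads

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 4, 2
    W = np.full((N, N), 1.0 / N)                     # fully connected, uniform weights
    targets = rng.normal(size=(N, d))                # heterogeneous local optima
    models = np.zeros((N, d))
    anchors = targets.copy()
    for _ in range(100):
        grads = models - targets                     # gradient of 0.5*||x_i - target_i||^2
        models = fj_dgd_step(models, anchors, W, grads)
    print(np.round(models, 3))
```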
Networked Communication for Decentralised Cooperative Agents in Mean-Field Control
The mean-field framework has been used to find approximate solutions to problems involving very large populations of symmetric, anonymous agents, which may be intractable by other methods. The cooperative mean-field control (MFC) problem has received less attention than the non-cooperative mean-field game (MFG), despite the former potentially being more useful as a tool for engineering large-scale collective behaviours. Decentralised communication algorithms have recently been introduced to MFGs, giving benefits to learning speed and robustness. Inspired by this, we introduce networked communication to MFC - where populations arguably have broader incentive to communicate - and in particular to the setting where decentralised agents learn online from a single, non-episodic run of the empirical system. We adapt recent MFG algorithms to this new setting, as well as contributing a novel sub-routine allowing networked agents to estimate the global average reward from their local neighbourhood. Previous theoretical analysis of decentralised communication in MFGs does not extend trivially to MFC. We therefore contribute new theory proving that in MFC the networked communication scheme allows agents to increase social welfare faster than under both of the two typical alternative architectures, namely independent and centralised learning. We provide experiments that support this new result across different classes of cooperative game, and also give numerous ablation studies and additional experiments concerning the number of communication rounds and robustness to communication failures.
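The neighbourhood-based estimation of a global average can be illustrated with plain gossip averaging over the communication graph, as below; this is only a generic consensus sketch, not the sub-routine contributed in the paper.

```python
import numpy as np

def gossip_average(local_rewards, adjacency, rounds=20):
    """Estimate the global average reward by repeated neighbourhood averaging.

    Each agent replaces its estimate with the mean of its own estimate and
    its neighbours' estimates; on a connected regular graph the estimates
    approach the true global average as the number of rounds grows.
    """
    estimates = np.asarray(local_rewards, dtype=float).copy()
    A = np.asarray(adjacency, dtype=float) + np.eye(len(estimates))  # include self
    W = A / A.sum(axis=1, keepdims=True)                             # row-stochastic
    for _ in range(rounds):
        estimates = W @ estimates
    return estimates

if __name__ == "__main__":
    rewards = np.array([1.0, 3.0, 5.0, 7.0])
    ring = np.array([[0, 1, 0, 1],
                     [1, 0, 1, 0],
                     [0, 1, 0, 1],
                     [1, 0, 1, 0]])                                  # 4-agent cycle graph
    print(gossip_average(rewards, ring), rewards.mean())
```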
Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation
Recent algorithms allow decentralised agents, possibly connected via a communication network, to learn equilibria in mean-field games from a non-episodic run of the empirical system. However, these algorithms are for tabular settings: this computationally limits the size of agents' observation space, meaning the algorithms cannot handle anything but small state spaces, nor generalise beyond policies depending only on the agent's local state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the mean field in the observation for players' policies, it is unrealistic to assume decentralised agents have access to this global information: we therefore also provide new algorithms allowing agents to locally estimate the global empirical distribution, and to improve this estimate via inter-agent communication. We prove theoretically that exchanging policy information helps networked agents outperform both independent and even centralised agents in function-approximation settings. Our experiments demonstrate this happening empirically, and show that the communication network allows decentralised agents to estimate the mean field for population-dependent policies.
Networked Communication for Decentralised Agents in Mean-Field Games
Methods like multi-agent reinforcement learning struggle to scale with growing population size. Mean-field games (MFGs) are a game-theoretic approach that can circumvent this by finding a solution for an abstract infinite population, which can then be used as an approximate solution for the $N$-agent problem. However, classical mean-field algorithms usually only work under restrictive conditions. We take steps to address this by introducing networked communication to MFGs, in particular to settings that use a single, non-episodic run of $N$ decentralised agents to simulate the infinite population, as is likely to be most reasonable in real-world deployments. We prove that our architecture's sample guarantees lie between those of earlier theoretical algorithms for the centralised- and independent-learning architectures, varying dependent on network structure and the number of communication rounds. However, the sample guarantees of the three theoretical algorithms do not actually result in practical convergence times. We thus contribute practical enhancements to all three algorithms allowing us to present their first empirical demonstrations. We then show that in practical settings where the theoretical hyperparameters are not observed, giving fewer loops but poorer estimation of the Q-function, our communication scheme still respects the earlier theoretical analysis: it considerably accelerates learning over the independent case, which hardly seems to learn at all, and often performs similarly to the centralised case, while removing the restrictive assumption of the latter. We provide ablations and additional studies showing that our networked approach also has advantages over both alternatives in terms of robustness to update failures and to changes in population size.
MAPPO-LCR: Multi-Agent Proximal Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games
Spatial public goods games model collective dilemmas where individual payoffs depend on population-level strategy configurations. Most existing studies rely on evolutionary update rules or value-based reinforcement learning methods. These approaches struggle to represent payoff coupling and non-stationarity in large interacting populations. This work introduces Multi-Agent Proximal Policy Optimization (MAPPO) into spatial public goods games for the first time. In these games, individual returns are intrinsically coupled through overlapping group interactions. Proximal Policy Optimization (PPO) treats agents as independent learners and ignores this coupling during value estimation. MAPPO addresses this limitation through a centralized critic that evaluates joint strategy configurations. To study neighborhood-level cooperation signals under this framework, we propose MAPPO with Local Cooperation Reward, termed MAPPO-LCR. The local cooperation reward aligns policy updates with surrounding cooperative density without altering the original game structure. MAPPO-LCR preserves decentralized execution while enabling population-level value estimation during training. Extensive simulations demonstrate stable cooperation emergence and reliable convergence across enhancement factors. Statistical analyses further confirm the learning advantage of MAPPO over PPO in spatial public goods games.
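A sketch of the kind of shaping term a local cooperation reward could take on a periodic lattice: each agent receives a bonus proportional to the fraction of its von Neumann neighbours that cooperate. The weight and neighbourhood definition are assumptions for illustration, not the exact reward used in MAPPO-LCR.

```python
import numpy as np

def local_cooperation_reward(strategy_grid, weight=0.5):
    """Shaping reward proportional to the cooperative density around each agent.

    strategy_grid: 2-D array of 0/1 strategies (1 = cooperate) on a periodic lattice.
    Returns a grid of bonuses weight * (fraction of von Neumann neighbours cooperating).
    """
    s = np.asarray(strategy_grid, dtype=float)
    neighbor_sum = (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
                    np.roll(s, 1, 1) + np.roll(s, -1, 1))
    return weight * neighbor_sum / 4.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = rng.integers(0, 2, size=(5, 5))
    print(local_cooperation_reward(grid))
```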
Cooperation as a Black Box: Conceptual Fluctuation and Diagnostic Tools for Misalignment in MAS
Misalignment in Multi-Agent Systems (MAS) is frequently treated as a technical failure. Yet, issues may arise from the conceptual design phase, where semantic ambiguity and normative projection occur. The Rabbit-Duck illusion illustrates how perspective-dependent readings of agent behavior, such as the conflation of cooperation and coordination, can create epistemic instability; e.g., coordinated agents in cooperative Multi-Agent Reinforcement Learning (MARL) benchmarks being interpreted as morally aligned, despite being optimized only for shared utility maximization. Motivated by three drivers of meaning-level misalignment in MAS (coordination-cooperation ambiguity, conceptual fluctuation, and semantic instability), we introduce the Misalignment Mosaic: a framework for diagnosing how misalignment emerges through language, framing, and design assumptions. The Mosaic comprises four components: 1. Terminological Inconsistency, 2. Interpretive Ambiguity, 3. Concept-to-Code Decay, and 4. Morality as Cooperation. Building on insights from the Morality-as-Cooperation Theory, we call for consistent meaning-level grounding in MAS to ensure systems function as intended: technically and ethically. This need is particularly urgent as MAS principles influence broader Artificial Intelligence (AI) workflows, amplifying risks in trust, interpretability, and governance. While this work focuses on the coordination/cooperation ambiguity, the Mosaic generalizes to other overloaded terms, such as alignment, autonomy, and trust.
Systems and Control (EESS)
Optimal-coupling-observer AV motion control securing comfort in the presence of cyber attacks
The security of Automated Vehicles (AVs) is an important emerging area of research in traffic safety. Methods have been published and evaluated in experimental vehicles to secure safe AV control in the presence of attacks, but human motion comfort is rarely investigated in such studies. In this paper, we present an innovative optimal-coupling-observer-based framework that rejects the impact of bounded sensor attacks in a network of connected and automated vehicles from a safety and comfort point of view. We demonstrate its performance in car following with cooperative adaptive cruise control for platoons with redundant distance and velocity sensors. The error dynamics are formulated as a Linear Time Variant (LTV) system, resulting in complex stability conditions that are investigated using a Linear Matrix Inequality (LMI) approach guaranteeing global asymptotic stability. We prove the capability of the framework to secure occupants' safety and comfort in the presence of bounded attacks. At the onset of an attack, the framework rapidly detects the attacked sensors and switches to the most reliable observer, excluding the attacked sensors, even for modest attack magnitudes. Without our proposed method, severe (but bounded) attacks result in collisions and major discomfort. With our method, attacks had negligible effects on motion comfort evaluated using the ISO 2631 Ride Comfort and Motion Sickness indexes. The results pave the way to bringing comfort to the forefront of AV security.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository https://github.com/kdjebko/lelar-in-orbit-data
Exact Recourse Functions for Aggregations of EVs Operating in Imbalance Markets
We study optimal charging of large electric vehicle populations that are exposed to a single real-time imbalance price. The problem is naturally cast as a multistage stochastic linear programme (MSLP), which can be solved by algorithms such as Stochastic Dual Dynamic Programming. However, these methods scale poorly with the number of devices and stages. This paper presents a novel approach to overcome this curse of dimensionality. Building on prior work that characterises the aggregate flexibility sets of populations of EVs as a permutahedron, we reformulate the original problem in terms of aggregated quantities. The geometric structure of permutahedra lets us (i) construct an optimal disaggregation policy, (ii) derive an exact, lower-dimensional MSLP, and (iii) characterise the expected recourse function as piecewise affine with a finite, explicit partition. In particular, we provide closed-form expressions for the slopes and intercepts of each affine region via truncated expectations of future prices, yielding an exact form for the recourse function and first-stage policy. Comprehensive numerical studies validate our claims and demonstrate the practical utility of this work.
A Gauss-Newton-Induced Structure-Exploiting Algorithm for Differentiable Optimal Control
Differentiable optimal control, particularly differentiable nonlinear model predictive control (NMPC), provides a powerful framework that enjoys the complementary benefits of machine learning and control theory. A key enabler of differentiable optimal control is the computation of derivatives of the optimal trajectory with respect to problem parameters, i.e., trajectory derivatives. Previous works compute trajectory derivatives by solving a differential Karush-Kuhn-Tucker (KKT) system, and achieve this efficiently by constructing an equivalent auxiliary system. However, we find that directly exploiting the matrix structures in the differential KKT system yields significant computation speed improvements. Motivated by this insight, we propose FastDOC, which applies a Gauss-Newton approximation of the Hessian and takes advantage of the resulting block-sparsity and positive semidefinite properties of the matrices involved. These structural properties enable us to accelerate the computationally expensive matrix factorization steps, resulting in a factor-of-two speedup in theoretical computational complexity, and in a synthetic benchmark FastDOC achieves up to a 180% time reduction compared to the baseline method. Finally, we validate the method on an imitation learning task for human-like autonomous driving, where the results demonstrate the effectiveness of the proposed FastDOC in practical applications.
comment: 8 pages, 7 figures, submitted to IFAC World Congress 2026 for possible publication
Hybrid Analytical-Machine Learning Framework for Ripple Factor Estimation in Cockcroft-Walton Voltage Multipliers with Residual Correction for Non-Ideal Effects
Cockcroft-Walton (CW) voltage multipliers suffer from output ripple that classical analytical models underestimate due to neglected non-idealities like diode drops and capacitor ESR, particularly in high-stage, low-frequency and heavy-load regimes. This paper proposes a hybrid framework that generates a comprehensive 324-case MATLAB/Simulink dataset varying stages (2-8), input voltage (5-25 kV), capacitance (1-10 μF), frequency (50-500 Hz) and load (6-60 MΩ), then trains a Random Forest model to predict residuals between simulated and theoretical peak-to-peak ripple. The approach achieves 70.6% RMSE reduction (131 V vs. 448 V) globally and 66.7% in critical regimes, with near-zero bias, enabling physically interpretable design optimization while outperforming pure ML in extrapolation reliability.
comment: 6 Pages, 2 figures, IEEE Conference Template used
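As a rough illustration of the residual-correction idea described above, the sketch below combines the classical Cockcroft-Walton ripple formula with a Random Forest trained on the gap between simulated and analytical ripple. The dataset here is synthetic and the load-current proxy is an assumption; the paper's 324-case Simulink dataset is not reproduced.

```python
# Synthetic illustration only: classical CW ripple estimate plus a Random Forest
# trained on the residual between a "simulated" ripple and the analytical value.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def theoretical_ripple(i_load, n_stages, f, c):
    # Classical peak-to-peak ripple estimate: dV = I * n * (n + 1) / (2 * f * C)
    return i_load * n_stages * (n_stages + 1) / (2.0 * f * c)

rng = np.random.default_rng(1)
X = np.column_stack([
    rng.integers(2, 9, 500),        # number of stages
    rng.uniform(5e3, 25e3, 500),    # input voltage [V]
    rng.uniform(1e-6, 10e-6, 500),  # capacitance [F]
    rng.uniform(50, 500, 500),      # frequency [Hz]
    rng.uniform(6e6, 60e6, 500),    # load resistance [ohm]
])
i_load = X[:, 1] / X[:, 4]                               # crude load-current proxy
ripple_theory = theoretical_ripple(i_load, X[:, 0], X[:, 3], X[:, 2])
ripple_sim = ripple_theory * rng.uniform(1.1, 1.6, 500)  # stand-in for Simulink output

residual_model = RandomForestRegressor(n_estimators=200, random_state=0)
residual_model.fit(X, ripple_sim - ripple_theory)        # learn non-ideal effects
ripple_corrected = ripple_theory + residual_model.predict(X)
```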
Machine Learning of Temperature-dependent Chemical Kinetics Using Parallel Droplet Microreactors
Temperature is a fundamental regulator of chemical and biochemical kinetics, yet capturing nonlinear thermal effects directly from experimental data remains a major challenge due to limited throughput and model flexibility. Recent advances in machine learning have enabled flexible modeling beyond conventional physical laws, but most existing strategies remain confined to surrogate models of end-point yields rather than full kinetic dynamics. Consequently, an end-to-end framework that unifies systematic kinetic data acquisition with machine-learning-based modeling has been lacking. In this paper, we present a unified framework that integrates droplet microfluidics with machine learning for the systematic analysis of temperature-dependent reaction kinetics. The platform is specifically designed to enable stable immobilization and long-term time-lapse imaging of thousands of droplets under dynamic thermal gradients. This configuration yields massively parallel time-resolved datasets across diverse temperature conditions that capture transient kinetics and provides particularly suitable inputs for training machine-learning models of reaction dynamics. Leveraging these datasets, we train Neural ODE models, which embed neural networks within differential equations to flexibly represent nonlinear temperature dependencies beyond conventional formulations. We demonstrate accurate prediction of enzymatic kinetics across diverse thermal environments, highlighting the robustness and versatility of the approach. Our framework bridges high-throughput experimental data acquisition with data-driven modeling, establishing a versatile foundation for enhanced predictive ability and rational analysis and design of temperature-sensitive biochemical processes.
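A minimal sketch of the modeling step described above, assuming PyTorch and the torchdiffeq package are available: a neural right-hand side conditioned on temperature is integrated with an ODE solver and could be fit to droplet time-lapse data. The architecture and temperature scaling are illustrative, not the paper's model.

```python
# Sketch of a temperature-conditioned Neural ODE; all sizes are placeholders.
import torch
import torch.nn as nn
from torchdiffeq import odeint

class TemperatureKinetics(nn.Module):
    """dy/dt = f_theta(y, T): a neural right-hand side conditioned on temperature."""
    def __init__(self, n_species: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_species + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, n_species),
        )
        self.temperature = torch.tensor([298.15])  # set per droplet/trajectory

    def forward(self, t, y):
        temp = self.temperature.expand(y.shape[:-1] + (1,))
        return self.net(torch.cat([y, temp / 300.0], dim=-1))  # crude scaling

model = TemperatureKinetics(n_species=2)
y0 = torch.tensor([1.0, 0.0])               # initial concentrations
t = torch.linspace(0.0, 10.0, 50)
model.temperature = torch.tensor([310.15])  # one droplet's temperature
trajectory = odeint(model, y0, t)           # integrate the learned dynamics
# trajectory can be compared against time-lapse droplet data in an MSE loss.
```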
Mixed formulation and structure-preserving discretization of Cosserat rod dynamics in a port-Hamiltonian framework
An energy-based modeling framework for the nonlinear dynamics of spatial Cosserat rods undergoing large displacements and rotations is proposed. The mixed formulation features independent displacement, velocity and stress variables and is further objective and locking-free. Finite rotations are represented using a director formulation that avoids singularities and yields a constant mass matrix. This results in an infinite-dimensional nonlinear port-Hamiltonian (PH) system governed by partial differential-algebraic equations with a quadratic energy functional. Using a time-differentiated compliance form of the stress-strain relations allows for the imposition of kinematic constraints, such as inextensibility or shear-rigidity. A structure-preserving finite element discretization leads to a finite-dimensional system with PH structure, thus facilitating the design of an energy-momentum consistent integration scheme. Dissipative material behavior (via the generalized-Maxwell model) and non-standard actuation approaches (via pneumatic chambers or tendons) integrate naturally into the framework. As illustrated by selected numerical examples, the present framework establishes a new approach to energy-momentum consistent formulations in computational mechanics involving finite rotations.
comment: 37 pages, 16 figures, currently under review
Power feedback strategy based on efficiency trajectory analysis for HCPV sun tracking
This paper presents a control strategy for sun trackers which adapts continuously to different sources of error, avoiding the need for any kind of calibration by analyzing the produced electric power to sense the position of the Sun. The proposed strategy is able to meet the strict specifications for HCPV sun trackers despite mechanical uncertainties (misalignments in the structure itself, misalignment of the solar modules with respect to the wing, etc.) and installation uncertainties (misalignments of the platform with respect to geographical north). Experimental results with an industrial-grade solar tracker demonstrate the validity of the proposed control strategy under sunny and moderately cloudy conditions, as well as with different installation precisions obtained by deliberately un-calibrating the system.
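The sketch below conveys only the basic idea of closing the loop on produced electric power, using a simple perturb-and-observe correction as a stand-in for the paper's efficiency-trajectory analysis; `read_power` and `move_axes` are hypothetical hardware interfaces.

```python
# Conceptual sketch: perturb-and-observe pointing correction on top of an
# astronomical tracking model, driven purely by measured power.
import random

def read_power() -> float:
    return 1000.0 - random.random()  # placeholder power measurement [W]

def move_axes(d_azimuth: float, d_elevation: float) -> None:
    pass                             # placeholder actuator command

def power_feedback_step(step_deg: float = 0.05) -> None:
    """Perturb each axis and keep the direction that increases produced power."""
    for axis in ("azimuth", "elevation"):
        p0 = read_power()
        delta = (step_deg, 0.0) if axis == "azimuth" else (0.0, step_deg)
        move_axes(*delta)
        if read_power() < p0:                    # worse: step back twice as far
            move_axes(*(-2 * d for d in delta))
            if read_power() < p0:                # still worse: return to start
                move_axes(*delta)

for _ in range(10):                              # run a few correction cycles
    power_feedback_step()
```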
Stability Analysis of a B-Spline Deep Neural Operator for Nonlinear Systems
This paper investigates the stability properties of neural operators through the structured representation offered by the Hybrid B-spline Deep Neural Operator (HBDNO). While existing stability-aware architectures typically enforce restrictive constraints that limit universality, HBDNO preserves full expressive power by representing outputs via B-spline control points. We show that these control points form a natural observable for post-training stability analysis. By applying Dynamic Mode Decomposition and connecting the resulting discrete dynamics to the Koopman operator framework, we provide a principled approach to spectral characterization of learned operators. Numerical results demonstrate the ability to assess stability and reveal future directions for safety-critical applications.
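The post-training analysis step lends itself to a compact sketch: apply exact Dynamic Mode Decomposition to a sequence of B-spline control-point vectors and inspect the resulting spectrum. The control-point trajectory below is synthetic; in the paper it would come from rolling out the trained HBDNO.

```python
# Exact DMD over a trajectory of control-point vectors (synthetic data here).
import numpy as np

def dmd_eigenvalues(snapshots: np.ndarray, rank: int = 10) -> np.ndarray:
    """snapshots has shape (n_features, n_timesteps)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = min(rank, len(s))
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)  # reduced linear operator
    return np.linalg.eigvals(A_tilde)

rng = np.random.default_rng(2)
control_points = rng.standard_normal((40, 200))   # 40 control points, 200 time steps
eigs = dmd_eigenvalues(control_points)
print("stable" if np.all(np.abs(eigs) <= 1.0 + 1e-9) else "unstable modes present")
```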
Topology-based Conditions for Multiconsensus under the Signed Friedkin-Johnsen Model
In this paper, we address the multiconsensus problem in networked systems, where agents are partitioned into disjoint subgroups and the states of agents within a subgroup are driven to consensus. Our objective is to present a distributed control law that leads to multiconsensus in signed digraphs. To this end, we examine the convergence of opinions under the opposing rule-based signed Friedkin-Johnsen (SFJ) model and present conditions that lead to multiconsensus under this model. Interestingly, the proposed conditions depend only on graph topology and signed interactions and not on the edge weights of the network. Consequently, the proposed SFJ-based control law relaxes the in-degree balance and homogeneity of trust-distrust, frequently assumed in the literature. Finally, we add simulation results to demonstrate the proposed conditions for multiconsensus.
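For readers unfamiliar with the model, the sketch below simulates a generic signed Friedkin-Johnsen-type iteration x(k+1) = Lambda W x(k) + (I - Lambda) x(0), where negative entries of W follow the opposing rule. The matrices are toy examples; the paper's topology-based multiconsensus conditions and control law are not reproduced.

```python
# Generic signed Friedkin-Johnsen iteration on a 4-agent toy network.
import numpy as np

W = np.array([            # signed interactions; negative entries follow the opposing rule
    [0.5,  0.5,  0.0,  0.0],
    [0.5,  0.5,  0.0,  0.0],
    [0.0,  0.0,  0.5, -0.5],
    [0.0,  0.0, -0.5,  0.5],
])
Lam = np.diag([0.8, 0.8, 0.8, 0.8])   # susceptibility to neighbors
x0 = np.array([1.0, -1.0, 2.0, -2.0]) # initial opinions (prejudices)

x = x0.copy()
for _ in range(200):
    x = Lam @ W @ x + (np.eye(4) - Lam) @ x0
# Steady-state opinion profile; the paper's conditions determine when such a
# profile exhibits multiconsensus within the subgroups.
print(np.round(x, 3))
```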
Vision-Aided Relative State Estimation for Approach and Landing on a Moving Platform with Inertial Measurements
This paper tackles the problem of estimating the relative position, orientation, and velocity between a UAV and a planar platform undergoing arbitrary 3D motion during approach and landing. The estimation relies on measurements from Inertial Measurement Units (IMUs) mounted on both systems, assuming there is a suitable communication channel to exchange data, together with visual information provided by an onboard monocular camera, from which the bearing (line-of-sight direction) to the platform's center and the normal vector of its planar surface are extracted. We propose a cascade observer with a complementary filter on SO(3) to reconstruct the relative attitude, followed by a linear Riccati observer for relative position and velocity estimation. Convergence of both observers is established under persistently exciting conditions, and the cascade is shown to be almost globally asymptotically and locally exponentially stable. We further extend the design to the case where the platform's rotation is restricted to its normal axis and show that its measured linear acceleration can be exploited to recover the remaining unobservable rotation angle. A sufficient condition to ensure local exponential convergence in this setting is provided. The performance of the proposed observers is validated through extensive simulations.
comment: 13 pages, 4 figures. Submitted to IFAC World Congress 2026
Modular Landfill Remediation for AI Grid Resilience
Rising AI electricity demand and persistent landfill methane emissions constitute coupled constraints on U.S. digital infrastructure and decarbonization. While China has achieved a rapid 'de-landfilling' transition through centralized coordination, the U.S. remains structurally 'locked in' to landfilling due to fragmented governance and carbon accounting incentives. This paper proposes a modular legacy landfill remediation framework to address these dual challenges within U.S. institutional constraints. By treating legacy sites as stock resources, the proposed system integrates excavation, screening, and behind-the-meter combined heat and power (CHP) to transform environmental liabilities into resilience assets. A system analysis of a representative AI corridor demonstrates that such modules can mitigate site-level methane by 60-70% and recover urban land, while supplying approximately 20 MW of firm, islandable power. Although contributing only approximately 5% of a hyperscale data center's bulk load, it provides critical microgrid resilience and black-start capability. We conclude that remediation-oriented waste-to-energy should be valued not as a substitute for bulk renewables, but as a strategic control volume for buffering critical loads against grid volatility while resolving long-term environmental liabilities.
Semantic Communication for Rate-Limited Closed-Loop Distributed Communication-Sensing-Control Systems
The growing integration of distributed integrated sensing and communication (ISAC) with closed-loop control in intelligent networks demands efficient information transmission under stringent bandwidth constraints. To address this challenge, this paper proposes a unified framework for goal-oriented semantic communication in distributed communication-sensing-control (SCC) systems. Building upon Weaver's three-level model, we establish a hierarchical semantic formulation with three error levels (L1: observation reconstruction, L2: state estimation, and L3: control) to jointly optimize their corresponding objectives. Based on this formulation, we propose a unified goal-oriented semantic compression and rate adaptation framework that is applicable to different semantic error levels and optimization goals across the SCC loop. A rate-limited multi-sensor LQR system is used as a case study to validate the proposed framework. We employ a GRU-based autoencoder (AE) for semantic compression and a PPO-based rate adaptation algorithm that dynamically allocates transmission rates across sensors. Results show that the proposed framework effectively captures task-relevant semantics and adapts its resource allocation strategies across different semantic levels, thereby achieving level-specific performance gains under bandwidth constraints.
comment: 13 pages, 18 figures. This work has been submitted to the IEEE for possible publication
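As a rough sketch of the semantic-compression component, the code below implements a GRU-based autoencoder over sensor observation sequences; all dimensions are illustrative assumptions, and the PPO-based rate adaptation is not shown.

```python
# Minimal GRU autoencoder for per-step semantic compression (sizes are placeholders).
import torch
import torch.nn as nn

class GRUSemanticAE(nn.Module):
    def __init__(self, obs_dim=8, latent_dim=4, hidden=32):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden, batch_first=True)
        self.to_latent = nn.Linear(hidden, latent_dim)   # transmitted semantic code
        self.decoder = nn.GRU(latent_dim, hidden, batch_first=True)
        self.to_obs = nn.Linear(hidden, obs_dim)

    def forward(self, x):                                # x: (batch, time, obs_dim)
        h, _ = self.encoder(x)
        z = self.to_latent(h)                            # per-step code sent over the channel
        y, _ = self.decoder(z)
        return self.to_obs(y), z

model = GRUSemanticAE()
x = torch.randn(16, 20, 8)                               # 16 sequences of 20 steps
x_hat, z = model(x)
loss = nn.functional.mse_loss(x_hat, x)                  # L1-level (reconstruction) objective
loss.backward()
```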
Finite-sample guarantees for data-driven forward-backward operator methods
We establish finite-sample certificates on the quality of solutions produced by data-based forward-backward (FB) operator splitting schemes. As frequently happens in stochastic regimes, we consider the problem of finding a zero of the sum of two operators, where one is either unavailable in closed form or computationally expensive to evaluate, and must therefore be approximated using a finite number of noisy oracle samples. Under the lens of algorithmic stability, we then derive probabilistic bounds on the distance between a true zero and the FB output without making specific assumptions about the underlying data distribution. We show that under weaker conditions ensuring the convergence of FB schemes, stability bounds grow proportionally to the number of iterations. Conversely, stronger assumptions yield stability guarantees that are independent of the iteration count. We then specialize our results to a popular FB stochastic Nash equilibrium seeking algorithm and validate our theoretical bounds on a control problem for smart grids, where the energy price uncertainty is approximated by means of historical data.
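A minimal sketch of a data-based forward-backward iteration of the kind studied above, under the assumption that the forward operator is the gradient of a quadratic accessible only through noisy samples and the backward step is the l1 proximal operator (soft-thresholding); problem sizes and noise levels are arbitrary.

```python
# Data-based forward-backward splitting for a zero of A + B, with A sampled.
import numpy as np

rng = np.random.default_rng(3)
n, gamma, lam, n_samples = 20, 0.1, 0.05, 200
Q = np.eye(n)                                    # f(x) = 0.5 * x'Qx - b'x, so A(x) = Qx - b
b = rng.standard_normal(n)

def sampled_forward(x):
    """Noisy oracle for A(x), averaged over a finite batch of samples."""
    noise = rng.standard_normal((n_samples, n)) * 0.1
    return (x @ Q - b) + noise.mean(axis=0)

def backward(v):
    """prox of gamma * lam * ||.||_1: soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

x = np.zeros(n)
for _ in range(500):
    x = backward(x - gamma * sampled_forward(x))
print(np.linalg.norm(x @ Q - b, ord=np.inf))     # small residual up to the sampling error
```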
Optical design and characterization of a multi-depth vision simulator
We present a vision simulator device (Katsim), a compact near-eye optical display designed for assessing postoperative corrected vision, preoperative intraocular lens (IOL) assessment, and objective IOL characterization. The system forms a virtual image using an amplitude-modulated LCoS spatial light modulator (AM-SLM), RGB LED illumination, and a high-speed varifocal lens. In the proposed architecture, the LED illumination and varifocal lens diopter changes are triggered in synchrony with the SLM RGB subframes, rendering three depth planes perceptually simultaneously via high-frequency time-multiplexing. Operating at 60 frames per second (fps), the system achieves an effective 180 Hz depth-coded cycle, enabling sharp, multi-depth rendering within a dynamically adjustable depth range from 0.2 m to optical infinity. The system's eyebox is configurable from 1 to 5 mm, while maintaining a fixed spatial location and preserving angular magnification regardless of changes in focus or eyebox size. The designed system features a 9.15-degree field of view. An integrated infrared pupil-tracking module detects non-cataractous regions of the cataractous crystalline lens, and the projected imagery is mechanically steered through those clear zones in real time. The proposed vision simulator supports both subjective simulation of post-surgical vision for patient-specific counseling and objective optical evaluation of IOLs, including resolution and contrast fidelity (e.g., modulation transfer function, contrast transfer function, and defocus curves). By decoupling depth modulation from eyebox position and size, the system offers a modular, portable platform that supports enhanced preoperative planning, personalized IOL selection, objective IOL characterization, and use as a novel research vision tool.
comment: 16 pages, 8 figures. Preprint submitted to Biomedical Optics Express journal
Rapid stabilization of the heat equation with localized disturbance
This paper studies the rapid stabilization of a multidimensional heat equation in the presence of an unknown spatially localized disturbance. A novel multivalued feedback control strategy is proposed, which synthesizes the frequency Lyapunov method (introduced by Xiang [41]) with the sign multivalued operator. This methodology connects Lyapunov-based stability analysis with spectral inequalities, while the inclusion of the sign operator ensures robustness against the disturbance. The closed-loop system is governed by a differential inclusion, for which well-posedness is proved via the theory of maximal monotone operators. This approach not only guarantees exponential stabilization but also circumvents the need for explicit disturbance modeling or estimation.
OPBO: Order-Preserving Bayesian Optimization
Bayesian optimization is an effective method for solving expensive black-box optimization problems. Most existing methods use Gaussian processes (GP) as the surrogate model for approximating the black-box objective function, but it is well known that GPs can fail in high-dimensional spaces (e.g., dimension over 500). We argue that the reliance of GPs on precise numerical fitting is fundamentally ill-suited to high-dimensional spaces, where it leads to prohibitive computational complexity. To address this, we propose a simple order-preserving Bayesian optimization (OPBO) method, in which the surrogate model preserves the order, instead of the value, of the black-box objective function. We can then use a simple but effective order-preserving (OP) neural network (NN) to replace the GP as the surrogate model. Moreover, instead of searching for the best solution from the acquisition model, we select good-enough solutions in the ordinal set to reduce computational cost. The experimental results show that for high-dimensional (over 500) black-box optimization problems, the proposed OPBO significantly outperforms traditional BO methods based on regression NNs and GPs. The source code is available at https://github.com/pengwei222/OPBO.
comment: 13 pages
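The order-preserving surrogate can be sketched with a pairwise ranking loss: the network is trained so that its outputs respect the observed ordering of evaluated points rather than their values. This is a conceptual reading, not the released implementation (see the repository above); the architecture and margin are assumptions.

```python
# Order-preserving surrogate trained with a pairwise hinge loss (smaller score = better).
import torch
import torch.nn as nn

class OrderSurrogate(nn.Module):
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def pairwise_order_loss(scores, values, margin=0.1):
    """Penalize pairs whose predicted order disagrees with the observed order."""
    s_i, s_j = scores[:, None], scores[None, :]
    v_i, v_j = values[:, None], values[None, :]
    better = (v_i < v_j).float()                  # minimization: smaller value is better
    return (better * torch.relu(margin - (s_j - s_i))).mean()

dim = 600                                         # high-dimensional design space
X = torch.randn(256, dim)
y = (X ** 2).sum(dim=1)                           # stand-in black-box evaluations
model = OrderSurrogate(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = pairwise_order_loss(model(X), y)
    loss.backward()
    opt.step()
```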
A Time-efficient Prioritised Scheduling Algorithm to Optimise Initial Flock Formation of Drones
Drone applications continue to expand across various domains, with flocking offering enhanced cooperative capabilities but introducing significant challenges during initial formation. Existing flocking algorithms often struggle with efficiency and scalability, particularly when potential collisions force drones into suboptimal trajectories. This paper presents a time-efficient prioritised scheduling algorithm that improves the initial formation process of drone flocks. The method assigns each drone a priority based on its number of potential collisions and its likelihood of reaching its target position without permanently obstructing other drones. Using this hierarchy, each drone computes an appropriate delay to ensure a collision-free path. Simulation results show that the proposed algorithm successfully generates collision-free trajectories for flocks of up to 5000 drones and outperforms the coupling-degree-based heuristic prioritised planning method (CDH-PP) in both performance and computational efficiency.
comment: 35 pages
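A toy sketch of the prioritisation idea: count potential conflicts between straight-line paths, order drones by conflict count, and stagger their departures. The conflict test and delay rule below are simplified placeholders, not the paper's algorithm.

```python
# Toy priority assignment from pairwise path-conflict counts.
import numpy as np

rng = np.random.default_rng(4)
n, safe_dist = 30, 1.5
starts = rng.uniform(0, 50, (n, 3))
targets = rng.uniform(0, 50, (n, 3))

def paths_conflict(i, j, steps=100):
    """Sample both straight-line paths and check the closest approach."""
    t = np.linspace(0, 1, steps)[:, None]
    pi = starts[i] + t * (targets[i] - starts[i])
    pj = starts[j] + t * (targets[j] - starts[j])
    return np.min(np.linalg.norm(pi - pj, axis=1)) < safe_dist

conflicts = np.zeros(n, dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        if paths_conflict(i, j):
            conflicts[i] += 1
            conflicts[j] += 1

priority = np.argsort(-conflicts)                 # most-conflicted drones scheduled first
delays = {int(d): rank * 0.5 for rank, d in enumerate(priority)}  # simple staggering [s]
print(delays)
```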
A Class of Axis-Angle Attitude Control Laws for Rotational Systems
We introduce a new class of attitude control laws for rotational systems, which generalizes the use of the Euler axis-angle representation beyond quaternion-based formulations. Using basic Lyapunov stability theory and the notion of extended $K_{\infty}$ functions, we develop a method for determining and enforcing the global asymptotic stability of the single fixed point of the resulting closed-loop (CL) scheme. In contrast with traditional quaternion-based methods, the proposed generalized axis-angle approach enables greater flexibility in the design of the control law, which is of great utility when employed in combination with a switching scheme whose transition state depends on the angular velocity of the controlled rotational system. Through simulation and real-time experimental results, we demonstrate the effectiveness of the proposed approach. According to the recorded data, in the execution of high-speed tumble-recovery maneuvers, the new method consistently achieves shorter stabilization times and requires lower control effort than the quaternion-based and geometric-control methods used as benchmarks.
comment: 6 pages, 4 figures. Accepted for publication in IEEE Control Systems Letters
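The general shape of an axis-angle law can be sketched as follows: extract the Euler axis and angle from the attitude error matrix and apply a proportional-derivative-like torque. The extended K-infinity gain shaping and the velocity-dependent switching scheme of the paper are omitted; gains are arbitrary.

```python
# Basic axis-angle attitude law sketch (non-singular case, angle in [0, pi)).
import numpy as np

def axis_angle_from_rotation(R_err: np.ndarray):
    """Return (axis, angle) of a rotation matrix."""
    angle = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    if angle < 1e-8:
        return np.zeros(3), 0.0
    axis = np.array([R_err[2, 1] - R_err[1, 2],
                     R_err[0, 2] - R_err[2, 0],
                     R_err[1, 0] - R_err[0, 1]]) / (2.0 * np.sin(angle))
    return axis, angle

def control_torque(R, R_des, omega, k_p=2.0, k_d=0.8):
    axis, angle = axis_angle_from_rotation(R_des.T @ R)
    return -k_p * angle * axis - k_d * omega     # drive both error angle and rate to zero

# Example: 90-degree error about the z-axis with some residual body rate
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(control_torque(R, np.eye(3), omega=np.array([0.1, 0.0, -0.2])))
```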
An overview of systems-theoretic guarantees in data-driven model predictive control
The development of control methods based on data has seen a surge of interest in recent years. When applying data-driven controllers in real-world applications, providing theoretical guarantees for the closed-loop system is of crucial importance to ensure reliable operation. In this review, we provide an overview of data-driven model predictive control (MPC) methods for controlling unknown systems with guarantees on systems-theoretic properties such as stability, robustness, and constraint satisfaction. The considered approaches rely on the Fundamental Lemma from behavioral theory in order to predict input-output trajectories directly from data. We cover various setups, ranging from linear systems and noise-free data to more realistic formulations with noise and nonlinearities, and we provide an overview of different techniques to ensure guarantees for the closed-loop system. Moreover, we discuss avenues for future research that may further improve the theoretical understanding and practical applicability of data-driven MPC.
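The data structure underlying these methods is easy to illustrate: a Hankel matrix built from a measured trajectory, whose full row rank (persistency of excitation) is the condition the Fundamental Lemma requires. The snippet below uses a synthetic scalar input signal and an assumed prediction depth.

```python
# Hankel matrix of a scalar input signal and a persistency-of-excitation check.
import numpy as np

def hankel(signal: np.ndarray, depth: int) -> np.ndarray:
    """Stack overlapping windows of length `depth` as columns."""
    T = len(signal)
    return np.column_stack([signal[i:i + depth] for i in range(T - depth + 1)])

rng = np.random.default_rng(5)
T, L = 100, 10
u = rng.standard_normal(T)                     # candidate persistently exciting input
H_u = hankel(u, L)
print(H_u.shape, np.linalg.matrix_rank(H_u))   # full row rank <=> persistency of excitation of order L
```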
On Zero-sum Game Representation for Replicator Dynamics
Replicator dynamics have been widely used in evolutionary game theory to model how strategy frequencies evolve over time in large populations. The so-called payoff matrix encodes the pairwise fitness that each strategy obtains when interacting with every other strategy, and it solely determines the replicator dynamics. If the payoff matrix is unknown, we show in this paper that it cannot be inferred from observed strategy frequencies alone -- distinct payoff matrices can induce the same replicator dynamics. We thus look for a canonical representative of the payoff matrix in the equivalence class. The main result of the paper is to show that for every polynomial replicator dynamics (i.e., the vector field is a polynomial), there always exists a skew-symmetric, polynomial payoff matrix that can induce the given dynamics.
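As a small numerical illustration, the sketch below rolls out the replicator dynamics dx_i/dt = x_i((Ax)_i - x^T A x) for a skew-symmetric (zero-sum) payoff matrix, here standard rock-paper-scissors; the integrator and payoff matrix are generic choices, not taken from the paper.

```python
# Replicator dynamics induced by a skew-symmetric (zero-sum) payoff matrix.
import numpy as np

A = np.array([[0.0, -1.0,  1.0],
              [1.0,  0.0, -1.0],
              [-1.0, 1.0,  0.0]])   # rock-paper-scissors payoffs

def replicator_rhs(x):
    fitness = A @ x
    return x * (fitness - x @ fitness)   # x @ A @ x = 0 for skew-symmetric A

x = np.array([0.5, 0.3, 0.2])
dt = 0.01
for _ in range(5000):                    # forward-Euler roll-out
    x = x + dt * replicator_rhs(x)
    x = np.clip(x, 0.0, None)
    x /= x.sum()                         # keep x on the simplex despite discretization error
print(np.round(x, 3))
```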
Configuration-Constrained Tube MPC for Periodic Operation
Periodic operation often emerges as the economically optimal mode in industrial processes, particularly under varying economic or environmental conditions. This paper proposes a robust model predictive control (MPC) framework for uncertain systems modeled as polytopic linear differential inclusions (LDIs), where the dynamics evolve as convex combinations of finitely many affine control systems with additive disturbances. The robust control problem is reformulated as a convex optimization program by optimizing over configuration-constrained polytopic tubes and tracks a periodic trajectory that is optimal for a given economic criterion. Artificial variables embedded in the formulation ensure recursive feasibility and robust constraint satisfaction when the economic criterion is updated online, while guaranteeing convergence to the corresponding optimal periodic tube when the criterion remains constant. To improve computational efficiency, we introduce a quadratic over-approximation of the periodic cost under a Lipschitz continuity assumption, yielding a Quadratic Program (QP) formulation that preserves the above theoretical guarantees. The effectiveness and scalability of the approach are demonstrated on a benchmark example and a ball-plate system with eight states.
comment: 11 pages, 3 figures, submitted to IEEE-TACON
Deep Reinforcement Learning Optimization for Uncertain Nonlinear Systems via Event-Triggered Robust Adaptive Dynamic Programming
This work proposes a unified control architecture that couples a Reinforcement Learning (RL)-driven controller with a disturbance-rejection Extended State Observer (ESO), complemented by an Event-Triggered Mechanism (ETM) to limit unnecessary computations. The ESO is utilized to estimate the system states and the lumped disturbance in real time, forming the foundation for effective disturbance compensation. To obtain near-optimal behavior without an accurate system description, a value-iteration-based Adaptive Dynamic Programming (ADP) method is adopted for policy approximation. The inclusion of the ETM ensures that parameter updates of the learning module are executed only when the state deviation surpasses a predefined bound, thereby preventing excessive learning activity and substantially reducing computational load. A Lyapunov-oriented analysis is used to characterize the stability properties of the resulting closed-loop system. Numerical experiments further confirm that the developed approach maintains strong control performance and disturbance tolerance, while achieving a significant reduction in sampling and processing effort compared with standard time-triggered ADP schemes.
comment: we have identified some technical issues, including the mathematical derivation. After discussion, all authors have agreed that the analysis requires a thorough re-derivation to ensure correctness and rigor
A Riemannian Optimization Perspective of the Gauss-Newton Method for Feedforward Neural Networks
In this work, we establish non-asymptotic convergence bounds for the Gauss-Newton method in training neural networks with smooth activations. In the underparameterized regime, the Gauss-Newton gradient flow in parameter space induces a Riemannian gradient flow on a low-dimensional embedded submanifold of the function space. Using tools from Riemannian optimization, we establish geodesic Polyak-Lojasiewicz and Lipschitz-smoothness conditions for the loss under appropriately chosen output scaling, yielding geometric convergence to the optimal in-class predictor at an explicit rate independent of the conditioning of the Gram matrix. In the overparameterized regime, we propose adaptive, curvature-aware regularization schedules that ensure fast geometric convergence to a global optimum at a rate independent of the minimum eigenvalue of the neural tangent kernel and, locally, of the modulus of strong convexity of the loss. These results demonstrate that Gauss-Newton achieves accelerated convergence rates in settings where first-order methods exhibit slow convergence due to ill-conditioned kernel matrices and loss landscapes.
Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation
Recent algorithms allow decentralised agents, possibly connected via a communication network, to learn equilibria in mean-field games from a non-episodic run of the empirical system. However, these algorithms are for tabular settings: this computationally limits the size of agents' observation space, meaning the algorithms cannot handle anything but small state spaces, nor generalise beyond policies depending only on the agent's local state to so-called 'population-dependent' policies. We address this limitation by introducing function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method that has previously been employed only in finite-horizon, episodic, centralised settings. While this permits us to include the mean field in the observation for players' policies, it is unrealistic to assume decentralised agents have access to this global information: we therefore also provide new algorithms allowing agents to locally estimate the global empirical distribution, and to improve this estimate via inter-agent communication. We prove theoretically that exchanging policy information helps networked agents outperform both independent and even centralised agents in function-approximation settings. Our experiments demonstrate this happening empirically, and show that the communication network allows decentralised agents to estimate the mean field for population-dependent policies.
Networked Communication for Decentralised Agents in Mean-Field Games
Methods like multi-agent reinforcement learning struggle to scale with growing population size. Mean-field games (MFGs) are a game-theoretic approach that can circumvent this by finding a solution for an abstract infinite population, which can then be used as an approximate solution for the $N$-agent problem. However, classical mean-field algorithms usually only work under restrictive conditions. We take steps to address this by introducing networked communication to MFGs, in particular to settings that use a single, non-episodic run of $N$ decentralised agents to simulate the infinite population, as is likely to be most reasonable in real-world deployments. We prove that our architecture's sample guarantees lie between those of earlier theoretical algorithms for the centralised- and independent-learning architectures, varying dependent on network structure and the number of communication rounds. However, the sample guarantees of the three theoretical algorithms do not actually result in practical convergence times. We thus contribute practical enhancements to all three algorithms allowing us to present their first empirical demonstrations. We then show that in practical settings where the theoretical hyperparameters are not observed, giving fewer loops but poorer estimation of the Q-function, our communication scheme still respects the earlier theoretical analysis: it considerably accelerates learning over the independent case, which hardly seems to learn at all, and often performs similarly to the centralised case, while removing the restrictive assumption of the latter. We provide ablations and additional studies showing that our networked approach also has advantages over both alternatives in terms of robustness to update failures and to changes in population size.
Neural Controller for Incremental Stability of Unknown Continuous-time Systems
This work primarily focuses on synthesizing a controller that guarantees an unknown continuous-time system to be incrementally input-to-state stable ($\delta$-ISS). In this context, the notion of a $\delta$-ISS control Lyapunov function ($\delta$-ISS-CLF) for the continuous-time system is introduced. Combined with the controller, the $\delta$-ISS-CLF guarantees that the system is incrementally stable. As the paper deals with unknown dynamical systems, both the controller and the $\delta$-ISS-CLF are parametrized using neural networks. The data set used to train the neural networks is generated by properly sampling the state space of the system. To give a formal guarantee that the controller renders the system incrementally stable, we develop a validity condition under Lipschitz continuity assumptions and incorporate this condition into the training framework to ensure a provable correctness guarantee at the end of the training process. Finally, we demonstrate the effectiveness of the proposed approach through several case studies: a scalar system with a non-affine, non-polynomial structure, a one-link manipulator system, a nonlinear Moore-Greitzer model of a jet engine, a magnetic levitator system, and a rotating rigid spacecraft model.
comment: arXiv admin note: substantial text overlap with arXiv:2503.04129
GUIDEd Agents: Enhancing Navigation Policies through Task-Specific Uncertainty Abstraction in Localization-Limited Environments
Autonomous vehicles performing navigation tasks in complex environments face significant challenges due to uncertainty in state estimation. In many scenarios, such as stealth operations or resource-constrained settings, accessing high-precision localization comes at a significant cost, forcing robots to rely primarily on less precise state estimates. Our key observation is that different tasks require varying levels of precision in different regions: a robot navigating a crowded space might need precise localization near obstacles but can operate effectively with less precision elsewhere. In this paper, we present a planning method for integrating task-specific uncertainty requirements directly into navigation policies. We introduce Task-Specific Uncertainty Maps (TSUMs), which abstract the acceptable levels of state estimation uncertainty across different regions. TSUMs align task requirements and environmental features using a shared representation space, generated via a domain-adapted encoder. Using TSUMs, we propose Generalized Uncertainty Integration for Decision-Making and Execution (GUIDE), a policy conditioning framework that incorporates these uncertainty requirements into robot decision-making. We find that TSUMs provide an effective way to abstract task-specific uncertainty requirements, and conditioning policies on TSUMs enables the robot to reason about the context-dependent value of certainty and adapt its behavior accordingly. We show how integrating GUIDE into reinforcement learning frameworks allows the agent to learn navigation policies that effectively balance task completion and uncertainty management without explicit reward engineering. We evaluate GUIDE on various real-world robotic navigation tasks and find that it demonstrates significant improvement in task completion rates compared to baseline methods that do not explicitly consider task-specific uncertainty.
comment: Accepted for publication at RAL (Robotics and automation letters). Updated with the final version
JITServe: SLO-aware LLM Serving with Imprecise Request Information
The integration of Large Language Models (LLMs) into applications ranging from interactive chatbots to multi-agent systems has introduced a wide spectrum of service-level objectives (SLOs) for responsiveness. These include latency-sensitive requests emphasizing per-token latency in streaming chat, deadline-sensitive requests requiring rapid full responses to trigger external tools, and compound requests with evolving dependencies across multiple LLM calls. Despite, or perhaps because of, this workload diversity and unpredictable request information (e.g., response lengths and dependencies), existing request schedulers have focused on aggregate performance and are unable to meet application-level SLO needs. This paper presents JITServe, the first SLO-aware LLM serving system designed to maximize service goodput (e.g., the number of tokens meeting request SLOs) across diverse workloads. JITServe schedules requests using imprecise request information and gradually relaxes this conservatism by refining request information estimates as generation progresses. It applies a grouped margin goodput maximization algorithm to allocate just enough serving bandwidth to satisfy each request's SLO just-in-time (JIT), maximizing residual capacity for others, while deciding the composition of requests in a batch to maximize efficiency and goodput with provable guarantees. Our evaluation across diverse realistic workloads, including chat, deep research, and agentic pipelines, shows that JITServe improves service goodput by 1.4x-6.3x, alternatively achieving 28.5%-83.2% resource savings, compared to state-of-the-art designs.
Physics-Informed Dynamical Modeling of Extrusion-Based 3D Printing Processes
The trade-off between model fidelity and computational cost remains a central challenge in the computational modeling of extrusion-based 3D printing, particularly for real-time optimization and control. Although high-fidelity simulations have advanced considerably for offline analysis, dynamical modeling tailored for online, control-oriented applications is still significantly underdeveloped. In this study, we propose a reduced-order dynamical flow model that captures the transient behavior of extrusion-based 3D printing. The model is grounded in physics-based principles derived from the Navier-Stokes equations and further simplified through spatial averaging and input-dependent parameterization. To assess its performance, the model is identified via a nonlinear least-squares approach using Computational Fluid Dynamics (CFD) simulation data spanning a range of printing conditions and subsequently validated across multiple combinations of training and testing scenarios. The results demonstrate strong agreement with the CFD data within the nozzle, the nozzle-substrate gap, and the deposited-layer regions. Overall, the proposed reduced-order model successfully captures the dominant flow dynamics of the process while maintaining a level of simplicity compatible with real-time control and optimization.
comment: 19 pages, 12 figures
Robotics
Optimizing Robotic Placement via Grasp-Dependent Feasibility Prediction
In this paper, we study whether inexpensive, physics-free supervision can reliably prioritize grasp-place candidates for budget-aware pick-and-place. From an object's initial pose, target pose, and a candidate grasp, we generate two path-aware geometric labels: path-wise inverse kinematics (IK) feasibility across a fixed approach-grasp-lift waypoint template, and a transit collision flag from mesh sweeps along the same template. A compact dual-output MLP learns these signals from pose encodings, and at test time its scores rank precomputed candidates for a rank-then-plan policy under the same IK gate and planner as the baseline. Although learned from cheap labels only, the scores transfer to physics-enabled executed trajectories: at a fixed planning budget the policy finds successful paths sooner with fewer planner calls while keeping final success on par or better. This work targets a single rigid cuboid with side-face grasps and a fixed waypoint template, and we outline extensions to varied objects and richer waypoint schemes.
comment: Accepted in ACRA 2025
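A compact dual-output MLP of the kind described above might look as follows; the input encoding size, labels, and training data are placeholders, and the ranking rule in the final comment is only indicative.

```python
# Dual-head MLP scoring IK feasibility and transit collision from pose/grasp encodings.
import torch
import torch.nn as nn

class FeasibilityMLP(nn.Module):
    def __init__(self, in_dim=21, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
        self.ik_head = nn.Linear(hidden, 1)         # path-wise IK feasibility logit
        self.collision_head = nn.Linear(hidden, 1)  # transit-collision logit

    def forward(self, x):
        h = self.backbone(x)
        return self.ik_head(h), self.collision_head(h)

model = FeasibilityMLP()
x = torch.randn(64, 21)                      # encoded (initial pose, target pose, grasp)
ik_logit, col_logit = model(x)
labels_ik = torch.randint(0, 2, (64, 1)).float()
labels_col = torch.randint(0, 2, (64, 1)).float()
loss = nn.functional.binary_cross_entropy_with_logits(ik_logit, labels_ik) \
     + nn.functional.binary_cross_entropy_with_logits(col_logit, labels_col)
loss.backward()
# At test time, candidates could be ranked by combining the two scores before planning.
```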
Construction and deformation of P-hedra using control polylines
At the 19th International Symposium on Advances in Robot Kinematics the author introduced a novel class of continuous flexible discrete surfaces and mentioned that these so-called P-hedra (or P-nets) allow direct access to their spatial shapes via three control polylines. In this follow-up paper we study this intuitive method, which makes these flexible planar quad surfaces suitable for transformable design tasks by means of interactive tools. The construction of P-hedra from the control polylines can also be used for an efficient algorithmic computation of their isometric deformations. In addition, we discuss flexion limits, bifurcation configurations, developable/flat-foldable patterns, and tubular P-hedra.
comment: 8 pages, 4 figures
InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement
Model-based reinforcement learning (MBRL) can reduce interaction cost for autonomous driving by learning a predictive world model, but it typically still depends on task-specific rewards that are difficult to design and often brittle under distribution shift. This paper presents InDRiVE, a DreamerV3-style MBRL agent that performs reward-free pretraining in CARLA using only intrinsic motivation derived from latent ensemble disagreement. Disagreement acts as a proxy for epistemic uncertainty and drives the agent toward under-explored driving situations, while an imagination-based actor-critic learns a planner-free exploration policy directly from the learned world model. After intrinsic pretraining, we evaluate zero-shot transfer by freezing all parameters and deploying the pretrained exploration policy in unseen towns and routes. We then study few-shot adaptation by training a task policy with limited extrinsic feedback for downstream objectives (lane following and collision avoidance). Experiments in CARLA across towns, routes, and traffic densities show that disagreement-based pretraining yields stronger zero-shot robustness and robust few-shot collision avoidance under town shift and matched interaction budgets, supporting the use of intrinsic disagreement as a practical reward-free pretraining signal for reusable driving world models.
Multimodal Classification Network Guided Trajectory Planning for Four-Wheel Independent Steering Autonomous Parking Considering Obstacle Attributes
Four-wheel Independent Steering (4WIS) vehicles have attracted increasing attention for their superior maneuverability. Human drivers typically choose to cross or drive over the low-profile obstacles (e.g., plastic bags) to efficiently navigate through narrow spaces, while existing planners neglect obstacle attributes, causing inefficiency or path-finding failures. To address this, we propose a trajectory planning framework integrating the 4WIS hybrid A* and Optimal Control Problem (OCP), in which the hybrid A* provides an initial path to enhance the OCP solution. Specifically, a multimodal classification network is introduced to assess scene complexity (hard/easy task) by fusing image and vehicle state data. For hard tasks, guided points are set to decompose complex tasks into local subtasks, improving the search efficiency of 4WIS hybrid A*. The multiple steering modes of 4WIS vehicles (Ackermann, diagonal, and zero-turn) are also incorporated into node expansion and heuristic designs. Moreover, a hierarchical obstacle handling strategy is designed to guide the node expansion considering obstacle attributes, i.e., 'non-traversable', 'crossable', and 'drive-over' obstacles. It allows crossing or driving over obstacles instead of the 'avoid-only' strategy, greatly enhancing success rates of pathfinding. We also design a logical constraint for the 'drive-over' obstacle by limiting its velocity to ensure safety. Furthermore, to address dynamic obstacles with motion uncertainty, we introduce a probabilistic risk field model, constructing risk-aware driving corridors that serve as linear collision constraints in OCP. Experimental results demonstrate the proposed framework's effectiveness in generating safe, efficient, and smooth trajectories for 4WIS vehicles, especially in constrained environments.
DSO-VSA: a Variable Stiffness Actuator with Decoupled Stiffness and Output Characteristics for Rehabilitation Robotics
Stroke-induced motor impairment often results in substantial loss of upper-limb function, creating a strong demand for rehabilitation robots that enable safe and transparent physical human-robot interaction (pHRI). Variable stiffness actuators are well suited for such applications. However, in most existing designs, stiffness is coupled with the deflection angle, complicating both modeling and control. To address this limitation, this paper presents a variable stiffness actuator featuring decoupled stiffness and output behavior for rehabilitation robotics. The system integrates a variable stiffness mechanism that combines a variable-length lever with a hypocycloidal straight-line mechanism to achieve a linear torque-deflection relationship and continuous stiffness modulation from near zero to theoretically infinite. It also incorporates a differential transmission mechanism based on a planetary gear system that enables dual-motor load sharing. A cascade PI controller is further developed on the basis of the differential configuration, in which the position-loop term jointly regulates stiffness and deflection angle, effectively suppressing stiffness fluctuations and output disturbances. The performance of the prototype was experimentally validated through stiffness calibration, stiffness regulation, torque control, decoupled characteristics, and dual-motor load sharing, indicating the potential for rehabilitation exoskeletons and other pHRI systems.
CauTraj: A Causal-Knowledge-Guided Framework for Lane-Changing Trajectory Planning of Autonomous Vehicles
Enhancing the performance of trajectory planners for lane-changing vehicles is one of the key challenges in autonomous driving within human-machine mixed traffic. Most existing studies have not incorporated human drivers' prior knowledge when designing trajectory planning models. To address this issue, this study proposes a novel trajectory planning framework that integrates causal prior knowledge into the control process. Both longitudinal and lateral microscopic behaviors of vehicles are modeled to quantify interaction risk, and a staged causal graph is constructed to capture causal dependencies in lane-changing scenarios. Causal effects between the lane-changing vehicle and surrounding vehicles are then estimated using causal inference, including average causal effects (ATE) and conditional average treatment effects (CATE). These causal priors are embedded into a model predictive control (MPC) framework to enhance trajectory planning. The proposed approach is validated on naturalistic vehicle trajectory datasets. Experimental results show that: (1) causal inference provides interpretable and stable quantification of vehicle interactions; (2) individual causal effects reveal driver heterogeneity; and (3) compared with the baseline MPC, the proposed method achieves a closer alignment with human driving behaviors, reducing maximum trajectory deviation from 1.2 m to 0.2 m, lateral velocity fluctuation by 60%, and yaw angle variability by 50%. These findings provide methodological support for human-like trajectory planning and practical value for improving safety, stability, and realism in autonomous vehicle testing and traffic simulation platforms.
Offline Reinforcement Learning for End-to-End Autonomous Driving
End-to-end (E2E) autonomous driving models that take only camera images as input and directly predict a future trajectory are appealing for their computational efficiency and potential for improved generalization via unified optimization; however, persistent failure modes remain due to reliance on imitation learning (IL). While online reinforcement learning (RL) could mitigate IL-induced issues, the computational burden of neural rendering-based simulation and large E2E networks renders iterative reward and hyperparameter tuning costly. We introduce a camera-only E2E offline RL framework that performs no additional exploration and trains solely on a fixed simulator dataset. Offline RL offers strong data efficiency and rapid experimental iteration, yet is susceptible to instability from overestimation on out-of-distribution (OOD) actions. To address this, we construct pseudo ground-truth trajectories from expert driving logs and use them as a behavior regularization signal, suppressing imitation of unsafe or suboptimal behavior while stabilizing value learning. Training and closed-loop evaluation are conducted in a neural rendering environment learned from the public nuScenes dataset. Empirically, the proposed method achieves substantial improvements in collision rate and route completion compared with IL baselines. Our code will be available at [URL].
comment: 15 pages
Geometric-Photometric Event-based 3D Gaussian Ray Tracing
Event cameras offer a high temporal resolution over traditional frame-based cameras, which makes them suitable for motion and structure estimation. However, it has been unclear how event-based 3D Gaussian Splatting (3DGS) approaches could leverage fine-grained temporal information of sparse events. This work proposes a framework to address the trade-off between accuracy and temporal resolution in event-based 3DGS. Our key idea is to decouple the rendering into two branches: event-by-event geometry (depth) rendering and snapshot-based radiance (intensity) rendering, by using ray-tracing and the image of warped events. The extensive evaluation shows that our method achieves state-of-the-art performance on the real-world datasets and competitive performance on the synthetic dataset. Also, the proposed method works without prior information (e.g., pretrained image reconstruction models) or COLMAP-based initialization, is more flexible in the event selection number, and achieves sharp reconstruction on scene edges with fast training time. We hope that this work deepens our understanding of the sparse nature of events for 3D reconstruction. The code will be released.
comment: 15 pages, 10 figures, 5 tables
ChronoDreamer: Action-Conditioned World Model as an Online Simulator for Robotic Planning
We present ChronoDreamer, an action-conditioned world model for contact-rich robotic manipulation. Given a history of egocentric RGB frames, contact maps, actions, and joint states, ChronoDreamer predicts future video frames, contact distributions, and joint angles via a spatial-temporal transformer trained with MaskGIT-style masked prediction. Contact is encoded as depth-weighted Gaussian splat images that render 3D forces into a camera-aligned format suitable for vision backbones. At inference, predicted rollouts are evaluated by a vision-language model that reasons about collision likelihood, enabling rejection sampling of unsafe actions before execution. We train and evaluate on DreamerBench, a simulation dataset generated with Project Chrono that provides synchronized RGB, contact splat, proprioception, and physics annotations across rigid and deformable object scenarios. Qualitative results demonstrate that the model preserves spatial coherence during non-contact motion and generates plausible contact predictions, while the LLM-based judge distinguishes collision from non-collision trajectories.
SD2AIL: Adversarial Imitation Learning from Synthetic Demonstrations via Diffusion Models
Adversarial Imitation Learning (AIL) is a dominant framework in imitation learning that infers rewards from expert demonstrations to guide policy optimization. Although providing more expert demonstrations typically leads to improved performance and greater stability, collecting such demonstrations can be challenging in certain scenarios. Inspired by the success of diffusion models in data generation, we propose SD2AIL, which utilizes synthetic demonstrations via diffusion models. We first employ a diffusion model in the discriminator to generate synthetic demonstrations as pseudo-expert data that augment the expert demonstrations. To selectively replay the most valuable demonstrations from the large pool of (pseudo-) expert demonstrations, we further introduce a prioritized expert demonstration replay strategy (PEDR). The experimental results on simulation tasks demonstrate the effectiveness and robustness of our method. In particular, in the Hopper task, our method achieves an average return of 3441, surpassing the state-of-the-art method by 89. Our code will be available at https://github.com/positron-lpc/SD2AIL.
CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives SP
We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. Prior work on joint human-scene reconstruction relies on data-driven priors and joint optimization with no physics in the loop, or recovers noisy geometry with artifacts that cause motion tracking policies with scene interactions to fail. In contrast, our key insight is to recover convex, clean, and simulation-ready geometry by fitting planar primitives to a point cloud reconstruction of the scene, via a simple clustering pipeline over depth, normals, and flow. To reconstruct scene geometry that might be occluded during interactions, we make use of human-scene contact modeling (e.g., we use human posture to reconstruct the occluded seat of a chair). Finally, we ensure that human and scene reconstructions are physically-plausible by using them to drive a humanoid controller via reinforcement learning. Our approach reduces motion tracking failure rates from 55.2% to 6.9% on human-centric video benchmarks (EMDB, PROX), while delivering a 43% faster RL simulation throughput. We further validate it on in-the-wild videos including casually-captured videos, Internet videos, and even Sora-generated videos. This demonstrates CRISP's ability to generate physically-valid human motion and interaction environments at scale, greatly advancing real-to-sim applications for robotics and AR/VR.
comment: Project page: https://crisp-real2sim.github.io/CRISP-Real2Sim/
Flying robots for a smarter life
Innovative ideas continue to emerge for improving living conditions and meeting essential human needs, motivating modern strategies that shape the future of smart cities. In this context, flying robots are increasingly exploited in many fields to improve quality of life. This paper illustrates new designs of flying robots for a variety of advanced missions: inspecting the state of high-power lines and carrying out cabling maintenance procedures when failures are detected, assessing the state of the outer edge of sidewalks and recoloring its partially or wholly erased parts, and spraying pesticides onto trees or crops affected by different diseases. Creating such smart devices requires developing many supporting components that rely on AI-based algorithms, computer vision techniques, and embedded systems. A selection of techniques that we have recently developed in this field is presented here.
Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities AAAI 2026
Tactile sensing offers rich and complementary information to vision and language, enabling robots to perceive fine-grained object properties. However, existing tactile sensors lack standardization, leading to redundant features that hinder cross-sensor generalization. Moreover, existing methods fail to fully integrate the intermediate communication among tactile, language, and vision modalities. To address this, we propose TLV-CoRe, a CLIP-based Tactile-Language-Vision Collaborative Representation learning method. TLV-CoRe introduces a Sensor-Aware Modulator to unify tactile features across different sensors and employs tactile-irrelevant decoupled learning to disentangle irrelevant tactile features. Additionally, a Unified Bridging Adapter is introduced to enhance tri-modal interaction within the shared representation space. To fairly evaluate the effectiveness of tactile models, we further propose the RSS evaluation framework, focusing on Robustness, Synergy, and Stability across different methods. Experimental results demonstrate that TLV-CoRe significantly improves sensor-agnostic representation learning and cross-modal alignment, offering a new direction for multimodal tactile representation.
comment: Accepted by AAAI 2026
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaptation must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
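The tuning loop can be pictured as standard Gaussian-process Bayesian optimization over the AMPC parameters, where each evaluation is a hardware experiment. The sketch below uses scikit-learn's GP regressor with an expected-improvement acquisition; the closed_loop_cost function is a synthetic stand-in for the real hardware rollout, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def closed_loop_cost(theta):
    """Stand-in for running the AMPC policy with parameters theta on hardware
    and measuring a closed-loop cost; here just a noisy synthetic function."""
    return float(np.sum((theta - 0.3) ** 2) + 0.01 * np.random.randn())

def expected_improvement(X, gp, y_best, xi=0.01):
    mu, sd = gp.predict(X, return_std=True)
    sd = np.maximum(sd, 1e-9)
    z = (y_best - mu - xi) / sd
    return (y_best - mu - xi) * norm.cdf(z) + sd * norm.pdf(z)

rng = np.random.default_rng(0)
dim, bounds = 2, (0.0, 1.0)
X = rng.uniform(*bounds, size=(5, dim))          # initial experiments
y = np.array([closed_loop_cost(x) for x in X])

for _ in range(20):                              # sequential experiments
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)
    cand = rng.uniform(*bounds, size=(512, dim)) # random candidate set
    ei = expected_improvement(cand, gp, y.min())
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, closed_loop_cost(x_next))

print("best parameters:", X[np.argmin(y)], "cost:", y.min())
```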
Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge NeurIPS
We present a vision-action policy that won 1st place in the 2025 BEHAVIOR Challenge - a large-scale benchmark featuring 50 diverse long-horizon household tasks in photo-realistic simulation, requiring bimanual manipulation, navigation, and context-aware decision making. Building on the Pi0.5 architecture, we introduce several innovations. Our primary contribution is correlated noise for flow matching, which improves training efficiency and enables correlation-aware inpainting for smooth action sequences. We also apply learnable mixed-layer attention and System 2 stage tracking for ambiguity resolution. Training employs multi-sample flow matching to reduce variance, while inference uses action compression and challenge-specific correction rules. Our approach achieves 26% q-score across all 50 tasks on both public and private leaderboards.
comment: 2025 NeurIPS Behavior Challenge 1st place solution
Event Spectroscopy: Event-based Multispectral and Depth Sensing using Structured Light
Uncrewed aerial vehicles (UAVs) are increasingly deployed in forest environments for tasks such as environmental monitoring and search and rescue, which require safe navigation through dense foliage and precise data collection. Traditional sensing approaches, including passive multispectral and RGB imaging, suffer from latency, poor depth resolution, and strong dependence on ambient light - especially under forest canopies. In this work, we present a novel event spectroscopy system that simultaneously enables high-resolution, low-latency depth reconstruction with integrated multispectral imaging using a single sensor. Depth is reconstructed using structured light, and by modulating the wavelength of the projected structured light, our system captures spectral information in controlled bands between 650 nm and 850 nm. We demonstrate up to $60\%$ improvement in RMSE over commercial depth sensors and validate the spectral accuracy against a reference spectrometer and commercial multispectral cameras, demonstrating comparable performance. A portable version limited to RGB (3 wavelengths) is used to collect real-world depth and spectral data from the Masoala Rainforest. We demonstrate the use of this prototype for color image reconstruction and material differentiation between leaves and branches using spectral and depth data. Our results show that adding depth (available at no extra effort with our setup) to material differentiation improves the accuracy by over $30\%$ compared to a color-only method. Our system, tested in both lab and real-world rainforest environments, shows strong performance in depth estimation, RGB reconstruction, and material differentiation - paving the way for lightweight, integrated, and robust UAV perception and data collection in complex natural environments.
comment: This work has been accepted for publication in IEEE Robotics and Automation Letters
OW-Rep: Open World Object Detection with Instance Representation Learning WACV 2026
Open World Object Detection (OWOD) addresses realistic scenarios where unseen object classes emerge, enabling detectors trained on known classes to detect unknown objects and incrementally incorporate the knowledge they provide. While existing OWOD methods primarily focus on detecting unknown objects, they often overlook the rich semantic relationships between detected objects, which are essential for scene understanding and applications in open-world environments (e.g., open-world tracking and novel class discovery). In this paper, we extend the OWOD framework to jointly detect unknown objects and learn semantically rich instance embeddings, enabling the detector to capture fine-grained semantic relationships between instances. To this end, we propose two modules that leverage the rich and generalizable knowledge of Vision Foundation Models (VFMs) and can be integrated into open-world object detectors. First, the Unknown Box Refine Module uses instance masks from the Segment Anything Model to accurately localize unknown objects. The Embedding Transfer Module then distills instance-wise semantic similarities from VFM features to the detector's embeddings via a relaxed contrastive loss, enabling the detector to learn semantically meaningful and generalizable instance features. Extensive experiments show that our method significantly improves both unknown object detection and instance embedding quality, while also enhancing performance in downstream tasks such as open-world tracking.
comment: Accepted to WACV 2026. Our project website can be found at https://sunohlee.github.io/OW-Rep/
Multiagent Systems
Quantifying the Lifelong Impact of Resilience Interventions via Agent-Based LLM Simulation
Establishing the long-term, causal impact of psychological interventions on life outcomes is a grand challenge for the social sciences, caught between the limitations of correlational longitudinal studies and short-term randomized controlled trials (RCTs). This paper introduces Large-Scale Agent-based Longitudinal Simulation (LALS), a framework that resolves this impasse by simulating multi-decade, counterfactual life trajectories. The methodology employs a "digital clone" design where 2,500 unique LLM-based agent personas (grounded in a curated corpus of 3,917 empirical research articles) are each cloned across a 2x2 factorial experiment. Specifically, the simulation models the efficacy of extended psychological resilience training (Intervention vs. Control) either in childhood or as a young adult (age 6 vs. age 18). Comparing digital clones enables exceptionally precise causal inference. The simulation provides a quantitative, causal estimate of a resilience intervention's lifelong effects, revealing significant reductions in mortality, a lower incidence of dementia, and a substantial increase in accumulated wealth. Crucially, the results uncover a distinct developmental window: the intervention administered at age 6 produced more than double the positive impact on lifetime wealth compared to the same intervention at age 18. These benefits were most pronounced for agents from low-socioeconomic backgrounds, highlighting a powerful buffering effect. The LALS framework serves as a "computational wind tunnel" for social science, offering a new paradigm for generating and testing causal hypotheses about the complex, lifelong dynamics that shape human capital and well-being.
comment: 31 pages, 5 figures
MemEvolve: Meta-Evolution of Agent Memory Systems
Self-evolving memory systems are unprecedentedly reshaping the evolutionary paradigm of large language model (LLM)-based agents. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly within environment interactions. However, this paradigm is fundamentally constrained by the static nature of the memory system itself: while memory facilitates agent-level evolution, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents' experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it. To ground MemEvolve in prior research and foster openness in future self-evolving systems, we introduce EvolveLab, a unified self-evolving memory codebase that distills twelve representative memory systems into a modular design space (encode, store, retrieve, manage), providing both a standardized implementation substrate and a fair experimental arena. Extensive evaluations on four challenging agentic benchmarks demonstrate that MemEvolve achieves (I) substantial performance gains, improving frameworks such as SmolAgent and Flash-Searcher by up to $17.06\%$; and (II) strong cross-task and cross-LLM generalization, designing memory architectures that transfer effectively across diverse benchmarks and backbone models.
Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in various safety-critical tasks, detecting malicious agents has become a critical security concern. Although existing graph anomaly detection (GAD)-based defenses can identify anomalous agents, they mainly rely on coarse sentence-level information and overlook fine-grained lexical cues, leading to suboptimal performance. Moreover, the lack of interpretability in these methods limits their reliability and real-world applicability. To address these limitations, we propose XG-Guard, an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS. To incorporate both coarse and fine-grained textual information for anomalous agent identification, we utilize a bi-level agent encoder to jointly model the sentence- and token-level representations of each agent. A theme-based anomaly detector further captures the evolving discussion focus in MAS dialogues, while a bi-level score fusion mechanism quantifies token-level contributions for explanation. Extensive experiments across diverse MAS topologies and attack scenarios demonstrate robust detection performance and strong interpretability of XG-Guard.
comment: 14 pages, 3 tables, 5 figures
A Multi-agent Text2SQL Framework using Small Language Models and Execution Feedback
Text2SQL, the task of generating SQL queries from natural language text, is a critical challenge in data engineering. Recently, Large Language Models (LLMs) have demonstrated superior performance for this task due to their advanced comprehension and generation capabilities. However, privacy and cost considerations prevent companies from using Text2SQL solutions based on external LLMs offered as a service. Rather, small LLMs (SLMs) that are openly available and can be hosted in-house are adopted. These SLMs, in turn, lack the generalization capabilities of larger LLMs, which impairs their effectiveness for complex tasks such as Text2SQL. To address these limitations, we propose MATS, a novel Text2SQL framework designed specifically for SLMs. MATS uses a multi-agent mechanism that assigns specialized roles to auxiliary agents, reducing individual workloads and fostering interaction. A training scheme based on reinforcement learning aligns these agents using feedback obtained during execution, thereby maintaining competitive performance despite a limited LLM size. Evaluation results on benchmark datasets show that MATS, deployed on a single-GPU server, yields accuracy on par with large-scale LLMs when using significantly fewer parameters. Our source code and data are available at https://github.com/thanhdath/mats-sql.
DASH: Deception-Augmented Shared Mental Model for a Human-Machine Teaming System
We present DASH (Deception-Augmented Shared mental model for Human-machine teaming), a novel framework that enhances mission resilience by embedding proactive deception into Shared Mental Models (SMM). Designed for mission-critical applications such as surveillance and rescue, DASH introduces "bait tasks" to detect insider threats, e.g., compromised Unmanned Ground Vehicles (UGVs), AI agents, or human analysts, before they degrade team performance. Upon detection, tailored recovery mechanisms are activated, including UGV system reinstallation, AI model retraining, or human analyst replacement. In contrast to existing SMM approaches that neglect insider risks, DASH improves both coordination and security. Empirical evaluations across four schemes (DASH, SMM-only, no-SMM, and baseline) show that DASH sustains approximately 80% mission success under high attack rates, eight times higher than the baseline. This work contributes a practical human-AI teaming framework grounded in shared mental models, a deception-based strategy for insider threat detection, and empirical evidence of enhanced robustness under adversarial conditions. DASH establishes a foundation for secure, adaptive human-machine teaming in contested environments.
comment: 17 pages, 16 figures
Adaptive Accountability in Networked MAS: Tracing and Mitigating Emergent Norms at Scale
Large-scale networked multi-agent systems increasingly underpin critical infrastructure, yet their collective behavior can drift toward undesirable emergent norms that elude conventional governance mechanisms. We introduce an adaptive accountability framework that (i) continuously traces responsibility flows through a lifecycle-aware audit ledger, (ii) detects harmful emergent norms online via decentralized sequential hypothesis tests, and (iii) deploys local policy and reward-shaping interventions that realign agents with system-level objectives in near real time. We prove a bounded-compromise theorem showing that whenever the expected intervention cost exceeds an adversary's payoff, the long-run proportion of compromised interactions is bounded by a constant strictly less than one. Extensive high-performance simulations with up to 100 heterogeneous agents, partial observability, and stochastic communication graphs show that our framework prevents collusion and resource hoarding in at least 90% of configurations, boosts average collective reward by 12-18%, and lowers the Gini inequality index by up to 33% relative to a PPO baseline. These results demonstrate that a theoretically principled accountability layer can induce ethically aligned, self-regulating behavior in complex MAS without sacrificing performance or scalability.
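The online detection component can be illustrated with a plain Wald sequential probability ratio test on a per-agent stream of interaction outcomes. The Bernoulli rates, error targets, and the notion of a "norm-violating interaction" below are illustrative assumptions, not the paper's exact test.

```python
import numpy as np

def sprt(observations, p0=0.1, p1=0.4, alpha=0.01, beta=0.01):
    """Wald sequential probability ratio test on a Bernoulli stream.

    H0: violation rate p0 (benign); H1: violation rate p1 (harmful norm).
    Returns ('harmful' | 'benign' | 'undecided', number of samples used).
    """
    upper = np.log((1 - beta) / alpha)      # cross above -> accept H1
    lower = np.log(beta / (1 - alpha))      # cross below -> accept H0
    llr = 0.0
    for n, x in enumerate(observations, start=1):
        llr += np.log(p1 / p0) if x else np.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "harmful", n
        if llr <= lower:
            return "benign", n
    return "undecided", len(observations)

rng = np.random.default_rng(1)
benign_stream = rng.random(200) < 0.1    # each True = one norm-violating interaction
drifted_stream = rng.random(200) < 0.4
print(sprt(benign_stream))               # typically decides 'benign' after a few samples
print(sprt(drifted_stream))              # typically decides 'harmful' quickly
```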
ShibuyaSocial: Multi-scale Model of Pedestrian Flows in Scramble Crossing
This paper presents a learning-based model of pedestrian flows that integrates multi-scale behaviors such as global route selection and local collision avoidance in urban spaces, particularly focusing on pedestrian movements at Shibuya scramble crossing. Since excessive congestion of pedestrian flows can cause serious accidents, mathematically modeling and predicting pedestrian behaviors is important for preventing such accidents and providing a safe and comfortable environment. Although numerous studies have investigated learning-based modeling methods, most of them focus only on the local behavior of pedestrians, such as collision avoidance with neighbors and environmental objects. In an actual environment, pedestrian behavior involves more complicated decision-making, including global route selection. Moreover, a state transition from stopping to walking at a traffic light should be considered simultaneously. In this study, the proposed model integrates local behaviors with global route selection, using an Attention mechanism to ensure consistent global and local behavior predictions. We recorded video data of pedestrians at Shibuya scramble crossing and trained the proposed model using pedestrian walking trajectory data obtained from the video. Simulations based on the trained model validated, both qualitatively and quantitatively, that it can appropriately predict pedestrian behaviors.
Learning to Design City-scale Transit Routes
Designing efficient transit route networks is an NP-hard problem with exponentially large solution spaces that traditionally relies on manual planning processes. We present an end-to-end reinforcement learning (RL) framework based on graph attention networks for sequential transit network construction. To address the long-horizon credit assignment challenge, we introduce a two-level reward structure combining incremental topological feedback with simulation-based terminal rewards. We evaluate our approach on a new real-world dataset from Bloomington, Indiana with topologically accurate road networks, census-derived demand, and existing transit routes. Our learned policies substantially outperform existing designs and traditional heuristics across two initialization schemes and two modal-split scenarios. Under high transit adoption with transit center initialization, our approach achieves 25.6% higher service rates, 30.9% shorter wait times, and 21.0% better bus utilization compared to the real-world network. Under mixed-mode conditions with random initialization, it delivers 68.8% higher route efficiency than demand coverage heuristics and 5.9% lower travel times than shortest path construction. These results demonstrate that end-to-end RL can design transit networks that substantially outperform both human-designed systems and hand-crafted heuristics on realistic city-scale benchmarks.
Multi-Agent LLM Committees for Autonomous Software Beta Testing
Manual software beta testing is costly and time-consuming, while single-agent large language model (LLM) approaches suffer from hallucinations and inconsistent behavior. We propose a multi-agent committee framework in which diverse vision-enabled LLMs collaborate through a three-round voting protocol to reach consensus on testing actions. The framework combines model diversity, persona-driven behavioral variation, and visual user interface understanding to systematically explore web applications. Across 84 experimental runs with 9 testing personas and 4 scenarios, multi-agent committees achieve an 89.5 percent overall task success rate. Configurations with 2 to 4 agents reach 91.7 to 100 percent success, compared to 78.0 percent for single-agent baselines, yielding improvements of 13.7 to 22.0 percentage points. At the action level, the system attains a 93.1 percent success rate with a median per-action latency of 0.71 seconds, enabling real-time and continuous integration testing. Vision-enabled agents successfully identify user interface elements, with navigation and reporting achieving 100 percent success and form filling achieving 99.2 percent success. We evaluate the framework on WebShop and OWASP benchmarks, achieving 74.7 percent success on WebShop compared to a 50.1 percent published GPT-3 baseline, and 82.0 percent success on OWASP Juice Shop security testing with coverage of 8 of the 10 OWASP Top 10 vulnerability categories. Across 20 injected regressions, the committee achieves an F1 score of 0.91 for bug detection, compared to 0.78 for single-agent baselines. The open-source implementation enables reproducible research and practical deployment of LLM-based software testing in CI/CD pipelines.
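The consensus mechanism reduces to repeated voting among persona-conditioned agents. The sketch below abstracts away the vision-enabled LLM calls behind a hypothetical query_agent stub and shows only the majority-vote loop; the three-round protocol details, personas, and action set are simplified assumptions.

```python
from collections import Counter
import random

def query_agent(agent_id, persona, observation, round_hint=0):
    """Hypothetical stub for one vision-enabled LLM agent proposing an action.
    A real system would send the screenshot plus a persona prompt to a model."""
    actions = ["click_login", "fill_form", "click_submit", "report_bug"]
    random.seed(hash((agent_id, persona, observation, round_hint)) % 2**32)
    return random.choice(actions)

def committee_decision(personas, observation, rounds=3):
    """Vote each round; stop at a strict majority, else fall back to plurality."""
    for r in range(rounds):
        votes = [query_agent(i, p, observation, round_hint=r)
                 for i, p in enumerate(personas)]
        action, count = Counter(votes).most_common(1)[0]
        if count > len(personas) / 2:        # strict majority -> consensus
            return action, r + 1
    return action, rounds                    # last-round plurality

personas = ["novice shopper", "power user", "security auditor"]
print(committee_decision(personas, observation="checkout_page"))
```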
Simulation of Crowd Egress with Environmental Stressors
This article introduces a modeling framework to characterize evacuee response to environmental stimuli during emergency egress. The model is developed in accordance with stress theory, which explains how an organism reacts to environmental stressors (e.g., alarm signals or hazardous factors such as smoke and fire). We integrate the theory into the well-known social force model, and develop a framework to simulate crowd evacuation behavior in multi-compartment building layouts. Our method serves as a theoretical basis to study crowd movement at bottlenecks and to simulate herding behavior and way-finding activities in normal and hazardous conditions. The pre-movement behavior is also briefly investigated using opinion dynamics with a social group model. The algorithms have been partly tested in FDS+EVAC as well as our simulation platform crowdEgress.
comment: 30 pages, 19 figures
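The social force backbone this framework builds on can be sketched in a few lines. Below is a basic Helbing-style update with a goal-driving term and pairwise exponential repulsion; the parameters are illustrative defaults, and the stressor coupling, walls, and group model of the paper are omitted.

```python
import numpy as np

def social_force_step(pos, vel, goals, dt=0.05, v0=1.3, tau=0.5,
                      A=2.0, B=0.3, radius=0.3):
    """One explicit-Euler step of a basic Helbing-style social force model.

    pos, vel, goals : (N, 2) arrays. Illustrative parameters only.
    """
    n = len(pos)
    force = np.zeros_like(pos)
    # driving force toward each pedestrian's goal
    to_goal = goals - pos
    desired_dir = to_goal / (np.linalg.norm(to_goal, axis=1, keepdims=True) + 1e-9)
    force += (v0 * desired_dir - vel) / tau
    # pairwise exponential repulsion between pedestrians
    for i in range(n):
        d = pos[i] - pos                       # vectors from others to pedestrian i
        dist = np.linalg.norm(d, axis=1)
        mask = dist > 1e-9                     # exclude self
        force[i] += np.sum(A * np.exp((2 * radius - dist[mask, None]) / B)
                           * d[mask] / dist[mask, None], axis=0)
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel

rng = np.random.default_rng(0)
pos = rng.uniform(0, 5, size=(20, 2))
vel = np.zeros_like(pos)
goals = np.tile([[10.0, 2.5]], (20, 1))        # everyone heads for one exit
for _ in range(200):
    pos, vel = social_force_step(pos, vel, goals)
print(pos.mean(axis=0))
```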
Systems and Control (EESS)
Distribution Network Restoration with Mobile Resources Dispatch: A Simulation-Based Online Dynamic Programming Approach
Dispatching mobile resources such as repair crews and mobile emergency generators is essential for the rapid restoration of distribution systems after extreme events. However, the restoration process is affected by various uncertain factors, including repair times, road conditions, and newly observed failures, necessitating online decision-making in response to real-time information. This paper proposes a simulation-based online dynamic programming approach to provide real-time restoration actions considering the dispatch of mobile resources. Using an index-based priority rule as the base policy, the remaining cumulative loss for the current state and a given action is evaluated via online simulation. As the base policy is explicit, the simulation is efficient. Then, the action leading to the minimum cumulative loss is chosen. It is proven that such a strategy improves the performance of the base policy. The proposed policy adapts to real-time information updates and does not rely on offline training, and therefore avoids data- and convergence-related issues, which is important in restoration tasks where historical data on extreme events are rare. In contrast, rolling-horizon optimization may not meet the requirements of online use, because routing mobile resources gives rise to large-scale discrete optimization problems. Case studies on 123-bus and 8500-bus systems demonstrate that the proposed method achieves higher efficiency and better performance compared with rolling horizon optimization.
Real-Time Remote Monitoring of Correlated Markovian Sources
We investigate real-time tracking of two correlated stochastic processes over a shared wireless channel. The joint evolution of the processes is modeled as a two-dimensional discrete-time Markov chain. Each process is observed by a dedicated sampler and independently reconstructed at a remote monitor according to a task-specific objective. Although both processes originate from a common underlying phenomenon (e.g., distinct features of the same source), each monitor is interested only in its corresponding feature. A reconstruction error is incurred when the true and reconstructed states mismatch at one or both monitors. To address this problem, we propose an error-aware joint sampling and transmission policy, under which each sampler probabilistically generates samples only when the current process state differs from the most recently reconstructed state at its corresponding monitor. We adopt the time-averaged reconstruction error as the primary performance metric and benchmark the proposed policy against state-of-the-art joint sampling and transmission schemes. For each policy, we derive closed-form expressions for the resulting time-averaged reconstruction error. We further formulate and solve an optimization problem that minimizes the time-averaged reconstruction error subject to an average sampling cost constraint. Analytical and numerical results demonstrate that the proposed error-aware policy achieves the minimum time-averaged reconstruction error among the considered schemes while efficiently utilizing the sampling budget. The performance gains are particularly pronounced in regimes with strong inter-process correlation and stringent tracking requirements, where frequent sampling by both samplers is necessary.
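The error-aware policy is easy to simulate for a single symmetric two-state source, which already exposes the trade-off between sampling rate and time-averaged reconstruction error. The sketch below drops the second correlated process and channel contention, so it is a simplification of the paper's setting rather than its analysis.

```python
import numpy as np

def simulate(p_change=0.1, p_sample=0.5, steps=100_000, seed=0):
    """Time-averaged reconstruction error of a symmetric two-state Markov source
    under an error-aware sampling policy: the sampler transmits with probability
    p_sample only when the true state differs from the last reconstructed state.
    Single-source simplification; no channel contention or correlated process."""
    rng = np.random.default_rng(seed)
    state, estimate, errors, samples = 0, 0, 0, 0
    for _ in range(steps):
        if rng.random() < p_change:           # symmetric Markov transition
            state ^= 1
        if state != estimate and rng.random() < p_sample:
            estimate = state                  # successful update at the monitor
            samples += 1
        errors += int(state != estimate)
    return errors / steps, samples / steps

for p_sample in (0.1, 0.5, 0.9):
    err, rate = simulate(p_sample=p_sample)
    print(f"p_sample={p_sample:.1f}  time-avg error={err:.3f}  sampling rate={rate:.3f}")
```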
Smart nudging for efficient routing through networks
In this paper, we formulate the design of efficient digitalised deposit return schemes as a control problem. We focus on the recycling of paper cups, though the proposed methodology applies more broadly to reverse logistics systems arising in circular economy R-strategies. Each item is assumed to carry a digital wallet through which monetary rewards are allocated to actors transferring the item across successive stages, incentivising completion of the recycling process. System efficiency is ensured by: (i) decentralised algorithms that avoid congestion at individual nodes; (ii) a decentralised AIMD-based algorithm that optimally splits the deposit across layers; and (iii) a feedback control loop that dynamically adjusts the deposit to achieve a desired throughput. The effectiveness of the framework is demonstrated through extensive simulations using realistic paper cup recycling data.
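The AIMD-based splitting can be illustrated with the classic additive-increase/multiplicative-decrease loop applied to per-layer shares of a fixed deposit. The trigger rule, initial shares, and parameters below are illustrative, not the paper's calibrated scheme; the point is that AIMD drives the shares toward a fair split.

```python
def aimd_split(deposit=1.0, layers=4, add=0.01, backoff=0.5, iters=500):
    """Classic AIMD capacity sharing used to split a fixed deposit across layers:
    shares grow additively until their sum exceeds the deposit (a 'capacity'
    event), then every share is cut multiplicatively."""
    shares = [0.05 * (i + 1) for i in range(layers)]   # deliberately unequal start
    for _ in range(iters):
        shares = [s + add for s in shares]             # additive increase
        if sum(shares) > deposit:                      # capacity event
            shares = [backoff * s for s in shares]     # multiplicative decrease
    total = sum(shares)
    return [s / total * deposit for s in shares]       # normalized final split

print(aimd_split())   # shares converge toward an equal (fair) split of the deposit
```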
Comparing Dynamical Models Through Diffeomorphic Vector Field Alignment
Dynamical systems models such as recurrent neural networks (RNNs) are increasingly popular in theoretical neuroscience for hypothesis-generation and data analysis. Evaluating the dynamics in such models is key to understanding their learned generative mechanisms. However, such evaluation is impeded by two major challenges: First, comparison of learned dynamics across models is difficult because there is no enforced equivalence of their coordinate systems. Second, identification of mechanistically important low-dimensional motifs (e.g., limit sets) is intractable in high-dimensional nonlinear models such as RNNs. Here, we propose a comprehensive framework to address these two issues, termed Diffeomorphic vector field alignment FOR learned Models (DFORM). DFORM learns a nonlinear coordinate transformation between the state spaces of two dynamical systems, which aligns their trajectories in a maximally one-to-one manner. In so doing, DFORM enables an assessment of whether two models exhibit topological equivalence, i.e., similar mechanisms despite differences in coordinate systems. A byproduct of this method is a means to locate dynamical motifs on low-dimensional manifolds embedded within higher-dimensional systems. We verified DFORM's ability to identify linear and nonlinear coordinate transformations using canonical topologically equivalent systems, RNNs, and systems related by nonlinear flows. DFORM was also shown to provide a quantification of similarity between topologically distinct systems. We then demonstrated that DFORM can locate important dynamical motifs including invariant manifolds and saddle limit sets within high-dimensional models. Finally, using a set of RNN models trained on human functional MRI (fMRI) recordings, we illustrated that DFORM can identify limit cycles from high-dimensional data-driven models, which agreed well with prior numerical analysis.
comment: 57 pages, 18 figures. For associated code, see https://github.com/rq-Chen/DFORM_stable
Distributionally Robust Multi-Agent Reinforcement Learning for Intelligent Traffic Control
Learning-based traffic signal control is typically optimized for average performance under a few nominal demand patterns, which can result in poor behavior under atypical traffic conditions. To address this, we develop a distributionally robust multi-agent reinforcement learning framework for signal control on a 3x3 urban grid calibrated from a contiguous 3x3 subarea of central Athens covered by the pNEUMA trajectory dataset (Barmpounakis and Geroliminis, 2020). Our approach proceeds in three stages. First, we train a baseline multi-agent RL controller in which each intersection is governed by a proximal policy optimization agent with discrete signal phases, using a centralized training, decentralized execution paradigm. Second, to capture demand uncertainty, we construct eight heterogeneous origin-destination-based traffic scenarios (one directly derived from pNEUMA and seven synthetically generated) to span a wide range of spatial and temporal demand patterns. Over this scenario set, we train a contextual-bandit worst-case estimator that assigns mixture weights to estimate adversarial demand distributions conditioned on context. Finally, without modifying the controller architecture, we fine-tune the baseline multi-agent reinforcement learning agents under these estimated worst-case mixtures to obtain a distributionally robust multi-agent reinforcement learning controller. Across all eight scenarios, as well as on an unseen validation network based on the Sioux Falls configuration, the distributionally robust multi-agent reinforcement learning controller consistently reduces horizon-averaged queues and increases average speeds relative to the baseline, achieving up to 51% shorter queues and 38% higher speeds on the worst-performing scenarios.
Improving the Accuracy of DC Optimal Power Flow Formulations via Parameter Optimization
DC Optimal Power Flow (DC-OPF) problems optimize the generators' active power setpoints while satisfying constraints based on the DC power flow linearization. The computational tractability advantages of DC-OPF problems come at the expense of inaccuracies relative to AC Optimal Power Flow (AC-OPF) problems that accurately model the nonlinear steady-state behavior of power grids. This paper proposes an algorithm that significantly improves the accuracy of the generators' active power setpoints from DC-OPF problems with respect to the corresponding AC-OPF problems over a specified range of operating conditions. Using sensitivity information in a machine learning-inspired methodology, this algorithm tunes coefficient and bias parameters in the DC power flow approximation to improve the accuracy of the resulting DC-OPF solutions. Employing the Truncated Newton Conjugate-Gradient (TNC) method, a Quasi-Newton optimization technique, this parameter tuning occurs during an offline training phase, with the resulting parameters then used in online computations. Numerical results underscore the algorithm's efficacy with accuracy improvements in squared two-norm and $\infty$-norm losses of up to $90\%$ and $79\%$, respectively, relative to traditional DC-OPF formulations.
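A single-branch toy version conveys the idea: fit a coefficient and bias of the linear flow model to samples from a nonlinear (AC-like) flow over an operating range, using SciPy's TNC optimizer as in the paper's offline training phase. The sine-based "AC" flow, the operating range, and the loss below are illustrative stand-ins.

```python
import numpy as np
from scipy.optimize import minimize

# "AC" flow across one lossless branch: p_ac = (V_i * V_j / x) * sin(theta_ij).
# The standard DC approximation is p_dc = theta_ij / x. We tune a coefficient c
# and bias b in p_dc = c * theta_ij + b to minimize squared error over a range.
x_line, Vi, Vj = 0.1, 1.02, 0.98
theta = np.linspace(-0.5, 0.5, 200)              # operating range (rad)
p_ac = (Vi * Vj / x_line) * np.sin(theta)

def loss(params):
    c, b = params
    return np.mean((c * theta + b - p_ac) ** 2)

res = minimize(loss, x0=[1.0 / x_line, 0.0], method="TNC")
print("tuned coefficient:", res.x[0], " bias:", res.x[1])
print("MSE  standard DC :", loss([1.0 / x_line, 0.0]))
print("MSE  tuned DC    :", loss(res.x))
```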
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaptation must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
Robotics
Systematic Benchmarking of SUMO Against Data-Driven Traffic Simulators
This paper presents a systematic benchmarking of the model-based microscopic traffic simulator SUMO against state-of-the-art data-driven traffic simulators using large-scale real-world datasets. Using the Waymo Open Motion Dataset (WOMD) and the Waymo Open Sim Agents Challenge (WOSAC), we evaluate SUMO under both short-horizon (8s) and long-horizon (60s) closed-loop simulation settings. To enable scalable evaluation, we develop Waymo2SUMO, an automated pipeline that converts WOMD scenarios into SUMO simulations. On the WOSAC benchmark, SUMO achieves a realism meta metric of 0.653 while requiring fewer than 100 tunable parameters. Extended rollouts show that SUMO maintains low collision and offroad rates and exhibits stronger long-horizon stability than representative data-driven simulators. These results highlight complementary strengths of model-based and data-driven approaches for autonomous driving simulation and benchmarking.
comment: Source code is available at https://github.com/LuminousLamp/SUMO-Benchmark
STORM: Search-Guided Generative World Models for Robotic Manipulation
We present STORM (Search-Guided Generative World Models), a novel framework for spatio-temporal reasoning in robotic manipulation that unifies diffusion-based action generation, conditional video prediction, and search-based planning. Unlike prior Vision-Language-Action (VLA) models that rely on abstract latent dynamics or delegate reasoning to language components, STORM grounds planning in explicit visual rollouts, enabling interpretable and foresight-driven decision-making. A diffusion-based VLA policy proposes diverse candidate actions, a generative video world model simulates their visual and reward outcomes, and Monte Carlo Tree Search (MCTS) selectively refines plans through lookahead evaluation. Experiments on the SimplerEnv manipulation benchmark demonstrate that STORM achieves a new state-of-the-art average success rate of 51.0 percent, outperforming strong baselines such as CogACT. Reward-augmented video prediction substantially improves spatio-temporal fidelity and task relevance, reducing Frechet Video Distance by over 75 percent. Moreover, STORM exhibits robust re-planning and failure recovery behavior, highlighting the advantages of search-guided generative world models for long-horizon robotic manipulation.
comment: Under submission
When Robots Say No: The Empathic Ethical Disobedience Benchmark
Robots must balance compliance with safety and social expectations as blind obedience can cause harm, while over-refusal erodes trust. Existing safe reinforcement learning (RL) benchmarks emphasize physical hazards, while human-robot interaction trust studies are small-scale and hard to reproduce. We present the Empathic Ethical Disobedience (EED) Gym, a standardized testbed that jointly evaluates refusal safety and social acceptability. Agents weigh risk, affect, and trust when choosing to comply, refuse (with or without explanation), clarify, or propose safer alternatives. EED Gym provides different scenarios, multiple persona profiles, and metrics for safety, calibration, and refusals, with trust and blame models grounded in a vignette study. Using EED Gym, we find that action masking eliminates unsafe compliance, while explanatory refusals help sustain trust. Constructive styles are rated most trustworthy, empathic styles -- most empathic, and safe RL methods improve robustness but also make agents more prone to overly cautious behavior. We release code, configurations, and reference policies to enable reproducible evaluation and systematic human-robot interaction research on refusal and trust. At submission time, we include an anonymized reproducibility package with code and configs, and we commit to open-sourcing the full repository after the paper is accepted.
comment: Accepted at the ACM/IEEE International Conference on Human-Robot Interaction (HRI 2026). This is a preprint of the author-accepted manuscript
AOMGen: Photoreal, Physics-Consistent Demonstration Generation for Articulated Object Manipulation
Recent advances in Vision-Language-Action (VLA) and world-model methods have improved generalization in tasks such as robotic manipulation and object interaction. However, successful execution of such tasks depends on large, costly collections of real demonstrations, especially for fine-grained manipulation of articulated objects. To address this, we present AOMGen, a scalable data generation framework for articulated manipulation that is instantiated from a single real scan, a demonstration, and a library of readily available digital assets, yielding photoreal training data with verified physical states. The framework synthesizes synchronized multi-view RGB temporally aligned with action commands and state annotations for joints and contacts, and systematically varies camera viewpoints, object styles, and object poses to expand a single execution into a diverse corpus. Experimental results demonstrate that fine-tuning VLA policies on AOMGen data increases the success rate from 0% to 88.7%, and the policies are tested on unseen objects and layouts.
Learning Semantic Atomic Skills for Multi-Task Robotic Manipulation
While imitation learning has shown impressive results in single-task robot manipulation, scaling it to multi-task settings remains a fundamental challenge due to issues such as suboptimal demonstrations, trajectory noise, and behavioral multi-modality. Existing skill-based methods attempt to address this by decomposing actions into reusable abstractions, but they often rely on fixed-length segmentation or environmental priors that limit semantic consistency and cross-task generalization. In this work, we propose AtomSkill, a novel multi-task imitation learning framework that learns and leverages a structured Atomic Skill Space for composable robot manipulation. Our approach is built on two key technical contributions. First, we construct a Semantically Grounded Atomic Skill Library by partitioning demonstrations into variable-length skills using gripper-state keyframe detection and vision-language model annotation. A contrastive learning objective ensures the resulting skill embeddings are both semantically consistent and temporally coherent. Second, we propose an Action Generation module with Keypose Imagination, which jointly predicts a skill's long-horizon terminal keypose and its immediate action sequence. This enables the policy to reason about overarching motion goals and fine-grained control simultaneously, facilitating robust skill chaining. Extensive experiments in simulated and real-world environments show that AtomSkill consistently outperforms state-of-the-art methods across diverse manipulation tasks.
Dynamic Entropy Tuning in Reinforcement Learning Low-Level Quadcopter Control: Stochasticity vs Determinism
This paper explores the impact of dynamic entropy tuning in Reinforcement Learning (RL) algorithms that train a stochastic policy. Its performance is compared against algorithms that train a deterministic one. Stochastic policies optimize a probability distribution over actions to maximize rewards, while deterministic policies select a single deterministic action per state. The effect of training a stochastic policy with both static entropy and dynamic entropy and then executing deterministic actions to control the quadcopter is explored. It is then compared against training a deterministic policy and executing deterministic actions. For the purpose of this research, the Soft Actor-Critic (SAC) algorithm was chosen for the stochastic algorithm while the Twin Delayed Deep Deterministic Policy Gradient (TD3) was chosen for the deterministic algorithm. The training and simulation results show the positive effect the dynamic entropy tuning has on controlling the quadcopter by preventing catastrophic forgetting and improving exploration efficiency.
comment: This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
Reinforcement Learning Position Control of a Quadrotor Using Soft Actor-Critic (SAC)
This paper proposes a new Reinforcement Learning (RL) based control architecture for quadrotors. Whereas the literature focuses on controlling the four rotors' RPMs directly, this paper controls the quadrotor's thrust vector. The RL agent computes the percentage of overall thrust along the quadrotor's z-axis along with the desired Roll ($\phi$) and Pitch ($\theta$) angles. The agent then sends the calculated control signals along with the current quadrotor's Yaw angle ($\psi$) to an attitude PID controller. The PID controller then maps the control signals to motor RPMs. The Soft Actor-Critic algorithm, a model-free off-policy stochastic RL algorithm, was used to train the RL agents. Training results show faster training times for the proposed thrust vector controller compared with conventional RPM controllers. Simulation results show smoother and more accurate path-following for the proposed thrust vector controller.
comment: This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
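For context, the final stage of such a pipeline is a motor mixer that turns collective thrust plus attitude torque commands (here, the PID outputs) into per-motor commands. The sketch below is a generic X-configuration mixer, not the paper's controller; sign conventions depend on motor numbering and propeller spin directions, and the limits are illustrative.

```python
import numpy as np

def x_quad_mixer(thrust_cmd, tau_roll, tau_pitch, tau_yaw,
                 u_min=0.0, u_max=1.0):
    """Generic X-configuration mixer: collective thrust plus body torques
    -> four normalized motor commands. Signs follow one common convention
    (motor order FR, RL, FL, RR); real vehicles may differ."""
    mix = np.array([
        # thrust  roll   pitch  yaw
        [1.0,    -1.0,  +1.0,  +1.0],   # front-right
        [1.0,    +1.0,  -1.0,  +1.0],   # rear-left
        [1.0,    +1.0,  +1.0,  -1.0],   # front-left
        [1.0,    -1.0,  -1.0,  -1.0],   # rear-right
    ])
    u = mix @ np.array([thrust_cmd, tau_roll, tau_pitch, tau_yaw])
    return np.clip(u, u_min, u_max)

# E.g. 60% collective thrust with a small positive roll torque request:
print(x_quad_mixer(0.6, 0.05, 0.0, 0.0))
```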
Deterministic Reconstruction of Tennis Serve Mechanics: From Aerodynamic Constraints to Internal Torques via Rigid-Body Dynamics
Most conventional studies on tennis serve biomechanics rely on phenomenological observations comparing professional and amateur players or, more recently, on AI-driven statistical analyses of motion data. While effective at describing \textit{what} elite players do, these approaches often fail to explain \textit{why} such motions are physically necessary from a mechanistic perspective. This paper proposes a deterministic, physics-based approach to the tennis serve using a 12-degree-of-freedom multi-segment model of the human upper body. Rather than fitting the model to motion capture data, we solve the inverse kinematics problem via trajectory optimization to rigorously satisfy the aerodynamic boundary conditions required for Flat, Slice, and Kick serves. We subsequently perform an inverse dynamics analysis based on the Principle of Virtual Power to compute the net joint torques. The simulation results reveal that while the kinematic trajectories for different serves may share visual similarities, the underlying kinetic profiles differ drastically. A critical finding is that joints exhibiting minimal angular displacement (kinematically ``quiet'' phases), particularly at the wrist, require substantial and highly time-varying torques to counteract gravitational loading and dynamic coupling effects. By elucidating the dissociation between visible kinematics and internal kinetics, this study provides a first-principles framework for understanding the mechanics of the tennis serve, moving beyond simple imitation of elite techniques.
comment: 9 figures
On The Computational Complexity for Minimizing Aerial Photographs for Full Coverage of a Planar Region
With the popularity of drone technologies, aerial photography has become prevalent in many everyday scenarios such as environmental monitoring, structure inspection, and law enforcement. A central challenge in this domain is the efficient coverage of a target area with photographs that entirely capture the region, while respecting constraints such as image resolution and the limited number of pictures that can be taken. This work investigates the computational complexity of several fundamental problems arising from this challenge. By abstracting the aerial photography problem into coverage problems in computational geometry, we demonstrate that most of these problems are in fact computationally intractable, with the implication that traditional algorithms cannot solve them efficiently. The insights of this work extend beyond aerial photography to broader applications such as pesticide spraying and strategic sensor placement.
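A concrete instance of the underlying coverage problem: choose axis-aligned camera footprints to cover a discretized rectangle. Exact minimization is intractable, but greedy set cover gives a logarithmic-factor approximation; the grid discretization and footprint model below are illustrative, not from the paper.

```python
def greedy_photo_cover(region_w, region_h, foot_w, foot_h, step=1):
    """Greedily choose axis-aligned photo footprints (foot_w x foot_h) to cover
    every unit cell of a region_w x region_h rectangle."""
    uncovered = {(x, y) for x in range(region_w) for y in range(region_h)}
    # candidate photo placements (top-left corners on a grid of the given step)
    candidates = [(cx, cy)
                  for cx in range(0, max(region_w - foot_w, 0) + 1, step)
                  for cy in range(0, max(region_h - foot_h, 0) + 1, step)]
    chosen = []
    while uncovered:
        best, best_gain = None, set()
        for cx, cy in candidates:
            cells = {(x, y) for x in range(cx, cx + foot_w)
                            for y in range(cy, cy + foot_h)} & uncovered
            if len(cells) > len(best_gain):
                best, best_gain = (cx, cy), cells
        chosen.append(best)
        uncovered -= best_gain
    return chosen

photos = greedy_photo_cover(region_w=10, region_h=7, foot_w=4, foot_h=3)
print(len(photos), "photos:", photos)
```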
Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
Photorealistic 3-D reconstruction from monocular video collapses in large-scale scenes when depth, pose, and radiance are solved in isolation: scale-ambiguous depth yields ghost geometry, long-horizon pose drift corrupts alignment, and a single global NeRF cannot model hundreds of metres of content. We introduce a joint learning framework that couples all three factors and demonstrably overcomes each failure case. Our system begins with a Vision-Transformer (ViT) depth network trained with metric-scale supervision, giving globally consistent depths despite wide field-of-view variations. A multi-scale feature bundle-adjustment (BA) layer refines camera poses directly in feature space--leveraging learned pyramidal descriptors instead of brittle keypoints--to suppress drift on unconstrained trajectories. For scene representation, we deploy an incremental local-radiance-field hierarchy: new hash-grid NeRFs are allocated and frozen on-the-fly when view overlap falls below a threshold, enabling city-block-scale coverage on a single GPU. Evaluated on the Tanks and Temples benchmark, our method reduces Absolute Trajectory Error to 0.001-0.021 m across eight indoor-outdoor sequences--up to 18x lower than BARF and 2x lower than NoPe-NeRF--while maintaining sub-pixel Relative Pose Error. These results demonstrate that metric-scale, drift-free 3-D reconstruction and high-fidelity novel-view synthesis are achievable from a single uncalibrated RGB camera.
comment: 8 pages, 2 figures, 2 tables
Fractional-order Modeling for Nonlinear Soft Actuators via Particle Swarm Optimization
Modeling soft pneumatic actuators with high precision remains a fundamental challenge due to their highly nonlinear and compliant characteristics. This paper proposes an innovative modeling framework based on fractional-order differential equations (FODEs) to accurately capture the dynamic behavior of soft materials. The unknown parameters within the fractional-order model are identified using particle swarm optimization (PSO), enabling parameter estimation directly from experimental data without reliance on pre-established material databases or empirical constitutive laws. The proposed approach effectively represents the complex deformation phenomena inherent in soft actuators. Experimental results validate the accuracy and robustness of the developed model, demonstrating improved predictive performance compared to conventional modeling techniques. The presented framework provides a data-efficient and database-independent solution for soft actuator modeling, advancing the precision and adaptability of soft robotic system design.
comment: 6 pages, 4 figures
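The modeling idea can be sketched with a single-term fractional model $D^\alpha y = -a\,y + b\,u$ discretized by the Grünwald-Letnikov scheme, with a bare-bones PSO identifying $(\alpha, a, b)$ from step-response data. The model structure, bounds, and PSO hyperparameters below are illustrative assumptions, not the paper's identified model.

```python
import numpy as np

def gl_simulate(alpha, a, b, u, h):
    """Implicit Grünwald-Letnikov simulation of D^alpha y = -a*y + b*u."""
    n = len(u)
    w = np.ones(n)                                    # GL binomial weights
    for k in range(1, n):
        w[k] = w[k - 1] * (1 - (alpha + 1) / k)
    y = np.zeros(n)
    ha = h ** (-alpha)
    for i in range(1, n):
        hist = np.dot(w[1:i + 1], y[i - 1::-1])       # sum_k w_k * y_{i-k}
        y[i] = (b * u[i] - ha * hist) / (ha + a)
    return y

def pso_fit(target, u, h, iters=40, swarm=24, seed=0):
    """Bare-bones PSO over (alpha, a, b); bounds and hyperparameters illustrative."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([0.3, 0.1, 0.1]), np.array([1.0, 5.0, 5.0])
    x = rng.uniform(lo, hi, size=(swarm, 3))
    v = np.zeros_like(x)
    cost = lambda p: np.mean((gl_simulate(*p, u, h) - target) ** 2)
    pbest, pcost = x.copy(), np.array([cost(p) for p in x])
    gbest = pbest[np.argmin(pcost)]
    for _ in range(iters):
        r1, r2 = rng.random((swarm, 3)), rng.random((swarm, 3))
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        c = np.array([cost(p) for p in x])
        improved = c < pcost
        pbest[improved], pcost[improved] = x[improved], c[improved]
        gbest = pbest[np.argmin(pcost)]
    return gbest, pcost.min()

h = 0.01
t = np.arange(0, 2, h)
u = np.ones_like(t)                                   # step input (e.g. pressure)
data = gl_simulate(0.8, 1.2, 2.0, u, h)               # synthetic "experimental" data
params, mse = pso_fit(data, u, h)
print("identified (alpha, a, b):", params, " mse:", mse)
```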
LLaViDA: A Large Language Vision Driving Assistant for Explicit Reasoning and Enhanced Trajectory Planning
Trajectory planning is a fundamental yet challenging component of autonomous driving. End-to-end planners frequently falter under adverse weather, unpredictable human behavior, or complex road layouts, primarily because they lack strong generalization or few-shot capabilities beyond their training data. We propose LLaViDA, a Large Language Vision Driving Assistant that leverages a Vision-Language Model (VLM) for object motion prediction, semantic grounding, and chain-of-thought reasoning for trajectory planning in autonomous driving. A two-stage training pipeline--supervised fine-tuning followed by Trajectory Preference Optimization (TPO)--enhances scene understanding and trajectory planning by injecting regression-based supervision, producing a powerful "VLM Trajectory Planner for Autonomous Driving." On the NuScenes benchmark, LLaViDA surpasses state-of-the-art end-to-end and other recent VLM/LLM-based baselines on the open-loop trajectory planning task, achieving an average L2 trajectory error of 0.31 m and a collision rate of 0.10% on the NuScenes test set. The code for this paper is available at GitHub.
Alternating Minimization for Time-Shifted Synergy Extraction in Human Hand Coordination
Identifying motor synergies -- coordinated hand joint patterns activated at task-dependent time shifts -- from kinematic data is central to motor control and robotics. Existing two-stage methods first extract candidate waveforms (via SVD) and then select shifted templates using sparse optimization, requiring at least two datasets and complicating data collection. We introduce an optimization-based framework that jointly learns a small set of synergies and their sparse activation coefficients. The formulation enforces group sparsity for synergy selection and element-wise sparsity for activation timing. We develop an alternating minimization method in which coefficient updates decouple across tasks and synergy updates reduce to regularized least-squares problems. Our approach requires only a single data set, and simulations show accurate velocity reconstruction with compact, interpretable synergies.
comment: 7 pages, 5 figures
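A stripped-down version of the alternating scheme (without time shifts or group sparsity): flattened per-task trajectories are approximated as sparse combinations of shared synergies, alternating a soft-thresholded (ISTA-style) coefficient step with a ridge least-squares synergy update. All names, sizes, and settings below are illustrative, not the paper's formulation.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fit_synergies(X, n_syn=3, lam=0.05, ridge=1e-3, iters=200, seed=0):
    """Alternating minimization for X (tasks x features) ~ C @ W, with
    element-wise sparse coefficients C (one ISTA step per iteration) and
    least-squares synergies W. Each row of X is a flattened trajectory."""
    rng = np.random.default_rng(seed)
    n_tasks, n_feat = X.shape
    W = rng.standard_normal((n_syn, n_feat))
    C = np.zeros((n_tasks, n_syn))
    for _ in range(iters):
        # coefficient update: proximal-gradient step with L1 shrinkage
        step = 1.0 / (np.linalg.norm(W, 2) ** 2 + 1e-9)
        grad = (C @ W - X) @ W.T
        C = soft_threshold(C - step * grad, step * lam)
        # synergy update: ridge-regularized least squares
        W = np.linalg.solve(C.T @ C + ridge * np.eye(n_syn), C.T @ X)
    return C, W

# Synthetic check: 12 tasks built from 3 ground-truth synergies of 50 features.
rng = np.random.default_rng(1)
W_true = rng.standard_normal((3, 50))
C_true = rng.standard_normal((12, 3)) * (rng.random((12, 3)) < 0.5)
X = C_true @ W_true + 0.01 * rng.standard_normal((12, 50))
C, W = fit_synergies(X)
print("relative reconstruction error:", np.linalg.norm(C @ W - X) / np.linalg.norm(X))
```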
On Swarm Leader Identification using Probing Policies
Identifying the leader within a robotic swarm is crucial, especially in adversarial contexts where leader concealment is necessary for mission success. This work introduces the interactive Swarm Leader Identification (iSLI) problem, a novel approach where an adversarial probing agent identifies a swarm's leader by physically interacting with its members. We formulate the iSLI problem as a Partially Observable Markov Decision Process (POMDP) and employ Deep Reinforcement Learning, specifically Proximal Policy Optimization (PPO), to train the prober's policy. The proposed approach utilizes a novel neural network architecture featuring a Timed Graph Relationformer (TGR) layer combined with a Simplified Structured State Space Sequence (S5) model. The TGR layer effectively processes graph-based observations of the swarm, capturing temporal dependencies and fusing relational information using a learned gating mechanism to generate informative representations for policy learning. Extensive simulations demonstrate that our TGR-based model outperforms baseline graph neural network architectures and exhibits significant zero-shot generalization capabilities across varying swarm sizes and speeds different from those used during training. The trained prober achieves high accuracy in identifying the leader, maintaining performance even in out-of-training distribution scenarios, and showing appropriate confidence levels in its predictions. Real-world experiments with physical robots further validate the approach, confirming successful sim-to-real transfer and robustness to dynamic changes, such as unexpected agent disconnections.
comment: 13 pages, journal
Structural Integrality in Task Assignment and Path Finding via Total Unimodularity of Petri Net Models
Task Assignment and Path Finding (TAPF) concerns computing collision-free motions for multiple robots while jointly selecting goal locations. In this paper, safety is enforced by requiring unit-capacity traversal between successive intermediate markings, yielding coordination strategies that are valid independently of any specific time interpretation. Existing optimization-based approaches typically rely on time-expanded network-flow models, which result in large mixed-integer programs and limited scalability. We instead develop a Petri net (PN)-based optimization framework that exploits structural properties of the motion model to improve computational efficiency without explicit time expansion. When robot motion is modeled by strongly connected state-machine PNs, we show that, once the congestion level (equivalently, the synchronization depth) is fixed to an integer value, the resulting motion-planning constraint matrix is totally unimodular. Consequently, the corresponding LP relaxation admits integral optimal solutions for the motion variables. When the estimated congestion exceeds one, we introduce a synchronization-on-demand mechanism based on intermediate markings; for a fixed number of synchronization stages, the associated constraint matrices remain totally unimodular, thereby preserving integrality of the motion variables. Finally, we extend TAPF to Boolean specifications over regions of interest and propose a two-stage LP/mixed-integer linear programming (MILP) scheme in which integrality is confined to task-selection variables. Simulations on large benchmarks demonstrate substantial scalability improvements over time-expanded optimization baselines.
AnyNav: Visual Neuro-Symbolic Friction Learning for Off-road Navigation
Off-road navigation is critical for a wide range of field robotics applications from planetary exploration to disaster response. However, it remains a longstanding challenge due to unstructured environments and the inherently complex terrain-vehicle interactions. Traditional physics-based methods struggle to accurately capture the nonlinear dynamics underlying these interactions, while purely data-driven approaches often overfit to specific motion patterns, vehicle geometries, or platforms, limiting their generalization in diverse, real-world scenarios. To address these limitations, we introduce AnyNav, a vision-based friction estimation and navigation framework grounded in neuro-symbolic principles. Our approach integrates neural networks for visual perception with symbolic physical models for reasoning about terrain-vehicle dynamics. To enable self-supervised learning in real-world settings, we adopt the imperative learning paradigm, employing bilevel optimization to train the friction network through physics-based optimization. This explicit incorporation of physical reasoning substantially enhances generalization across terrains, vehicle types, and operational conditions. Leveraging the predicted friction coefficients, we further develop a physics-informed navigation system capable of generating physically feasible, time-efficient paths together with corresponding speed profiles. We demonstrate that AnyNav seamlessly transfers from simulation to real-world robotic platforms, exhibiting strong robustness across different four-wheeled vehicles and diverse off-road environments.
Py-DiSMech: A Scalable and Efficient Framework for Discrete Differential Geometry-Based Modeling and Control of Soft Robots
High-fidelity simulation has become essential to the design and control of soft robots, where large geometric deformations and complex contact interactions challenge conventional modeling tools. Recent advances in the field demand simulation frameworks that combine physical accuracy, computational scalability, and seamless integration with modern control and optimization pipelines. In this work, we present Py-DiSMech, a Python-based, open-source simulation framework for modeling and control of soft robotic structures grounded in the principles of Discrete Differential Geometry (DDG). By discretizing geometric quantities such as curvature and strain directly on meshes, Py-DiSMech captures the nonlinear deformation of rods, shells, and hybrid structures with high fidelity and reduced computational cost. The framework introduces (i) a fully vectorized NumPy implementation achieving order-of-magnitude speed-ups over existing geometry-based simulators; (ii) a penalty-energy-based fully implicit contact model that supports rod-rod, rod-shell, and shell-shell interactions; (iii) a natural-strain-based feedback-control module featuring a proportional-integral (PI) controller for shape regulation and trajectory tracking; and (iv) a modular, object-oriented software design enabling user-defined elastic energies, actuation schemes, and integration with machine-learning libraries. Benchmark comparisons demonstrate that Py-DiSMech substantially outperforms the state-of-the-art simulator Elastica in computational efficiency while maintaining physical accuracy. Together, these features establish Py-DiSMech as a scalable, extensible platform for simulation-driven design, control validation, and sim-to-real research in soft robotics.
comment: Software: https://github.com/structuresComp/dismech-python Supplementary Video: https://youtu.be/AfFcoAZR0Go
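For readers unfamiliar with the DDG viewpoint, the sketch below (independent of the Py-DiSMech API) shows the basic idea of discretizing curvature directly on a rod's centerline mesh: the turning angle at each interior vertex divided by its Voronoi length, fully vectorized in NumPy.

```python
# Minimal DDG-style sketch: discrete curvature of a rod centerline as turning angle
# per Voronoi (average adjacent edge) length.
import numpy as np

def discrete_curvature(nodes: np.ndarray) -> np.ndarray:
    """nodes: (N, 3) array of centerline positions; returns (N-2,) vertex curvatures."""
    e = np.diff(nodes, axis=0)                      # edge vectors, shape (N-1, 3)
    length = np.linalg.norm(e, axis=1)
    t = e / length[:, None]                         # unit tangents
    cos_phi = np.clip(np.sum(t[:-1] * t[1:], axis=1), -1.0, 1.0)
    phi = np.arccos(cos_phi)                        # turning angle at interior vertices
    voronoi = 0.5 * (length[:-1] + length[1:])      # integration length per vertex
    return phi / voronoi

# Sanity check: points sampled on a circle of radius R give curvature ~ 1/R.
R, n = 2.0, 200
theta = np.linspace(0, np.pi, n)
circle = np.stack([R * np.cos(theta), R * np.sin(theta), np.zeros(n)], axis=1)
print(discrete_curvature(circle)[:3])   # ~0.5 everywhere
```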
Iterative Compositional Data Generation for Robot Control
Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.
comment: Added experiments in Appendix A; no change to main paper
Vidar: Embodied Video Diffusion Model for Generalist Manipulation
Scaling general-purpose manipulation to new robot embodiments remains challenging: each platform typically needs large, homogeneous demonstrations, and end-to-end pixel-to-action pipelines may degenerate under background and viewpoint shifts. Based on previous advances in video-based robot control, we present Vidar, consisting of an embodied video diffusion model as the generalizable prior and a masked inverse dynamics model (MIDM) as the adapter. We leverage a video diffusion model pre-trained at Internet scale, and further continuously pre-train it for the embodied domain using 750K multi-view trajectories collected from three real-world robot platforms. For this embodied pre-training, we introduce a unified observation space that jointly encodes robot, camera, task, and scene contexts. The MIDM module learns action-relevant pixel masks without dense labels, grounding the prior into the target embodiment's action space while suppressing distractors. With only 20 minutes of human demonstrations on an unseen robot (1% of typical data), Vidar outperforms state-of-the-art baselines and generalizes to unseen tasks, backgrounds, and camera layouts. Our results suggest a scalable recipe for "one prior, many embodiments": strong, inexpensive video priors together with minimal on-robot alignment.
Peer-Aware Cost Estimation in Nonlinear General-Sum Dynamic Games for Mutual Learning and Intent Inference AAMAS 2026
Dynamic game theory is a powerful tool for modeling multi-agent interactions and human-robot systems. In practice, since the objective functions of both agents may not be explicitly known to each other, these interactions can be modeled as incomplete-information general-sum dynamic games. Solving for equilibrium policies in such games presents a major challenge, especially when the underlying dynamics are nonlinear. To simplify the problem, existing work often assumes that one agent is an expert with complete information about its peer, which can lead to biased estimates and failures in coordination. To address this challenge, we propose a nonlinear peer-aware cost estimation (N-PACE) algorithm for general-sum dynamic games. In N-PACE, using iterative linear quadratic (ILQ) approximations of dynamic games, each agent explicitly models the learning dynamics of its peer while inferring the peer's objective function and updating its own control policy in real time, which leads to unbiased and fast learning of the peer's unknown objective function. Additionally, we demonstrate how N-PACE enables intent communication by explicitly modeling the peer's learning dynamics. Finally, we show, both analytically and through case studies, that N-PACE outperforms baseline methods that disregard the learning behavior of the other agent.
comment: Extended version of our AAMAS 2026 accepted paper with an expanded appendix. Compared to the previous arXiv version, we add theoretical guarantees, additional experiments, new baselines, and more in-depth appendix discussion, along with a link to the GitHub repository
Human Centric General Physical Intelligence for Agile Manufacturing Automation
Agile human-centric manufacturing increasingly requires resilient robotic solutions capable of safe and productive interaction within the unstructured environments of modern factories. While multi-modal sensor fusion provides comprehensive situational awareness, robots must also contextualize their reasoning to achieve deep semantic understanding of complex scenes. Foundation models, particularly Vision-Language-Action (VLA) models, have emerged as a promising approach for integrating diverse perceptual modalities and spatio-temporal reasoning to ground physical actions, realizing General Physical Intelligence (GPI) across various robotic embodiments. Although GPI has been discussed conceptually in the literature, its pivotal role and practical deployment in agile manufacturing remain underexplored. To address this gap, this practical review systematically surveys recent advances in VLA models through the lens of GPI, offering a comparative analysis of leading implementations and evaluating their industrial readiness via a structured ablation study. The state of the art is organized into six thematic pillars, including multisensory representation learning, sim2real transfer, planning and control, uncertainty and safety measures, and benchmarking. Finally, the review highlights open challenges and future directions for integrating GPI into industrial ecosystems, in line with the Industry 5.0 vision of intelligent, adaptive, and collaborative manufacturing.
comment: Advanced Engineering Informatics
cVLA: Towards Efficient Camera-Space VLAs
Vision-Language-Action (VLA) models offer a compelling framework for tackling complex robotic manipulation tasks, but they are often expensive to train. In this paper, we propose a novel VLA approach that leverages the competitive performance of Vision Language Models (VLMs) on 2D images to directly infer robot end-effector poses in image frame coordinates. Unlike prior VLA models that output low-level controls, our model predicts trajectory waypoints, making it both more efficient to train and robot embodiment agnostic. Despite its lightweight design, our next-token prediction architecture effectively learns meaningful and executable robot trajectories. We further explore the underutilized potential of incorporating depth images, inference-time techniques such as decoding strategies, and demonstration-conditioned action generation. Our model is trained on a simulated dataset and exhibits strong sim-to-real transfer capabilities. We evaluate our approach using a combination of simulated and real data, demonstrating its effectiveness on a real robotic system.
comment: 20 pages, 10 figures; CoRL 2025 Workshop on Generalizable Priors for Robot Manipulation
Multiagent Systems
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or implementing a small feature. However, real-world software engineering is fundamentally a long-horizon endeavor: developers must interpret high-level requirements, plan coordinated changes across many files, and evolve codebases over multiple iterations while preserving existing functionality. We introduce SWE-EVO, a benchmark that evaluates agents on this long-horizon software evolution challenge. Constructed from release notes and version histories of seven mature open-source Python projects, SWE-EVO comprises 48 evolution tasks that require agents to implement multi-step modifications spanning an average of 21 files, validated against comprehensive test suites averaging 874 tests per instance. Experiments with state-of-the-art models reveal a striking capability gap: even GPT-5 with OpenHands achieves only a 21 percent resolution rate on SWE-EVO, compared to 65 percent on the single-issue SWE-Bench Verified. This demonstrates that current agents struggle with sustained, multi-file reasoning. We also propose Fix Rate, a fine-grained metric that captures partial progress toward solving these complex, long-horizon tasks.
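The paper defines Fix Rate precisely; as a rough illustration only, one natural reading is the per-instance fraction of required tests that the agent's patch makes pass, averaged over instances, as in the hypothetical helper below.

```python
# Hedged sketch: one plausible reading of a partial-progress metric, not the paper's
# exact Fix Rate definition.
from typing import Dict, Set

def fix_rate(passed: Dict[str, Set[str]], required: Dict[str, Set[str]]) -> float:
    """passed/required map instance id -> set of test names; returns mean pass fraction."""
    per_instance = [
        len(passed.get(inst, set()) & tests) / len(tests)
        for inst, tests in required.items() if tests
    ]
    return sum(per_instance) / len(per_instance) if per_instance else 0.0

required = {"proj-1": {"t1", "t2", "t3", "t4"}, "proj-2": {"t1", "t2"}}
passed = {"proj-1": {"t1", "t3"}, "proj-2": {"t1", "t2"}}
print(fix_rate(passed, required))   # (0.5 + 1.0) / 2 = 0.75
```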
Agent-Based Output Drift Detection for Breast Cancer Response Prediction in a Multisite Clinical Decision Support System
Modern clinical decision support systems can concurrently serve multiple, independent medical imaging institutions, but their predictive performance may degrade across sites due to variations in patient populations, imaging hardware, and acquisition protocols. Continuous surveillance of predictive model outputs offers a safe and reliable approach for identifying such distributional shifts without ground truth labels. However, most existing methods rely on centralized monitoring of aggregated predictions, overlooking site-specific drift dynamics. We propose an agent-based framework for detecting drift and assessing its severity in multisite clinical AI systems. To evaluate its effectiveness, we simulate a multi-center environment for output-based drift detection, assigning each site a drift monitoring agent that performs batch-wise comparisons of model outputs against a reference distribution. We analyse several multi-center monitoring schemes that differ in how the reference is obtained (site-specific, global, production-only, and adaptive), alongside a centralized baseline. Results on real-world breast cancer imaging data using a pathological complete response prediction model show that all multi-center schemes outperform centralized monitoring, with F1-score improvements of up to 10.3% in drift detection. In the absence of site-specific references, the adaptive scheme performs best, with F1-scores of 74.3% for drift detection and 83.7% for drift severity classification. These findings suggest that adaptive, site-aware agent-based drift monitoring can enhance the reliability of multisite clinical decision support systems.
comment: Accepted at MICAD (Medical Imaging and Computer-Aided Diagnosis) 2025
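As a minimal sketch of output-based, site-local drift monitoring (not the paper's exact reference schemes), each agent below compares a batch of model output scores against its reference distribution with a two-sample Kolmogorov-Smirnov test and raises a drift flag on small p-values.

```python
# Per-site drift-monitoring agent sketch: batch-wise comparison of model outputs
# against a reference distribution, without ground-truth labels.
import numpy as np
from scipy.stats import ks_2samp

class DriftAgent:
    def __init__(self, reference_scores: np.ndarray, alpha: float = 0.01):
        self.reference = np.asarray(reference_scores)
        self.alpha = alpha

    def check_batch(self, batch_scores: np.ndarray) -> bool:
        stat, p_value = ks_2samp(self.reference, batch_scores)
        return p_value < self.alpha    # True -> drift suspected at this site

rng = np.random.default_rng(0)
agent = DriftAgent(reference_scores=rng.beta(2, 5, size=2000))
print(agent.check_batch(rng.beta(2, 5, size=200)))   # False: same distribution
print(agent.check_batch(rng.beta(5, 2, size=200)))   # True: shifted outputs
```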
Snowveil: A Framework for Decentralised Preference Discovery
Aggregating subjective preferences of a large group is a fundamental challenge in computational social choice, traditionally reliant on central authorities. To address the limitations of this model, this paper introduces Decentralised Preference Discovery (DPD), the problem of determining the collective will of an electorate under constraints of censorship resistance, partial information, and asynchronous communication. We propose Snowveil, a novel framework for this task. Snowveil uses an iterative, gossip-based protocol where voters repeatedly sample the preferences of a small, random subset of the electorate to progressively converge on a collective outcome. We demonstrate the framework's modularity by designing the Constrained Hybrid Borda (CHB), a novel aggregation rule engineered to balance broad consensus with strong plurality support, and provide a rigorous axiomatic analysis of its properties. By applying a potential function and submartingale theory, we develop a multi-level analytical method to show that the system almost surely converges to a stable single winner in finite time, a process that can then be iterated to construct a set of winning candidates for multi-winner scenarios. This technique is largely agnostic to the specific aggregation rule, requiring only that it satisfies core social choice axioms like Positive Responsiveness, thus offering a formal toolkit for a wider class of DPD protocols. Furthermore, we present a comprehensive empirical analysis through extensive simulation, validating Snowveil's $O(n)$ scalability. Overall, this work advances the understanding of how a stable consensus can emerge from subjective, complex, and diverse preferences in decentralised systems for large electorates.
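A toy sketch of the sample-and-aggregate idea follows, using plain Borda counting rather than the paper's Constrained Hybrid Borda rule: each voter polls a small random subset of peers and updates its local estimate of the collective winner.

```python
# Gossip-and-sample toy (plain Borda, not CHB): voters estimate the winner from a
# small random sample of peer rankings.
import random
from collections import Counter

def borda_winner(rankings):
    """rankings: list of candidate lists, best first."""
    scores = Counter()
    m = len(rankings[0])
    for ranking in rankings:
        for pos, cand in enumerate(ranking):
            scores[cand] += m - 1 - pos
    return max(scores, key=scores.get)

def gossip_round(profile, k=7, seed=0):
    rng = random.Random(seed)
    estimates = []
    for _ in profile:                                  # every voter polls k peers
        sample = [rng.choice(profile) for _ in range(k)]
        estimates.append(borda_winner(sample))
    return Counter(estimates).most_common(1)[0]

profile = [["a", "b", "c"]] * 60 + [["b", "c", "a"]] * 30 + [["c", "a", "b"]] * 10
print(gossip_round(profile))   # local estimates already concentrate on "a"
```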
On Swarm Leader Identification using Probing Policies
Identifying the leader within a robotic swarm is crucial, especially in adversarial contexts where leader concealment is necessary for mission success. This work introduces the interactive Swarm Leader Identification (iSLI) problem, a novel approach where an adversarial probing agent identifies a swarm's leader by physically interacting with its members. We formulate the iSLI problem as a Partially Observable Markov Decision Process (POMDP) and employ Deep Reinforcement Learning, specifically Proximal Policy Optimization (PPO), to train the prober's policy. The proposed approach utilizes a novel neural network architecture featuring a Timed Graph Relationformer (TGR) layer combined with a Simplified Structured State Space Sequence (S5) model. The TGR layer effectively processes graph-based observations of the swarm, capturing temporal dependencies and fusing relational information using a learned gating mechanism to generate informative representations for policy learning. Extensive simulations demonstrate that our TGR-based model outperforms baseline graph neural network architectures and exhibits significant zero-shot generalization capabilities across varying swarm sizes and speeds different from those used during training. The trained prober achieves high accuracy in identifying the leader, maintaining performance even in out-of-training distribution scenarios, and showing appropriate confidence levels in its predictions. Real-world experiments with physical robots further validate the approach, confirming successful sim-to-real transfer and robustness to dynamic changes, such as unexpected agent disconnections.
comment: 13 pages, journal
SafeSieve: From Heuristics to Experience in Progressive Pruning for LLM-based Multi-Agent Communication AAAI-2026
LLM-based multi-agent systems exhibit strong collaborative capabilities but often suffer from redundant communication and excessive token overhead. Existing methods typically enhance efficiency through pretrained GNNs or greedy algorithms, but often isolate pre- and post-task optimization, lacking a unified strategy. To this end, we present SafeSieve, a progressive and adaptive multi-agent pruning algorithm that dynamically refines the inter-agent communication through a novel dual-mechanism. SafeSieve integrates initial LLM-based semantic evaluation with accumulated performance feedback, enabling a smooth transition from heuristic initialization to experience-driven refinement. Unlike existing greedy Top-k pruning methods, SafeSieve employs 0-extension clustering to preserve structurally coherent agent groups while eliminating ineffective links. Experiments across benchmarks (SVAMP, HumanEval, etc.) showcase that SafeSieve achieves 94.01% average accuracy while reducing token usage by 12.4%-27.8%. Results further demonstrate robustness under prompt injection attacks (1.23% average accuracy drop). In heterogeneous settings, SafeSieve reduces deployment costs by 13.3% while maintaining performance. These results establish SafeSieve as an efficient, GPU-free, and scalable framework for practical multi-agent systems. Our code can be found here: https://github.com/csgen/SafeSieve
comment: AAAI-2026 poster; 7 pages for main content, 5 figures, 4 tables
JustAct+: A Framework for Auditable Multi-Agent Systems Regulated by Inter-Organisational Policies
In open multi-agent systems that cross organisational boundaries, agent actions must be regulated by complex policies. Consider medical data processing systems, which must observe generic laws (e.g., EU data protection regulations) and also specific participants' resource conditions (e.g., Bob consents to sharing his X-Rays with EU hospitals). Here, we address the implementation of these systems as distributed software. Solutions to key sub-problems are available: existing policy languages capture the necessary normative concepts and formalise the computational representation and reasoning about policies, and existing distributed algorithms and protocols coordinate agents' changing actions and policies. But which policies and protocols are useful in application? With the JustAct framework, we characterise a class of multi-agent systems where actors justify their actions with sufficient policy information collected from dynamic policy statements and agreements. We prove key properties of these systems, e.g., any decision that an action is permitted now cannot be refuted later, regardless of any added statements or updated agreements. We study a particular instance of the framework by specifying (in Rocq) and implementing (in Rust) a particular policy language and runtime system for mediating agent communications. We demonstrate and assess JustAct via a case study of this implementation: we reproduce the usage scenarios of Brane, an existing policy-regulated, inter-domain, medical data processing system.
Systems and Control (EESS)
Scaling up Stability: Reinforcement Learning for Distributed Control of Networked Systems in the Space of Stabilizing Policies
We study distributed control of networked systems through reinforcement learning, where neural policies must be simultaneously scalable, expressive, and stabilizing. We introduce a policy parameterization that embeds Graph Neural Networks (GNNs) into a Youla-like magnitude-direction parameterization, yielding distributed stochastic controllers that guarantee network-level closed-loop stability by design. The magnitude is implemented as a stable operator consisting of a GNN acting on disturbance feedback, while the direction is a GNN acting on local observations. We prove robustness of the closed loop to perturbations in both the graph topology and model parameters, and show how to integrate our parameterization with Proximal Policy Optimization. Experiments on a multi-agent navigation task show that policies trained on small networks transfer directly to larger ones and unseen network topologies, achieving higher returns and lower variance than a state-of-the-art MARL baseline while preserving stability.
The Illusion of Consistency: Selection-Induced Bias in Gated Kalman Innovation Statistics
Validation gating is a fundamental component of classical Kalman-based tracking systems. Only measurements whose normalized innovation squared (NIS) falls below a prescribed threshold are considered for state update. While this procedure is statistically motivated by the chi-square distribution, it implicitly replaces the unconditional innovation process with a conditionally observed one, restricted to the validation event. This paper shows that innovation statistics computed after gating converge to gate-conditioned rather than nominal quantities. Under classical linear-Gaussian assumptions, we derive exact expressions for the first- and second-order moments of the innovation conditioned on ellipsoidal gating, and show that gating induces a deterministic, dimension-dependent contraction of the innovation covariance. The analysis is extended to nearest-neighbor (NN) association, which is shown to act as an additional statistical selection operator. We prove that selecting the minimum-norm innovation among multiple in-gate measurements introduces an unavoidable energy contraction, implying that nominal innovation statistics cannot be preserved under nontrivial gating and association. Closed-form results in the two-dimensional case quantify the combined effects and illustrate their practical significance.
comment: 8 pages, preprint
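The central effect is easy to reproduce numerically: conditioning on an ellipsoidal gate contracts the innovation covariance by a deterministic, dimension-dependent factor. Below is a Monte-Carlo sketch for the two-dimensional case, illustrative only and not the paper's closed-form derivation.

```python
# Monte-Carlo illustration: gating on NIS <= chi-square threshold shrinks the empirical
# innovation covariance below the nominal S.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
S = np.array([[2.0, 0.5], [0.5, 1.0]])          # nominal innovation covariance
L = np.linalg.cholesky(S)
nu = rng.standard_normal((200_000, 2)) @ L.T    # innovations ~ N(0, S)

gate = chi2.ppf(0.95, df=2)                     # standard 95% validation gate
nis = np.einsum("ij,jk,ik->i", nu, np.linalg.inv(S), nu)
accepted = nu[nis <= gate]

ratio = np.cov(accepted.T) @ np.linalg.inv(S)
print(np.diag(ratio))   # ~0.84 on both axes: deterministic, dimension-dependent contraction
```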
Sink Proximity: A Novel Approach for Online Vehicle Dispatch in Ride-hailing
Ride-hailing platforms have a profound impact on urban transportation systems, and their performance largely depends on how intelligently they dispatch vehicles in real time. In this work, we develop a new approach to online vehicle dispatch that strengthens a platform's ability to serve more requests under demand uncertainty. We introduce sink proximity, a novel network-science-inspired measure that captures how demand and vehicle flows are likely to evolve across the city. By integrating this measure into a shareability-network framework, we design an online dispatch algorithm that naturally considers future network states, without depending on fragile spatiotemporal forecasts. Numerical studies demonstrate that our proposed solution significantly improves the request service rate during peak hours within a receding horizon framework with limited future information available.
Prioritized Constraints in Optimization-Based Control
We provide theoretical foundations and computational tools for the systematic design of optimization-based control laws with constraints that have different priorities. By introducing the concept of prioritized intersections, we extend and unify previous work on the topic. Moreover, to enable the use of prioritized intersection in real-time applications, we propose an efficient solver for forming such intersections for polyhedral constraints. The solver in question is a tailored implementation of a dual active-set quadratic programming solver that leverages the particular problem structure of the optimization problems arising for prioritized intersections. The method is validated in a real-time MPC application for autonomous driving, where it successfully resolves six different levels of conflicting constraints, confirming its efficiency and practicality for control. Furthermore, we show that the proposed solver outperforms existing solvers for hierarchical quadratic programming, making it relevant beyond control applications.
A Distributed Hierarchical Spatio-Temporal Edge-Enhanced Graph Neural Network for City-Scale Dynamic Logistics Routing
City-scale logistics routing has become increasingly challenging as metropolitan road networks grow to tens of millions of edges and traffic conditions evolve rapidly under high-volume mobility demands. Conventional centralized routing algorithms and monolithic graph neural network (GNN) models suffer from limited scalability, high latency, and poor real-time adaptability, which restricts their effectiveness in large urban logistics systems. To address these challenges, this paper proposes a Distributed Hierarchical Spatio-Temporal Edge-Enhanced Graph Neural Network (HSTE-GNN) for dynamic routing over ultra-large road networks. The framework partitions the city-scale graph into regional subgraphs processed in parallel across distributed computing nodes, enabling efficient learning of localized traffic dynamics. Within each region, an edge-enhanced spatio-temporal module jointly models node states, dynamic edge attributes, and short-term temporal dependencies. A hierarchical coordination layer further aggregates cross-region representations through an asynchronous parameter-server mechanism, ensuring global routing coherence under high-frequency traffic updates. This distributed hierarchical design balances local responsiveness with global consistency, significantly improving scalability and inference efficiency. Experiments on real-world large-scale traffic datasets from Beijing and New York demonstrate that HSTE-GNN outperforms strong spatio-temporal baselines such as ST-GRAPH, achieving 34.9% lower routing delay, 14.7% lower MAPE, and 11.8% lower RMSE, while improving global route consistency by 7.3%. These results confirm that the proposed framework provides a scalable, adaptive, and efficient solution for next-generation intelligent transportation systems and large-scale logistics platforms.
On Hyperexponential Stabilization of Linear Infinite-Dimensional Systems
This paper studies hyperexponential stabilization of linear infinite-dimensional systems on Hilbert spaces by a distributed, time-dependent control law. Well-posedness of the closed loop for all times is obtained through the theory of maximal monotone operators. Hyperexponential stability and the ISS property of the closed loop are established using Lyapunov analysis and a time-scale transformation.
Virtual Resistance-Based Control for Grid-Connected Inverters using Persidskii Systems Approach
This work addresses virtual resistance (VR)-based control for grid-connected inverters, which enhances transient damping, reduces steady-state errors, and improves robustness to grid disturbances without requiring additional voltage sensors. Classical passivity-based VR control is robust but limited by restrictive sector bounds on nonlinearities. We extend these bounds and model the closed-loop system as a generalized Persidskii-type nonlinear system. Using this framework, we derive input-to-state stability (ISS) conditions that account for the extended nonlinearities and external disturbances, providing a systematic and less conservative approach to VR control design under practical operating conditions, which is validated through extensive simulations.
Why Most Optimism Bandit Algorithms Have the Same Regret Analysis: A Simple Unifying Theorem
Several optimism-based stochastic bandit algorithms -- including UCB, UCB-V, linear UCB, and finite-arm GP-UCB -- achieve logarithmic regret using proofs that, despite superficial differences, follow essentially the same structure. This note isolates the minimal ingredients behind these analyses: a single high-probability concentration condition on the estimators, after which logarithmic regret follows from two short deterministic lemmas describing radius collapse and optimism-forced deviations. The framework yields unified, near-minimal proofs for these classical algorithms and extends naturally to many contemporary bandit variants.
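As a concrete instance of the optimism template being unified, a minimal UCB implementation on Bernoulli arms is sketched below; the confidence-radius collapse plus optimism-forced deviations is exactly what keeps its cumulative pseudo-regret logarithmic.

```python
# Minimal UCB sketch: play the arm with the highest mean-plus-confidence-radius.
import math
import numpy as np

def ucb(means, horizon=20_000, seed=0):
    rng = np.random.default_rng(seed)
    k = len(means)
    counts, sums = np.zeros(k), np.zeros(k)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                                   # play each arm once
        else:
            radius = np.sqrt(2.0 * math.log(t) / counts)  # confidence radius
            arm = int(np.argmax(sums / counts + radius))  # optimism in the face of uncertainty
        reward = float(rng.random() < means[arm])         # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret += max(means) - means[arm]
    return regret

print(ucb([0.5, 0.6, 0.7]))   # cumulative pseudo-regret grows only logarithmically in the horizon
```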
Neural Proofs for Sound Verification and Control of Complex Systems
This informal contribution presents an ongoing line of research pursuing a new approach to the construction of sound proofs for the formal verification and control of complex stochastic models of dynamical systems, of reactive programs and, more generally, of models of Cyber-Physical Systems. Neural proofs are made up of two key components: 1) proof rules, which encode requirements entailing the verification of general temporal specifications over the models of interest; and 2) certificates, which discharge such rules and are constructed from them with an inductive (that is, cyclic, repetitive) approach. This inductive approach involves 2a) accessing samples from the model's dynamics and training neural networks accordingly, while 2b) generalising such networks via SAT-modulo-theory (SMT) queries that leverage full knowledge of the models. In the context of sequential decision-making problems over complex stochastic models, it is additionally possible to generate provably-correct policies/strategies/controllers, namely state-feedback functions that, in conjunction with neural certificates, formally attain the given specifications for the models of interest.
Robust H-infinity control under stochastic requirements: minimizing conditional value-at-risk instead of worst-case performance
Conventional robust $\mathcal H_2/\mathcal H_\infty$ control minimizes the worst-case performance, often leading to a conservative design driven by very rare and somewhat arbitrary parametric configurations. To reduce this conservatism while taking advantage of the stochastic properties of Monte-Carlo sampling and its compatibility with parallel computing, we introduce an alternative paradigm that optimizes the controller with respect to a stochastic criterion, namely the conditional value at risk. We illustrate the potential of this approach on a realistic satellite benchmark, showing that it can significantly improve overall performance by tolerating some degradation in very rare worst-case scenarios.
comment: Preprint
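The criterion swap itself is simple to state in code: instead of the maximum of a Monte-Carlo performance sample, score the controller by the mean of its worst alpha-fraction. The sketch below uses a hypothetical performance index, not the satellite benchmark.

```python
# Conditional value-at-risk of a sampled (larger-is-worse) performance index.
import numpy as np

def cvar(samples: np.ndarray, alpha: float = 0.05) -> float:
    """Mean of the worst alpha-fraction of the sample."""
    sorted_costs = np.sort(samples)
    n_tail = max(1, int(np.ceil(alpha * len(samples))))
    return float(sorted_costs[-n_tail:].mean())

rng = np.random.default_rng(0)
# Hypothetical closed-loop performance evaluated on 10k sampled parameter draws.
performance = 1.0 + 0.1 * rng.standard_normal(10_000) + 0.5 * (rng.random(10_000) < 0.001)

print("worst case:", performance.max())       # dominated by ~0.1% of rare configurations
print("CVaR(5%):  ", cvar(performance, 0.05)) # far less conservative design target
```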
State-Space Averaging Revisited via Reconstruction Operators
This paper presents an operator-theoretic reconstruction of an equivalent continuous-time LTI model from an exact sampled-data (Poincaré-map) baseline of a piecewise-linear switching system. The rebuilding is explicitly expressed via matrix logarithms. By expanding the logarithm of a product of matrix exponentials using the Baker--Campbell--Hausdorff (BCH) formula, we show that the classical state-space averaging (SSA) model can be interpreted as the leading-order truncation of this exact reconstruction when the switching period is small and the ripple is small. The same view explains why SSA critically relies on low-frequency and small-ripple assumptions, and why the method becomes fragile for converters with more than two subintervals per cycle. Finally, we provide a complexity-reduced, SSA-flavoured implementation strategy for obtaining the required spectral quantities and a real-valued logarithm without explicitly calling eigen-decomposition or complex matrix logarithms, by exploiting $2\times 2$ invariants and a minimal real-lift construction.
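The reconstruction view can be checked numerically on a toy two-mode example (illustrative values, not a specific converter): the scaled matrix logarithm of the exact one-period map is the equivalent LTI model, and it approaches the classical SSA average as the switching period shrinks.

```python
# Exact one-period map vs. state-space averaging for a two-mode switching system.
import numpy as np
from scipy.linalg import expm, logm

A1 = np.array([[0.0, 1.0], [-50.0, -2.0]])    # subinterval 1 dynamics (toy values)
A2 = np.array([[0.0, 1.0], [-80.0, -5.0]])    # subinterval 2 dynamics
d, T = 0.4, 1e-3                              # duty ratio and switching period

Phi = expm(A2 * (1 - d) * T) @ expm(A1 * d * T)   # exact sampled-data (one-period) map
A_exact = np.real(logm(Phi)) / T                  # reconstructed equivalent LTI model
A_ssa = d * A1 + (1 - d) * A2                     # classical state-space average

print(np.max(np.abs(A_exact - A_ssa)))   # small for small T; grows with period and ripple
```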
Trustworthy and Explainable Deep Reinforcement Learning for Safe and Energy-Efficient Process Control: A Use Case in Industrial Compressed Air Systems
This paper presents a trustworthy reinforcement learning approach for the control of industrial compressed air systems. We develop a framework that enables safe and energy-efficient operation under realistic boundary conditions and introduce a multi-level explainability pipeline combining input perturbation tests, gradient-based sensitivity analysis, and SHAP (SHapley Additive exPlanations) feature attribution. An empirical evaluation across multiple compressor configurations shows that the learned policy is physically plausible, anticipates future demand, and consistently respects system boundaries. Compared to the installed industrial controller, the proposed approach reduces unnecessary overpressure and achieves energy savings of approximately 4% without relying on explicit physics models. The results further indicate that system pressure and forecast information dominate policy decisions, while compressor-level inputs play a secondary role. Overall, the combination of efficiency gains, predictive behavior, and transparent validation supports the trustworthy deployment of reinforcement learning in industrial energy systems.
Power Converter DC Link Ripple and Network Unbalance as Active Constraints in Distribution System Optimal Power Flow
Mitigating unbalanced grid voltages or currents with voltage source converters produces power ripple on the dc link, which is a key converter design parameter due to hardware and stability considerations. Despite the importance of this issue for system design and operation, the use of Optimal Power Flow (OPF)-based methods capturing the interaction between dc link ripple and converter unbalanced operation has been largely unexplored. In this work, the magnitude of the power ripple is derived for generic multi-terminal converters, then introduced as a bilinear OPF constraint for two-level converter topologies. OPF case studies demonstrate the necessity to model both neutral current and dc link ripple, with tradeoffs between capacitor sizing and leg sizing highlighted for phase current unbalance mitigation applications. Time domain simulations of a grid-connected four-wire voltage source converter verify the accuracy and validity of the algebraic formulation. It is concluded that awareness of dc link ripple impacts and constraints will be of growing importance for distribution system operators.
comment: This work has been submitted to the IEEE for possible publication
Regularized Distributed MPC for UAV Networks: Stabilizing Coupled Motion and Hybrid Beam Alignment
This letter investigates the coupled control problem in UAV networks utilizing high-frequency hybrid beamsteering. While phased arrays enable rapid electronic scanning, their finite Field of View (FoV) imposes a fundamental constraint that necessitates active mechanical steering of the airframe to maintain connectivity. We propose a decentralized Model Predictive Control (MPC) framework that jointly optimizes trajectory and heading to maximize network sum-capacity subject to safety constraints. Addressing the numerical instability caused by fast-fading channel nulls, we introduce a regularized surrogate cost function based on discrete spatial smoothing. We analytically prove that this approximation bounds the cost curvature, restoring the Lipschitz continuity of the gradient. Crucially, we derive a sufficient condition linking this Lipschitz constant to the controller gain, guaranteeing the contraction and linear convergence of the distributed best-response dynamics. Simulation results demonstrate that the proposed algorithm effectively navigates the trade-off between electronic beam tracking and kinematic safety, significantly outperforming velocity-aligned baselines.
comment: Submitted to IEEE Control Systems Letters (LCSS). 6 pages, 3 figures
FedWiLoc: Federated Learning for Privacy-Preserving WiFi Indoor Localization
Current data-driven Wi-Fi-based indoor localization systems face three critical challenges: protecting user privacy, achieving accurate predictions in dynamic multipath environments, and generalizing across different deployments. Traditional Wi-Fi localization systems often compromise user privacy, particularly when facing compromised access points (APs) or man-in-the-middle attacks. As IoT devices proliferate in indoor environments, developing solutions that deliver accurate localization while robustly protecting privacy has become imperative. We introduce FedWiLoc, a privacy-preserving indoor localization system that addresses these challenges through three key innovations. First, FedWiLoc employs a split architecture where APs process Channel State Information (CSI) locally and transmit only privacy-preserving embedding vectors to user devices, preventing raw CSI exposure. Second, during training, FedWiLoc uses federated learning to collaboratively train the model across APs without centralizing sensitive user data. Third, we introduce a geometric loss function that jointly optimizes angle-of-arrival predictions and location estimates, enforcing geometric consistency to improve accuracy in challenging multipath conditions. Extensive evaluation across six diverse indoor environments spanning over 2,000 sq. ft. demonstrates that FedWiLoc outperforms state-of-the-art methods by up to 61.9% in median localization error while maintaining strong privacy guarantees throughout both training and inference.
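For orientation, a bare-bones federated-averaging round is sketched below (a generic FedAvg toy on a linear model, not FedWiLoc's split architecture or geometric loss): each site trains locally and only model weights are averaged, so raw data never leaves a site.

```python
# Generic federated averaging toy: local gradient steps at each site, server averages weights.
import numpy as np

def local_step(weights, X, y, lr=0.1, epochs=20):
    w = weights.copy()
    for _ in range(epochs):                     # plain least-squares gradient steps
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, site_data):
    updates = [local_step(global_w, X, y) for X, y in site_data]   # raw data stays local
    return np.mean(updates, axis=0)                                # server averages weights only

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
sites = []
for _ in range(4):
    X = rng.standard_normal((200, 3))
    sites.append((X, X @ true_w + 0.05 * rng.standard_normal(200)))

w = np.zeros(3)
for _ in range(30):
    w = federated_round(w, sites)
print(np.round(w, 3))   # converges toward the shared regression weights
```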
Structural Integrality in Task Assignment and Path Finding via Total Unimodularity of Petri Net Models
Task Assignment and Path Finding (TAPF) concerns computing collision-free motions for multiple robots while jointly selecting goal locations. In this paper, safety is enforced by requiring unit-capacity traversal between successive intermediate markings, yielding coordination strategies that are valid independently of any specific time interpretation. Existing optimization-based approaches typically rely on time-expanded network-flow models, which result in large mixed-integer programs and limited scalability. We instead develop a Petri net (PN)-based optimization framework that exploits structural properties of the motion model to improve computational efficiency without explicit time expansion. When robot motion is modeled by strongly connected state-machine PNs, we show that, once the congestion level (equivalently, the synchronization depth) is fixed to an integer value, the resulting motion-planning constraint matrix is totally unimodular. Consequently, the corresponding LP relaxation admits integral optimal solutions for the motion variables. When the estimated congestion exceeds one, we introduce a synchronization-on-demand mechanism based on intermediate markings; for a fixed number of synchronization stages, the associated constraint matrices remain totally unimodular, thereby preserving integrality of the motion variables. Finally, we extend TAPF to Boolean specifications over regions of interest and propose a two-stage LP/mixed-integer linear programming (MILP) scheme in which integrality is confined to task-selection variables. Simulations on large benchmarks demonstrate substantial scalability improvements over time-expanded optimization baselines.
Any-Time Regret-Guaranteed Algorithm for Control of Linear Quadratic Systems
We propose a computationally efficient algorithm that achieves anytime regret of order $\mathcal{O}(\sqrt{t})$, with explicit dependence on the system dimensions and on the solution of the Discrete Algebraic Riccati Equation (DARE). Our approach uses an appropriately tuned regularization and a sufficiently accurate initial estimate to construct confidence ellipsoids for control design. A carefully designed input-perturbation mechanism is incorporated to ensure anytime performance. We develop two variants of the algorithm. The first enforces strong sequential stability, requiring each policy to be stabilizing and successive policies to remain close. This sequential condition helps prevent state explosion at policy update times; however, it results in a suboptimal regret scaling with respect to the DARE solution. Motivated by this limitation, we introduce a second class of algorithms that removes this requirement and instead requires only that each generated policy be stabilizing. Closed-loop stability is then preserved through a dwell-time inspired policy-update rule. This class of algorithms also addresses key shortcomings of most existing approaches which lack explicit high-probability bounds on the state trajectory expressed in system-theoretic terms. Our analysis shows that partially relaxing the sequential-stability requirement yields optimal regret. Finally, our method eliminates the need for any a priori bound on the norm of the DARE solution, an assumption required by all existing computationally efficient OFU-based algorithms.
PowerMamba: A Deep State Space Model and Comprehensive Benchmark for Time Series Prediction in Electric Power Systems
The electricity sector is undergoing substantial transformations due to the rising electrification of demand, enhanced integration of renewable energy resources, and the emergence of new technologies. These changes are rendering the electric grid more volatile and unpredictable, making it difficult to maintain reliable operations. In order to address these issues, advanced time series prediction models are needed for closing the gap between the forecasted and actual grid outcomes. In this paper, we introduce a multivariate time series prediction model that combines traditional state space models with deep learning methods to simultaneously capture and predict the underlying dynamics of multiple time series. Additionally, we design a time series processing module that incorporates high-resolution external forecasts into sequence-to-sequence prediction models, achieving this with negligible increases in size and no loss of accuracy. We also release an extended dataset spanning five years of load, electricity price, ancillary service price, and renewable generation. To complement this dataset, we provide an open-access toolbox that includes our proposed model, the dataset itself, and several state-of-the-art prediction models, thereby creating a unified framework for benchmarking advanced machine learning approaches. Our findings indicate that the proposed model outperforms existing models across various prediction tasks, improving state-of-the-art prediction error by an average of 7% and decreasing model parameters by 43%.
comment: This paper has been accepted for publication in the Journal of IEEE Transactions on Power Systems
An Architecture for Remote Container Builds and Artifact Delivery Using a Controller-Light Jenkins CI/CD Pipeline
Conventional Jenkins installations often execute resource-intensive builds directly on the controller, which can lower reliability and overload system resources. In the controller-light CI/CD framework presented in this paper, Jenkins functions as a containerized controller with persistent volumes and delegates heavy build and packaging tasks to a remote Docker host. The controller container maintains secure SSH connections to remote compute nodes while focusing solely on orchestration and reporting. The system includes atomic deployments with time-stamped backups, containerized build environments, immutable artifact packaging, and automated notifications. Experimental evaluation shows faster build throughput, reduced CPU and RAM consumption on the controller, and lower artifact delivery latency. For small and medium-sized DevOps businesses looking for scalable automation without adding orchestration complexity, this method offers a repeatable, low-maintenance solution.
comment: v2: revised writing and presentation; clarified contributions and experimental discussion
Closing the Loop Inside Neural Networks: Causality-Guided Layer Adaptation for Fault Recovery Control
This paper studies the problem of real-time fault recovery control for nonlinear control-affine systems subject to actuator loss of effectiveness faults and external disturbances. We derive a two-stage framework that combines causal inference with selective online adaptation to achieve an effective learning-based recovery control method. In the offline phase, we develop a causal layer attribution technique based on the average causal effect (ACE) to evaluate the relative importance of each layer in a pretrained deep neural network (DNN) controller compensating for faults. This methodology identifies a subset of high-impact layers responsible for robust fault compensation. In the online phase, we deploy a Lyapunov-based gradient update to adapt only the ACE-selected layer to circumvent the need for full-network or last-layer only updates. The proposed adaptive controller guarantees uniform ultimate boundedness (UUB) with exponential convergence of the closed-loop system in the presence of actuator faults and external disturbances. Compared to conventional adaptive DNN controllers with full-network adaptation, our methodology has a reduced computational overhead. To demonstrate the effectiveness of our proposed methodology, a case study is provided on a 3-axis attitude control system of a spacecraft with four reaction wheels.
Informativity Conditions for Multiple Signals: Properties, Experimental Design, and Applications (extended version)
Recent studies highlight the importance of the persistent excitation condition on a single signal sequence for model identification and data-driven control methodologies. However, maintaining prolonged excitation in control signals introduces significant challenges, as continuous excitation can reduce the lifetime of mechanical devices. In this paper, we introduce three informativity conditions for various types of multi-signal data, each augmented by weight factors. We explore the interrelations between these conditions and their rank properties in linear time-invariant systems. Furthermore, we introduce open-loop experimental design methods tailored to each of the three conditions, which can synthesize the required excitation conditions either offline or online, even in the presence of limited information within each signal segment. We demonstrate the effectiveness of these informativity conditions in least-squares identification. Additionally, all three conditions can extend Willems' fundamental lemma and are utilized to assess the properties of the system. Illustrative examples confirm that these conditions yield satisfactory outcomes in both least-squares identification and the construction of data-driven controllers.
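In the single-signal special case the informativity check reduces to the familiar Hankel-rank test behind Willems' fundamental lemma; the paper's weighted, multi-signal conditions generalize this. A minimal sketch:

```python
# Persistent excitation of order L for a scalar sequence: full row rank of its
# depth-L Hankel matrix.
import numpy as np

def hankel(u: np.ndarray, L: int) -> np.ndarray:
    """Depth-L Hankel matrix of a scalar sequence u."""
    N = len(u)
    return np.stack([u[i:N - L + 1 + i] for i in range(L)])

def persistently_exciting(u: np.ndarray, L: int) -> bool:
    return np.linalg.matrix_rank(hankel(u, L)) == L

rng = np.random.default_rng(0)
print(persistently_exciting(rng.standard_normal(50), L=6))       # True: rich input
print(persistently_exciting(np.sin(0.3 * np.arange(50)), L=6))   # False: one sinusoid excites only order 2
```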
Peer-Aware Cost Estimation in Nonlinear General-Sum Dynamic Games for Mutual Learning and Intent Inference AAMAS 2026
Dynamic game theory is a powerful tool for modeling multi-agent interactions and human-robot systems. In practice, since the objective functions of both agents may not be explicitly known to each other, these interactions can be modeled as incomplete-information general-sum dynamic games. Solving for equilibrium policies in such games presents a major challenge, especially when the underlying dynamics are nonlinear. To simplify the problem, existing work often assumes that one agent is an expert with complete information about its peer, which can lead to biased estimates and failures in coordination. To address this challenge, we propose a nonlinear peer-aware cost estimation (N-PACE) algorithm for general-sum dynamic games. In N-PACE, using iterative linear quadratic (ILQ) approximations of dynamic games, each agent explicitly models the learning dynamics of its peer while inferring the peer's objective function and updating its own control policy in real time, which leads to unbiased and fast learning of the peer's unknown objective function. Additionally, we demonstrate how N-PACE enables intent communication by explicitly modeling the peer's learning dynamics. Finally, we show, both analytically and through case studies, that N-PACE outperforms baseline methods that disregard the learning behavior of the other agent.
comment: Extended version of our AAMAS 2026 accepted paper with an expanded appendix. Compared to the previous arXiv version, we add theoretical guarantees, additional experiments, new baselines, and more in-depth appendix discussion, along with a link to the GitHub repository
Robotics
Diffusion Forcing for Multi-Agent Interaction Sequence Modeling
Understanding and generating multi-person interactions is a fundamental challenge with broad implications for robotics and social computing. While humans naturally coordinate in groups, modeling such interactions remains difficult due to long temporal horizons, strong inter-agent dependencies, and variable group sizes. Existing motion generation methods are largely task-specific and do not generalize to flexible multi-agent generation. We introduce MAGNet (Multi-Agent Diffusion Forcing Transformer), a unified autoregressive diffusion framework for multi-agent motion generation that supports a wide range of interaction tasks through flexible conditioning and sampling. MAGNet performs dyadic prediction, partner inpainting, and full multi-agent motion generation within a single model, and can autoregressively generate ultra-long sequences. Building on Diffusion Forcing, we introduce key modifications that explicitly model inter-agent coupling during autoregressive denoising, enabling coherent coordination across agents. As a result, MAGNet captures both tightly synchronized activities (e.g., dancing, boxing) and loosely structured social interactions. Our approach performs on par with specialized methods on dyadic benchmarks while naturally extending to polyadic scenarios involving three or more interacting people, enabled by a scalable architecture that is agnostic to the number of agents. We refer readers to the supplemental video, where the temporal dynamics and spatial coordination of generated interactions are best appreciated. Project page: https://von31.github.io/MAGNet/
RadarGen: Automotive Radar Point Cloud Generation from Cameras
We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar domain by representing radar measurements in bird's-eye-view form that encodes spatial structure together with radar cross section (RCS) and Doppler attributes. A lightweight recovery step reconstructs point clouds from the generated maps. To better align generation with the visual scene, RadarGen incorporates BEV-aligned depth, semantic, and motion cues extracted from pretrained foundation models, which guide the stochastic generation process toward physically plausible radar patterns. Conditioning on images makes the approach broadly compatible, in principle, with existing visual datasets and simulation frameworks, offering a scalable direction for multimodal generative simulation. Evaluations on large-scale driving data show that RadarGen captures characteristic radar measurement distributions and reduces the gap to perception models trained on real data, marking a step toward unified generative simulation across sensing modalities.
comment: Project page: https://radargen.github.io/
AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .
comment: 28 pages, 25 figures. The first four authors contributed equally
Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes
We present Planning as Descent (PaD), a framework for offline goal-conditioned reinforcement learning that grounds trajectory synthesis in verification. Instead of learning a policy or explicit planner, PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures. Planning is realized as gradient-based refinement in this energy landscape, using identical computation during training and inference to reduce train-test mismatch common in decoupled modeling pipelines. PaD is trained via self-supervised hindsight goal relabeling, shaping the energy landscape around the planning dynamics. At inference, multiple trajectory candidates are refined under different temporal hypotheses, and low-energy plans balancing feasibility and efficiency are selected. We evaluate PaD on OGBench cube manipulation tasks. When trained on narrow expert demonstrations, PaD achieves state-of-the-art 95% success, strongly outperforming prior methods that peak at 68%. Remarkably, training on noisy, suboptimal data further improves success and plan efficiency, highlighting the benefits of verification-driven planning. Our results suggest learning to evaluate and refine trajectories provides a robust alternative to direct policy learning for offline, reward-free planning.
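To make "planning as descent" concrete, the toy below refines a trajectory by gradient descent in a hand-crafted quadratic energy (goal-consistency plus smoothness); PaD replaces this with a learned, goal-conditioned energy over latent trajectories, but the refinement loop has the same shape at training and inference time.

```python
# Toy planning-as-descent: gradient refinement of a trajectory under a quadratic energy.
import numpy as np

def energy_grad(tau, start, goal, w_goal=5.0):
    grad = np.zeros_like(tau)
    grad[1:] += 2.0 * (tau[1:] - tau[:-1])        # smoothness term
    grad[:-1] -= 2.0 * (tau[1:] - tau[:-1])
    grad[0] += 2.0 * w_goal * (tau[0] - start)    # start-consistency term
    grad[-1] += 2.0 * w_goal * (tau[-1] - goal)   # goal-consistency term
    return grad

start, goal = np.array([0.0, 0.0]), np.array([1.0, 2.0])
rng = np.random.default_rng(0)
tau = rng.standard_normal((20, 2))                # random initial plan

for _ in range(500):                              # identical refinement at train and test time
    tau -= 0.05 * energy_grad(tau, start, goal)

print(np.round(tau[[0, 10, -1]], 2))              # ~start, ~midpoint, ~goal: a straight-line plan
```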
UniStateDLO: Unified Generative State Estimation and Tracking of Deformable Linear Objects Under Occlusion for Constrained Manipulation
Perception of deformable linear objects (DLOs), such as cables, ropes, and wires, is the cornerstone for successful downstream manipulation. Although vision-based methods have been extensively explored, they remain highly vulnerable to occlusions that commonly arise in constrained manipulation environments due to surrounding obstacles, large and varying deformations, and limited viewpoints. Moreover, the high dimensionality of the state space, the lack of distinctive visual features, and the presence of sensor noises further compound the challenges of reliable DLO perception. To address these open issues, this paper presents UniStateDLO, the first complete DLO perception pipeline with deep-learning methods that achieves robust performance under severe occlusion, covering both single-frame state estimation and cross-frame state tracking from partial point clouds. Both tasks are formulated as conditional generative problems, leveraging the strong capability of diffusion models to capture the complex mapping between highly partial observations and high-dimensional DLO states. UniStateDLO effectively handles a wide range of occlusion patterns, including initial occlusion, self-occlusion, and occlusion caused by multiple objects. In addition, it exhibits strong data efficiency as the entire network is trained solely on a large-scale synthetic dataset, enabling zero-shot sim-to-real generalization without any real-world training data. Comprehensive simulation and real-world experiments demonstrate that UniStateDLO outperforms all state-of-the-art baselines in both estimation and tracking, producing globally smooth yet locally precise DLO state predictions in real time, even under substantial occlusions. Its integration as the front-end module in a closed-loop DLO manipulation system further demonstrates its ability to support stable feedback control in complex, constrained 3-D environments.
comment: The first two authors contributed equally. Project page: https://unistatedlo.github.io
A Dual Quaternion based RRT* Path Planning Approach for Satellite Rendezvous and Docking
This paper proposes a sampling-based motion planner that employs a dual quaternion representation to generate smooth, collision-free six-degree-of-freedom pose trajectories for satellite rendezvous and docking under keep-out zone constraints. The proposed planner integrates the dual quaternion algebra directly into an RRT* framework, thereby enabling natural screw motion interpolation in SE(3). The dual quaternion-based RRT* has been implemented in Python and demonstrated on a representative multi-obstacle scenario. A comparison with a standard RRT* using separate translation and quaternion steering highlights the enhanced pose continuity and obstacle avoidance of the proposed method. The present approach is purely kinematic in nature and does not take into account relative orbital dynamics. Consequently, the resulting path provides a preliminary estimate for a subsequent optimisation-based trajectory planner, which will refine the motion with dynamic constraints for the purpose of practical satellite rendezvous and docking missions.
comment: 6 pages, CAMSAT 2025, This work has been accepted to IFAC
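For readers unfamiliar with the dual quaternion algebra the planner above builds on, the following minimal sketch shows how a rigid pose is packed into a unit dual quaternion and how poses compose. The quaternion convention ([w, x, y, z]) and the example poses are arbitrary choices, and the screw-motion (ScLERP) interpolation used for steering is not shown.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions stored as [w, x, y, z]."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def dq_from_pose(q, t):
    """Unit dual quaternion (real, dual) encoding rotation q and translation t."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    t_quat = np.array([0.0, *t])
    return q, 0.5 * qmul(t_quat, q)

def dq_mul(dq1, dq2):
    """Compose two rigid transforms given as dual quaternions."""
    r1, d1 = dq1
    r2, d2 = dq2
    return qmul(r1, r2), qmul(r1, d2) + qmul(d1, r2)

def dq_to_pose(dq):
    r, d = dq
    r_conj = np.array([r[0], -r[1], -r[2], -r[3]])
    return r, 2.0 * qmul(d, r_conj)[1:]        # translation = 2 * q_d * q_r*

# 90-degree yaw plus a 1 m x-translation, composed with a pure y-translation.
q_yaw = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
A = dq_from_pose(q_yaw, [1.0, 0.0, 0.0])
B = dq_from_pose([1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0])
print(dq_to_pose(dq_mul(A, B)))
```

Steering between two such dual quaternions along a screw motion is what gives the planner its natural interpolation in SE(3).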
Vidarc: Embodied Video Diffusion Model for Closed-loop Control
Robotic arm manipulation in data-scarce settings is a highly challenging task due to the complex embodiment dynamics and diverse contexts. Recent video-based approaches have shown great promise in capturing and transferring the temporal and physical interactions by pre-training on Internet-scale video data. However, such methods are often not optimized for the embodiment-specific closed-loop control, typically suffering from high latency and insufficient grounding. In this paper, we present Vidarc (Video Diffusion for Action Reasoning and Closed-loop Control), a novel autoregressive embodied video diffusion approach augmented by a masked inverse dynamics model. By grounding video predictions with action-relevant masks and incorporating real-time feedback through cached autoregressive generation, Vidarc achieves fast, accurate closed-loop control. Pre-trained on one million cross-embodiment episodes, Vidarc surpasses state-of-the-art baselines, achieving at least a 15% higher success rate in real-world deployment and a 91% reduction in latency. We also highlight its robust generalization and error correction capabilities across previously unseen robotic platforms.
Learning Safe Autonomous Driving Policies Using Predictive Safety Representations ICRA 2026
Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective creates a fundamental tension, as overly conservative policies limit driving efficiency while aggressive exploration risks safety violations. The Safety Representations for Safer Policy Learning (SRPL) framework addresses this challenge by equipping agents with a predictive model of future constraint violations and has shown promise in controlled environments. This paper investigates whether SRPL extends to real-world autonomous driving scenarios. Systematic experiments on the Waymo Open Motion Dataset (WOMD) and NuPlan demonstrate that SRPL can improve the reward-safety tradeoff, achieving statistically significant improvements in success rate (effect sizes r = 0.65-0.86) and cost reduction (effect sizes r = 0.70-0.83), with p < 0.05 for observed improvements. However, its effectiveness depends on the underlying policy optimizer and the dataset distribution. The results further show that predictive safety representations play a critical role in improving robustness to observation noise. Additionally, in zero-shot cross-dataset evaluation, SRPL-augmented agents demonstrate improved generalization compared to non-SRPL methods. These findings collectively demonstrate the potential of predictive safety representations to strengthen SafeRL for autonomous driving.
comment: 8 pages, 4 figures. Submitted to ICRA 2026
Optimized Scheduling and Positioning of Mobile Manipulators in Collaborative Applications
The growing integration of mobile robots in shared workspaces requires efficient path planning and coordination between the agents, accounting for safety and productivity. In this work, we propose a digital model-based optimization framework for mobile manipulators in human-robot collaborative environments, in order to determine the sequence of robot base poses and the task scheduling for the robot. The complete problem is treated as black-box, and Particle Swarm Optimization (PSO) is employed to balance conflicting Key-Performance Indicators (KPIs). We demonstrate improvements in cycle time, task sequencing, and adaptation to human presence in a collaborative box-packing scenario.
comment: Accepted at The IFAC Joint Conference on Computers, Cognition and Communication (J3C) 2025
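Since the abstract above treats the scheduling and positioning problem as a black box optimized by PSO, a generic PSO loop conveys the core idea. The cost function below is a hypothetical stand-in for the digital-model KPI evaluation; the swarm size, inertia, and acceleration coefficients are common defaults, not values from the paper.

```python
import numpy as np

def kpi_cost(pose):
    # Hypothetical black-box KPI: stands in for the digital-model evaluation
    # returning, e.g., weighted cycle time plus penalties for a base pose.
    x, y, theta = pose
    return (x - 1.0) ** 2 + (y + 0.5) ** 2 + 0.1 * np.sin(3 * theta) ** 2

def pso(cost, dim=3, n=30, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-2.0, 2.0, size=(n, dim))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([cost(p) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((n, dim)), rng.random((n, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([cost(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, pbest_val.min()

print(pso(kpi_cost))   # base pose minimizing the surrogate KPI cost
```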
On Using Neural Networks to Learn Safety Speed Reduction in Human-Robot Collaboration: A Comparative Analysis
In Human-Robot Collaboration, safety mechanisms such as Speed and Separation Monitoring and Power and Force Limitation dynamically adjust the robot's speed based on human proximity. While essential for risk reduction, these mechanisms introduce slowdowns that make cycle time estimation difficult and impact job scheduling efficiency. Existing methods for estimating cycle times or designing schedulers often rely on predefined safety models, which may not accurately reflect real-world safety implementations, as these depend on case-specific risk assessments. In this paper, we propose a deep learning approach to predict the robot's safety scaling factor directly from process execution data. We analyze multiple neural network architectures and demonstrate that a simple feed-forward network effectively estimates the robot's slowdown. This capability is crucial for improving cycle time predictions and designing more effective scheduling algorithms in collaborative robotic environments.
comment: Accepted at IEEE International Conference on Emerging Technologies and Factory Automation (ETFA) 2025
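A feed-forward regressor of the kind described above can be sketched in a few lines. The feature set, network width, and training snippet below are illustrative assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

# Minimal sketch: regress the safety speed-scaling factor in (0, 1) from
# process-execution features (e.g. human-robot distance, robot speed, task
# phase). Feature count, width, and data are illustrative, not the paper's.
class SafetyScaleNet(nn.Module):
    def __init__(self, n_features=6, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

model = SafetyScaleNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(32, 6)      # dummy batch standing in for logged execution data
y = torch.rand(32, 1)      # logged scaling factors
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()
print(float(loss))
```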
Kinematics-Aware Diffusion Policy with Consistent 3D Observation and Action Space for Whole-Arm Robotic Manipulation
Whole-body control of robotic manipulators with awareness of full-arm kinematics is crucial for many manipulation scenarios involving body collision avoidance or body-object interactions, which makes it insufficient to consider only the end-effector poses in policy learning. The typical approach for whole-arm manipulation is to learn actions in the robot's joint space. However, the misalignment between the joint space and the actual task space (i.e., 3D space) increases the complexity of policy learning, as generalization in task space requires the policy to intrinsically understand the non-linear arm kinematics, which is difficult to learn from limited demonstrations. To address this issue, this letter proposes a kinematics-aware imitation learning framework with consistent task, observation, and action spaces, all represented in the same 3D space. Specifically, we represent both robot states and actions using a set of 3D points on the arm body, naturally aligned with the 3D point cloud observations. This spatially consistent representation improves the policy's sample efficiency and spatial generalizability while enabling full-body control. Built upon the diffusion policy, we further incorporate kinematics priors into the diffusion processes to guarantee the kinematic feasibility of output actions. The joint angle commands are finally calculated through an optimization-based whole-body inverse kinematics solver for execution. Simulation and real-world experimental results demonstrate higher success rates and stronger spatial generalizability of our approach compared to existing methods in body-aware manipulation policy learning.
comment: The first two authors contributed equally. Project Website: https://kinematics-aware-diffusion-policy.github.io
Learning-Based Safety-Aware Task Scheduling for Efficient Human-Robot Collaboration
Ensuring human safety in collaborative robotics can compromise efficiency because traditional safety measures increase robot cycle time when human interaction is frequent. This paper proposes a safety-aware approach to mitigate efficiency losses without assuming prior knowledge of safety logic. Using a deep-learning model, the robot learns the relationship between system state and safety-induced speed reductions based on execution data. Our framework does not explicitly predict human motions but directly models the interaction effects on robot speed, simplifying implementation and enhancing generalizability to different safety logics. At runtime, the learned model optimizes task selection to minimize cycle time while adhering to safety requirements. Experiments on a pick-and-packaging scenario demonstrated significant reductions in cycle times.
comment: 8 pages
Deep Learning-based Robust Autonomous Navigation of Aerial Robots in Dense Forests
Autonomous aerial navigation in dense natural environments remains challenging due to limited visibility, thin and irregular obstacles, GNSS-denied operation, and frequent perceptual degradation. This work presents an improved deep learning-based navigation framework that integrates semantically enhanced depth encoding with neural motion-primitive evaluation for robust flight in cluttered forests. Several modules are incorporated on top of the original sevae-ORACLE algorithm to address limitations observed during real-world deployment, including lateral control for sharper maneuvering, a temporal consistency mechanism to suppress oscillatory planning decisions, a stereo-based visual-inertial odometry solution for drift-resilient state estimation, and a supervisory safety layer that filters unsafe actions in real time. A depth refinement stage is included to improve the representation of thin branches and reduce stereo noise, while GPU optimization increases onboard inference throughput from 4 Hz to 10 Hz. The proposed approach is evaluated against several existing learning-based navigation methods under identical environmental conditions and hardware constraints. It demonstrates higher success rates, more stable trajectories, and improved collision avoidance, particularly in highly cluttered forest settings. The system is deployed on a custom quadrotor in three boreal forest environments, achieving fully autonomous completion in all flights in moderate and dense clutter, and 12 out of 15 flights in highly dense underbrush. These results demonstrate improved reliability and safety over existing navigation methods in complex natural environments.
Adaptive Covariance and Quaternion-Focused Hybrid Error-State EKF/UKF for Visual-Inertial Odometry
This study presents an innovative hybrid Visual-Inertial Odometry (VIO) method for Unmanned Aerial Vehicles (UAVs) that is resilient to environmental challenges and capable of dynamically assessing sensor reliability. Built upon a loosely coupled sensor fusion architecture, the system utilizes a novel hybrid Quaternion-focused Error-State EKF/UKF (Qf-ES-EKF/UKF) architecture to process inertial measurement unit (IMU) data. This architecture first propagates the entire state using an Error-State Extended Kalman Filter (ESKF) and then applies a targeted Scaled Unscented Kalman Filter (SUKF) step to refine only the orientation. This sequential process blends the accuracy of SUKF in quaternion estimation with the overall computational efficiency of ESKF. The reliability of visual measurements is assessed via a dynamic sensor confidence score based on metrics, such as image entropy, intensity variation, motion blur, and inference quality, adapting the measurement noise covariance to ensure stable pose estimation even under challenging conditions. Comprehensive experimental analyses on the EuRoC MAV dataset demonstrate key advantages: an average improvement of 49% in position accuracy in challenging scenarios, an average of 57% in rotation accuracy over ESKF-based methods, and SUKF-comparable accuracy achieved with approximately 48% lower computational cost than a full SUKF implementation. These findings demonstrate that the presented approach strikes an effective balance between computational efficiency and estimation accuracy, and significantly enhances UAV pose estimation performance in complex environments with varying sensor reliability.
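The adaptive-covariance idea above, down-weighting visual updates when the image looks unreliable, can be sketched with a simple Kalman measurement update whose noise covariance is inflated by a confidence score. The metric weighting, state layout, and numbers below are illustrative assumptions, not the Qf-ES-EKF/UKF pipeline itself.

```python
import numpy as np

def visual_confidence(entropy, sharpness, intensity_var, inlier_ratio):
    # Hypothetical confidence score in (0, 1]; each metric is assumed to be
    # pre-normalized to [0, 1]. Weights are illustrative, not the paper's.
    score = 0.35 * entropy + 0.25 * sharpness + 0.2 * intensity_var + 0.2 * inlier_ratio
    return float(np.clip(score, 0.05, 1.0))

def adaptive_update(x, P, z, H, R_base, confidence):
    # Inflate the measurement covariance when the visual front-end looks
    # unreliable, so the filter leans on the inertial prediction instead.
    R = R_base / confidence
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

x = np.zeros(6)                               # position + velocity
P = np.eye(6) * 0.1
H = np.hstack([np.eye(3), np.zeros((3, 3))])  # vision measures position only
z = np.array([0.2, -0.1, 1.5])
c = visual_confidence(entropy=0.8, sharpness=0.6, intensity_var=0.5, inlier_ratio=0.9)
x, P = adaptive_update(x, P, z, H, np.eye(3) * 0.05, c)
print(c, x[:3])
```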
ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Visual navigation is a fundamental capability for autonomous home-assistance robots, enabling long-horizon tasks such as object search. While recent methods have leveraged Large Language Models (LLMs) to incorporate commonsense reasoning and improve exploration efficiency, their planning remains constrained by textual representations, which cannot adequately capture spatial occupancy or scene geometry--critical factors for navigation decisions. We explore whether Vision-Language Models (VLMs) can achieve mapless visual navigation using only onboard RGB/RGB-D streams, unlocking their potential for spatial perception and planning. We achieve this through an imagination-powered navigation framework, ImagineNav++, which imagines future observation images from candidate robot views and translates navigation planning into a simple best-view image selection problem for VLMs. First, a future-view imagination module distills human navigation preferences to generate semantically meaningful viewpoints with high exploration potential. These imagined views then serve as visual prompts for the VLM to identify the most informative viewpoint. To maintain spatial consistency, we develop a selective foveation memory mechanism, which hierarchically integrates keyframe observations via a sparse-to-dense framework, constructing a compact yet comprehensive memory for long-term spatial reasoning. This approach transforms goal-oriented navigation into a series of tractable point-goal navigation tasks. Extensive experiments on open-vocabulary object and instance navigation benchmarks show that ImagineNav++ achieves SOTA performance in mapless settings, even surpassing most map-based methods, highlighting the importance of scene imagination and memory in VLM-based spatial reasoning.
comment: 17 pages, 10 figures. arXiv admin note: text overlap with arXiv:2410.09874
Personalized Gait Patterns During Exoskeleton-Aided Training May Have Minimal Effect on User Experience. Insights from a Pilot Study
Robot-aided gait rehabilitation facilitates high-intensity and repeatable therapy. However, most exoskeletons rely on pre-recorded, non-personalized gait trajectories constrained to the sagittal plane, potentially limiting movement naturalness and user comfort. We present a data-driven gait personalization framework for an exoskeleton that supports multi-planar motion, including hip abduction/adduction and pelvic translation and rotation. Personalized trajectories for individual participants were generated using regression models trained on anthropometric, demographic, and walking speed data from a normative database. In a within-subject experiment involving ten unimpaired participants, these personalized trajectories were evaluated with regard to comfort, naturalness, and overall experience and compared against two standard patterns from the same database: one averaging all the trajectories, and one randomly selected. We did not find relevant differences across pattern conditions, despite all trajectories being executed with high accuracy thanks to a stiff position-derivative controller. We found, however, that pattern conditions in later trials were rated as more comfortable and natural than those in the first trial, suggesting that participants might have adapted to walking within the exoskeleton, regardless of the enforced gait pattern. Our findings highlight the importance of integrating subjective feedback when designing personalized gait controllers and accounting for user adaptation during experimentation.
TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data
Existing end-to-end autonomous driving methods typically rely on imitation learning (IL) but face a key challenge: the misalignment between open-loop training and closed-loop deployment. This misalignment often triggers driver-initiated takeovers and system disengagements during closed-loop execution. How to leverage this expert takeover data from disengagement scenarios and effectively expand the IL policy's capability presents a valuable yet unexplored challenge. In this paper, we propose TakeAD, a novel preference-based post-optimization framework that fine-tunes the pre-trained IL policy with this disengagement data to enhance the closed-loop driving performance. First, we design an efficient expert takeover data collection pipeline inspired by human takeover mechanisms in real-world autonomous driving systems. Then, this post-optimization framework integrates iterative Dataset Aggregation (DAgger) for imitation learning with Direct Preference Optimization (DPO) for preference alignment. The DAgger stage equips the policy with fundamental capabilities to handle disengagement states through direct imitation of expert interventions. Subsequently, the DPO stage refines the policy's behavior to better align with expert preferences in disengagement scenarios. Through multiple iterations, the policy progressively learns recovery strategies for disengagement states, thereby mitigating the open-loop gap. Experiments on the closed-loop Bench2Drive benchmark demonstrate our method's effectiveness compared with pure IL methods, with comprehensive ablations confirming the contribution of each component.
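The DPO stage mentioned above aligns the policy with expert-preferred behavior using pairwise preferences. For reference, the standard DPO objective over a preferred/dispreferred pair $(\tau^{+}, \tau^{-})$ given context $x$ is shown below; the paper's exact trajectory-level formulation may differ.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta) =
-\,\mathbb{E}_{(x,\tau^{+},\tau^{-})}\!\left[
\log \sigma\!\left(
\beta \log \frac{\pi_{\theta}(\tau^{+}\mid x)}{\pi_{\mathrm{ref}}(\tau^{+}\mid x)}
- \beta \log \frac{\pi_{\theta}(\tau^{-}\mid x)}{\pi_{\mathrm{ref}}(\tau^{-}\mid x)}
\right)\right]
```

Here $\pi_{\mathrm{ref}}$ would typically be the frozen policy from the preceding stage, and $\beta$ controls how strongly the preference data reshapes it.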
Flying in Clutter on Monocular RGB by Learning in 3D Radiance Fields with Domain Adaptation
Modern autonomous navigation systems predominantly rely on lidar and depth cameras. However, a fundamental question remains: Can flying robots navigate in clutter using solely monocular RGB images? Given the prohibitive costs of real-world data collection, learning policies in simulation offers a promising path. Yet, deploying such policies directly in the physical world is hindered by the significant sim-to-real perception gap. Thus, we propose a framework that couples the photorealism of 3D Gaussian Splatting (3DGS) environments with Adversarial Domain Adaptation. By training in high-fidelity simulation while explicitly minimizing feature discrepancy, our method ensures the policy relies on domain-invariant cues. Experimental results demonstrate that our policy achieves robust zero-shot transfer to the physical world, enabling safe and agile flight in unstructured environments with varying illumination.
comment: 8 pages, 7 figures
Neuro-Symbolic Control with Large Language Models for Language-Guided Spatial Tasks
Although large language models (LLMs) have recently become effective tools for language-conditioned control in embodied systems, instability, slow convergence, and hallucinated actions continue to limit their direct application to continuous control. A modular neuro-symbolic control framework that clearly distinguishes between low-level motion execution and high-level semantic reasoning is proposed in this work. While a lightweight neural delta controller performs bounded, incremental actions in continuous space, a locally deployed LLM interprets symbolic tasks. We assess the suggested method in a planar manipulation setting with spatial relations between objects specified by language. Numerous tasks and local language models, such as Mistral, Phi, and LLaMA-3.2, are used in extensive experiments to compare LLM-only control, neural-only control, and the suggested LLM+DL framework. In comparison to LLM-only baselines, the results show that the neuro-symbolic integration consistently increases both success rate and efficiency, achieving average step reductions exceeding 70% and speedups of up to 8.83x while remaining robust to language model quality. The suggested framework enhances interpretability, stability, and generalization without any need for reinforcement learning or costly rollouts by constraining the LLM to symbolic outputs and allocating uninterpreted execution to a neural controller trained on artificial geometric data. These results show empirically that neuro-symbolic decomposition offers a scalable and principled way to integrate language understanding with continuous control, and this approach promotes the creation of dependable and effective language-guided embodied systems.
RecipeMasterLLM: Revisiting RoboEarth in the Era of Large Language Models
RoboEarth was a pioneering initiative in cloud robotics, establishing a foundational framework for robots to share and exchange knowledge about actions, objects, and environments through a standardized knowledge graph. Initially, this knowledge was predominantly hand-crafted by engineers using RDF triples within OWL Ontologies, with updates, such as changes in an object's pose, being asserted by the robot's control and perception routines. However, with the advent and rapid development of Large Language Models (LLMs), we believe that the process of knowledge acquisition can be significantly automated. To this end, we propose RecipeMasterLLM, a high-level planner, that generates OWL action ontologies based on a standardized knowledge graph in response to user prompts. This architecture leverages a fine-tuned LLM specifically trained to understand and produce action descriptions consistent with the RoboEarth standardized knowledge graph. Moreover, during the Retrieval-Augmented Generation (RAG) phase, environmental knowledge is supplied to the LLM to enhance its contextual understanding and improve the accuracy of the generated action descriptions.
A Service Robot's Guide to Interacting with Busy Customers
The growing use of service robots in hospitality highlights the need to understand how to effectively communicate with pre-occupied customers. This study investigates the efficacy of communication modalities commonly used by service robots, namely acoustic/speech, visual display, and micromotion gestures, in capturing attention and communicating intention with a user in a simulated restaurant scenario. We conducted a two-part user study (N=24) using a Temi robot to simulate delivery tasks, with participants engaged in a typing game (MonkeyType) to emulate a state of busyness. The participants' engagement in the typing game is measured by words per minute (WPM) and typing accuracy. In Part 1, we compared non-verbal acoustic cue versus baseline conditions to assess attention capture during a single-cup delivery task. In Part 2, we evaluated the effectiveness of speech, visual display, micromotion and their multimodal combination in conveying specific intentions (correct cup selection) during a two-cup delivery task. The results indicate that, while speech is highly effective in capturing attention, it is less successful in clearly communicating intention. Participants rated visual as the most effective modality for intention clarity, followed by speech, with micromotion being the lowest ranked. These findings provide insights into optimizing communication strategies for service robots, highlighting the distinct roles of attention capture and intention communication in enhancing user experience in dynamic hospitality settings.
comment: Presented at ACRA 2025. 10 pages, 4 figures. Includes a user study (N=24) using the Temi robot evaluating speech, visual, and micromotion modalities
Research on Dead Reckoning Algorithm for Self-Propelled Pipeline Robots in Three-Dimensional Complex Pipelines
In the field of gas pipeline location, existing pipeline location methods mostly rely on pipeline location instruments. However, when faced with complex and curved pipeline scenarios, these methods often fail due to problems such as cable entanglement and insufficient equipment flexibility. To address this pain point, we designed a self-propelled pipeline robot. This robot can autonomously complete the location work of complex and curved pipelines in complex pipe networks without external dragging. In terms of pipeline mapping technology, traditional visual mapping and laser mapping methods are easily affected by lighting conditions and insufficient features in the confined space of pipelines, resulting in mapping drift and divergence problems. In contrast, the pipeline location method that integrates inertial navigation and wheel odometers is less affected by pipeline environmental factors. Based on this, this paper proposes a pipeline robot location method based on extended Kalman filtering (EKF). Firstly, the body attitude angle is initially obtained through an inertial measurement unit (IMU). Then, the extended Kalman filtering algorithm is used to improve the accuracy of attitude angle estimation. Finally, high-precision pipeline location is achieved by combining wheel odometers. During the testing phase, the roll wheels of the pipeline robot needed to fit tightly against the pipe wall to reduce slippage. However, excessive tightness would reduce the flexibility of motion control due to excessive friction. Therefore, a balance needed to be struck between the robot's motion capability and positioning accuracy. Experiments were conducted using the self-propelled pipeline robot in a rectangular loop pipeline, and the results verified the effectiveness of the proposed dead reckoning algorithm.
comment: 8 pages, 9 figures
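The core of the dead-reckoning step above, rotating wheel-odometry increments into the world frame using the EKF-filtered attitude, can be sketched as follows. The roll/pitch/yaw convention, sample values, and time step are assumptions for illustration, not the paper's filter.

```python
import numpy as np

def rot_from_rpy(roll, pitch, yaw):
    """Body-to-world rotation from roll/pitch/yaw (Z-Y-X convention assumed)."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    return Rz @ Ry @ Rx

def dead_reckon(p, rpy, v_wheel, dt):
    # Wheel odometry gives forward speed along the robot's body x-axis; the
    # (EKF-filtered) attitude rotates that increment into the world frame.
    return p + rot_from_rpy(*rpy) @ np.array([v_wheel * dt, 0.0, 0.0])

p = np.zeros(3)
# Hypothetical attitude/odometry samples: (roll, pitch, yaw) in rad, speed in m/s.
log = [((0.0, 0.0, 0.0), 0.3), ((0.0, 0.1, 0.0), 0.3), ((0.0, 0.2, 0.5), 0.3)]
for rpy, v in log:
    p = dead_reckon(p, rpy, v, dt=0.1)
print(p)
```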
Design and Research of a Self-Propelled Pipeline Robot Based on Force Analysis and Dynamic Simulation
In pipeline inspection, traditional tethered inspection robots are severely constrained by cable length and weight, which greatly limit their travel range and accessibility. To address these issues, this paper proposes a self-propelled pipeline robot design based on force analysis and dynamic simulation, with a specific focus on solving core challenges including vertical climbing failure and poor passability in T-branch pipes. Adopting a wheeled configuration and modular design, the robot prioritizes the core demand of body motion control. Specifically, 3D modeling of the robot was first completed using SolidWorks. Subsequently, the model was imported into ADAMS for dynamic simulation, which provided a basis for optimizing the drive module and motion control strategy. To verify the robot's dynamic performance, an experimental platform with acrylic pipes was constructed. By adjusting its body posture to surmount obstacles and select directions, the robot demonstrated its ability to stably traverse various complex pipeline scenarios. Notably, this work offers a technical feasibility reference for the application of pipeline robots in the inspection of medium and low-pressure urban gas pipelines.
comment: 7 pages, 14 figures
Semantic Co-Speech Gesture Synthesis and Real-Time Control for Humanoid Robots
We present an innovative end-to-end framework for synthesizing semantically meaningful co-speech gestures and deploying them in real-time on a humanoid robot. This system addresses the challenge of creating natural, expressive non-verbal communication for robots by integrating advanced gesture generation techniques with robust physical control. Our core innovation lies in the meticulous integration of a semantics-aware gesture synthesis module, which derives expressive reference motions from speech input by leveraging a generative retrieval mechanism based on large language models (LLMs) and an autoregressive Motion-GPT model. This is coupled with a high-fidelity imitation learning control policy, the MotionTracker, which enables the Unitree G1 humanoid robot to execute these complex motions dynamically and maintain balance. To ensure feasibility, we employ a robust General Motion Retargeting (GMR) method to bridge the embodiment gap between human motion data and the robot platform. Through comprehensive evaluation, we demonstrate that our combined system produces semantically appropriate and rhythmically coherent gestures that are accurately tracked and executed by the physical robot. To our knowledge, this work represents a significant step toward general real-world use by providing a complete pipeline for automatic, semantic-aware, co-speech gesture generation and synchronized real-time physical deployment on a humanoid robot.
Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors
Interactive reinforcement learning (IRL) has shown promise in enabling autonomous agents and robots to learn complex behaviours from human teachers, yet the dynamics of teacher selection remain poorly understood. This paper reveals an unexpected phenomenon in IRL: when given a choice between teachers with different reward structures, learning agents overwhelmingly prefer conservative, low-reward teachers (93.16% selection rate) over those offering 20x higher rewards. Through 1,250 experimental runs in navigation tasks with multiple expert teachers, we discovered: (1) Conservative bias dominates teacher selection: agents systematically choose the lowest-reward teacher, prioritising consistency over optimality; (2) Critical performance thresholds exist at teacher availability rho >= 0.6 and accuracy omega >= 0.6, below which the framework fails catastrophically; (3) The framework achieves 159% improvement over baseline Q-learning under concept drift. These findings challenge fundamental assumptions about optimal teaching in RL and suggest potential implications for human-robot collaboration, where human preferences for safety and consistency may align with the observed agent selection behaviour, potentially informing training paradigms for safety-critical robotic applications.
comment: 10 pages, 5 figures. Accepted at ACRA 2025 (Australasian Conference on Robotics and Automation)
Towards Senior-Robot Interaction: Reactive Robot Dog Gestures
As the global population ages, many seniors face the problem of loneliness. Companion robots offer a potential solution. However, current companion robots often lack advanced functionality, while task-oriented robots are not designed for social interaction, limiting their suitability and acceptance by seniors. Our work introduces a senior-oriented system for quadruped robots that allows for more intuitive user input and provides more socially expressive output. For user input, we implemented a MediaPipe-based module for hand gesture and head movement recognition, enabling control without a remote. For output, we designed and trained robotic dog gestures using curriculum-based reinforcement learning in Isaac Gym, progressing from simple standing to three-legged balancing and leg extensions, and more. The final tests achieved over 95% success on average in simulation, and we validated a key social gesture (the paw-lift) on a Unitree robot. Real-world tests demonstrated the feasibility and social expressiveness of this framework, while also revealing sim-to-real challenges in joint compliance, load distribution, and balance control. These contributions advance the development of practical quadruped robots as social companions for seniors, outline pathways for sim-to-real adaptation, and inform future user studies.
comment: Accepted at the Australasian Conference on Robotics and Automation (ACRA) 2025
Towards Autonomous Navigation in Endovascular Interventions
Cardiovascular diseases remain the leading cause of global mortality, with minimally invasive treatment options offered through endovascular interventions. However, the precision and adaptability of current robotic systems for endovascular navigation are limited by heuristic control, low autonomy, and the absence of haptic feedback. This thesis presents an integrated AI-driven framework for autonomous guidewire navigation in complex vascular environments, addressing key challenges in data availability, simulation fidelity, and navigational accuracy. A high-fidelity, real-time simulation platform, CathSim, is introduced for reinforcement learning based catheter navigation, featuring anatomically accurate vascular models and contact dynamics. Building on CathSim, the Expert Navigation Network is developed, a policy that fuses visual, kinematic, and force feedback for autonomous tool control. To mitigate data scarcity, the open-source, bi-planar fluoroscopic dataset Guide3D is proposed, comprising more than 8,700 annotated images for 3D guidewire reconstruction. Finally, SplineFormer, a transformer-based model, is introduced to directly predict guidewire geometry as continuous B-spline parameters, enabling interpretable, real-time navigation. The findings show that combining high-fidelity simulation, multimodal sensory fusion, and geometric modelling substantially improves autonomous endovascular navigation and supports safer, more precise minimally invasive procedures.
SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning
Imitation learning (IL) has shown immense promise in enabling autonomous dexterous manipulation, including learning surgical tasks. To fully unlock the potential of IL for surgery, access to clinical datasets is needed, which unfortunately lack the kinematic data required for current IL approaches. A promising source of large-scale surgical demonstrations is monocular surgical videos available online, making monocular pose estimation a crucial step toward enabling large-scale robot learning. Toward this end, we propose SurgiPose, a differentiable rendering based approach to estimate kinematic information from monocular surgical videos, eliminating the need for direct access to ground truth kinematics. Our method infers tool trajectories and joint angles by optimizing tool pose parameters to minimize the discrepancy between rendered and real images. To evaluate the effectiveness of our approach, we conduct experiments on two robotic surgical tasks: tissue lifting and needle pickup, using the da Vinci Research Kit Si (dVRK Si). We train imitation learning policies with both ground truth measured kinematics and estimated kinematics from video and compare their performance. Our results show that policies trained on estimated kinematics achieve comparable success rates to those trained on ground truth data, demonstrating the feasibility of using monocular video based kinematic estimation for surgical robot learning. By enabling kinematic estimation from monocular surgical videos, our work lays the foundation for large scale learning of autonomous surgical policies from online surgical data.
comment: 8 pages, 6 figures, 2 tables
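The render-and-compare optimization at the heart of the approach above can be conveyed with a toy 2-D analogue: transform a set of model keypoints by a pose hypothesis, measure the discrepancy to the observed keypoints, and descend that loss with respect to the pose parameters. The keypoint "renderer", finite-difference gradients, and planar pose below are deliberate simplifications of the differentiable renderer and full tool kinematics used in the paper.

```python
import numpy as np

# Tool "model": a few 2-D keypoints in the tool frame, standing in for the
# geometry a differentiable renderer would project into the image.
MODEL = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 0.3], [0.5, 0.15]])

def render(pose):
    """'Render' = rigidly transform the model keypoints by pose (x, y, theta)."""
    x, y, th = pose
    R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
    return MODEL @ R.T + np.array([x, y])

def loss(pose, observed):
    return np.sum((render(pose) - observed) ** 2)

def fit_pose(observed, pose=np.zeros(3), lr=0.05, steps=500, eps=1e-5):
    # Render-and-compare: descend the image discrepancy w.r.t. pose parameters
    # (finite differences here; autodiff through a renderer in the real thing).
    for _ in range(steps):
        grad = np.array([
            (loss(pose + eps * e, observed) - loss(pose - eps * e, observed)) / (2 * eps)
            for e in np.eye(3)
        ])
        pose = pose - lr * grad
    return pose

true_pose = np.array([0.4, -0.2, 0.3])
observed = render(true_pose) + np.random.default_rng(0).normal(0.0, 0.01, MODEL.shape)
print(fit_pose(observed))   # should land near true_pose
```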
Design of a Polymer-based Steerable Cannula for Neurosurgical Applications
Robotically steerable compliant surgical tools offer several advantages over rigid tools, including enhanced dexterity, reduced tissue damage, and the ability to generate non-linear trajectories in minimally invasive neurosurgical procedures. Many existing robotic neurosurgical tools are designed using stainless steel or nitinol materials. Using polymer-based materials instead can offer advantages such as reduced interference in magnetic resonance imaging, enhanced safety for guiding electrically powered instruments, and reduced tissue damage due to inherent compliance. Several polymer materials have been used in robotic surgical applications, such as polyimide, polycarbonate, and elastic resin. Various fabrication strategies have also been proposed, including standard microfabrication techniques, thermal drawing, and 3-D printing. In our previous work, a tendon-driven, notched-tube was designed for several neurosurgical robotic tools, utilizing laser micromachining to reduce the stiffness of the tube in certain directions. This fabrication method is desirable because it has a single-step process, has high precision, and does not require a cleanroom or harsh chemicals. Past studies have explored laser-micromachining of polymer material for surgical applications such as stent fabrication. In this work, we explore extending the use of the laser micromachining approach to the fabrication of polyimide (PI) robotically steerable cannulas for neurosurgical applications. Utilizing the method presented in this work, we fabricated joints as small as 1.5 mm outer diameter (OD). Multiple joints were fabricated using PI tubes of different ODs, and the loading behavior of the fabricated joints was experimentally characterized.
comment: Presented at the 14th Conference on New Technologies for Computer and Robot Assisted Surgery (CRAS 2025)
Design and Integration of Thermal and Vibrotactile Feedback for Lifelike Touch in Social Robots
Zoomorphic Socially Assistive Robots (SARs) offer an alternative source of social touch for individuals who cannot access animal companionship. However, current SARs provide only limited, passive touch-based interactions and lack the rich haptic cues, such as warmth, heartbeat or purring, that are characteristic of human-animal touch. This limits their ability to evoke emotionally engaging, life-like physical interactions. We present a multimodal tactile prototype, which was used to augment the established PARO robot, integrating thermal and vibrotactile feedback to simulate feeling biophysiological signals. A flexible heating interface delivers body-like warmth, while embedded actuators generate heartbeat-like rhythms and continuous purring sensations. These cues were iteratively designed and calibrated with input from users and haptics experts. We outline the design process and offer reproducible guidelines to support the development of emotionally resonant and biologically plausible touch interactions with SARs.
Embodied4C: Measuring What Matters for Embodied Vision-Language Navigation
Vision-language navigation requires agents to reason and act under constraints of embodiment. While vision-language models (VLMs) demonstrate strong generalization, current benchmarks provide limited understanding of how embodiment -- i.e., the choice of physical platform, sensor configuration, and modality alignment -- influences perception, reasoning, and control. We introduce Embodied4C, a closed-loop benchmark designed as a Turing test for embodied reasoning. The benchmark evaluates the core embodied capabilities of VLMs across three heterogeneous embodiments -- autonomous vehicles, aerial drones, and robotic manipulators -- through approximately 1.1K one-shot reasoning questions and 58 goal-directed navigation tasks. These tasks jointly assess four foundational dimensions: semantic, spatial, temporal, and physical reasoning. Each embodiment presents dynamic sensor configurations and environment variations to probe generalization beyond platform-specific adaptation. To prevent embodiment overfitting, Embodied4C integrates domain-far queries targeting abstract and cross-context reasoning. Comprehensive evaluation across ten state-of-the-art VLMs and four embodied control baselines shows that cross-modal alignment and instruction tuning matter more than scale, while spatial and temporal reasoning remains the primary bottleneck for reliable embodied competence.
Robotic VLA Benefits from Joint Learning with Motion Image Diffusion
Vision-Language-Action (VLA) models have achieved remarkable progress in robotic manipulation by mapping multimodal observations and instructions directly to actions. However, they typically mimic expert trajectories without predictive motion reasoning, which limits their ability to reason about what actions to take. To address this limitation, we propose joint learning with motion image diffusion, a novel strategy that enhances VLA models with motion reasoning capabilities. Our method extends the VLA architecture with a dual-head design: while the action head predicts action chunks as in vanilla VLAs, an additional motion head, implemented as a Diffusion Transformer (DiT), predicts optical-flow-based motion images that capture future dynamics. The two heads are trained jointly, enabling the shared VLM backbone to learn representations that couple robot control with motion knowledge. This joint learning builds temporally coherent and physically grounded representations without modifying the inference pathway of standard VLAs, thereby maintaining test-time latency. Experiments in both simulation and real-world environments demonstrate that joint learning with motion image diffusion improves the success rate of pi-series VLAs to 97.5% on the LIBERO benchmark and 58.0% on the RoboTwin benchmark, yielding a 23% improvement in real-world performance and validating its effectiveness in enhancing the motion reasoning capability of large-scale VLAs.
comment: Website: https://vla-motion.github.io/
Unifying Deep Predicate Invention with Pre-trained Foundation Models
Long-horizon robotic tasks are hard due to continuous state-action spaces and sparse feedback. Symbolic world models help by decomposing tasks into discrete predicates that capture object properties and relations. Existing methods learn predicates either top-down, by prompting foundation models without data grounding, or bottom-up, from demonstrations without high-level priors. We introduce UniPred, a bilevel learning framework that unifies both. UniPred uses large language models (LLMs) to propose predicate effect distributions that supervise neural predicate learning from low-level data, while learned feedback iteratively refines the LLM hypotheses. Leveraging strong visual foundation model features, UniPred learns robust predicate classifiers in cluttered scenes. We further propose a predicate evaluation method that supports symbolic models beyond STRIPS assumptions. Across five simulated domains and one real-robot domain, UniPred achieves 2-4 times higher success rates than top-down methods and 3-4 times faster learning than bottom-up approaches, advancing scalable and flexible symbolic world modeling for robotics.
comment: 18 pages, 11 figures
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
Prevailing Vision-Language-Action Models (VLAs) for robotic manipulation are built upon vision-language backbones pretrained on large-scale, but disconnected static web data. As a result, despite improved semantic generalization, the policy must implicitly infer complex physical dynamics and temporal dependencies solely from robot trajectories. This reliance creates an unsustainable data burden, necessitating continuous, large-scale expert data collection to compensate for the lack of innate physical understanding. We contend that while vision-language pretraining effectively captures semantic priors, it remains blind to physical causality. A more effective paradigm leverages video to jointly capture semantics and visual dynamics during pretraining, thereby isolating the remaining task of low-level control. To this end, we introduce mimic-video, a novel Video-Action Model (VAM) that pairs a pretrained Internet-scale video model with a flow matching-based action decoder conditioned on its latent representations. The decoder serves as an Inverse Dynamics Model (IDM), generating low-level robot actions from the latent representation of video-space action plans. Our extensive evaluation shows that our approach achieves state-of-the-art performance on simulated and real-world robotic manipulation tasks, improving sample efficiency by 10x and convergence speed by 2x compared to traditional VLA architectures.
comment: Revised Introduction, Related Work, and Appendix. Additional minor notational and grammatical fixes
Maintaining the Level of a Payload carried by Multi-Robot System on Irregular Surface
In this paper, we introduce a multi-robot payload transport system to carry payloads through an environment of unknown and uneven inclinations while maintaining the desired orientation of the payload. For this task, we used custom-built robots with a linear actuator (piston) mounted on top of each robot. The system continuously monitors the payload's orientation and computes the required piston height of each robot to maintain the desired orientation of the payload. In this work, we propose an open-loop controller coupled with a closed-loop PID controller to achieve the goal. As our modelling makes no assumptions on the type of terrain, the system can work on any unknown and uneven terrains and inclinations. We showcase the efficacy of our proposed controller by testing it on various simulated environments with varied and complex terrains.
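One way to picture the piston-height computation described above is as plane leveling: given the measured payload roll and pitch and each robot's position under the payload, a feedforward term cancels the tilt and a PID loop trims the residual. The sign convention, robot layout, and gains below are illustrative assumptions, not the paper's controller.

```python
import numpy as np

# Robot base positions under the payload, in the payload frame (metres).
ROBOTS = {"r1": (0.5, 0.4), "r2": (0.5, -0.4), "r3": (-0.5, 0.0)}

def feedforward_heights(roll, pitch, nominal=0.10):
    """Open-loop piston heights that cancel the measured payload tilt.

    Under an assumed small-angle sign convention, a point at (x, y) on the
    tilted payload sits roughly x*tan(pitch) - y*tan(roll) above the level
    plane; extending each piston by the negative of that offset levels it.
    """
    return {
        name: nominal - (x * np.tan(pitch) - y * np.tan(roll))
        for name, (x, y) in ROBOTS.items()
    }

class PID:
    def __init__(self, kp=2.0, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral, self.prev = 0.0, 0.0

    def step(self, error, dt):
        self.integral += error * dt
        deriv = (error - self.prev) / dt
        self.prev = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

roll, pitch = np.deg2rad(2.0), np.deg2rad(-3.0)   # measured payload tilt
print(feedforward_heights(roll, pitch))
roll_pid = PID()
print(roll_pid.step(error=roll, dt=0.02))         # closed-loop trim on the residual
```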
Mitigating Undesired Conditions in Flexible Production with Product-Process-Resource Asset Knowledge Graphs ISWC 2025
Contemporary industrial cyber-physical production systems (CPPS) composed of robotic workcells face significant challenges in the analysis of undesired conditions due to the flexibility of Industry 4.0 that disrupts traditional quality assurance mechanisms. This paper presents a novel industry-oriented semantic model called Product-Process-Resource Asset Knowledge Graph (PPR-AKG), which is designed to analyze and mitigate undesired conditions in flexible CPPS. Built on top of the well-proven Product-Process-Resource (PPR) model originating from ISA-95 and VDI-3682, a comprehensive OWL ontology addresses shortcomings of conventional model-driven engineering for CPPS, particularly inadequate undesired condition and error handling representation. The integration of semantic technologies with large language models (LLMs) provides intuitive interfaces for factory operators, production planners, and engineers to interact with the entire model using natural language. Evaluation with the use case addressing electric vehicle battery remanufacturing demonstrates that the PPR-AKG approach efficiently supports resource allocation based on explicitly represented capabilities as well as identification and mitigation of undesired conditions in production. The key contributions include (1) a holistic PPR-AKG model capturing multi-dimensional production knowledge, and (2) the useful combination of the PPR-AKG with LLM-based chatbots for human interaction.
comment: Originally published online by CEUR Workshop Proceedings (CEUR-WS.org, ISSN 1613-0073) within ISWC 2025 Companion Volume. Available online: https://ceur-ws.org/Vol-4085/ and https://ceur-ws.org/Vol-4085/paper9.pdf
Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations
Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom, coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work represents the theoretical foundations of this cooperative manipulation control framework; the experiments are therefore presented in an abstract way, while giving pointers towards potential future applications.
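As a pointer to the underlying machinery, conformal geometric algebra lets a single versor (motor) $M$ act on any geometric primitive $X$ (point, line, plane, sphere) through the same sandwich product, and differentiating that action with respect to the joint variables yields the Jacobians used for control. The notation below is generic conformal-geometric-algebra notation, not the article's exact cooperative definitions.

```latex
X' = M \, X \, \widetilde{M},
\qquad
\dot{X} \;=\; \frac{\partial \big( M(q) \, X \, \widetilde{M}(q) \big)}{\partial q} \, \dot{q}
\;=\; J_X(q) \, \dot{q}
```

A cooperative primitive assembled from the individual arms' primitives transforms in the same way, which is what makes the multi-arm system look like a single-arm one to an operational-space controller.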
MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation
Imitation learning provides a promising approach to dexterous hand manipulation, but its effectiveness is limited by the lack of large-scale, high-fidelity data. Existing data-collection pipelines suffer from inaccurate motion retargeting, low data-collection efficiency, and missing high-resolution fingertip tactile sensing. We address this gap with MILE, a mechanically isomorphic teleoperation and data-collection system co-designed from human hand to exoskeleton to robotic hand. The exoskeleton is anthropometrically derived from the human hand, and the robotic hand preserves one-to-one joint-position isomorphism, eliminating nonlinear retargeting and enabling precise, natural control. The exoskeleton achieves a multi-joint mean absolute angular error below one degree, while the robotic hand integrates compact fingertip visuotactile modules that provide high-resolution tactile observations. Built on this retargeting-free interface, we teleoperate complex, contact-rich in-hand manipulation and efficiently collect a multimodal dataset comprising high-resolution fingertip visuotactile signals, RGB-D images, and joint positions. The teleoperation pipeline achieves a mean success rate improvement of 64%. Incorporating fingertip tactile observations further increases the success rate by an average of 25% over the vision-only baseline, validating the fidelity and utility of the dataset. Further details are available at: https://sites.google.com/view/mile-system.
VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA-AN achieves a robust 8.3x improvement in real-time inference throughput on resource-constrained UAVs. Extensive experiments demonstrate that VLA-AN significantly improves spatial grounding, scene reasoning, and long-horizon navigation, achieving a maximum single-task success rate of 98.1%, and providing an efficient, practical solution for realizing full-chain closed-loop autonomy in lightweight aerial robots.
CBMC-V3: A CNS-inspired Control Framework Towards Manipulation Agility with SNN
As robotic arm applications extend beyond industrial settings into service-oriented sectors such as catering, household and retail, existing control algorithms struggle to achieve the agile manipulation required for complex environments with dynamic trajectories, unpredictable interactions, and diverse objects. This paper presents a biomimetic control framework based on Spiking Neural Networks (SNNs), inspired by the human Central Nervous System (CNS), to achieve agile control in such environments. The proposed framework features five control modules (cerebral cortex, cerebellum, thalamus, brainstem, and spinal cord), three hierarchical control levels (first-order, second-order, and third-order), and two information pathways (ascending and descending). Each module is fully implemented using SNN. The spinal cord module uses spike encoding and Leaky Integrate-and-Fire (LIF) neurons for feedback control. The brainstem module employs a network of LIF and non-spiking LIF neurons to dynamically adjust spinal cord parameters via reinforcement learning. The thalamus module similarly employs a network of LIF and non-spiking LIF neurons to adjust the cerebellum's torque outputs via reinforcement learning. The cerebellum module, which provides feedforward gravity compensation torques, uses a recurrent SNN to learn the robotic arm's dynamics through regression. The framework is validated both in simulation and on a real-world robotic arm platform under various loads and trajectories. Results demonstrate that our method outperforms industrial-grade position control in manipulation agility.
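For context on the LIF building block used throughout the framework above, a minimal Euler-integrated leaky integrate-and-fire update looks as follows. The time constants, threshold, and input currents are arbitrary illustrative values, and a non-spiking variant would simply omit the threshold and reset.

```python
import numpy as np

def lif_step(v, input_current, dt=1e-3, tau=0.02, v_rest=0.0,
             v_thresh=1.0, v_reset=0.0, r_mem=1.0):
    """One Euler step of a leaky integrate-and-fire population.

    dv/dt = (-(v - v_rest) + R * I) / tau; spike and reset at threshold.
    A non-spiking LIF variant would skip the threshold/reset and expose v.
    """
    v = v + dt * (-(v - v_rest) + r_mem * input_current) / tau
    spikes = v >= v_thresh
    v = np.where(spikes, v_reset, v)
    return v, spikes.astype(float)

v = np.zeros(4)                            # 4-neuron population
drive = np.array([0.5, 1.1, 1.5, 2.0])     # constant input current per neuron
for t in range(100):
    v, s = lif_step(v, drive)
    if s.any():
        print(f"t={t * 1e-3:.3f}s spikes:", s)
```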
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. This field is exploding with new models and datasets, making it both exciting and challenging to keep pace with. This survey offers a clear and structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define the recent research frontier. Our main contribution is a detailed breakdown of the five biggest challenges: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment, all supported by the essential data infrastructure. For each of them, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our project page: https://suyuz1.github.io/VLA-Survey-Anatomy/
comment: project page: https://suyuz1.github.io/VLA-Survey-Anatomy/
MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training
While leveraging abundant human videos and simulated robot data poses a scalable solution to the scarcity of real-world robot data, the generalization capability of existing vision-language-action models (VLAs) remains limited by mismatches in camera views, visual appearance, and embodiment morphologies. To overcome this limitation, we propose MiVLA, a generalizable VLA empowered by human-robot mutual imitation pre-training, which leverages the inherent behavioral similarity between human hands and robotic arms to build a foundation of strong behavioral priors for both human actions and robotic control. Specifically, our method utilizes kinematic rules with left/right hand coordinate systems for bidirectional alignment between human and robot action spaces. Given human or simulated robot demonstrations, MiVLA is trained to forecast behavior trajectories for one embodiment, and imitate behaviors for another one unseen in the demonstration. Based on this mutual imitation, it integrates the behavioral fidelity of real-world human data with the manipulative diversity of simulated robot data into a unified model, thereby enhancing the generalization capability for downstream tasks. Extensive experiments conducted on both simulation and real-world platforms with three robots (ARX, PiPer and LocoMan) demonstrate that MiVLA achieves strongly improved generalization, outperforming state-of-the-art VLAs (e.g., $\boldsymbol{\pi}_0$, $\boldsymbol{\pi}_{0.5}$ and H-RDT) by 25% in simulation, and 14% in real-world robot control tasks.
Phantom Menace: Exploring and Enhancing the Robustness of VLA Models Against Physical Sensor Attacks AAAI 2026
Vision-Language-Action (VLA) models revolutionize robotic systems by enabling end-to-end perception-to-action pipelines that integrate multiple sensory modalities, such as visual signals processed by cameras and auditory signals captured by microphones. This multi-modality integration allows VLA models to interpret complex, real-world environments using diverse sensor data streams. Given the fact that VLA-based systems heavily rely on the sensory input, the security of VLA models against physical-world sensor attacks remains critically underexplored. To address this gap, we present the first systematic study of physical sensor attacks against VLAs, quantifying the influence of sensor attacks and investigating the defenses for VLA models. We introduce a novel "Real-Sim-Real" framework that automatically simulates physics-based sensor attack vectors, including six attacks targeting cameras and two targeting microphones, and validates them on real robotic systems. Through large-scale evaluations across various VLA architectures and tasks under varying attack parameters, we demonstrate significant vulnerabilities, with susceptibility patterns that reveal critical dependencies on task types and model designs. We further develop an adversarial-training-based defense that enhances VLA robustness against out-of-distribution physical perturbations caused by sensor attacks while preserving model performance. Our findings expose an urgent need for standardized robustness benchmarks and mitigation strategies to secure VLA deployments in safety-critical environments.
comment: Accepted by AAAI 2026 main track
Collaborative Object Handover in a Robot Crafting Assistant
Robots are increasingly working alongside people, delivering food to patrons in restaurants or helping workers on assembly lines. These scenarios often involve object handovers between the person and the robot. To achieve safe and efficient human-robot collaboration (HRC), it is important to incorporate human context in a robot's handover strategies. We develop a collaborative handover model trained on human teleoperation data collected in a naturalistic crafting task. To evaluate its performance, we conduct cross-validation experiments on the training dataset as well as a user study in the same HRC crafting task. The handover episodes and user perceptions of the autonomous handover policy were compared with those of the human teleoperated handovers. While the cross-validation experiment and user study indicate that the autonomous policy successfully achieved collaborative handovers, the comparison with human teleoperation revealed avenues for further improvements.
comment: Published at the 2025 Australasian Conference on Robotics and Automation (ACRA 2025): https://www.araa.asn.au/conference/acra-2025/
DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate a 100% success rate (vs. 90% baseline). We also present an offline variant that achieves state-of-the-art results on OGBench benchmarks, with up to 71% absolute gains on giant HumanoidMaze tasks, demonstrating our core contributions are architecture-agnostic. The method also generalizes to momentum-based control tasks and requires only log N steps for replanning. Theoretical analysis and ablations validate our design choices.
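To make the recursive, reachability-based decomposition concrete, here is a minimal sketch of the planning recursion described in the abstract. The functions `reachable` and `propose_subgoal` are hypothetical placeholders for the learned reachability check and subgoal proposal module; the toy 1-D example at the bottom is illustrative only and is not the authors' implementation.

```python
# Illustrative sketch of discrete hierarchical planning (not the DHP code).
# `reachable` and `propose_subgoal` stand in for learned modules.

def plan(start, goal, reachable, propose_subgoal, max_depth=8):
    """Recursively decompose (start -> goal) into a list of reachable subgoals."""
    if reachable(start, goal):          # discrete feasibility check, no distance metric
        return [goal]
    if max_depth == 0:                  # depth limit reached: return the goal as a single leap
        return [goal]
    mid = propose_subgoal(start, goal)  # split the task into two simpler subtasks
    left = plan(start, mid, reachable, propose_subgoal, max_depth - 1)
    right = plan(mid, goal, reachable, propose_subgoal, max_depth - 1)
    return left + right                 # tree-structured plan flattened left-to-right


if __name__ == "__main__":
    # Toy 1-D example: states are integers, a step of at most 2 is "reachable".
    reachable = lambda a, b: abs(a - b) <= 2
    propose_subgoal = lambda a, b: (a + b) // 2
    print(plan(0, 15, reachable, propose_subgoal))   # [1, 3, 5, 7, 9, 11, 13, 15]
```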
Efficient Image-Goal Navigation with Representative Latent World Model
World models enable robots to conduct counterfactual reasoning in physical environments by predicting future world states. While conventional approaches often prioritize pixel-level reconstruction of future scenes, such detailed rendering is computationally intensive and unnecessary for planning tasks like navigation. We therefore propose that prediction and planning can be efficiently performed directly within a latent space of high-level semantic representations. To realize this, we introduce the Representative Latent space Navigation World Model (ReL-NWM). Rather than relying on reconstruction-oriented latent embeddings, our method leverages a pre-trained representation encoder, DINOv3, and incorporates specialized mechanisms to effectively integrate action signals and historical context within this representation space. By operating entirely in the latent domain, our model bypasses expensive explicit reconstruction and achieves highly efficient navigation planning. Experiments show state-of-the-art trajectory prediction and image-goal navigation performance on multiple benchmarks. Additionally, we demonstrate real-world applicability by deploying the system on a Unitree G1 humanoid robot, confirming its efficiency and robustness in practical navigation scenarios.
WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation NeurIPS 2025
Large Language Model (LLM) web agents often struggle with long-horizon web navigation and web task completion in new websites, producing inefficient action sequences unless fine-tuned on environment-specific data. We show that experience-driven memory, combined with look-ahead action simulation, is sufficient for LLM agents to adapt to unseen web environments by remembering past failures and predicting the consequences of future actions. We introduce WebATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented LLM web agent that learns a lightweight internal model of the environment from interaction experience and performs hypothetical action rollouts before acting in the real world. WebATLAS builds a persistent cognitive map via curiosity-driven exploration, stores interaction outcomes as experience-based memory, and evaluates candidate actions in cognitive space using a planner--simulator--critic loop. This enables the agent to reuse past experience, avoid previously unsuccessful behaviors, and generate more efficient plans. We evaluate WebATLAS on the WebArena-Lite benchmark for autonomous web navigation and demonstrate a success rate of 63%, outperforming the previous state-of-the-art at 53.9%. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablation studies confirm that experience-driven memory, look-ahead action simulation, and hierarchical replanning play complementary roles in enabling robust, training-free web agents.
comment: 9 pages, NeurIPS 2025 Workshop on Language Agents and World Models
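The planner--simulator--critic loop can be summarized with a short sketch. All names here (`propose`, `simulate`, `critic`, the `failures` set) are generic placeholders standing in for WebATLAS's planner, learned environment model, critic, and experience-driven memory; the sketch illustrates the selection loop, not the system's actual API.

```python
# Minimal sketch of a planner-simulator-critic action-selection loop with
# experience-driven memory. Components are hypothetical placeholders.

def select_action(state, failures, propose, simulate, critic, n_candidates=4):
    """Score hypothetical rollouts of candidate actions and return the best one.

    `failures` is a set of (state, action) pairs remembered from past episodes;
    `propose`, `simulate`, and `critic` stand in for the planner, the learned
    environment model, and the critic, respectively.
    """
    scored = []
    for action in propose(state, n_candidates):
        if (state, action) in failures:       # memory: skip previously unsuccessful behavior
            continue
        predicted = simulate(state, action)   # look-ahead rollout before acting for real
        scored.append((critic(state, action, predicted), action))
    if not scored:
        return None                           # nothing viable -> trigger replanning
    return max(scored, key=lambda s: s[0])[1]
```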
HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments
We study the problem of robot navigation in dense and interactive crowds with static constraints such as corridors and furniture. Previous methods fail to consider all types of spatial and temporal interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different inputs and propose a heterogeneous spatio-temporal graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous spatio-temporal graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success, navigation time, and generalization to domain shifts in challenging navigation scenarios. More information is available at https://sites.google.com/view/crowdnav-height/home.
comment: Accepted to IEEE Transactions on Automation Science and Engineering (T-ASE)
Towards a Multi-Embodied Grasping Agent
Multi-embodiment grasping focuses on developing approaches that exhibit generalist behavior across diverse gripper designs. Existing methods often learn the kinematic structure of the robot implicitly and face challenges due to the difficulty of sourcing the required large-scale data. In this work, we present a data-efficient, flow-based, equivariant grasp synthesis architecture that can handle different gripper types with variable degrees of freedom and successfully exploit the underlying kinematic model, deducing all necessary information solely from the gripper and scene geometry. Unlike previous equivariant grasping methods, we reimplement all modules from the ground up in JAX and provide a model with batching capabilities over scenes, grippers, and grasps, resulting in smoother learning, improved performance, and faster inference. Our dataset encompasses grippers ranging from humanoid hands to parallel-jaw grippers and includes 25,000 scenes and 20 million grasps.
comment: 8 pages, 3 figures
Multiagent Systems
Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems
As LLM-based agents grow more autonomous and multi-modal, ensuring they remain controllable, auditable, and faithful to deployer intent becomes critical. Prior benchmarks measured the propensity for misaligned behavior and showed that agent personalities and tool access significantly influence misalignment. Building on these insights, we propose a Verifiability-First architecture that (1) integrates run-time attestations of agent actions using cryptographic and symbolic methods, (2) embeds lightweight Audit Agents that continuously verify intent versus behavior using constrained reasoning, and (3) enforces challenge-response attestation protocols for high-risk operations. We introduce OPERA (Observability, Provable Execution, Red-team, Attestation), a benchmark suite and evaluation protocol designed to measure (i) detectability of misalignment, (ii) time to detection under stealthy strategies, and (iii) resilience of verifiability mechanisms to adversarial prompt and persona injection. Our approach shifts the evaluation focus from how likely misalignment is to how quickly and reliably misalignment can be detected and remediated.
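One simple way to realize a challenge-response attestation gate for high-risk operations is an HMAC exchange between the agent runtime and an auditor holding a shared key. The sketch below is only an illustration of that general idea under assumed key provisioning; it is not the paper's protocol or the OPERA benchmark.

```python
# Minimal HMAC-based challenge-response gate for high-risk agent actions.
# Illustrative sketch only; the shared-key setup is an assumption.
import hashlib
import hmac
import os

SHARED_KEY = os.urandom(32)   # provisioned to both the agent runtime and the auditor

def issue_challenge() -> bytes:
    return os.urandom(16)     # fresh nonce per high-risk request

def attest(challenge: bytes, action_descriptor: bytes, key: bytes = SHARED_KEY) -> bytes:
    # The agent runtime binds the pending action to the auditor's challenge.
    return hmac.new(key, challenge + action_descriptor, hashlib.sha256).digest()

def verify(challenge: bytes, action_descriptor: bytes, response: bytes,
           key: bytes = SHARED_KEY) -> bool:
    expected = hmac.new(key, challenge + action_descriptor, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)   # constant-time comparison

if __name__ == "__main__":
    chal = issue_challenge()
    action = b"transfer_funds(amount=10_000)"
    resp = attest(chal, action)
    assert verify(chal, action, resp)   # only attested actions would be executed
```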
MAPPO-LCR: Multi-Agent Policy Optimization with Local Cooperation Reward in Spatial Public Goods Games
Spatial public goods games model collective dilemmas where individual payoffs depend on population-level strategy configurations. Most existing studies rely on evolutionary update rules or value-based reinforcement learning methods. These approaches struggle to represent payoff coupling and non-stationarity in large interacting populations. This work introduces Multi-Agent Proximal Policy Optimization (MAPPO) into spatial public goods games for the first time. In these games, individual returns are intrinsically coupled through overlapping group interactions. Proximal Policy Optimization (PPO) treats agents as independent learners and ignores this coupling during value estimation. MAPPO addresses this limitation through a centralized critic that evaluates joint strategy configurations. To study neighborhood-level cooperation signals under this framework, we propose MAPPO with Local Cooperation Reward, termed MAPPO-LCR. The local cooperation reward aligns policy updates with surrounding cooperative density without altering the original game structure. MAPPO-LCR preserves decentralized execution while enabling population-level value estimation during training. Extensive simulations demonstrate stable cooperation emergence and reliable convergence across enhancement factors. Statistical analyses further confirm the learning advantage of MAPPO over PPO in spatial public goods games.
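The abstract does not give the exact form of the local cooperation reward, but a natural instantiation is a bonus proportional to the cooperative density in an agent's neighborhood. The weight `beta` and the density-based form below are assumptions for illustration only.

```python
# Illustrative reward shaping in the spirit of a "local cooperation reward".
# The exact formula and weight are assumptions, not the paper's definition.

def shaped_reward(base_reward: float, neighbor_strategies: list[int], beta: float = 0.5) -> float:
    """Add a bonus proportional to the fraction of cooperating neighbors (strategy == 1)."""
    if not neighbor_strategies:
        return base_reward
    coop_density = sum(s == 1 for s in neighbor_strategies) / len(neighbor_strategies)
    return base_reward + beta * coop_density

# Example: an agent on a lattice with 4 neighbors, 3 of whom cooperate.
print(shaped_reward(1.2, [1, 1, 1, 0]))   # 1.2 + 0.5 * 0.75 = 1.575
```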
Efficient Mixture-of-Agents Serving via Tree-Structured Routing, Adaptive Pruning, and Dependency-Aware Prefill-Decode Overlap
Mixture-of-Agents (MoA) inference can suffer from dense inter-agent communication and low hardware utilization, which jointly inflate serving latency. We present a serving design that targets these bottlenecks through an algorithm-system co-design. First, we replace dense agent interaction graphs with a hierarchical tree topology that induces structured sparsity in inter-agent communication. Second, we introduce a runtime adaptive mechanism that selectively terminates or skips downstream agent invocations using semantic agreement and confidence signals from intermediate outputs. Third, we pipeline agent execution by overlapping incremental prefilling with decoding across dependency-related agents, improving utilization and reducing inference latency. Across representative tasks, this approach substantially reduces end-to-end latency (up to 90%) while maintaining comparable accuracy (within $\pm$1%) relative to dense-connectivity MoA baselines, and can improve accuracy in certain settings.
comment: 13 pages, 4 figures, submitted to Design Automation Conference (DAC) 2026
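A hierarchical MoA with adaptive pruning can be sketched as a recursive tree traversal: each node aggregates its children's answers, and the remaining (more expensive) children are skipped once the collected answers already agree strongly. The node layout, `agree` score, and threshold below are illustrative stand-ins for the semantic-agreement and confidence signals described above, not the paper's implementation.

```python
# Sketch of tree-structured mixture-of-agents execution with adaptive pruning.
# Nodes look like {"agent": "aggregator", "children": [{"agent": "a1"}, ...]}.

def run_tree(node, query, call_agent, agree, threshold=0.9):
    """Recursively aggregate child answers, skipping subtrees once answers agree."""
    answers = []
    for child in node.get("children", []):
        answers.append(run_tree(child, query, call_agent, agree, threshold))
        # Adaptive pruning: if the answers collected so far already agree strongly,
        # skip the remaining downstream agent invocations.
        if len(answers) >= 2 and agree(answers) >= threshold:
            break
    context = "\n".join(answers)
    return call_agent(node["agent"], query, context)   # aggregator at this node
```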
Rethinking Multi-Agent Intelligence Through the Lens of Small-World Networks
Large language models (LLMs) have enabled multi-agent systems (MAS) in which multiple agents argue, critique, and coordinate to solve complex tasks, making communication topology a first-class design choice. Yet most existing LLM-based MAS either adopt fully connected graphs, simple sparse rings, or ad-hoc dynamic selection, with little structural guidance. In this work, we revisit classic theory on small-world (SW) networks and ask: what changes if we treat SW connectivity as a design prior for MAS? We first bridge insights from neuroscience and complex networks to MAS, highlighting how SW structures balance local clustering and long-range integration. Using multi-agent debate (MAD) as a controlled testbed, experimental results show that SW connectivity yields nearly the same accuracy and token cost, while substantially stabilizing consensus trajectories. Building on this, we introduce an uncertainty-guided rewiring scheme for scaling MAS, where long-range shortcuts are added between epistemically divergent agents using LLM-oriented uncertainty signals (e.g., semantic entropy). This yields controllable SW structures that adapt to task difficulty and agent heterogeneity. Finally, we discuss broader implications of SW priors for MAS design, framing them as stabilizers of reasoning, enhancers of robustness, scalable coordinators, and inductive biases for emergent cognitive roles.
comment: Under Review
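The uncertainty-guided rewiring idea can be sketched as follows: start from a ring lattice (local clustering) and add a small budget of long-range shortcuts between the most "epistemically divergent" non-adjacent agent pairs. The `divergence` score is a hypothetical stand-in for the LLM-oriented uncertainty signals (e.g., semantic entropy) mentioned above, and the toy example is illustrative only.

```python
# Sketch of uncertainty-guided small-world rewiring over a ring of agents.

def ring_lattice(n, k=2):
    """Each agent talks to its k nearest neighbors on each side (local clustering)."""
    edges = set()
    for i in range(n):
        for d in range(1, k + 1):
            edges.add(tuple(sorted((i, (i + d) % n))))
    return edges

def add_uncertainty_shortcuts(edges, divergence, n, budget=2):
    """Add `budget` long-range shortcuts between the most divergent non-adjacent pairs."""
    candidates = [(divergence(i, j), (i, j))
                  for i in range(n) for j in range(i + 1, n)
                  if (i, j) not in edges]
    for _, pair in sorted(candidates, reverse=True)[:budget]:
        edges.add(pair)
    return edges

if __name__ == "__main__":
    n = 8
    edges = ring_lattice(n)
    # Toy divergence: agents far apart on the ring disagree more.
    divergence = lambda i, j: min(abs(i - j), n - abs(i - j))
    print(sorted(add_uncertainty_shortcuts(edges, divergence, n)))
```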
Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis AAMAS
Industrial symbiosis fosters circularity by enabling firms to repurpose residual resources, yet its emergence is constrained by socio-spatial frictions that shape costs, matching opportunities, and market efficiency. Existing models often overlook the interaction between spatial structure, market design, and adaptive firm behavior, limiting our understanding of where and how symbiosis arises. We develop an agent-based model where heterogeneous firms trade byproducts through a spatially embedded double-auction market, with prices and quantities emerging endogenously from local interactions. Leveraging reinforcement learning, firms adapt their bidding strategies to maximize profit while accounting for transport costs, disposal penalties, and resource scarcity. Simulation experiments reveal the economic and spatial conditions under which decentralized exchanges converge toward stable and efficient outcomes. Counterfactual regret analysis shows that sellers' strategies approach a near Nash equilibrium, while sensitivity analysis highlights how spatial structures and market parameters jointly govern circularity. Our model provides a basis for exploring policy interventions that seek to align firm incentives with sustainability goals, and more broadly demonstrates how decentralized coordination can emerge from adaptive agents in spatially constrained markets.
comment: AAMAS CC-BY 4.0 licence. Adaptive Agents in Spatial Double-Auction Markets: Modeling the Emergence of Industrial Symbiosis. Full paper. In Proc. of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026), Paphos, Cyprus, May 25 - 29, 2026, IFAAMAS, 10 pages
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web search, cross-file retrieval, calculation, modeling, validation, translation, visualization, and reporting. Finch is sourced from authentic enterprise workspaces at Enron (15,000 spreadsheets and 500,000 emails from 150 employees) and other financial institutions, preserving in-the-wild messiness across multimodal artifacts (text, tables, formulas, charts, code, and images) and spanning diverse domains such as budgeting, trading, and asset management. We propose a workflow construction process that combines LLM-assisted discovery with expert annotation: (1) LLM-assisted, expert-verified derivation of workflows from real-world email threads and version histories of spreadsheet files, and (2) meticulous expert annotation for workflows, requiring over 700 hours of domain-expert effort. This yields 172 composite workflows with 384 tasks, involving 1,710 spreadsheets with 27 million cells, along with PDFs and other artifacts, capturing the intrinsically messy, long-horizon, knowledge-intensive, and collaborative nature of real-world enterprise work. We conduct both human and automated evaluations of frontier AI systems, including GPT 5.1, Claude Sonnet 4.5, Gemini 3 Pro, Grok 4, and Qwen 3 Max: GPT 5.1 Pro spends 48 hours in total yet passes only 38.4% of workflows, while Claude Sonnet 4.5 passes just 25.0%. Comprehensive case studies further surface the challenges that real-world enterprise workflows pose for AI agents.
Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems AAAI 2026
Large language model (LLM) agents have shown increasing promise for collaborative task completion. However, existing multi-agent frameworks often rely on static workflows, fixed roles, and limited inter-agent communication, reducing their effectiveness in open-ended, high-complexity domains. This paper proposes a coordination framework that enables adaptiveness through three core mechanisms: dynamic task routing, bidirectional feedback, and parallel agent evaluation. The framework allows agents to reallocate tasks based on confidence and workload, exchange structured critiques to iteratively improve outputs, and crucially compete on high-ambiguity subtasks with evaluator-driven selection of the most suitable result. We instantiate these principles in a modular architecture and demonstrate substantial improvements in factual coverage, coherence, and efficiency over static and partially adaptive baselines. Our findings highlight the benefits of incorporating both adaptiveness and structured competition in multi-agent LLM systems.
comment: Accepted at AAAI 2026 Workshop on WoMAPF, Camera ready version
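Dynamic task routing by confidence and workload can be illustrated with a one-line scoring rule; the confidence-minus-load-penalty form and the toy scores below are assumptions for illustration, not the paper's exact mechanism.

```python
# Sketch of confidence- and load-aware task routing across agents.

def route_task(task, agents, confidence, load, load_penalty=0.2):
    """Assign `task` to the agent with the best confidence/load trade-off."""
    return max(agents, key=lambda a: confidence(a, task) - load_penalty * load[a])

# Example with toy scores.
agents = ["summarizer", "analyst", "verifier"]
conf = {("summarizer", "t1"): 0.6, ("analyst", "t1"): 0.8, ("verifier", "t1"): 0.7}
load = {"summarizer": 1, "analyst": 3, "verifier": 0}
print(route_task("t1", agents, lambda a, t: conf[(a, t)], load))   # -> "verifier"
```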
WebATLAS: An LLM Agent with Experience-Driven Memory and Action Simulation NeurIPS 2025
Large Language Model (LLM) web agents often struggle with long-horizon web navigation and web task completion in new websites, producing inefficient action sequences unless fine-tuned on environment-specific data. We show that experience-driven memory, combined with look-ahead action simulation, is sufficient for LLM agents to adapt to unseen web environments by remembering past failures and predicting the consequences of future actions. We introduce WebATLAS (Actor-Critic Task-completion with Look-ahead Action Simulation), a memory-augmented LLM web agent that learns a lightweight internal model of the environment from interaction experience and performs hypothetical action rollouts before acting in the real world. WebATLAS builds a persistent cognitive map via curiosity-driven exploration, stores interaction outcomes as experience-based memory, and evaluates candidate actions in cognitive space using a planner--simulator--critic loop. This enables the agent to reuse past experience, avoid previously unsuccessful behaviors, and generate more efficient plans. We evaluate WebATLAS on the WebArena-Lite benchmark for autonomous web navigation and demonstrate a success rate of 63%, outperforming the previous state-of-the-art at 53.9%. Unlike previous systems, our modular architecture requires no website-specific LLM fine-tuning. Ablation studies confirm that experience-driven memory, look-ahead action simulation, and hierarchical replanning play complementary roles in enabling robust, training-free web agents.
comment: 9 pages, NeurIPS 2025 Workshop on Language Agents and World Models
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including MTBF and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery behavior across multiple reflex strategies. This work establishes a quantitative foundation for runtime cognitive dependability in distributed agentic systems.
comment: preprint
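As a concrete illustration, MTTR-style and MTBF-style quantities can be computed directly from a log of drift episodes, and classical dependability theory relates them to a long-run uptime ratio. The event format and the availability-style ratio below follow standard dependability conventions; the paper's exact NRR definition is not reproduced here and may differ.

```python
# Computing recovery-latency metrics from a log of (drift_detected_at, recovered_at)
# episode timestamps (seconds). Illustrative sketch, not the paper's benchmark code.

def mttr(episodes):
    """Mean time-to-recovery over (detected, recovered) pairs."""
    latencies = [recovered - detected for detected, recovered in episodes]
    return sum(latencies) / len(latencies)

def mtbf(failure_times):
    """Mean time between successive drift onsets."""
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)

episodes = [(10.0, 14.5), (80.0, 82.0), (150.0, 161.0)]
onsets = [t for t, _ in episodes]
m_ttr, m_tbf = mttr(episodes), mtbf(onsets)
uptime_ratio = m_tbf / (m_tbf + m_ttr)   # classical long-run availability bound
print(round(m_ttr, 2), round(m_tbf, 2), round(uptime_ratio, 3))   # 5.83 70.0 0.923
```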
STAIR: Stability criterion for Time-windowed Assignment and Internal adversarial influence in Routing and decision-making
A major limitation of existing routing algorithms for multi-agent systems is that they are designed without considering the potential presence of adversarial agents in the decision-making loop, which could lead to severe performance degradation in real-life applications where adversarial agents may be present. We study autonomous pickup-and-delivery routing problems in which adversarial agents launch coordinated denial-of-service attacks by spoofing their locations. This deception causes the central scheduler to assign pickup requests to adversarial agents instead of cooperative agents. Adversarial agents then choose not to service the requests with the goal of disrupting the operation of the system, leading to delays, cancellations, and potential instability in the routing policy. Policy stability in routing problems is typically defined as the cost of the policy being uniformly bounded over time, and it has been studied through two different lenses: queuing theory and reinforcement learning (RL), which are not well suited for routing with adversaries. In this paper, we propose a new stability criterion, STAIR, which is easier to analyze than queuing-theory-based stability in adversarial settings. Furthermore, STAIR does not depend on a chosen discount factor as is the case in discounted RL stability. STAIR directly links stability to desired operational metrics, like a finite number of rejected requests. This characterization is particularly useful in adversarial settings as it provides a metric for monitoring the effect of adversaries in the operation of the system. Furthermore, we demonstrate STAIR's practical relevance through simulations on real-world San Francisco mobility-on-demand data. We also identify a phenomenon of degenerate stability that arises in the adversarial routing problem, and we introduce time-window constraints in the decision-making algorithm to mitigate it.
comment: Requires major changes
Systems and Control (EESS)
Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy
Imitation learning (IL) enables autonomous behavior by learning from expert demonstrations. While more sample-efficient than comparable alternatives such as reinforcement learning, IL is sensitive to compounding errors induced by distribution shifts. There are two significant sources of distribution shifts when using IL-based feedback laws on systems: distribution shifts caused by policy error, and distribution shifts due to exogenous disturbances and endogenous model errors arising from a lack of learning. Our previously developed approaches, Taylor Series Imitation Learning (TaSIL) and $\mathcal{L}_1$-Distributionally Robust Adaptive Control ($\mathcal{L}_1$-DRAC), address the challenge of distribution shifts in complementary ways. While TaSIL offers robustness against policy error-induced distribution shifts, $\mathcal{L}_1$-DRAC offers robustness against distribution shifts due to aleatoric and epistemic uncertainties. To enable certifiable IL for learned and/or uncertain dynamical systems, we formulate the Distributionally Robust Imitation Policy (DRIP) architecture, a Layered Control Architecture (LCA) that integrates TaSIL and $\mathcal{L}_1$-DRAC. By judiciously designing individual layer-centric input and output requirements, we show how we can guarantee certificates for the entire control pipeline. Our solution paves the way for designing fully certifiable autonomy pipelines, by integrating learning-based components, such as perception, with certifiable model-based decision-making through the proposed LCA approach.
comment: 18 pages, 5 figures
NeuRehab: A Reinforcement Learning and Spiking Neural Network-Based Rehab Automation Framework
Recent advancements in robotic rehabilitation therapy have provided modular exercise systems for post-stroke muscle recovery with basic control schemes. However, these systems struggle to adapt to patients' complex and ever-changing behaviour and to operate within the constraints of mobile settings, such as heat and power budgets. To address this, we present NeuRehab: an end-to-end framework consisting of a training and inference pipeline with AI-based automation, co-designed with neuromorphic computing-based control systems that balance action performance, power consumption, and observed latency. The framework consists of two partitions. One is designated for the rehabilitation device, based on ultra-low-power spiking networks deployed on dedicated neuromorphic hardware. The other resides on stationary hardware that can accommodate computationally intensive hardware for fine-tuning on a per-patient basis. By maintaining a communication channel between the two modules and splitting the algorithm components, the power and latency requirements of the mobile system are optimised while retaining the learning performance advantages of compute- and power-hungry hardware on the stationary machine. As part of the framework, we propose (a) split machine learning processes for efficiency in architectural utilisation, and (b) task-specific temporal optimisations to lower edge-inference control latency. This paper evaluates the proposed methods on a reference stepper-motor-based shoulder exercise. Overall, these methods offer comparable performance uplifts over the state of the art for neuromorphic deployment, while achieving over 60% savings in both power and latency during inference compared to standard implementations.
comment: 18 pages, 22 figures, 3 tables, for submission to IOP Neuromorphic Computing and Engineering
Techno-Economic Case Study of a Rural Local Electricity Community in Switzerland
Local Electricity Communities (communautés électriques locales, CEL) will become operational in Switzerland in 2026, allowing prosumers, consumers, and storage operators within the same municipality and distribution system operator (DSO) area to exchange electricity over the public grid with reduced distribution tariffs. This report examines a rural Swiss case study to explore the techno-economic implications of CELs for both participants and the local DSO. The findings indicate that CELs can enhance the local use of renewable generation, particularly photovoltaics, and offer modest financial gains, with outcomes strongly shaped by community size, composition, and tariff design. Larger and more heterogeneous communities achieve better internal matching of supply and demand, though the overall incentive remains limited because the tariff reduction applies only to distribution charges. The study further shows that internal energy exchange is maximized when local PV generation covers roughly 1-2 times the community load. For DSOs, CELs reduce grid imports (27-46%), resulting in a substantial reduction in distribution tariff revenues (17-36%), necessitating regulatory adaptation. While centralized batteries provide economic value to members, their technical impact on the grid remains modest due to their small, economically optimized capacity. Larger centralized storage is shown to reduce transformer peak power, but risks increasing line loading, suggesting a need for careful sizing and placement.
comment: 33 pages and 12 figures
UAV-Enabled ISAC: Towards On-Demand Sensing Services and Enhanced Communication
In this paper, we investigate an integrated sensing-and-communication (ISAC) network enabled by an unmanned aerial vehicle (UAV). The UAV flies along a periodic circular trajectory at a fixed height to supply ISAC services from the sky. We consider on-demand sensing services, where on-demand detection and on-demand localization requests may be activated at any time toward any position within the targeted serving region. While guaranteeing satisfactory accuracy for both on-demand sensing tasks, we aim at maximizing the minimum achievable throughput among all communication users by jointly optimizing the UAV trajectory and communication user scheduling. To address the complicated problem with infinite sensing constraints, we characterize the on-demand detection constraint as a restricted deployment area for the UAV and the on-demand localization constraint as Cramer-Rao Bound (CRB) constraints over finite reference target points, based on which the original problem is simplified to a more tractable one. Afterwards, particularly aiming to ensure no violation of the CRB constraints, we propose a convex approximation of the reformulated problem, where a tight approximation is guaranteed at a given local solution. The construction strategy for the convex problem approximation enables an efficient iterative algorithm with verified convergence to a superior suboptimal solution. Finally, simulations verify the applicability of our developed optimization scheme in strictly fulfilling the on-demand sensing constraints and the effectiveness of our proposed solution in simultaneously enhancing communication throughput in UAV-enabled ISAC.
Layer-to-layer Closed-loop Switched Heating and Cooling Control of the Laser Powder Bed Fusion Process
This study investigates the stabilization of interlayer temperature in the laser powder bed fusion process through a novel switched layer-to-layer closed-loop feedback controller. The controller architecture aims to measure the interlayer temperature by a laterally positioned thermal camera and maintain a preset reference temperature by switching between the heating mode through dynamic laser power adjustment and the cooling mode by assigning interlayer dwell time to allow cooling between layers. The switching controller employs a feedback optimization control algorithm for the heating mode to adjust the laser power, and a triggering algorithm that increases the interlayer dwell time until the interlayer temperature reaches the reference value. Additionally, the study compares the performance of the proposed controller in both supported and unsupported overhanging parts to evaluate the effect of support structures on the controller performance as well as the thermal behavior of overhanging parts. Results demonstrate the controller's effectiveness in stabilizing interlayer temperature across varying cross-sectional areas while remaining within the material's stable processing zone. In the heating mode, the controller efficiently stabilizes temperature, even in geometries with significant cross-section variation. The study also identifies trade-offs among process efficiency, energy consumption, and build time. Supported parts exhibit reduced overheating but consume more energy and material, while unsupported parts stabilize interlayer temperature faster but with longer build times due to increased dwell time assignments. The research highlights notable improvements in interlayer temperature control for geometries prone to excessive thermal stresses. Moreover, the introduction of interlayer dwell time offers a practical solution to maintaining thermal stability in complex geometries.
Dispatch-Aware Learning for Optimal Transmission Switching
Optimal transmission switching (OTS) improves optimal power flow (OPF) by selectively opening transmission lines, but its mixed-integer formulation increases computational complexity, especially on large grids. To deal with this, we propose a dispatch-aware deep neural network (DA-DNN) that accelerates DC-OTS without relying on pre-solved labels. DA-DNN predicts line states and passes them through an embedded differentiable DC-OPF layer, using the resulting generation cost as the loss function so that all physical network constraints are enforced throughout training and inference. In addition, we adopt a customized weight-bias initialization that keeps every forward pass feasible from the first iteration, which allows stable learning. Once trained, the proposed DA-DNN produces a feasible topology and dispatch pair in the same time it takes to solve the DC-OPF, whereas conventional mixed-integer solvers become intractable. Moreover, the embedded OPF layer enables DA-DNN to generalize to untrained system configurations, such as changes in line flow limits. As a result, the proposed method captures the economic advantages of OTS while maintaining scalability and generalization ability.
comment: 10 pages, 5 figures
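The label-free training objective described above can be written schematically as a bilevel problem in which the network's predicted line states parameterize an embedded DC-OPF whose optimal cost is the loss. The notation here is ours (not the paper's), intended only to make the idea explicit:

$$
\min_{\theta}\;\; \mathbb{E}_{d}\!\left[\, c^{\top} p^{\star}\!\big(z_{\theta}(d),\, d\big) \right],
\qquad
p^{\star}(z, d) \;=\; \arg\min_{p \,\in\, \mathcal{P}(z,\, d)} \; c^{\top} p,
$$

where $d$ is the load profile, $z_{\theta}(d)$ are the predicted line states, $\mathcal{P}(z, d)$ is the DC-OPF feasible set induced by the switched topology, and $c^{\top} p$ is the generation cost. Because the inner problem is a differentiable optimization layer, the dispatch cost can be backpropagated into $\theta$ without any pre-solved OTS labels.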
Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks
Optimal AP clustering and power allocation are critical in user-centric cell-free massive MIMO systems. Existing deep learning models lack the flexibility to handle dynamic network configurations. Furthermore, many approaches overlook pilot contamination and suffer from high computational complexity. In this paper, we propose a lightweight transformer model that overcomes these limitations by jointly predicting AP clusters and powers solely from the spatial coordinates of user devices and APs. Our model architecture is agnostic to the user load, handles both clustering and power allocation without channel estimation overhead, and eliminates pilot contamination by assigning users to APs under a pilot reuse constraint. We also incorporate a customized linear attention mechanism to capture user-AP interactions efficiently and enable linear scalability with respect to the number of users. Numerical results confirm the model's effectiveness in maximizing the minimum spectral efficiency and providing near-optimal performance while ensuring adaptability and scalability in dynamic scenarios.
comment: Submitted
Grey graphs and its application
In multi-attribute decision-making problems where the attribute values are interval grey numbers, a simplified form based on kernels and the degree of greyness is presented. Combining fuzzy graph theory with the kernel and the degree of greyness of interval grey numbers, grey graphs and their corresponding operation rules are introduced. On this basis, a new multi-attribute decision-making method based on grey graph theory is proposed, and alternative schemes are analyzed and evaluated using grey graphs. Finally, a numerical example demonstrates the effectiveness and feasibility of the proposed method.
Uncertainty-Aware 3D UAV Tracking Using Single-Anchor UWB Measurements
In this letter, we present an uncertainty-aware single-anchor Ultra-Wideband (UWB)-based 3D tracking framework. Specifically, a mobile Unmanned Aerial Vehicle (UAV) maintains a desired standoff distance to a moving target using range and 3D bearing measurements from a multi-antenna UWB anchor rigidly mounted on the UAV. To enhance the stability and safety under measurement degradation and motion uncertainty, we jointly design a robust factor-graph-based target localization method and a covariance-aware control Lyapunov function--control barrier function (CLF--CBF) tracking controller. This controller adaptively adjusts distance bounds and safety margins based on the posterior target covariance provided by the factor graph. The proposed system is evaluated through numerical simulations and real-world experiments carried out in a narrow indoor corridor environment.
Cooperative Energy Scheduling of Multi-Microgrids Based on Risk-Sensitive Reinforcement Learning
With the rapid development of distributed renewable energy, multi-microgrids play an increasingly important role in improving the flexibility and reliability of energy supply. Reinforcement learning has shown great potential in coordination strategies due to its model-free nature. Current methods lack explicit quantification of the relationship between individual and joint risk values, resulting in obscured credit assignment. Moreover, they often depend on explicit communication, which becomes inefficient as system complexity grows. To address these challenges, this paper proposes a risk-sensitive reinforcement learning framework with shared memory (RRL-SM) for multi-microgrid scheduling. Specifically, a risk-sensitive value factorization scheme is proposed to quantify the relationship between individual and joint risk values by leveraging distributional modeling and attention-based representations, thereby aligning local decisions with global risk objectives. An implicit shared-memory coordination mechanism is implemented through a global memory space to enhance the overall efficiency of decentralized decision-making. Collectively, the integrated approach delivers more reliable cooperative scheduling under renewable energy uncertainty. Simulation results show that RRL-SM reduces load-shedding risk by 84.5%, demonstrating a favorable balance between reliability and economic performance.
Bayesian Holonic Systems: Equilibrium, Uniqueness, and Computation
This paper addresses the challenge of modeling and control in hierarchical, multi-agent systems, known as holonic systems, where local agent decisions are coupled with global systemic outcomes. We introduce the Bayesian Holonic Equilibrium (BHE), a concept that ensures consistency between agent-level rationality and system-wide emergent behavior. We establish the theoretical soundness of the BHE by showing its existence and, under stronger regularity conditions, its uniqueness. We propose a two-time scale learning algorithm to compute such an equilibrium. This algorithm mirrors the system's structure, with a fast timescale for intra-holon strategy coordination and a slow timescale for inter-holon belief adaptation about external risks. The convergence of the algorithm to the theoretical equilibrium is validated through a numerical experiment on a continuous public good game. This work provides a complete theoretical and algorithmic framework for the principled design and analysis of strategic risk in complex, coupled control systems.
Timing-Aware Two-Player Stochastic Games with Self-Triggered Control
We study self-triggered two-player stochastic games on Piecewise Deterministic Markov Processes (PDMPs) where each agent decides when to observe and which open-loop action to hold. Augmenting the state with clocks and committed controls yields flow regions (both hold) and trigger surfaces (at least one updates). The framework covers both blind simultaneous (Nash) timing and observable sequential (Stackelberg) commitments; the former leads to coupled, intractable QVIs, while the latter admits a nested Hamilton-Jacobi-Bellman quasi-variational inequality and a tractable dynamic-programming decomposition. We outline a computational scheme based on implicit differentiation of the follower's fixed point. A pursuit-evasion example illustrates the strategic timing interaction.
A Games-in-Games Paradigm for Strategic Hybrid Jump-Diffusions: Hamilton-Jacobi-Isaacs Hierarchy and Spectral Structure
This paper develops a hierarchical games-in-games control architecture for hybrid stochastic systems governed by regime-switching jump-diffusions. We model the interplay between continuous state dynamics and discrete mode transitions as a bilevel differential game: an inner layer solves a robust stochastic control problem within each regime, while a strategic outer layer modulates the transition intensities of the underlying Markov chain. A Dynkin-based analysis yields a system of coupled Hamilton-Jacobi-Isaacs (HJI) equations. We prove that for the class of Linear-Quadratic games and Exponential-Affine games, this hierarchy admits tractable semi-closed-form solutions via coupled matrix differential equations. The framework is demonstrated through a case study on adversarial market microstructure, showing how the outer layer's strategic switching pre-emptively adjusts inventory spreads against latent regime risks, which leads to a hyper-alert equilibrium.
Review of Power Electronic Solutions for Dielectric Barrier Discharge Applications
This paper presents a comprehensive review of dielectric barrier discharge (DBD) power supply topologies, aiming to bridge the gap between DBD applications and power electronics design. Two key aspects are examined: the dependence of the DBD electrical model on reactor geometry, and application-driven requirements for injected waveform characteristics, including shapes, voltage amplitude, frequency, and modulation techniques. On this basis, the paper systematically reviews two major categories of power supplies: sinusoidal types comprising transformerless and transformer-based resonant inverters, and pulsed power supplies (PPSs). The review summarizes performance trade-offs, highlights untested topologies and emerging applications, and offers guidance for advancing high-performance DBD power supply design for next-generation systems.
comment: 26 pages, 32 figures. Under conditional acceptance at IEEE Transactions on Power Electronics
Robustness of Delayed Higher Order Sliding Mode Control
In this paper, the feasibility of recently developed higher order delayed sliding mode controllers is addressed. To this aim, robustness against measurement noise and mismatched perturbations is established for systems governed by such controllers, using an ISS implicit Lyapunov-Razumikhin function approach. To illustrate the proposed results, a simulation example validating the efficiency of the method is provided.
comment: 16 pages, 3 figures
Reduced-order asymptotic observer of nonlinear friction for precise motion control
Nonlinear friction has long been, and continues to be, one of the major challenges for precision motion control systems. A linear asymptotic observer of the motion state variables with nonlinear friction uses a dedicated state-space representation of the dynamic friction force (including pre-sliding) [25], which is robust to noisy input signals. The observer implements a reduced-order Luenberger observation law, assuming the output displacement is the only available measurement; this is relevant for many motion control applications that use encoder-type sensors. The uniform asymptotic stability and convergence analysis of the proposed observer are elaborated in the present work by using the Lyapunov function-based stability criterion by Ignatyev and by constraining the time-dependent eigenvalues of the system matrix to remain negative real. A design procedure for assigning a dominant, and thus slowest, real pole of the observer is proposed. Illustrative numerical examples accompany the analysis. In addition, a thorough experimental evaluation of the proposed observer-based friction compensation is given for positioning and tracking tasks. The observer-based compensation, which can serve as a plug-in to a standard feedback controller, extends a PID feedback control that is optimally tuned for disturbance suppression. The experimental results are compared with and without the plugged-in observer-based compensator.
comment: 10 pages, 6 figures
Combinatorial Control Barrier Functions: Nested Boolean and p-choose-r Compositions of Safety Constraints
This paper investigates the problem of composing multiple control barrier functions (CBFs) -- and matrix control barrier functions (MCBFs) -- through logical and combinatorial operations. Standard CBF formulations naturally enable conjunctive (AND) combinations, but disjunctive (OR) and more general logical structures introduce nonsmoothness and possibly a combinatorial blow-up in the number of logical combinations. We introduce the framework of combinatorial CBFs that addresses p-choose-r safety specifications and their nested composition. The proposed framework ensures safety for the exact safe set in a scalable way, using the original number of primitive constraints. We establish theoretical guarantees on safety under these compositions, and we demonstrate their use on a patrolling problem in a multi-agent system.
comment: 6 pages, 3 figures, Accepted for publication in Control System Letters (L-CSS) with the possibility of presenting at the American Control Conference (ACC) 2026
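For orientation, one common way to encode Boolean and p-choose-r compositions of barrier functions is via min/max operators and order statistics; the paper's exact (matrix CBF) construction may differ, so the following is a generic illustration in our own notation:

$$
h_{\wedge}(x) \;=\; \min_{i} h_i(x), \qquad
h_{\vee}(x) \;=\; \max_{i} h_i(x), \qquad
h_{(p,r)}(x) \;=\; h_{(r)}(x),
$$

where $h_{(r)}(x)$ denotes the $r$-th largest value among $h_1(x), \dots, h_p(x)$, so that $h_{(p,r)}(x) \ge 0$ holds exactly when at least $r$ of the $p$ primitive constraints are satisfied. Nested Boolean formulas follow by composing these operators, at the price of the nonsmoothness and combinatorial growth that the proposed framework is designed to handle.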
The Battle of the Water Futures
The highly anticipated 'Battle of the Water Networks' is back with a new challenge for the water community. This competition will be hosted at the 4th International Joint Conference on Water Distribution Systems Analysis and Computing and Control in the Water Industry (WDSA/CCWI 2026), taking place in Paphos, Cyprus, from May 18-21, 2026. This competition embodies the core mission of Water-Futures and the theme for WDSA/CCWI 2026: "Designing the next generation of urban water (and wastewater) systems." The objective is to design and operate a water distribution system over a long-term horizon under deep uncertainty, with interventions applied in stages. For the first time, this challenge features a staged-design approach, unobservable and unknown uncertainties, and incorporates elements of policymaking and artificial intelligence. The solutions will be assessed using a transparent and inspectable open-source evaluation framework.
Quantifying the Dunkelflaute -- An analysis of variable renewable energy droughts in Europe
Variable renewable energy droughts, also called "Dunkelflaute", emerge as a challenge for climate-neutral energy systems based on variable renewables. Here we characterize European drought events for on- and offshore wind power, solar photovoltaics, and renewable technology portfolios, using 38 historic weather years and an advanced identification method. Their characteristics heavily depend on the chosen drought threshold, questioning the usefulness of single-threshold analyses. Applying a multi-threshold framework, we quantify how the complementarity of wind and solar power temporally and spatially alleviates drought frequency, return periods, duration, and severity within (portfolio effect) and across countries (balancing effect). We identify the most extreme droughts, which drive major discharging periods of long-duration storage in a fully renewable European energy system, based on a policy-relevant decarbonization scenario. Such events comprise sequences of shorter droughts of varying severity. The most extreme event occurred in winter 1996/97 and lasted 55 days in an idealized, perfectly interconnected setting. The average renewable availability during this period was still 47% of its long-run mean. System planners must consider such events when planning for storage and other flexibility technologies. Methodologically, we conclude that using arbitrary single calendar years is not suitable for modeling weather-resilient energy scenarios.
Cooperative Task Spaces for Multi-Arm Manipulation Control based on Similarity Transformations
Many tasks in human environments require collaborative behavior between multiple kinematic chains, either to provide additional support for carrying big and bulky objects or to enable the dexterity that is required for in-hand manipulation. Since these complex systems often have a very high number of degrees of freedom, coordinating their movements is notoriously difficult to model. In this article, we present the derivation of the theoretical foundations for cooperative task spaces of multi-arm robotic systems based on geometric primitives defined using conformal geometric algebra. Based on the similarity transformations of these cooperative geometric primitives, we derive an abstraction of complex robotic systems that enables representing these systems in a way that directly corresponds to single-arm systems. By deriving the associated analytic and geometric Jacobian matrices, we then show the straightforward integration of our approach into classical control techniques rooted in operational space control. We demonstrate this using bimanual manipulators, humanoids, and multi-fingered hands in optimal control experiments for reaching desired geometric primitives and in teleoperation experiments using differential kinematics control. We then discuss how the geometric primitives naturally embed nullspace structures into the controllers that can be exploited for introducing secondary control objectives. This work presents the theoretical foundations of this cooperative manipulation control framework; the experiments are therefore presented in an abstract way, while giving pointers towards potential future applications.
Decentralized Voltage Control of AC Microgrids with Constant Power Loads using Control Barrier Functions
This paper proposes a novel nonlinear decentralized voltage controller for constrained regulation of meshed AC Microgrid networks with high penetration of time-varying constant power loads. Modelling the load demand as a constantly evolving unknown disturbance, the network model is reformulated in a cascaded structure composed of a nominal, i.e., uncertainty-free, subsystem and an error subsystem. By adopting a suitable control barrier function, we formulate a continuous-time control law and derive analytic conditions on the tuning parameters such that the distance between the true and the nominal state trajectories is bounded. Under sufficient conditions, we prove asymptotic stability of the cascaded dynamics with respect to an equilibrium set and also provide an estimate of the region of attraction. In addition, it is rigorously shown that the proposed nonlinear control law enforces constrained regulation around a rated voltage value, without the need for saturation devices. The operation of the closed-loop system is illustrated both via simulation and real-time HIL scenarios, demonstrating bounded operation and convergence to a neighbourhood of the desired reference vector.
comment: 12 pages
Nonlinear moving horizon estimation for robust state and parameter estimation -- extended version
We propose a moving horizon estimation scheme to estimate the states and the unknown constant parameters of general nonlinear uncertain discrete-time systems. The proposed framework and analysis explicitly do not involve the a priori verification of a particular excitation condition for the parameters. Instead, we use online information about the actual excitation of the parameters at any time during operation and ensure that the regularization term in the cost function is always automatically selected appropriately. This ensures that the state and parameter estimation error is bounded for all times, even if the parameters are never (or only rarely) excited during operation. Robust exponential stability of the state and parameter estimation error emerges under an additional uniform condition on the maximum duration of insufficient excitation. The theoretical results are illustrated by a numerical example.
comment: Replaced by version corresponding to the final published paper
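To make the role of the excitation-dependent regularization explicit, a generic MHE cost of the kind described above can be written as follows; the notation is generic and the paper's precise weighting rule is not reproduced here:

$$
\min_{\hat x_{t-N},\, \hat\theta}\;
\|\hat x_{t-N} - \bar x_{t-N}\|_{P}^{2}
\;+\; \|\hat\theta - \bar\theta_{t}\|_{W_{t}}^{2}
\;+\; \sum_{j=t-N}^{t-1} \Big( \|\hat w_{j}\|_{Q}^{2} + \|y_{j} - h(\hat x_{j}, \hat\theta)\|_{R}^{2} \Big),
$$

subject to $\hat x_{j+1} = f(\hat x_j, u_j, \hat\theta) + \hat w_j$, where the parameter-regularization weight $W_t$ is selected online from the excitation actually observed in the data, so that the joint state and parameter estimation error remains bounded even when the parameters are rarely excited.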
Observer-based Differentially Private Consensus for Linear Multi-agent Systems
This paper investigates the differentially private consensus problem for general linear multi-agent systems (MASs) based on output feedback protocols. To protect the output information, which is considered private data and may be at high risk of exposure, Laplace noise is added to the information exchange. The conditions for achieving mean square and almost sure consensus in observer-based MASs are established using the backstepping method and the convergence theory for nonnegative almost supermartingales. It is shown that the separation principle remains valid for the consensus problem of linear MASs with decaying Laplace noise. Furthermore, the convergence rate is provided. Then, a joint design framework is developed for state estimation gain, feedback control gain, and noise to ensure the preservation of ε-differential privacy. The output information of each agent is shown to be protected at every time step. Finally, sufficient conditions are established for simultaneously achieving consensus and preserving differential privacy for linear MASs utilizing both full-order and reduced-order observers. Meanwhile, an ε*-differentially private consensus is achieved to meet the desired privacy level. Two simulation examples are provided to validate the theoretical results.
Data-Driven Nonconvex Reachability Analysis using Exact Multiplication
This paper addresses a fundamental challenge in data-driven reachability analysis: accurately representing and propagating non-convex reachable sets. We propose a novel approach using constrained polynomial zonotopes to describe reachable sets for unknown LTI systems. Unlike constrained zonotopes commonly used in existing literature, constrained polynomial zonotopes are closed under multiplication with constrained matrix zonotopes. We leverage this property to develop an exact multiplication method that preserves the non-convex geometry of reachable sets without resorting to approximations. We demonstrate that our approach provides tighter over-approximations of reachable sets for LTI systems compared to conventional methods.
comment: This paper has been accepted at the 64th IEEE Conference on Decision and Control (CDC 2025)
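For readers unfamiliar with the set representations involved, recall the standard definitions of a zonotope and a constrained zonotope, of which the constrained polynomial zonotope used in the paper is a non-convex generalization (the notation below is ours):

$$
\mathcal{Z} = \{\, c + G\beta \;:\; \|\beta\|_\infty \le 1 \,\},
\qquad
\mathcal{CZ} = \{\, c + G\beta \;:\; A\beta = b,\ \|\beta\|_\infty \le 1 \,\}.
$$

In data-driven reachability, the next reachable set is obtained by propagating the current set through the set of system matrices consistent with the recorded trajectories, schematically $\mathcal{R}_{k+1} = \{\, Mx + w \;:\; M \in \mathcal{M}_{\text{data}},\ x \in \mathcal{R}_k,\ w \in \mathcal{W} \,\}$, which over-approximates the true reachable set because the unknown system matrix lies in $\mathcal{M}_{\text{data}}$. The paper's contribution is to carry out this matrix-set times state-set multiplication exactly with constrained polynomial zonotopes rather than with convex over-approximations.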
BattBee: Equivalent Circuit Modeling and Early Detection of Thermal Runaway Triggered by Internal Short Circuits for Lithium-Ion Batteries
Lithium-ion batteries are the enabling power source for transportation electrification. However, in real-world applications, they remain vulnerable to internal short circuits (ISCs) and the consequential risk of thermal runaway (TR). Toward addressing the challenge of ISCs and TR, we undertake a systematic study that extends from dynamic modeling to fault detection in this paper. First, we develop BattBee, the first equivalent circuit model to specifically describe the onset of ISCs and the evolution of subsequently induced TR. Drawing upon electrochemical modeling, the model can simulate ISCs at different severity levels and predict their impact on the initiation and progression of TR events. With the physics-inspired design, this model offers strong physical interpretability and predictive accuracy, while maintaining structural simplicity to allow fast computation. Then, building upon the BattBee model, we develop fault detection observers and derive detection criteria together with decision-making logics to identify the occurrence and emergence of ISC and TR events. This detection approach is principled in design and fast in computation, lending itself to practical applications. Validation based on simulations and experimental data demonstrates the effectiveness of both the BattBee model and the ISC/TR detection approach. The research outcomes underscore this study's potential for real-world battery safety risk management.
comment: 19 pages, 15 figures, 2 tables
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Reliability in multi-agent systems (MAS) built on large language models is increasingly limited by cognitive failures rather than infrastructure faults. Existing observability tools describe failures but do not quantify how quickly distributed reasoning recovers once coherence is lost. We introduce MTTR-A (Mean Time-to-Recovery for Agentic Systems), a runtime reliability metric that measures cognitive recovery latency in MAS. MTTR-A adapts classical dependability theory to agentic orchestration, capturing the time required to detect reasoning drift and restore coherent operation. We further define complementary metrics, including MTBF and a normalized recovery ratio (NRR), and establish theoretical bounds linking recovery latency to long-run cognitive uptime. Using a LangGraph-based benchmark with simulated drift and reflex recovery, we empirically demonstrate measurable recovery behavior across multiple reflex strategies. This work establishes a quantitative foundation for runtime cognitive dependability in distributed agentic systems.
comment: preprint
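A minimal sketch of the bookkeeping behind such metrics, assuming incidents are logged as (drift detected, coherence restored) timestamps; the NRR shown here is an assumed normalization for illustration, not necessarily the paper's definition.

```python
from statistics import mean

def agentic_reliability_metrics(incidents, horizon):
    """incidents: list of (drift_detected_at, coherence_restored_at) timestamps
    in seconds over an observation window of `horizon` seconds."""
    recoveries = [restored - detected for detected, restored in incidents]
    mttr_a = mean(recoveries)                      # cognitive recovery latency
    onsets = sorted(d for d, _ in incidents)
    mtbf = mean(b - a for a, b in zip(onsets, onsets[1:])) if len(onsets) > 1 else horizon
    uptime = 1.0 - sum(recoveries) / horizon       # fraction of coherent operation
    nrr = mttr_a / mtbf                            # assumed normalization, not the paper's
    return {"MTTR-A": mttr_a, "MTBF": mtbf,
            "cognitive_uptime": uptime, "NRR (assumed)": nrr}

# example: three drift episodes over a one-hour run
print(agentic_reliability_metrics([(100, 130), (900, 960), (2500, 2520)], horizon=3600))
```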
STAIR: Stability criterion for Time-windowed Assignment and Internal adversarial influence in Routing and decision-making
A major limitation of existing routing algorithms for multi-agent systems is that they are designed without considering the potential presence of adversarial agents in the decision-making loop, which could lead to severe performance degradation in real-life applications where adversarial agents may be present. We study autonomous pickup-and-delivery routing problems in which adversarial agents launch coordinated denial-of-service attacks by spoofing their locations. This deception causes the central scheduler to assign pickup requests to adversarial agents instead of cooperative agents. Adversarial agents then choose not to service the requests with the goal of disrupting the operation of the system, leading to delays, cancellations, and potential instability in the routing policy. Policy stability in routing problems is typically defined as the cost of the policy being uniformly bounded over time, and it has been studied through two different lenses: queuing theory and reinforcement learning (RL), which are not well suited for routing with adversaries. In this paper, we propose a new stability criterion, STAIR, which is easier to analyze than queuing-theory-based stability in adversarial settings. Furthermore, STAIR does not depend on a chosen discount factor as is the case in discounted RL stability. STAIR directly links stability to desired operational metrics, like a finite number of rejected requests. This characterization is particularly useful in adversarial settings as it provides a metric for monitoring the effect of adversaries in the operation of the system. Furthermore, we demonstrate STAIR's practical relevance through simulations on real-world San Francisco mobility-on-demand data. We also identify a phenomenon of degenerate stability that arises in the adversarial routing problem, and we introduce time-window constraints in the decision-making algorithm to mitigate it.
comment: Requires major changes
Robotics
DVGT: Driving Visual Geometry Transformer
Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, a driving-targeted dense geometry perception model that can adapt to different scenarios and camera configurations is still lacking. To bridge this gap, we propose a Driving Visual Geometry Transformer (DVGT), which reconstructs a global dense 3D point map from a sequence of unposed multi-view visual inputs. We first extract visual features for each image using a DINO backbone, and employ alternating intra-view local attention, cross-view spatial attention, and cross-frame temporal attention to infer geometric relations across images. We then use multiple heads to decode a global point map in the ego coordinate frame of the first frame and the ego poses for each frame. Unlike conventional methods that rely on precise camera parameters, DVGT is free of explicit 3D geometric priors, enabling flexible processing of arbitrary camera configurations. DVGT directly predicts metric-scaled geometry from image sequences, eliminating the need for post-alignment with external sensors. Trained on a large mixture of driving datasets including nuScenes, OpenScene, Waymo, KITTI, and DDAD, DVGT significantly outperforms existing models across various scenarios. Code is available at https://github.com/wzzheng/DVGT.
comment: Code is available at https://github.com/wzzheng/DVGT
Posterior Behavioral Cloning: Pretraining BC Policies for Efficient RL Finetuning
Standard practice across domains from robotics to language is to first pretrain a policy on a large-scale demonstration dataset, and then finetune this policy, typically with reinforcement learning (RL), in order to improve performance on deployment domains. This finetuning step has proved critical in achieving human or super-human performance, yet while much attention has been given to developing more effective finetuning algorithms, little attention has been given to ensuring the pretrained policy is an effective initialization for RL finetuning. In this work we seek to understand how the pretrained policy affects finetuning performance, and how to pretrain policies in order to ensure they are effective initializations for finetuning. We first show theoretically that standard behavioral cloning (BC) -- which trains a policy to directly match the actions played by the demonstrator -- can fail to ensure coverage over the demonstrator's actions, a minimal condition necessary for effective RL finetuning. We then show that if, instead of exactly fitting the observed demonstrations, we train a policy to model the posterior distribution of the demonstrator's behavior given the demonstration dataset, we do obtain a policy that ensures coverage over the demonstrator's actions, enabling more effective finetuning. Furthermore, this policy -- which we refer to as the posterior behavioral cloning (PostBC) policy -- achieves this while ensuring pretrained performance is no worse than that of the BC policy. We then show that PostBC is practically implementable with modern generative models in robotic control domains -- relying only on standard supervised learning -- and leads to significantly improved RL finetuning performance on both realistic robotic control benchmarks and real-world robotic manipulation tasks, as compared to standard behavioral cloning.
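The coverage argument can be illustrated in a deliberately simplified tabular setting: a deterministic BC-style policy that imitates the most frequent demonstrated action per state drops rarer demonstrated actions, while a posterior-predictive policy under a Dirichlet prior keeps nonzero mass on every action. The paper's PostBC uses modern generative models rather than this toy construction.

```python
import numpy as np
from collections import Counter

# demonstration dataset: (state, action) pairs from a stochastic demonstrator
demos = [("s0", 0)] * 3 + [("s0", 1)] * 2 + [("s1", 2)] * 4
n_actions = 3
counts = {s: Counter() for s, _ in demos}
for s, a in demos:
    counts[s][a] += 1

def bc_policy(state):
    """Deterministic BC-style policy: always play the most frequent action.
    Demonstrated-but-rarer actions get zero probability, so coverage can fail."""
    probs = np.zeros(n_actions)
    probs[counts[state].most_common(1)[0][0]] = 1.0
    return probs

def postbc_policy(state, alpha=1.0):
    """Posterior-predictive policy under a symmetric Dirichlet(alpha) prior:
    every action keeps nonzero mass, preserving coverage of the demonstrator."""
    c = np.array([counts[state][a] for a in range(n_actions)], float)
    return (c + alpha) / (c + alpha).sum()

print(bc_policy("s0"))      # [1. 0. 0.] -> action 1 is never covered
print(postbc_policy("s0"))  # [0.5 0.375 0.125] -> full coverage
```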
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
Mobile manipulators in households must both navigate and manipulate. This requires a compact, semantically rich scene representation that captures where objects are, how they function, and which parts are actionable. Scene graphs are a natural choice, yet prior work often separates spatial and functional relations, treats scenes as static snapshots without object states or temporal updates, and overlooks information most relevant for accomplishing the current task. To address these limitations, we introduce MomaGraph, a unified scene representation for embodied agents that integrates spatial-functional relationships and part-level interactive elements. However, advancing such a representation requires both suitable data and rigorous evaluation, which have been largely missing. We thus contribute MomaGraph-Scenes, the first large-scale dataset of richly annotated, task-driven scene graphs in household environments, along with MomaGraph-Bench, a systematic evaluation suite spanning six reasoning capabilities from high-level planning to fine-grained scene understanding. Built upon this foundation, we further develop MomaGraph-R1, a 7B vision-language model trained with reinforcement learning on MomaGraph-Scenes. MomaGraph-R1 predicts task-oriented scene graphs and serves as a zero-shot task planner under a Graph-then-Plan framework. Extensive experiments demonstrate that our model achieves state-of-the-art results among open-source models, reaching 71.6% accuracy on the benchmark (+11.4% over the best baseline), while generalizing across public benchmarks and transferring effectively to real-robot experiments.
comment: 25 pages, 10 figures. Project page: https://hybridrobotics.github.io/MomaGraph/
Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos
Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation via a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate and stage-aware trajectories with generalization across real-world scenes.
comment: Project website: https://egoman-project.github.io
Sceniris: A Fast Procedural Scene Generation Framework
Synthetic 3D scenes are essential for developing Physical AI and generative models. Existing procedural generation methods often have low output throughput, creating a significant bottleneck in scaling up dataset creation. In this work, we introduce Sceniris, a highly efficient procedural scene generation framework for rapidly generating large-scale, collision-free scene variations. Sceniris also offers an optional robot reachability check, providing manipulation-feasible scenes for robot tasks. Sceniris is designed for maximum efficiency by addressing the primary performance limitations of the prior method, Scene Synthesizer. Leveraging batch sampling and faster collision checking in cuRobo, Sceniris achieves at least a 234x speed-up over Scene Synthesizer. Sceniris also expands the object-wise spatial relationships available in prior work to support diverse scene requirements. Our code is available at https://github.com/rai-inst/sceniris.
comment: Code is available at https://github.com/rai-inst/sceniris
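A toy version of batch sampling for collision-free placement (axis-aligned boxes on a plane, simple rejection against already placed objects) is sketched below; it stands in for, and does not reproduce, the cuRobo-based pipeline or the framework's spatial-relationship vocabulary.

```python
import numpy as np

def aabb_overlap(c1, s1, c2, s2):
    # axis-aligned overlap test on (x, y) centers c and half-extents s
    return np.all(np.abs(c1 - c2) < (s1 + s2), axis=-1)

def place_objects(half_extents, table_half=0.5, batch=512, rng=None):
    """Sample a batch of candidate centers per object and keep the first
    candidate that does not collide with anything already placed."""
    rng = rng or np.random.default_rng(0)
    placed_c, placed_s = [], []
    for s in half_extents:
        s = np.asarray(s, float)
        cands = rng.uniform(-table_half + s, table_half - s, size=(batch, 2))
        ok = np.ones(batch, dtype=bool)
        for pc, ps in zip(placed_c, placed_s):
            ok &= ~aabb_overlap(cands, s, pc, ps)
        if not ok.any():
            raise RuntimeError("no collision-free placement found in this batch")
        placed_c.append(cands[ok][0])
        placed_s.append(s)
    return np.array(placed_c)

# three boxes with given (x, y) half-extents on a 1 m x 1 m table
print(place_objects([(0.05, 0.08), (0.06, 0.06), (0.10, 0.04)]))
```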
PolaRiS: Scalable Real-to-Sim Evaluations for Generalist Robot Policies
A significant challenge for robot learning research is our ability to accurately measure and compare the performance of robot policies. Benchmarking in robotics is historically challenging due to the stochastic, hard-to-reproduce, and time-consuming nature of real-world rollouts. This challenge is exacerbated for recent generalist policies, which have to be evaluated across a wide variety of scenes and tasks. Evaluation in simulation offers a scalable complement to real-world evaluations, but the visual and physical domain gap between existing simulation benchmarks and the real world has made them an unreliable signal for policy improvement. Furthermore, building realistic and diverse simulated environments has traditionally required significant human effort and expertise. To bridge this gap, we introduce Policy Evaluation and Environment Reconstruction in Simulation (PolaRiS), a scalable real-to-sim framework for high-fidelity simulated robot evaluation. PolaRiS utilizes neural reconstruction methods to turn short video scans of real-world scenes into interactive simulation environments. Additionally, we develop a simple simulation data co-training recipe that bridges remaining real-to-sim gaps and enables zero-shot evaluation in unseen simulation environments. Through extensive paired evaluations between simulation and the real world, we demonstrate that PolaRiS evaluations correlate much more strongly with real-world generalist policy performance than existing simulated benchmarks. Its simplicity also enables rapid creation of diverse simulated environments. As such, this work takes a step towards distributed and democratized evaluation for the next generation of robotic foundation models.
comment: Website: https://polaris-evals.github.io/
ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning
Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learning, and motion planning to form an initial solution, and improves each component through reinforcement-learning-based fine-tuning. ReinforceGen first segments the task into multiple localized skills, which are connected through motion planning. The skills and motion planning targets are trained with imitation learning on a dataset generated from 10 human demonstrations, and then fine-tuned through online adaptation and reinforcement learning. When benchmarked on the Robosuite dataset, ReinforceGen reaches an 80% success rate on all tasks with visuomotor controls in the highest reset range setting. Additional ablation studies show that our fine-tuning approaches contribute to an 89% average performance increase. More results and videos are available at https://reinforcegen.github.io/
OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction
The human hand is our primary interface to the physical world, yet egocentric perception rarely knows when, where, or how forcefully it makes contact. Robust wearable tactile sensors are scarce, and no existing in-the-wild datasets align first-person video with full-hand touch. To bridge the gap between visual perception and physical interaction, we present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset, containing 5.1 hours of synchronized video-touch-pose data and 2,900 curated clips with detailed text annotations. Using OpenTouch, we introduce retrieval and classification benchmarks that probe how touch grounds perception and action. We show that tactile signals provide a compact yet powerful cue for grasp understanding, strengthen cross-modal alignment, and can be reliably retrieved from in-the-wild video queries. By releasing this annotated vision-touch-pose dataset and benchmark, we aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.
comment: https://opentouch-tactile.github.io/
GeoPredict: Leveraging Predictive Kinematics and 3D Gaussian Geometry for Precise VLA Manipulation
Vision-Language-Action (VLA) models achieve strong generalization in robotic manipulation but remain largely reactive and 2D-centric, making them unreliable in tasks that require precise 3D reasoning. We propose GeoPredict, a geometry-aware VLA framework that augments a continuous-action policy with predictive kinematic and geometric priors. GeoPredict introduces a trajectory-level module that encodes motion history and predicts multi-step 3D keypoint trajectories of robot arms, and a predictive 3D Gaussian geometry module that forecasts workspace geometry with track-guided refinement along future keypoint trajectories. These predictive modules serve exclusively as training-time supervision through depth-based rendering, while inference requires only lightweight additional query tokens without invoking any 3D decoding. Experiments on RoboCasa Human-50, LIBERO, and real-world manipulation tasks show that GeoPredict consistently outperforms strong VLA baselines, especially in geometry-intensive and spatially demanding scenarios.
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Robotic generalization relies on physical intelligence: the ability to reason about state changes, contact-rich interactions, and long-horizon planning under egocentric perception and action. However, most VLMs are trained primarily on third-person data, creating a fundamental viewpoint mismatch for humanoid robots. Scaling robot egocentric data collection remains impractical due to high cost and limited diversity, whereas large-scale human egocentric videos offer a scalable alternative that naturally captures rich interaction context and causal structure. The key challenge is to convert raw egocentric videos into structured and reliable embodiment training supervision. Accordingly, we propose an Egocentric2Embodiment translation pipeline that transforms first-person videos into multi-level, schema-driven VQA supervision with enforced evidence grounding and temporal consistency, enabling the construction of the Egocentric2Embodiment dataset (E2E-3M) at scale. An egocentric-aware embodied brain, termed PhysBrain, is obtained by training on the E2E-3M dataset. PhysBrain exhibits substantially improved egocentric understanding, particularly for planning on EgoThink. It provides an egocentric-aware initialization that enables more sample-efficient VLA fine-tuning and higher SimplerEnv success rates (53.9%), demonstrating effective transfer from human egocentric supervision to downstream robot control.
comment: 17 pages, 4 figures
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future
Autonomous driving has long relied on modular "Perception-Decision-Action" pipelines, where hand-crafted interfaces and rule-based components often break down in complex or long-tailed scenarios. Their cascaded design further propagates perception errors, degrading downstream planning and control. Vision-Action (VA) models address some limitations by learning direct mappings from visual inputs to actions, but they remain opaque, sensitive to distribution shifts, and lack structured reasoning or instruction-following capabilities. Recent progress in Large Language Models (LLMs) and multimodal learning has motivated the emergence of Vision-Language-Action (VLA) frameworks, which integrate perception with language-grounded decision making. By unifying visual understanding, linguistic reasoning, and actionable outputs, VLAs offer a pathway toward more interpretable, generalizable, and human-aligned driving policies. This work provides a structured characterization of the emerging VLA landscape for autonomous driving. We trace the evolution from early VA approaches to modern VLA frameworks and organize existing methods into two principal paradigms: End-to-End VLA, which integrates perception, reasoning, and planning within a single model, and Dual-System VLA, which separates slow deliberation (via VLMs) from fast, safety-critical execution (via planners). Within these paradigms, we further distinguish subclasses such as textual vs. numerical action generators and explicit vs. implicit guidance mechanisms. We also summarize representative datasets and benchmarks for evaluating VLA-based driving systems and highlight key challenges and open directions, including robustness, interpretability, and instruction fidelity. Overall, this work aims to establish a coherent foundation for advancing human-compatible autonomous driving systems.
comment: Preprint; 40 pages, 7 figures, 9 tables; GitHub at https://github.com/worldbench/awesome-vla-for-ad
VERM: Leveraging Foundation Models to Create a Virtual Eye for Efficient 3D Robotic Manipulation
When performing 3D manipulation tasks, robots have to execute action planning based on perceptions from multiple fixed cameras. The multi-camera setup introduces substantial redundancy and irrelevant information, which increases computational costs and forces the model to spend extra training time extracting crucial task-relevant details. To filter out redundant information and accurately extract task-relevant features, we propose the VERM (Virtual Eye for Robotic Manipulation) method, leveraging the knowledge in foundation models to imagine a virtual task-adaptive view from the constructed 3D point cloud, which efficiently captures necessary information and mitigates occlusion. To facilitate 3D action planning and fine-grained manipulation, we further design a depth-aware module and a dynamic coarse-to-fine procedure. Extensive experimental results on both the simulation benchmark RLBench and real-world evaluations demonstrate the effectiveness of our method, surpassing previous state-of-the-art methods while achieving a 1.89x speedup in training and a 1.54x speedup in inference. More results can be found on our project website at https://verm-ral.github.io.
comment: Accepted at RA-L 2025
Olaf: Bringing an Animated Character to Life in the Physical World
Animated characters often move in non-physical ways and have proportions that are far from a typical walking robot. This provides an ideal platform for innovation in both mechanical design and stylized motion control. In this paper, we bring Olaf to life in the physical world, relying on reinforcement learning guided by animation references for control. To create the illusion of Olaf's feet moving along his body, we hide two asymmetric legs under a soft foam skirt. To fit actuators inside the character, we use spherical and planar linkages in the arms, mouth, and eyes. Because the walk cycle results in harsh contact sounds, we introduce additional rewards that noticeably reduce impact noise. The large head, driven by small actuators in the character's slim neck, creates a risk of overheating, amplified by the costume. To keep actuators from overheating, we feed temperature values as additional inputs to policies, introducing new rewards to keep them within bounds. We validate the efficacy of our modeling in simulation and on hardware, demonstrating an unmatched level of believability for a costumed robotic character.
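The temperature- and noise-aware shaping described above might look roughly like the reward terms below; the limits, weights, and the use of contact-force changes as a proxy for impact noise are assumptions for illustration, not the authors' reward design.

```python
import numpy as np

def shaping_rewards(temps, foot_force, prev_foot_force,
                    t_soft=65.0, t_hard=80.0, w_temp=0.05, w_impact=1e-3):
    """Hypothetical shaping terms: penalize actuator temperatures approaching
    a hard limit and sharp contact-force changes as a proxy for impact noise."""
    over = np.clip(np.asarray(temps) - t_soft, 0.0, t_hard - t_soft)
    r_temp = -w_temp * float(np.sum(over ** 2))
    r_impact = -w_impact * float(np.sum(np.abs(np.asarray(foot_force)
                                               - np.asarray(prev_foot_force))))
    return r_temp + r_impact

# temperatures would also be appended to the policy observation, e.g. (illustrative):
# obs = np.concatenate([proprioception, commands, np.asarray(temps) / 80.0])
print(shaping_rewards(temps=[62.0, 71.0], foot_force=[180.0], prev_foot_force=[40.0]))
```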
A Formal Modular Synthesis Approach for the Coordination of 3-D Robotic Construction with Multi-robots
In this paper, we deal with the problem of coordinating multiple robots to build 3-D structures. This problem consists of a set of mobile robots that interact with each other in order to autonomously build a predefined 3-D structure. Our approach is based on Supervisory Control Theory, and it allows us to synthesize a correct-by-construction reactive controller, called a supervisor, from models that represent a single robot and the target structure. When this supervisor is replicated for the other robots, the target structure can be completed by all robots.
Tri-Select: A Multi-Stage Visual Data Selection Framework for Mobile Visual Crowdsensing
Mobile visual crowdsensing enables large-scale, fine-grained environmental monitoring through the collection of images from distributed mobile devices. However, the resulting data is often redundant and heterogeneous due to overlapping acquisition perspectives, varying resolutions, and diverse user behaviors. To address these challenges, this paper proposes Tri-Select, a multi-stage visual data selection framework that efficiently filters redundant and low-quality images. Tri-Select operates in three stages: (1) metadata-based filtering to discard irrelevant samples; (2) spatial similarity-based spectral clustering to organize candidate images; and (3) a visual-feature-guided selection based on maximum independent set search to retain high-quality, representative images. Experiments on real-world and public datasets demonstrate that Tri-Select improves both selection efficiency and dataset quality, making it well-suited for scalable crowdsensing applications.
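A compact sketch of the three-stage selection, using an RBF spatial affinity with scikit-learn's spectral clustering and a greedy independent set on visual similarity within each cluster; the thresholds and the lack of an explicit quality ordering are simplifications of the paper's pipeline.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def tri_select(meta_ok, positions, vis_feats, n_clusters=5, sim_thresh=0.9):
    """Three-stage selection in the spirit of Tri-Select (a sketch, not the
    authors' implementation)."""
    idx = np.flatnonzero(meta_ok)                      # stage 1: metadata filter
    pos = positions[idx]
    feats = vis_feats[idx] / np.linalg.norm(vis_feats[idx], axis=1, keepdims=True)

    # stage 2: spectral clustering on an RBF spatial-similarity affinity
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    affinity = np.exp(-d2 / (2.0 * d2.mean() + 1e-9))
    k = min(n_clusters, len(idx))
    labels = SpectralClustering(n_clusters=k, affinity="precomputed",
                                random_state=0).fit_predict(affinity)

    # stage 3: keep mutually dissimilar representatives per cluster
    keep = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        sim = feats[members] @ feats[members].T
        remaining = list(range(len(members)))
        while remaining:
            i = remaining[0]
            keep.append(members[i])
            remaining = [j for j in remaining if sim[i, j] < sim_thresh]
    return idx[np.array(keep, dtype=int)]
```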
SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning
Autonomous robotic systems require spatio-temporal understanding of dynamic environments to ensure reliable navigation and interaction. While Vision-Language Models (VLMs) provide open-world semantic priors, they lack grounding in 3D geometry and temporal dynamics. Conversely, geometric perception captures structure and motion but remains semantically sparse. We propose SNOW (Scene Understanding with Open-World Knowledge), a training-free and backbone-agnostic framework for unified 4D scene understanding that integrates VLM-derived semantics with point cloud geometry and temporal consistency. SNOW processes synchronized RGB images and 3D point clouds, using HDBSCAN clustering to generate object-level proposals that guide SAM2-based segmentation. Each segmented region is encoded through our proposed Spatio-Temporal Tokenized Patch Encoding (STEP), producing multimodal tokens that capture localized semantic, geometric, and temporal attributes. These tokens are incrementally integrated into a 4D Scene Graph (4DSG), which serves as a 4D prior for downstream reasoning. A lightweight SLAM backend anchors all STEP tokens spatially in the environment, providing the global reference alignment and ensuring unambiguous spatial grounding across time. The resulting 4DSG forms a queryable, unified world model through which VLMs can directly interpret spatial scene structure and temporal dynamics. Experiments on a diverse set of benchmarks demonstrate that SNOW enables precise 4D scene understanding and spatially grounded inference, thereby setting new state-of-the-art performance in several settings and highlighting the importance of structured 4D priors for embodied reasoning and autonomous robotics.
AG-MPBS: a Mobility-Aware Prediction and Behavior-Based Scheduling Framework for Air-Ground Unmanned Systems
As unmanned systems such as Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) become increasingly important to applications like urban sensing and emergency response, efficiently recruiting these autonomous devices to perform time-sensitive tasks has become a critical challenge. This paper presents MPBS (Mobility-aware Prediction and Behavior-based Scheduling), a scalable task recruitment framework that treats each device as a recruitable "user". MPBS integrates three key modules: a behavior-aware KNN classifier, a time-varying Markov prediction model for forecasting device mobility, and a dynamic priority scheduling mechanism that considers task urgency and base station performance. By combining behavioral classification with spatiotemporal prediction, MPBS adaptively assigns tasks to the most suitable devices in real time. Experimental evaluations on the real-world GeoLife dataset show that MPBS significantly improves task completion efficiency and resource utilization. The proposed framework offers a predictive, behavior-aware solution for intelligent and collaborative scheduling in unmanned systems.
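A time-varying Markov mobility predictor of the kind described can be sketched as hour-bucketed transition counts plus a toy priority score; the smoothing prior, weights, and the priority formula are illustrative, not the MPBS design.

```python
import numpy as np

class TimeVaryingMarkov:
    """Hour-bucketed first-order Markov model over discretized regions
    (a sketch of the mobility-prediction idea, not the MPBS implementation)."""
    def __init__(self, n_regions, n_buckets=24, alpha=1.0):
        # Dirichlet-style pseudo-counts keep unseen transitions at nonzero probability
        self.counts = np.full((n_buckets, n_regions, n_regions), alpha)

    def update(self, hour, src, dst):
        self.counts[hour % self.counts.shape[0], src, dst] += 1

    def next_region_probs(self, hour, src):
        row = self.counts[hour % self.counts.shape[0], src]
        return row / row.sum()

def task_priority(urgency, p_near_task, station_score):
    """Toy scheduling score combining task urgency, predicted proximity,
    and base-station performance (weights are illustrative)."""
    return 0.5 * urgency + 0.3 * p_near_task + 0.2 * station_score

model = TimeVaryingMarkov(n_regions=4)
for _ in range(2):
    model.update(hour=8, src=0, dst=1)
model.update(hour=8, src=0, dst=2)
p = model.next_region_probs(hour=8, src=0)
print(task_priority(urgency=0.9, p_near_task=p[1], station_score=0.7))
```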
Single-View Shape Completion for Robotic Grasping in Clutter
In vision-based robot manipulation, a single camera view can only capture one side of objects of interest, with additional occlusions in cluttered scenes further restricting visibility. As a result, the observed geometry is incomplete, and grasp estimation algorithms perform suboptimally. To address this limitation, we leverage diffusion models to perform category-level 3D shape completion from partial depth observations obtained from a single view, reconstructing complete object geometries to provide richer context for grasp planning. Our method focuses on common household items with diverse geometries, generating full 3D shapes that serve as input to downstream grasp inference networks. Unlike prior work, which primarily considers isolated objects or minimal clutter, we evaluate shape completion and grasping in realistic clutter scenarios with household objects. In preliminary evaluations on a cluttered scene, our approach consistently improves grasp success rates, by 23% over a naive baseline without shape completion and by 19% over a recent state-of-the-art shape completion approach. Our code is available at https://amm.aass.oru.se/shape-completion-grasping/.
E-SDS: Environment-aware See it, Do it, Sorted - Automated Environment-Aware Reinforcement Learning for Humanoid Locomotion
Vision-language models (VLMs) show promise in automating reward design in humanoid locomotion, which could eliminate the need for tedious manual engineering. However, current VLM-based methods are essentially "blind", as they lack the environmental perception required to navigate complex terrain. We present E-SDS (Environment-aware See it, Do it, Sorted), a framework that closes this perception gap. E-SDS integrates VLMs with real-time terrain sensor analysis to automatically generate reward functions that facilitate training of robust perceptive locomotion policies, grounded by example videos. Evaluated on a Unitree G1 humanoid across four distinct terrains (simple, gaps, obstacles, stairs), E-SDS uniquely enabled successful stair descent, while policies trained with manually-designed rewards or a non-perceptive automated baseline were unable to complete the task. In all terrains, E-SDS also reduced velocity tracking error by 51.9-82.6%. Our framework reduces the human effort of reward design from days to less than two hours while simultaneously producing more robust and capable locomotion policies.
comment: 12 pages, 3 figures, 4 tables. Accepted at RiTA 2025 (Springer LNNS)
A2VISR: An Active and Adaptive Ground-Aerial Localization System Using Visual Inertial and Single-Range Fusion
Ground-aerial collaborative systems are a practical way to enhance the localization robustness of flying robots in cluttered environments, especially when visual sensors degrade. Conventional approaches estimate the flying robot's position using fixed cameras observing pre-attached markers, which can be constrained by limited distance and are susceptible to capture failure. To address this issue, we improve the ground-aerial localization framework in a more comprehensive manner, integrating active vision, single-ranging, inertial odometry, and optical flow. First, the designed active vision subsystem mounted on the ground vehicle can be dynamically rotated to detect and track infrared markers on the aerial robot, improving the field of view and target recognition with a single camera. Meanwhile, the incorporation of single-ranging extends the feasible distance and enhances re-capture capability under visual degradation. During estimation, a dimension-reduced estimator fuses multi-source measurements based on polynomial approximation with an extended sliding window, balancing computational efficiency and redundancy. Considering different sensor fidelities, an adaptive sliding confidence evaluation algorithm is implemented to assess measurement quality and dynamically adjust the weighting parameters based on moving variance. Finally, extensive experiments under conditions such as smoke interference, illumination variation, obstacle occlusion, prolonged visual loss, and extended operating range demonstrate that the proposed approach achieves robust online localization, with an average root mean square error of approximately 0.09 m, while maintaining resilience to capture loss and sensor failures.
comment: Accepted by IEEE Transactions on Industrial Electronics
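The adaptive confidence idea, weighting each measurement stream by the inverse of its moving variance before fusion, can be sketched as follows; the window size, the weight rule, and the fusion step are simplifications of the paper's estimator, not its implementation.

```python
import numpy as np
from collections import deque

class AdaptiveWeight:
    """Confidence weight from the moving variance of a measurement stream
    (a simplified stand-in for the sliding confidence evaluation)."""
    def __init__(self, window=30, eps=1e-6):
        self.buf, self.eps = deque(maxlen=window), eps

    def update(self, z):
        self.buf.append(float(z))
        var = np.var(self.buf) if len(self.buf) > 1 else self.eps
        return 1.0 / (var + self.eps)   # information-style weight ~ 1/variance

def fuse(zs, ws):
    """Weighted fusion of redundant measurements of the same quantity."""
    ws = np.asarray(ws, float)
    return float(np.dot(ws, zs) / ws.sum())

w_vis, w_rng = AdaptiveWeight(), AdaptiveWeight()
# as the vision stream degrades, its variance grows and its weight drops
for z_vis, z_rng in [(3.02, 3.10), (3.05, 3.12), (3.50, 3.11)]:
    print(fuse([z_vis, z_rng], [w_vis.update(z_vis), w_rng.update(z_rng)]))
```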
ManiLong-Shot: Interaction-Aware One-Shot Imitation Learning for Long-Horizon Manipulation AAAI 2026
One-shot imitation learning (OSIL) offers a promising way to teach robots new skills without large-scale data collection. However, current OSIL methods are primarily limited to short-horizon tasks, thus limiting their applicability to complex, long-horizon manipulations. To address this limitation, we propose ManiLong-Shot, a novel framework that enables effective OSIL for long-horizon prehensile manipulation tasks. ManiLong-Shot structures long-horizon tasks around physical interaction events, reframing the problem as sequencing interaction-aware primitives instead of directly imitating continuous trajectories. This primitive decomposition can be driven by high-level reasoning from a vision-language model (VLM) or by rule-based heuristics derived from robot state changes. For each primitive, ManiLong-Shot predicts invariant regions critical to the interaction, establishes correspondences between the demonstration and the current observation, and computes the target end-effector pose, enabling effective task execution. Extensive simulation experiments show that ManiLong-Shot, trained on only 10 short-horizon tasks, generalizes to 20 unseen long-horizon tasks across three difficulty levels via one-shot imitation, achieving a 22.8% relative improvement over the SOTA. Additionally, real-robot experiments validate ManiLong-Shot's ability to robustly execute three long-horizon manipulation tasks via OSIL, confirming its practical applicability.
comment: Accepted by AAAI 2026
Privacy-Aware Sharing of Raw Spatial Sensor Data for Cooperative Perception
Cooperative perception between vehicles is poised to offer robust and reliable scene understanding. Recently, experimental systems research has been building testbeds that share raw spatial sensor data for cooperative perception. While sharing raw data markedly improves accuracy and is the natural way forward, we take a moment to consider the problems such an approach poses for eventual adoption by automakers. In this paper, we first argue that new forms of privacy concerns arise and discourage stakeholders from sharing raw sensor data. Next, we present SHARP, a research framework to minimize privacy leakage and drive stakeholders towards the ambitious goal of raw-data-based cooperative perception. Finally, we discuss open questions for networked systems, mobile computing, and perception researchers, as well as industry and government, in realizing our proposed framework.
Towards Closing the Domain Gap with Event Cameras
Although traditional cameras are the primary sensor for end-to-end driving, their performance suffers greatly when the conditions of the data they were trained on do not match the deployment environment, a problem known as the domain gap. In this work, we consider the day-night lighting domain gap. Instead of traditional cameras, we propose event cameras as a potential alternative that can maintain performance across lighting-condition domain gaps without requiring additional adjustments. Our results show that event cameras maintain more consistent performance across lighting conditions, exhibiting domain-shift penalties that are generally comparable to or smaller than those of grayscale frames, and provide superior baseline performance in cross-domain scenarios.
comment: Accepted to Australasian Conference on Robotics and Automation (ACRA), 2025
A simulation platform calibration method for automated vehicle evaluation: accurate on both vehicle level and traffic flow level
Simulation testing is a fundamental approach for evaluating automated vehicles (AVs). To ensure its reliability, it is crucial to accurately replicate interactions between AVs and background traffic, which necessitates effective calibration. However, existing calibration methods often fall short in achieving this goal. To address this gap, this study introduces a simulation platform calibration method that ensures high accuracy at both the vehicle and traffic flow levels. The method offers several key features: (1) calibration of vehicle-to-vehicle interactions; (2) accuracy assurance; (3) enhanced efficiency; (4) pipeline calibration capability. The proposed method is benchmarked against a baseline with no calibration and a state-of-the-art calibration method. Results show that it enhances the accuracy of interaction replication by 83.53% and boosts calibration efficiency by 76.75%. Furthermore, it maintains accuracy across both vehicle-level and traffic flow-level metrics, with an improvement of 51.9%. Notably, the entire calibration process is fully automated, requiring no human intervention.
A Task-Driven, Planner-in-the-Loop Computational Design Framework for Modular Manipulators
Modular manipulators composed of pre-manufactured and interchangeable modules offer high adaptability across diverse tasks. However, their deployment requires generating feasible motions while jointly optimizing morphology and mounted pose under kinematic, dynamic, and physical constraints. Moreover, traditional single-branch designs often extend reach by increasing link length, which can easily violate torque limits at the base joint. To address these challenges, we propose a unified task-driven computational framework that integrates trajectory planning across varying morphologies with the co-optimization of morphology and mounted pose. Within this framework, a hierarchical model predictive control (HMPC) strategy is developed to enable motion planning for both redundant and non-redundant manipulators. For design optimization, the CMA-ES is employed to efficiently explore a hybrid search space consisting of discrete morphology configurations and continuous mounted poses. Meanwhile, a virtual module abstraction is introduced to enable bi-branch morphologies, allowing an auxiliary branch to offload torque from the primary branch and extend the achievable workspace without increasing the capacity of individual joint modules. Extensive simulations and hardware experiments on polishing, drilling, and pick-and-place tasks demonstrate the effectiveness of the proposed framework. The results show that: 1) the framework can generate multiple feasible designs that satisfy kinematic and dynamic constraints while avoiding environmental collisions for given tasks; 2) flexible design objectives, such as maximizing manipulability, minimizing joint effort, or reducing the number of modules, can be achieved by customizing the cost functions; and 3) a bi-branch morphology capable of operating in a large workspace can be realized without requiring more powerful basic modules.
Driving in Corner Case: A Real-World Adversarial Closed-Loop Evaluation Platform for End-to-End Autonomous Driving
Safety-critical corner cases, difficult to collect in the real world, are crucial for evaluating end-to-end autonomous driving. Adversarial interaction is an effective method to generate such safety-critical corner cases. While existing adversarial evaluation methods are built for models operating in simplified simulation environments, adversarial evaluation for real-world end-to-end autonomous driving has been little explored. To address this challenge, we propose a closed-loop evaluation platform for end-to-end autonomous driving, which can generate adversarial interactions in real-world scenes. In our platform, the real-world image generator cooperates with an adversarial traffic policy to evaluate various end-to-end models trained on real-world data. The generator, based on flow matching, efficiently and stably generates real-world images according to the traffic environment information. The efficient adversarial surrounding vehicle policy is designed to model challenging interactions and create corner cases that current autonomous driving systems struggle to handle. Experimental results demonstrate that the platform can generate realistic driving images efficiently. By evaluating end-to-end models such as UniAD and VAD, we demonstrate that, driven by the adversarial policy, our platform reveals the performance degradation of the tested models in corner cases. This result indicates that the platform can effectively detect a model's potential issues, which will facilitate the safety and robustness of end-to-end autonomous driving.
DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
Biological systems can form complex three-dimensional structures through the collective behavior of identical agents -- cells that follow the same internal rules and communicate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an attention-based SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on the 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and rigid-body transformations. To enforce full SO(3) invariance -- invariant to rotations yet sensitive to reflections -- we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. This results in a bilevel problem, with the inner loop optimizing a unit quaternion for the best alignment and the outer loop updating the agent model. We compute gradients through the alignment step using implicit differentiation. We perform systematic benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of shapes -- from simple ellipsoids to complex morphologies -- using only minimal spatial cues.
Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making
We propose a new approach for solving planning problems with a hierarchical structure, fusing reinforcement learning and MPC planning. Our formulation tightly and elegantly couples the two planning paradigms. It leverages reinforcement learning actions to inform the MPPI sampler, and adaptively aggregates MPPI samples to inform the value estimation. The resulting adaptive process leverages further MPPI exploration where value estimates are uncertain, and improves training robustness and the overall resulting policies. This results in a robust planning approach that can handle complex planning problems and easily adapts to different applications, as demonstrated over several domains, including race driving, modified Acrobot, and Lunar Lander with added obstacles. Our results in these domains show better data efficiency and overall performance in terms of both rewards and task success, with up to a 72% increase in success rate compared to existing approaches, as well as accelerated convergence (2.1x) compared to non-adaptive sampling.
comment: 23 pages, 8 figures. Under review
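One way to couple the two planners, seeding the MPPI nominal control sequence with RL policy actions and reweighting sampled rollouts by cost, is sketched below on a toy double integrator; it omits the paper's adaptive aggregation of samples into the value estimate, and the RL sequence here is simply a zero placeholder.

```python
import numpy as np

def mppi_plan(x0, rl_action_seq, dynamics, cost, horizon=20, samples=256,
              sigma=0.5, lam=1.0, rng=None):
    """MPPI whose nominal control sequence is seeded by an RL policy
    (a minimal sketch of the coupling, not the paper's full algorithm)."""
    rng = rng or np.random.default_rng(0)
    nominal = np.array(rl_action_seq, dtype=float)             # (horizon,)
    noise = rng.normal(0.0, sigma, size=(samples, horizon))
    controls = nominal[None, :] + noise
    costs = np.empty(samples)
    for k in range(samples):
        x, c = np.array(x0, float), 0.0
        for t in range(horizon):
            x = dynamics(x, controls[k, t])
            c += cost(x, controls[k, t])
        costs[k] = c
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return nominal + (w[:, None] * noise).sum(axis=0)           # updated sequence

# toy double integrator driven toward position 1.0
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u])
cst = lambda x, u: (x[0] - 1.0) ** 2 + 0.01 * u ** 2
plan = mppi_plan(x0=[0.0, 0.0], rl_action_seq=np.zeros(20), dynamics=dyn, cost=cst)
print(plan[:5])
```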
Lang2Manip: A Tool for LLM-Based Symbolic-to-Geometric Planning for Manipulation
Simulation is essential for developing robotic manipulation systems, particularly for task and motion planning (TAMP), where symbolic reasoning interfaces with geometric, kinematic, and physics-based execution. Recent advances in Large Language Models (LLMs) enable robots to generate symbolic plans from natural language, yet executing these plans in simulation often requires robot-specific engineering or planner-dependent integration. In this work, we present a unified pipeline that connects an LLM-based symbolic planner with the Kautham motion planning framework to achieve generalizable, robot-agnostic symbolic-to-geometric manipulation. Kautham provides ROS-compatible support for a wide range of industrial manipulators and offers geometric, kinodynamic, physics-driven, and constraint-based motion planning under a single interface. Our system converts language instructions into symbolic actions and computes and executes collision-free trajectories using any of Kautham's planners without additional coding. The result is a flexible and scalable tool for language-driven TAMP that generalizes across robots, planning modalities, and manipulation tasks.
comment: Submitted to ICARA
Mr.MSTE: Multi-robot Multi-Source Term Estimation with Wind-Aware Coverage Control
This paper presents a Multi-Robot Multi-Source Term Estimation (MRMSTE) framework that enables teams of mobile robots to collaboratively sample gas concentrations and infer the parameters of an unknown number of airborne releases. The framework is built on a hybrid Bayesian inference scheme that represents the joint multi-source probability density and incorporates physics-informed state transitions, including source birth, removal, and merging induced by atmospheric dispersion. A superposition-based measurement model is naturally accommodated, allowing sparse concentration measurements to be exploited efficiently. To guide robot deployment, we introduce a wind-aware coverage control (WCC) strategy that integrates the evolving multi-source belief with local wind information to prioritize regions of high detection likelihood. Unlike conventional coverage control or information-theoretic planners, WCC explicitly accounts for anisotropic plume transport when modelling sensor performance, leading to more effective sensor placement for multi-source estimation. Monte Carlo studies demonstrate faster convergence and improved separation of individual source beliefs compared to traditional coverage-based strategies and small-scale static sensor networks. Real-world experiments with CO2 releases using TurtleBot platforms further validate the proposed approach, demonstrating its practicality for scalable multi-robot gas-sensing applications.
Dual-Channel Tomographic Tactile Skin with Pneumatic Pressure Sensing for Improved Force Estimation
Tactile skins based on Electrical Impedance Tomography (EIT) enable large-area contact localization with few electrodes, but suffer from nonuniform sensitivity that limits force estimation accuracy. This work introduces a dual-channel tactile skin that integrates an EIT layer with a pneumatic pressure layer and a calibration framework that leverages their complementary strengths. The EIT layer provides robust multi-contact localization, while the pneumatic pressure layer supplies a stable scalar measurement that serves as contact force estimation. A location-aware correction method is introduced, learning smooth spatial gain and offset fields from a single-session calibration, enabling spatially consistent multi-contact force estimation. The proposed system achieves accurate force estimation across diverse contact configurations, generalizes to varying indenter sizes, and preserves EIT's inherent advantages in multi-contact localization. By letting the pneumatic pressure layer handle the force estimation and using the EIT layer to determine where each contact occurs, the method avoids the need for large datasets, complicated calibration setups, and heavy machine-learning pipelines often required by previous EIT-only approaches. This dual-channel design provides a practical, scalable, and easy-to-calibrate solution for building large-area robotic skins.
comment: This work has been submitted to the IEEE for possible publication
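A minimal sketch of a location-aware correction, assuming repeated calibration presses on a discrete grid of known locations: fit a per-location gain and offset that map the EIT-derived estimate onto the pneumatic reference, then smooth the two fields spatially with a Gaussian kernel at query time. The fitting form, kernel length, and variable names are assumptions, not the paper's calibration procedure.

```python
import numpy as np

def fit_local_correction(locs, raw, ref):
    """Per-location linear fit ref ~= g * raw + b from calibration presses.
    locs: (N, 2) press locations, raw: (N,) EIT estimates, ref: (N,) pneumatic forces."""
    params = {}
    for key in set(map(tuple, locs)):
        m = np.all(locs == key, axis=1)
        A = np.stack([raw[m], np.ones(m.sum())], axis=1)
        g, b = np.linalg.lstsq(A, ref[m], rcond=None)[0]
        params[key] = (g, b)
    return params

def corrected_force(xy, raw_value, params, length=0.03):
    """Smooth spatial gain/offset fields via Gaussian-weighted averaging."""
    keys = np.array(list(params.keys()))
    gb = np.array(list(params.values()))
    w = np.exp(-np.sum((keys - np.asarray(xy)) ** 2, axis=1) / (2 * length ** 2))
    w /= w.sum()
    g, b = w @ gb
    return g * raw_value + b

# toy calibration at two locations, then a query in between
locs = np.array([[0.0, 0.0]] * 3 + [[0.05, 0.0]] * 3)
raw = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
ref = np.array([2.1, 4.0, 6.2, 1.0, 2.1, 2.9])
print(corrected_force([0.02, 0.0], 2.0, fit_local_correction(locs, raw, ref)))
```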
Unleashing Humanoid Reaching Potential via Real-world-Ready Skill Space
Humans possess a large reachable space in the 3D world, enabling interaction with objects at varying heights and distances. However, realizing such large-space reaching on humanoids is a complex whole-body control problem and requires the robot to master diverse skills simultaneously, including base positioning and reorientation, height and body posture adjustments, and end-effector pose control. Learning from scratch often leads to optimization difficulty and poor sim2real transferability. To address this challenge, we propose Real-world-Ready Skill Space (R2S2). Our approach begins with a carefully designed skill library consisting of real-world-ready primitive skills. We ensure optimal performance and robust sim2real transfer through individual skill tuning and sim2real evaluation. These skills are then ensembled into a unified latent space, serving as a structured prior that helps task execution in an efficient and sim2real transferable manner. A high-level planner, trained to sample skills from this space, enables the robot to accomplish real-world goal-reaching tasks. We demonstrate zero-shot sim2real transfer and validate R2S2 in multiple challenging goal-reaching scenarios.
TACOS: Task Agnostic COordinator of a multi-drone System
When a single pilot is responsible for managing a multi-drone system, the task may demand varying levels of autonomy, from direct control of individual UAVs, to group-level coordination, to fully autonomous swarm behaviors for accomplishing high-level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot workload by enabling high-level task delegation through intuitive, language-based interfaces. In this paper, we present TACOS (Task-Agnostic COordinator of a multi-drone System), a unified framework that enables high-level natural language control of multi-UAV systems through Large Language Models (LLMs). TACOS integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes the plans by interacting with the real world. TACOS allows an LLM to interact with a library of executable APIs, bridging semantic reasoning with real-time multi-robot coordination. We demonstrate the framework on a real-world multi-drone system, and conduct an ablation study to assess the contribution of each module.
comment: 8 pages, 9 figures, submitted to the IEEE for possible publication
VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps IROS
This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only names and areas of distinctive visual landmarks in an environment. While humans naturally localize themselves using such maps, translating this capability to robotic systems remains highly challenging due to the difficulty of establishing correspondences between observed landmarks and those in the map without geometric and appearance details. To address this challenge, VLG-Loc leverages a vision-language model (VLM) to search the robot's multi-directional image observations for the landmarks noted in the map. The method then identifies robot poses within a Monte Carlo localization framework, where the found landmarks are used to evaluate the likelihood of each pose hypothesis. Experimental validation in simulated and real-world retail environments demonstrates superior robustness compared to existing scan-based methods, particularly under environmental changes. Further improvements are achieved through the probabilistic fusion of visual and scan-based localization.
comment: v2: Updated the citation of SparseLoc from an arXiv preprint to its published version in the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations of a new task, these policies often overfit to the specific demonstrations--not only losing their prior abilities to solve a wide variety of generalist tasks but also failing to generalize within the new task itself. In this work, we aim to develop a method that preserves the generalization capabilities of the generalist policy during finetuning, allowing a single policy to robustly incorporate a new skill into its repertoire. Our goal is a single policy that both learns to generalize to variations of the new task and retains the broad competencies gained from pretraining. We show that this can be achieved through a simple yet effective strategy: interpolating the weights of a finetuned model with those of the pretrained model. We show, across extensive simulated and real-world experiments, that such model merging produces a single model that inherits the generalist abilities of the base model and learns to solve the new task robustly, outperforming both the pretrained and finetuned model on out-of-distribution variations of the new task. Moreover, we show that model merging performance scales with the amount of pretraining data, and enables continual acquisition of new skills in a lifelong learning setting, without sacrificing previously learned generalist abilities.
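The merging step itself is a simple interpolation of checkpoints; a minimal sketch with PyTorch state dicts is shown below (the checkpoint file names in the usage comment are hypothetical).

```python
import torch

def merge_policies(pretrained_sd, finetuned_sd, alpha=0.5):
    """Linear weight interpolation between pretrained and finetuned checkpoints.
    alpha = 0 recovers the pretrained policy, alpha = 1 the finetuned one."""
    merged = {}
    for k, w0 in pretrained_sd.items():
        w1 = finetuned_sd[k]
        # interpolate floating-point tensors; copy integer buffers unchanged
        merged[k] = torch.lerp(w0, w1, alpha) if w0.is_floating_point() else w1
    return merged

# usage (hypothetical file names):
# base = torch.load("generalist_policy.pt")
# ft = torch.load("finetuned_new_task.pt")
# policy.load_state_dict(merge_policies(base, ft, alpha=0.5))
```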
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. This field is exploding with new models and datasets, making it both exciting and challenging to keep pace with. This survey offers a clear and structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define the recent research frontier. Our main contribution is a detailed breakdown of the five biggest challenges: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment, all supported by the essential data infrastructure. For each of them, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our project page: https://suyuz1.github.io/Survery/
comment: project page: https://suyuz1.github.io/Survery/
Long-Horizon Visual Imitation Learning via Plan and Code Reflection
Learning from long-horizon demonstrations with complex action sequences presents significant challenges for visual imitation learning, particularly in understanding temporal relationships of actions and spatial relationships between objects. In this paper, we propose a new agent framework that incorporates two dedicated reflection modules to enhance both plan and code generation. The plan generation module produces an initial action sequence, which is then verified by the plan reflection module to ensure temporal coherence and spatial alignment with the demonstration video. The code generation module translates the plan into executable code, while the code reflection module verifies and refines the generated code to ensure correctness and consistency with the generated plan. These two reflection modules jointly enable the agent to detect and correct errors in both the plan generation and code generation, improving performance in tasks with intricate temporal and spatial dependencies. To support systematic evaluation, we introduce LongVILBench, a benchmark comprising 300 human demonstrations with action sequences of up to 18 steps. LongVILBench emphasizes temporal and spatial complexity across multiple task types. Experimental results demonstrate that existing methods perform poorly on this benchmark, whereas our new framework establishes a strong baseline for long-horizon visual imitation learning.
comment: 9 pages, 4 figures
Self-localization on a 3D map by fusing global and local features from a monocular camera
Self-localization on a 3D map using an inexpensive monocular camera is required to realize autonomous driving. Camera-based self-localization often uses a convolutional neural network (CNN), which extracts local features computed from nearby pixels. However, when dynamic obstacles such as people are present, CNNs do not work well. This study proposes a new method that combines a CNN with a Vision Transformer, which excels at extracting global features that capture the relationships among patches across the whole image. Experimental results showed that, compared to the state-of-the-art method (SOTA), the accuracy improvement rate on a CG dataset with dynamic obstacles is 1.5 times higher than that without dynamic obstacles. Moreover, the self-localization error of our method is 20.1% smaller than that of SOTA on public datasets. Additionally, a robot using our method can localize itself with 7.51 cm error on average, which is more accurate than SOTA.
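A generic sketch of such a hybrid, concatenating pooled CNN features with globally attended patch tokens for pose regression, is shown below; the layer sizes, token dimensions, and the 7-D pose head (translation plus quaternion) are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class HybridLocalizer(nn.Module):
    """Fuses local CNN features with global transformer features for pose
    regression (an illustrative sketch, not the paper's network)."""
    def __init__(self, dim=128, patch=16, heads=4, layers=2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))                            # local features
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.vit = nn.TransformerEncoder(enc, num_layers=layers)  # global features
        self.head = nn.Linear(64 + dim, 7)                      # xyz + quaternion

    def forward(self, x):
        local = self.cnn(x).flatten(1)                          # (B, 64)
        tokens = self.patchify(x).flatten(2).transpose(1, 2)    # (B, N, dim)
        global_feat = self.vit(tokens).mean(dim=1)              # (B, dim)
        return self.head(torch.cat([local, global_feat], dim=1))

pose = HybridLocalizer()(torch.randn(1, 3, 224, 224))           # (1, 7)
```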
"I am here for you": How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable
General-purpose conversational AI chatbots and AI companions increasingly provide young adolescents with emotionally supportive conversations, raising questions about how conversational style shapes anthropomorphism and emotional reliance. In a preregistered online experiment with 284 adolescent-parent dyads, youth aged 11-15 and their parents read two matched transcripts in which a chatbot responded to an everyday social problem using either a relational style (first-person, affiliative, commitment language) or a transparent style (explicit nonhumanness, informational tone). Adolescents preferred the relational style more often than the transparent style, whereas parents were more likely than adolescents to prefer the transparent style. Adolescents rated the relational chatbot as more human-like, likable, trustworthy and emotionally close, while perceiving both styles as similarly helpful. Adolescents who preferred the relational style had lower family and peer relationship quality and higher stress and anxiety than those preferring the transparent style or both chatbots. These findings identify conversational style as a key design lever for youth AI safety, showing that relational framing heightens anthropomorphism, trust and emotional closeness and can be especially appealing to socially and emotionally vulnerable adolescents, who may be at increased risk for emotional reliance on conversational AI.
comment: Title updated
Imperative Learning: A Self-supervised Neuro-Symbolic Learning Framework for Robot Autonomy
Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, labeling data for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neuro-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO) problem, which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning over logical rules, physical principles, geometric analysis, and related domain knowledge. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities, and we anticipate that it will catalyze further research across diverse domains.
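The bilevel structure can be pictured with a toy example: a neural module proposes the parameters of an inner problem, the inner problem is solved by unrolled gradient steps, and the outer loss is backpropagated through the inner solution. Everything below (the quadratic-plus-L1 inner objective and the synthetic data) is an assumption chosen for illustration, not the IL reasoning engine.

```python
# A generic bilevel-optimization sketch (not the paper's implementation):
# a neural module f_theta proposes parameters of an inner "reasoning" problem,
# the inner loop solves it by unrolled gradient steps, and the outer loss is
# backpropagated through the unrolled solution into the neural module.
import torch

f_theta = torch.nn.Linear(4, 2)            # stand-in neural module
opt = torch.optim.Adam(f_theta.parameters(), lr=1e-2)

def inner_solve(c, steps=20, lr=0.5):
    """Lower level: min_z ||z - c||^2 + 0.1*||z||_1, solved by unrolled GD."""
    z = torch.zeros_like(c)
    for _ in range(steps):
        g = 2 * (z - c) + 0.1 * torch.sign(z)
        z = z - lr * g                      # keep the graph so gradients flow back to c
    return z

x = torch.randn(8, 4)
target = torch.randn(8, 2)                  # a self-supervised signal in practice
for _ in range(100):
    c = f_theta(x)                          # upper level proposes inner-problem parameters
    z_star = inner_solve(c)                 # reciprocal learning via the inner solver
    loss = ((z_star - target) ** 2).mean()  # outer objective on the inner solution
    opt.zero_grad(); loss.backward(); opt.step()
```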
Provable Ordering and Continuity in Vision-Language Pretraining for Generalizable Embodied Agents NeurIPS 2025
Pre-training vision-language representations on human action videos has emerged as a promising approach to reduce reliance on large-scale expert demonstrations for training embodied agents. However, prior methods often employ time contrastive learning based on goal-reaching heuristics, progressively aligning language instructions from the initial to the final frame. This overemphasis on future frames can result in erroneous vision-language associations, as actions may terminate early or include irrelevant moments at the end. To address this issue, we propose Action Temporal Coherence Learning (AcTOL) to learn ordered and continuous vision-language representations without rigid goal-based constraints. AcTOL treats a video as a continuous trajectory and (1) contrasts semantic differences between frames to reflect their natural ordering, and (2) imposes a local Brownian bridge constraint to ensure smooth transitions across intermediate frames. Extensive imitation learning experiments on both simulated and real robots show that the pretrained features significantly enhance downstream manipulation tasks with high robustness to different linguistic styles of instructions, offering a viable pathway toward generalized embodied agents.
comment: NeurIPS 2025 Poster
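A rough sketch of the two ingredients named above: an ordering term on per-frame progress scores and a Brownian-bridge-style smoothness term on frame embeddings. The margin, weighting, and tensor shapes are assumptions, not the published AcTOL loss.

```python
# Rough sketch (assumptions, not the AcTOL loss): enforce (1) an ordering signal
# on per-frame progress scores and (2) a bridge-style smoothness penalty that
# keeps intermediate frame embeddings close to the interpolation of the endpoints.
import torch
import torch.nn.functional as F

def ordering_loss(progress):                     # progress: (T,) scalar per frame
    diffs = progress[1:] - progress[:-1]
    return F.relu(0.05 - diffs).mean()           # later frames should score higher

def bridge_loss(z):                              # z: (T, D) frame embeddings
    T = z.shape[0]
    t = torch.linspace(0, 1, T).unsqueeze(1)
    bridge_mean = (1 - t) * z[0] + t * z[-1]     # Brownian-bridge mean path
    return ((z - bridge_mean) ** 2).mean()

z = torch.randn(16, 128, requires_grad=True)
progress = torch.randn(16, requires_grad=True)
loss = ordering_loss(progress) + 0.1 * bridge_loss(z)
loss.backward()
```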
Transformer Driven Visual Servoing and Dual Arm Impedance Control for Fabric Texture Matching
In this paper, we propose a method to align and place a fabric piece on top of another using a dual-arm manipulator and a grayscale camera, so that their surface textures are accurately matched. We propose a novel control scheme that combines Transformer-driven visual servoing with dual-arm impedance control. This approach enables the system to simultaneously control the pose of the fabric piece and place it onto the underlying one while applying tension to keep the fabric piece flat. Our Transformer-based network incorporates pretrained backbones and a newly introduced Difference Extraction Attention Module (DEAM), which significantly enhances pose difference prediction accuracy. Trained entirely on synthetic images generated using rendering software, the network enables zero-shot deployment in real-world scenarios without requiring prior training on specific fabric textures. Real-world experiments demonstrate that the proposed system accurately aligns fabric pieces with different textures.
comment: 8 pages, 11 figures. Accepted to IEEE Robotics and Automation Letters (RA-L)
A Practical Guide for Incorporating Symmetry in Diffusion Policy NeurIPS 2025
Recently, equivariant neural networks for policy learning have shown promising improvements in sample efficiency and generalization; however, their wide adoption faces substantial barriers due to implementation complexity. Equivariant architectures typically require specialized mathematical formulations and custom network design, posing significant challenges when integrating with modern policy frameworks like diffusion-based models. In this paper, we explore a number of straightforward and practical approaches to incorporate symmetry benefits into diffusion policies without the overhead of full equivariant designs. Specifically, we investigate (i) invariant representations via relative trajectory actions and eye-in-hand perception, (ii) integrating equivariant vision encoders, and (iii) symmetric feature extraction with pretrained encoders using Frame Averaging. We first prove that combining eye-in-hand perception with relative or delta action parameterization yields inherent SE(3)-invariance, thus improving policy generalization. We then perform a systematic experimental study on these design choices for integrating symmetry in diffusion policies, and conclude that an invariant representation with equivariant feature extraction significantly improves policy performance. Our method achieves performance on par with or exceeding fully equivariant architectures while greatly simplifying implementation.
comment: NeurIPS 2025
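For intuition on item (iii), here is a toy frame-averaging example over the discrete C4 group of image rotations; the paper targets continuous SE(3)/SO(3) symmetries, so this discrete variant is only meant to show why averaging an arbitrary encoder over a group yields invariant features.

```python
# Illustrative sketch (assumed discrete-group variant): frame averaging over the
# C4 group of planar image rotations yields rotation-invariant features from an
# arbitrary encoder, without modifying the encoder itself.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())  # stand-in encoder

def frame_averaged_features(img: torch.Tensor) -> torch.Tensor:
    feats = [encoder(torch.rot90(img, k, dims=(-2, -1))) for k in range(4)]
    return torch.stack(feats, dim=0).mean(dim=0)   # invariant under 90-degree rotations

x = torch.randn(2, 3, 64, 64)
assert torch.allclose(frame_averaged_features(x),
                      frame_averaged_features(torch.rot90(x, 1, dims=(-2, -1))),
                      atol=1e-4)
```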
Accelerating Hybrid Model Predictive Control using Warm-Started Generalized Benders Decomposition
Hybrid model predictive control with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contacts with the environment. Due to combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose a hybrid MPC algorithm based on Generalized Benders Decomposition. The algorithm enumerates and stores cutting planes online inside a finite buffer and transfers them across MPC iterations to provide warm-starts for new problem instances, significantly enhancing solving speed. We theoretically analyze this warm-starting performance by modeling the deviation of mode sequences through temporal shifting and stretching, deriving bounds on the dual gap between transferred optimality cuts and the true optimal costs, and utilizing these bounds to quantify the level of suboptimality guaranteed in the first solve of the Benders Master Problem. Our algorithm is validated in simulation through controlling a cart-pole system with soft contact walls, a free-flying robot navigating around obstacles, and a humanoid robot standing on one leg while pushing against walls with its hands for balance. For our benchmark problems, the algorithm enumerates cuts on the order of only tens to hundreds while reaching speeds 2-3 times faster than the off-the-shelf solver Gurobi, oftentimes exceeding 1000 Hz. The code is available at https://github.com/XuanLin/Benders-MPC.
comment: Significantly revised to emphasize theoretical bounds. The heuristic Master Problem algorithm and GCS tightening experiments are preserved in v1
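The warm-starting idea can be pictured as a plain cut buffer: each Benders optimality cut is an affine lower bound on the master-problem cost, and transferred cuts give an immediate bound for a new MPC instance. The data layout and the two-cut example below are invented for illustration; the linked repository contains the actual implementation.

```python
# Conceptual sketch (not the released implementation): a finite buffer of Benders
# optimality cuts, each an affine lower bound cost >= a @ delta + b on the master
# variables delta (e.g., an encoding of the contact mode sequence), reused across
# MPC iterations to warm-start new instances.
import numpy as np
from collections import deque

class CutBuffer:
    def __init__(self, maxlen=200):
        self.cuts = deque(maxlen=maxlen)          # each cut: (a, b)

    def add(self, a, b):
        self.cuts.append((np.asarray(a, float), float(b)))

    def lower_bound(self, delta):
        """Warm-start bound: max over stored cuts at a candidate delta."""
        delta = np.asarray(delta, float)
        if not self.cuts:
            return -np.inf
        return max(a @ delta + b for a, b in self.cuts)

buf = CutBuffer()
buf.add(a=[1.0, -2.0], b=0.5)                     # cut from a previous MPC solve
buf.add(a=[0.0, 3.0], b=-1.0)
print(buf.lower_bound([1.0, 1.0]))                # best available warm-start bound
```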
Deadlock-Free Hybrid RL-MAPF Framework for Zero-Shot Multi-Robot Navigation
Multi-robot navigation in cluttered environments presents fundamental challenges in balancing reactive collision avoidance with long-range goal achievement. When navigating through narrow passages or confined spaces, deadlocks frequently emerge that prevent agents from reaching their destinations, particularly when Reinforcement Learning (RL) control policies encounter novel configurations outside the training distribution. Existing RL-based approaches suffer from limited generalization capability in unseen environments. We propose a hybrid framework that seamlessly integrates RL-based reactive navigation with on-demand Multi-Agent Path Finding (MAPF) to explicitly resolve topological deadlocks. Our approach integrates a safety layer that monitors agent progress to detect deadlocks and, when detected, triggers a coordination controller for the affected agents. The framework constructs globally feasible trajectories via MAPF and regulates waypoint progression to reduce inter-agent conflicts during navigation. Extensive evaluation on dense multi-agent benchmarks shows that our method boosts task completion from marginal to near-universal success, markedly reducing deadlocks and collisions. When integrated with hierarchical task planning, it enables coordinated navigation for heterogeneous robots, demonstrating that coupling reactive RL navigation with selective MAPF intervention yields robust, zero-shot performance.
comment: 18 pages (including appendix), 3 figures. Project page (videos, animations, additional resources): https://wanghaoyi518.github.io/rl-mapf-project-page/
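A minimal sketch of the progress-monitoring safety layer described above, assuming a simple distance-to-goal stall test over a sliding window; the window length, threshold, and interface are hypothetical, and the MAPF call is represented only by a print statement.

```python
# Simplified sketch (assumed interface): a safety layer that flags a deadlock when
# an agent's distance-to-goal fails to improve over a sliding window, which would
# then trigger a MAPF coordination episode for the affected agents.
import numpy as np
from collections import deque

class DeadlockMonitor:
    def __init__(self, window=50, min_progress=0.05):
        self.window, self.min_progress = window, min_progress
        self.history = {}                              # agent_id -> recent distances

    def update(self, agent_id, pos, goal):
        d = float(np.linalg.norm(np.asarray(goal) - np.asarray(pos)))
        hist = self.history.setdefault(agent_id, deque(maxlen=self.window))
        hist.append(d)
        stalled = (len(hist) == self.window and
                   hist[0] - hist[-1] < self.min_progress)
        return stalled                                 # True -> escalate to MAPF

monitor = DeadlockMonitor()
for t in range(60):
    if monitor.update("robot_0", pos=[1.0, 1.0], goal=[5.0, 5.0]):
        print(f"deadlock detected at step {t}; invoking MAPF for robot_0")
        break
```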
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
Our work starts from the observation that it is a misconception to think that humans learn fast: learning takes time. Babies begin learning to move in the confined fluid environment of the womb, children are limited by still-developing bodies, and even adults are not admitted to complex competitions right away. With robots learning from scratch, however, we rarely have the privilege of waiting tens of millions of steps. A "swaddling" regularization restrains the agent during rapid but unstable development by penalizing action strength in a specific way that does not affect the actions directly. Symphony, the Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of ideas that makes it possible to train humanoid robots from scratch with sample efficiency, sample proximity, and safety of actions in mind. Continuously increasing Gaussian noise without appropriate smoothing is harmful to motors and gearboxes. Compared to stochastic algorithms, we set a limited parametric noise and promote reduced action strength, safely increasing entropy because the actions are effectively immersed in the weaker noise; when more extreme values are required, the actions rise above it. Training becomes empirically much safer for both the surrounding environment and the robot's mechanisms. We also use a Fading Replay Buffer: a fixed formula based on the hyperbolic tangent adjusts the batch sampling probability so that the memory contains a recent portion and a long-term memory trail. The Fading Replay Buffer allows us to use a Temporal Advantage, in which we improve the current critic prediction relative to its exponential moving average. The Temporal Advantage lets us update the actor and critic in a single pass, combine the actor and critic in one object, and implement their losses in one line.
comment: https://github.com/SuspensionRailway/symphony
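The abstract does not give the exact tanh formula, so the following is only one plausible shape for a fading sampling distribution: recent transitions are sampled far more often, while older memory keeps a small but nonzero probability trail.

```python
# Hedged sketch (the exact formula is not given in the abstract): one plausible
# tanh-shaped weighting that samples recent transitions more often while keeping
# a fading probability trail over older memory.
import numpy as np

def fading_sampling_probs(buffer_size: int, sharpness: float = 4.0) -> np.ndarray:
    age = np.linspace(1.0, 0.0, buffer_size)              # 1 = oldest, 0 = newest
    weights = 1.0 - np.tanh(sharpness * age)               # recent >> old, but old > 0
    weights += 1e-3                                         # keep a long-term trail
    return weights / weights.sum()

probs = fading_sampling_probs(10_000)
batch_idx = np.random.choice(10_000, size=256, p=probs)    # indices for a minibatch
```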
Multiagent Systems
Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
We introduce Stackelberg Learning from Human Feedback (SLHF), a new framework for preference optimization. SLHF frames the alignment problem as a sequential-move game between two policies: a Leader, which commits to an action, and a Follower, which responds conditionally on the Leader's action. This approach decomposes preference optimization into a refinement problem for the Follower and an optimization problem against an adversary for the Leader. Unlike Reinforcement Learning from Human Feedback (RLHF), which assigns scalar rewards to actions, or Nash Learning from Human Feedback (NLHF), which seeks a simultaneous-move equilibrium, SLHF leverages the asymmetry of sequential play to capture richer preference structures. The sequential design of SLHF naturally enables inference-time refinement, as the Follower learns to improve the Leader's actions, and these refinements can be leveraged through iterative sampling. We compare the solution concepts of SLHF, RLHF, and NLHF, and lay out key advantages in consistency, data sensitivity, and robustness to intransitive preferences. Experiments on large language models demonstrate that SLHF achieves strong alignment across diverse preference datasets, scales from 0.5B to 8B parameters, and yields inference-time refinements that transfer across model families without further fine-tuning.
comment: 10 pages, 5 tables, 1 figure
Don't Guess, Escalate: Towards Explainable Uncertainty-Calibrated AI Forensic Agents
AI is reshaping the landscape of multimedia forensics. We propose AI forensic agents: reliable orchestrators that select and combine forensic detectors, identify provenance and context, and provide uncertainty-aware assessments. We highlight pitfalls in current solutions and introduce a unified framework to improve the authenticity verification process.
A Formal Modular Synthesis Approach for the Coordination of 3-D Robotic Construction with Multi-robots
In this paper, we address the problem of coordinating multiple robots to build 3-D structures. The problem involves a set of mobile robots that interact with each other in order to autonomously build a predefined 3-D structure. Our approach is based on Supervisory Control Theory and allows us to synthesize, from models representing a single robot and the target structure, a correct-by-construction reactive controller called a supervisor. When this supervisor is replicated across the other robots, the target structure can be completed by all robots.
NDRL: Cotton Irrigation and Nitrogen Application with Nested Dual-Agent Reinforcement Learning ICONIP 2025
Effective irrigation and nitrogen fertilization have a significant impact on crop yield. However, existing research faces two limitations: (1) the high complexity of optimizing water-nitrogen combinations during crop growth and poor yield optimization results; and (2) the difficulty in quantifying mild stress signals and the delayed feedback, which results in less precise dynamic regulation of water and nitrogen and lower resource utilization efficiency. To address these issues, we propose a Nested Dual-Agent Reinforcement Learning (NDRL) method. The parent agent in NDRL identifies promising macroscopic irrigation and fertilization actions based on projected cumulative yield benefits, reducing ineffective exploration while maintaining alignment between objectives and yield. The child agent's reward function incorporates a quantified Water Stress Factor (WSF) and Nitrogen Stress Factor (NSF), and uses a mixed probability distribution to dynamically optimize daily strategies, thereby enhancing both yield and resource efficiency. We used field experiment data from 2023 and 2024 to calibrate and validate the Decision Support System for Agrotechnology Transfer (DSSAT) to simulate real-world conditions and interact with NDRL. Experimental results demonstrate that, compared to the best baseline, the simulated yield increased by 4.7% in both 2023 and 2024, the irrigation water productivity increased by 5.6% and 5.1%, respectively, and the nitrogen partial factor productivity increased by 6.3% and 1.0%, respectively. Our method advances the development of cotton irrigation and nitrogen fertilization, providing new ideas for addressing the complexity and precision issues in agricultural resource management and for sustainable agricultural development.
comment: Accepted by ICONIP 2025
AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding
Recent multimodal large language models (MLLMs) such as GPT-4o and Qwen3-Omni show strong perception but struggle in multi-speaker, dialogue-centric settings that demand agentic reasoning: tracking who speaks, maintaining roles, and grounding events across time. These scenarios are central to multimodal audio-video understanding, where models must jointly reason over audio and visual streams in applications such as conversational video assistants and meeting analytics. We introduce AMUSE, a benchmark designed around tasks that are inherently agentic, requiring models to decompose complex audio-visual interactions into planning, grounding, and reflection steps. It evaluates MLLMs across three modes (zero-shot, guided, and agentic) and six task families, including spatio-temporal speaker grounding and multimodal dialogue summarization. Across all modes, current models exhibit weak multi-speaker reasoning and inconsistent behavior under both non-agentic and agentic evaluation. Motivated by the inherently agentic nature of these tasks and recent advances in LLM agents, we propose RAFT, a data-efficient agentic alignment framework that integrates reward optimization with intrinsic multimodal self-evaluation as the reward and selective parameter adaptation for data- and parameter-efficient updates. Using RAFT, we achieve up to a 39.52% relative improvement in accuracy on our benchmark. Together, AMUSE and RAFT provide a practical platform for examining agentic reasoning in multimodal models and improving their capabilities.
Ev-Trust: A Strategy Equilibrium Trust Mechanism for Evolutionary Games in LLM-Based Multi-Agent Services
The rapid evolution of the Web toward an agent-centric paradigm, driven by large language models (LLMs), has enabled autonomous agents to reason, plan, and interact in complex decentralized environments. However, the openness and heterogeneity of LLM-based multi-agent systems also amplify the risks of deception, fraud, and misinformation, posing severe challenges to trust establishment and system robustness. To address this issue, we propose Ev-Trust, a strategy-equilibrium trust mechanism grounded in evolutionary game theory. This mechanism integrates direct trust, indirect trust, and expected revenue into a dynamic feedback structure that guides agents' behavioral evolution toward equilibria. Within a decentralized "Request-Response-Payment-Evaluation" service framework, Ev-Trust enables agents to adaptively adjust strategies, naturally excluding malicious participants while reinforcing high-quality collaboration. Furthermore, our theoretical derivation based on replicator dynamics equations proves the existence and stability of local evolutionary equilibria. Experimental results indicate that our approach effectively reflects agent trustworthiness in LLM-driven open service interaction scenarios, reduces malicious strategies, and increases collective revenue. We hope Ev-Trust can provide a new perspective on trust modeling for the agentic service web in group evolutionary game scenarios.
comment: 12 pages, 11 figures
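For reference, replicator dynamics in their standard form are shown below; the paper's specific payoff and trust terms are not reproduced here, so this is only the generic equation underlying the stability argument.

```latex
\dot{x}_i = x_i\left(f_i(\mathbf{x}) - \bar{f}(\mathbf{x})\right),
\qquad \bar{f}(\mathbf{x}) = \sum_{j} x_j\, f_j(\mathbf{x}),
```

where $x_i$ is the population share of strategy $i$ and $f_i$ its expected payoff (here, expected revenue) under the current population state $\mathbf{x}$.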
Evaluation of Generative Models for Emotional 3D Animation Generation in VR
Social interactions incorporate nonverbal signals to convey emotions alongside speech, including facial expressions and body gestures. Generative models have demonstrated promising results in creating full-body nonverbal animations synchronized with speech; however, evaluations using statistical metrics in 2D settings fail to fully capture user-perceived emotions, limiting our understanding of model effectiveness. To address this, we evaluate emotional 3D animation generative models within a Virtual Reality (VR) environment, emphasizing user-centric metrics (emotional arousal realism, naturalness, enjoyment, diversity, and interaction quality) in a real-time human-agent interaction scenario. Through a user study (N=48), we examine perceived emotional quality for three state-of-the-art speech-driven 3D animation methods across two emotions: happiness (high arousal) and neutral (mid arousal). Additionally, we compare these generative models against real human expressions obtained via a reconstruction-based method to assess both their strengths and limitations and how closely they replicate real human facial and body expressions. Our results demonstrate that methods explicitly modeling emotions lead to higher recognition accuracy compared to those focusing solely on speech-driven synchrony. Users rated the realism and naturalness of happy animations significantly higher than those of neutral animations, highlighting the limitations of current generative models in handling subtle emotional states. Generative models underperformed compared to reconstruction-based methods in facial expression quality, and all methods received relatively low ratings for animation enjoyment and interaction quality, emphasizing the importance of incorporating user-centric evaluations into generative model development. Finally, participants positively recognized animation diversity across all generative models.
comment: 20 pages, 5 figures. Webpage: https://emotional3dhumans.github.io/
DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations
Biological systems can form complex three-dimensional structures through the collective behavior of identical agents -- cells that follow the same internal rules and communicate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an attention-based SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on the 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and rigid-body transformations. To enforce full SO(3) invariance -- invariance to rotations yet sensitivity to reflections -- we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. This results in a bilevel problem, with the inner loop optimizing a unit quaternion for the best alignment and the outer loop updating the agent model. We compute gradients through the alignment step using implicit differentiation. We perform systematic benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of shapes -- from simple ellipsoids to complex morphologies -- using only minimal spatial cues.
On the Role of Contextual Information and Ego States in LLM Agent Behavior for Transactional Analysis Dialogues ACL
LLM-powered agents are now used in many areas, from customer support to education, and there is increasing interest in their ability to act more like humans. This includes fields such as social, political, and psychological research, where the goal is to model group dynamics and social behavior. However, current LLM agents often lack the psychological depth and consistency needed to capture the real patterns of human thinking. They usually provide direct or statistically likely answers, but they miss the deeper goals, emotional conflicts, and motivations that drive real human interactions. This paper proposes a Multi-Agent System (MAS) inspired by Transactional Analysis (TA) theory. In the proposed system, each agent is divided into three ego states - Parent, Adult, and Child. The ego states are treated as separate knowledge structures with their own perspectives and reasoning styles. To enrich their response process, they have access to an information retrieval mechanism that allows them to retrieve relevant contextual information from their vector stores. This architecture is evaluated through ablation tests in a simulated dialogue scenario, comparing agents with and without information retrieval. The results are promising and open up new directions for exploring how psychologically grounded structures can enrich agent behavior. The contribution is an agent architecture that integrates Transactional Analysis theory with contextual information retrieval to enhance the realism of LLM-based multi-agent simulations.
comment: Presented at the 39th Pacific Asia Conference on Language, Information and Computation (PACLIC 39)
PAACE: A Plan-Aware Automated Agent Context Engineering Framework
Large Language Model (LLM) agents are increasingly deployed in complex, multi-step workflows involving planning, tool use, reflection, and interaction with external knowledge systems. These workflows generate rapidly expanding contexts that must be curated, transformed, and compressed to maintain fidelity, avoid attention dilution, and reduce inference cost. Prior work on summarization and query-aware compression largely ignores the multi-step, plan-aware nature of agentic reasoning. In this work, we introduce PAACE (Plan-Aware Automated Context Engineering), a unified framework for optimizing the evolving state of LLM agents through next-k-task relevance modeling, plan-structure analysis, instruction co-refinement, and function-preserving compression. PAACE comprises (1) PAACE-Syn, a large-scale generator of synthetic agent workflows annotated with stepwise compression supervision, and (2) PAACE-FT, a family of distilled, plan-aware compressors trained from successful teacher demonstrations. Experiments on long-horizon benchmarks (AppWorld, OfficeBench, and 8-Objective QA) demonstrate that PAACE consistently improves agent correctness while substantially reducing context load. On AppWorld, PAACE achieves higher accuracy than all baselines while lowering peak context and cumulative dependency. On OfficeBench and multi-hop QA, PAACE improves both accuracy and F1, achieving fewer steps, lower peak tokens, and reduced attention dependency. Distilled PAACE-FT retains 97 percent of the teacher's performance while reducing inference cost by over an order of magnitude, enabling practical deployment of plan-aware compression with compact models.
Computing ESS in Multiplayer Games
We present an algorithm for computing all evolutionarily stable strategies in nondegenerate normal-form games with three or more players.
comment: Full title is "Computing Evolutionarily Stable Strategies in Multiplayer Games." I am modifying it because Google scholar is merging it with a different paper. I hope to revert the title back in a few weeks
Towards Pervasive Distributed Agentic Generative AI -- A State of The Art
The rapid advancement of intelligent agents and Large Language Models (LLMs) is reshaping the pervasive computing field. Their ability to perceive, reason, and act through natural language understanding enables autonomous problem-solving in complex pervasive environments, including the management of heterogeneous sensors, devices, and data. This survey outlines the architectural components of LLM agents (profiling, memory, planning, and action) and examines their deployment and evaluation across various scenarios. It then reviews computational and infrastructural advancements (cloud to edge) in pervasive computing and how AI is advancing in this field. It highlights state-of-the-art agent deployment strategies and applications, including local and distributed execution on resource-constrained devices. The survey identifies key challenges of these agents in pervasive computing, such as architectural, energy, and privacy limitations. Finally, it proposes what we call "Agent as a Tool", a conceptual framework for pervasive agentic AI that emphasizes context awareness, modularity, security, efficiency, and effectiveness.
Breaking the Performance Ceiling in Reinforcement Learning requires Inference Strategies
Reinforcement learning (RL) systems have countless applications, from energy-grid management to protein design. However, such real-world scenarios are often extremely difficult, combinatorial in nature, and require complex coordination between multiple agents. This level of complexity can cause even state-of-the-art RL systems, trained until convergence, to hit a performance ceiling which they are unable to break out of with zero-shot inference. Meanwhile, many digital or simulation-based applications allow for an inference phase that utilizes a specific time and compute budget to explore multiple attempts before outputting a final solution. In this work, we show that such an inference phase employed at execution time, and the choice of a corresponding inference strategy, are key to breaking the performance ceiling observed in complex multi-agent RL problems. Our main result is striking: we can obtain up to a 126% and, on average, a 45% improvement over the previous state-of-the-art across 17 tasks, using only a couple of seconds of extra wall-clock time during execution. We also demonstrate promising compute scaling properties, supported by over 60k experiments, making it the largest study on inference strategies for complex RL to date. Our experimental data and code are available at https://sites.google.com/view/inference-strategies-rl.
comment: NeurIPS '25 version (post-conference)
TACOS: Task Agnostic COordinator of a multi-drone System
When a single pilot is responsible for managing a multi-drone system, the task may demand varying levels of autonomy, from direct control of individual UAVs, to group-level coordination, to fully autonomous swarm behaviors for accomplishing high-level tasks. Enabling such flexible interaction requires a framework that supports multiple modes of shared autonomy. As language models continue to improve in reasoning and planning, they provide a natural foundation for such systems, reducing pilot workload by enabling high-level task delegation through intuitive, language-based interfaces. In this paper we present TACOS (Task-Agnostic COordinator of a multi-drone System), a unified framework that enables high-level natural language control of multi-UAV systems through Large Language Models (LLMs). TACOS integrates three key capabilities into a single architecture: a one-to-many natural language interface for intuitive user interaction, an intelligent coordinator for translating user intent into structured task plans, and an autonomous agent that executes plans by interacting with the real world. TACOS allows an LLM to interact with a library of executable APIs, bridging semantic reasoning with real-time multi-robot coordination. We demonstrate TACOS on a real-world multi-drone system and conduct an ablation study to assess the contribution of each module.
comment: 8 pages, 9 figures, submitted to the IEEE for possible publication
The Illusion of Rationality: Tacit Bias and Strategic Dominance in Frontier LLM Negotiation Games
Large language models (LLMs) are increasingly being deployed as autonomous agents on behalf of institutions and individuals in economic, political, and social settings that involve negotiation. Yet this trend carries significant risks if their strategic behavior is not well understood. In this work, we revisit the NegotiationArena framework and run controlled simulation experiments on a diverse set of frontier LLMs across three multi-turn bargaining games: Buyer-Seller, Multi-turn Ultimatum, and Resource Exchange. We ask whether improved general reasoning capabilities lead to rational, unbiased, and convergent negotiation strategies. Our results challenge this assumption. We find that models diverge into distinct, model-specific strategic equilibria rather than converging to a unified optimal behavior. Moreover, strong numerical and semantic anchoring effects persist: initial offers are highly predictive of final agreements, and models consistently generate biased proposals by collapsing diverse internal valuations into rigid, generic price points. More concerningly, we observe dominance patterns in which some models systematically achieve higher payoffs than their counterparts. These findings underscore an urgent need to develop mechanisms to mitigate these issues before deploying such systems in real-world scenarios.
comment: 9 pages, 5 figures
Upgrading Democracies with Fairer Voting Methods
Voting methods are instrumental design elements of democracies. Citizens use them to express and aggregate their preferences to reach a collective decision. However, voting outcomes can be as sensitive to voting rules as they are to people's voting choices. Despite significance and interdisciplinary scientific progress, several democracies keep relying on outdated voting methods that do not fit modern, pluralistic societies well, while lacking social innovation. Here, we demonstrate how one can upgrade real-world democracies, namely by using alternative preferential voting methods such as cumulative voting and the method of equal shares designed for a proportional representation of voters' preferences. We rigorously evaluate the striking voting outcomes of these fair voting methods in a new participatory budgeting approach applied in the city of Aarau, Switzerland, including past and follow-up evidence. Results show more winning projects with the same budget. They also show broader geographic and preference representation of citizens by the elected projects, in particular for voters who used to be under-represented. We provide causal evidence showing that citizens prefer proportional voting methods, which possess strong legitimacy without the need of very specialized technical explanations. We also reveal strong underlying democratic values exhibited by citizens who support fair voting methods such as altruism and compromise. These findings come with the momentum to unleash a new and long-awaited participation blueprint of how to upgrade democracies globally.
comment: Includes Supplementary Information
Systems and Control (EESS)
Observer-based Differentially Private Consensus for Linear Multi-agent Systems
This paper investigates the differentially private consensus problem for general linear multi-agent systems (MASs) based on output feedback protocols. To protect the output information, which is considered private data and may be at high risk of exposure, Laplace noise is added to the information exchange. The conditions for achieving mean square and almost sure consensus in observer-based MASs are established using the backstepping method and the convergence theory for nonnegative almost supermartingales. It is shown that the separation principle remains valid for the consensus problem of linear MASs with decaying Laplace noise. Furthermore, the convergence rate is provided. Then, a joint design framework is developed for state estimation gain, feedback control gain, and noise to ensure the preservation of ε-differential privacy. The output information of each agent is shown to be protected at every time step. Finally, sufficient conditions are established for simultaneously achieving consensus and preserving differential privacy for linear MASs utilizing both full-order and reduced-order observers. Meanwhile, an ε*-differentially private consensus is achieved to meet the desired privacy level. Two simulation examples are provided to validate the theoretical results.
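A toy numerical sketch of the noise-injection idea, using scalar agents on a complete graph and geometrically decaying Laplace noise; the paper's protocol is observer-based with output feedback, so this is only meant to illustrate why decaying noise still permits consensus.

```python
# Toy sketch (scalar single-integrator agents, not the paper's observer-based
# protocol): each agent perturbs the value it shares with Laplace noise whose
# scale decays geometrically, and runs a standard averaging consensus update.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)                       # agent states
A = np.ones((5, 5)) / 5.0                    # doubly-stochastic weights (complete graph)
b0, q, step = 1.0, 0.9, 0.2                  # initial noise scale, decay factor, step size

for k in range(200):
    noise = rng.laplace(scale=b0 * q**k, size=5)
    shared = x + noise                       # privacy-protected messages
    x = x + step * (A @ shared - x)          # consensus update on noisy information

print(x)                                     # states cluster near a common value
```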
Learning-based Approximate Model Predictive Control for an Impact Wrench Tool
Learning-based model predictive control has emerged as a powerful approach for handling complex dynamics in mechatronic systems, enabling data-driven performance improvements while respecting safety constraints. However, when computational resources are severely limited, as in battery-powered tools with embedded processors, existing approaches struggle to meet real-time requirements. In this paper, we address the problem of real-time torque control for impact wrenches, where high-frequency control updates are necessary to accurately track the fast transients occurring during periodic impact events, while maintaining high-performance safety-critical control that mitigates harmful vibrations and component wear. The key novelty of the approach is that we combine data-driven model augmentation through Gaussian process regression with neural network approximation of the resulting control policy. This insight allows us to deploy predictive control on resource-constrained embedded platforms while maintaining both constraint satisfaction and microsecond-level inference times. The proposed framework is evaluated through numerical simulations and hardware experiments on a custom impact wrench testbed. The results show that our approach successfully achieves real-time control suitable for high-frequency operation while maintaining constraint satisfaction and improving tracking accuracy compared to baseline PID control.
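A schematic of the two ingredients named above: residual learning with a Gaussian process and distillation of an expensive controller into a small network. The toy data, the stand-in slow_controller, and the network size are assumptions chosen only to keep the sketch self-contained.

```python
# Schematic sketch (assumed toy dynamics and a stand-in "slow" controller):
# (1) fit a GP to the residual between nominal and measured dynamics,
# (2) distill the resulting predictive controller into a small neural network
#     cheap enough for an embedded target.
import numpy as np
import torch
from sklearn.gaussian_process import GaussianProcessRegressor

# (1) GP residual model: measured next state minus nominal model prediction.
X = np.random.randn(200, 2)                      # (state, input) samples
residual = 0.1 * np.sin(X[:, 0])                 # unknown dynamics component (toy)
gp = GaussianProcessRegressor().fit(X, residual)
mu = gp.predict(X[:3])                           # residual prediction for new samples

# (2) Policy distillation: imitate an expensive controller with a tiny MLP.
def slow_controller(state):                      # placeholder for GP-augmented MPC
    return -0.5 * state
states = torch.randn(1000, 1)
targets = slow_controller(states)
policy = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                             torch.nn.Linear(16, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for _ in range(500):
    loss = ((policy(states) - targets) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```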
Resilience of coupled systems under deep uncertainty and dynamic complexity: An integrative literature review
Resilience in coupled systems is increasingly critical in addressing global challenges such as climate change and pandemics. These systems show unpredictable behaviour due to dynamic complexity and deep uncertainty across spatiotemporal scales. Despite growing interest, few studies systematically integrate both concepts when assessing resilience. This paper conducts an integrative review of 102 English-language publications to identify gaps in current approaches. Findings reveal that most papers address lower levels of uncertainty and rarely consider dynamic complexity and deep uncertainty simultaneously, which limits the effectiveness of resilience strategies. To advance systems research, we propose a conceptual framework and practical tools to support researchers and decision-makers in evaluating and improving resilience. The paper also outlines future research directions for more robust, adaptive, and integrative resilience assessments.
A Formal Modular Synthesis Approach for the Coordination of 3-D Robotic Construction with Multi-robots
In this paper, we address the problem of coordinating multiple robots to build 3-D structures. The problem involves a set of mobile robots that interact with each other in order to autonomously build a predefined 3-D structure. Our approach is based on Supervisory Control Theory and allows us to synthesize, from models representing a single robot and the target structure, a correct-by-construction reactive controller called a supervisor. When this supervisor is replicated across the other robots, the target structure can be completed by all robots.
From Liability to Asset: A Three-Mode Grid-Forming Control Framework for Centralized Data Center UPS Systems
AI workloads are turning large data centers into highly dynamic power-electronic loads; fault-time behavior and workload pulsing can stress weak-grid points of interconnection. This paper proposes a centralized medium-voltage (MV) uninterruptible power supply (UPS) control architecture implemented as three operating modes: Mode 1 regulates a stiff DC bus and shapes normal-operation grid draw, Mode 2 enforces current-limited fault-mode P-Q priority with UPS battery energy storage system (UPS-BESS) buffering and a rate-limited post-fault "soft return," and Mode 3 optionally provides droop-based fast frequency response via grid-draw modulation. Fundamental-frequency averaged dq simulations (50 MW block, short-circuit ratio (SCR) = 1.5, 0.5 p.u. three-phase dip for 150 ms) show zero unserved information-technology (IT) energy (0.00000 MWh vs. 0.00208 MWh for a momentary-cessation benchmark), a 0.57 p.u. peak inverter current (vs. 1.02 p.u. for a synchronous-reference-frame phase-locked loop (SRF-PLL) low-voltage ride-through (LVRT) baseline), a nonzero mean fault-window grid draw of 0.20 p.u. (vs. approximately 0 for momentary cessation), and an improved settled point-of-common-coupling (PCC) voltage minimum of 0.79 p.u. after one cycle (vs. 0.66 p.u.). A forced-oscillation case study applies a 1 Hz pulsed load (+/- 0.25 p.u.) and shows that the normal-operation shaping filters the oscillation seen by the grid while the UPS-BESS buffers the pulsing component.
Machine Learning-based Optimal Control for Colloidal Self-Assembly
Achieving precise control of colloidal self-assembly into specific patterns remains a longstanding challenge due to the complex process dynamics. Recently, machine learning-based state representation and reinforcement learning-based control strategies have started to gain popularity in the field, showing great potential for an automatable and generalizable approach to producing patterned colloidal assembly. In this work, we adopted a machine learning-based optimal control framework, combining unsupervised learning and a graph convolutional neural network for state observation with deep reinforcement learning-based optimal control policy calculation, to provide a data-driven control approach that can potentially be generalized to other many-body self-assembly systems. With Brownian dynamics simulations, we demonstrated its superior performance compared to a traditional order parameter-based state description, and its efficacy in obtaining ordered 2-dimensional spherical colloidal self-assembly in an electric field-mediated system with an actual success rate of 97%.
comment: 19 pages, 5 figures, 1 table
Economic versus energetic model predictive control of a cold production plant with thermal energy storage
Economic model predictive control has been proposed as a means of solving the unit loading and unit allocation problem in multi-chiller cooling plants. The adjective "economic" stems from the use of the financial cost of electricity consumption over a time horizon as the loss function minimized at each sampling period. The energetic approach is rarely encountered. This article presents, for the first time, a comparison between the energetic optimization objective and the economic one. The comparison is made on a cooling plant using air-cooled water chillers and a cold storage system. The developed models have been integrated into Simscape, and non-convex mixed optimization methods are used to obtain optimal control trajectories for the energetic and economic goals considered separately. The results over several scenarios, and in different seasons, support consideration of the energetic approach despite the current prevalence of the economic one. The results depend on the electric season and the available tariffs. In particular, for the high electric season and a representative tariff, the results show that energy consumption increases by about 2.15% when the economic approach is used instead of the energetic one, while a cost reduction of 2.94% is achieved.
comment: 14 pages
Optimal chiller loading including transients
Scheduling and loading of chillers in a multi-chiller plant is considered. A new framework is introduced considering an extended set of independent variables for the optimization problem of energy consumption. In this way the number of decision variables is increased, providing extra degrees of freedom to optimize cooling plant operation. The dynamic effects due to transients arising from switching on and off of units are usually not considered in the literature dealing with Optimal Chiller Loading/Sequencing which is restricted to the static case. In this paper, these effects are treated in a way that results in a manageable optimization problem. A Simultaneous Perturbation Stochastic Approximation solution is deployed for the problem and the proposed method is compared with a similar but static approach showing the benefits in terms of reduced energy consumption.
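Since the paper relies on Simultaneous Perturbation Stochastic Approximation, a generic SPSA loop is sketched below with a placeholder energy model; the gain schedules follow common defaults and the objective is invented for illustration.

```python
# Minimal SPSA loop (generic, assumed objective): the gradient of the plant energy
# model is approximated from two perturbed evaluations per iteration, which keeps
# the extended decision space of the chiller-loading problem tractable.
import numpy as np

def spsa_minimize(f, theta, iters=200, a=0.1, c=0.1, seed=0):
    rng = np.random.default_rng(seed)
    for k in range(1, iters + 1):
        ak, ck = a / k**0.602, c / k**0.101          # commonly used gain schedules
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        ghat = (f(theta + ck * delta) - f(theta - ck * delta)) / (2 * ck * delta)
        theta = theta - ak * ghat
    return theta

energy = lambda u: np.sum((u - 0.3) ** 2)            # placeholder plant energy model
u_opt = spsa_minimize(energy, theta=np.zeros(4))
print(u_opt)                                         # approaches 0.3 loading per unit
```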
Contraction Analysis of Filippov Solutions in Multi-Modal Piecewise Smooth Systems
This paper provides conditions to ensure contractive behavior of Filippov solutions generated by multi-modal piecewise smooth (PWS) systems. These conditions are instrumental in analyzing the asymptotic behavior of PWS systems, such as convergence towards an equilibrium point or a limit cycle. The work is motivated by a known principle for contraction analysis of bimodal PWS systems which ensures that the flow dynamics of each mode and the sliding dynamics on the switching manifold are contracting. This approach is extended first to PWS systems with multiple non-intersecting switching manifolds in $\mathbb{R}^n$, and then to two intersecting switching manifolds in $\mathbb{R}^2$. Numerical examples are provided to validate the theoretical findings, along with a discussion on extensions to more general intersecting switching manifolds in $\mathbb{R}^n$.
Using Seminorms To Analyze Contraction of Switched Systems With Only Non-Contracting Modes
This paper investigates contraction properties of switched dynamical systems for the case that all modes are non-contracting, thereby extending existing results that require at least one mode to be contracting. Leveraging the property that unstable systems may still exhibit stable behavior within certain subspaces, conditions are provided which ensure contracting evolution within a given subspace of the state space of the switched system. These conditions are derived using the concepts of seminorms and semi-contracting systems. Then, by selecting a set of subspaces whose corresponding seminorms form a separating family of the state space, and by verifying whether a given mode is contracting in each subspace, conditions on the activation time of each mode are provided by which contraction on the complete state space is guaranteed. Numerical examples are presented for illustration.
Closed Loop Reference Optimization for Extrusion Additive Manufacturing
Various defects occur during material extrusion additive manufacturing processes that degrade the quality of the 3D printed parts and lead to significant material waste. This motivates feedback control of the extrusion process to mitigate defects and prevent print failure. We propose a linear quadratic regulator (LQR) for closed-loop control with force feedback to provide accurate width tracking of the extruded filament. Furthermore, we propose preemptive optimization of the reference force given to the LQR that accounts for the performance of the LQR and generates the optimal reference for the closed loop extrusion dynamics and machine constraints. Simulation results demonstrate the improved tracking performance and response time. Experiments on a Fused Filament Fabrication 3D printer showcase a root mean square error improvement of 39.57% compared to tracking the unmodified reference as well as an 83.7% shorter settling time.
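For background, a discrete-time LQR gain can be obtained by iterating the Riccati recursion, as in the minimal sketch below; the two-state toy model and the weights are assumptions, not the identified extrusion plant used in the paper.

```python
# Background sketch (not the paper's plant model): a discrete-time LQR gain
# computed by iterating the Riccati recursion, as would be used to track the
# optimized width/force reference in closed loop.
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

A = np.array([[1.0, 0.1], [0.0, 0.9]])    # assumed two-state toy extrusion dynamics
B = np.array([[0.0], [0.1]])
K = dlqr(A, B, Q=np.eye(2), R=np.array([[0.01]]))
u = -K @ np.array([0.2, 0.0])             # state feedback on the tracking error
```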
Black-Start Power Capacity Sizing and Control Strategy for an Islanded DFIG Wind-to-Hydrogen System
This paper proposes a black-start method for an off-grid wind-to-hydrogen (W2H) system comprising a wind farm based on Doubly-Fed Induction Generators (DFIGs), proton exchange membrane fuel cells (PEMFCs) serving as the black-start power source, and a hydrogen production industry. The PEMFC is installed within the hydrogen industry to facilitate direct access to hydrogen fuel. Based on the microgrid topology and black-start scheme, this study innovatively sizes the rated capacity of the PEMFC through power flow analysis. The capacity must be sufficient to charge passive components such as transmission lines and transformers, provide rotor excitation, and supply wind turbine (WT) and electrolyzer (ELZ) auxiliaries during startup. The proposed system integrates wind-hydrogen coordinated control (WHCC) and hydrogen-storage coordinated control (HSCC). Under maximum power point tracking (MPPT) of the WTs, the ELZ follows power fluctuations to absorb wind output, ensuring stable voltage and frequency. Fixed-frequency control applied to either the DFIG or PEMFC converters enables DFIGs to retain conventional grid-following (GFL) operation, reducing converter development costs. For both control modes, this paper establishes the black-start sequence and formulates a comprehensive coordinated control strategy for the entire system. The entire control system is validated through simulations in MATLAB/Simulink. Results confirm that the calculated PEMFC capacity supports reliable black-start, while the black-start control strategy ensures smooth system self-startup. Furthermore, the coordinated control strategy maintains stable frequency and voltage under fluctuating wind power, demonstrating the practicality and robustness of the proposed approach.
Consensus tracking of perturbed open multi-agent systems with repelling antagonistic interactions
An open multi-agent system (OMAS) comprises migrating agents which produce a flexible network structure that is naturally switching and size-varying. Meanwhile, agent migrations also make an OMAS more prone to environmental adversities. In this work, we deal with the consensus tracking problem of OMASs suffering these migration-induced adversities, including non-vanishing perturbations in the agent dynamics/state and the repelling antagonistic interactions among agents, over an intermittently disconnected signed digraph. The OMAS is interpreted into a perturbed $M^3D$ system in which unstable subsystems are created when repelling interactions dominate the normal cooperative ones in the OMAS network regardless of its connectivity. To handle the destabilizing effects brought by the repelling interaction as well as the non-vanishing perturbations, we extend the stability theory for $M^3D$ systems and apply it to the OMAS to show that practical consensus tracking can be achieved if the migration-induced switching satisfies the piecewise average dwell time and activation time ratio constraints. Particularly, we indicate that for vanishing perturbations and repelling interactions, asymptotic tracking can be expected under weaker switching constraints.
Synchronization, Identification, and Signal Detection for Underwater Photon-Counting Communications With Input-Dependent Shot Noise
Photon counting (PhC) is an effective detection technology for underwater optical wireless communication (OWC) systems. The presence of signal-dependent Poisson shot noise and asynchronous multi-user interference (MUI) complicates the processing of received data signals, hindering effective signal detection in PhC OWC systems. This paper proposes a novel iterative signal detection method for grant-free, multi-user, underwater PhC OWC systems with signal-dependent Poisson shot noise. We first introduce a new synchronization algorithm with a unique frame structure design. The algorithm performs active user identification and transmission delay estimation. Specifically, the estimation is performed first on a user-group basis and then at the individual-user level, with reduced complexity and latency. We also develop a nonlinear iterative multi-user detection (MUD) algorithm that utilizes a detection window for each user to identify interfering symbols and estimate MUI on a slot-by-slot basis, followed by maximum a-posteriori probability detection of user signals. Simulations demonstrate that our scheme achieves bit error rates comparable to scenarios in which transmission delays are known and signal detection is perfectly synchronized.
Perception-Limited Smooth Safety Filtering
This paper develops a smooth safety-filtering framework for nonlinear control-affine systems under limited perception. Classical Control Barrier Function (CBF) filters assume global availability of the safety function - its value and gradient must be known everywhere - an assumption incompatible with sensing-limited settings, and the resulting filters often exhibit nonsmooth switching when constraints activate. We propose two complementary perception-aware safety filters applicable to general control-invariant safety sets. The first introduces a smooth perception gate that modulates barrier constraints based on sensing range, yielding a closed-form Lipschitz-safe controller with forward-invariance guarantees. The second replaces the hard CBF constraint with a differentiable penalty term, leading to a smooth unconstrained optimization-based safety filter consistent with CBF principles. For both designs, we establish existence, uniqueness, and forward invariance of the closed-loop trajectories. Numerical results demonstrate that the proposed smooth filters enable the synthesis of higher-order tracking controllers for systems such as drones and second-order ground robots, offering substantially smoother and more robust safety-critical behaviors than classical CBF-based filters.
comment: 8 pages
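A compact sketch of a closed-form CBF safety filter with one plausible smooth perception gate; the sigmoid gate, the single-integrator example, and the way the gate scales the barrier term are assumptions for illustration, not the paper's construction.

```python
# Schematic sketch (single constraint, assumed dynamics): a closed-form CBF
# safety filter, with a smooth perception gate in [0, 1] that fades the
# constraint in as the obstacle enters the sensing range R.
import numpy as np

def smooth_gate(distance, R, width=0.5):
    return 1.0 / (1.0 + np.exp((distance - R) / width))   # ~1 inside range, ~0 outside

def cbf_filter(u_nom, h, grad_h, f, g, alpha=1.0, gate=1.0):
    Lfh, Lgh = grad_h @ f, grad_h @ g
    violation = -(Lfh + Lgh @ u_nom + alpha * gate * h)
    if violation <= 0.0 or np.allclose(Lgh, 0.0):
        return u_nom                                       # nominal input already safe
    return u_nom + (violation / (Lgh @ Lgh)) * Lgh         # minimal-norm correction

# Single integrator avoiding a disc of radius 1 around the origin.
x = np.array([1.5, 0.2])
h = x @ x - 1.0                                            # safety function
u_safe = cbf_filter(u_nom=np.array([-1.0, 0.0]), h=h, grad_h=2 * x,
                    f=np.zeros(2), g=np.eye(2),
                    gate=smooth_gate(np.linalg.norm(x) - 1.0, R=2.0))
```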
Security Risks of Agentic Vehicles: A Systematic Analysis of Cognitive and Cross-Layer Threats
Agentic AI is increasingly being explored and introduced in both manually driven and autonomous vehicles, leading to the notion of Agentic Vehicles (AgVs), with capabilities such as memory-based personalization, goal interpretation, strategic reasoning, and tool-mediated assistance. While frameworks such as the OWASP Agentic AI Security Risks highlight vulnerabilities in reasoning-driven AI systems, they are not designed for safety-critical cyber-physical platforms such as vehicles, nor do they account for interactions with other layers such as perception, communication, and control layers. This paper investigates security threats in AgVs, including OWASP-style risks and cyber-attacks from other layers affecting the agentic layer. By introducing a role-based architecture for agentic vehicles, consisting of a Personal Agent and a Driving Strategy Agent, we will investigate vulnerabilities in both agentic AI layer and cross-layer risks, including risks originating from upstream layers (e.g., perception layer, control layer, etc.). A severity matrix and attack-chain analysis illustrate how small distortions can escalate into misaligned or unsafe behavior in both human-driven and autonomous vehicles. The resulting framework provides the first structured foundation for analyzing security risks of agentic AI in both current and emerging vehicle platforms.
Verifiable Natural Language to Linear Temporal Logic Translation: A Benchmark Dataset and Evaluation Suite
Empirical evaluation of state-of-the-art natural-language (NL) to temporal-logic (TL) translation systems reveals near-perfect performance on existing benchmarks. However, current studies measure only the accuracy of the translation of NL logic into formal TL, ignoring a system's capacity to ground atomic propositions into new scenarios or environments. This is a critical feature, necessary for the verification of the resulting formulas in a concrete state space. Consequently, most NL-to-TL translation frameworks propose their own bespoke dataset in which the correct grounding is known a priori, inflating performance metrics and neglecting the need for extensible, domain-general systems. In this paper, we introduce the Verifiable Linear Temporal Logic Benchmark (VLTL-Bench), a unifying benchmark that measures verification and verifiability of automated NL-to-LTL translation. The dataset consists of four unique state spaces and thousands of diverse natural language specifications and corresponding formal specifications in temporal logic. Moreover, the benchmark contains sample traces to validate the temporal logic expressions. While the benchmark directly supports end-to-end evaluation, we observe that many frameworks decompose the process into i) lifting, ii) grounding, iii) translation, and iv) verification. The benchmark provides ground truths after each of these steps to enable researchers to improve and evaluate different substeps of the overall problem. To encourage methodologically sound advances in verifiable NL-to-LTL translation approaches, we release VLTL-Bench here: https://www.kaggle.com/datasets/dubascudes/vltl bench.
Improving Action Smoothness for a Cascaded Online Learning Flight Control System
This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
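A minimal sketch of the two tools mentioned, assuming a generic first-order low-pass filter applied to a noisy action sequence and an FFT to inspect the resulting spectrum; the filter coefficient and test signal are arbitrary.

```python
# Simple sketch (generic first-order filter, not the paper's exact design): smooth
# a policy's action sequence with an exponential low-pass filter and inspect the
# resulting spectrum with an FFT, mirroring the frequency-domain analysis used.
import numpy as np

def low_pass(actions, alpha=0.2):
    out, prev = [], actions[0]
    for a in actions:
        prev = alpha * a + (1 - alpha) * prev   # first-order IIR smoothing
        out.append(prev)
    return np.array(out)

t = np.linspace(0, 1, 500)
raw = np.sin(2 * np.pi * 2 * t) + 0.3 * np.random.randn(500)   # noisy commands
smooth = low_pass(raw)
spectrum = np.abs(np.fft.rfft(smooth))          # check attenuation of high frequencies
```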
Hill-Type Stability Analysis of Periodic Solutions of Fractional-Order Differential Equations
This paper explores stability properties of periodic solutions of (nonlinear) fractional-order differential equations (FODEs). As classical Caputo-type FODEs do not admit exactly periodic solutions, we propose a framework of Liouville-Weyl-type FODEs, which do admit exactly periodic solutions and are an extension of Caputo-type FODEs. Local linearization around a periodic solution results in perturbation dynamics governed by a linear time-periodic differential equation. In the classical integer-order case, the perturbation dynamics is therefore described by Floquet theory, i.e. the exponential growth or decay of perturbations is expressed by Floquet exponents which can be assessed using the Hill matrix approach. For fractional-order systems, however, a rigorous Floquet theory is lacking. Here, we explore the limitations when trying to extend Floquet theory and the Hill matrix method to linear time-periodic fractional-order differential equations (LTP-FODEs) as local linearization of nonlinear fractional-order systems. A key result of the paper is that such an extended Floquet theory can only assess exponentially growing solutions of LTP-FODEs. Moreover, we provide an analysis of linear time-invariant fractional-order systems (LTI-FODEs) with algebraically decaying solutions and show that the inaccessibility of decaying solutions through Floquet theory is already present in the time-invariant case.
comment: 22 pages, 6 figures
Feedback stabilization of some fourth-order nonlinear parabolic equations with saturated controls
In this work, we analyze the internal and boundary stabilization of the Cahn-Hilliard and Kuramoto-Sivashinsky equations under saturated feedback control. We conduct our study through the spectral analysis of the associated linear operator. We identify a finite number of eigenvalues related to the unstable part of the system and then design a stabilization strategy based on modal decomposition, linear matrix inequalities (LMIs), and geometric conditions on the saturation function. Local exponential stabilization in $H^{2}$ is established.
Hull Clustering with Blended Representative Periods for Energy System Optimization Models
The growing integration of renewable energy sources into power systems requires planning models to account for not only demand variability but also fluctuations in renewable availability during operational periods. Capturing this temporal detail over long planning horizons can be computationally demanding or even intractable. A common approach to address this challenge is to approximate the problem using a reduced set of selected time periods, known as representative periods (RPs). However, using too few RPs can significantly degrade solution quality. In this paper, we propose the method of hull clustering with blended RPs to enhance traditional clustering-based RP approaches in two key ways. First, instead of selecting typical cluster centers (e.g., centroids or medoids) as RPs, our method is based on extreme points, which are more likely to be constraint-binding. Second, it represents base periods as weighted combinations of RPs (e.g., convex or conic blends), approximating the full time horizon more accurately and with fewer RPs. Through two case studies based on data from the European network operators, we demonstrate that hull clustering with blended RPs outperforms traditional RP techniques in both regret and computational efficiency.
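The following Python sketch illustrates the two ingredients described above under simplifying assumptions: representative periods are taken as the convex-hull vertices (extreme points) of synthetic per-period feature vectors, and each base period is then expressed as a non-negative (conic) blend of those RPs via non-negative least squares. The feature summary, data, and blend type are illustrative choices, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)

# Hypothetical data: 52 weekly base periods, each summarized by a small
# feature vector (e.g., mean demand, mean wind, mean solar availability).
periods = rng.random((52, 3))

# 1) Hull clustering: choose representative periods among the extreme points
#    (convex-hull vertices) rather than cluster centroids or medoids.
hull = ConvexHull(periods)
rp_idx = hull.vertices                # indices of candidate RPs
rps = periods[rp_idx]                 # representative periods

# 2) Blended representation: express each base period as a non-negative
#    (conic) combination of the RPs via non-negative least squares.
weights = np.zeros((len(periods), len(rps)))
for i, p in enumerate(periods):
    w, _ = nnls(rps.T, p)             # minimize ||rps.T @ w - p||, w >= 0
    weights[i] = w

reconstruction = weights @ rps
print("RPs selected:", len(rps))
print("mean blend error:", np.abs(reconstruction - periods).mean())
```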
Historical Information Accelerates Decentralized Optimization: A Proximal Bundle Method
Historical information, such as past function values or gradients, has significant potential to enhance decentralized optimization methods for two key reasons: first, it provides richer information about the objective function, which also explains its established success in centralized optimization; second, unlike the second-order derivative or its alternatives, historical information has already been computed or communicated and requires no additional cost to acquire. Despite this potential, it remains underexploited. In this work, we employ a proximal bundle framework to incorporate the function values and gradients at historical iterates and adapt the framework to the proximal decentralized gradient descent method, resulting in a Decentralized Proximal Bundle Method (DPBM). To broaden its applicability, we further extend DPBM to the asynchronous and stochastic setting. We theoretically analyze the convergence of the proposed methods. Notably, both the asynchronous DPBM and its stochastic variant can converge with fixed step-sizes that are independent of delays, an advantage over the delay-dependent step-sizes required by most existing asynchronous optimization methods, since delay-independent step-sizes are easier to determine and often lead to faster convergence. Numerical experiments on classification problems demonstrate that, by using historical information, our methods yield faster convergence and stronger robustness to the choice of step-size.
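To make the bundle idea concrete, here is a minimal single-agent sketch (not the decentralized DPBM algorithm): historical (iterate, value, subgradient) triples define a cutting-plane model, and each step solves the proximal subproblem over that model. The toy objective, the crude serious-step rule, and the use of cvxpy for the subproblem are assumptions for illustration only.

```python
import numpy as np
import cvxpy as cp

def f_and_grad(x):
    """Toy convex objective: f(x) = ||x||_1 + 0.5 ||x - c||^2 (c assumed)."""
    c = np.array([3.0, -2.0])
    g = np.sign(x) + (x - c)                  # a subgradient of f at x
    return np.abs(x).sum() + 0.5 * np.sum((x - c) ** 2), g

def proximal_bundle(x0, iters=30, t=1.0):
    """Serial proximal bundle method using historical (value, gradient) pairs."""
    center = np.asarray(x0, dtype=float)
    center_f, center_g = f_and_grad(center)
    bundle = [(center.copy(), center_f, center_g)]    # historical information
    for _ in range(iters):
        x = cp.Variable(center.shape[0])
        # Cutting-plane model built from all historical iterates, plus prox term.
        cuts = [fy + gy @ (x - y) for y, fy, gy in bundle]
        prob = cp.Problem(cp.Minimize(cp.max(cp.hstack(cuts))
                                      + cp.sum_squares(x - center) / (2 * t)))
        prob.solve()
        cand = np.asarray(x.value).ravel()
        f_cand, g_cand = f_and_grad(cand)
        bundle.append((cand, f_cand, g_cand))
        if f_cand < center_f:                          # serious step: move the center
            center, center_f = cand, f_cand
    return center

print(proximal_bundle(np.zeros(2)))                    # approaches [2, -1] for this toy f
```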
Iterative Joint Detection of Kalman Filter and Channel Decoder for Sensor-to-Controller Link in Wireless Networked Control Systems
In this letter, we propose an iterative joint detection algorithm combining a Kalman filter (KF) and a channel decoder for the sensor-to-controller link of wireless networked control systems, which utilizes prior information from the control system to improve control and communication performance. In this algorithm, we first use the KF to estimate the probability density of the control system outputs and calculate the prior probability of the received signals to assist the decoder. Then, the possible outputs of the control system are traversed to update the prior probability in order to implement iterative detection. The simulation results show that the prior information and the iterative structure reduce the block error rate of the communication link while improving the root-mean-square error of the control system.
comment: This letter has been accepted by IEEE WCL
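A hedged sketch of the core idea, under toy assumptions: the KF's predictive Gaussian over a scalar sensor output is integrated over the cells of an assumed uniform quantizer to obtain per-bit prior probabilities, which are converted to log-likelihood ratios that a channel decoder could consume. The quantizer, bit mapping, and LLR sign convention are illustrative and not taken from the letter.

```python
import numpy as np
from scipy.stats import norm

# Assumed toy setup: a scalar sensor reading is uniformly quantized to B bits
# and the bits are sent over the sensor-to-controller link.
B = 4
LEVELS = np.linspace(-2.0, 2.0, 2 ** B)        # quantizer reproduction levels

def bits_of(level_index, n_bits=B):
    """LSB-first bit pattern of a quantizer level index."""
    return [(level_index >> k) & 1 for k in range(n_bits)]

def prior_llrs(pred_mean, pred_std):
    """Per-bit prior log-likelihood ratios derived from the KF predictive
    density N(pred_mean, pred_std^2) of the sensor output."""
    # Probability mass of each quantizer level under the predictive density.
    edges = np.concatenate(([-np.inf], (LEVELS[:-1] + LEVELS[1:]) / 2, [np.inf]))
    cell_prob = norm.cdf(edges[1:], pred_mean, pred_std) - \
                norm.cdf(edges[:-1], pred_mean, pred_std)
    llrs = []
    for k in range(B):
        bit_is_1 = np.array([bits_of(i)[k] for i in range(2 ** B)], dtype=bool)
        p1 = cell_prob[bit_is_1].sum()
        p0 = cell_prob[~bit_is_1].sum()
        llrs.append(np.log((p0 + 1e-12) / (p1 + 1e-12)))   # LLR = log P(b=0)/P(b=1)
    return np.array(llrs)

# Example: the KF predicts the next sensor output near 0.5 with std 0.3.
print(prior_llrs(0.5, 0.3))
```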
Optimizing the Network Topology of a Linear Reservoir Computer
Machine learning has become a fundamental approach for modeling, prediction, and control, enabling systems to learn from data and perform complex tasks. Reservoir computing is a machine learning tool that leverages high-dimensional dynamical systems to efficiently process temporal data for prediction and observation tasks. Traditionally, the connectivity of the network that underlies a reservoir computer (RC) is generated randomly, lacking a principled design. Here, we focus on optimizing the connectivity of a linear RC to improve its performance and interpretability, which we achieve by decoupling the RC dynamics into a number of independent modes. We then proceed to optimize each one of these modes to perform a given task, which corresponds to selecting an optimal RC connectivity in terms of a given set of eigenvalues of the RC adjacency matrix. Simulations on networks of varying sizes show that the optimized RC significantly outperforms randomly constructed reservoirs in both training and testing phases and often surpasses nonlinear reservoirs of comparable size. This approach provides both practical performance advantages and theoretical guidelines for designing efficient, task-specific, and analytically transparent RC architectures.
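The sketch below illustrates, under assumed parameters, why a linear RC is governed by the eigenvalues of its adjacency matrix: the reservoir is simulated, its independent modes are exposed by an eigendecomposition, and a linear readout is fitted by ridge regression on a toy memory task. It does not reproduce the paper's mode-wise optimization procedure.

```python
import numpy as np

rng = np.random.default_rng(1)

N, T = 50, 2000                       # reservoir size, sequence length (assumed)
u = rng.standard_normal(T)            # scalar input signal
target = np.convolve(u, np.exp(-0.1 * np.arange(30)), mode="full")[:T]  # memory task

# Linear reservoir x_{t+1} = A x_t + b u_t. The eigenvalues of A fully
# determine the dynamics of the decoupled modes z = V^{-1} x.
A = rng.standard_normal((N, N))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # scale spectral radius to 0.9
b = rng.standard_normal(N)

evals, V = np.linalg.eig(A)
print("slowest-decaying mode |lambda|:", np.max(np.abs(evals)))

# Run the reservoir and train a linear readout by ridge regression.
X = np.zeros((T, N))
x = np.zeros(N)
for t in range(T - 1):
    x = A @ x + b * u[t]
    X[t + 1] = x
ridge = 1e-6
W = np.linalg.solve(X.T @ X + ridge * np.eye(N), X.T @ target)
pred = X @ W
print("training NRMSE:", np.sqrt(np.mean((pred - target) ** 2)) / np.std(target))
```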
Data-driven uncertainty-aware seakeeping prediction of the Delft 372 catamaran using ensemble Hankel dynamic mode decomposition
In this study, we present and validate an ensemble-based Hankel Dynamic Mode Decomposition with control (HDMDc) for uncertainty-aware seakeeping predictions of a high-speed catamaran, namely the Delft 372 model. Experimental measurements (time histories) of wave elevation at the longitudinal center of gravity, heave, pitch, notional flight-deck velocity, notional bridge acceleration, and total resistance were collected from irregular wave basin tests on a 1:33.3 scale replica of the Delft 372 model under sea state 5 conditions at Fr = 0.425, and organized into training, validation, and test sets. The HDMDc algorithm constructs an equation-free linear reduced-order model of the seakeeping vessel by augmenting states and inputs with their time-lagged copies to capture nonlinear and memory effects. Two ensembling strategies are compared for seakeeping prediction and uncertainty quantification: Bayesian HDMDc (BHDMDc), which treats hyperparameters as stochastic variables with prior distributions to produce posterior mean forecasts with confidence intervals, and Frequentist HDMDc (FHDMDc), which aggregates multiple models obtained over data subsets. The FHDMDc approach is found to improve the accuracy of the predictions compared to the deterministic counterpart, while also providing robust uncertainty estimation, whereas the application of BHDMDc to the present test case is not found beneficial in comparison to the deterministic model. FHDMDc-derived probability density functions for the motions closely match both experimental data and URANS results, demonstrating reliable and computationally efficient seakeeping prediction for design and operational support.
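A minimal, deterministic HDMDc sketch (no ensembling or Bayesian treatment) on synthetic stand-in data: states and inputs are delay-embedded into Hankel blocks and a linear model x_{k+1} = A x_k + B u_k is fitted by least squares. Dimensions, delay depth, and the toy signals are assumptions for illustration.

```python
import numpy as np

def hankel(series, delays):
    """Stack a (T, m) time series into delay-embedded columns of depth `delays`."""
    T, m = series.shape
    cols = T - delays + 1
    return np.vstack([series[i:i + cols].T for i in range(delays)])  # (delays*m, cols)

def hdmdc_fit(states, inputs, delays=10):
    """Least-squares fit of x_{k+1} = A x_k + B u_k on Hankel-embedded data."""
    X = hankel(states, delays)        # augmented state snapshots
    U = hankel(inputs, delays)        # augmented input snapshots
    X0, X1, U0 = X[:, :-1], X[:, 1:], U[:, :-1]
    AB = X1 @ np.linalg.pinv(np.vstack([X0, U0]))
    n = X.shape[0]
    return AB[:, :n], AB[:, n:]       # A, B

# Toy example: heave/pitch-like outputs driven by a wave-elevation-like input.
rng = np.random.default_rng(2)
t = np.arange(0, 60, 0.1)
wave = np.sin(0.8 * t) + 0.3 * rng.standard_normal(len(t))
resp = np.column_stack([np.sin(0.8 * t - 0.5), 0.5 * np.sin(0.8 * t - 1.0)])

A, B = hdmdc_fit(resp, wave[:, None], delays=10)
X, U = hankel(resp, 10), hankel(wave[:, None], 10)
one_step = A @ X[:, :-1] + B @ U[:, :-1]
print("A:", A.shape, " B:", B.shape,
      " one-step error:", np.abs(one_step - X[:, 1:]).mean())
```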
A Review of Software for Designing and Operating Quantum Networks
Quantum network protocol development is crucial to realizing a production-grade network that can support distributed sensing, secure communication, and utility-scale quantum computation. However, the transition from laboratory demonstration to deployable networks requires software implementations of architectures and protocols tailored to the unique constraints of quantum systems. This paper reviews the current state of software implementations for quantum networks, organized around the three-plane abstraction of infrastructure, logical, and control/service planes. We cover software for both designing quantum network protocols (e.g., SeQUeNCe, QuISP, and NetSquid) and operating them, with a focus on essential control/service plane functions such as entanglement, topology, and resource management, in a proposed taxonomy. Our review highlights a persistent gap between theoretical protocol proposals and their realization in simulators or testbeds, particularly in dynamic topology and network management. We conclude by outlining open challenges and proposing a roadmap for developing scalable software architectures to enable hybrid, large-scale quantum networks.
comment: 12 pages, 5 figures, 3 tables, journal
Accelerating Hybrid Model Predictive Control using Warm-Started Generalized Benders Decomposition
Hybrid model predictive control with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contacts with the environment. Due to combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose a hybrid MPC algorithm based on Generalized Benders Decomposition. The algorithm enumerates and stores cutting planes online inside a finite buffer and transfers them across MPC iterations to provide warm-starts for new problem instances, significantly enhancing solving speed. We theoretically analyze this warm-starting performance by modeling the deviation of mode sequences through temporal shifting and stretching, deriving bounds on the dual gap between transferred optimality cuts and the true optimal costs, and utilizing these bounds to quantify the level of suboptimality guaranteed in the first solve of the Benders Master Problem. Our algorithm is validated in simulation through controlling a cart-pole system with soft contact walls, a free-flying robot navigating around obstacles, and a humanoid robot standing on one leg while pushing against walls with its hands for balance. For our benchmark problems, the algorithm enumerates cuts on the order of only tens to hundreds while reaching speeds 2-3 times faster than the off-the-shelf solver Gurobi, oftentimes exceeding 1000 Hz. The code is available at https://github.com/XuanLin/Benders-MPC.
comment: Significantly revised to emphasize theoretical bounds. The heuristic Master Problem algorithm and GCS tightening experiments are preserved in v1
Robotics
mimic-video: Video-Action Models for Generalizable Robot Control Beyond VLAs
Prevailing Vision-Language-Action Models (VLAs) for robotic manipulation are built upon vision-language backbones pretrained on large-scale, but disconnected static web data. As a result, despite improved semantic generalization, the policy must implicitly infer complex physical dynamics and temporal dependencies solely from robot trajectories. This reliance creates an unsustainable data burden, necessitating continuous, large-scale expert data collection to compensate for the lack of innate physical understanding. We contend that while vision-language pretraining effectively captures semantic priors, it remains blind to physical causality. A more effective paradigm leverages video to jointly capture semantics and visual dynamics during pretraining, thereby isolating the remaining task of low-level control. To this end, we introduce mimic-video, a novel Video-Action Model (VAM) that pairs a pretrained Internet-scale video model with a flow matching-based action decoder conditioned on its latent representations. The decoder serves as an Inverse Dynamics Model (IDM), generating low-level robot actions from the latent representation of video-space action plans. Our extensive evaluation shows that our approach achieves state-of-the-art performance on simulated and real-world robotic manipulation tasks, improving sample efficiency by 10x and convergence speed by 2x compared to traditional VLA architectures.
An Open Toolkit for Underwater Field Robotics
Underwater robotics is becoming increasingly important for marine science, environmental monitoring, and subsea industrial operations, yet the development of underwater manipulation and actuation systems remains restricted by high costs, proprietary designs, and limited access to modular, research-oriented hardware. While open-source initiatives have democratized vehicle construction and control software, a substantial gap persists for joint-actuated systems-particularly those requiring waterproof, feedback-enabled actuation suitable for manipulators, grippers, and bioinspired devices. As a result, many research groups face lengthy development cycles, limited reproducibility, and difficulty transitioning laboratory prototypes to field-ready platforms. To address this gap, we introduce an open, cost-effective hardware and software toolkit for underwater manipulation research. The toolkit includes a depth-rated Underwater Robotic Joint (URJ) with early leakage detection, compact control and power management electronics, and a ROS2-based software stack for sensing and multi-mode actuation. All CAD models, fabrication files, PCB sources, firmware, and ROS2 packages are openly released, enabling local manufacturing, modification, and community-driven improvement. The toolkit has undergone extensive laboratory testing and multiple field deployments, demonstrating reliable operation up to 40 m depth across diverse applications, including a 3-DoF underwater manipulator, a tendon-driven soft gripper, and an underactuated sediment sampler. These results validate the robustness, versatility, and reusability of the toolkit for real marine environments. By providing a fully open, field-tested platform, this work aims to lower the barrier to entry for underwater manipulation research, improve reproducibility, and accelerate innovation in underwater field robotics.
comment: 10 pages, 8 figures
OMCL: Open-vocabulary Monte Carlo Localization
Robust robot localization is an important prerequisite for navigation planning. If the environment map was created from different sensors, robot measurements must be robustly associated with map features. In this work, we extend Monte Carlo Localization using vision-language features. These open-vocabulary features enable robust computation of the likelihood of visual observations, given a camera pose and a 3D map created from posed RGB-D images or aligned point clouds. The abstract vision-language features make it possible to associate observations and map elements from different modalities. Global localization can be initialized by natural language descriptions of the objects present in the vicinity of locations. We evaluate our approach using Matterport3D and Replica for indoor scenes and demonstrate generalization on SemanticKITTI for outdoor scenes.
comment: Accepted to IEEE RA-L
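As a hedged illustration of the measurement update described above: each particle is scored by comparing an observed vision-language feature with the open-vocabulary feature of the map point nearest to the hypothesized pose, and the weights drive a standard resampling step. The random stand-in features, nearest-point likelihood, and temperature are simplifications, not the OMCL implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 16                                 # feature dimension (assumed; e.g. CLIP-like)

# Hypothetical 3D map: points with attached open-vocabulary features.
map_xyz = rng.uniform(0, 10, (500, 3))
map_feat = rng.standard_normal((500, D))
map_feat /= np.linalg.norm(map_feat, axis=1, keepdims=True)

def observation_likelihood(particle_xy, obs_feat, sigma=0.5):
    """Score a particle by comparing the observed vision-language feature with
    the feature of the map point nearest to the hypothesized camera position."""
    d = np.linalg.norm(map_xyz[:, :2] - particle_xy, axis=1)
    nearest = np.argmin(d)
    cos = float(map_feat[nearest] @ obs_feat)          # both unit-normalized
    return np.exp((cos - 1.0) / sigma)                 # in (0, 1], peaked at cos = 1

# Monte Carlo Localization weight update for a set of 2D particle positions.
particles = rng.uniform(0, 10, (200, 2))
obs_feat = map_feat[42] + 0.1 * rng.standard_normal(D) # observation near map point 42
obs_feat /= np.linalg.norm(obs_feat)

weights = np.array([observation_likelihood(p, obs_feat) for p in particles])
weights /= weights.sum()
print("weighted position estimate:", weights @ particles)
particles = particles[rng.choice(len(particles), size=len(particles), p=weights)]
```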
QuantGraph: A Receding-Horizon Quantum Graph Solver
Dynamic programming is a cornerstone of graph-based optimization. While effective, it scales unfavorably with problem size. In this work, we present QuantGraph, a two-stage quantum-enhanced framework that casts local and global graph-optimization problems as quantum searches over discrete trajectory spaces. The solver is designed to operate efficiently by first finding a sequence of locally optimal transitions in the graph (local stage), without considering full trajectories. The accumulated cost of these transitions acts as a threshold that prunes the search space (up to 60% reduction for certain examples). The subsequent global stage, based on this threshold, refines the solution. Both stages utilize variants of the Grover-adaptive-search algorithm. To achieve scalability and robustness, we draw on principles from control theory and embed QuantGraph's global stage within a receding-horizon model-predictive-control scheme. This classical layer stabilizes and guides the quantum search, improving precision and reducing computational burden. In practice, the resulting closed-loop system exhibits robust behavior and lower overall complexity. Notably, for a fixed query budget, QuantGraph attains a 2x increase in control-discretization precision while still benefiting from Grover-search's inherent quadratic speedup compared to classical methods.
comment: P.Vaidhyanathan and A. Papatheodorou contributed equally to this work. 11 pages, 4 figures, 1 table, 2 algorithms
Load-Based Variable Transmission Mechanism for Robotic Applications
This paper presents a Load-Based Variable Transmission (LBVT) mechanism designed to enhance robotic actuation by dynamically adjusting the transmission ratio in response to external torque demands. Unlike existing variable transmission systems that require additional actuators for active control, the proposed LBVT mechanism leverages a pre-tensioned spring and a four-bar linkage to passively modify the transmission ratio, thereby reducing the complexity of robot joint actuation systems. The effectiveness of the LBVT mechanism is evaluated through simulation-based analyses. The results confirm that the system achieves up to a 40 percent increase in transmission ratio upon reaching a predefined torque threshold, effectively amplifying joint torque when required without additional actuation. Furthermore, the simulations demonstrate a torque amplification effect triggered when the applied force exceeds 18 N, highlighting the system's ability to autonomously respond to varying load conditions. This research contributes to the development of lightweight, efficient, and adaptive transmission systems for robotic applications, particularly in legged robots where dynamic torque adaptation is critical.
comment: 22nd International Conference on Advanced Robotics (ICAR 2025)
Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry
Monocular depth foundation models achieve remarkable generalization by learning large-scale semantic priors, but this creates a critical vulnerability: they hallucinate illusory 3D structures from geometrically planar but perceptually ambiguous inputs. We term this failure the 3D Mirage. This paper introduces the first end-to-end framework to probe, quantify, and tame this unquantified safety risk. To probe, we present 3D-Mirage, the first benchmark of real-world illusions (e.g., street art) with precise planar-region annotations and context-restricted crops. To quantify, we propose a Laplacian-based evaluation framework with two metrics: the Deviation Composite Score (DCS) for spurious non-planarity and the Confusion Composite Score (CCS) for contextual instability. To tame this failure, we introduce Grounded Self-Distillation, a parameter-efficient strategy that surgically enforces planarity on illusion ROIs while using a frozen teacher to preserve background knowledge, thus avoiding catastrophic forgetting. Our work provides the essential tools to diagnose and mitigate this phenomenon, urging a necessary shift in MDE evaluation from pixel-wise accuracy to structural and contextual robustness. Our code and benchmark will be publicly available to foster this exciting research direction.
MiVLA: Towards Generalizable Vision-Language-Action Model with Human-Robot Mutual Imitation Pre-training
While leveraging abundant human videos and simulated robot data poses a scalable solution to the scarcity of real-world robot data, the generalization capability of existing vision-language-action models (VLAs) remains limited by mismatches in camera views, visual appearance, and embodiment morphologies. To overcome this limitation, we propose MiVLA, a generalizable VLA empowered by human-robot mutual imitation pre-training, which leverages the inherent behavioral similarity between human hands and robotic arms to build a foundation of strong behavioral priors for both human actions and robotic control. Specifically, our method utilizes kinematic rules with left/right hand coordinate systems for bidirectional alignment between human and robot action spaces. Given human or simulated robot demonstrations, MiVLA is trained to forecast behavior trajectories for one embodiment, and imitate behaviors for another embodiment unseen in the demonstration. Based on this mutual imitation, it integrates the behavioral fidelity of real-world human data with the manipulative diversity of simulated robot data into a unified model, thereby enhancing the generalization capability for downstream tasks. Extensive experiments conducted on both simulation and real-world platforms with three robots (ARX, PiPer and LocoMan) demonstrate that MiVLA achieves strongly improved generalization capability, outperforming state-of-the-art VLAs (e.g., $\boldsymbol{\pi}_{0}$, $\boldsymbol{\pi}_{0.5}$ and H-RDT) by 25% in simulation, and 14% in real-world robot control tasks.
Remotely Detectable Robot Policy Watermarking
The success of machine learning for real-world robotic systems has created a new form of intellectual property: the trained policy. This raises a critical need for novel methods that verify ownership and detect unauthorized, possibly unsafe misuse. While watermarking is established in other domains, physical policies present a unique challenge: remote detection. Existing methods assume access to the robot's internal state, but auditors are often limited to external observations (e.g., video footage). This ``Physical Observation Gap'' means the watermark must be detected from signals that are noisy, asynchronous, and filtered by unknown system dynamics. We formalize this challenge using the concept of a \textit{glimpse sequence}, and introduce Colored Noise Coherency (CoNoCo), the first watermarking strategy designed for remote detection. CoNoCo embeds a spectral signal into the robot's motions by leveraging the policy's inherent stochasticity. To show it does not degrade performance, we prove CoNoCo preserves the marginal action distribution. Our experiments demonstrate strong, robust detection across various remote modalities, including motion capture and side-view and top-down video footage, in both simulated and real-world robot experiments. This work provides a necessary step toward protecting intellectual property in robotics, offering the first method for validating the provenance of physical policies non-invasively, using purely remote observations.
GuangMing-Explorer: A Four-Legged Robot Platform for Autonomous Exploration in General Environments
Autonomous exploration is a fundamental capability that tightly integrates perception, planning, control, and motion execution. It plays a critical role in a wide range of applications, including indoor target search, mapping of extreme environments, resource exploration, etc. Despite significant progress in individual components, a holistic and practical description of a completely autonomous exploration system, encompassing both hardware and software, remains scarce. In this paper, we present GuangMing-Explorer, a fully integrated autonomous exploration platform designed for robust operation across diverse environments. We provide a comprehensive overview of the system architecture, including hardware design, software stack, algorithm deployment, and experimental configuration. Extensive real-world experiments demonstrate the platform's effectiveness and efficiency in executing autonomous exploration tasks, highlighting its potential for practical deployment in complex and unstructured environments.
comment: 6 pages, published in ICUS2025
A Network-Based Framework for Modeling and Analyzing Human-Robot Coordination Strategies
Studies of human-robot interaction in dynamic and unstructured environments show that as more advanced robotic capabilities are deployed, the need for cooperative competencies to support collaboration with human problem-holders increases. Designing human-robot systems to meet these demands requires an explicit understanding of the work functions and constraints that shape the feasibility of alternative joint work strategies. Yet existing human-robot interaction frameworks either emphasize computational support for real-time execution or rely on static representations for design, offering limited support for reasoning about coordination dynamics during early-stage conceptual design. To address this gap, this article presents a novel computational framework for analyzing joint work strategies in human-robot systems by integrating techniques from functional modeling with graph-theoretic representations. The framework characterizes collective work in terms of the relationships among system functions and the physical and informational structure of the work environment, while explicitly capturing how coordination demands evolve over time. Its use during conceptual design is demonstrated through a case study in disaster robotics, which shows how the framework can be used to support early trade-space exploration of human-robot coordination strategies and to identify cooperative competencies that support flexible management of coordination overhead. These results show how the framework makes coordination demands and their temporal evolution explicit, supporting design-time reasoning about cooperative competency requirements and work demands prior to implementation.
comment: Under review at IEEE Transactions on Human-Machine Systems. 12 pages, 5 figures
VLA-AN: An Efficient and Onboard Vision-Language-Action Framework for Aerial Navigation in Complex Environments
This paper proposes VLA-AN, an efficient and onboard Vision-Language-Action (VLA) framework dedicated to autonomous drone navigation in complex environments. VLA-AN addresses four major limitations of existing large aerial navigation models: the data domain gap, insufficient temporal navigation with reasoning, safety issues with generative action policies, and onboard deployment constraints. First, we construct a high-fidelity dataset utilizing 3D Gaussian Splatting (3D-GS) to effectively bridge the domain gap. Second, we introduce a progressive three-stage training framework that sequentially reinforces scene comprehension, core flight skills, and complex navigation capabilities. Third, we design a lightweight, real-time action module coupled with geometric safety correction. This module ensures fast, collision-free, and stable command generation, mitigating the safety risks inherent in stochastic generative policies. Finally, through deep optimization of the onboard deployment pipeline, VLA-AN achieves a robust real-time 8.3x improvement in inference throughput on resource-constrained UAVs. Extensive experiments demonstrate that VLA-AN significantly improves spatial grounding, scene reasoning, and long-horizon navigation, achieving a maximum single-task success rate of 98.1%, and providing an efficient, practical solution for realizing full-chain closed-loop autonomy in lightweight aerial robots.
Infrastructure-based Autonomous Mobile Robots for Internal Logistics -- Challenges and Future Perspectives
The adoption of Autonomous Mobile Robots (AMRs) for internal logistics is accelerating, with most solutions emphasizing decentralized, onboard intelligence. While AMRs in indoor environments like factories can be supported by infrastructure, involving external sensors and computational resources, such systems remain underexplored in the literature. This paper presents a comprehensive overview of infrastructure-based AMR systems, outlining key opportunities and challenges. To support this, we introduce a reference architecture combining infrastructure-based sensing, on-premise cloud computing, and onboard autonomy. Based on the architecture, we review core technologies for localization, perception, and planning. We demonstrate the approach in a real-world deployment in a heavy-vehicle manufacturing environment and summarize findings from a user experience (UX) evaluation. Our aim is to provide a holistic foundation for future development of scalable, robust, and human-compatible AMR systems in complex industrial environments.
EPSM: A Novel Metric to Evaluate the Safety of Environmental Perception in Autonomous Driving
Extensive evaluation of perception systems is crucial for ensuring the safety of intelligent vehicles in complex driving scenarios. Conventional performance metrics such as precision, recall and the F1-score assess the overall detection accuracy, but they do not consider the safety-relevant aspects of perception. Consequently, perception systems that achieve high scores in these metrics may still cause misdetections that could lead to severe accidents. Therefore, it is important to evaluate not only the overall performance of perception systems, but also their safety. We therefore introduce a novel safety metric for jointly evaluating the most critical perception tasks, object and lane detection. Our proposed framework integrates a new, lightweight object safety metric that quantifies the potential risk associated with object detection errors, as well as a lane safety metric that accounts for the interdependence between the two tasks in safety evaluation. The resulting combined safety score provides a unified, interpretable measure of perception safety performance. Using the DeepAccident dataset, we demonstrate that our approach identifies safety-critical perception errors that conventional performance metrics fail to capture. Our findings emphasize the importance of safety-centric evaluation methods for perception systems in autonomous driving.
comment: Submitted at IEEE IV 2026
Criticality Metrics for Relevance Classification in Safety Evaluation of Object Detection in Automated Driving
Ensuring safety is the primary objective of automated driving, which necessitates a comprehensive and accurate perception of the environment. While numerous performance evaluation metrics exist for assessing perception capabilities, incorporating safety-specific metrics is essential to reliably evaluate object detection systems. A key component for safety evaluation is the ability to distinguish between relevant and non-relevant objects - a challenge addressed by criticality or relevance metrics. This paper presents the first in-depth analysis of criticality metrics for safety evaluation of object detection systems. Through a comprehensive review of existing literature, we identify and assess a range of applicable metrics. Their effectiveness is empirically validated using the DeepAccident dataset, which features a variety of safety-critical scenarios. To enhance evaluation accuracy, we propose two novel application strategies: bidirectional criticality rating and multi-metric aggregation. Our approach demonstrates up to a 100% improvement in terms of criticality classification accuracy, highlighting its potential to significantly advance the safety evaluation of object detection systems in automated vehicles.
comment: Accepted at IEEE ICVES 2025
"I am here for you": How relational conversational AI appeals to adolescents, especially those who are socially and emotionally vulnerable
General-purpose conversational AI chatbots and AI companions increasingly provide young adolescents with emotionally supportive conversations, raising questions about how conversational style shapes anthropomorphism and emotional reliance. In a preregistered online experiment with 284 adolescent-parent dyads, youth aged 11-15 and their parents read two matched transcripts in which a chatbot responded to an everyday social problem using either a relational style (first-person, affiliative, commitment language) or a transparent style (explicit nonhumanness, informational tone). Adolescents more often preferred the relational than the transparent style, whereas parents were more likely to prefer transparent style than adolescents. Adolescents rated the relational chatbot as more human-like, likable, trustworthy and emotionally close, while perceiving both styles as similarly helpful. Adolescents who preferred relational style had lower family and peer relationship quality and higher stress and anxiety than those preferring transparent style or both chatbots. These findings identify conversational style as a key design lever for youth AI safety, showing that relational framing heightens anthropomorphism, trust and emotional closeness and can be especially appealing to socially and emotionally vulnerable adolescents, who may be at increased risk for emotional reliance on conversational AI.
BEV-Patch-PF: Particle Filtering with BEV-Aerial Feature Matching for Off-Road Geo-Localization
We propose BEV-Patch-PF, a GPS-free sequential geo-localization system that integrates a particle filter with learned bird's-eye-view (BEV) and aerial feature maps. From onboard RGB and depth images, we construct a BEV feature map. For each 3-DoF particle pose hypothesis, we crop the corresponding patch from an aerial feature map computed from a local aerial image queried around the approximate location. BEV-Patch-PF computes a per-particle log-likelihood by matching the BEV feature to the aerial patch feature. On two real-world off-road datasets, our method achieves 7.5x lower absolute trajectory error (ATE) on seen routes and 7.0x lower ATE on unseen routes than a retrieval-based baseline, while maintaining accuracy under dense canopy and shadow. The system runs in real time at 10 Hz on an NVIDIA Tesla T4, enabling practical robot deployment.
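A minimal sketch of the per-particle likelihood described above, with random stand-in feature maps: for each 3-DoF pose hypothesis, a rotated patch is cropped from the aerial feature grid by nearest-neighbor sampling and compared with the robot-centric BEV features. Grid resolution, patch size, and the temperature are assumed values, and the learned features of BEV-Patch-PF are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(4)
D, RES = 8, 0.5                        # feature dim and grid resolution [m/cell] (assumed)
aerial = rng.standard_normal((200, 200, D))          # aerial feature map (H, W, D)
bev = rng.standard_normal((16, 16, D))               # robot-centric BEV feature map

def crop_patch(feature_map, x, y, yaw, size=16, res=RES):
    """Nearest-neighbor crop of a rotated size x size patch centered at (x, y)."""
    half = (size - 1) / 2.0
    u, v = np.meshgrid(np.arange(size) - half, np.arange(size) - half, indexing="ij")
    c, s = np.cos(yaw), np.sin(yaw)
    gx = x / res + c * u - s * v       # rotate the local patch grid into the map frame
    gy = y / res + s * u + c * v
    gi = np.clip(np.round(gx).astype(int), 0, feature_map.shape[0] - 1)
    gj = np.clip(np.round(gy).astype(int), 0, feature_map.shape[1] - 1)
    return feature_map[gi, gj]

def log_likelihood(particle, bev_feat, temp=0.1):
    """Per-particle log-likelihood: average feature agreement between the BEV
    map and the aerial patch cropped at the particle's 3-DoF pose."""
    x, y, yaw = particle
    patch = crop_patch(aerial, x, y, yaw, size=bev_feat.shape[0])
    return float(np.mean(np.sum(patch * bev_feat, axis=-1)) / temp)

particles = np.column_stack([rng.uniform(10, 90, 100),
                             rng.uniform(10, 90, 100),
                             rng.uniform(-np.pi, np.pi, 100)])
logw = np.array([log_likelihood(p, bev) for p in particles])
w = np.exp(logw - logw.max()); w /= w.sum()            # normalized particle weights
print("effective sample size:", 1.0 / np.sum(w ** 2))
```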
NAP3D: NeRF Assisted 3D-3D Pose Alignment for Autonomous Vehicles
Accurate localization is essential for autonomous vehicles, yet sensor noise and drift over time can lead to significant pose estimation errors, particularly in long-horizon environments. A common strategy for correcting accumulated error is visual loop closure in SLAM, which adjusts the pose graph when the agent revisits previously mapped locations. These techniques typically rely on identifying visual mappings between the current view and previously observed scenes and often require fusing data from multiple sensors. In contrast, this work introduces NeRF-Assisted 3D-3D Pose Alignment (NAP3D), a complementary approach that leverages 3D-3D correspondences between the agent's current depth image and a pre-trained Neural Radiance Field (NeRF). By directly aligning 3D points from the observed scene with synthesized points from the NeRF, NAP3D refines the estimated pose even from novel viewpoints, without relying on revisiting previously observed locations. This robust 3D-3D formulation provides advantages over conventional 2D-3D localization methods while remaining comparable in accuracy and applicability. Experiments demonstrate that NAP3D achieves camera pose correction within 5 cm on a custom dataset, robustly outperforming a 2D-3D Perspective-N-Point baseline. On TUM RGB-D, NAP3D consistently improves 3D alignment RMSE by approximately 6 cm compared to this baseline given varying noise, despite PnP achieving lower raw rotation and translation parameter error in some regimes, highlighting NAP3D's improved geometric consistency in 3D space. By providing a lightweight, dataset-agnostic tool, NAP3D complements existing SLAM and localization pipelines when traditional loop closure is unavailable.
comment: 10 pages, 5 figures, 2 tables
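The 3D-3D alignment step can be illustrated with the classical Kabsch/Procrustes solution on synthetic correspondences between observed depth points and points assumed to be rendered from a NeRF; this omits correspondence search, NeRF rendering, and outlier handling, so it is a sketch of the registration ingredient rather than the NAP3D pipeline.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping 3D points `src` onto `dst`
    (Kabsch/Procrustes on 3D-3D correspondences)."""
    src_c, dst_c = src.mean(0), dst.mean(0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ S @ U.T
    t = dst_c - R @ src_c
    return R, t

# Toy check: depth-image points vs. the same points "rendered" from a NeRF,
# related by a known rotation/translation plus noise.
rng = np.random.default_rng(5)
nerf_pts = rng.uniform(-1, 1, (300, 3))
angle = np.deg2rad(10)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0],
                   [np.sin(angle),  np.cos(angle), 0],
                   [0, 0, 1]])
t_true = np.array([0.2, -0.1, 0.05])
depth_pts = nerf_pts @ R_true.T + t_true + 0.005 * rng.standard_normal((300, 3))

R_est, t_est = rigid_align(depth_pts, nerf_pts)   # correction from observed to NeRF frame
rmse = np.sqrt(np.mean(np.sum((depth_pts @ R_est.T + t_est - nerf_pts) ** 2, axis=1)))
print("alignment RMSE [m]:", rmse)
```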
HERO: Hierarchical Traversable 3D Scene Graphs for Embodied Navigation Among Movable Obstacles
3D Scene Graphs (3DSGs) constitute a powerful representation of the physical world, distinguished by their abilities to explicitly model the complex spatial, semantic, and functional relationships between entities, rendering a foundational understanding that enables agents to interact intelligently with their environment and execute versatile behaviors. Embodied navigation, as a crucial component of such capabilities, leverages the compact and expressive nature of 3DSGs to enable long-horizon reasoning and planning in complex, large-scale environments. However, prior works rely on a static-world assumption, defining traversable space solely based on static spatial layouts and thereby treating interactable obstacles as non-traversable. This fundamental limitation severely undermines their effectiveness in real-world scenarios, leading to limited reachability, low efficiency, and inferior extensibility. To address these issues, we propose HERO, a novel framework for constructing Hierarchical Traversable 3DSGs, that redefines traversability by modeling operable obstacles as pathways, capturing their physical interactivity, functional semantics, and the scene's relational hierarchy. The results show that, relative to its baseline, HERO reduces path length (PL) by 35.1% in partially obstructed environments and increases success rate (SR) by 79.4% in fully obstructed ones, demonstrating substantially higher efficiency and reachability.
ISS Policy: Scalable Diffusion Policy with Implicit Scene Supervision
Vision-based imitation learning has enabled impressive robotic manipulation skills, but its reliance on object appearance while ignoring the underlying 3D scene structure leads to low training efficiency and poor generalization. To address these challenges, we introduce \emph{Implicit Scene Supervision (ISS) Policy}, a 3D visuomotor DiT-based diffusion policy that predicts sequences of continuous actions from point cloud observations. We extend DiT with a novel implicit scene supervision module that encourages the model to produce outputs consistent with the scene's geometric evolution, thereby improving the performance and robustness of the policy. Notably, ISS Policy achieves state-of-the-art performance on both single-arm manipulation tasks (MetaWorld) and dexterous hand manipulation (Adroit). In real-world experiments, it also demonstrates strong generalization and robustness. Additional ablation studies show that our method scales effectively with both data and parameters. Code and videos will be released.
SWIFT-Nav: Stability-Aware Waypoint-Level TD3 with Fuzzy Arbitration for UAV Navigation in Cluttered Environments
Efficient and reliable UAV navigation in cluttered and dynamic environments remains challenging. We propose SWIFT-Nav: Stability-aware Waypoint-level Integration of Fuzzy arbitration and TD3 for Navigation, a TD3-based navigation framework that achieves fast, stable convergence to obstacle-aware paths. The system couples a sensor-driven perception front end with a TD3 waypoint policy: the perception module converts LiDAR ranges into a confidence-weighted safety map and goal cues, while the TD3 policy is trained with Prioritised Experience Replay to focus on high-error transitions and a decaying epsilon-greedy exploration schedule that gradually shifts from exploration to exploitation. A lightweight fuzzy-logic layer computes a safety score from radial measurements and near obstacles, gates mode switching and clamps unsafe actions; in parallel, task-aligned reward shaping combining goal progress, clearance, and switch-economy terms provides dense, well-scaled feedback that accelerates learning. Implemented in Webots with proximity-based collision checking, our approach consistently outperforms baselines in trajectory smoothness and generalization to unseen layouts, while preserving real-time responsiveness. These results show that combining TD3 with replay prioritisation, calibrated exploration, and fuzzy-safety rules yields a robust and deployable solution for UAV navigation in cluttered scenes.
comment: 10 pages, Accepted at Australasian Conference on Robotics and Automation (ACRA) 2025
Maintaining the Level of a Payload carried by Multi-Robot System on Irregular Surface
In this paper, we introduce a multi-robot payload transport system to carry payloads through an environment of unknown and uneven inclinations while maintaining the desired orientation of the payload. For this task, we used custom-built robots with a linear actuator (piston) mounted on top of each robot. The system continuously monitors the payload's orientation and computes the required piston height of each robot to maintain the desired orientation of the payload. In this work, we propose an open-loop controller coupled with a closed-loop PID controller to achieve the goal. As our modelling makes no assumptions on the type of terrain, the system can operate on unknown and uneven terrain and inclinations. We showcase the efficacy of our proposed controller by testing it in various simulated environments with varied and complex terrains.
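A hedged sketch of the leveling logic, with assumed geometry, sign conventions, and gains: a feed-forward term converts the measured roll/pitch into per-robot piston offsets, and a per-piston PID loop trims the residual error. None of the numerical values are taken from the paper.

```python
import numpy as np

class PID:
    """Textbook discrete PID on a scalar error signal."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev = 0.0, 0.0
    def step(self, err):
        self.integral += err * self.dt
        deriv = (err - self.prev) / self.dt
        self.prev = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def required_piston_offsets(robot_xy, roll, pitch):
    """Feed-forward piston offsets that cancel a measured payload tilt.
    Sign convention assumed here: the payload surface height deviates by
    dz = x*tan(pitch) + y*tan(roll) at a support point (x, y) in the payload frame."""
    slope = np.array([np.tan(pitch), np.tan(roll)])
    return -(robot_xy @ slope)

# Assumed layout: four robots at the payload corners, payload tilted by the terrain.
robots = np.array([[0.5, 0.5], [0.5, -0.5], [-0.5, 0.5], [-0.5, -0.5]])
roll, pitch = np.deg2rad(3.0), np.deg2rad(-2.0)     # measured by an IMU on the payload
feedforward = required_piston_offsets(robots, roll, pitch)

pids = [PID(kp=2.0, ki=0.5, kd=0.1, dt=0.02) for _ in robots]
heights = np.zeros(len(robots))
for _ in range(200):                                 # closed-loop trim at 50 Hz
    residual = feedforward - heights                 # stand-in for a re-measured error
    heights += np.array([pid.step(e) for pid, e in zip(pids, residual)]) * 0.02
print("piston offsets [m]:", np.round(heights, 4))
```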
Few-Shot Inference of Human Perceptions of Robot Performance in Social Navigation Scenarios
Understanding how humans evaluate robot behavior during human-robot interactions is crucial for developing socially aware robots that behave according to human expectations. While the traditional approach to capturing these evaluations is to conduct a user study, recent work has proposed utilizing machine learning instead. However, existing data-driven methods require large amounts of labeled data, which limits their use in practice. To address this gap, we propose leveraging the few-shot learning capabilities of Large Language Models (LLMs) to improve how well a robot can predict a user's perception of its performance, and study this idea experimentally in social navigation tasks. To this end, we extend the SEAN TOGETHER dataset with additional real-world human-robot navigation episodes and participant feedback. Using this augmented dataset, we evaluate the ability of several LLMs to predict human perceptions of robot performance from a small number of in-context examples, based on observed spatio-temporal cues of the robot and surrounding human motion. Our results demonstrate that LLMs can match or exceed the performance of traditional supervised learning models while requiring an order of magnitude fewer labeled instances. We further show that prediction performance can improve with more in-context examples, confirming the scalability of our approach. Additionally, we investigate what kind of sensor-based information an LLM relies on to make these inferences by conducting an ablation study on the input features considered for performance prediction. Finally, we explore the novel application of personalized examples for in-context learning, i.e., drawn from the same user being evaluated, finding that they further enhance prediction accuracy. This work paves the path to improving robot behavior in a scalable manner through user-centered feedback.
dLITE: Differentiable Lighting-Informed Trajectory Evaluation for On-Orbit Inspection
Visual inspection of space-borne assets is of increasing interest to spacecraft operators looking to plan maintenance, characterise damage, and extend the life of high-value satellites in orbit. The environment of Low Earth Orbit (LEO) presents unique challenges when planning inspection operations that maximise visibility, information, and data quality. Specular reflection of sunlight from spacecraft bodies, self-shadowing, and dynamic lighting in LEO significantly impact the quality of data captured throughout an orbit. This is exacerbated by the relative motion between spacecraft, which introduces variable imaging distances and attitudes during inspection. Planning inspection trajectories with the aid of simulation is a common approach. However, the ability to design and optimise an inspection trajectory specifically to improve the resulting image quality in proximity operations remains largely unexplored. In this work, we present $\partial$LITE, an end-to-end differentiable simulation pipeline for on-orbit inspection operations. We leverage state-of-the-art differentiable rendering tools and a custom orbit propagator to enable end-to-end optimisation of orbital parameters based on visual sensor data. $\partial$LITE enables us to automatically design non-obvious trajectories, vastly improving the quality and usefulness of attained data. To our knowledge, our differentiable inspection-planning pipeline is the first of its kind and provides new insights into modern computational approaches to spacecraft mission planning. Project page: https://appearance-aware.github.io/dlite/
comment: 13 pages, 9 images
SORS: A Modular, High-Fidelity Simulator for Soft Robots
The deployment of complex soft robots in multiphysics environments requires advanced simulation frameworks that not only capture interactions between different types of material, but also translate accurately to real-world performance. Soft robots pose unique modeling challenges due to their large nonlinear deformations, material incompressibility, and contact interactions, which complicate both numerical stability and physical accuracy. Despite recent progress, robotic simulators often struggle with modeling such phenomena in a scalable and application-relevant manner. We present SORS (Soft Over Rigid Simulator), a versatile, high-fidelity simulator designed to handle these complexities for soft robot applications. Our energy-based framework, built on the finite element method, allows modular extensions, enabling the inclusion of custom-designed material and actuation models. To ensure physically consistent contact handling, we integrate a constrained nonlinear optimization based on sequential quadratic programming, allowing for stable and accurate modeling of contact phenomena. We validate our simulator through a diverse set of real-world experiments, which include cantilever deflection, pressure-actuation of a soft robotic arm, and contact interactions from the PokeFlex dataset. In addition, we showcase the potential of our framework for control optimization of a soft robotic leg. These tests confirm that our simulator can capture both fundamental material behavior and complex actuation dynamics with high physical fidelity. By bridging the sim-to-real gap in these challenging domains, our approach provides a validated tool for prototyping next-generation soft robots, filling the gap of extensibility, fidelity, and usability in the soft robotic ecosystem.
comment: This work has been submitted to the IEEE for possible publication. Code and data are available at github.com/srl-ethz/sors
R4: Retrieval-Augmented Reasoning for Vision-Language Models in 4D Spatio-Temporal Space
Humans perceive and reason about their surroundings in four dimensions by building persistent, structured internal representations that encode semantic meaning, spatial layout, and temporal dynamics. These multimodal memories enable them to recall past events, infer unobserved states, and integrate new information into context-dependent reasoning. Inspired by this capability, we introduce R4, a training-free framework for retrieval-augmented reasoning in 4D spatio-temporal space that equips vision-language models (VLMs) with structured, lifelong memory. R4 continuously constructs a 4D knowledge database by anchoring object-level semantic descriptions in metric space and time, yielding a persistent world model that can be shared across agents. At inference, natural language queries are decomposed into semantic, spatial, and temporal keys to retrieve relevant observations, which are integrated into the VLM's reasoning. Unlike classical retrieval-augmented generation methods, retrieval in R4 operates directly in 4D space, enabling episodic and collaborative reasoning without training. Experiments on embodied question answering and navigation benchmarks demonstrate that R4 substantially improves retrieval and reasoning over spatio-temporal information compared to baselines, advancing a new paradigm for embodied 4D reasoning in dynamic environments.
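The retrieval step can be sketched as follows, with assumed data structures and random stand-in embeddings: each memory stores a semantic feature, a metric position, and a timestamp, and a query decomposed into semantic, spatial, and temporal keys is answered by hard spatial/temporal filtering followed by cosine ranking. This is a minimal, training-free illustration, not the R4 implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Memory:
    feat: np.ndarray      # semantic embedding of an object observation (unit norm)
    xyz: np.ndarray       # metric position where it was observed
    t: float              # observation timestamp [s]
    text: str             # object-level description

def retrieve(memories, query_feat, center=None, radius=None, t_window=None, k=3):
    """Retrieve top-k memories matching a decomposed (semantic, spatial, temporal)
    query: hard spatial/temporal filters followed by semantic cosine ranking."""
    cands = []
    for m in memories:
        if center is not None and np.linalg.norm(m.xyz - center) > radius:
            continue
        if t_window is not None and not (t_window[0] <= m.t <= t_window[1]):
            continue
        cands.append((float(m.feat @ query_feat), m))
    cands.sort(key=lambda c: -c[0])
    return cands[:k]

# Toy database with random stand-in embeddings.
rng = np.random.default_rng(6)
def unit(v): return v / np.linalg.norm(v)
db = [Memory(unit(rng.standard_normal(8)), rng.uniform(0, 10, 3), 10.0 * i, f"object {i}")
      for i in range(20)]

q = unit(db[7].feat + 0.2 * rng.standard_normal(8))     # query similar to entry 7
hits = retrieve(db, q, center=db[7].xyz, radius=3.0, t_window=(0.0, 150.0))
print([(round(s, 2), m.text) for s, m in hits])
```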
Large Video Planner Enables Generalizable Robot Control
General-purpose robots require decision-making models that generalize across diverse tasks and environments. Recent works build robot foundation models by extending multimodal large language models (MLLMs) with action outputs, creating vision-language-action (VLA) systems. These efforts are motivated by the intuition that MLLMs' large-scale language and image pretraining can be effectively transferred to the action output modality. In this work, we explore an alternative paradigm of using large-scale video pretraining as a primary modality for building robot foundation models. Unlike static images and language, videos capture spatio-temporal sequences of states and actions in the physical world that are naturally aligned with robotic behavior. We curate an internet-scale video dataset of human activities and task demonstrations, and train, for the first time at a foundation-model scale, an open video model for generative robotics planning. The model produces zero-shot video plans for novel scenes and tasks, which we post-process to extract executable robot actions. We evaluate task-level generalization through third-party selected tasks in the wild and real-robot experiments, demonstrating successful physical execution. Together, these results show robust instruction following, strong generalization, and real-world feasibility. We release both the model and dataset to support open, reproducible video-based robot learning. Our website is available at https://www.boyuan.space/large-video-planner/.
comment: 29 pages, 16 figures
Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports
We propose a human-centered safety filter (HCSF) for shared autonomy that significantly enhances system safety without compromising human agency. Our HCSF is built on a neural safety value function, which we first learn scalably through black-box interactions and then use at deployment to enforce a novel state-action control barrier function (Q-CBF) safety constraint. Since this Q-CBF safety filter does not require any knowledge of the system dynamics for both synthesis and runtime safety monitoring and intervention, our method applies readily to complex, black-box shared autonomy systems. Notably, our HCSF's CBF-based interventions modify the human's actions minimally and smoothly, avoiding the abrupt, last-moment corrections delivered by many conventional safety filters. We validate our approach in a comprehensive in-person user study using Assetto Corsa, a high-fidelity car racing simulator with black-box dynamics, to assess robustness in "driving on the edge" scenarios. We compare both trajectory data and drivers' perceptions of our HCSF assistance against unassisted driving and a conventional safety filter. Experimental results show that 1) compared to having no assistance, our HCSF improves both safety and user satisfaction without compromising human agency or comfort, and 2) relative to a conventional safety filter, our proposed HCSF boosts human agency, comfort, and satisfaction while maintaining robustness.
comment: Accepted to Robotics: Science and Systems (R:SS) 2025, 22 pages, 16 figures, 7 tables Updates for v4: typos in Appendix Subsection A revised
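One way to picture the minimal, smooth intervention is the half-space projection below, which enforces a linearized state-action safety constraint of CBF type; the constraint form, the toy quadratic safety value, and the gain are assumptions for illustration and are not the paper's exact Q-CBF formulation.

```python
import numpy as np

def qcbf_filter(a_human, q_value, q_grad, gamma=0.5):
    """Minimally modify the human action so an assumed, linearized state-action
    safety condition is enforced in a CBF-like fashion:
        Q(s, a_h) + grad_a Q(s, a_h) . (a - a_h) >= -gamma * Q(s, a_h).
    Closed-form projection of a_h onto this half-space."""
    slack = (1.0 + gamma) * q_value
    if slack >= 0.0:
        return a_human                       # already safe enough: no intervention
    g2 = float(q_grad @ q_grad)
    if g2 < 1e-9:
        return a_human                       # degenerate gradient: pass through
    return a_human - (slack / g2) * q_grad   # smallest correction restoring the margin

# Toy example with an assumed quadratic safety value Q(s, a) = 1 - ||a - a_safe||^2.
a_safe = np.array([0.0, 0.3])
a_h = np.array([1.4, -0.2])                  # aggressive human steering/throttle
q = 1.0 - float(np.sum((a_h - a_safe) ** 2))
grad = -2.0 * (a_h - a_safe)
print("filtered action:", qcbf_filter(a_h, q, grad))
```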
ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics
Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE. Our approach utilizes an ensemble of MDE models to predict dense depth maps conditioned on both RGB images and on a sparse set of known depth measurements obtained via proprioception, where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and measure the gradient of the uncertainty with respect to candidate measurement locations. To prevent mode collapse while selecting maximally informative locations to propriocept (touch), we leverage Stein Variational Gradient Descent (SVGD) over this gradient map. We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements. Project page: https://brittonjordan.github.io/probe_mde/
comment: 9 pages, 5 figures. Project page: https://brittonjordan.github.io/probe_mde/
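A sketch of the selection step under stated assumptions: candidate touch locations are treated as SVGD particles driven by the gradient of a stand-in uncertainty surface, with an RBF-kernel repulsion term that prevents the candidates from collapsing onto a single high-uncertainty point. The uncertainty surface, bandwidth, and step size are illustrative, not the ensemble variance used in ProbeMDE.

```python
import numpy as np

rng = np.random.default_rng(7)

def uncertainty(p):
    """Stand-in ensemble-variance surface over normalized image coordinates."""
    return (np.exp(-np.sum((p - [0.3, 0.7]) ** 2, -1) / 0.02) +
            np.exp(-np.sum((p - [0.8, 0.2]) ** 2, -1) / 0.02))

def grad_uncertainty(p, eps=1e-4):
    """Finite-difference gradient of the uncertainty w.r.t. a candidate location."""
    g = np.zeros(2)
    for i in range(2):
        d = np.zeros(2); d[i] = eps
        g[i] = (uncertainty(p + d) - uncertainty(p - d)) / (2 * eps)
    return g

def svgd_step(particles, step=0.05, bandwidth=0.05):
    """One SVGD update: attraction toward high uncertainty plus a repulsive
    kernel term that keeps the candidate probe locations diverse."""
    n = len(particles)
    diffs = particles[:, None, :] - particles[None, :, :]
    sq = np.sum(diffs ** 2, -1)
    K = np.exp(-sq / (2 * bandwidth ** 2))
    grads = np.stack([grad_uncertainty(p) for p in particles])
    repulse = np.einsum("ij,ijk->ik", K, diffs) / bandwidth ** 2
    phi = (K @ grads + repulse) / n
    return np.clip(particles + step * phi, 0.0, 1.0)

probes = rng.uniform(0, 1, (8, 2))           # candidate touch locations (normalized)
for _ in range(100):
    probes = svgd_step(probes)
print(np.round(probes, 2))
```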
New Location Science Models with Applications to UAV-Based Disaster Relief
Natural and human-made disasters can cause severe devastation and claim thousands of lives worldwide. Therefore, developing efficient methods for disaster response and management is a critical task for relief teams. One of the most essential components of effective response is the rapid collection of information about affected areas, damages, and victims. More data translates into better coordination, faster rescue operations, and ultimately, more lives saved. However, in some disasters, such as earthquakes, the communication infrastructure is often partially or completely destroyed, making it extremely difficult for victims to send distress signals and for rescue teams to locate and assist them in time. Unmanned Aerial Vehicles (UAVs) have emerged as valuable tools in such scenarios. In particular, a fleet of UAVs can be dispatched from a mobile station to the affected area to facilitate data collection and establish temporary communication networks. Nevertheless, real-world deployment of UAVs faces several challenges, with adverse weather conditions--especially wind--being among the most significant. To address this, we develop a novel mathematical framework to determine the optimal location of a mobile UAV station while explicitly accounting for the heterogeneity of the UAVs and the effect of wind. In particular, we generalize the Sylvester problem to introduce the Sylvester-Fermat-Torricelli (SFT) problem, which captures complex factors such as wind influence, UAV heterogeneity, and back-and-forth motion within a unified framework. The proposed framework enhances the practicality of UAV-based disaster response planning by accounting for real-world factors such as wind and UAV heterogeneity. Experimental results demonstrate that it can reduce wasted operational time by up to 84%, making post-disaster missions significantly more efficient and effective.
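As background for the generalized formulation, the classical weighted Fermat-Torricelli point (one ingredient of the SFT problem) can be computed with the Weiszfeld iteration sketched below; the site coordinates and weights are hypothetical, and the wind, heterogeneity, and back-and-forth terms of the SFT problem are not modeled.

```python
import numpy as np

def fermat_torricelli(points, weights=None, iters=200, tol=1e-9):
    """Weiszfeld iteration for the (weighted) Fermat-Torricelli point: the
    location minimizing the weighted sum of Euclidean distances to `points`."""
    pts = np.asarray(points, dtype=float)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights, dtype=float)
    x = np.average(pts, axis=0, weights=w)            # start at the weighted centroid
    for _ in range(iters):
        d = np.linalg.norm(pts - x, axis=1)
        if np.any(d < 1e-12):                         # iterate landed on a data point
            return x
        coeff = w / d
        x_new = (coeff[:, None] * pts).sum(0) / coeff.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical scenario: affected sites to be served by UAVs from one mobile station;
# weights could encode per-site round-trip effort (e.g., a wind-adjusted travel cost).
sites = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [5.0, 5.0]])
effort = np.array([1.0, 1.0, 2.0, 0.5])
print("station location:", np.round(fermat_torricelli(sites, effort), 3))
```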
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative paradigm for creating intelligent agents-from virtual creatures to physical robots-by jointly optimizing their morphologies and controllers rather than treating control in isolation. This integrated approach facilitates richer environmental interactions and robust task performance. In this survey, we provide a systematic overview of recent advances in ECD. We first formalize the concept of ECD and position it within related fields. We then introduce a hierarchical taxonomy: a lower layer that breaks down agent design into three fundamental components-controlling brain, body morphology, and task environment-and an upper layer that integrates these components into four major ECD frameworks: bi-level, single-level, generative, and open-ended. This taxonomy allows us to synthesize insights from more than one hundred recent studies. We further review notable benchmarks, datasets, and applications in both simulated and real-world scenarios. Finally, we identify significant challenges and offer insights into promising future research directions. A project associated with this survey has been created at https://github.com/Yuxing-Wang-THU/SurveyBrainBody.
Supervisory Measurement-Guided Noise Covariance Estimation
Reliable state estimation hinges on accurate specification of sensor noise covariances, which weigh heterogeneous measurements. In practice, these covariances are difficult to identify due to environmental variability, front-end preprocessing, and other factors. We address this by formulating noise covariance estimation as a bilevel optimization that, from a Bayesian perspective, factorizes the joint likelihood of so-called odometry and supervisory measurements, thereby balancing information utilization with computational efficiency. The factorization converts the nested Bayesian dependency into a chain structure, enabling efficient parallel computation: at the lower level, an invariant extended Kalman filter with state augmentation estimates trajectories, while a derivative filter computes analytical gradients in parallel for upper-level gradient updates. The upper level refines the covariance to guide the lower-level estimation. Experiments on synthetic and real-world datasets show that our method achieves higher efficiency than existing baselines.
Integration of UWB Radar on Mobile Robots for Continuous Obstacle and Environment Mapping
This paper presents an infrastructure-free approach for obstacle detection and environmental mapping using ultra-wideband (UWB) radar mounted on a mobile robotic platform. Traditional sensing modalities such as visual cameras and Light Detection and Ranging (LiDAR) fail in environments with poor visibility due to darkness, smoke, or reflective surfaces. In these vision-impaired conditions, UWB radar offers a promising alternative. To this end, this work explores the suitability of robot-mounted UWB radar for environmental mapping in dynamic, anchor-free scenarios. The study investigates how different materials (metal, concrete and plywood) and UWB radio channels (5 and 9) influence the Channel Impulse Response (CIR). Furthermore, a processing pipeline is proposed to achieve reliable mapping of detected obstacles, consisting of three steps: (i) target identification (based on CIR peak detection), (ii) filtering (based on peak properties, signal-to-noise score, and phase-difference of arrival), and (iii) clustering (based on distance estimation and angle-of-arrival estimation). The proposed approach successfully reduces noise and multipath effects, resulting in an obstacle detection precision of at least 90.71% and a recall of 88.40% on channel 9, even when detecting low-reflective materials such as concrete. This work offers a foundation for further development of UWB-based simultaneous localisation and mapping (SLAM) systems that do not rely on visual features and, unlike conventional UWB localisation systems, do not require fixed anchor nodes for triangulation.
comment: This paper has been submitted to IEEE Access Journal and is currently undergoing review
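As a rough illustration of the three-step pipeline described above, the sketch below runs peak detection, an SNR-style filter, and range clustering on a synthetic CIR trace. The thresholds, the sampling rate, and the synthetic reflectors are assumptions for illustration; the paper's pipeline additionally uses peak properties, phase-difference of arrival, and angle-of-arrival, which are omitted.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_obstacles(cir, fs=1e9, c=3e8, snr_db_min=15.0, range_gap=0.3):
    """Toy version of the proposed pipeline on a single CIR magnitude trace:
    (i) peak detection, (ii) SNR-based filtering, (iii) range clustering.
    The paper additionally uses peak shape, PDoA and AoA, omitted here."""
    mag = np.abs(cir)
    noise_floor = np.median(mag) + 1e-12
    # (i) candidate targets from CIR peaks
    peaks, props = find_peaks(mag, height=noise_floor, prominence=0.5 * noise_floor)
    # (ii) keep peaks with a sufficient SNR score
    snr_db = 20.0 * np.log10(props["peak_heights"] / noise_floor)
    peaks = peaks[snr_db >= snr_db_min]
    # convert tap index to one-way range (monostatic radar: divide by 2)
    ranges = np.sort(peaks / fs * c / 2.0)
    # (iii) cluster detections whose ranges are within `range_gap` metres
    clusters, current = [], [ranges[0]] if len(ranges) else []
    for r in ranges[1:]:
        if r - current[-1] <= range_gap:
            current.append(r)
        else:
            clusters.append(float(np.mean(current)))
            current = [r]
    if current:
        clusters.append(float(np.mean(current)))
    return clusters

# Synthetic CIR: two reflectors plus noise (purely illustrative).
rng = np.random.default_rng(0)
cir = 0.02 * rng.standard_normal(512)
cir[40] += 1.0   # ~6 m reflector
cir[120] += 0.7  # ~18 m reflector
print("estimated obstacle ranges [m]:", detect_obstacles(cir))
```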
Event Camera Meets Mobile Embodied Perception: Abstraction, Algorithm, Acceleration, Application
With the increasing complexity of mobile device applications, these devices are evolving toward high agility. This shift imposes new demands on mobile sensing, particularly in achieving high accuracy and low latency. Event-based vision has emerged as a disruptive paradigm, offering high temporal resolution and low latency, making it well-suited for high-accuracy and low-latency sensing tasks on high-agility platforms. However, the presence of substantial noisy events, the lack of stable, persistent semantic information, and the large data volume pose challenges for event-based data processing on resource-constrained mobile devices. This paper surveys the literature from 2014 to 2025 and presents a comprehensive overview of event-based mobile sensing, encompassing its fundamental principles, event abstraction methods, algorithm advancements, and both hardware and software acceleration strategies. We discuss key applications of event cameras in mobile sensing, including visual odometry, object tracking, optical flow, and 3D reconstruction, while highlighting challenges associated with event data processing, sensor fusion, and real-time deployment. Furthermore, we outline future research directions, such as improving the event camera with advanced optics, leveraging neuromorphic computing for efficient processing, and integrating bio-inspired algorithms. To support ongoing research, we provide an open-source Online Sheet with recent developments. We hope this survey serves as a reference, facilitating the adoption of event-based vision across diverse applications.
comment: Accepted by ACM CSUR, 35 pages
Registering the 4D Millimeter Wave Radar Point Clouds Via Generalized Method of Moments
4D millimeter wave radars (4D radars) are new emerging sensors that provide point clouds of objects with both position and radial velocity measurements. Compared to LiDARs, they are more affordable and reliable sensors for robots' perception under extreme weather conditions. On the other hand, point cloud registration is an essential perception module that provides a robot's pose feedback in applications such as Simultaneous Localization and Mapping (SLAM). Nevertheless, 4D radar point clouds are sparse and noisy compared to those of LiDAR, which makes registering them considerably more challenging. To address this issue, we propose a point cloud registration framework for 4D radars based on the Generalized Method of Moments. The method does not require explicit point-to-point correspondences between the source and target point clouds, which are difficult to compute for sparse 4D radar point clouds. Moreover, we show the consistency of the proposed method. Experiments on both synthetic and real-world datasets show that our approach achieves higher accuracy and robustness than benchmarks, and the accuracy is even comparable to LiDAR-based frameworks.
SignBot: Learning Human-to-Humanoid Sign Language Interaction
Sign language is a natural and visual form of language that uses movements and expressions to convey meaning, serving as a crucial means of communication for individuals who are deaf or hard-of-hearing (DHH). However, the number of people proficient in sign language remains limited, highlighting the need for technological advancements to bridge communication gaps and foster interaction with the DHH community. Based on recent advancements in embodied humanoid robots, we propose SignBot, a novel framework for human-robot sign language interaction. SignBot integrates a cerebellum-inspired motion control component and a cerebral-oriented module for comprehension and interaction. Specifically, SignBot consists of: 1) Motion Retargeting, which converts human sign language datasets into robot-compatible kinematics; 2) Motion Control, which leverages a learning-based paradigm to develop a robust humanoid control policy for tracking sign language gestures; and 3) Generative Interaction, which incorporates a sign language translator, responder, and generator, thereby enabling natural and effective communication between robots and humans. Simulation and real-world experimental results demonstrate that SignBot can effectively facilitate human-robot interaction and perform sign language motions with diverse robots and datasets. SignBot represents a significant advancement in automatic sign language interaction on embodied humanoid robot platforms, providing a promising solution to improve communication accessibility for the DHH community.
Context Representation via Action-Free Transformer Encoder-Decoder for Meta Reinforcement Learning
Reinforcement learning (RL) enables robots to operate in uncertain environments, but standard approaches often struggle with poor generalization to unseen tasks. Context-adaptive meta reinforcement learning addresses these limitations by conditioning on a task representation, yet such methods mostly rely on complete action information in the experience, making task inference tightly coupled to a specific policy. This paper introduces Context Representation via Action-Free Transformer encoder-decoder (CRAFT), a belief model that infers task representations solely from sequences of states and rewards. By removing the dependence on actions, CRAFT decouples task inference from policy optimization, supports modular training, and leverages amortized variational inference for scalable belief updates. Built on a transformer encoder-decoder with rotary positional embeddings, the model captures long-range temporal dependencies and robustly encodes both parametric and non-parametric task variations. Experiments on the MetaWorld ML-10 robotic manipulation benchmark show that CRAFT achieves faster adaptation, improved generalization, and more effective exploration compared to context-adaptive meta-RL baselines. These findings highlight the potential of action-free inference as a foundation for scalable RL in robotic control.
Demonstration Sidetracks: Categorizing Systematic Non-Optimality in Human Demonstrations
Learning from Demonstration (LfD) is a popular approach for robots to acquire new skills, but most LfD methods suffer from imperfections in human demonstrations. Prior work typically treats these suboptimalities as random noise. In this paper, we study non-optimal behaviors in non-expert demonstrations and show that they are systematic, forming what we call demonstration sidetracks. Using a public space study with 40 participants performing a long-horizon robot task, we recreated the setup in simulation and annotated all demonstrations. We identify four types of sidetracks (Exploration, Mistake, Alignment, Pause) and one control pattern (one-dimensional control). Sidetracks appear frequently across participants, and their temporal and spatial distribution is tied to task context. We also find that users' control patterns depend on the control interface. These insights point to the need for better models of suboptimal demonstrations to improve LfD algorithms and bridge the gap between lab training and real-world deployment. All demonstrations, infrastructure, and annotations are available at https://github.com/AABL-Lab/Human-Demonstration-Sidetracks.
Off-Road Navigation via Implicit Neural Representation of Terrain Traversability
Autonomous off-road navigation requires robots to estimate terrain traversability from onboard sensors and plan accordingly. Conventional approaches typically rely on sampling-based planners such as MPPI to generate short-term control actions that aim to minimize traversal time and risk measures derived from the traversability estimates. These planners can react quickly but optimize only over a short look-ahead window, limiting their ability to reason about the full path geometry, which is important for navigating challenging off-road environments. Moreover, they cannot adjust speed based on terrain bumpiness, which is essential for smooth navigation on challenging terrains. In this paper, we introduce TRAIL (Traversability with an Implicit Learned Representation), an off-road navigation framework that leverages an implicit neural representation to continuously parameterize terrain properties. This representation yields spatial gradients that enable integration with a novel gradient-based trajectory optimization method that adapts the path geometry and speed profile based on terrain traversability.
comment: 9 pages
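The key mechanism named in the abstract, spatial gradients of a continuous traversability field driving trajectory optimization, can be sketched with a small PyTorch example. The MLP below is an untrained stand-in for the learned representation, and the waypoint objective (field cost plus a smoothness term) is a simplified assumption rather than TRAIL's actual cost, which also shapes the speed profile.

```python
import torch
import torch.nn as nn

# A tiny continuous "traversability field": (x, y) -> scalar cost.
# Purely illustrative stand-in; the paper's representation is trained
# from onboard traversability estimates and uses different inputs/losses.
field = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                      nn.Linear(64, 64), nn.Tanh(),
                      nn.Linear(64, 1))

# Waypoints of an initial straight-line path (start and goal held fixed).
start, goal = torch.tensor([0.0, 0.0]), torch.tensor([5.0, 5.0])
alphas = torch.linspace(0, 1, 12)[1:-1].unsqueeze(1)
waypoints = (start + alphas * (goal - start)).clone().requires_grad_(True)

opt = torch.optim.Adam([waypoints], lr=0.05)
for step in range(200):
    opt.zero_grad()
    traversability_cost = field(waypoints).mean()          # terrain term
    pts = torch.cat([start.unsqueeze(0), waypoints, goal.unsqueeze(0)])
    smoothness = (pts[1:] - pts[:-1]).pow(2).sum()          # path-length term
    loss = traversability_cost + 0.1 * smoothness
    loss.backward()   # spatial gradients of the field flow into the waypoints
    opt.step()

print("optimized waypoints:\n", waypoints.detach())
```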
Dexterous Manipulation through Imitation Learning: A Survey
Dexterous manipulation, which refers to the ability of a robotic hand or multi-fingered end-effector to skillfully control, reorient, and manipulate objects through precise, coordinated finger movements and adaptive force modulation, enables complex interactions similar to human hand dexterity. With recent advances in robotics and machine learning, there is a growing demand for these systems to operate in complex and unstructured environments. Traditional model-based approaches struggle to generalize across tasks and object variations due to the high dimensionality and complex contact dynamics of dexterous manipulation. Although model-free methods such as reinforcement learning (RL) show promise, they require extensive training, large-scale interaction data, and carefully designed rewards for stability and effectiveness. Imitation learning (IL) offers an alternative by allowing robots to acquire dexterous manipulation skills directly from expert demonstrations, capturing fine-grained coordination and contact dynamics while bypassing the need for explicit modeling and large-scale trial-and-error. This survey provides an overview of dexterous manipulation methods based on imitation learning, details recent advances, and addresses key challenges in the field. Additionally, it explores potential research directions to enhance IL-driven dexterous manipulation. Our goal is to offer researchers and practitioners a comprehensive introduction to this rapidly evolving domain.
comment: 32 pages, 6 figures, 9 tables
CaFe-TeleVision: A Coarse-to-Fine Teleoperation System with Immersive Situated Visualization for Enhanced Ergonomics
Teleoperation presents a promising paradigm for remote control and robot proprioceptive data collection. Despite recent progress, current teleoperation systems still suffer from limitations in efficiency and ergonomics, particularly in challenging scenarios. In this paper, we propose CaFe-TeleVision, a coarse-to-fine teleoperation system with immersive situated visualization for enhanced ergonomics. At its core, a coarse-to-fine control mechanism is proposed in the retargeting module to bridge workspace disparities, jointly optimizing efficiency and physical ergonomics. To stream immersive feedback with adequate visual cues for human vision systems, an on-demand situated visualization technique is integrated in the perception module, which reduces the cognitive load for multi-view processing. The system is built on a humanoid collaborative robot and validated with six challenging bimanual manipulation tasks. A user study with 24 participants confirms that CaFe-TeleVision enhances ergonomics with statistical significance, indicating a lower task load and a higher user acceptance during teleoperation. Quantitative results also validate the superior performance of our system across the six tasks, surpassing comparative methods by up to 28.89% in success rate and reducing completion time by up to 26.81%. Project webpage: https://clover-cuhk.github.io/cafe_television/
comment: Project webpage: https://clover-cuhk.github.io/cafe_television/ Code: https://github.com/Zixin-Tang/CaFe-TeleVision
History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments based on linguistic instructions. While successful navigation demands both global environmental reasoning and local scene comprehension, existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects. To address this limitation, this work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline. Specifically, HETT first predicts coarse-grained target positions by fusing spatial landmarks and historical context, then refines actions via fine-grained visual analysis. In addition, a historical grid map is designed to dynamically aggregate visual features into a structured spatial memory, enhancing comprehensive scene awareness. Additionally, the CityNav dataset annotations are manually refined to enhance data quality. Experiments on the refined CityNav dataset show that HETT delivers significant performance gains, while extensive ablation studies further verify the effectiveness of each component.
Fast and Continual Learning for Hybrid Control Policies using Generalized Benders Decomposition
Hybrid model predictive control with both continuous and discrete variables is widely applicable to robotic control tasks, especially those involving contact with the environment. Due to the combinatorial complexity, the solving speed of hybrid MPC can be insufficient for real-time applications. In this paper, we propose a hybrid MPC solver based on Generalized Benders Decomposition (GBD). The algorithm enumerates and stores cutting planes online inside a finite buffer. After a short cold-start phase, the stored cuts provide warm-starts for new problem instances to enhance the solving speed, which is maintained despite disturbances and a randomly changing environment. Leveraging the sparsity of feasibility cuts, we also propose a fast algorithm for the Benders master problems. Our solver is validated through controlling a cart-pole system with randomly moving soft contact walls, and a free-flying robot navigating around obstacles. The results show that, with significantly less data than previous works, the solver reaches speeds competitive with the off-the-shelf solver Gurobi despite the Python overhead.
comment: This paper has been withdrawn by the author. It has been superseded by a significantly updated version available at arXiv:2406.00780
Expert Switching for Robust AAV Landing: A Dual-Detector Framework in Simulation
Reliable helipad detection is essential for Autonomous Aerial Vehicle (AAV) landing, especially under GPS-denied or visually degraded conditions. While modern detectors such as YOLOv8 offer strong baseline performance, single-model pipelines struggle to remain robust across the extreme scale transitions that occur during descent, where helipads appear small at high altitude and large near touchdown. To address this limitation, we propose a scale-adaptive dual-expert perception framework that decomposes the detection task into far-range and close-range regimes. Two YOLOv8 experts are trained on scale-specialized versions of the HelipadCat dataset, enabling one model to excel at detecting small, low-resolution helipads and the other to provide high-precision localization when the target dominates the field of view. During inference, both experts operate in parallel, and a geometric gating mechanism selects the expert whose prediction is most consistent with the AAV's viewpoint. This adaptive routing prevents the degradation commonly observed in single-detector systems when operating across wide altitude ranges. The dual-expert perception module is evaluated in a closed-loop landing environment that integrates CARLA's photorealistic rendering with NASA's GUAM flight-dynamics engine. Results show substantial improvements in alignment stability, landing accuracy, and overall robustness compared to single-detector baselines. By introducing a scale-aware expert routing strategy tailored to the landing problem, this work advances resilient vision-based perception for autonomous descent and provides a foundation for future multi-expert AAV frameworks.
comment: To be Published in AIAA SciTech 2026
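A minimal sketch of the scale-adaptive routing idea follows. The two experts are stand-in callables rather than YOLOv8 models, and the gate, an expected apparent helipad size computed from altitude with a pinhole approximation, is an assumed simplification of the paper's geometric gating mechanism.

```python
def expected_pixel_width(altitude_m, helipad_diameter_m=9.0, focal_px=800.0):
    """Pinhole approximation of the helipad's apparent width at nadir."""
    return focal_px * helipad_diameter_m / max(altitude_m, 1e-3)

def route_detection(far_expert, close_expert, image, altitude_m, switch_px=64.0):
    """Run both experts and keep the prediction most consistent with the
    AAV's viewpoint. The real gate in the paper is geometric; the simple
    expected-size rule below is an illustrative assumption."""
    far_det = far_expert(image)      # each returns (bbox_width_px, confidence) or None
    close_det = close_expert(image)
    exp_w = expected_pixel_width(altitude_m)
    if exp_w < switch_px:            # target should look small: trust the far expert
        preferred, fallback = far_det, close_det
    else:
        preferred, fallback = close_det, far_det
    return preferred if preferred is not None else fallback

# Toy stand-ins for the two scale-specialized detectors (hypothetical outputs).
far_expert = lambda img: (22.0, 0.81)
close_expert = lambda img: (310.0, 0.95)
print(route_detection(far_expert, close_expert, image=None, altitude_m=120.0))
```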
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
Large language models leverage internet-scale text data, yet embodied AI remains constrained by the prohibitive costs of physical trajectory collection. Desktop environments -- particularly gaming -- offer a compelling alternative: they provide rich sensorimotor interactions at scale while maintaining the structured observation-action coupling essential for embodied learning. We present D2E (Desktop to Embodied AI), a framework that demonstrates desktop interactions can serve as an effective pretraining substrate for embodied AI tasks in robotics. Unlike prior work that remained domain-specific (e.g., VPT for Minecraft) or kept data proprietary (e.g., SIMA), D2E establishes a complete pipeline from scalable desktop data collection to verified transfer in embodied domains. Our framework comprises three components: (1) the OWA Toolkit that unifies diverse desktop interactions into a standardized format with 152x compression, (2) the Generalist-IDM that achieves strong zero-shot generalization across unseen games through timestamp-based event prediction, enabling internet-scale pseudo-labeling, and (3) VAPT that transfers desktop-pretrained representations to physical manipulation and navigation. Using 1.3K+ hours of data (259 hours of human demonstrations, and 1K+ hours of pseudo-labeled gameplay), we achieve a 96.6% success rate on LIBERO manipulation and 83.3% on CANVAS navigation benchmarks. This validates that sensorimotor primitives in digital interactions exhibit sufficient invariance to transfer meaningfully to physical embodied tasks, establishing desktop pretraining as a practical paradigm for robotics. We will make all our work public, including the OWA toolkit, the human-collected and pseudo-labeled datasets, and the VAPT-trained models, at https://worv-ai.github.io/d2e/
Multiagent Systems
Mapis: A Knowledge-Graph Grounded Multi-Agent Framework for Evidence-Based PCOS Diagnosis
Polycystic Ovary Syndrome (PCOS) constitutes a significant public health issue affecting 10% of reproductive-aged women, highlighting the critical importance of developing effective diagnostic tools. Previous machine learning and deep learning detection tools are constrained by their reliance on large-scale labeled data and a lack of interpretability. Although multi-agent systems have demonstrated robust capabilities, the potential of such systems for PCOS detection remains largely unexplored. Existing medical multi-agent frameworks are predominantly designed for general medical tasks, suffering from insufficient domain integration and a lack of specific domain knowledge. To address these challenges, we propose Mapis, the first knowledge-grounded multi-agent framework explicitly designed for guideline-based PCOS diagnosis. Specifically, it builds on the 2023 International Guideline, translating it into a structured collaborative workflow that simulates the clinical diagnostic process. It decouples complex diagnostic tasks across specialized agents: a gynecological endocrine agent and a radiology agent collaborate to verify inclusion criteria, while an exclusion agent strictly rules out other causes. Furthermore, we construct a comprehensive PCOS knowledge graph to ensure verifiable, evidence-based decision-making. Extensive experiments on public benchmarks and specialized clinical datasets, benchmarking against nine diverse baselines, demonstrate that Mapis significantly outperforms competitive methods. On the clinical dataset, it surpasses traditional machine learning models by 13.56%, single-agent baselines by 6.55%, and previous medical multi-agent systems by 7.05% in accuracy.
Epistemic diversity across language models mitigates knowledge collapse
The growing use of artificial intelligence (AI) raises concerns of knowledge collapse, i.e., a reduction to the most dominant and central set of ideas. Prior work has demonstrated single-model collapse, defined as performance decay in an AI model trained on its own output. Inspired by ecology, we ask whether AI ecosystem diversity, that is, diversity among models, can mitigate such a collapse. We build on the single-model approach but focus on ecosystems of models trained on their collective output. To study the effect of diversity on model performance, we segment the training data across language models and evaluate the resulting ecosystems over ten self-training iterations. We find that increased epistemic diversity mitigates collapse, but, interestingly, only up to an optimal level. Our results suggest that an ecosystem containing only a few diverse models fails to express the rich mixture of the full, true distribution, resulting in rapid performance decay. Yet distributing the data across too many models reduces each model's approximation capacity on the true distribution, leading to poor performance already in the first iteration step. In the context of AI monoculture, our results suggest the need to monitor diversity across AI systems and to develop policies that incentivize more domain- and community-specific models.
comment: 16 pages, 7 figures
What Is Your AI Agent Buying? Evaluation, Biases, Model Dependence, & Emerging Implications for Agentic E-Commerce
Online marketplaces will be transformed by autonomous AI agents acting on behalf of consumers. Rather than humans browsing and clicking, AI agents can parse webpages or leverage APIs to view, evaluate and choose products. We investigate the behavior of AI agents using ACES, a provider-agnostic framework for auditing agent decision-making. We reveal that agents can exhibit choice homogeneity, often concentrating demand on a few "modal" products while ignoring others entirely. Yet, these preferences are unstable: model updates can drastically reshuffle market shares. Furthermore, randomized trials show that while agents have improved over time on simple tasks with a clearly identified best choice, they exhibit strong position biases -- varying across providers and model versions, and persisting even in text-only "headless" interfaces -- undermining any universal notion of a "top" rank. Agents also consistently penalize sponsored tags while rewarding platform endorsements, and sensitivities to price, ratings, and reviews vary sharply across models. Finally, we demonstrate that sellers can respond: a seller-side agent making simple, query-conditional description tweaks can drive significant gains in market share. These findings reveal that agentic markets are volatile and fundamentally different from human-centric commerce, highlighting the need for continuous auditing and raising questions for platform design, seller strategy and regulation.
EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery
Large language models (LLMs) are increasingly used to evolve programs and multi-agent systems, yet most existing approaches rely on overwrite-based mutations that maintain only a single candidate at a time. Such methods discard useful variants, suffer from destructive edits, and explore a brittle search space prone to structural failure. We introduce EvoLattice, a framework that represents an entire population of candidate programs or agent behaviors within a single directed acyclic graph. Each node stores multiple persistent alternatives, and every valid path through the graph defines a distinct executable candidate, yielding a large combinatorial search space without duplicating structure. EvoLattice enables fine-grained alternative-level evaluation by scoring each alternative across all paths in which it appears, producing statistics that reveal how local design choices affect global performance. These statistics provide a dense, data-driven feedback signal for LLM-guided mutation, recombination, and pruning, while preserving successful components. Structural correctness is guaranteed by a deterministic self-repair mechanism that enforces acyclicity and dependency consistency independently of the LLM. EvoLattice naturally extends to agent evolution by interpreting alternatives as prompt fragments or sub-agent behaviors. Across program synthesis (proxy and optimizer meta-learning), EvoLattice yields more stable evolution, greater expressivity, and stronger improvement trajectories than prior LLM-guided methods. The resulting dynamics resemble quality-diversity optimization, emerging implicitly from EvoLattice's internal multi-alternative representation rather than an explicit external archive.
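The multi-alternative representation can be illustrated on a toy chain of nodes: every choice of one alternative per node is an executable candidate, and each alternative is scored by averaging over all paths that contain it. The sketch below makes these two ideas concrete; the node names, code fragments, and fitness function are invented for illustration, and the general DAG structure, LLM-guided mutation, and self-repair mechanism are not reproduced.

```python
from itertools import product
from statistics import mean

# Each node holds several persistent alternatives (here: code fragments).
# A candidate program is one choice per node; a chain keeps the sketch simple,
# whereas the full framework uses a general DAG with dependency-aware paths.
lattice = {
    "init":   ["x = 1.0", "x = 0.5"],
    "update": ["x = x * 2", "x = x + 3", "x = x ** 2"],
    "output": ["result = x", "result = -x"],
}

def evaluate(path):
    """Toy fitness: execute the fragments and score the produced value."""
    env = {}
    for fragment in path:
        exec(fragment, {}, env)
    return -abs(env["result"] - 4.0)   # best when result is close to 4

nodes = list(lattice)
paths = list(product(*(range(len(lattice[n])) for n in nodes)))
scores = {p: evaluate([lattice[n][i] for n, i in zip(nodes, p)]) for p in paths}

# Alternative-level statistics: average score over every path that uses it.
for node in nodes:
    for i, alt in enumerate(lattice[node]):
        alt_scores = [s for p, s in scores.items() if p[nodes.index(node)] == i]
        print(f"{node}[{i}] '{alt}': mean score {mean(alt_scores):.2f}")
```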
TrafficGamer: Reliable and Flexible Traffic Simulation for Safety-Critical Scenarios with Game-Theoretic Oracles
While modern Autonomous Vehicle (AV) systems can develop reliable driving policies under regular traffic conditions, they frequently struggle with safety-critical traffic scenarios. This difficulty primarily arises from the rarity of such scenarios in driving datasets and the complexities associated with predictive modeling of multiple vehicles. Effectively simulating safety-critical traffic situations is therefore a crucial challenge. In this paper, we introduce TrafficGamer, which facilitates game-theoretic traffic simulation by viewing common road driving as a multi-agent game. When we evaluate the empirical performance across various real-world datasets, TrafficGamer ensures the fidelity, exploitability, and diversity of the simulated scenarios, guaranteeing that they not only statistically align with the real-world traffic distribution but also capture equilibria representing safety-critical scenarios involving multiple agents more effectively than other methods. Additionally, the results demonstrate that TrafficGamer provides highly flexible simulations across various contexts. Specifically, we demonstrate that the generated scenarios can dynamically adapt to equilibria of varying tightness by configuring risk-sensitive constraints during optimization. We have provided a demo webpage at: https://anonymous.4open.science/api/repo/trafficgamer-demo-1EE0/file/index.html.
Parallelism Meets Adaptiveness: Scalable Documents Understanding in Multi-Agent LLM Systems AAAI 2026
Large language model (LLM) agents have shown increasing promise for collaborative task completion. However, existing multi-agent frameworks often rely on static workflows, fixed roles, and limited inter-agent communication, reducing their effectiveness in open-ended, high-complexity domains. This paper proposes a coordination framework that enables adaptiveness through three core mechanisms: dynamic task routing, bidirectional feedback, and parallel agent evaluation. The framework allows agents to reallocate tasks based on confidence and workload, exchange structured critiques to iteratively improve outputs, and crucially compete on high-ambiguity subtasks with evaluator-driven selection of the most suitable result. We instantiate these principles in a modular architecture and demonstrate substantial improvements in factual coverage, coherence, and efficiency over static and partially adaptive baselines. Our findings highlight the benefits of incorporating both adaptiveness and structured competition in multi-agent LLM systems.
comment: Accepted at AAAI 2026 Workshop on WoMAPF
Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, for example via GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from single-agent RL baselines of 14.0 to 47.0 percent to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
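A minimal sketch of the agent- and turn-wise grouping idea is shown below: rewards are normalized only within rollouts that share the same (agent role, turn) key, since prompts differ across roles and turns. The field names and the normalization are illustrative assumptions; the full AT-GRPO objective, clipping, and training system are not reproduced.

```python
from collections import defaultdict
import numpy as np

def grouped_advantages(samples, eps=1e-8):
    """samples: list of dicts with 'agent', 'turn', 'reward'.
    Returns a per-sample advantage normalized within its (agent, turn) group,
    mirroring the grouping idea: prompts differ per role and per turn, so a
    single global group would mix incomparable rollouts."""
    groups = defaultdict(list)
    for idx, s in enumerate(samples):
        groups[(s["agent"], s["turn"])].append(idx)
    adv = np.zeros(len(samples))
    for idxs in groups.values():
        r = np.array([samples[i]["reward"] for i in idxs], dtype=float)
        adv[idxs] = (r - r.mean()) / (r.std() + eps)
    return adv

rollouts = [
    {"agent": "planner", "turn": 0, "reward": 1.0},
    {"agent": "planner", "turn": 0, "reward": 0.0},
    {"agent": "coder",   "turn": 1, "reward": 0.3},
    {"agent": "coder",   "turn": 1, "reward": 0.9},
]
print(grouped_advantages(rollouts))
```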
Improved High-probability Convergence Guarantees of Decentralized SGD
Convergence in high probability (HP) has been receiving increasing interest, due to its attractive properties, such as exponentially decaying tail bounds and strong guarantees for each individual run of an algorithm. While HP guarantees are extensively studied in centralized settings, much less is understood in the decentralized, networked setup. Existing HP studies in decentralized settings impose strong assumptions, like uniformly bounded gradients, or asymptotically vanishing noise, resulting in a significant gap between assumptions used to establish convergence in the HP and the mean-squared error (MSE) sense, even for the vanilla Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm. This is contrary to centralized settings, where it is known that $\mathtt{SGD}$ converges in HP under the same conditions on the cost function as needed to guarantee MSE convergence. Motivated by this observation, we revisit HP guarantees for $\mathtt{DSGD}$ in the presence of light-tailed noise. We show that $\mathtt{DSGD}$ converges in HP under the same conditions on the cost as in the MSE sense, removing uniformly bounded gradients and other restrictive assumptions, while simultaneously achieving order-optimal rates for both non-convex and strongly convex costs. Moreover, our improved analysis yields linear speed-up in the number of users, demonstrating that $\mathtt{DSGD}$ maintains strong performance in the HP sense and matches existing MSE guarantees. Our improved results stem from a careful analysis of the moment generating function (MGF) of quantities of interest (norm-squared of gradient or optimality gap) and the MGF of the consensus gap between users' models. To achieve linear speed-up, we provide a novel result on the variance-reduction effect of decentralized methods in the HP sense and more fine-grained bounds on the MGF for strongly convex costs, which are both of independent interest.
comment: 48 pages, 2 figures
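For reference, the sketch below implements the vanilla DSGD iteration whose high-probability behavior the paper analyzes: each node mixes its neighbors' models through a doubly stochastic matrix and takes a local stochastic gradient step. The ring topology, quadratic local objectives, and step size are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim, steps, alpha = 8, 5, 300, 0.05

# Local least-squares objectives f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = rng.standard_normal((n_nodes, 10, dim))
x_star = rng.standard_normal(dim)
b = A @ x_star

# Doubly stochastic mixing matrix for an undirected ring (self + two neighbors).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

x = np.zeros((n_nodes, dim))
for t in range(steps):
    # Stochastic gradient: each node samples one local data row per step.
    g = np.zeros_like(x)
    for i in range(n_nodes):
        j = rng.integers(10)
        g[i] = (A[i, j] @ x[i] - b[i, j]) * A[i, j]
    x = W @ x - alpha * g          # DSGD: consensus mixing + local SGD step

print("mean distance to optimum:", np.linalg.norm(x - x_star, axis=1).mean())
```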
Systems and Control (EESS)
Multi-Modal Semantic Communication
Semantic communication aims to transmit information most relevant to a task rather than raw data, offering significant gains in communication efficiency for applications such as telepresence, augmented reality, and remote sensing. Recent transformer-based approaches have used self-attention maps to identify informative regions within images, but they often struggle in complex scenes with multiple objects, where self-attention lacks explicit task guidance. To address this, we propose a novel Multi-Modal Semantic Communication framework that integrates text-based user queries to guide the information extraction process. Our proposed system employs a cross-modal attention mechanism that fuses visual features with language embeddings to produce soft relevance scores over the visual data. Based on these scores and the instantaneous channel bandwidth, we use an algorithm to transmit image patches at adaptive resolutions using independently trained encoder-decoder pairs, with total bitrate matching the channel capacity. At the receiver, the patches are reconstructed and combined to preserve task-critical information. This flexible and goal-driven design enables efficient semantic communication in complex and bandwidth-constrained environments.
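The relevance-and-bandwidth-driven transmission step can be sketched as a simple greedy rate allocation: every patch gets the lowest rate, and the remaining budget upgrades the most relevant patches first. The rate levels, scores, and budget below are placeholders; the actual system derives the scores from cross-modal attention and uses independently trained encoder-decoder pairs per resolution.

```python
import numpy as np

def allocate_rates(relevance, budget_bits, rate_levels=(256, 1024, 4096)):
    """Greedy rate allocation: every patch starts at the lowest rate; remaining
    budget upgrades patches in order of decreasing relevance. Rate levels and
    scores are illustrative placeholders, not the paper's trained codecs."""
    n = len(relevance)
    rates = np.full(n, rate_levels[0])
    spent = rates.sum()
    for idx in np.argsort(relevance)[::-1]:          # most relevant first
        for level in rate_levels[1:]:
            extra = level - rates[idx]
            if spent + extra <= budget_bits:
                rates[idx] = level
                spent += extra
    return rates

scores = np.array([0.9, 0.1, 0.6, 0.05, 0.3])        # soft relevance per patch
print(allocate_rates(scores, budget_bits=8000))
```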
Service-Oriented Fast Frequency Response from Flexible Loads and Energy Storage in Low-Inertia Power Systems
The increasing penetration of inverter-based renewable generation has significantly reduced system inertia, making modern power grids more vulnerable to rapid frequency deviations following disturbances. While a wide range of flexible resources, including electric vehicles (EVs), data centers, and battery energy storage systems (BESS), has demonstrated the physical capability to provide fast frequency response (FFR), existing studies primarily focus on individual resource performance or controller-level designs. A systematic framework that translates heterogeneous FFR capabilities into deployable, system-level frequency services remains largely unexplored. This paper proposes a service-oriented coordination framework for fast frequency response from flexible loads and energy storage, bridging the gap between physical capability assessment and grid-operational utilization. The framework decomposes frequency support into multiple time-critical service layers based on response speed, power capacity, and energy sustainability, and dynamically allocates FFR responsibilities among heterogeneous resources accordingly. By explicitly accounting for response latency, saturation limits, and energy constraints, the proposed approach enables coordinated dispatch that prioritizes ultra-fast resources for initial frequency arrest while leveraging slower but energy-rich resources to sustain recovery.
Enhancing industrial microalgae production through Economic Model Predictive Control
The industrial production of microalgae is an important and sustainable process, but its competitiveness depends closely on how well it is optimized. The biological nature of the process hinders this task, mainly due to its high nonlinearity and changing nature, features that make its modeling, control, and optimization remarkably challenging. This paper presents an economic optimization framework aiming to enhance the operation of such systems. An Economic Model Predictive Controller is proposed, centralizing decision making and achieving theoretically optimal operation. Different scenarios with changing climate conditions are presented, and a comparison with the typical, non-optimized industrial process operation is established. The obtained results achieve economic optimization and dynamic stability of the process, while providing some insight into the priorities during process operation at industrial level and justifying the use of optimal controllers over traditional operation.
Operator-Theoretic Joint Estimation of Aging-Aware State of Charge and Control-Informed State of Health
Accurate estimation of a battery's state of charge and state of health is essential for safe and reliable battery management. Existing approaches often decouple these two states, lack stability guarantees, and exhibit limited generalization across operating conditions. This study introduces a unified operator-theoretic framework for aging-aware state of charge and control-informed state of health estimation. The architecture couples a Koopman-based latent dynamics model, which enables linear forecasting of nonlinear discharge-capacity evolution under varying operational conditions, with a neural operator that maps measurable intra-cycle signals to state of charge. The predicted discharge capacity is incorporated as a static correction within the neural operator pathway, yielding an age-aware state of charge estimate. Stability is ensured through spectral-radius clipping of the Koopman operator. The overall framework is trained end-to-end and evaluated on real-world lithium-ion battery datasets, demonstrating real-time capability while maintaining stable dynamics. To handle condition shifts and unseen regimes, the method integrates both zero-shot and few-shot out-of-distribution adaptation using only a limited number of cycles. Results show accurate and stable capacity forecasts, competitive state of charge trajectories on held-out cycles, and a direct, model-consistent mechanism for tracking capacity fade as a surrogate for state of health across diverse operating conditions.
comment: This paper has been submitted to IEEE for publication
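The stability mechanism named in the abstract, spectral-radius clipping of the Koopman operator, can be sketched in a few lines: if the learned matrix has spectral radius above a bound, rescale it so multi-step latent forecasts cannot diverge. The matrix below is random rather than learned, and the bound is an assumed hyperparameter.

```python
import numpy as np

def clip_spectral_radius(K, rho_max=0.999):
    """Rescale a learned Koopman matrix so its spectral radius is <= rho_max,
    keeping multi-step latent forecasts z_{t+1} = K z_t from blowing up."""
    rho = np.max(np.abs(np.linalg.eigvals(K)))
    return K if rho <= rho_max else K * (rho_max / rho)

rng = np.random.default_rng(0)
K = rng.standard_normal((6, 6)) * 0.6          # stand-in for a learned operator
K_stable = clip_spectral_radius(K)
print("before:", np.max(np.abs(np.linalg.eigvals(K))),
      "after:", np.max(np.abs(np.linalg.eigvals(K_stable))))
```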
Scheduling the Charge of Temporally Flexible Electric Vehicles: a Market-based Approach
The increasing electrification of human activities and the rapid integration of variable renewable energy sources strain the power grid. A solution to address the need for more grid storage is to use electric vehicle batteries as back-up capacity. However, drivers tend to disconnect their electric vehicle when its battery is needed the most. We propose a charge scheduler that incentivizes drivers to delay their disconnection to improve vehicle-to-grid services. We also leverage drivers' temporal flexibility to alleviate congestion in oversubscribed charging stations. We formulate the computation of an optimal flexible schedule as a mixed-integer quadratic problem. We tractably approximate its solution using the Alternating Direction Method of Multipliers. Considering the possibility that strategic drivers misreport their charging preferences to the station coordinator, we then propose a Vickrey-Clarke-Groves mechanism that incentivizes truthful reporting. We conclude with a simulated case study using real-world data to quantitatively assess the added value of drivers' temporal flexibility for enhancing vehicle-to-grid services and reducing station congestion.
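A toy version of the Vickrey-Clarke-Groves step can clarify the incentive logic: with k identical charging slots and unit-demand drivers, the welfare-maximizing allocation picks the k highest reported values, and each winner pays the externality it imposes on the others, which here reduces to the (k+1)-th highest report. The sketch below implements only this simplified setting; the paper's mechanism operates on full flexible schedules.

```python
def vcg_identical_slots(bids, k):
    """k identical charging slots, each driver wants one slot and reports a value.
    Welfare-maximizing allocation = top-k bidders; VCG payment = welfare others
    would gain if the winner left, i.e. the (k+1)-th highest reported value.
    A toy stand-in for the paper's schedule-level mechanism."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winners = order[:k]
    # Externality on others: with a winner removed, the (k+1)-th bidder wins.
    price = bids[order[k]] if len(bids) > k else 0.0
    return {i: price for i in winners}

reported_values = [12.0, 7.5, 9.0, 3.0, 10.5]   # hypothetical driver reports
print(vcg_identical_slots(reported_values, k=3))
```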
Exact Learning of Linear Model Predictive Control Laws using Oblique Decision Trees with Linear Predictions
Model Predictive Control (MPC) is a powerful strategy for constrained multivariable systems but faces computational challenges in real-time deployment due to its online optimization requirements. While explicit MPC and neural network approximations mitigate this burden, they suffer from scalability issues or lack interpretability, limiting their applicability in safety-critical systems. This work introduces a data-driven framework that directly learns the Linear MPC control law from sampled state-action pairs using Oblique Decision Trees with Linear Predictions (ODT-LP), achieving both computational efficiency and interpretability. By leveraging the piecewise affine structure of Linear MPC, we prove that the Linear MPC control law can be replicated by finite-depth ODT-LP models. We develop a gradient-based training algorithm using smooth approximations of tree routing functions to learn this structure from grid-sampled Linear MPC solutions, enabling end-to-end optimization. Input-to-state stability is established under bounded approximation errors, with explicit error decomposition into learning inaccuracies and sampling errors to inform model design. Numerical experiments demonstrate that ODT-LP controllers match MPC's closed-loop performance while reducing online evaluation time by orders of magnitude compared to MPC, explicit MPC, neural network, and random forest counterparts. The transparent tree structure enables formal verification of control logic, bridging the gap between computational efficiency and certifiable reliability for safety-critical systems.
comment: 6 pages, 4 figures, accepted by and presented at the 64th IEEE Conference on Decision and Control (CDC) in December 2025
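The structure being learned, an oblique tree with sigmoid-smoothed routing and linear control laws at the leaves, can be sketched compactly in PyTorch. The depth, temperature, and initialization below are assumptions, and the sketch shows only the differentiable forward map; fitting it to grid-sampled MPC solutions and the stability analysis are omitted.

```python
import torch

class ODTLP(torch.nn.Module):
    """Oblique decision tree with linear leaf predictions and sigmoid-smoothed
    routing: each internal node splits on an affine function of the state, and
    each leaf outputs an affine control law u = A_leaf x + b_leaf. Because the
    routing is smooth, the whole map is differentiable and can in principle be
    fit by gradient descent to sampled state-action pairs from a linear MPC law."""
    def __init__(self, state_dim, control_dim, depth=3, temperature=10.0):
        super().__init__()
        self.depth, self.temp = depth, temperature
        n_internal, n_leaves = 2 ** depth - 1, 2 ** depth
        self.split_w = torch.nn.Parameter(0.1 * torch.randn(n_internal, state_dim))
        self.split_b = torch.nn.Parameter(torch.zeros(n_internal))
        self.leaf_A = torch.nn.Parameter(0.1 * torch.randn(n_leaves, control_dim, state_dim))
        self.leaf_b = torch.nn.Parameter(torch.zeros(n_leaves, control_dim))

    def forward(self, x):                                  # x: (batch, state_dim)
        batch = x.shape[0]
        prob = torch.ones(batch, 1)                        # reach-probabilities, level by level
        for level in range(self.depth):
            idx = torch.arange(2 ** level - 1, 2 ** (level + 1) - 1)
            p_right = torch.sigmoid(self.temp * (x @ self.split_w[idx].T + self.split_b[idx]))
            prob = torch.stack([prob * (1 - p_right), prob * p_right], dim=-1).reshape(batch, -1)
        leaf_out = torch.einsum('lcs,bs->blc', self.leaf_A, x) + self.leaf_b
        return torch.einsum('bl,blc->bc', prob, leaf_out)  # soft mixture of leaf laws

policy = ODTLP(state_dim=4, control_dim=1)
u = policy(torch.randn(5, 4))
print(u.shape)                                             # torch.Size([5, 1])
```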
Ising Machines for Model Predictive Path Integral-Based Optimal Control
We present a sampling-based Model Predictive Control (MPC) method that implements Model Predictive Path Integral (MPPI) as an Ising machine, suitable for novel forms of probabilistic computing. By expressing the control problem as a Quadratic Unconstrained Binary Optimization (QUBO) problem, we map MPC onto an energy landscape suitable for Gibbs sampling from an Ising model. This formulation enables efficient exploration of (near-)optimal control trajectories. We demonstrate that the approach achieves accurate trajectory tracking compared to a reference MPPI implementation, highlighting the potential of Ising-based MPPI for real-time control in robotics and autonomous systems.
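A minimal sketch of the mapping can be given for a one-dimensional integrator with bang-bang inputs encoded as binary variables: the quadratic tracking cost is then a QUBO in those variables, and Gibbs sampling over the resulting Ising-style energy landscape yields (near-)optimal control sequences. The dynamics, horizon, and inverse temperature below are illustrative, and the energy is evaluated by simulation rather than by forming the Q matrix explicitly as the paper does.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dt, target, beta = 12, 0.2, 1.5, 4.0   # horizon, step, setpoint, inverse temperature

def energy(z, x0=0.0):
    """Rollout cost of a bang-bang input sequence u_t = 2*z_t - 1 on a single
    integrator. Dynamics are linear and the cost quadratic, so E(z) is a QUBO
    in the binary variables z (evaluated here by simulation)."""
    x, cost = x0, 0.0
    for zt in z:
        x = x + (2 * zt - 1) * dt
        cost += (x - target) ** 2
    return cost

# Gibbs sampling over the Ising-style energy landscape.
z = rng.integers(0, 2, size=T)
for sweep in range(200):
    for i in range(T):
        e = np.array([energy(np.concatenate([z[:i], [b], z[i+1:]])) for b in (0, 1)])
        p1 = 1.0 / (1.0 + np.exp(-beta * (e[0] - e[1])))   # P(z_i = 1)
        z[i] = rng.random() < p1

u = 2 * z - 1
print("sampled control sequence:", u, "cost:", round(energy(z), 3))
```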
QuantGraph: A Receding-Horizon Quantum Graph Solver
Dynamic programming is a cornerstone of graph-based optimization. While effective, it scales unfavorably with problem size. In this work, we present QuantGraph, a two-stage quantum-enhanced framework that casts local and global graph-optimization problems as quantum searches over discrete trajectory spaces. The solver is designed to operate efficiently by first finding a sequence of locally optimal transitions in the graph (local stage), without considering full trajectories. The accumulated cost of these transitions acts as a threshold that prunes the search space (up to 60% reduction for certain examples). The subsequent global stage, based on this threshold, refines the solution. Both stages utilize variants of the Grover-adaptive-search algorithm. To achieve scalability and robustness, we draw on principles from control theory and embed QuantGraph's global stage within a receding-horizon model-predictive-control scheme. This classical layer stabilizes and guides the quantum search, improving precision and reducing computational burden. In practice, the resulting closed-loop system exhibits robust behavior and lower overall complexity. Notably, for a fixed query budget, QuantGraph attains a 2x increase in control-discretization precision while still benefiting from Grover-search's inherent quadratic speedup compared to classical methods.
comment: P.Vaidhyanathan and A. Papatheodorou contributed equally to this work. 11 pages, 4 figures, 1 table, 2 algorithms
A Container-based Approach For Proactive Asset Administration Shell Digital Twins
In manufacturing, digital twins, realized as Asset Administration Shells (AAS), have emerged as a prevalent practice. These digital replicas, often utilized as structured repositories of asset-related data, facilitate interoperability across diverse systems. However, extant approaches treat the AAS as a static information model, lacking support for dynamic service integration and system adaptation. The existing body of literature has not yet thoroughly explored the potential for integrating executable behavior, particularly in the form of containerized services, into or from the AAS. This integration could serve to enable proactive functionality. In this paper, we propose a submodel-based architecture that introduces a structured service notion to the AAS, enabling services to dynamically interact with and adapt AAS instances at runtime. This concept is implemented through the extension of a submodel with behavioral definitions, resulting in a modular event-driven architecture capable of deploying containerized services based on embedded trigger conditions. The approach is illustrated through a case study on a 3-axis milling machine. Our contribution enables the AAS to serve not only as a passive digital representation but also as an active interface for executing added-value services, thereby laying the foundation for future AI-driven adaptation and system-level intelligence in digital twin environments.
Lithium-ion battery degradation: Introducing the concept of reservoirs to design for lifetime
Designing lithium-ion batteries for long service life remains a challenge, as most cells are optimized for beginning-of-life metrics such as energy density, often overlooking how design and operating conditions shape degradation. This work introduces a degradation-aware design framework built around finite, interacting reservoirs (lithium, porosity, and electrolyte) that are depleted over time by coupled degradation processes. We extend a physics-based Doyle-Fuller-Newman model to include validated mechanisms such as SEI growth, lithium plating, cracking, and solvent dry-out, and simulate how small design changes impact lifetime. Across more than 1,000 cycles, we find that increasing electrolyte volume by just 1% or porosity by 5% can extend service life by over 30% without significantly affecting cell energy density. However, lithium excess, while boosting initial capacity, can accelerate failure if not supported by sufficient structural or ionic buffers. Importantly, we show that interaction between reservoirs is crucial to optimal design: multi-reservoir tuning yields either synergistic benefits or compound failures, depending on operating conditions. We also quantify how C-rate and operating temperature influence degradation pathways, emphasizing the need for co-optimized design and usage profiles. By reframing degradation as a problem of managing finite internal reservoirs, this work offers a predictive and mechanistic foundation for designing lithium-ion batteries that balance energy, durability, and application-specific needs.
comment: 20 Pages, 5 figures
Remotely Detectable Robot Policy Watermarking
The success of machine learning for real-world robotic systems has created a new form of intellectual property: the trained policy. This raises a critical need for novel methods that verify ownership and detect unauthorized, possibly unsafe misuse. While watermarking is established in other domains, physical policies present a unique challenge: remote detection. Existing methods assume access to the robot's internal state, but auditors are often limited to external observations (e.g., video footage). This "Physical Observation Gap" means the watermark must be detected from signals that are noisy, asynchronous, and filtered by unknown system dynamics. We formalize this challenge using the concept of a glimpse sequence, and introduce Colored Noise Coherency (CoNoCo), the first watermarking strategy designed for remote detection. CoNoCo embeds a spectral signal into the robot's motions by leveraging the policy's inherent stochasticity. To show it does not degrade performance, we prove CoNoCo preserves the marginal action distribution. Our experiments demonstrate strong, robust detection across various remote modalities, including motion capture and side-view/top-down video footage, in both simulated and real-world robot experiments. This work provides a necessary step toward protecting intellectual property in robotics, offering the first method for validating the provenance of physical policies non-invasively, using purely remote observations.
Consensus-based formation of a swarm of quadrotors interacting over ring digraphs
This work proposes a cooperative strategy for a group of quadrotors interacting over ring digraphs with macro-vertices of size two. Consensus for a group of general double integrators is first investigated, and it is proved that, through a suitable choice of a single controller parameter, consensus and stability of the resulting networked dynamical system can be ensured. This further opens up the possibility of achieving a desired formation and moving a swarm of quadrotors, interacting over ring digraphs, at a desired flight velocity using a single controller gain. An analysis of achievable velocities is performed. Examples are provided to offer deeper insights into the obtained analytical results. Simulation studies clearly demonstrate that a desired formation is achieved, starting from arbitrary initial positions, while also ensuring convergence to a final desired flight velocity.
Remote Magnetic Levitation Using Reduced Attitude Control and Parametric Field Models
Electromagnetic navigation systems (eMNS) are increasingly used in minimally invasive procedures such as endovascular interventions and targeted drug delivery due to their ability to generate fast and precise magnetic fields. In this paper, we utilize the OctoMag eMNS to achieve remote levitation and control of a rigid body across large air gaps which showcases the dynamic capabilities of clinical eMNS. A compact parametric analytical model maps coil currents to the forces and torques acting on the levitating object, eliminating the need for computationally expensive simulations or lookup tables and leading to a levitator agnostic modeling approach. Translational motion is stabilized using linear quadratic regulators. A nonlinear time-invariant controller is used to regulate the reduced attitude accounting for the inherent uncontrollability of rotations about the dipole axis and stabilizing the full five degrees of freedom controllable pose subspace. We analyze key design limitations and evaluate the approach through trajectory tracking experiments. This work demonstrates the dynamic capabilities and potential of feedback control in electromagnetic navigation, which is likely to open up new medical applications.
Historical Information Accelerates Decentralized Optimization: A Proximal Bundle Method
Historical information, such as past function values or gradients, has significant potential to enhance decentralized optimization methods for two key reasons: first, it provides richer information about the objective function, which also explains its established success in centralized optimization; second, unlike the second-order derivative or its alternatives, historical information has already been computed or communicated and requires no additional cost to acquire. Despite this potential, it remains underexploited. In this work, we employ a proximal bundle framework to incorporate the function values and gradients at historical iterates and adapt the framework to the proximal decentralized gradient descent method, resulting in a Decentralized Proximal Bundle Method (DPBM). To broaden its applicability, we further extend DPBM to the asynchronous and stochastic setting. We theoretically analyze the convergence of the proposed methods. Notably, both the asynchronous DPBM and its stochastic variant can converge with fixed step-sizes that are independent of delays, which is superior to the delay-dependent step-sizes required by most existing asynchronous optimization methods, as such step-sizes are easier to determine and often lead to faster convergence. Numerical experiments on classification problems demonstrate that, by using historical information, our methods yield faster convergence and stronger robustness in the step-sizes.
Intertemporal Hedging Demand under Epstein-Zin Preferences in a Multi-Asset Long-Run Risk Model: Evidence from Projected Pontryagin-Guided Deep Policy Optimization
I study intertemporal hedging demand in a continuous-time multi-asset long-run risk (LRR) model under Epstein-Zin (EZ) recursive preferences. The investor trades a risk-free asset and several risky assets whose drifts and volatilities depend on an Ornstein-Uhlenbeck type LRR factor. Preferences are described by EZ utility with risk aversion $R$, elasticity of intertemporal substitution $\psi$, and discount rate $\delta$, so that the standard time-additive CRRA case appears as a limiting benchmark. To handle the high-dimensional consumption-investment problem, I use a projected Pontryagin-guided deep policy optimization (P-PGDPO) scheme adapted to EZ preferences. The method starts from the continuous-time Hamiltonian implied by the Pontryagin maximum principle, represents the value and costate processes with neural networks, and updates the policy along the Hamiltonian gradient. Portfolio constraints and a lower bound on wealth are enforced by explicit projection operators rather than by adding ad hoc penalties. Three main findings emerge from numerical experiments in a five-asset LRR economy: (1) the P-PGDPO algorithm achieves stable convergence across multiple random seeds, validating its reliability for solving high-dimensional EZ problems; (2) wealth floors materially reduce hedging demand by limiting the investor's ability to exploit intertemporal risk-return tradeoffs; and (3) the learned hedging portfolios concentrate exposure in assets with high correlation to the LRR factor, confirming that EZ agents actively hedge long-run uncertainty rather than merely following myopic rules. Because EZ preferences nest time-additive CRRA in the limit $\psi \to 1/R$, I use CRRA as an explicit diagnostic benchmark and, when needed, a warm start to stabilize training in high dimensions.
EMFusion: Conditional Diffusion Framework for Trustworthy Frequency Selective EMF Forecasting in Wireless Networks
The rapid growth in wireless infrastructure has increased the need to accurately estimate and forecast electromagnetic field (EMF) levels to ensure ongoing compliance, assess potential health impacts, and support efficient network planning. While existing studies rely on univariate forecasting of wideband aggregate EMF data, frequency-selective multivariate forecasting is needed to capture the inter-operator and inter-frequency variations essential for proactive network planning. To this end, this paper introduces EMFusion, a conditional multivariate diffusion-based probabilistic forecasting framework that integrates diverse contextual factors (e.g., time of day, season, and holidays) while providing explicit uncertainty estimates. The proposed architecture features a residual U-Net backbone enhanced by a cross-attention mechanism that dynamically integrates external conditions to guide the generation process. Furthermore, EMFusion integrates an imputation-based sampling strategy that treats forecasting as a structural inpainting task, ensuring temporal coherence even with irregular measurements. Unlike standard point forecasters, EMFusion generates calibrated probabilistic prediction intervals directly from the learned conditional distribution, providing explicit uncertainty quantification essential for trustworthy decision-making. Numerical experiments on frequency-selective EMF datasets demonstrate that EMFusion, conditioned on working-hour context, outperforms baseline models with and without conditioning, surpassing the best baseline by 23.85% in continuous ranked probability score (CRPS) and 13.93% in normalized root mean square error, and reducing prediction CRPS error by 22.47%.
comment: Submission for possible publication
Lyapunov-based Adaptive Transformer (LyAT) for Control of Stochastic Nonlinear Systems
This paper presents a novel Lyapunov-based Adaptive Transformer (LyAT) controller for stochastic nonlinear systems. While transformers have shown promise in various control applications due to sequential modeling through self-attention mechanisms, they have not been used within adaptive control architectures that provide stability guarantees. Existing transformer-based approaches for control rely on offline training with fixed weights, resulting in open-loop implementations that lack real-time adaptation capabilities and stability assurances. To address these limitations, a continuous LyAT controller is developed that adaptively estimates drift and diffusion uncertainties in stochastic dynamical systems without requiring offline pre-training. A key innovation is the analytically derived adaptation law constructed from a Lyapunov-based stability analysis, which enables real-time weight updates while guaranteeing probabilistic uniform ultimate boundedness of tracking and parameter estimation errors. Experimental validation on a quadrotor demonstrates the performance of the developed controller.
A Comprehensive Benchmark Platform for Process Control Research of Outdoor Microalgae Raceway Reactors
This paper presents a benchmarking framework to evaluate process control strategies in outdoor microalgae raceway reactors, integrating four key control regulation tasks: pH, dissolved oxygen (DO), culture volume through coordinated harvest-dilution actions, and temperature via a sump-mounted spiral heat exchanger. The benchmark is built upon a high-fidelity, experimentally calibrated dynamic model that captures the strongly coupled thermal, physicochemical, and biological processes governing industrial-scale open raceway ponds. A closed-loop simulation environment is provided, featuring realistic actuator constraints, gas transport delays, stiff integration, and a fully specified scenario based on multi-day outdoor disturbances (irradiance, temperature, wind, and humidity). Four user-replaceable controllers define the manipulation of CO2 injection, air bubbling, harvest/dilution sequencing, and heat-exchanger operation. The platform computes a unified global performance index, in addition to individual metrics for each control problem, combining tracking error, gas and energy usage, and biomass productivity, enabling consistent and quantitative comparison of alternative control strategies. Baseline regulatory architectures (On/Off, PI/PID, and Economic Model Predictive Control (EMPC)) are included to illustrate the benchmark use for classical and advanced control methods. By providing an openly specified, reproducible, and computationally tractable benchmark with well-defined function interfaces, this work aims to bridge control methodology and outdoor algal bioprocess engineering, and to support the development of multivariable control strategies for disturbance-rich environmental systems.
A Fair, Flexible, Zero-Waste Digital Electricity Market: A First-Principles Approach Combining Automatic Market Making, Holarchic Architectures and Shapley Theory
This thesis presents a fundamental rethink of electricity market design at the wholesale and balancing layers. Rather than treating markets as static spot-clearing mechanisms, it reframes them as a continuously online, event-driven dynamical control system: a two-sided marketplace operating directly on grid physics. Existing energy-only, capacity-augmented, and zonal market designs are shown to admit no shock-robust Nash equilibrium under realistic uncertainty, instead relying on price caps, uplift, and regulatory intervention to preserve solvency and security. In response, the thesis develops a holarchic Automatic Market Maker (AMM) in which prices are bounded, exogenous control signals derived from physical tightness rather than emergent equilibrium outcomes. The AMM generalises nodal and zonal pricing through nested scarcity layers, from node to cluster to zone to region to system, such that participant-facing prices inherit from the tightest binding constraint. Nodal and zonal pricing therefore emerge as special cases of a unified scarcity propagation rule. Beyond pricing, the AMM functions as a scarcity-aware control system and a digitally enforceable rulebook for fair access and proportional allocation under shortage. Fuel costs are recovered through pay-as-bid energy dispatch consistent with merit order, while non-fuel operating and capital costs are allocated according to adequacy, flexibility, and locational contribution. Large-scale simulations demonstrate bounded-input bounded-output stability, controllable procurement costs, zero structural waste, and improved distributional outcomes. The architecture is climate-aligned and policy-configurable, but requires a managed transition and new operational tools for system operators and market participants.
comment: PhD thesis
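A toy sketch of the nested scarcity-propagation idea from the thesis abstract above; the layer hierarchy, headroom values, and bounded price map are illustrative assumptions, not the thesis's actual formulation:

```python
# Participant-facing prices inherit from the tightest binding layer containing
# the node. Headroom fractions and the price map below are assumptions.

layers = {                       # fraction of headroom remaining at each nested layer
    "node:N7": 0.02,
    "cluster:C3": 0.10,
    "zone:Z1": 0.35,
    "region:R1": 0.50,
    "system": 0.60,
}

def scarcity_price(headroom: float, p_floor: float = 20.0, p_cap: float = 500.0) -> float:
    """Bounded, exogenous price signal that rises toward the cap as headroom -> 0."""
    return p_floor + (p_cap - p_floor) * (1.0 - headroom) ** 4

binding_layer = min(layers, key=layers.get)       # tightest binding constraint
price = scarcity_price(layers[binding_layer])
print(binding_layer, round(price, 1), "per MWh")
```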
Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports
We propose a human-centered safety filter (HCSF) for shared autonomy that significantly enhances system safety without compromising human agency. Our HCSF is built on a neural safety value function, which we first learn scalably through black-box interactions and then use at deployment to enforce a novel state-action control barrier function (Q-CBF) safety constraint. Since this Q-CBF safety filter does not require any knowledge of the system dynamics for both synthesis and runtime safety monitoring and intervention, our method applies readily to complex, black-box shared autonomy systems. Notably, our HCSF's CBF-based interventions modify the human's actions minimally and smoothly, avoiding the abrupt, last-moment corrections delivered by many conventional safety filters. We validate our approach in a comprehensive in-person user study using Assetto Corsa, a high-fidelity car racing simulator with black-box dynamics, to assess robustness in "driving on the edge" scenarios. We compare both trajectory data and drivers' perceptions of our HCSF assistance against unassisted driving and a conventional safety filter. Experimental results show that 1) compared to having no assistance, our HCSF improves both safety and user satisfaction without compromising human agency or comfort, and 2) relative to a conventional safety filter, our proposed HCSF boosts human agency, comfort, and satisfaction while maintaining robustness.
comment: Accepted to Robotics: Science and Systems (R:SS) 2025, 22 pages, 16 figures, 7 tables Updates for v4: typos in Appendix Subsection A revised
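A hedged sketch of the minimal-intervention idea behind the Q-CBF filter above: keep the human action unless a learned safety value would fall below a margin, and then change it as little as possible. The quadratic q_safe below is a toy stand-in for the neural safety value function, and the constraint form is assumed for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def q_safe(s, a):
    """Toy safety value: higher is safer (e.g., margin to track limits).
    A learned neural critic would replace this in the actual system."""
    return 1.0 - 0.5 * np.dot(a, a) - 0.2 * (s @ a)

def hcsf_filter(s, a_human, margin=0.1):
    """Return the feasible action closest to the human's command."""
    cons = {"type": "ineq", "fun": lambda a: q_safe(s, a) - margin}
    res = minimize(lambda a: np.sum((a - a_human) ** 2), x0=a_human, constraints=cons)
    return res.x

s = np.array([0.8, -0.3])            # toy state features
a_h = np.array([1.2, 0.9])           # aggressive human steering/throttle command
print("filtered action:", np.round(hcsf_filter(s, a_h), 3))
```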
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative paradigm for creating intelligent agents, from virtual creatures to physical robots, by jointly optimizing their morphologies and controllers rather than treating control in isolation. This integrated approach facilitates richer environmental interactions and robust task performance. In this survey, we provide a systematic overview of recent advances in ECD. We first formalize the concept of ECD and position it within related fields. We then introduce a hierarchical taxonomy: a lower layer that breaks down agent design into three fundamental components (controlling brain, body morphology, and task environment) and an upper layer that integrates these components into four major ECD frameworks: bi-level, single-level, generative, and open-ended. This taxonomy allows us to synthesize insights from more than one hundred recent studies. We further review notable benchmarks, datasets, and applications in both simulated and real-world scenarios. Finally, we identify significant challenges and offer insights into promising future research directions. A project associated with this survey has been created at https://github.com/Yuxing-Wang-THU/SurveyBrainBody.
Simultaneous and Proportional Finger Motion Decoding Using Spatial Features from High-Density Surface Electromyography
Restoring natural and intuitive hand function requires simultaneous and proportional control (SPC) of multiple degrees of freedom (DoFs). This study systematically evaluated the multichannel linear descriptors-based block field method (MLD-BFM) for continuous decoding of five finger-joint DoFs by leveraging the rich spatial information of high-density surface electromyography (HD sEMG). Twenty-one healthy participants performed dynamic sinusoidal finger movements while HD sEMG signals were recorded from the extensor digitorum communis (EDC) and flexor digitorum superficialis (FDS) muscles. MLD-BFM extracted region-specific spatial features, including effective field strength ($Σ$), field-strength variation rate ($Φ$), and spatial complexity ($Ω$). Model performance was optimized (block size: $2 \times 2$; window: 0.15 s) and compared with conventional time-domain features and dimensionality reduction approaches when applied to multi-output regression models. MLD-BFM consistently achieved the highest $\mathrm{R}^2_{\mathrm{vw}}$ values across all models. The multilayer perceptron (MLP) combined with MLD-BFM yielded the best performance ($\mathrm{R}^2_{\mathrm{vw}} = 86.68\% \pm 0.33$). Time-domain features also showed strong predictive capability and were statistically comparable to MLD-BFM in some models, whereas dimensionality reduction techniques exhibited lower accuracy. Decoding accuracy was higher for the middle and ring fingers than for the thumb. Overall, MLD-BFM improved continuous finger movement decoding accuracy, underscoring the importance of taking advantage of the spatial richness of HD sEMG. These findings suggest that spatially structured features enhance SPC and provide practical guidance for designing robust, real-time, and responsive myoelectric interfaces.
comment: 39 pages, 13 figures, 2 tables
Online convex optimization for robust control of constrained dynamical systems
This article investigates the problem of controlling linear time-invariant systems subject to time-varying and a priori unknown cost functions, state and input constraints, and exogenous disturbances. We combine the online convex optimization framework with tools from robust model predictive control to propose an algorithm that is able to guarantee robust constraint satisfaction. The performance of the closed loop emerging from application of our framework is studied in terms of its dynamic regret, which is proven to be bounded linearly by the variation of the cost functions and the magnitude of the disturbances. We corroborate our theoretical findings and illustrate implementational aspects of the proposed algorithm by a numerical case study on a tracking control problem of an autonomous vehicle.
comment: 16 pages
Missing Money and Market-Based Adequacy in Deeply Decarbonized Power Systems with Long-Duration Energy Storage
The ability of deeply decarbonised power systems to ensure adequacy may increasingly depend on long-duration energy storage (LDES). A central challenge is whether capacity markets (CMs), originally designed around thermal generation, can provide efficient investment signals when storage becomes a central participant. While recent studies have advanced methods for accrediting variable renewables and short-duration storage, the effectiveness of these methods in CMs with substantial LDES penetration remains largely unexplored. To address this gap, we extend a two-stage stochastic equilibrium investment model by endogenising continuous, duration-based capacity accreditation for storage and apply it to a Great Britain-based case using 40 years of weather-driven demand and renewable profiles under varying emission limits. Results show that well-calibrated CMs can sustain near-efficient investment and mitigate revenue volatility, but their effectiveness diminishes in deeply decarbonized systems, underscoring both their potential and the regulatory challenges of supporting large-scale LDES.
Accelerated Decentralized Constraint-Coupled Optimization: A Dual$^2$ Approach
In this paper, we focus on a class of decentralized constraint-coupled optimization problem: $\min_{x_i \in \mathbb{R}^{d_i}, i \in \mathcal{I}; y \in \mathbb{R}^p}$ $\sum_{i=1}^n\left(f_i(x_i) + g_i(x_i)\right) + h(y) \ \text{s.t.} \ \sum_{i=1}^{n}A_ix_i = y$, over an undirected and connected network of $n$ agents. Here, $f_i$, $g_i$, and $A_i$ represent private information of agent $i \in \mathcal{I} = \{1, \cdots, n\}$, while $h$ is public for all agents. Building on a novel dual$^2$ approach, we develop two accelerated algorithms to solve this problem: the inexact Dual$^2$ Accelerated (iD2A) gradient method and the Multi-consensus inexact Dual$^2$ Accelerated (MiD2A) gradient method. We demonstrate that both iD2A and MiD2A can guarantee asymptotic convergence under a milder condition on $h$ compared to existing algorithms. Furthermore, under additional assumptions, we establish linear convergence rates and derive significantly lower communication and computational complexity bounds than those of existing algorithms. Several numerical experiments validate our theoretical analysis and demonstrate the practical superiority of the proposed algorithms.
Differentially Private Gradient-Tracking-Based Distributed Stochastic Optimization over Directed Graphs
This paper proposes a differentially private gradient-tracking-based distributed stochastic optimization algorithm over directed graphs. In particular, privacy noises are incorporated into each agent's state and tracking variable to mitigate information leakage, after which the perturbed states and tracking variables are transmitted to neighbors. We design two novel schemes for the step-sizes and the sampling number within the algorithm. The sampling parameter-controlled subsampling method employed by both schemes enhances the differential privacy level, and ensures a finite cumulative privacy budget even over infinite iterations. The algorithm achieves both almost sure and mean square convergence for nonconvex objectives. Furthermore, when nonconvex objectives satisfy the Polyak-Lojasiewicz (PL) condition, Scheme (S1) achieves a polynomial mean square convergence rate, and Scheme (S2) achieves an exponential mean square convergence rate. The trade-off between privacy and convergence is presented. The effectiveness of the algorithm and its superior performance compared to existing works are illustrated through numerical examples of distributed training on the benchmark datasets "MNIST" and "CIFAR-10".
Observation Compression in Rate-Limited Closed-Loop Distributed ISAC Systems: From Signal Reconstruction to Control
In closed-loop distributed multi-sensor integrated sensing and communication (ISAC) systems, performance often hinges on transmitting high-dimensional sensor observations over rate-limited networks. In this paper, we first present a general framework for rate-limited closed-loop distributed ISAC systems, and then propose an autoencoder-based observation compression method to overcome the constraints imposed by limited transmission capacity. Building on this framework, we conduct a case study using a closed-loop linear quadratic regulator (LQR) system to analyze how the interplay among observation, compression, and state dimensions affects reconstruction accuracy, state estimation error, and control performance. In multi-sensor scenarios, our results further show that optimal resource allocation initially prioritizes low-noise sensors until the compression becomes lossless, after which resources are reallocated to high-noise sensors.
comment: 6 pages, 15 figures. This work has been accepted by Globecom Workshop 2025
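A small sketch of the compress-then-control loop studied above, assuming a toy double-integrator plant, a 20-dimensional sensor, and a PCA projection standing in for the autoencoder:

```python
import numpy as np

# Compress a high-dimensional observation before transmission, reconstruct it
# at the controller, and apply an LQR law on the estimate. Dimensions, dynamics,
# and the linear (PCA) encoder/decoder are illustrative assumptions.

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])       # double-integrator-like plant
B = np.array([[0.0], [0.1]])
C = rng.normal(size=(20, 2))                  # 20-dim sensor observation map
Q, Rcost = np.eye(2), np.eye(1)

# Discrete-time LQR gain via Riccati iteration.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(Rcost + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# "Encoder/decoder": rank-2 PCA of simulated observations (lossless in this
# noise-free toy, loosely mirroring the lossless-compression regime above).
Y = (C @ rng.normal(size=(2, 1000))).T
U_, S_, Vt = np.linalg.svd(Y - Y.mean(0), full_matrices=False)
enc = Vt[:2]                                  # 20 -> 2 code
dec = np.linalg.pinv(enc)                     # 2 -> 20 reconstruction

x = np.array([1.0, 0.0])
for _ in range(50):
    y_hat = dec @ (enc @ (C @ x))             # only the 2-dim code is transmitted
    x_est = np.linalg.lstsq(C, y_hat, rcond=None)[0]
    x = A @ x + (B @ (-K @ x_est)).ravel()
print("final state norm:", round(np.linalg.norm(x), 4))
```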
Large Language Model-Based Intelligent Antenna Design System
Antenna simulation typically involves modeling and optimization, which are time-consuming and labor-intensive, slowing down antenna analysis and design. This paper presents a prototype of a large language model (LLM)-based antenna design system (LADS) to assist in antenna simulation. LADS generates antenna models with textual descriptions and images extracted from academic papers, patents, and technical reports (either one or multiple), and it interacts with engineers to iteratively refine the designs. After that, LADS configures and runs an optimizer to meet the design specifications. The effectiveness of LADS is demonstrated by a monopole slotted antenna generated from images and descriptions from the literature. To improve gain stability across the 3.1-10.6 GHz ultra-wide band, LADS modifies the cross-slot into an H-slot and changes substrate material, followed by parameter optimization. As a result, the gain variation is reduced while maintaining the same gain level. The LLM-enabled antenna modeling (LEAM) is available at: https://github.com/TaoWu974/LEAM.
comment: Code is available: https://github.com/TaoWu974/LEAM. Accepted by and will be presented at EuCAP 2026, Dublin
AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Soccer Substitutions
In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or predictive models that merely mimic historical biases. This paper introduces a Fuzzy Logic-based Decision Support System (DSS) designed for real-time, prescriptive game management. Unlike traditional Machine Learning approaches that encounter a predictive ceiling by attempting to replicate human behavior, our system audits performance through an objective, rule-based inference engine. We propose a methodological advancement by reformulating the PlayeRank metric into a Cumulative Mean with Role-Aware Normalization, eliminating the play-time exposure bias inherent in cumulative-sum models to enable accurate intra-match comparison. The system integrates this refined metric with physiological proxies (fatigue) and contextual variables (disciplinary risk modulated by tactical role) to calculate a dynamic Substitution Priority (P final). Validation via a case study of the 2018 FIFA World Cup match between Brazil and Belgium demonstrates the system's ecological validity: it not only aligned with expert consensus on executed substitutions (for example Gabriel Jesus) but, crucially, identified high-risk scenarios ignored by human decision-makers. Specifically, the model flagged the "FAGNER Paradox", a maximum-priority defensive risk, minutes before a critical yellow card, and detected the "Lukaku Paradox", where an isolated assist masked a severe drop in participation. These results confirm that Fuzzy Logic offers a transparent, explainable, and superior alternative to black-box models for optimizing real-time tactical decisions.
comment: This version contains a methodological inconsistency in the performance evaluation formulation described in Section 3 (Methodology). Specifically, the definition of the temporal aggregation of player performance was incomplete, leading to an ambiguity between cumulative and normalized metrics in the intra
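A toy sketch of the ingredients described in the abstract above, combining a role-normalized cumulative-mean performance with fatigue and card-risk proxies via triangular fuzzy memberships; membership shapes, weights, and the role baseline are illustrative assumptions:

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    return max(min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def cum_mean_role_norm(event_scores, minutes, role_baseline):
    """Cumulative mean per minute, normalized by a role-specific baseline,
    removing the exposure bias of a raw cumulative sum."""
    return (np.sum(event_scores) / max(minutes, 1)) / role_baseline

perf = cum_mean_role_norm([0.1, -0.2, 0.05], minutes=70, role_baseline=0.01)
fatigue = 0.8           # physiological proxy in [0, 1]
card_risk = 0.9         # disciplinary risk modulated by tactical role

low_perf = tri(perf, -1.0, 0.0, 0.6)
high_fatigue = tri(fatigue, 0.5, 1.0, 1.5)
high_risk = tri(card_risk, 0.5, 1.0, 1.5)

# Simple rule aggregation: substitution priority rises when any rule fires strongly.
p_final = max(0.5 * low_perf + 0.3 * high_fatigue + 0.2 * high_risk, 0.9 * high_risk)
print("substitution priority:", round(p_final, 3))
```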
Improving Action Smoothness for a Cascaded Online Learning Flight Control System
This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.
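A minimal sketch of the two smoothing mechanisms and the FFT-based check described above, applied to an assumed noisy action signal with illustrative gains:

```python
import numpy as np

# (1) a temporal-smoothness style blend toward the previous action and
# (2) a first-order low-pass filter, with FFT used to inspect the result in the
# frequency domain. The toy action signal and gains are assumptions.

dt, T = 0.01, 5.0
t = np.arange(0, T, dt)
raw = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.random.randn(t.size)   # noisy policy output

def smooth(actions, blend=0.7, alpha=0.1):
    out, prev, lp = [], 0.0, 0.0
    for a in actions:
        a = blend * prev + (1 - blend) * a        # temporal smoothness term
        lp = lp + alpha * (a - lp)                # first-order low-pass filter
        out.append(lp)
        prev = a
    return np.array(out)

smoothed = smooth(raw)
freqs = np.fft.rfftfreq(t.size, dt)
amp_raw = np.abs(np.fft.rfft(raw))
amp_smooth = np.abs(np.fft.rfft(smoothed))
high = freqs > 5.0                                # energy above 5 Hz
print("high-frequency energy, raw vs smoothed:",
      round(amp_raw[high].sum(), 1), round(amp_smooth[high].sum(), 1))
```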
Exploring the Influence of Residential Electric Vehicle Charging on Distribution System Hosting Capacity -- A Case-Study in Arizona
The installation of high-capacity fast chargers for electric vehicles (EVs) is posing a significant risk to the distribution grid as the increased demand from widespread residential EV charging could exceed the technical limits of the distribution system. Addressing this issue is critical, given that current infrastructure upgrades to enhance EV hosting capacity are both costly and time-consuming. Moreover, the inherent uncertainties associated with EV charging parameters make it challenging for power utilities to accurately assess the impact of EVs added to specific locations. To address these knowledge gaps, this study (a) introduces an algorithm to coordinate residential EV charging, and (b) proposes a comprehensive framework that evaluates all transformers within a feeder. The proposed method is applied to a real-world feeder, which includes 120 transformers of varying capacities. The results demonstrate that this approach effectively manages a substantial number of EVs without overloading any of the transformers, while also pinpointing locations that must be prioritized for future upgrades. This framework can serve as a valuable reference for utilities when conducting distribution system evaluations for supporting the growing EV penetration.
Robotics
CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives
We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. Prior work on joint human-scene reconstruction relies on data-driven priors and joint optimization with no physics in the loop, or recovers noisy geometry with artifacts that cause motion tracking policies with scene interactions to fail. In contrast, our key insight is to recover convex, clean, and simulation-ready geometry by fitting planar primitives to a point cloud reconstruction of the scene, via a simple clustering pipeline over depth, normals, and flow. To reconstruct scene geometry that might be occluded during interactions, we make use of human-scene contact modeling (e.g., we use human posture to reconstruct the occluded seat of a chair). Finally, we ensure that human and scene reconstructions are physically-plausible by using them to drive a humanoid controller via reinforcement learning. Our approach reduces motion tracking failure rates from 55.2\% to 6.9\% on human-centric video benchmarks (EMDB, PROX), while delivering a 43\% faster RL simulation throughput. We further validate it on in-the-wild videos including casually-captured videos, Internet videos, and even Sora-generated videos. This demonstrates CRISP's ability to generate physically-valid human motion and interaction environments at scale, greatly advancing real-to-sim applications for robotics and AR/VR.
comment: Project page: https://crisp-real2sim.github.io/CRISP-Real2Sim/
CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation
Recent progress in humanoid robots has unlocked agile locomotion skills, including backflipping, running, and crawling. Yet it remains challenging for a humanoid robot to perform forceful manipulation tasks such as moving objects, wiping, and pushing a cart. We propose adaptive Compliance Humanoid control through hIndsight Perturbation (CHIP), a plug-and-play module that enables controllable end-effector stiffness while preserving agile tracking of dynamic reference motions. CHIP is easy to implement and requires neither data augmentation nor additional reward tuning. We show that a generalist motion-tracking controller trained with CHIP can perform a diverse set of forceful manipulation tasks that require different end-effector compliance, such as multi-robot collaboration, wiping, box delivery, and door opening.
comment: The first two authors contributed equally. Project page: https://nvlabs.github.io/CHIP/
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models
Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through environmental interaction, which is akin to how humans master skills through practice. Vision-Language-Action (VLA) models have advanced robotic manipulation by leveraging large language models, yet remain fundamentally limited by Supervised Finetuning (SFT): requiring hundreds of demonstrations per task, rigidly memorizing trajectories, and failing to adapt when deployment conditions deviate from training. We introduce EVOLVE-VLA, a test-time training framework enabling VLAs to continuously adapt through environment interaction with minimal or zero task-specific demonstrations. The key technical challenge is replacing oracle reward signals (unavailable at test time) with autonomous feedback. We address this through a learned progress estimator providing dense feedback, and critically, we design our framework to ``tame'' this inherently noisy signal via two mechanisms: (1) an accumulative progress estimation mechanism smoothing noisy point-wise estimates, and (2) a progressive horizon extension strategy enabling gradual policy evolution. EVOLVE-VLA achieves substantial gains: +8.6\% on long-horizon tasks, +22.0\% in 1-shot learning, and enables cross-task generalization -- achieving 20.8\% success on unseen tasks without task-specific demonstration training (vs. 0\% for pure SFT). Qualitative analysis reveals emergent capabilities absent in demonstrations, including error recovery and novel strategies. This work represents a critical step toward VLAs that truly learn and adapt, moving beyond static imitation toward continuous self-improvement.
comment: 15 pages
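A small sketch of the two stabilizers named in the EVOLVE-VLA abstract, accumulative progress estimation and progressive horizon extension, under toy assumptions about the noisy progress signal and the schedule:

```python
import numpy as np

rng = np.random.default_rng(1)
true_progress = np.linspace(0, 1, 200)
noisy = np.clip(true_progress + 0.15 * rng.standard_normal(200), 0, 1)   # point-wise estimates

def accumulate(estimates, beta=0.9):
    """Monotone, exponentially-smoothed progress from noisy point estimates."""
    smoothed, acc = [], 0.0
    for p in estimates:
        acc = max(acc, beta * acc + (1 - beta) * p)   # never decreases
        smoothed.append(acc)
    return np.array(smoothed)

def horizon(step, h0=20, h_max=200, grow_every=50):
    """Progressive horizon extension: lengthen rollouts as training proceeds."""
    return min(h0 * 2 ** (step // grow_every), h_max)

prog = accumulate(noisy)
print("final smoothed progress:", round(float(prog[-1]), 3),
      "| horizon at step 120:", horizon(120))
```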
Nonlinear System Identification Nano-drone Benchmark
We introduce a benchmark for system identification based on 75k real-world samples from the Crazyflie 2.1 Brushless nano-quadrotor, a sub-50g aerial vehicle widely adopted in robotics research. The platform presents a challenging testbed due to its multi-input, multi-output nature, open-loop instability, and nonlinear dynamics under agile maneuvers. The dataset comprises four aggressive trajectories with synchronized 4-dimensional motor inputs and 13-dimensional output measurements. To enable fair comparison of identification methods, the benchmark includes a suite of multi-horizon prediction metrics for evaluating both one-step and multi-step error propagation. In addition to the data, we provide a detailed description of the platform and experimental setup, as well as baseline models highlighting the challenge of accurate prediction under real-world noise and actuation nonlinearities. All data, scripts, and reference implementations are released as open-source at https://github.com/idsia-robotics/nanodrone-sysid-benchmark to facilitate transparent comparison of algorithms and support research on agile, miniaturized aerial robotics.
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning
Affordance prediction, which identifies interaction regions on objects based on language instructions, is critical for embodied AI. Prevailing end-to-end models couple high-level reasoning and low-level grounding into a single monolithic pipeline and rely on training over annotated datasets, which leads to poor generalization on novel objects and unseen environments. In this paper, we move beyond this paradigm by proposing A4-Agent, a training-free agentic framework that decouples affordance prediction into a three-stage pipeline. Our framework coordinates specialized foundation models at test time: (1) a $\textbf{Dreamer}$ that employs generative models to visualize $\textit{how}$ an interaction would look; (2) a $\textbf{Thinker}$ that utilizes large vision-language models to decide $\textit{what}$ object part to interact with; and (3) a $\textbf{Spotter}$ that orchestrates vision foundation models to precisely locate $\textit{where}$ the interaction area is. By leveraging the complementary strengths of pre-trained models without any task-specific fine-tuning, our zero-shot framework significantly outperforms state-of-the-art supervised methods across multiple benchmarks and demonstrates robust generalization to real-world settings.
Geometric Parameter Optimization of a Novel 3-(PP(2-(UPS))) Redundant Parallel Mechanism based on Workspace Determination
Redundant parallel robots are normally employed in scenarios requiring good precision, high load capability, and a large workspace compared to traditional parallel mechanisms. However, choosing the elementary robotic configuration and optimizing its geometric parameters remain quite challenging. This paper proposes a novel 3-(PP(2-(UPS))) redundant parallel mechanism with good generalizability, and investigates its kinematic optimization by analyzing how its key geometric parameters influence the volume, shape, boundary completeness, and orientation capabilities of its workspace. The torsional capability index $TI_1$ and the tilting capability index $TI_2$ are defined to evaluate the orientation performance of the mechanism. Numerical simulation studies are carried out to illustrate the analysis, providing essential references for the parameter optimization of the 3-(PP(2-(UPS))) and other similar redundant parallel mechanisms.
comment: 7 pages, 5 figures
Odyssey: An Automotive Lidar-Inertial Odometry Dataset for GNSS-denied situations
The development and evaluation of Lidar-Inertial Odometry (LIO) and Simultaneous Localization and Mapping (SLAM) systems requires a precise ground truth. The Global Navigation Satellite System (GNSS) is often used as a foundation for this, but its signals can be unreliable in obstructed environments due to multi-path effects or loss-of-signal. While existing datasets compensate for the sporadic loss of GNSS signals by incorporating Inertial Measurement Unit (IMU) measurements, the commonly used Micro-Electro-Mechanical Systems (MEMS) or Fiber Optic Gyroscope (FOG)-based systems do not permit the prolonged study of GNSS-denied environments. To close this gap, we present Odyssey, a LIO dataset with a focus on GNSS-denied environments such as tunnels and parking garages as well as other underrepresented, yet ubiquitous situations such as stop-and-go-traffic, bumpy roads and wide open fields. Our ground truth is derived from a navigation-grade Inertial Navigation System (INS) equipped with a Ring Laser Gyroscope (RLG), offering exceptional bias stability characteristics compared to IMUs used in existing datasets and enabling the prolonged and accurate study of GNSS-denied environments. This makes Odyssey the first publicly available dataset featuring a RLG-based INS. Besides providing data for LIO, we also support other tasks, such as place recognition, through the threefold repetition of all trajectories as well as the integration of external mapping data by providing precise geodetic coordinates. All data, dataloader and other material is available online at https://odyssey.uni-goettingen.de/ .
comment: 9 pages, 4 figures, submitted to International Journal of Robotics Research (IJRR)
Quadratic Kalman Filter for Elliptical Extended Object Tracking based on Decoupling State Components
Extended object tracking involves estimating both the physical extent and kinematic parameters of a target object, where typically multiple measurements are observed per time step. In this article, we propose a deterministic closed-form elliptical extended object tracker, based on decoupling of the kinematics, orientation, and axis lengths. By disregarding potential correlations between these state components, fewer approximations are required for the individual estimators than for an overall joint solution. The resulting algorithm outperforms existing algorithms, reaching the accuracy of sampling-based procedures. Additionally, a batch-based variant is introduced, yielding highly efficient computation while outperforming all comparable state-of-the-art algorithms. This is validated both by a simulation study using common models from literature, as well as an extensive quantitative evaluation on real automotive radar data.
comment: 13 pages, 8 figures, submitted to IEEE Transactions on Aerospace and Electronic Systems
Synthetic Data Pipelines for Adaptive, Mission-Ready Militarized Humanoids
Omnia presents a synthetic data driven pipeline to accelerate the training, validation, and deployment readiness of militarized humanoids. The approach converts first-person spatial observations captured from point-of-view recordings, smart glasses, augmented reality headsets, and spatial browsing workflows into scalable, mission-specific synthetic datasets for humanoid autonomy. By generating large volumes of high-fidelity simulated scenarios and pairing them with automated labeling and model training, the pipeline enables rapid iteration on perception, navigation, and decision-making capabilities without the cost, risk, or time constraints of extensive field trials. The resulting datasets can be tuned quickly for new operational environments and threat conditions, supporting both baseline humanoid performance and advanced subsystems such as multimodal sensing, counter-detection survivability, and CBRNE-relevant reconnaissance behaviors. This work targets faster development cycles and improved robustness in complex, contested settings by exposing humanoid systems to broad scenario diversity early in the development process.
comment: 6 pages; xTech Humanoid white paper submission
A Comprehensive Safety Metric to Evaluate Perception in Autonomous Systems
Complete perception of the environment and its correct interpretation is crucial for autonomous vehicles. Object perception is the main component of automotive surround sensing. Various metrics already exist for the evaluation of object perception. However, objects can be of different importance depending on their velocity, orientation, distance, size, or the potential damage that could be caused by a collision due to a missed detection. Thus, these additional parameters have to be considered for safety evaluation. We propose a new safety metric that incorporates all these parameters and returns a single easily interpretable safety assessment score for object perception. This new metric is evaluated with both real world and virtual data sets and compared to state of the art metrics.
comment: Accepted at IEEE ITSC 2020
CoLD Fusion: A Real-time Capable Spline-based Fusion Algorithm for Collective Lane Detection
Comprehensive environment perception is essential for autonomous vehicles to operate safely. It is crucial to detect both dynamic road users and static objects like traffic signs or lanes as these are required for safe motion planning. However, in many circumstances a complete perception of other objects or lanes is not achievable due to limited sensor ranges, occlusions, and curves. In scenarios where an accurate localization is not possible or for roads where no HD maps are available, an autonomous vehicle must rely solely on its perceived road information. Thus, extending local sensing capabilities through collective perception using vehicle-to-vehicle communication is a promising strategy that has not yet been explored for lane detection. Therefore, we propose a real-time capable approach for collective perception of lanes using a spline-based estimation of undetected road sections. We evaluate our proposed fusion algorithm in various situations and road types. We were able to achieve real-time capability and extend the perception range by up to 200%.
comment: Accepted at IEEE IV 2023
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
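A hedged sketch of tuning a single AMPC parameter from experiment data with Bayesian optimization; the closed-loop cost is a synthetic stand-in for a hardware run, and the Matern kernel plus expected-improvement acquisition are illustrative choices rather than the paper's exact setup:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def closed_loop_cost(theta):
    """Pretend hardware experiment: tracking cost as a function of one knob."""
    return (theta - 0.3) ** 2 + 0.02 * np.random.randn()

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(4, 1))                 # a few initial experiments
y = np.array([closed_loop_cost(x[0]) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
grid = np.linspace(0, 1, 200).reshape(-1, 1)

for _ in range(10):                                # BO iterations
    gp.fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)   # expected improvement (minimization)
    x_next = grid[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, closed_loop_cost(x_next[0]))

print("best parameter found:", round(float(X[np.argmin(y), 0]), 3))
```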
A Geometric Task-Space Port-Hamiltonian Formulation for Redundant Manipulators
We present a novel geometric port-Hamiltonian formulation of redundant manipulators performing a differential kinematic task $η=J(q)\dot{q}$, where $q$ is a point on the configuration manifold, $η$ is a velocity-like task space variable, and $J(q)$ is a linear map representing the task, for example the classical analytic or geometric manipulator Jacobian matrix. The proposed model emerges from a change of coordinates from canonical Hamiltonian dynamics, and splits the standard Hamiltonian momentum variable into a task-space momentum variable and a null-space momentum variable. Properties of this model and relation to Lagrangian formulations present in the literature are highlighted. Finally, we apply the proposed model in an \textit{Interconnection and Damping Assignment Passivity-Based Control} (IDA-PBC) design to stabilize and shape the impedance of a 7-DOF Emika Panda robot in simulation.
Field evaluation and optimization of a lightweight lidar-based UAV navigation system for dense boreal forest environments
Interest in the use of uncrewed aerial vehicles (UAVs) for forest applications has increased in recent years. While above-canopy flight has reached a high level of autonomy, navigating under-canopy remains a significant challenge. The use of autonomous UAVs could reduce the burden of data collection, which has motivated the development of numerous solutions for under-canopy autonomous flight. However, the experiments conducted in the literature and their reporting lack rigor. The density and difficulty of the test forests are rarely reported, and few studies fly multiple flights and report their success rates. The aim of this study was to implement an autonomously flying quadrotor based on a lightweight lidar using openly available algorithms and test its behavior in real forest environments. A set of rigorous experiments was conducted with a quadrotor prototype utilizing the IPC path planner and LTA-OM SLAM algorithm. Based on the results of the first 33 flights, the original system was further enhanced. With the optimized system, 60 flights were performed, resulting in a total of 93 test flights. The optimized system performed significantly better in terms of reliability and flight mission completion times, achieving success rates of 12/15 in a medium-density forest and 15/15 in a dense forest, at a target flight velocity of 1 m/s. At a target flight velocity of 2 m/s, the success rates were 12/15 and 5/15, respectively. Furthermore, a standardized testing setup and evaluation criteria were proposed, enabling consistent performance comparisons of autonomous under-canopy UAV systems, enhancing reproducibility, guiding system improvements, and accelerating progress in forest robotics.
comment: This work has been submitted to the IEEE for possible publication
ARCADE: Adaptive Robot Control with Online Changepoint-Aware Bayesian Dynamics Learning
Real-world robots must operate under evolving dynamics caused by changing operating conditions, external disturbances, and unmodeled effects. These may appear as gradual drifts, transient fluctuations, or abrupt shifts, demanding real-time adaptation that is robust to short-term variation yet responsive to lasting change. We propose a framework for modeling the nonlinear dynamics of robotic systems that can be updated in real time from streaming data. The method decouples representation learning from online adaptation, using latent representations learned offline to support online closed-form Bayesian updates. To handle evolving conditions, we introduce a changepoint-aware mechanism with a latent variable inferred from data likelihoods that indicates continuity or shift. When continuity is likely, evidence accumulates to refine predictions; when a shift is detected, past information is tempered to enable rapid re-learning. This maintains calibrated uncertainty and supports probabilistic reasoning about transient, gradual, or structural change. We prove that the adaptive regret of the framework grows only logarithmically in time and linearly with the number of shifts, competitive with an oracle that knows the timing of shifts. We validate on cartpole simulations and real quadrotor flights with swinging payloads and mid-flight drops, showing improved predictive accuracy, faster recovery, and more accurate closed-loop tracking than relevant baselines.
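A minimal sketch of a changepoint-aware closed-form Bayesian update on a linear-in-features model, echoing the mechanism described above; the likelihood threshold and covariance-tempering factor are illustrative assumptions:

```python
import numpy as np

def predictive_logpdf(y, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

dim, noise = 3, 0.1
mu = np.zeros(dim)                  # posterior mean over model weights
Sigma = np.eye(dim)                 # posterior covariance

def update(phi, y, mu, Sigma, temper=10.0, threshold=-4.0):
    pred_mean = phi @ mu
    pred_var = phi @ Sigma @ phi + noise ** 2
    if predictive_logpdf(y, pred_mean, pred_var) < threshold:
        Sigma = Sigma * temper                      # shift detected: forget faster
        pred_var = phi @ Sigma @ phi + noise ** 2
    k = Sigma @ phi / pred_var                      # closed-form Bayesian gain
    mu = mu + k * (y - pred_mean)
    Sigma = Sigma - np.outer(k, phi @ Sigma)
    return mu, Sigma

rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 0.2])
for t in range(400):
    if t == 200:
        w_true = np.array([-0.8, 0.4, 1.0])         # abrupt dynamics shift
    phi = rng.standard_normal(dim)
    y = phi @ w_true + noise * rng.standard_normal()
    mu, Sigma = update(phi, y, mu, Sigma)
print("recovered weights after the shift:", np.round(mu, 2))
```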
CaFe-TeleVision: A Coarse-to-Fine Teleoperation System with Immersive Situated Visualization for Enhanced Ergonomics
Teleoperation presents a promising paradigm for remote control and robot proprioceptive data collection. Despite recent progress, current teleoperation systems still suffer from limitations in efficiency and ergonomics, particularly in challenging scenarios. In this paper, we propose CaFe-TeleVision, a coarse-to-fine teleoperation system with immersive situated visualization for enhanced ergonomics. At its core, a coarse-to-fine control mechanism is proposed in the retargeting module to bridge workspace disparities, jointly optimizing efficiency and physical ergonomics. To stream immersive feedback with adequate visual cues for human vision systems, an on-demand situated visualization technique is integrated in the perception module, which reduces the cognitive load for multi-view processing. The system is built on a humanoid collaborative robot and validated with six challenging bimanual manipulation tasks. User study among 24 participants confirms that CaFe-TeleVision enhances ergonomics with statistical significance, indicating a lower task load and a higher user acceptance during teleoperation. Quantitative results also validate the superior performance of our system across six tasks, surpassing comparative methods by up to 28.89% in success rate and accelerating by 26.81% in completion time. Project webpage: https://clover-cuhk.github.io/cafe_television/
History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation
Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments based on linguistic instructions. While successful navigation demands both global environmental reasoning and local scene comprehension, existing UAV agents typically adopt mono-granularity frameworks that struggle to balance these two aspects. To address this limitation, this work proposes a History-Enhanced Two-Stage Transformer (HETT) framework, which integrates the two aspects through a coarse-to-fine navigation pipeline. Specifically, HETT first predicts coarse-grained target positions by fusing spatial landmarks and historical context, then refines actions via fine-grained visual analysis. In addition, a historical grid map is designed to dynamically aggregate visual features into a structured spatial memory, enhancing comprehensive scene awareness. Additionally, the CityNav dataset annotations are manually refined to enhance data quality. Experiments on the refined CityNav dataset show that HETT delivers significant performance gains, while extensive ablation studies further verify the effectiveness of each component.
DRAW2ACT: Turning Depth-Encoded Trajectories into Robotic Demonstration Videos
Video diffusion models provide powerful real-world simulators for embodied AI but remain limited in controllability for robotic manipulation. Recent works on trajectory-conditioned video generation address this gap but often rely on 2D trajectories or single modality conditioning, which restricts their ability to produce controllable and consistent robotic demonstrations. We present DRAW2ACT, a depth-aware trajectory-conditioned video generation framework that extracts multiple orthogonal representations from the input trajectory, capturing depth, semantics, shape and motion, and injects them into the diffusion model. Moreover, we propose to jointly generate spatially aligned RGB and depth videos, leveraging cross-modality attention mechanisms and depth supervision to enhance the spatio-temporal consistency. Finally, we introduce a multimodal policy model conditioned on the generated RGB and depth sequences to regress the robot's joint angles. Experiments on Bridge V2, Berkeley Autolab, and simulation benchmarks show that DRAW2ACT achieves superior visual fidelity and consistency while yielding higher manipulation success rates compared to existing baselines.
Trajectory Tracking for Multi-Manipulator Systems in Constrained Environments
We consider the problem of cooperative manipulation by a mobile multi-manipulator system operating in obstacle-cluttered and highly constrained environments under spatio-temporal task specifications. The task requires transporting a grasped object while respecting both continuous robot dynamics and discrete geometric constraints arising from obstacles and narrow passages. To address this hybrid structure, we propose a multi-rate planning and control framework that combines offline generation of an STL-satisfying object trajectory and collision-free base footprints with online constrained inverse kinematics and continuous-time feedback control. The resulting closed-loop system enables coordinated reconfiguration of multiple manipulators while tracking the desired object motion. The approach is evaluated in high-fidelity physics simulations using three Franka Emika Panda mobile manipulators rigidly grasping an object.
SUPER -- A Framework for Sensitivity-based Uncertainty-aware Performance and Risk Assessment in Visual Inertial Odometry
While many visual odometry (VO), visual-inertial odometry (VIO), and SLAM systems achieve high accuracy, the majority of existing methods fail to assess risk at runtime. This paper presents SUPER (Sensitivity-based Uncertainty-aware PErformance and Risk assessment), a generic and explainable framework that propagates uncertainties via sensitivities for real-time risk assessment in VIO. The scientific novelty lies in the derivation of a real-time risk indicator that is backend-agnostic and exploits the Schur complement blocks of the Gauss-Newton normal matrix to propagate uncertainties. Practically, the Schur complement captures the sensitivity that reflects the influence of the uncertainty on the risk occurrence. Our framework estimates risks on the basis of the residual magnitudes, geometric conditioning, and short-horizon temporal trends without requiring ground-truth knowledge. Our framework reliably predicts trajectory degradation 50 frames ahead, a 20% improvement over the baseline. In addition, SUPER initiates a stop or relocalization policy with 89.1% recall. The framework is backend-agnostic and operates in real time with less than 0.2% additional CPU cost. Experiments show that SUPER provides consistent uncertainty estimates. A SLAM evaluation highlights the applicability to long-horizon mapping.
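A toy illustration of the Schur-complement step mentioned above: marginalizing the non-pose block of a Gauss-Newton normal matrix yields the pose information, whose inverse gives a propagated covariance usable as a conditioning-based risk proxy. Block sizes and values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((40, 9))            # stacked residual Jacobian (toy)
H = J.T @ J + 1e-3 * np.eye(9)              # Gauss-Newton normal matrix
A, B, C = H[:6, :6], H[:6, 6:], H[6:, 6:]   # 6-dof pose block vs. remaining states

schur = A - B @ np.linalg.solve(C, B.T)     # marginal information of the pose block
pose_cov = np.linalg.inv(schur)             # propagated pose uncertainty

# Simple risk proxy: large when the problem is weakly constrained along some direction.
risk = np.sqrt(np.linalg.eigvalsh(pose_cov).max())
print("worst-case pose std (toy units):", round(float(risk), 4))
```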
Interactive Motion Planning for Human-Robot Collaboration Based on Human-Centric Configuration Space Ergonomic Field
Industrial human-robot collaboration requires motion planning that is collision-free, responsive, and ergonomically safe to reduce fatigue and musculoskeletal risk. We propose the Configuration Space Ergonomic Field (CSEF), a continuous and differentiable field over the human joint space that quantifies ergonomic quality and provides gradients for real-time ergonomics-aware planning. An efficient algorithm constructs CSEF from established metrics with joint-wise weighting and task conditioning, and we integrate it into a gradient-based planner compatible with impedance-controlled robots. In a 2-DoF benchmark, CSEF-based planning achieves higher success rates, lower ergonomic cost, and faster computation than a task-space ergonomic planner. Hardware experiments with a dual-arm robot in unimanual guidance, collaborative drilling, and bimanual co-carrying show faster ergonomic cost reduction, closer tracking to optimized joint targets, and lower muscle activation than a point-to-point baseline. The CSEF-based planning method reduces average ergonomic scores by up to 10.31% for collaborative drilling tasks and 5.60% for bimanual co-carrying tasks while decreasing activation in key muscle groups, indicating practical benefits for real-world deployment.
comment: 10 pages, 9 figures
Context Representation via Action-Free Transformer encoder-decoder for Meta Reinforcement Learning
Reinforcement learning (RL) enables robots to operate in uncertain environments, but standard approaches often struggle with poor generalization to unseen tasks. Context-adaptive meta-reinforcement learning addresses these limitations by conditioning on the task representation, yet such methods mostly rely on complete action information in the experience, making task inference tightly coupled to a specific policy. This paper introduces Context Representation via Action-Free Transformer encoder-decoder (CRAFT), a belief model that infers task representations solely from sequences of states and rewards. By removing the dependence on actions, CRAFT decouples task inference from policy optimization, supports modular training, and leverages amortized variational inference for scalable belief updates. Built on a transformer encoder-decoder with rotary positional embeddings, the model captures long-range temporal dependencies and robustly encodes both parametric and non-parametric task variations. Experiments on the MetaWorld ML-10 robotic manipulation benchmark show that CRAFT achieves faster adaptation, improved generalization, and more effective exploration compared to context-adaptive meta-RL baselines. These findings highlight the potential of action-free inference as a foundation for scalable RL in robotic control.
Expert Switching for Robust AAV Landing: A Dual-Detector Framework in Simulation
Reliable helipad detection is essential for Autonomous Aerial Vehicle (AAV) landing, especially under GPS-denied or visually degraded conditions. While modern detectors such as YOLOv8 offer strong baseline performance, single-model pipelines struggle to remain robust across the extreme scale transitions that occur during descent, where helipads appear small at high altitude and large near touchdown. To address this limitation, we propose a scale-adaptive dual-expert perception framework that decomposes the detection task into far-range and close-range regimes. Two YOLOv8 experts are trained on scale-specialized versions of the HelipadCat dataset, enabling one model to excel at detecting small, low-resolution helipads and the other to provide high-precision localization when the target dominates the field of view. During inference, both experts operate in parallel, and a geometric gating mechanism selects the expert whose prediction is most consistent with the AAV's viewpoint. This adaptive routing prevents the degradation commonly observed in single-detector systems when operating across wide altitude ranges. The dual-expert perception module is evaluated in a closed-loop landing environment that integrates CARLA's photorealistic rendering with NASA's GUAM flight-dynamics engine. Results show substantial improvements in alignment stability, landing accuracy, and overall robustness compared to single-detector baselines. By introducing a scale-aware expert routing strategy tailored to the landing problem, this work advances resilient vision-based perception for autonomous descent and provides a foundation for future multi-expert AAV frameworks.
E-Navi: Environmental Adaptive Navigation for UAVs on Resource Constrained Platforms
The ability to adapt to changing environments is crucial for the autonomous navigation systems of Unmanned Aerial Vehicles (UAVs). However, existing navigation systems adopt fixed execution configurations without considering environmental dynamics based on available computing resources, e.g., with a high execution frequency and task workload. This static approach causes rigid flight strategies and excessive computations, ultimately degrading flight performance or even leading to failures in UAVs. Despite the necessity for an adaptive system, dynamically adjusting workloads remains challenging, due to difficulties in quantifying environmental complexity and modeling the relationship between environment and system configuration. Aiming at adapting to dynamic environments, this paper proposes E-Navi, an environmental-adaptive navigation system for UAVs that dynamically adjusts task executions on the CPUs in response to environmental changes based on available computational resources. Specifically, the perception-planning pipeline of UAVs navigation system is redesigned through dynamic adaptation of mapping resolution and execution frequency, driven by the quantitative environmental complexity evaluations. In addition, E-Navi supports flexible deployment across hardware platforms with varying levels of computing capability. Extensive Hardware-In-the-Loop and real-world experiments demonstrate that the proposed system significantly outperforms the baseline method across various hardware platforms, achieving up to 53.9% navigation task workload reduction, up to 63.8% flight time savings, and delivering more stable velocity control.
Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action (VLA) Model
This study evaluates two leading approaches for teaching construction robots new skills to understand their applicability for construction automation: a Vision-Language-Action (VLA) model and Reinforcement Learning (RL) methods. The goal is to understand both task performance and the practical effort needed to deploy each approach on real jobs. The authors developed two teleoperation interfaces to control the robots and collect the demonstrations needed, both of which proved effective for training robots for long-horizon and dexterous tasks. In addition, the authors conduct a three-stage evaluation. First, the authors compare a Multi-Layer Perceptron (MLP) policy with a Deep Q-network (DQN) imitation model to identify the stronger RL baseline, focusing on model performance, generalization, and a pick-up experiment. Second, three different VLA models are trained in two different scenarios and compared with each other. Third, the authors benchmark the selected RL baseline against the VLA model using computational and sample-efficiency measures and then a robot experiment on a multi-stage panel installation task that includes transport and installation. The VLA model demonstrates strong generalization and few-shot capability, achieving 60% and 100% success in the pickup phase. In comparison, DQN can be made robust but needs additional noise during tuning, which increases the workload. Overall, the findings indicate that VLA offers practical advantages for changing tasks by reducing programming effort and enabling useful performance with minimal data, while DQN provides a viable baseline when sufficient tuning effort is acceptable.
CLAIM: Camera-LiDAR Alignment with Intensity and Monodepth IROS 2025
In this paper, we unleash the potential of the powerful monodepth model in camera-LiDAR calibration and propose CLAIM, a novel method of aligning data from the camera and LiDAR. Given the initial guess and pairs of images and LiDAR point clouds, CLAIM utilizes a coarse-to-fine searching method to find the optimal transformation minimizing a patched Pearson correlation-based structure loss and a mutual information-based texture loss. These two losses serve as good metrics for camera-LiDAR alignment results and require no complicated steps of data processing, feature extraction, or feature matching like most methods, rendering our method simple and adaptive to most scenes. We validate CLAIM on public KITTI, Waymo, and MIAS-LCEC datasets, and the experimental results demonstrate its superior performance compared with the state-of-the-art methods. The code is available at https://github.com/Tompson11/claim.
comment: Accepted by IROS 2025
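A hedged sketch of a patched Pearson-correlation structure term of the kind CLAIM describes, comparing a monocular depth map with sparse projected LiDAR depths patch by patch; patch size, the validity rule, and the toy depth maps are assumptions:

```python
import numpy as np

def pearson(a, b, eps=1e-8):
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (np.sqrt((a * a).sum() * (b * b).sum()) + eps)

def patched_pearson_loss(mono_depth, lidar_depth, patch=16, min_pts=20):
    """One minus the mean per-patch correlation over patches with enough LiDAR hits."""
    H, W = mono_depth.shape
    corrs = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            m = mono_depth[i:i + patch, j:j + patch]
            ld = lidar_depth[i:i + patch, j:j + patch]
            valid = ld > 0                     # projected LiDAR depth is sparse
            if valid.sum() >= min_pts:
                corrs.append(pearson(m[valid], ld[valid]))
    return 1.0 - float(np.mean(corrs))         # lower means better alignment

rng = np.random.default_rng(0)
mono = rng.uniform(1, 30, size=(64, 64))
lidar = np.where(rng.random((64, 64)) < 0.3, mono * 1.1 + 0.5, 0.0)   # well aligned
print("structure loss (aligned case):", round(patched_pearson_loss(mono, lidar), 4))
```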
Impact of Robot Facial-Audio Expressions on Human Robot Trust Dynamics and Trust Repair
Despite recent advances in robotics and human-robot collaboration in the AEC industry, trust has mostly been treated as a static factor, with little guidance on how it changes across events during collaboration. This paper investigates how a robot's task performance and its expressive responses after outcomes shape the dynamics of human trust over time. To this end, we designed a controlled within-subjects study with two construction-inspired tasks, Material Delivery (physical assistance) and Information Gathering (perceptual assistance), and measured trust repeatedly (four times per task) using the 14-item Trust Perception Scale for HRI plus a redelegation choice. The robot produced two multimodal expressions, a "glad" display with a brief confirmation after success, and a "sad" display with an apology and a request for a second chance after failure. The study was conducted in a lab environment with 30 participants and a quadruped platform, and we evaluated trust dynamics and repair across both tasks. Results show that robot success reliably increases trust, failure causes sharp drops, and apology-based expressions partially restore trust (44% recovery in Material Delivery; 38% in Information Gathering). Item-level analysis indicates that recovered trust was driven mostly by interaction and communication factors, with competence recovering partially and autonomy aspects changing least. Additionally, age group and prior attitudes moderated trust dynamics: younger participants showed larger but shorter-lived changes, mid-20s participants exhibited the most durable repair, and older participants showed the most conservative dynamics. This work provides a foundation for future efforts that adapt repair strategies to task demands and user profiles to support safe, productive adoption of robots on construction sites.
Autonomous Construction-Site Safety Inspection Using Mobile Robots: A Multilayer VLM-LLM Pipeline
Construction safety inspection remains mostly manual, and automated approaches still rely on task-specific datasets that are hard to maintain in fast-changing construction environments due to frequent retraining. Meanwhile, field inspection with robots still depends on human teleoperation and manual reporting, which are labor-intensive. This paper aims to connect what a robot sees during autonomous navigation to the safety rules that are common on construction sites, automatically generating a safety inspection report. To this end, we propose a multi-layer framework with two main modules: robotics and AI. On the robotics side, SLAM and autonomous navigation provide repeatable coverage and targeted revisits via waypoints. On the AI side, a Vision Language Model (VLM)-based layer produces scene descriptions; a retrieval component grounds those descriptions in OSHA and site policies; another VLM-based layer assesses the safety situation against the retrieved rules; and finally a Large Language Model (LLM) layer generates safety reports from the previous outputs. The framework is validated with a proof-of-concept implementation and evaluated in a lab environment that simulates common hazards across three scenarios. Results show high recall with competitive precision compared to state-of-the-art closed-source models. This paper contributes a transparent, generalizable pipeline that moves beyond black-box models by exposing intermediate artifacts from each layer and keeping the human in the loop. This work provides a foundation for future extensions to additional tasks and settings within and beyond the construction context.
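As a schematic of the layered flow above (scene description, rule retrieval, rule-based assessment, report generation), the sketch below composes placeholder callables; every function and parameter name here is a hypothetical stand-in, not part of the paper's implementation.

```python
def inspect_waypoint(image, describe_scene, retrieve_rules, assess_rule, write_report):
    """Sketch of a multilayer VLM-LLM safety-inspection flow.

    All callables are hypothetical placeholders: `describe_scene` and
    `assess_rule` stand for VLM calls, `retrieve_rules` for a retrieval
    index over OSHA/site policies, and `write_report` for an LLM call.
    """
    description = describe_scene(image)                      # layer 1: VLM scene description
    rules = retrieve_rules(description, top_k=5)              # layer 2: ground in safety rules
    findings = [assess_rule(image, rule) for rule in rules]   # layer 3: per-rule VLM assessment
    # layer 4: LLM report; intermediate artifacts stay inspectable per layer
    return write_report({"scene": description, "rules": rules, "findings": findings})
```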
Breathe with Me: Synchronizing Biosignals for User Embodiment in Robots
Embodiment of users within robotic systems has been explored in human-robot interaction, most often in telepresence and teleoperation. In these applications, synchronized visuomotor feedback can evoke a sense of body ownership and agency, contributing to the experience of embodiment. We extend this work by employing embreathment, the representation of the user's own breath in real time, as a means for enhancing user embodiment experience in robots. In a within-subjects experiment, participants controlled a robotic arm, while its movements were either synchronized or non-synchronized with their own breath. Synchrony was shown to significantly increase body ownership, and was preferred by most participants. We propose the representation of physiological signals as a novel interoceptive pathway for human-robot interaction, and discuss implications for telepresence, prosthetics, collaboration with robots, and shared autonomy.
comment: Accepted to appear in the ACM/IEEE International Conference on Human-Robot Interaction (HRI '26), Edinburgh, United Kingdom. Iddo Yehoshua Wald and Amber Maimon contributed equally
Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
The development of robust Autonomous Vehicles (AVs) is bottlenecked by the scarcity of "Long-Tail" training data. While fleets collect petabytes of video logs, identifying rare safety-critical events (e.g., erratic jaywalking, construction diversions) remains a manual, cost-prohibitive process. Existing solutions rely on coarse metadata search, which lacks precision, or cloud-based VLMs, which are privacy-invasive and expensive. We introduce Semantic-Drive, a local-first, neuro-symbolic framework for semantic data mining. Our approach decouples perception into two stages: (1) Symbolic Grounding via a real-time open-vocabulary detector (YOLOE) to anchor attention, and (2) Cognitive Analysis via a Reasoning VLM that performs forensic scene analysis. To mitigate hallucination, we implement a "System 2" inference-time alignment strategy, utilizing a multi-model "Judge-Scout" consensus mechanism. Benchmarked on the nuScenes dataset against the Waymo Open Dataset (WOD-E2E) taxonomy, Semantic-Drive achieves a Recall of 0.966 (vs. 0.475 for CLIP) and reduces Risk Assessment Error by 40% compared to the best single-scout models. The system runs entirely on consumer hardware (NVIDIA RTX 3090), offering a privacy-preserving alternative to the cloud.
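A minimal sketch of a judge-scout style consensus over per-clip outputs is shown below, assuming each "scout" model returns a numeric risk score and a set of event tags; the median-score and majority-tag aggregation rule is an illustrative assumption, not the paper's exact mechanism.

```python
from collections import Counter
from statistics import median

def judge_consensus(scout_outputs, tag_quorum=0.5):
    """Aggregate scout model outputs into a consensus verdict.

    scout_outputs: list of dicts like {"risk": float in [0, 1],
                                       "tags": ["jaywalking", ...]}
    The median/majority rule here is an illustrative stand-in for the
    paper's Judge-Scout mechanism.
    """
    risks = [o["risk"] for o in scout_outputs]
    tag_counts = Counter(t for o in scout_outputs for t in set(o["tags"]))
    n = len(scout_outputs)
    consensus_tags = [t for t, c in tag_counts.items() if c / n >= tag_quorum]
    return {"risk": median(risks), "tags": consensus_tags}

# Example: three scouts disagree on severity; the judge takes the median
scouts = [{"risk": 0.9, "tags": ["jaywalking"]},
          {"risk": 0.7, "tags": ["jaywalking", "occlusion"]},
          {"risk": 0.2, "tags": []}]
print(judge_consensus(scouts))  # {'risk': 0.7, 'tags': ['jaywalking']}
```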
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/aumovio/pred2plan.
Data-fused MPC with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: This paper has been accepted for publication in IEEE Control Systems Letters (L-CSS)
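The data-driven component of DFMPC builds on Willems' Fundamental Lemma, which represents system trajectories through Hankel matrices of recorded data. The sketch below shows that generic construction; the numpy code and the toy "output" signal are illustrative, not the paper's iRonCub-specific implementation.

```python
import numpy as np

def hankel(signal, depth):
    """Block-Hankel matrix of a recorded signal.

    signal: (T, m) array of T samples of an m-dimensional signal
    depth : number of block rows L
    Returns an (L*m, T-L+1) matrix whose columns are length-L windows.
    """
    T, m = signal.shape
    cols = T - depth + 1
    return np.column_stack([signal[i:i + depth].reshape(-1) for i in range(cols)])

# Per Willems' lemma, with persistently exciting input data, any length-L
# input-output trajectory of the LTI system equals [H_u; H_y] @ g for some g.
rng = np.random.default_rng(0)
u = rng.standard_normal((100, 2))      # recorded inputs
y = np.cumsum(u, axis=0) * 0.1         # stand-in recorded outputs (toy integrator)
H_u, H_y = hankel(u, 10), hankel(y, 10)
print(H_u.shape, H_y.shape)            # (20, 91) (20, 91)
```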
Mirror Skin: In Situ Visualization of Robot Touch Intent on Robotic Skin
Effective communication of robotic touch intent is a key factor in promoting safe and predictable physical human-robot interaction (pHRI). While intent communication has been widely studied, existing approaches lack the spatial specificity and semantic depth necessary to convey robot touch actions. We present Mirror Skin, a cephalopod-inspired concept that utilizes high-resolution, mirror-like visual feedback on robotic skin. By mapping in-situ visual representations of a human's body parts onto the corresponding robot's touch region, Mirror Skin communicates who shall initiate touch, where it will occur, and when it is imminent. To inform the design of Mirror Skin, we conducted a structured design exploration with experts in virtual reality (VR), iteratively refining six key dimensions. A subsequent controlled user study demonstrated that Mirror Skin significantly enhances accuracy and reduces response times for interpreting touch intent. These findings highlight the potential of visual feedback on robotic skin to communicate human-robot touch interactions.
MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. One LoRA expert serves as a Decision Expert for scenario reasoning and driving decision-making, while the other acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions, instead of operating directly in a continuous action space. This approach effectively balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration in online reinforcement learning. Using the lightweight Qwen-0.5B LLM, MindDrive achieves a Driving Score (DS) of 78.04 and a Success Rate (SR) of 55.09% on the challenging Bench2Drive benchmark. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online reinforcement learning for VLA models in autonomous driving.
comment: 16 pages, 12 figures, 6 tables; Project Page: https://xiaomi-mlab.github.io/MindDrive/
Recent Advances in Multi-Agent Human Trajectory Prediction: A Comprehensive Review
With the emergence of powerful data-driven methods in human trajectory prediction (HTP), a finer understanding of multi-agent interactions is within reach, with important implications in areas such as social robot navigation, autonomous navigation, and crowd modeling. This survey reviews some of the most recent advancements in deep learning-based multi-agent trajectory prediction, focusing on studies published between 2020 and 2025. We categorize the existing methods based on their architectural design, their input representations, and their overall prediction strategies, placing a particular emphasis on models evaluated using the ETH/UCY benchmark. Furthermore, we highlight key challenges and future research directions in the field of multi-agent HTP.
comment: 45 pages
MAPS$^2$: Multi-Robot Autonomous Motion Planning under Signal Temporal Logic Specifications
This article presents MAPS$^2$ : a distributed algorithm that allows multi-robot systems to deliver coupled tasks expressed as Signal Temporal Logic (STL) constraints. Classical control theoretical tools addressing STL constraints either adopt a limited fragment of the STL formula or require approximations of min/max operators, whereas works maximising robustness through optimisation-based methods often suffer from local minima, relaxing any completeness arguments due to the NP-hard nature of the problem. Endowed with probabilistic guarantees, MAPS$^2$ provides an anytime algorithm that iteratively improves the robots' trajectories. The algorithm selectively imposes spatial constraints by taking advantage of the temporal properties of the STL. The algorithm is distributed, in the sense that each robot calculates its trajectory by communicating only with its immediate neighbours as defined via a communication graph. We illustrate the efficiency of MAPS$^2$ by conducting extensive simulation and experimental studies, verifying the generation of STL satisfying trajectories.
MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion
Vision-language models enable understanding of and reasoning about complex traffic scenarios through multi-source information fusion, establishing them as a core technology for autonomous driving. However, existing vision-language models are constrained by an image understanding paradigm confined to the 2D plane, which restricts their capability to perceive 3D spatial information and perform deep semantic fusion, resulting in suboptimal performance in complex autonomous driving environments. This study proposes MMDrive, a multimodal vision-language model framework that extends traditional image understanding to generalized 3D scene understanding. MMDrive incorporates three complementary modalities: occupancy maps, LiDAR point clouds, and textual scene descriptions. To this end, it introduces two novel components for adaptive cross-modal fusion and key information extraction. Specifically, the Text-oriented Multimodal Modulator dynamically weights the contributions of each modality based on the semantic cues in the question, guiding context-aware feature integration. The Cross-Modal Abstractor employs learnable abstract tokens to generate compact, cross-modal summaries that highlight key regions and essential semantics. Comprehensive evaluations on the DriveLM and NuScenes-QA benchmarks demonstrate that MMDrive achieves significant performance gains over existing vision-language models for autonomous driving, with a BLEU-4 score of 54.56 and METEOR of 41.78 on DriveLM, and an accuracy of 62.7% on NuScenes-QA. MMDrive effectively breaks the traditional image-only understanding barrier, enabling robust multimodal reasoning in complex driving environments and providing a new foundation for interpretable autonomous driving scene understanding.
Decomposed Object Manipulation via Dual-Actor Policy
Object manipulation, which focuses on learning to perform tasks on similar parts across different types of objects, can be divided into an approaching stage and a manipulation stage. However, previous works often ignore this characteristic of the task and rely on a single policy to directly learn the whole process of object manipulation. To address this problem, we propose a novel Dual-Actor Policy, termed DAP, which explicitly considers the different stages and leverages heterogeneous visual priors to enhance each stage. Specifically, we introduce an affordance-based actor to locate the functional part in the manipulation task, thereby improving the approaching process. Following this, we propose a motion flow-based actor to capture the movement of the component, facilitating the manipulation process. Finally, we introduce a decision maker to determine the current stage of DAP and select the corresponding actor. Moreover, existing object manipulation datasets contain few objects and lack the visual priors needed to support training. To address this, we construct a simulated dataset, the Dual-Prior Object Manipulation Dataset, which combines the two visual priors and includes seven tasks, two of which are challenging long-term, multi-stage tasks. Experimental results on our dataset, the RoboTwin benchmark, and real-world scenarios show that our method consistently outperforms the SOTA method by 5.55%, 14.7%, and 10.4% on average, respectively.
Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration
This paper investigates the application of reinforcement learning (RL) to multi-robot social formation navigation, a critical capability for enabling seamless human-robot coexistence. While RL offers a promising paradigm, the inherent unpredictability and often uncooperative dynamics of pedestrian behavior pose substantial challenges, particularly concerning the efficiency of coordinated exploration among robots. To address this, we propose a novel coordinated-exploration multi-robot RL algorithm that introduces intrinsic-motivation-based exploration. Its core component is a self-learning intrinsic reward mechanism designed to collectively alleviate policy conservatism. Moreover, this algorithm incorporates a dual-sampling mode within the centralized training and decentralized execution framework to enhance the representation of both the navigation policy and the intrinsic reward, leveraging a two-time-scale update rule to decouple parameter updates. Empirical results on social formation navigation benchmarks demonstrate the proposed algorithm's superior performance over existing state-of-the-art methods across crucial metrics. Our code and video demos are available at: https://github.com/czxhunzi/CEMRRL.
Multiagent Systems
Multi-Agent Medical Decision Consensus Matrix System: An Intelligent Collaborative Framework for Oncology MDT Consultations
Multidisciplinary team (MDT) consultations are the gold standard for cancer care decision-making, yet current practice lacks structured mechanisms for quantifying consensus and ensuring decision traceability. We introduce a Multi-Agent Medical Decision Consensus Matrix System that deploys seven specialized large language model agents, including an oncologist, a radiologist, a nurse, a psychologist, a patient advocate, a nutritionist and a rehabilitation therapist, to simulate realistic MDT workflows. The framework incorporates a mathematically grounded consensus matrix that uses Kendall's coefficient of concordance to objectively assess agreement. To further enhance treatment recommendation quality and consensus efficiency, the system integrates reinforcement learning methods, including Q-Learning, PPO and DQN. Evaluation across five medical benchmarks (MedQA, PubMedQA, DDXPlus, MedBullets and SymCat) shows substantial gains over existing approaches, achieving an average accuracy of 87.5% compared with 83.8% for the strongest baseline, a consensus achievement rate of 89.3% and a mean Kendall's W of 0.823. Expert reviewers rated the clinical appropriateness of system outputs at 8.9/10. The system guarantees full evidence traceability through mandatory citations of clinical guidelines and peer-reviewed literature, following GRADE principles. This work advances medical AI by providing structured consensus measurement, role-specialized multi-agent collaboration and evidence-based explainability to improve the quality and efficiency of clinical decision-making.
comment: 14 pages, 6 figures
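The consensus matrix's agreement measure, Kendall's coefficient of concordance W, has a simple closed form; below is a minimal sketch (no tie correction) over hypothetical agent rankings of candidate treatment options.

```python
import numpy as np

def kendalls_w(rankings):
    """Kendall's coefficient of concordance (no tie correction).

    rankings: (m, n) array; row i holds agent i's ranks (1..n) of n options.
    Returns W in [0, 1]; 1 means perfect agreement among agents.
    """
    rankings = np.asarray(rankings, dtype=float)
    m, n = rankings.shape
    col_sums = rankings.sum(axis=0)                  # R_j, rank sum per option
    s = np.sum((col_sums - col_sums.mean()) ** 2)    # spread of rank sums
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# Three agents ranking four treatment options (1 = most preferred)
agents = [[1, 2, 3, 4],
          [1, 3, 2, 4],
          [2, 1, 3, 4]]
print(round(kendalls_w(agents), 3))   # 0.778: substantial agreement
```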
Grammar Search for Multi-Agent Systems
Automatic search for Multi-Agent Systems has recently emerged as a key focus in agentic AI research. Several prior approaches have relied on LLM-based free-form search over the code space. In this work, we propose a more structured framework that explores the same space through a fixed set of simple, composable components. We show that, despite lacking the generative flexibility of LLMs during the candidate generation stage, our method outperforms prior approaches on four out of five benchmarks across two domains: mathematics and question answering. Furthermore, our method offers additional advantages, including a more cost-efficient search process and the generation of modular, interpretable multi-agent systems with simpler logic.
Single-Agent Scaling Fails Multi-Agent Intelligence: Towards Foundation Models with Native Multi-Agent Intelligence
Foundation models (FMs) are increasingly assuming the role of the "brain" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or integrated tool use -- we argue that the next frontier is endowing FMs with native multi-agent intelligence. We identify four core capabilities of FMs in multi-agent contexts: understanding, planning, efficient communication, and adaptation. Contrary to assumptions about the spontaneous emergence of such abilities, we provide extensive empirical evidence, across 41 large language models and 7 challenging benchmarks, showing that scaling single-agent performance alone does not automatically yield robust multi-agent intelligence. To address this gap, we outline key research directions -- spanning dataset construction, evaluation, training paradigms, and safety considerations -- for building FMs with native multi-agent intelligence.
Exploring Network-Knowledge Graph Duality: A Case Study in Agentic Supply Chain Risk Analysis
Large Language Models (LLMs) struggle with the complex, multi-modal, and network-native data underlying financial risk. Standard Retrieval-Augmented Generation (RAG) oversimplifies relationships, while specialist models are costly and static. We address this gap with an LLM-centric agent framework for supply chain risk analysis. Our core contribution is to exploit the inherent duality between networks and knowledge graphs (KG). We treat the supply chain network as a KG, allowing us to use structural network science principles for retrieval. A graph traverser, guided by network centrality scores, efficiently extracts the most economically salient risk paths. An agentic architecture orchestrates this graph retrieval alongside data from numerical factor tables and news streams. Crucially, it employs novel "context shells" -- descriptive templates that embed raw figures in natural language -- to make quantitative data fully intelligible to the LLM. This lightweight approach enables the model to generate concise, explainable, and context-rich risk narratives in real-time without costly fine-tuning or a dedicated graph database.
comment: Accepted to the 2nd Workshop on LLMs and Generative AI in Finance: International Conference on AI in Finance(ICAIF) 2025;7 pages, 3 figures
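To illustrate the centrality-guided retrieval idea and the "context shell" templates, here is a toy sketch built with networkx; the company names, figures, and path-scoring rule are illustrative assumptions, not the paper's data or exact traversal algorithm.

```python
import networkx as nx

# Toy supply-chain network (directed supplier -> customer edges);
# node names and figures are illustrative only.
G = nx.DiGraph()
G.add_edges_from([("ChipCo", "ModuleCo"), ("ModuleCo", "AutoCo"),
                  ("ChipCo", "BoardCo"), ("BoardCo", "AutoCo"),
                  ("ResinCo", "BoardCo")])

centrality = nx.betweenness_centrality(G)

def salient_paths(graph, source, target, top_k=2, cutoff=4):
    """Rank simple paths by the summed centrality of their nodes."""
    paths = nx.all_simple_paths(graph, source, target, cutoff=cutoff)
    scored = sorted(paths, key=lambda p: sum(centrality[n] for n in p),
                    reverse=True)
    return scored[:top_k]

def context_shell(entity, metric, value):
    """Wrap a raw figure in natural language so the LLM can reason over it."""
    return f"{entity} has a {metric} of {value}, relative to a sector median of 1.0."

for path in salient_paths(G, "ChipCo", "AutoCo"):
    print(" -> ".join(path))
print(context_shell("ChipCo", "supplier concentration index", 2.7))
```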
Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems
Recent advances in agentic AI have shifted the focus from standalone Large Language Models (LLMs) to integrated systems that combine LLMs with tools, memory, and other agents to perform complex tasks. These multi-agent architectures enable coordinated reasoning, planning, and execution across diverse domains, allowing agents to collaboratively automate complex workflows. Despite these advances, evaluation and assessment of LLM agents and the multi-agent systems they constitute remain a fundamental challenge. Although various approaches have been proposed in the software engineering literature for evaluating conventional software components, existing methods for AI-based systems often overlook the non-deterministic nature of models. This non-determinism introduces behavioral uncertainty during execution, yet existing evaluations rely on binary task completion metrics that fail to capture it. Evaluating agentic systems therefore requires examining additional dimensions, including the agent's ability to invoke tools, ingest and retrieve memory, collaborate with other agents, and interact effectively with its environment. These challenges emerged during our ongoing industry collaboration with MontyCloud Inc., when we deployed an agentic system in production; the limitations that surfaced during deployment highlight practical gaps in current evaluation methods and the need for a systematic assessment of agent behavior beyond task outcomes. Informed by these observations and established definitions of agentic systems, we propose an end-to-end Agent Assessment Framework with four evaluation pillars encompassing LLMs, Memory, Tools, and Environment. We validate the framework on a representative Autonomous CloudOps use case, where experiments reveal behavioral deviations overlooked by conventional metrics, demonstrating its effectiveness in capturing runtime uncertainties.
Optimizing Highway Traffic Flow in Mixed Autonomy: A Multiagent Truncated Rollout Approach
The development of connected and autonomous vehicles (CAVs) offers substantial opportunities to enhance traffic efficiency. However, in mixed autonomy environments where CAVs coexist with human-driven vehicles (HDVs), achieving efficient coordination among CAVs remains challenging due to heterogeneous driving behaviors. To address this, this paper proposes a multiagent truncated rollout approach that enhances CAV speed coordination to improve highway throughput while reducing computational overhead. In this approach, a traffic density evolution equation is formulated that comprehensively accounts for the presence or absence of CAVs, and a distributed coordination control framework is established accordingly. By incorporating kinematic information from neighbor agents and employing an agent-by-agent sequential solution mechanism, our method enables explicit cooperation among CAVs. Furthermore, we introduce a truncated rollout scheme that adaptively shortens the optimization horizon based on the evaluation of control sequences. This significantly reduces the time complexity, thereby improving real-time performance and scalability. Theoretical analysis provides rigorous guarantees on the stability and performance improvement of the system. Simulations conducted on real-world bottleneck scenarios demonstrate that, in large-scale mixed traffic flows, the proposed method outperforms conventional model predictive control methods by reducing both the average travel time in the bottleneck area and overall computational time, highlighting its strong potential for practical deployment.
Bayesian Ego-graph Inference for Networked Multi-Agent Reinforcement Learning NeurIPS 2025
In networked multi-agent reinforcement learning (Networked-MARL), decentralized agents must act under local observability and constrained communication over fixed physical graphs. Existing methods often assume static neighborhoods, limiting adaptability to dynamic or heterogeneous environments. While centralized frameworks can learn dynamic graphs, their reliance on global state access and centralized infrastructure is impractical in real-world decentralized systems. We propose a stochastic graph-based policy for Networked-MARL, where each agent conditions its decision on a sampled subgraph over its local physical neighborhood. Building on this formulation, we introduce BayesG, a decentralized actor-framework that learns sparse, context-aware interaction structures via Bayesian variational inference. Each agent operates over an ego-graph and samples a latent communication mask to guide message passing and policy computation. The variational distribution is trained end-to-end alongside the policy using an evidence lower bound (ELBO) objective, enabling agents to jointly learn both interaction topology and decision-making strategies. BayesG outperforms strong MARL baselines on large-scale traffic control tasks with up to 167 agents, demonstrating superior scalability, efficiency, and performance.
comment: Accepted at NeurIPS 2025 https://openreview.net/forum?id=3qeTs05bRL
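As a rough numpy illustration of the latent communication mask and the KL term that would appear in BayesG's ELBO, the sketch below samples a Bernoulli mask over an agent's ego-graph neighbors and computes the KL divergence to a sparse prior; the dimensions and probabilities are illustrative, and the actual method trains these quantities end-to-end together with the policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mask(q_probs):
    """Sample a binary communication mask over ego-graph neighbors."""
    return (rng.random(len(q_probs)) < q_probs).astype(float)

def bernoulli_kl(q, p):
    """KL( Bernoulli(q) || Bernoulli(p) ), summed over neighbors."""
    q, p = np.clip(q, 1e-6, 1 - 1e-6), np.clip(p, 1e-6, 1 - 1e-6)
    return float(np.sum(q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))))

# Variational edge probabilities for 4 neighbors vs. a sparse prior (p = 0.2)
q_probs = np.array([0.9, 0.1, 0.6, 0.05])
mask = sample_mask(q_probs)                      # which neighbors to listen to
neighbor_msgs = rng.standard_normal((4, 8))      # 8-dim messages from neighbors
aggregated = (mask[:, None] * neighbor_msgs).sum(axis=0)   # masked message passing
elbo_penalty = bernoulli_kl(q_probs, np.full(4, 0.2))      # KL term in the ELBO
print(mask, round(elbo_penalty, 3))
```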
Rethinking the Reliability of Multi-agent System: A Perspective from Byzantine Fault Tolerance
Ensuring the reliability of agent architectures and effectively identifying problematic agents when failures occur are crucial challenges in multi-agent systems (MAS). Advances in large language models (LLMs) have established LLM-based agents as a major branch of MAS, enabling major breakthroughs in complex problem solving and world modeling. However, the reliability implications of this shift remain largely unexplored, i.e., whether substituting traditional agents with LLM-based agents can effectively enhance the reliability of MAS. In this work, we investigate and quantify the reliability of LLM-based agents from the perspective of Byzantine fault tolerance. We observe that LLM-based agents demonstrate stronger skepticism when processing erroneous message flows, a characteristic that enables them to outperform traditional agents across different topological structures. Motivated by the results of the pilot experiment, we design CP-WBFT, a confidence probe-based weighted Byzantine Fault Tolerant consensus mechanism to enhance the stability of MAS with different topologies. It capitalizes on the intrinsic reflective and discriminative capabilities of LLMs by employing a probe-based, weighted information flow transmission method to improve the reliability of LLM-based agents. Extensive experiments demonstrate that CP-WBFT achieves superior performance across diverse network topologies under extreme Byzantine conditions (85.7% fault rate). Notably, our approach surpasses traditional methods by attaining remarkable accuracy on various topologies and maintaining strong reliability in both mathematical reasoning and safety assessment tasks.
MedChat: A Multi-Agent Framework for Multimodal Diagnosis with Large Language Models
The integration of deep learning-based glaucoma detection with large language models (LLMs) presents an automated strategy to mitigate ophthalmologist shortages and improve clinical reporting efficiency. However, applying general LLMs to medical imaging remains challenging due to hallucinations, limited interpretability, and insufficient domain-specific medical knowledge, which can potentially reduce clinical accuracy. Although recent approaches combining imaging models with LLM reasoning have improved reporting, they typically rely on a single generalist agent, restricting their capacity to emulate the diverse and complex reasoning found in multidisciplinary medical teams. To address these limitations, we propose MedChat, a multi-agent diagnostic framework and platform that combines specialized vision models with multiple role-specific LLM agents, all coordinated by a director agent. This design enhances reliability, reduces hallucination risk, and enables interactive diagnostic reporting through an interface tailored for clinical review and educational use. Code available at https://github.com/Purdue-M2/MedChat.
Systems and Control (EESS)
Enhancing Orbital Debris Remediation with Reconfigurable Space-Based Laser Constellations
Orbital debris poses an escalating threat to space missions and the long-term sustainability of Earth's orbital environment. The literature proposes various approaches for orbital debris remediation, including the use of multiple space-based lasers that collaboratively engage debris targets. While the proof of concept for this laser-based approach has been demonstrated, critical questions remain about its scalability and responsiveness as the debris population continues to expand rapidly. This paper introduces constellation reconfiguration as a system-level strategy to address these limitations. Through coordinated orbital maneuvers, laser-equipped satellites can dynamically adapt their positions to respond to evolving debris distributions and time-critical events. We formalize this concept as the Reconfigurable Laser-to-Debris Engagement Scheduling Problem (R-L2D-ESP), an optimization framework that determines the optimal sequence of constellation reconfigurations and laser engagements to maximize debris remediation capacity, which quantifies the constellation's ability to nudge, deorbit, or perform just-in-time collision avoidance maneuvers on debris objects. To manage the complexity of this combinatorial optimization problem, we employ a receding horizon approach. Our experiments reveal that reconfigurable constellations significantly outperform static ones, achieving greater debris remediation capacity and successfully deorbiting substantially more debris objects. Additionally, our sensitivity analyses identify the key parameters that influence remediation performance the most, providing essential insights for future system design. These findings demonstrate that constellation reconfiguration represents a promising advancement for laser-based debris removal systems, offering the adaptability and scalability necessary to enhance this particular approach to orbital debris remediation.
comment: Accepted to the 2026 IEEE Aerospace Conference
gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation
We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine Learning (ML) solvers. Existing datasets and libraries face three main challenges: (1) lack of realistic stochastic load and topology perturbations, limiting scenario diversity; (2) PF datasets are restricted to OPF-feasible points, hindering generalization of ML solvers to cases that violate operating limits (e.g., branch overloads or voltage violations); and (3) OPF datasets use fixed generator cost functions, limiting generalization across varying costs. gridfm-datakit addresses these challenges by: (1) combining global load scaling from real-world profiles with localized noise and supporting arbitrary N-k topology perturbations to create diverse yet realistic datasets; (2) generating PF samples beyond operating limits; and (3) producing OPF data with varying generator costs. It also scales efficiently to large grids (up to 10,000 buses). Comparisons with OPFData, OPF-Learn, PGLearn, and PFΔ are provided. Available on GitHub at https://github.com/gridfm/gridfm-datakit under Apache 2.0 and via `pip install gridfm-datakit`.
comment: Main equal contributors: Alban Puech, Matteo Mazzonelli. Other equal contributors: Celia Cintas, Tamara R. Govindasamy, Mangaliso Mngomezulu, Jonas Weiss
Scalable Nonlinear DeePC: Bridging Direct and Indirect Methods and Basis Reduction
This paper studies regularized data-enabled predictive control (DeePC) within a nonlinear framework and its relationship to subspace predictive control (SPC). The Π-regularization is extended to general basis functions and it is shown that, under suitable conditions, the resulting basis-functions DeePC formulation constitutes a relaxation of basis-functions SPC. To improve scalability, we introduce an SVD-based dimensionality reduction that preserves the equivalence with SPC, and we derive a reduced Π-regularization. A LASSO-based sparse basis selection method is proposed to obtain a reduced basis from lifted data. Simulations on a nonlinear van der Pol oscillator model indicate that, in the absence of noise, DeePC and SPC yield equivalent absolute mean tracking errors (AMEs) when large penalties are applied. In contrast, under noisy measurements, careful tuning of the DeePC regularization results in a reduced AME, outperforming SPC.
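The SVD-based dimensionality reduction can be pictured as keeping only the leading singular directions of the lifted data matrix; the generic sketch below (not the paper's exact formulation) replaces a wide data matrix by a low-rank factor with approximately the same column space.

```python
import numpy as np

def reduce_data_matrix(W, energy=0.999):
    """Replace a lifted/Hankel data matrix by a low-rank factor.

    Keeps the smallest number of singular directions capturing the
    requested fraction of energy; U_r * s_r spans (approximately) the
    same column space as W but with far fewer columns.
    Generic sketch, not the paper's formulation.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    r = int(np.searchsorted(cum, energy) + 1)
    return U[:, :r] * s[:r], r          # reduced matrix with r columns

rng = np.random.default_rng(0)
W = rng.standard_normal((40, 5)) @ rng.standard_normal((5, 500))   # rank-5 lifted data
W_red, r = reduce_data_matrix(W + 1e-3 * rng.standard_normal(W.shape))
print(W.shape, "->", W_red.shape, "rank kept:", r)
```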
The Innovation Null Space of the Kalman Predictor: A Stochastic Perspective for DeePC
Willems' fundamental lemma uses a key decision variable $g$ to combine measured input-output data and describe trajectories of a linear time-invariant system. In this paper, we ask: what is a good choice for this vector $g$ when the system is affected by noise? For a linear system with Gaussian noise, we show that there exists an optimal subspace for this decision variable $g$, which is the null space of the innovation Hankel matrix. If the decision vector lies in this null space, the resulting predictor gets closer to the Kalman predictor. To show this, we use a result that we refer to as the Kalman Filter Fundamental Lemma (KFFL), which applies Willems' lemma to the Kalman predictor. This viewpoint also explains several existing data-driven predictive control methods: regularized DeePC schemes act as soft versions of the innovation null-space constraint, instrumental-variable methods enforce it by construction, and ARX-based approaches explicitly estimate this innovation null space.
Closed-Loop Consistent, Causal Data-Driven Predictive Control via SSARX
We propose a fundamental-lemma-free data-driven predictive control (DDPC) scheme for synthesizing model predictive control (MPC)-like policies directly from input-output data. Unlike the well-known DeePC approach and other DDPC methods that rely on Willems' fundamental lemma, our method avoids stacked Hankel representations and the DeePC decision variable g. Instead, we develop a closed-loop consistent, causal DDPC scheme based on the multi-step predictor Subspace-ARX (SSARX). The method first (i) estimates predictor/observer Markov parameters via a high-order ARX model to decouple the noise, then (ii) learns a multi-step past-to-future map by regression, optionally with a reduced-rank constraint. The SSARX predictor is strictly causal, which allows it to be integrated naturally into an MPC formulation. Our experimental results show that SSARX performs competitively with other methods when applied to closed-loop data affected by measurement and process noise.
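Step (i) of the SSARX scheme, estimating predictor Markov parameters with a high-order ARX model, reduces to a linear least-squares problem; a minimal single-input single-output sketch is shown below (a generic illustration, not the authors' implementation).

```python
import numpy as np

def fit_high_order_arx(u, y, p):
    """Least-squares fit of y[k] = sum_i a_i y[k-i] + b_i u[k-i] + e[k].

    u, y : 1-D input/output records of equal length
    p    : ARX order (chosen high so the noise is approximately whitened)
    Returns the stacked coefficient vector [a_1..a_p, b_1..b_p].
    """
    rows, rhs = [], []
    for k in range(p, len(y)):
        rows.append(np.concatenate([y[k - p:k][::-1], u[k - p:k][::-1]]))
        rhs.append(y[k])
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return theta

# Toy second-order system identified with a high-order ARX model
rng = np.random.default_rng(0)
u = rng.standard_normal(2000)
y = np.zeros_like(u)
for k in range(2, len(u)):
    y[k] = 1.5 * y[k-1] - 0.7 * y[k-2] + 0.5 * u[k-1] + 0.02 * rng.standard_normal()
theta = fit_high_order_arx(u, y, p=10)
print(theta[:2])   # leading AR coefficients, close to [1.5, -0.7]
```

Step (ii) would then regress stacked future outputs on past inputs and outputs using these whitened data, optionally with a reduced-rank constraint.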
Equivariant Observer for Bearing Estimation with Linear and Angular Velocity Inputs
This work addresses the problem of designing an equivariant observer for a first order dynamical system on the unit-sphere. Building upon the established case of unit bearing vector dynamics with angular velocity inputs, we introduce an additional linear velocity input projected onto the unit-sphere tangent space. This extended formulation is particularly useful in image-based visual servoing scenarios where stable bearing estimates are required and the relative velocity between the vehicle and target features must be accounted for. Leveraging lifted kinematics to the Special Orthogonal group, we design an observer for the bearing vector and prove its almost global asymptotic stability. Additionally, we demonstrate how the equivariant observer can be expressed in the original state manifold. Numerical simulation results validate the effectiveness of the proposed algorithm.
comment: This work has been submitted to the 2026 European Control Conference
Nonlinear System Identification Nano-drone Benchmark
We introduce a benchmark for system identification based on 75k real-world samples from the Crazyflie 2.1 Brushless nano-quadrotor, a sub-50g aerial vehicle widely adopted in robotics research. The platform presents a challenging testbed due to its multi-input, multi-output nature, open-loop instability, and nonlinear dynamics under agile maneuvers. The dataset comprises four aggressive trajectories with synchronized 4-dimensional motor inputs and 13-dimensional output measurements. To enable fair comparison of identification methods, the benchmark includes a suite of multi-horizon prediction metrics for evaluating both one-step and multi-step error propagation. In addition to the data, we provide a detailed description of the platform and experimental setup, as well as baseline models highlighting the challenge of accurate prediction under real-world noise and actuation nonlinearities. All data, scripts, and reference implementations are released as open-source at https://github.com/idsia-robotics/nanodrone-sysid-benchmark to facilitate transparent comparison of algorithms and support research on agile, miniaturized aerial robotics.
Equivariant Filter Cascade for Relative Attitude, Target's Angular Velocity, and Gyroscope Bias Estimation
Rendezvous and docking between a chaser spacecraft and an uncooperative target, such as an inoperative satellite, require synchronization between the chaser spacecraft and the target. In these scenarios, the chaser must estimate the relative attitude and angular velocity of the target using onboard sensors, in the presence of gyroscope bias. In this work, we propose a cascade of Equivariant Filters (EqF) to address this problem. The first stage of the cascade estimates the chaser's attitude and the bias, using measurements from a star tracker, while the second stage of the cascade estimates the relative attitude and the target's angular velocity, using observations of two known, non-collinear vectors fixed in the target frame. The stability of the EqF cascade is theoretically analyzed and simulation results demonstrate the filter cascade's performance.
comment: This work has been submitted to the 2026 European Control Conference
Temporal interference stimulation for deep brain neuromodulation in humans
For decades, focal non-invasive neuromodulation of deep brain regions has not been possible because of the steep depth-focality trade-off of conventional non-invasive brain stimulation (NIBS) techniques, such as transcranial magnetic stimulation (TMS) or classical transcranial electric stimulation (tES). Deep brain stimulation has therefore largely relied on invasive approaches in clinical populations, requiring surgery. Transcranial Temporal Interference Stimulation (tTIS) has recently emerged as a promising method to overcome this challenge and allows, for the first time, focal non-invasive electrical deep brain stimulation. The method, which was first validated through computational modeling and rodent work, has now been successfully translated to humans to target deep brain regions such as the hippocampus or striatum. In this Perspective, we present current evidence for tTIS-based neuromodulation and its underlying mechanisms, and discuss future developments of this promising technology. More specifically, we highlight key opportunities and challenges for fundamental neuroscience as well as for the design of new interventions in neuropsychiatric disorders. We also discuss the status of understanding and remaining challenges regarding the basic mechanisms of action of tTIS and possible lines of technological innovation to optimize stimulation, in particular in terms of intensity and focality. Overall, we suggest that following the first proofs of concept, an important multidisciplinary research effort is now required to further validate the use of tTIS in multiple applications, understand its underlying principles, and optimize the technology in view of wider scientific and clinical deployment.
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
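A minimal picture of the proposed tuning loop: a Gaussian-process surrogate fitted to a handful of closed-loop experiments proposes the next AMPC parameter to try via a lower-confidence-bound rule. The `closed_loop_cost` function below is a stand-in for running one hardware experiment, and the GP and acquisition details are generic Bayesian-optimization choices, not necessarily those used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def closed_loop_cost(theta):
    """Stand-in for running one hardware experiment with AMPC parameter theta."""
    return (theta - 0.3) ** 2 + 0.01 * rng.standard_normal()

def rbf(a, b, ls=0.15):
    """Squared-exponential kernel between two 1-D parameter arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

grid = np.linspace(0.0, 1.0, 200)          # candidate parameter values
X, Y = [0.0, 1.0], [closed_loop_cost(0.0), closed_loop_cost(1.0)]

for _ in range(8):                          # 8 experiments after 2 seed points
    Xa, Ya = np.array(X), np.array(Y)
    K = rbf(Xa, Xa) + 1e-4 * np.eye(len(Xa))
    k_star = rbf(grid, Xa)
    alpha = np.linalg.solve(K, Ya)
    mean = k_star @ alpha                                   # GP posterior mean
    var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    lcb = mean - 2.0 * np.sqrt(np.clip(var, 0.0, None))      # optimistic cost
    theta_next = grid[np.argmin(lcb)]       # most promising parameter to try
    X.append(theta_next)
    Y.append(closed_loop_cost(theta_next))

print("best parameter found:", X[int(np.argmin(Y))])
```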
A Geometric Task-Space Port-Hamiltonian Formulation for Redundant Manipulators
We present a novel geometric port-Hamiltonian formulation of redundant manipulators performing a differential kinematic task $\eta = J(q)\dot{q}$, where $q$ is a point on the configuration manifold, $\eta$ is a velocity-like task space variable, and $J(q)$ is a linear map representing the task, for example the classical analytic or geometric manipulator Jacobian matrix. The proposed model emerges from a change of coordinates from canonical Hamiltonian dynamics, and splits the standard Hamiltonian momentum variable into a task-space momentum variable and a null-space momentum variable. Properties of this model and relation to Lagrangian formulations present in the literature are highlighted. Finally, we apply the proposed model in an Interconnection and Damping Assignment Passivity-Based Control (IDA-PBC) design to stabilize and shape the impedance of a 7-DOF Emika Panda robot in simulation.
A Data-Driven Approach for Electric Vehicle Powertrain Modeling
Electrification in the automotive industry and increasing powertrain complexity demand accelerated, cost-effective development cycles. While data-driven models have recently been investigated at the component level, a gap exists in systematically integrating them into cohesive, system-level simulations for virtual validation. This paper addresses this gap by presenting a modular framework for developing powertrain simulations. By defining standardized interfaces for key components (the battery, inverter, and electric motor), our methodology enables independently developed models, whether data-driven, physics-based, or empirical, to be easily integrated. This approach facilitates scalable system-level modeling and aims to shorten development timelines to meet the agile demands of the modern automotive industry.
Estimating Reaction Rate Constants from Impedance Spectra: Simulating the Multistep Oxygen Evolution Reaction
The efficiency of water electrolysis in a photoelectrochemical cell is largely limited by the oxygen evolution reaction (OER) at its semiconductor photoanode. Reaction rate constants are key to investigating the slow kinetics of the multistep OER, as they indicate the rate-determining step. While these rate constants are usually calculated based on first-principles simulations, this research aims to estimate them from experimental electrochemical impedance spectroscopy (EIS) data. Starting from a microkinetic model for charge transfer at the semiconductor-electrolyte interface, an expression for the impedance as a function of the rate constants is derived. At lower potentials, the order of this impedance model is reduced, thus eliminating the rate constants corresponding to the last reaction steps. Moreover, it is shown that EIS data from at least two potentials need to be combined in order to uniquely identify the rate constants of a particular reduced-order model. Therefore, this work details a sample maximum likelihood estimator that integrates not only multiple frequencies but also multiple potentials simultaneously. Measuring multiple periods of the current density and potential signals allows this frequency-domain estimator to take measurement uncertainty into account. In addition, due to the large numerical range of the rate constants, various scaling methods are implemented to achieve numerical stability. To find suitable initial values for the highly nonlinear optimization problem, different global estimation methods are compared. The complete estimation procedure of the rate constants is illustrated on simulated EIS data of a hematite photoanode.
comment: 18 pages, 8 figures
LegionITS: A Federated Intrusion-Tolerant System Architecture
The growing sophistication, frequency, and diversity of cyberattacks increasingly exceed the capacity of individual entities to fully understand and counter them. While existing solutions, such as Security Information and Event Management (SIEM) systems, Security Orchestration, Automation, and Response (SOAR) platforms, and Security Operation Centers (SOCs), play a vital role in mitigating known threats, they often struggle to effectively address emerging and unforeseen attacks. To increase the effectiveness of cyber defense, it is essential to foster greater information sharing between entities; however, this requires addressing the challenge of exchanging sensitive data without compromising confidentiality or operational security. To address the challenges of secure and confidential Cyber Threat Intelligence (CTI) sharing, we propose a novel architecture that federates Intrusion Tolerant Systems (ITSs) and leverages concepts from the Malware Information Sharing Platform (MISP) to empower SOCs. This framework enables controlled collaboration and data privacy while enhancing collective defenses. As a proof of concept, we evaluate one module by applying Differential Privacy (DP) to Federated Learning (FL), observing a manageable accuracy drop from 98.42% to 85.98% (average loss 12.44%) while maintaining reliable detection of compromised messages. These results highlight the viability of secure data sharing and establish a foundation for the future full-scale implementation of LegionITS.
comment: 17 pages, 4 Figures, 1 Table
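The DP-in-FL module evaluated above typically relies on clipping each client's model update and adding calibrated Gaussian noise before aggregation; the sketch below shows that generic mechanism. The clip norm and noise multiplier are illustrative values, and the paper's exact mechanism may differ.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a client's model update and add Gaussian noise (DP-SGD style).

    Generic differential-privacy step for federated learning; the clip
    norm and noise multiplier here are illustrative, not the paper's values.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# Server-side aggregation of privatized client updates
rng = np.random.default_rng(0)
client_updates = [rng.standard_normal(1000) * 0.1 for _ in range(10)]
global_step = np.mean([privatize_update(u, rng=rng) for u in client_updates], axis=0)
print(global_step.shape)
```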
Trajectory Tracking for Multi-Manipulator Systems in Constrained Environments
We consider the problem of cooperative manipulation by a mobile multi-manipulator system operating in obstacle-cluttered and highly constrained environments under spatio-temporal task specifications. The task requires transporting a grasped object while respecting both continuous robot dynamics and discrete geometric constraints arising from obstacles and narrow passages. To address this hybrid structure, we propose a multi-rate planning and control framework that combines offline generation of an STL-satisfying object trajectory and collision-free base footprints with online constrained inverse kinematics and continuous-time feedback control. The resulting closed-loop system enables coordinated reconfiguration of multiple manipulators while tracking the desired object motion. The approach is evaluated in high-fidelity physics simulations using three Franka Emika Panda mobile manipulators rigidly grasping an object.
KalMRACO: Unifying Kalman Filter and Model Reference Adaptive Control for Robust Control and Estimation of Uncertain Systems
A common assumption when applying the Kalman filter is a priori knowledge of the system parameters. These parameters are not necessarily known, and this may limit real-world applications of the Kalman filter. The well-established Model Reference Adaptive Controller (MRAC) utilizes a known reference model and ensures that the input-output behavior of a potentially unknown system converges to that of the reference model. We present KalMRACO, a unification of the Kalman filter and MRAC leveraging the reference model of MRAC as the Kalman filter system model, thus eliminating, to a large degree, the need for knowledge of the underlying system parameters in the application of the Kalman filter. We also introduce the concept of blending estimated states and measurements in the feedback law to handle stability issues during the initial transient. KalMRACO is validated through simulations and lab trials on an underwater vehicle. Results show superior tracking of the reference model state, observer state convergence, and noise mitigation properties.
comment: 7 pages, 4 figures
Coordinated Fast Frequency Response from Electric Vehicles, Data Centers, and Battery Energy Storage Systems
High renewable penetration has significantly reduced system inertia in modern power grids, increasing the need for fast frequency response (FFR) from distributed and non-traditional resources. While electric vehicles (EVs), data centers, and battery energy storage systems (BESS) have each demonstrated the capability to provide sub-second active power support, their combined frequency response potential has not been systematically evaluated. This paper proposes a coordinated control framework that aggregates these heterogeneous resources to provide fast, stable, and reliable FFR. Dynamic models for EV fleets, data center UPS and workload modulation, and BESS are developed, explicitly capturing their response times, power limits, and operational constraints. A hierarchical control architecture is introduced, where an upper-level coordinator dynamically allocates FFR among resources based on response speed and available capacity, and lower-level controllers implement the actual power response. Case studies based on the IEEE 39-bus test system demonstrate that the coordinated EV-DC-BESS framework improves frequency nadir by up to 0.2 Hz, reduces RoCoF, and accelerates frequency recovery compared with single-resource FFR. Results confirm that synergistic coordination significantly enhances grid stability, especially in low-inertia scenarios. This work highlights the value of multi-resource aggregation for future frequency regulation markets in renewable-dominated grids.
Fast Frequency Response Potential of Data Centers through Workload Modulation and UPS Coordination
The rapid growth of renewable energy sources has significantly reduced system inertia and increased the need for fast frequency response (FFR) in modern power systems. Data centers, as large and flexible electrical consumers, hold great potential to contribute to frequency stabilization due to their controllable IT workloads and on-site uninterruptible power supply (UPS) systems. This paper investigates the feasibility of leveraging data centers for providing fast frequency response through real-time workload modulation and UPS coordination. A dynamic model combining data center power consumption and grid frequency dynamics is developed, capturing the interactions between IT servers, cooling systems, and energy storage. Control strategies based on frequency deviation are implemented to adjust server power and discharge UPS batteries during frequency events. Case studies on a modified IEEE 39-bus system demonstrate that the proposed strategy can effectively reduce frequency nadir and shorten recovery time without compromising service quality. The results highlight the promising role of data centers as grid-supporting resources in future low-inertia systems.
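The frequency-deviation-based control described above can be sketched as a deadband-plus-droop rule that first curtails IT load and then discharges the UPS up to its headroom. All gains, limits, and the 60 Hz nominal frequency below are illustrative assumptions rather than values from the paper.

```python
def ffr_response(freq_hz, p_it_nominal_mw, ups_headroom_mw,
                 f_nominal=60.0, deadband=0.02, droop_mw_per_hz=50.0,
                 max_it_curtail_frac=0.2):
    """Active-power support from a data center during an under-frequency event.

    Returns (IT power curtailment, UPS discharge) in MW. All gains and
    limits are illustrative assumptions, not values from the paper.
    """
    dev = f_nominal - freq_hz                     # > 0 when frequency sags
    if dev <= deadband:                           # inside deadband / over-frequency
        return 0.0, 0.0
    requested = droop_mw_per_hz * (dev - deadband)
    it_curtail = min(requested, max_it_curtail_frac * p_it_nominal_mw)
    ups_discharge = min(max(requested - it_curtail, 0.0), ups_headroom_mw)
    return it_curtail, ups_discharge

# 300 mHz sag on a 30 MW IT load with 5 MW of UPS headroom -> (6.0, 5.0)
print(ffr_response(59.7, p_it_nominal_mw=30.0, ups_headroom_mw=5.0))
```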
Optimizing Sensor Node Localization for Achieving Sustainable Smart Agriculture System Connectivity
Smart agriculture systems are transforming how we farm, yet they face significant connectivity challenges, particularly with the sensors that power this technology. An efficient sensor deployment solution is still required to maximize the network's detection capabilities and efficiency while minimizing resource consumption and operational costs. This paper introduces an innovative sensor allocation optimization method that employs a Gradient-Based Iteration with Lagrange approach. The proposed method enhances coverage by utilizing a hybrid approach while minimizing the number of sensor nodes required under grid-based allocation. The proposed sensor distribution outperformed the classic deterministic deployment across coverage, number of sensors, cost, and power consumption. Furthermore, scalability is enhanced by extending sensing coverage to the remaining area via Bluetooth, which has a shorter communication range. Moreover, the proposed algorithm achieved 98.5% wireless sensor coverage, compared with 95% for the particle swarm distribution.
Restless Multi-Process Multi-Armed Bandits with Applications to Self-Driving Microscopies
High-content screening microscopy generates large amounts of live-cell imaging data, yet its potential remains constrained by the inability to determine when and where to image most effectively. Optimally balancing acquisition time, computational capacity, and photobleaching budgets across thousands of dynamically evolving regions of interest remains an open challenge, further complicated by limited field-of-view adjustments and sensor sensitivity. Existing approaches either rely on static sampling or heuristics that neglect the dynamic evolution of biological processes, leading to inefficiencies and missed events. Here, we introduce the restless multi-process multi-armed bandit (RMPMAB), a new decision-theoretic framework in which each experimental region is modeled not as a single process but as an ensemble of Markov chains, thereby capturing the inherent heterogeneity of biological systems such as asynchronous cell cycles and heterogeneous drug responses. Building upon this foundation, we derive closed-form expressions for transient and asymptotic behaviors of aggregated processes, and design scalable Whittle index policies with sub-linear complexity in the number of imaging regions. Through both simulations and a real biological live-cell imaging dataset, we show that our approach achieves substantial improvements in throughput under resource constraints. Notably, our algorithm outperforms Thompson Sampling, Bayesian UCB, epsilon-Greedy, and Round Robin by reducing cumulative regret by more than 37% in simulations and capturing 93% more biologically relevant events in live imaging experiments, underscoring its potential for transformative smart microscopy. Beyond improving experimental efficiency, the RMPMAB framework unifies stochastic decision theory with optimal autonomous microscopy control, offering a principled approach to accelerate discovery across multidisciplinary sciences.
Assessing the Frequency Response Potential of Heavy-Duty Electric Vehicles with Vehicle-to-Grid Integration in the California Power System
The integration of heavy-duty electric vehicles (EVs) with Vehicle-to-Grid (V2G) capability can enhance primary frequency response and improve stability in power systems with high renewable penetration. This study evaluates the technical potential of heavy-duty EV fleets to support the California power grid under three practical charging strategies: immediate charging, delayed charging, and constant-minimum-power charging. We develop a simulation framework that couples aggregated frequency dynamics with battery and charger constraints, state-of-charge management, and fleet-availability profiles. Performance is assessed using standard frequency security metrics, including nadir, rate-of-change-of-frequency, overshoot, and settling time, across credible contingency scenarios and renewable generation conditions. Results indicate that both non-V2G modes and V2G-enabled operation can contribute meaningful primary response, with V2G providing the strongest and fastest support while respecting mobility and network limits. Sensitivity analyses show that the relative benefits depend on charging strategy, control parameters, and renewable output, highlighting design trade-offs between response magnitude, duration, and battery usage. Overall, heavy-duty EV fleets, when coordinated by appropriate charging and V2G controls, offer a viable resource for strengthening primary frequency control on the California grid and mitigating stability challenges associated with increasing renewable penetration.
comment: This work has been submitted to International Journal of Electrical Power & Energy System and is currently under consideration
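As a rough illustration of the kind of coupled simulation described above, the following sketch integrates an aggregated swing equation with a saturated droop-type primary response from an EV fleet. The inertia, damping, droop gain, and fleet headroom values are assumptions for illustration, not parameters from the study.

```python
# Minimal sketch of aggregated frequency dynamics with a droop-type primary
# response from an EV fleet. All parameters (inertia, damping, droop gain,
# contingency size, fleet headroom) are hypothetical, not taken from the paper.
import numpy as np

H, D = 4.0, 1.0          # inertia constant [s], load damping [pu]
f0, dt, T = 50.0, 0.01, 30.0
dP_loss = -0.05          # generation-loss contingency [pu]

k_droop = 10.0           # EV droop gain [pu per pu frequency deviation]
p_ev_max = 0.03          # aggregate EV headroom [pu], limited by chargers and SoC

t = np.arange(0.0, T, dt)
df = np.zeros_like(t)    # frequency deviation [pu]
for k in range(1, len(t)):
    # EV primary response: proportional to the deviation, saturated by fleet headroom
    p_ev = np.clip(-k_droop * df[k - 1], -p_ev_max, p_ev_max)
    # Swing equation: 2H d(df)/dt = dP_contingency + dP_ev - D * df
    d_df = (dP_loss + p_ev - D * df[k - 1]) / (2.0 * H)
    df[k] = df[k - 1] + dt * d_df

print(f"nadir deviation: {f0 * df.min():.3f} Hz, settling deviation: {f0 * df[-1]:.3f} Hz")
```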
Compensation of Coarse Quantization Effects on Channel Estimation and BER in Massive MIMO
Low-resolution quantization is essential to reduce implementation cost and power consumption in massive multiple-input multiple-output (MIMO) systems for 5G and 6G. While most existing studies assume perfect channel state information (CSI), we model the impact of coarse quantization noise on both channel estimation and data transmission, yielding a more realistic assessment of system performance under imperfect CSI conditions in the uplink. We develop a tight approximation for the bit-error ratio (BER) of uncoded M-QAM with zero-forcing detection, based on the linear minimum mean-square error (LMMSE) channel estimate. These analytical results enable compensation strategies that jointly optimize quantization resolution, transmit power, and pilot length across different numbers of users and base station antennas. We further demonstrate the applicability of the proposed framework through several design scenarios that highlight its effectiveness in optimizing system parameters and improving energy efficiency under quantization constraints. For example, in a 16-QAM system, extending the pilot sequence by 2.5 times and lowering transmit power by 0.5 dB enables a 3-bit quantized system to match the BER of the full-resolution case. The proposed framework offers a fast and accurate alternative to Monte Carlo simulations, enabling practical system optimization under realistic quantization constraints.
comment: 12 pages, submitted to IEEE Transactions
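The sketch below illustrates one ingredient of this setting: pilot-based LMMSE channel estimation in which coarse quantization is approximated as additional uncorrelated noise, a common Bussgang-style simplification. The paper's BER approximation and joint optimization of resolution, power, and pilot length are not reproduced; the dimensions, SNR, and effective quantization-noise variance are illustrative assumptions.

```python
# Minimal sketch of pilot-based LMMSE channel estimation where coarse
# quantization is approximated as additional uncorrelated noise (a common
# Bussgang-style simplification). Dimensions and SNRs are illustrative.
import numpy as np

rng = np.random.default_rng(1)
M, K, tau = 64, 4, 16                 # BS antennas, users, pilot length
snr_lin = 10 ** (10 / 10)             # 10 dB pilot SNR
sigma2_q = 0.1                        # effective quantization-noise variance (assumed)

# Orthonormal pilots (columns of a scaled DFT matrix), unit-power channel entries
P = np.fft.fft(np.eye(tau))[:, :K] / np.sqrt(tau)
H = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)

# Received pilot block: Y = sqrt(snr) * H P^T + thermal noise + quantization noise
N = (rng.standard_normal((M, tau)) + 1j * rng.standard_normal((M, tau))) / np.sqrt(2)
Q = np.sqrt(sigma2_q / 2) * (rng.standard_normal((M, tau)) + 1j * rng.standard_normal((M, tau)))
Y = np.sqrt(snr_lin) * H @ P.T + N + Q

# LMMSE estimate: H_hat = sqrt(snr) * Y conj(P) (snr * P^H P + (1 + sigma2_q) I)^-1
A = snr_lin * (P.conj().T @ P) + (1 + sigma2_q) * np.eye(K)
H_hat = np.sqrt(snr_lin) * Y @ P.conj() @ np.linalg.inv(A)

nmse = np.mean(np.abs(H - H_hat) ** 2) / np.mean(np.abs(H) ** 2)
print(f"channel-estimation NMSE with coarse quantization: {nmse:.3f}")
```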
Competent Discrete-Time Modeling for Analogue-Controlled PWM Converters Considering State Feedback
Ever since R. D. Middlebrook proposed the state-space averaging notion, the small-signal model has been widely used as a design tool to tune control parameters. As Moore's law continues and AI chips place high demands on power delivery and dynamic response, the control bandwidth needs to be boosted. However, the averaged model rests on two basic assumptions: the low-frequency assumption and the small-ripple assumption. In high-bandwidth designs, these two assumptions are violated. To address this, various methods have been proposed. This paper gives a comprehensive overview of existing small-signal models for PWM converters from the following perspectives: 1) model fidelity; 2) analytical tractability; 3) complexity of the derivation process and result; 4) generality.
Closing the Loop: Motion Prediction Models beyond Open-Loop Benchmarks
Fueled by motion prediction competitions and benchmarks, recent years have seen the emergence of increasingly large learning-based prediction models, many with millions of parameters, focused on improving open-loop prediction accuracy by mere centimeters. However, these benchmarks fail to assess whether such improvements translate to better performance when integrated into an autonomous driving stack. In this work, we systematically evaluate the interplay between state-of-the-art motion predictors and motion planners. Our results show that higher open-loop accuracy does not always correlate with better closed-loop driving behavior and that other factors, such as temporal consistency of predictions and planner compatibility, also play a critical role. Furthermore, we investigate downsized variants of these models, and, surprisingly, find that in some cases models with up to 86% fewer parameters yield comparable or even superior closed-loop driving performance. Our code is available at https://github.com/aumovio/pred2plan.
A constrained optimization approach to nonlinear system identification through simulation error minimization
This paper introduces a novel approach to system identification for nonlinear input-output models that minimizes the simulation error and frames the problem as a constrained optimization task. The proposed method addresses vanishing gradient issues, enabling faster convergence than traditional gradient-based techniques. We present an algorithm based on feedback linearization control of Lagrange multipliers and conduct a theoretical analysis of its performance. We prove that the algorithm converges to a local minimum, and it enhances computational efficiency by exploiting the problem's structure. Numerical experiments demonstrate that our approach outperforms gradient-based methods in both computational effort and estimation accuracy.
Data-fused MPC with Guarantees: Application to Flying Humanoid Robots
This paper introduces a Data-Fused Model Predictive Control (DFMPC) framework that combines physics-based models with data-driven representations of unknown dynamics. Leveraging Willems' Fundamental Lemma and an artificial equilibrium formulation, the method enables tracking of changing, potentially unreachable setpoints while explicitly handling measurement noise through slack variables and regularization. We provide guarantees of recursive feasibility and practical stability under input-output constraints for a specific class of reference signals. The approach is validated on the iRonCub flying humanoid robot, integrating analytical momentum models with data-driven turbine dynamics. Simulations show improved tracking and robustness compared to a purely model-based MPC, while maintaining real-time feasibility.
comment: This paper has been accepted for publication in IEEE Control Systems Letters (L-CSS)
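A minimal sketch of the data-driven machinery underlying this approach is given below: block-Hankel matrices are built from a single recorded input-output trajectory and checked for persistency of excitation, which is what Willems' Fundamental Lemma requires for the data to represent all system trajectories. The artificial-equilibrium formulation, slack variables, and stability constraints of the actual DFMPC are not reproduced, and the data-generating system is a hypothetical stand-in.

```python
# Minimal sketch of the data-driven machinery behind Willems' Fundamental
# Lemma: build block-Hankel matrices from one recorded input-output trajectory
# and check persistency of excitation. The system used to generate the data is
# a hypothetical stand-in for the turbine dynamics.
import numpy as np

def block_hankel(w, L):
    """Stack length-L windows of a signal w (T x m) into a block-Hankel matrix."""
    T, m = w.shape
    cols = T - L + 1
    return np.vstack([w[i:i + cols].T for i in range(L)])  # (L*m) x cols

rng = np.random.default_rng(2)
T, L = 200, 12                       # data length, prediction window

# Hypothetical 2-state SISO system used only to generate a data trajectory
A = np.array([[0.9, 0.2], [0.0, 0.8]]); B = np.array([[0.0], [1.0]]); C = np.array([[1.0, 0.0]])
u = rng.standard_normal((T, 1))      # persistently exciting input
x = np.zeros(2); y = np.zeros((T, 1))
for t in range(T):
    y[t] = C @ x
    x = A @ x + B @ u[t]

Hu, Hy = block_hankel(u, L), block_hankel(y, L)

# Persistency of excitation of order L + state dimension requires full row rank
print("PE check, rank of depth-(L+2) Hankel:", np.linalg.matrix_rank(block_hankel(u, L + 2)))
# Any length-L trajectory of the system can be written as [Hu; Hy] @ g for some g
print("stacked data matrix shape:", np.vstack([Hu, Hy]).shape)
```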
Data-driven Implementations of Various Generalizations of Balanced Truncation
There exist two main approaches for non-intrusive implementations of approximate balanced truncation within the Loewner framework: the quadrature-based method [1] and the Alternating Direction Implicit (ADI)-based method [2]. Both approaches rely solely on samples of the transfer function to construct truncated balanced models, eliminating the need for access to the original model's state-space realization. Recently, the quadrature-based approach has been extended to various generalizations of balanced truncation, including positive-real balanced truncation, bounded-real balanced truncation, and balanced stochastic truncation. While this extension [3] is theoretically non-intrusive, in that it does not require the original state-space realization, it depends on samples of spectral factorizations of the transfer function. Since practical methods for obtaining such samples are currently unavailable, this extension remains largely a theoretical contribution. In this work, we present a non-intrusive ADI-type framework for these generalized balanced truncation methods that requires only samples of the original transfer function for implementation.
The Phantom of Davis-Wielandt Shell: A Unified Framework for Graphical Stability Analysis of MIMO LTI Systems
This paper presents a unified framework based on Davis-Wielandt (DW) shell for graphical stability analysis of multi-input and multi-output linear time-invariant feedback systems. Connections between DW shells and various graphical representations, as well as gain and phase measures, are established through an intuitive geometric perspective. Within this framework, we map the relationships and relative conservatism among various separation conditions. A rotated scaled relative graph ($θ$-SRG) concept is proposed as a mixed gain-phase representation, from which a closed-loop stability criterion is derived and shown to be the least conservative among the existing 2-D graphical conditions for bi-component feedback loops. We also propose a reliable and generalizable algorithm for visualizing the $θ$-SRGs and include a system example to demonstrate the reduced conservatism of the proposed condition.
comment: 16 pages, 12 figures
Learning Robust and Correct Controllers Guided by Feasibility-Aware Signal Temporal Logic via BarrierNet
Control Barrier Functions (CBFs) have emerged as a powerful tool for enforcing safety in optimization-based controllers, and their integration with Signal Temporal Logic (STL) has enabled the specification-driven synthesis of complex robotic behaviors. However, existing CBF-STL approaches typically rely on fixed hyperparameters and myopic, per-time-step optimization, which can lead to overly conservative behavior, infeasibility near tight input limits, and difficulty satisfying long-horizon STL tasks. To address these limitations, we propose a feasibility-aware learning framework that embeds trainable, time-varying High Order Control Barrier Functions (HOCBFs) into a differentiable Quadratic Program (dQP). Our approach provides a systematic procedure for constructing time-varying HOCBF constraints for a broad fragment of STL and introduces a unified robustness measure that jointly captures STL satisfaction, QP feasibility, and control-bound compliance. Three neural networks (InitNet, RefNet, and an extended BarrierNet) collaborate to generate reference inputs and adapt constraint-related hyperparameters automatically over time and across initial conditions, reducing conservativeness while maximizing robustness. The resulting controller achieves STL satisfaction with strictly feasible dQPs and requires no manual tuning. Simulation results demonstrate that the proposed framework maintains high STL robustness under tight input bounds and significantly outperforms fixed-parameter and non-adaptive baselines in complex environments.
comment: 16 pages, 11 figures
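To make the per-step safety-filter idea concrete, the sketch below solves the canonical CBF quadratic program, minimizing the deviation from a reference input subject to one affine CBF constraint, for which the solution is a closed-form projection. The learned time-varying HOCBF parameters, the STL robustness measure, and the differentiable QP layer from the paper are not reproduced; the dynamics and gains are illustrative.

```python
# Minimal sketch of the per-step CBF safety filter underlying this line of
# work: minimize ||u - u_ref||^2 subject to a single affine CBF constraint
# Lf_h + Lg_h @ u + alpha * h >= 0. With one constraint the QP has a
# closed-form projection solution, so no solver is needed.
import numpy as np

def cbf_filter(u_ref, Lf_h, Lg_h, h, alpha=1.0, u_max=2.0):
    """Project a reference input onto {u : Lf_h + Lg_h @ u + alpha*h >= 0}."""
    a, b = np.asarray(Lg_h, float), -(Lf_h + alpha * h)   # constraint a @ u >= b
    if a @ u_ref >= b:
        u = u_ref.copy()
    else:
        u = u_ref + (b - a @ u_ref) / (a @ a) * a          # closed-form projection
    return np.clip(u, -u_max, u_max)                       # crude input-bound handling

# Example: keep a 2-D single integrator (x_dot = u) outside a unit disk at the origin
x = np.array([1.2, 0.1])
h = x @ x - 1.0                    # h(x) >= 0 defines the safe set
Lf_h, Lg_h = 0.0, 2 * x            # h_dot = 2 x^T u for x_dot = u
u_ref = np.array([-1.0, 0.0])      # nominal controller pushes toward the obstacle
print("filtered input:", cbf_filter(u_ref, Lf_h, Lg_h, h))
```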
Meta-Reinforcement Learning for Building Energy Management System
The building sector is one of the largest contributors to global energy consumption. Improving its energy efficiency is essential for reducing operational costs and greenhouse gas emissions. Energy management systems (EMS) play a key role in monitoring and controlling building appliances efficiently and reliably. With the increasing integration of renewable energy, intelligent EMS solutions have received growing attention. Reinforcement learning (RL) has recently been explored for this purpose and shows strong potential. However, most RL-based EMS methods require a large number of training steps to learn effective control policies, especially when adapting to unseen buildings, which limits their practical deployment. This paper introduces MetaEMS, a meta-reinforcement learning framework for EMS. MetaEMS improves learning efficiency by transferring knowledge from previously solved tasks to new ones through group-level and building-level adaptation, enabling fast adaptation and effective control across diverse building environments. Experimental results demonstrate that MetaEMS adapts more rapidly to unseen buildings and consistently outperforms baseline methods across various scenarios.
comment: arXiv admin note: text overlap with arXiv:1909.10165 by other authors
Standards-Compliant DM-RS Allocation via Temporal Channel Prediction for Massive MIMO Systems
Reducing feedback overhead in beyond 5G networks is a critical challenge, as the growing number of antennas in modern massive MIMO systems substantially increases the channel state information (CSI) feedback demand in frequency division duplex (FDD) systems. To address this, extensive research has focused on CSI compression and prediction, with neural network-based approaches gaining momentum and being considered for integration into the 3GPP 5G-Advanced standards. While deep learning has been effectively applied to CSI-limited beamforming and handover optimization, reference signal allocation under such constraints remains surprisingly underexplored. To fill this gap, we introduce the concept of channel prediction-based reference signal allocation (CPRS), which jointly optimizes channel prediction and DM-RS allocation to improve data throughput without requiring CSI feedback. We further propose a standards-compliant ViViT/CNN-based architecture that implements CPRS by treating evolving CSI matrices as sequential image-like data, enabling efficient and adaptive transmission in dynamic environments. Simulation results using ray-tracing channel data generated in NVIDIA Sionna validate the proposed method, showing up to 36.60% throughput improvement over benchmark strategies.
On the Impact of Voltage Unbalance on Distribution Locational Marginal Prices
Finding clear economic signals for distribution-network operation and expansion is increasingly important as single-phase loads and distributed energy resources escalate. These devices create phase-to-phase imbalances that manifest as voltage unbalance, a power quality issue that accelerates insulation aging in machines and increases network losses, thereby raising costs for operators and consumers. Traditional grid codes address unbalance via disparate hard limits on various indices; these thresholds differ across standards, offer no dynamic economic incentive, and undermine optimality. This paper proposes instead to treat voltage unbalance as a 'soft limit' by adding penalty terms to grid operation costs within a three-phase optimal power flow, reflecting the reduced asset lifetime caused by exposure to voltage unbalance. This unified approach yields dynamic economic signals, unbalance-aware Distribution Locational Marginal Prices (DLMP), that reflect the cost of power quality deviations. A novel mathematical decomposition of DLMP is developed, isolating the energy, loss, congestion, and unbalance components. Case studies conducted on two benchmark networks demonstrate the effectiveness and practical value of the proposed method. The results indicate that unbalance penalties reshape nodal prices, produce unexpected phase-level effects, and even allow scenarios where added load reduces unbalance and lowers costs, while providing planners and market designers with actionable insights to balance investment, operation, and power quality in modern distribution systems.
EDMD-Based Robust Observer Synthesis for Nonlinear Systems
This paper presents a data-driven Koopman operator-based approach for designing robust state observers for nonlinear systems. Based on a finite-dimensional surrogate of the Koopman generator, identified via an extended dynamic mode decomposition (EDMD) procedure, a tractable formulation of the observer design problem is enabled on the data-driven model with conic uncertainties. The resulting problem is cast as a semidefinite program (SDP) with linear matrix inequalities (LMIs), guaranteeing exponential convergence of the observer with a predetermined rate in a probabilistic sense. The approach bridges the gap between statistical error tolerance and observer convergence certification, and enables an explicit use of linear systems theory for nonlinear observation in a data-driven framework. Numerical studies demonstrate the effectiveness and flexibility of the proposed method.
comment: 9 pages, 2 figures. Submitted to Computers and Chemical Engineering, 2025
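The EDMD identification step that the observer synthesis builds on can be sketched in a few lines: snapshot pairs are lifted through a dictionary of observables and a least-squares problem yields the finite-dimensional Koopman surrogate. The conic uncertainty bounds and the LMI-based observer design are not reproduced here; the dynamics and dictionary below are illustrative choices.

```python
# Minimal sketch of the EDMD step: lift state snapshots with a dictionary of
# observables and solve a least-squares problem for a finite-dimensional
# Koopman surrogate. The dynamics and dictionary are illustrative choices.
import numpy as np

def dictionary(x):
    """Observables: [1, x1, x2, x1^2, x1*x2, x2^2] for a 2-D state."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([np.ones_like(x1), x1, x2, x1**2, x1 * x2, x2**2], axis=-1)

# Generate snapshot pairs (x_k, x_{k+1}) from a hypothetical nonlinear map
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(2000, 2))
Y = np.stack([0.9 * X[:, 0], 0.8 * X[:, 1] + 0.2 * X[:, 0] ** 2], axis=1)

PhiX, PhiY = dictionary(X), dictionary(Y)                # lifted snapshots
K = np.linalg.lstsq(PhiX, PhiY, rcond=None)[0].T         # surrogate: Phi(x+) ~ K Phi(x)

# One-step prediction through the lifted space (state read off the linear observables)
x0 = np.array([0.5, -0.3])
phi_next = K @ dictionary(x0)
print("true next state:", [0.9 * 0.5, 0.8 * (-0.3) + 0.2 * 0.25])
print("EDMD prediction:", phi_next[1:3])
```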
Robotics
NL2SpaTiaL: Generating Geometric Spatio-Temporal Logic Specifications from Natural Language for Manipulation Tasks
Spatio-Temporal Logic (SpaTiaL) offers a principled formalism for expressing geometric spatial requirements-an essential component of robotic manipulation, where object locations, neighborhood relations, pose constraints, and interactions directly determine task success. Yet prior works have largely relied on standard temporal logic (TL), which models only robot trajectories and overlooks object-level interactions. Existing datasets built from randomly generated TL formulas paired with natural-language descriptions therefore cover temporal operators but fail to represent the layered spatial relations that manipulation tasks depend on. To address this gap, we introduce a dataset generation framework that synthesizes SpaTiaL specifications and converts them into natural-language descriptions through a deterministic, semantics-preserving back-translation procedure. This pipeline produces the NL2SpaTiaL dataset, aligning natural language with multi-level spatial relations and temporal objectives to reflect the compositional structure of manipulation tasks. Building on this foundation, we propose a translation-verification framework equipped with a language-based semantic checker that ensures the generated SpaTiaL formulas faithfully encode the semantics specified by the input description. Experiments across a suite of manipulation tasks show that SpaTiaL-based representations yield more interpretable, verifiable, and compositional grounding for instruction following. Project website: https://sites.google.com/view/nl2spatial
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
Spatial tracing, as a fundamental embodied interaction ability for robots, is inherently challenging as it requires multi-step metric-grounded reasoning compounded with complex spatial referring and real-world metric measurement. However, existing methods struggle with this compositional task. To this end, we propose RoboTracer, a 3D-aware VLM that first achieves both 3D spatial referring and measuring via a universal spatial encoder and a regression-supervised decoder to enhance scale awareness during supervised fine-tuning (SFT). Moreover, RoboTracer advances multi-step metric-grounded reasoning via reinforcement fine-tuning (RFT) with metric-sensitive process rewards, supervising key intermediate perceptual cues to accurately generate spatial traces. To support SFT and RFT training, we introduce TraceSpatial, a large-scale dataset of 30M QA pairs, spanning outdoor/indoor/tabletop scenes and supporting complex reasoning processes (up to 9 steps). We further present TraceSpatial-Bench, a challenging benchmark filling the gap to evaluate spatial tracing. Experimental results show that RoboTracer surpasses baselines in spatial understanding, measuring, and referring, with an average success rate of 79.1%, and also achieves SOTA performance on TraceSpatial-Bench by a large margin, exceeding Gemini-2.5-Pro by 36% accuracy. Notably, RoboTracer can be integrated with various control policies to execute long-horizon, dynamic tasks across diverse robots (UR5, G1 humanoid) in cluttered real-world scenes.
comment: Project page: https://zhoues.github.io/RoboTracer
World Models Can Leverage Human Videos for Dexterous Manipulation
Dexterous manipulation is challenging because it requires understanding how subtle hand motion influences the environment through contact with objects. We introduce DexWM, a Dexterous Manipulation World Model that predicts the next latent state of the environment conditioned on past states and dexterous actions. To overcome the scarcity of dexterous manipulation datasets, DexWM is trained on over 900 hours of human and non-dexterous robot videos. To enable fine-grained dexterity, we find that predicting visual features alone is insufficient; therefore, we introduce an auxiliary hand consistency loss that enforces accurate hand configurations. DexWM outperforms prior world models conditioned on text, navigation, and full-body actions, achieving more accurate predictions of future states. DexWM also demonstrates strong zero-shot generalization to unseen manipulation skills when deployed on a Franka Panda arm equipped with an Allegro gripper, outperforming Diffusion Policy by over 50% on average in grasping, placing, and reaching tasks.
MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. With one set, the LLM serves as a Decision Expert for scenario reasoning and driving decision-making, while with the other it acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions, instead of operating directly in a continuous action space. This approach effectively balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration in online reinforcement learning. MindDrive achieves strong closed-loop performance on the challenging Bench2Drive benchmark, with a Driving Score (DS) of 78.04 and a Success Rate (SR) of 55.09%. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online reinforcement learning for VLA models in autonomous driving.
comment: 16 pages, 12 figures, 6 tables; Project Page: https://xiaomi-mlab.github.io/MindDrive/
Near-Field Perception for Safety Enhancement of Autonomous Mobile Robots in Manufacturing Environments
Near-field perception is essential for the safe operation of autonomous mobile robots (AMRs) in manufacturing environments. Conventional ranging sensors such as light detection and ranging (LiDAR) and ultrasonic devices provide broad situational awareness but often fail to detect small objects near the robot base. To address this limitation, this paper presents a three-tier near-field perception framework. The first approach employs light-discontinuity detection, which projects a laser stripe across the near-field zone and identifies interruptions in the stripe to perform fast, binary cutoff sensing for obstacle presence. The second approach utilizes light-displacement measurement to estimate object height by analyzing the geometric displacement of a projected stripe in the camera image, which provides quantitative obstacle height information with minimal computational overhead. The third approach employs a computer vision-based object detection model on embedded AI hardware to classify objects, enabling semantic perception and context-aware safety decisions. All methods are implemented on a Raspberry Pi 5 system, achieving real-time performance at 25 or 50 frames per second. Experimental evaluation and comparative analysis demonstrate that the proposed hierarchy balances precision, computation, and cost, thereby providing a scalable perception solution for enabling safe operations of AMRs in manufacturing environments.
comment: Submitted to the 54th SME North American Manufacturing Research Conference (NAMRC 54)
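The light-displacement idea in the second tier can be illustrated with the standard laser-triangulation relation, where the vertical pixel shift of the projected stripe maps to object height through the camera geometry. The focal length, pixel pitch, baseline, and working range below are assumptions for illustration, not values from the paper.

```python
# Minimal sketch of the similar-triangles relation behind stripe-displacement
# height estimation: a laser stripe is projected across the near-field zone,
# and the vertical pixel shift of the stripe in the camera image maps to
# object height. The mounting geometry, focal length, and pixel pitch below
# are assumptions, not values from the paper.
import numpy as np

def height_from_displacement(dv_pixels, f_mm=3.6, pixel_mm=0.0014,
                             baseline_mm=80.0, range_mm=400.0):
    """Approximate object height from the stripe's vertical image displacement.

    Small-displacement triangulation: dv * pixel_mm / f_mm ~ baseline_mm * h / range_mm^2,
    i.e. the image shift is roughly proportional to height at a fixed working range.
    """
    angular_shift = dv_pixels * pixel_mm / f_mm           # small-angle approximation [rad]
    return angular_shift * range_mm ** 2 / baseline_mm    # object height [mm]

for dv in (2, 5, 10):                                     # observed stripe shift in pixels
    print(f"{dv:3d} px shift -> approx. height {height_from_displacement(dv):5.1f} mm")
```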
Reinforcement Learning based 6-DoF Maneuvers for Microgravity Intravehicular Docking: A Simulation Study with Int-Ball2 in ISS-JEM
Autonomous free-flyers play a critical role in intravehicular tasks aboard the International Space Station (ISS), where their precise docking under sensing noise, small actuation mismatches, and environmental variability remains a nontrivial challenge. This work presents a reinforcement learning (RL) framework for six-degree-of-freedom (6-DoF) docking of JAXA's Int-Ball2 robot inside a high-fidelity Isaac Sim model of the Japanese Experiment Module (JEM). Using Proximal Policy Optimization (PPO), we train and evaluate controllers under domain-randomized dynamics and bounded observation noise, while explicitly modeling propeller drag-torque effects and polarity structure. This enables a controlled study of how Int-Ball2's propulsion physics influence RL-based docking performance in constrained microgravity interiors. The learned policy achieves stable and reliable docking across varied conditions and lays the groundwork for future extensions pertaining to Int-Ball2 in collision-aware navigation, safe RL, propulsion-accurate sim-to-real transfer, and vision-based end-to-end docking.
comment: Presented at AI4OPA Workshop at the International Conference on Space Robotics (iSpaRo) 2025 at Sendai, Japan
Evaluating the Navigation Capabilities of a Modified COAST Guidewire Robot in an Anatomical Phantom Model
To address the issues that arise due to the manual navigation of guidewires in endovascular interventions, research in medical robotics has taken a strong interest in developing robotically steerable guidewires, which offer the possibility of enhanced maneuverability and navigation, as the tip of the guidewire can be actively steered. The COaxially Aligned STeerable (COAST) guidewire robot has the ability to generate a wide variety of motions including bending motion with different bending lengths, follow-the-leader motion, and feedforward motion. In our past studies, we have explored different designs of the COAST guidewire robot and developed modeling, control, and sensing strategies for the COAST guidewire robot. In this study, the performance of a modified COAST guidewire robot is evaluated by conducting navigation experiments in an anatomical phantom model with pulsatile flow. The modified COAST guidewire robot is a simplified version of the COAST guidewire robot and consists of two tubes as opposed to three tubes. Through this study, we demonstrate the effectiveness of the modified COAST guidewire robot in navigating the tortuous phantom vasculature.
comment: Presented at the 14th Conference on New Technologies for Computer and Robot Assisted Surgery (CRAS 2025)
Universal Dexterous Functional Grasping via Demonstration-Editing Reinforcement Learning
Reinforcement learning (RL) has achieved great success in dexterous grasping, significantly improving grasp performance and generalization from simulation to the real world. However, fine-grained functional grasping, which is essential for downstream manipulation tasks, remains underexplored and faces several challenges: the complexity of specifying goals and reward functions for functional grasps across diverse objects, the difficulty of multi-task RL exploration, and the challenge of sim-to-real transfer. In this work, we propose DemoFunGrasp for universal dexterous functional grasping. We factorize functional grasping conditions into two complementary components - grasping style and affordance - and integrate them into an RL framework that can learn to grasp any object with any functional grasping condition. To address the multi-task optimization challenge, we leverage a single grasping demonstration and reformulate the RL problem as one-step demonstration editing, substantially enhancing sample efficiency and performance. Experimental results in both simulation and the real world show that DemoFunGrasp generalizes to unseen combinations of objects, affordances, and grasping styles, outperforming baselines in both success rate and functional grasping accuracy. In addition to strong sim-to-real capability, by incorporating a vision-language model (VLM) for planning, our system achieves autonomous instruction-following grasp execution.
comment: 19 pages
Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles
Autonomous Underwater Vehicles (AUVs) require reliable six-degree-of-freedom (6-DOF) position control to operate effectively in complex and dynamic marine environments. Traditional controllers are effective under nominal conditions but exhibit degraded performance when faced with unmodeled dynamics or environmental disturbances. Reinforcement learning (RL) provides a powerful alternative, but training is typically slow and sim-to-real transfer remains challenging. This work introduces a GPU-accelerated RL training pipeline built in JAX and MuJoCo-XLA (MJX). By jointly JIT-compiling large-scale parallel physics simulation and learning updates, we achieve training times of under two minutes. Through systematic evaluation of multiple RL algorithms, we show robust 6-DOF trajectory tracking and effective disturbance rejection in real underwater experiments, with policies transferred zero-shot from simulation. Our results provide the first explicit real-world demonstration of RL-based AUV position control across all six degrees of freedom.
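The speed of this kind of pipeline comes from compiling a fully batched rollout, roughly as in the sketch below, where jax.vmap parallelizes thousands of environments and jax.jit fuses the whole computation. A toy double integrator stands in for the MJX vehicle model, and the placeholder random policy replaces the actual RL algorithms evaluated in the paper.

```python
# Minimal sketch of the JIT-compiled, vectorized rollout pattern behind
# GPU-accelerated training: jax.vmap batches thousands of environments and
# jax.jit fuses the whole rollout. A toy 1-D double integrator stands in for
# the MJX physics; the random "policy" is a placeholder.
import jax
import jax.numpy as jnp

def step(state, action, dt=0.02):
    """One environment step: 1-D double integrator with clipped thrust."""
    pos, vel = state
    acc = jnp.clip(action, -1.0, 1.0)
    return jnp.stack([pos + dt * vel, vel + dt * acc])

def rollout(state0, key, horizon=200):
    """Roll out a placeholder random policy; return negative final position error."""
    def body(state, k):
        action = jax.random.normal(k)
        new_state = step(state, action)
        return new_state, new_state[0]
    keys = jax.random.split(key, horizon)
    final_state, _ = jax.lax.scan(body, state0, keys)
    return -jnp.abs(final_state[0] - 1.0)          # "reward": reach position 1.0

# Batch 4096 environments and JIT-compile the whole batched rollout
batched_rollout = jax.jit(jax.vmap(rollout, in_axes=(0, 0)))
keys = jax.random.split(jax.random.PRNGKey(0), 4096)
states0 = jnp.zeros((4096, 2))
returns = batched_rollout(states0, keys)
print("mean return across 4096 parallel environments:", float(returns.mean()))
```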
Control of a Twin Rotor using Twin Delayed Deep Deterministic Policy Gradient (TD3)
This paper proposes a reinforcement learning (RL) framework for controlling and stabilizing the Twin Rotor Aerodynamic System (TRAS) at specific pitch and azimuth angles and tracking a given trajectory. The complex dynamics and non-linear characteristics of the TRAS make it challenging to control using traditional control algorithms. However, recent developments in RL have attracted interest due to their potential applications in the control of multirotors. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was used in this paper to train the RL agent. This algorithm is used for environments with continuous state and action spaces, similar to the TRAS, as it does not require a model of the system. The simulation results illustrated the effectiveness of the RL control method. Next, external disturbances in the form of wind disturbances were used to test the controller's effectiveness compared to conventional PID controllers. Lastly, experiments on a laboratory setup were carried out to confirm the controller's effectiveness in real-world applications.
comment: This is the Author Accepted Manuscript version of a paper accepted for publication. The final published version is available via IEEE Xplore
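For context on what distinguishes TD3 from its predecessors, the sketch below computes the clipped double-Q critic target with target-policy smoothing. The network sizes, replay buffer, delayed actor update, and all hyperparameters are omitted or illustrative; nothing here reflects the TRAS-specific settings used in the paper.

```python
# Minimal sketch of the TD3-specific pieces: target-policy smoothing and the
# clipped double-Q target used to train the twin critics. All dimensions and
# hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn

obs_dim, act_dim, act_limit = 6, 2, 1.0
mlp = lambda i, o: nn.Sequential(nn.Linear(i, 256), nn.ReLU(), nn.Linear(256, o))

actor_targ = nn.Sequential(mlp(obs_dim, act_dim), nn.Tanh())       # target policy
q1_targ, q2_targ = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)  # target critics

def td3_targets(r, s_next, done, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    """y = r + gamma * (1 - done) * min(Q1', Q2')(s', smoothed target action)."""
    with torch.no_grad():
        a_next = act_limit * actor_targ(s_next)
        noise = (noise_std * torch.randn_like(a_next)).clamp(-noise_clip, noise_clip)
        a_next = (a_next + noise).clamp(-act_limit, act_limit)      # target-policy smoothing
        q_in = torch.cat([s_next, a_next], dim=-1)
        q_next = torch.min(q1_targ(q_in), q2_targ(q_in))            # clipped double Q
        return r + gamma * (1.0 - done) * q_next

# Example on a random mini-batch (stand-in for samples from a replay buffer)
batch = 32
r = torch.randn(batch, 1); s_next = torch.randn(batch, obs_dim); done = torch.zeros(batch, 1)
print("TD3 critic targets:", td3_targets(r, s_next, done).shape)
```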
Humanoid Robot Running Through Random Stepping Stones and Jumping Over Obstacles: Step Adaptation Using Spring-Mass Trajectories
This study proposes a step adaptation framework for running through spring-mass trajectories and deadbeat control gain libraries. It includes four main parts: (1) Automatic spring-mass trajectory library generation; (2) Deadbeat control gain library generation through an actively controlled template model that resembles the whole-body dynamics well; (3) Trajectory selection policy development for step adaptation; (4) Mapping spring-mass trajectories to a humanoid model through a whole-body control (WBC) framework also accounting for closed-kinematic chain systems, self collisions, and reactive limb swinging. We show the inclusiveness and the robustness of the proposed framework through various challenging and agile behaviors such as running through randomly generated stepping stones, jumping over random obstacles, performing slalom motions, changing the running direction suddenly with a random leg, and rejecting significant disturbances and uncertainties through the MuJoCo physics simulator. We also perform additional simulations under a comprehensive set of uncertainties and noise to better justify the proposed method's robustness to real-world challenges, including signal noise, imprecision, modeling errors, and delays. All the aforementioned behaviors are performed with a single library and the same set of WBC control parameters without additional tuning. The spring-mass and the deadbeat control gain library are automatically computed in 4.5 seconds in total for 315 different trajectories.
comment: Accepted for publication in Biomimetic Intelligence and Robotics. Supplemental video: https://youtu.be/HlAg2nbNct4
Intrinsic-Motivation Multi-Robot Social Formation Navigation with Coordinated Exploration
This paper investigates the application of reinforcement learning (RL) to multi-robot social formation navigation, a critical capability for enabling seamless human-robot coexistence. While RL offers a promising paradigm, the inherent unpredictability and often uncooperative dynamics of pedestrian behavior pose substantial challenges, particularly concerning the efficiency of coordinated exploration among robots. To address this, we propose a novel coordinated-exploration multi-robot RL algorithm introducing an intrinsic motivation exploration. Its core component is a self-learning intrinsic reward mechanism designed to collectively alleviate policy conservatism. Moreover, this algorithm incorporates a dual-sampling mode within the centralized training and decentralized execution framework to enhance the representation of both the navigation policy and the intrinsic reward, leveraging a two-time-scale update rule to decouple parameter updates. Empirical results on social formation navigation benchmarks demonstrate the proposed algorithm's superior performance over existing state-of-the-art methods across crucial metrics. Our code and video demos are available at: https://github.com/czxhunzi/CEMRRL.
SAMAY: System for Acoustic Measurement and Analysis
This paper describes an automatic bird call recording system called SAMAY, which is developed to study bird species by creating a database of large amounts of bird acoustic data. By analysing the recorded bird call data, the system can also be used for automatic classification of bird species, monitoring bird populations and analysing the impact of environmental changes. The system is driven through a powerful STM32F407 series microcontroller, supports 4 microphones, is equipped with 128 GB of storage capacity, and is powered by a 10400 mAh battery pack interfaced with a solar charger. In addition, the device is user-configurable over USB and Wi-Fi during runtime, ensuring user-friendly operation during field deployment.
Lightweight Dynamic Modeling of Cable-Driven Continuum Robots Based on Actuation-Space Energy Formulation
Cable-driven continuum robots (CDCRs) require accurate, real-time dynamic models for high-speed dynamics prediction or model-based control, making such capability an urgent need. In this paper, we propose the Lightweight Actuation-Space Energy Modeling (LASEM) framework for CDCRs, which formulates actuation potential energy directly in actuation space to enable lightweight yet accurate dynamic modeling. Through a unified variational derivation, the governing dynamics reduce to a single partial differential equation (PDE), requiring only the Euler moment balance while implicitly incorporating the Newton force balance. By also avoiding explicit computation of cable-backbone contact forces, the formulation simplifies the model structure and improves computational efficiency while preserving geometric accuracy and physical consistency. Importantly, the proposed framework for dynamic modeling natively supports both force-input and displacement-input actuation modes, a capability seldom achieved in existing dynamic formulations. Leveraging this lightweight structure, a Galerkin space-time modal discretization with analytical time-domain derivatives of the reduced state further enables an average 62.3% computational speedup over state-of-the-art real-time dynamic modeling approaches.
Post-Training and Test-Time Scaling of Generative Agent Behavior Models for Interactive Autonomous Driving
Learning interactive motion behaviors among multiple agents is a core challenge in autonomous driving. While imitation learning models generate realistic trajectories, they often inherit biases from datasets dominated by safe demonstrations, limiting robustness in safety-critical cases. Moreover, most studies rely on open-loop evaluation, overlooking compounding errors in closed-loop execution. We address these limitations with two complementary strategies. First, we propose Group Relative Behavior Optimization (GRBO), a reinforcement learning post-training method that fine-tunes pretrained behavior models via group relative advantage maximization with human regularization. Using only 10% of the training dataset, GRBO improves safety performance by over 40% while preserving behavioral realism. Second, we introduce Warm-K, a warm-started Top-K sampling strategy that balances consistency and diversity in motion selection. Our Warm-K method-based test-time scaling enhances behavioral consistency and reactivity at test time without retraining, mitigating covariate shift and reducing performance discrepancies. Demo videos are available in the supplementary material.
comment: 11 pages, 5 figures
A Unified Framework for Automated Assembly Sequence and Production Line Planning using Graph-based Optimization
This paper presents PyCAALP (Python-based Computer-Aided Assembly Line Planning), a framework for automated Assembly Sequence Planning (ASP) and Production Line Planning (PLP), employing a graph-based approach to model components and joints within production modules. The framework integrates kinematic boundary conditions, such as potential part collisions, to guarantee the feasibility of automated assembly planning. The developed algorithm computes all feasible production sequences, integrating modules for detecting spatial relationships and formulating geometric constraints. The algorithm incorporates additional attributes, including handling feasibility, tolerance matching, and joint compatibility, to manage the high combinatorial complexity inherent in assembly sequence generation. Heuristics, such as Single-Piece Flow assembly and geometrical constraint enforcement, are utilized to further refine the solution space, facilitating more efficient planning for complex assemblies. The PLP stage is formulated as a Mixed-Integer Program (MIP), balancing the total times of a fixed number of manufacturing stations. While some complexity reduction techniques may sacrifice optimality, they significantly reduce the MIPs computational time. Furthermore, the framework enables customization of engineering constraints and supports a flexible trade-off between ASP and PLP. The open-source nature of the framework, available at https://github.com/TUM-utg/PyCAALP, promotes further collaboration and adoption in both industrial and production research applications.
comment: Code available at https://github.com/TUM-utg/PyCAALP (repository will be made public prior to publication)
Multi-directional Safe Rectangle Corridor-Based MPC for Nonholonomic Robots Navigation in Cluttered Environment
Autonomous Mobile Robots (AMRs) have become indispensable in industrial applications due to their operational flexibility and efficiency. Navigation serves as a crucial technical foundation for accomplishing complex tasks. However, navigating AMRs in dense, cluttered, and semi-structured environments remains challenging, primarily due to nonholonomic vehicle dynamics, interactions with mixed static/dynamic obstacles, and the non-convex constrained nature of such operational spaces. To solve these problems, this paper proposes an Improved Sequential Model Predictive Control (ISMPC) navigation framework that systematically reformulates navigation tasks as sequential switched optimal control problems. The framework addresses the aforementioned challenges through two key innovations: 1) Implementation of a Multi-Directional Safety Rectangular Corridor (MDSRC) algorithm, which encodes the free space through rectangular convex regions to avoid collision with static obstacles, eliminating redundant computational burdens and accelerating solver convergence; 2) A sequential MPC navigation framework that integrates corridor constraints with barrier function constraints is proposed to achieve static and dynamic obstacle avoidance. The ISMPC navigation framework enables direct velocity generation for AMRs, simplifying traditional navigation algorithm architectures. Comparative experiments demonstrate the framework's superiority in free-space utilization (an increase of 41.05% in the average corridor area) while maintaining real-time computational performance (average corridor generation latency of 3 ms).
comment: 9 pages, 11 figures, conference paper for the 2025 International Conference on Advanced Robotics and Mechatronics (ICARM), accepted
Differentiable Material Point Method for the Control of Deformable Objects
Controlling the deformation of flexible objects is challenging due to their non-linear dynamics and high-dimensional configuration space. This work presents a differentiable Material Point Method (MPM) simulator targeted at control applications. We exploit the differentiability of the simulator to optimize a control trajectory in an active damping problem for a hyperelastic rope. The simulator effectively minimizes the kinetic energy of the rope around 2$\times$ faster than a baseline MPPI method and to a 20% lower energy level, while using about 3% of the computation time.
comment: 7 Pages, 4 Figures, 1 Table
ALBATROSS: A robotised system for high-throughput electrolyte screening via automated electrolyte formulation, coin-cell fabrication, and electrochemical evaluation
As battery technologies advance toward higher stability and energy density, the need for extensive cell-level testing across various component configurations becomes critical. To evaluate performance and understand the operating principles of batteries at the laboratory scale, fabrication and evaluation of coin cells are essential processes. However, the conventional coin-cell assembly and testing processes require significant time and labor from researchers, posing challenges to high-throughput screening research. In this study, we introduce an Automated Li-ion BAttery Testing RObot SyStem (ALBATROSS), an automated system capable of electrolyte formulation, coin-cell assembly, and electrochemical evaluation. The system, integrated within an argon-filled glovebox, enables fully automated assembly and testing of up to 48 cells without researcher intervention. By incorporating a custom-designed robot gripper and 3D-printed structures optimized for precise cell handling, ALBATROSS achieved high assembly reliability, yielding a relative standard deviation (RSD) of less than 1.2% in discharge capacity and a standard deviation of less than 3 Ω in EIS measurements for NCM811||Li half cells. Owing to its high reliability and automation capability, ALBATROSS allows for the acquisition of high-quality coin-cell datasets, which are expected to accelerate the development of next-generation electrolytes.
Efficient Generation of Smooth Paths with Curvature Guarantees by Mollification
Most path following and trajectory tracking algorithms in mobile robotics require the desired path or trajectory to be defined by at least twice continuously differentiable functions to guarantee key properties such as global convergence, especially for nonholonomic robots like unicycles with speed constraints. Consequently, these algorithms typically exclude continuous but non-differentiable paths, such as piecewise functions. Despite this exclusion, such paths provide convenient high-level inputs for describing robot missions or behavior. While techniques such as spline interpolation or optimization-based methods are commonly used to smooth non-differentiable paths or create feasible ones from sequences of waypoints, they either can produce unnecessarily complex trajectories or are computationally expensive. In this work, we present a method to regularize non-differentiable functions and generate feasible paths through mollification. Specifically, we approximate an arbitrary path with a differentiable function that can converge to it with arbitrary precision. Additionally, we provide a systematic method for bounding the curvature of generated paths, which we demonstrate by applying it to paths resulting from linking a sequence of waypoints with segments. The proposed approach is computationally efficient, enabling real-time implementation on microcontrollers and compatibility with standard trajectory tracking and path following algorithms.
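A minimal numerical illustration of the mollification idea follows: a piecewise-linear waypoint path is convolved with a compactly supported bump kernel, which rounds the corners and yields a path whose curvature can be evaluated and, by widening or narrowing the kernel, traded off against path deviation. The waypoints, kernel width, and discrete-convolution implementation are illustrative choices rather than the paper's exact construction.

```python
# Minimal sketch of path mollification: a piecewise-linear waypoint path is
# convolved with a smooth, compactly supported mollifier, producing a smooth
# path whose curvature can be checked numerically. Waypoints and kernel width
# are illustrative.
import numpy as np

def mollifier(s, eps):
    """Standard bump function supported on (-eps, eps), normalized to unit mass."""
    u = np.clip(np.abs(s) / eps, 0.0, 1.0 - 1e-9)
    k = np.where(np.abs(s) < eps, np.exp(-1.0 / (1.0 - u ** 2)), 0.0)
    return k / k.sum()

# Piecewise-linear path through a few waypoints, sampled densely in arc length
waypoints = np.array([[0, 0], [2, 0], [2, 2], [4, 2]], dtype=float)
s_way = np.r_[0, np.cumsum(np.linalg.norm(np.diff(waypoints, axis=0), axis=1))]
s = np.linspace(0, s_way[-1], 2001)
path = np.column_stack([np.interp(s, s_way, waypoints[:, i]) for i in range(2)])

# Mollify each coordinate: corners get rounded, straight segments stay close
eps, ds = 0.5, s[1] - s[0]
half = int(round(eps / ds))
kernel = mollifier(np.arange(-half, half + 1) * ds, eps)
smooth = np.column_stack([np.convolve(path[:, i], kernel, mode="same") for i in range(2)])

# Numerical curvature kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2), away from the ends
d1 = np.gradient(smooth, s, axis=0)
d2 = np.gradient(d1, s, axis=0)
kappa = np.abs(d1[:, 0] * d2[:, 1] - d1[:, 1] * d2[:, 0]) / (np.linalg.norm(d1, axis=1) ** 3 + 1e-12)
print(f"max curvature after mollification (eps={eps}): {kappa[half:-half].max():.2f} 1/m")
```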
MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion
Vision-language models enable the understanding and reasoning of complex traffic scenarios through multi-source information fusion, establishing them as a core technology for autonomous driving. However, existing vision-language models are constrained by the image understanding paradigm in the 2D plane, which restricts their capability to perceive 3D spatial information and perform deep semantic fusion, resulting in suboptimal performance in complex autonomous driving environments. This study proposes MMDrive, a multimodal vision-language model framework that extends traditional image understanding to generalized 3D scene understanding. MMDrive incorporates three complementary modalities, including occupancy maps, LiDAR point clouds, and textual scene descriptions. To this end, it introduces two novel components for adaptive cross-modal fusion and key information extraction. Specifically, the Text-oriented Multimodal Modulator dynamically weights the contributions of each modality based on the semantic cues in the question, guiding context-aware feature integration. The Cross-Modal Abstractor employs learnable abstract tokens to generate compact, cross-modal summaries that highlight key regions and essential semantics. Comprehensive evaluations on the DriveLM and NuScenes-QA benchmarks demonstrate that MMDrive achieves significant performance gains over existing vision-language models for autonomous driving, with a BLEU-4 score of 54.56 and METEOR of 41.78 on DriveLM, and an accuracy score of 62.7% on NuScenes-QA. MMDrive effectively breaks the traditional image-only understanding barrier, enabling robust multimodal reasoning in complex driving environments and providing a new foundation for interpretable autonomous driving scene understanding.
Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks
Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to 100 offline evaluations required by BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.
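The sketch below shows the flavor of such an ILC-style tuning loop: an empirical sensitivity matrix of task-level KPIs with respect to log-scaled weights is built by finite differences and used in a damped pseudo-inverse update. The function run_task_and_measure_kpis, the KPI definitions, and all gains are hypothetical stand-ins; in the real loop each evaluation would execute the NMPC on the robot or simulator, and the paper's exact construction of the sensitivity matrix may differ.

```python
# Minimal sketch of an ILC-style weight-tuning loop: finite-difference
# empirical sensitivities of KPIs with respect to log-scaled NMPC weights,
# followed by a damped pseudo-inverse update. `run_task_and_measure_kpis`
# is a hypothetical stand-in for running the NMPC-controlled task.
import numpy as np

def run_task_and_measure_kpis(log_weights):
    """Stand-in for one task repetition: returns [tracking RMSE, control effort]."""
    q, r = np.exp(log_weights)
    return np.array([1.0 / (1.0 + q) + 0.02 * q / r,      # tracking error falls with Q
                     0.05 * q / (r + 0.1)])               # effort rises with the Q/R ratio

def tune(log_w0, kpi_target, iters=4, delta=0.1, damping=1e-2, gain=0.8):
    log_w = np.asarray(log_w0, float)
    for k in range(iters):
        kpi = run_task_and_measure_kpis(log_w)
        # Empirical sensitivity S[i, j] = d KPI_i / d log_w_j via forward differences
        S = np.column_stack([
            (run_task_and_measure_kpis(log_w + delta * e) - kpi) / delta
            for e in np.eye(len(log_w))])
        # Damped least-squares update toward the KPI target (norm-optimal ILC flavor)
        err = kpi - kpi_target
        step = np.linalg.solve(S.T @ S + damping * np.eye(len(log_w)), S.T @ err)
        log_w = log_w - gain * step
        print(f"iter {k}: KPIs = {kpi.round(3)}, weights Q,R = {np.exp(log_w).round(2)}")
    return np.exp(log_w)

tune(log_w0=[0.0, 0.0], kpi_target=np.array([0.1, 0.2]))
```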
START: Traversing Sparse Footholds with Terrain Reconstruction
Traversing terrains with sparse footholds like legged animals presents a promising yet challenging task for quadruped robots, as it requires precise environmental perception and agile control to secure safe foot placement while maintaining dynamic stability. Model-based hierarchical controllers excel in laboratory settings, but suffer from limited generalization and overly conservative behaviors. End-to-end learning-based approaches unlock greater flexibility and adaptability, but existing state-of-the-art methods either rely on heightmaps that introduce noise and complex, costly pipelines, or implicitly infer terrain features from egocentric depth images, often missing accurate critical geometric cues and leading to inefficient learning and rigid gaits. To overcome these limitations, we propose START, a single-stage learning framework that enables agile, stable locomotion on highly sparse and randomized footholds. START leverages only low-cost onboard vision and proprioception to accurately reconstruct local terrain heightmap, providing an explicit intermediate representation to convey essential features relevant to sparse foothold regions. This supports comprehensive environmental understanding and precise terrain assessment, reducing exploration cost and accelerating skill acquisition. Experimental results demonstrate that START achieves zero-shot transfer across diverse real-world scenarios, showcasing superior adaptability, precise foothold placement, and robust locomotion.
OXE-AugE: A Large-Scale Robot Augmentation of OXE for Scaling Cross-Embodiment Policy Learning
Large and diverse datasets are needed for training generalist robot policies that have potential to control a variety of robot embodiments -- robot arm and gripper combinations -- across diverse tasks and environments. As re-collecting demonstrations and retraining for each new hardware platform are prohibitively costly, we show that existing robot data can be augmented for transfer and generalization. The Open X-Embodiment (OXE) dataset, which aggregates demonstrations from over 60 robot datasets, has been widely used as the foundation for training generalist policies. However, it is highly imbalanced: the top four robot types account for over 85\% of its real data, which risks overfitting to robot-scene combinations. We present AugE-Toolkit, a scalable robot augmentation pipeline, and OXE-AugE, a high-quality open-source dataset that augments OXE with 9 different robot embodiments. OXE-AugE provides over 4.4 million trajectories, more than triple the size of the original OXE. We conduct a systematic study of how scaling robot augmentation impacts cross-embodiment learning. Results suggest that augmenting datasets with diverse arms and grippers improves policy performance not only on the augmented robots, but also on unseen robots and even the original robots under distribution shifts. In physical experiments, we demonstrate that state-of-the-art generalist policies such as OpenVLA and $π_0$ benefit from fine-tuning on OXE-AugE, improving success rates by 24-45% on previously unseen robot-gripper combinations across four real-world manipulation tasks. Project website: https://OXE-AugE.github.io/.
Sequence of Expert: Boosting Imitation Planners for Autonomous Driving through Temporal Alternation
Imitation learning (IL) has emerged as a central paradigm in autonomous driving. While IL excels in matching expert behavior in open-loop settings by minimizing per-step prediction errors, its performance degrades unexpectedly in closed loop due to the gradual accumulation of small, often imperceptible errors over time. Over successive planning cycles, these errors compound, potentially resulting in severe failures. Current research efforts predominantly rely on increasingly sophisticated network architectures or high-fidelity training datasets to enhance the robustness of IL planners against error accumulation, focusing on state-level robustness at a single time point. However, autonomous driving is inherently a continuous-time process, and leveraging the temporal scale to enhance robustness may provide a new perspective for addressing this issue. To this end, we propose a method termed Sequence of Experts (SoE), a temporal alternation policy that enhances closed-loop performance without increasing model size or data requirements. Our experiments on the large-scale autonomous driving benchmark nuPlan demonstrate that the SoE method consistently and significantly improves the performance of all the evaluated models and achieves state-of-the-art results. This module may provide a key and widely applicable support for improving the training efficiency of autonomous driving models.
PvP: Data-Efficient Humanoid Robot Learning with Proprioceptive-Privileged Contrastive Representations
Achieving efficient and robust whole-body control (WBC) is essential for enabling humanoid robots to perform complex tasks in dynamic environments. Despite the success of reinforcement learning (RL) in this domain, its sample inefficiency remains a significant challenge due to the intricate dynamics and partial observability of humanoid robots. To address this limitation, we propose PvP, a Proprioceptive-Privileged contrastive learning framework that leverages the intrinsic complementarity between proprioceptive and privileged states. PvP learns compact and task-relevant latent representations without requiring hand-crafted data augmentations, enabling faster and more stable policy learning. To support systematic evaluation, we develop SRL4Humanoid, the first unified and modular framework that provides high-quality implementations of representative state representation learning (SRL) methods for humanoid robot learning. Extensive experiments on the LimX Oli robot across velocity tracking and motion imitation tasks demonstrate that PvP significantly improves sample efficiency and final performance compared to baseline SRL methods. Our study further provides practical insights into integrating SRL with RL for humanoid WBC, offering valuable guidance for data-efficient humanoid robot learning.
comment: 13 pages, 12 figures
Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion
Diffusion models have recently emerged as powerful tools for robot motion planning by capturing the multi-modal distribution of feasible trajectories. However, their extension to multi-robot settings with flexible, language-conditioned task specifications remains limited. Furthermore, current diffusion-based approaches incur high computational cost during inference and struggle with generalization because they require explicit construction of environment representations and lack mechanisms for reasoning about geometric reachability. To address these limitations, we present Language-Conditioned Heat-Inspired Diffusion (LCHD), an end-to-end vision-based framework that generates language-conditioned, collision-free trajectories. LCHD integrates CLIP-based semantic priors with a collision-avoiding diffusion kernel serving as a physical inductive bias that enables the planner to interpret language commands strictly within the reachable workspace. This naturally handles out-of-distribution scenarios -- in terms of reachability -- by guiding robots toward accessible alternatives that match the semantic intent, while eliminating the need for explicit obstacle information at inference time. Extensive evaluations on diverse real-world-inspired maps, along with real-robot experiments, show that LCHD consistently outperforms prior diffusion-based planners in success rate, while reducing planning latency.
Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos
Vision-Language-Action (VLA) models provide a promising paradigm for robot learning by integrating visual perception with language-guided policy learning. However, most existing approaches rely on 2D visual inputs to perform actions in 3D physical environments, creating a significant gap between perception and action grounding. To bridge this gap, we propose a Spatial-Aware VLA Pretraining paradigm that performs explicit alignment between visual space and physical space during pretraining, enabling models to acquire 3D spatial understanding before robot policy learning. Starting from pretrained vision-language models, we leverage large-scale human demonstration videos to extract 3D visual and 3D action annotations, forming a new source of supervision that aligns 2D visual observations with 3D spatial reasoning. We instantiate this paradigm with VIPA-VLA, a dual-encoder architecture that incorporates a 3D visual encoder to augment semantic visual representations with 3D-aware features. When adapted to downstream robot tasks, VIPA-VLA achieves significantly improved grounding between 2D vision and 3D action, resulting in more robust and generalizable robotic policies.
Motus: A Unified Latent Action World Model
While a general embodied agent must function as a unified system, current methods are built on isolated models for understanding, world modeling, and control. This fragmentation prevents unifying multimodal generative capabilities and hinders learning from large-scale, heterogeneous data. In this paper, we propose Motus, a unified latent action world model that leverages existing general pretrained models and rich, sharable motion information. Motus introduces a Mixture-of-Transformer (MoT) architecture to integrate three experts (i.e., understanding, video generation, and action) and adopts a UniDiffuser-style scheduler to enable flexible switching between different modeling modes (i.e., world models, vision-language-action models, inverse dynamics models, video generation models, and video-action joint prediction models). Motus further leverages optical flow to learn latent actions and adopts a recipe with a three-phase training pipeline and a six-layer data pyramid, thereby extracting pixel-level "delta actions" and enabling large-scale action pretraining. Experiments show that Motus achieves superior performance against state-of-the-art methods in both simulation (a +15% improvement over X-VLA and a +45% improvement over Pi0.5) and real-world scenarios (improvements of 11% to 48%), demonstrating that unified modeling of all functionalities and priors significantly benefits downstream robotic tasks.
K-VARK: Kernelized Variance-Aware Residual Kalman Filter for Sensorless Force Estimation in Collaborative Robots
Reliable estimation of contact forces is crucial for ensuring safe and precise interaction of robots with unstructured environments. However, accurate sensorless force estimation remains challenging due to inherent modeling errors and complex residual dynamics and friction. To address this challenge, in this paper, we propose K-VARK (Kernelized Variance-Aware Residual Kalman filter), a novel approach that integrates a kernelized, probabilistic model of joint residual torques into an adaptive Kalman filter framework. Through Kernelized Movement Primitives trained on optimized excitation trajectories, K-VARK captures both the predictive mean and input-dependent heteroscedastic variance of residual torques, reflecting data variability and distance-to-training effects. These statistics inform a variance-aware virtual measurement update by augmenting the measurement noise covariance, while the process noise covariance adapts online via variational Bayesian optimization to handle dynamic disturbances. Experimental validation on a 6-DoF collaborative manipulator demonstrates that K-VARK achieves over 20% reduction in RMSE compared to state-of-the-art sensorless force estimation methods, yielding robust and accurate external force/torque estimation suitable for advanced tasks such as polishing and assembly.
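For readers unfamiliar with variance-aware filtering, the core idea can be pictured in a plain Kalman measurement update: the kernel model's predicted residual torque acts as a virtual measurement, and its input-dependent variance inflates the measurement noise covariance. The sketch below is a minimal illustration assuming a linear measurement model H and a diagonal predicted variance; it is not the authors' implementation, which additionally adapts the process noise via variational Bayesian optimization.

    import numpy as np

    def variance_aware_update(x_pred, P_pred, H, z_virtual, var_virtual, R_nominal):
        """One Kalman measurement update in which the virtual measurement comes from a
        kernel model of the residual torque and its heteroscedastic variance inflates
        the measurement noise covariance (illustrative interface, not the paper's)."""
        R = R_nominal + np.diag(var_virtual)          # variance-aware augmentation
        y = z_virtual - H @ x_pred                    # innovation
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        x_upd = x_pred + K @ y
        P_upd = (np.eye(len(x_pred)) - K @ H) @ P_pred
        return x_upd, P_upd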
Learning Terrain Aware Bipedal Locomotion via Reduced Dimensional Perceptual Representations
This work introduces a hierarchical strategy for terrain-aware bipedal locomotion that integrates reduced-dimensional perceptual representations to enhance reinforcement learning (RL)-based high-level (HL) policies for real-time gait generation. Unlike end-to-end approaches, our framework leverages latent terrain encodings via a Convolutional Variational Autoencoder (CNN-VAE) alongside reduced-order robot dynamics, optimizing the locomotion decision process with a compact state. We systematically analyze the impact of latent space dimensionality on learning efficiency and policy robustness. Additionally, we extend our method to be history-aware, incorporating sequences of recent terrain observations into the latent representation to improve robustness. To address real-world feasibility, we introduce a distillation method to learn the latent representation directly from depth camera images and provide preliminary hardware validation by comparing simulated and real sensor data. We further validate our framework using the high-fidelity Agility Robotics (AR) simulator, incorporating realistic sensor noise, state estimation, and actuator dynamics. The results confirm the robustness and adaptability of our method, underscoring its potential for hardware deployment.
Tackling Snow-Induced Challenges: Safe Autonomous Lane-Keeping with Robust Reinforcement Learning
This paper proposes two new algorithms for the lane keeping system (LKS) in autonomous vehicles (AVs) operating under snowy road conditions. These algorithms use deep reinforcement learning (DRL) to handle uncertainties and slippage. They include Action-Robust Recurrent Deep Deterministic Policy Gradient (AR-RDPG) and end-to-end Action-Robust convolutional neural network Attention Deterministic Policy Gradient (AR-CADPG), two action-robust approaches for decision-making. In the AR-RDPG method, within the perception layer, camera images are first denoised using multi-scale neural networks. Then, the centerline coefficients are extracted by a pre-trained deep convolutional neural network (DCNN). These coefficients, concatenated with the driving characteristics, are used as input to the control layer. The AR-CADPG method presents an end-to-end approach in which a convolutional neural network (CNN) and an attention mechanism are integrated within a DRL framework. Both methods are first trained in the CARLA simulator and validated under various snowy scenarios. Real-world experiments on a Jetson Nano-based autonomous vehicle confirm the feasibility and stability of the learned policies. Among the two models, the AR-CADPG approach demonstrates superior path-tracking accuracy and robustness, highlighting the effectiveness of combining temporal memory, adversarial resilience, and attention mechanisms in AVs.
SLIM-VDB: A Real-Time 3D Probabilistic Semantic Mapping Framework
This paper introduces SLIM-VDB, a new lightweight semantic mapping system with probabilistic semantic fusion for closed-set or open-set dictionaries. Advances in data structures from the computer graphics community, such as OpenVDB, have demonstrated significantly improved computational and memory efficiency in volumetric scene representation. Although OpenVDB has been used for geometric mapping in robotics applications, semantic mapping for scene understanding with OpenVDB remains unexplored. In addition, existing semantic mapping systems lack support for integrating both fixed-category and open-language label predictions within a single framework. In this paper, we propose a novel 3D semantic mapping system that leverages the OpenVDB data structure and integrates a unified Bayesian update framework for both closed- and open-set semantic fusion. Our proposed framework, SLIM-VDB, achieves a significant reduction in both memory usage and integration time compared to current state-of-the-art semantic mapping approaches, while maintaining comparable mapping accuracy. An open-source C++ codebase with a Python interface is available at https://github.com/umfieldrobotics/slim-vdb.
comment: Accepted into R-AL
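As a rough illustration of a unified Bayesian update, the snippet below fuses a per-voxel categorical label distribution with a new per-frame prediction by multiplying and renormalizing; the same arithmetic applies whether the classes are a fixed closed set or scores over an open-vocabulary dictionary. This is a sketch of the fusion rule only and says nothing about the OpenVDB storage layer or the paper's exact likelihood model.

    import numpy as np

    def fuse_semantics(voxel_prob, meas_prob, eps=1e-9):
        """One Bayesian fusion step for a single voxel's categorical label distribution.
        voxel_prob : current per-class probabilities stored in the voxel
        meas_prob  : per-class likelihood from the current frame's segmentation"""
        posterior = voxel_prob * meas_prob
        return posterior / (posterior.sum() + eps)

    prior = np.full(4, 0.25)                   # uninformed voxel over 4 classes
    obs = np.array([0.7, 0.1, 0.1, 0.1])       # network prediction for this ray hit
    print(fuse_semantics(prior, obs))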
PrediFlow: A Flow-Based Prediction-Refinement Framework for Real-Time Human Motion Prediction in Human-Robot Collaboration
Stochastic human motion prediction is critical for safe and effective human-robot collaboration (HRC) in industrial remanufacturing, as it captures human motion uncertainties and multi-modal behaviors that deterministic methods cannot handle. While earlier works emphasize highly diverse predictions, they often generate unrealistic human motions. More recent methods focus on accuracy and real-time performance, yet there remains potential to improve prediction quality further without exceeding time budgets. Additionally, current research on stochastic human motion prediction in HRC typically considers human motion in isolation, neglecting the influence of robot motion on human behavior. To address these research gaps and enable real-time, realistic, and interaction-aware human motion prediction, we propose a novel prediction-refinement framework that integrates both human and robot observed motion to refine the initial predictions produced by a pretrained state-of-the-art predictor. The refinement module employs a Flow Matching structure to account for uncertainty. Experimental studies on the HRC desktop disassembly dataset demonstrate that our method significantly improves prediction accuracy while preserving the uncertainties and multi-modalities of human motion. Moreover, the total inference time of the proposed framework remains within the time budget, highlighting the effectiveness and practicality of our approach.
A Convex Obstacle Avoidance Formulation
Autonomous driving requires reliable collision avoidance in dynamic environments. Nonlinear Model Predictive Controllers (NMPCs) are suitable for this task, but struggle in time-critical scenarios requiring high frequency. To meet this demand, optimization problems are often simplified via linearization, narrowing the horizon window, or reduced temporal nodes, each compromising accuracy or reliability. This work presents the first general convex obstacle avoidance formulation, enabled by a novel approach to integrating logic. This facilitates the incorporation of an obstacle avoidance formulation into convex MPC schemes, enabling a convex optimization framework with substantially improved computational efficiency relative to conventional nonconvex methods. A key property of the formulation is that obstacle avoidance remains effective even when obstacles lie outside the prediction horizon, allowing shorter horizons for real-time deployment. In scenarios where nonconvex formulations are unavoidable, the proposed method meets or exceeds the performance of representative nonconvex alternatives. The method is evaluated in autonomous vehicle applications, where system dynamics are highly nonlinear.
comment: 18 pages, 17 figures
Constrained Policy Optimization via Sampling-Based Weight-Space Projection
Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy unknown, rollout-based safety constraints. We propose SCPO, a sampling-based weight-space projection method that enforces safety directly in parameter space without requiring gradient access to the constraint functions. Our approach constructs a local safe region by combining trajectory rollouts with smoothness bounds that relate parameter changes to shifts in safety metrics. Each gradient update is then projected via a convex second-order cone program (SOCP), producing a safe first-order step. We establish a safe-by-induction guarantee: starting from any safe initialization, all intermediate policies remain safe given feasible projections. In constrained control settings with a stabilizing backup policy, our approach further ensures closed-loop stability and enables safe adaptation beyond the conservative backup. On regression with harmful supervision and a constrained double-integrator task with a malicious expert, our approach consistently rejects unsafe updates, maintains feasibility throughout training, and achieves meaningful primal objective improvement.
comment: Submitted to IFAC World Congress 2026
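The projection step can be pictured with a single linearized safety constraint, in which case the cone program collapses to a closed-form half-space projection. The sketch below is that simplified one-constraint case with hypothetical names (a, b for the linearized constraint); the actual method projects against several sampled rollout constraints simultaneously.

    import numpy as np

    def project_step(theta, grad, lr, a, b):
        """Project a gradient step onto one linearized safety constraint a^T theta <= b.
        In the full method the projection is a second-order cone program over several
        sampled rollout constraints; one half-space keeps the sketch closed-form."""
        cand = theta - lr * grad
        violation = a @ cand - b
        if violation <= 0.0:
            return cand                                # step is already safe
        return cand - (violation / (a @ a)) * a        # Euclidean projection onto the half-space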
SocialNav-MoE: A Mixture-of-Experts Vision Language Model for Socially Compliant Navigation with Reinforcement Fine-Tuning
For robots navigating in human-populated environments, safety and social compliance are equally critical, yet prior work has mostly emphasized safety. Socially compliant navigation that accounts for human comfort, social norms, and contextual appropriateness remains underexplored. Vision language models (VLMs) show promise for this task; however, large-scale models incur substantial computational overhead, leading to higher inference latency and energy consumption, which makes them unsuitable for real-time deployment on resource-constrained robotic platforms. To address this issue, we investigate the effectiveness of small VLMs and propose SocialNav-MoE, an efficient Mixture-of-Experts vision language model for socially compliant navigation with reinforcement fine-tuning (RFT). We further introduce a semantic similarity reward (SSR) to effectively leverage RFT and enhance the model's decision-making capabilities. Additionally, we study the effectiveness of different small language model types (Phi, Qwen, and StableLM), routing strategies, and vision encoders (CLIP vs. SigLIP, frozen vs. fine-tuned). Experiments on the SNEI dataset demonstrate that SocialNav-MoE achieves an excellent balance between navigation accuracy and efficiency. The proposed SSR function is more effective than hard-level and character-level rewards. Source code will be released upon acceptance.
Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames
Estimating the 6D pose of textureless objects from RGB images is an important problem in robotics. Due to appearance ambiguities, rotational symmetries, and severe occlusions, single-view based 6D pose estimators are still unable to handle a wide range of objects, motivating research towards multi-view pose estimation and next-best-view prediction that addresses these limitations. In this work, we propose a comprehensive active perception framework for estimating the 6D poses of textureless objects using only RGB images. Our approach is built upon a key idea: decoupling the 6D pose estimation into a two-step sequential process can greatly improve both accuracy and efficiency. First, we estimate the 3D translation of each object, resolving scale and depth ambiguities inherent to RGB images. These estimates are then used to simplify the subsequent task of determining the 3D orientation, which we achieve through canonical scale template matching. Building on this formulation, we then introduce an active perception strategy that predicts the next best camera viewpoint to capture an RGB image, effectively reducing object pose uncertainty and enhancing pose accuracy. We evaluate our method on the public ROBI and TOD datasets, as well as on our reconstructed transparent object dataset, T-ROBI. Under the same camera viewpoints, our multi-view pose estimation significantly outperforms state-of-the-art approaches. Furthermore, by leveraging our next-best-view strategy, our approach achieves high pose accuracy with fewer viewpoints than heuristic-based policies across all evaluated datasets. The accompanying video and T-ROBI dataset will be released on our project page: https://trailab.github.io/ActiveODPE.
Accelerated Rotation-Invariant Convolution for UAV Image Segmentation
Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-scale details. Conventional segmentation architectures like U-Net rely on convolution operators that are not rotation-invariant, leading to degraded segmentation accuracy across varying viewpoints. Rotation invariance can be achieved by expanding the filter bank across multiple orientations; however, this will significantly increase computational cost and memory traffic. In this paper, we introduce a GPU-optimized rotation-invariant convolution framework that eliminates the traditional data-lowering (im2col) step required for matrix-multiplication-based convolution. By exploiting structured data sharing among symmetrically rotated filters, our method achieves multi-orientation convolution with greatly reduced memory traffic and computational redundancy. We further generalize the approach to accelerate convolution with arbitrary (non-symmetric) rotation angles. Across extensive benchmarks, the proposed convolution achieves 20--55% faster training and 15--45% lower energy consumption than CUDNN, while maintaining accuracy comparable to state-of-the-art rotation-invariant methods. In the eight-orientation setting, our approach achieves up to 45% speedup and 41% energy savings on 256\(\times\)256 inputs, and 32% speedup and 23% lower energy usage on 1024\(\times\)1024 inputs. Integrated into a U-Net segmentation model, the framework yields up to 6% improvement in accuracy over the non-rotation-aware baseline. These results demonstrate that the proposed method provides an effective and highly efficient alternative to existing rotation-invariant CNN frameworks.
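A baseline multi-orientation convolution, against which such accelerations are measured, can be written in a few lines of PyTorch: rotate one filter bank, convolve with each rotation, and max-pool over orientations. The sketch below covers only the four 90-degree symmetric rotations and does not implement the paper's im2col-free GPU kernel or its arbitrary-angle extension.

    import torch
    import torch.nn.functional as F

    def rot_invariant_conv(x, weight, bias=None):
        """Convolve with four 90-degree rotations of the same filter bank and
        max-pool over orientations (a reference implementation, not the paper's
        accelerated kernel)."""
        responses = [
            F.conv2d(x, torch.rot90(weight, k, dims=(2, 3)), bias, padding=1)
            for k in range(4)
        ]
        return torch.stack(responses, dim=0).max(dim=0).values

    x = torch.randn(1, 3, 64, 64)
    w = torch.randn(16, 3, 3, 3)
    y = rot_invariant_conv(x, w)    # shape (1, 16, 64, 64)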
RoboCOIN: An Open-Sourced Bimanual Robotic Data COllection for INtegrated Manipulation
Bimanual manipulation is essential for achieving human-like dexterity in robots, but large-scale, diverse bimanual robot datasets remain scarce due to hardware heterogeneity across robotic platforms. To address this challenge, we present RoboCOIN, a comprehensive multi-embodiment bimanual manipulation dataset with over 180,000 demonstrations collected from 15 distinct robotic platforms. The dataset covers 16 scenarios, including residential, commercial, and working environments, with 421 tasks systematically organized by bimanual coordination patterns and object properties. Our key innovation is a hierarchical capability pyramid that provides multi-level annotations, spanning trajectory-level concepts, segment-level subtasks, and frame-level kinematics. We further develop CoRobot, a comprehensive processing framework featuring Robot Trajectory Markup Language (RTML) for quality assessment, automated annotation generation, and unified multi-embodiment management. Extensive experiments demonstrate the reliability and effectiveness of RoboCOIN in multi-embodiment bimanual learning, with significant performance improvements across various model architectures and robotic platforms. The complete dataset and framework are open-sourced and publicly available for further research purposes. Project website: https://FlagOpen.github.io/RoboCOIN/.
comment: Fixed typos
CT-UIO: Continuous-Time UWB-Inertial-Odometer Localization Using Non-Uniform B-spline with Fewer Anchors
Ultra-wideband (UWB) based positioning with fewer anchors has attracted significant research interest in recent years, especially under energy-constrained conditions. However, most existing methods rely on discrete-time representations and smoothness priors to infer a robot's motion states, which often struggle with ensuring multi-sensor data synchronization. In this article, we present a continuous-time UWB-Inertial-Odometer localization system (CT-UIO), utilizing a non-uniform B-spline framework with fewer anchors. Unlike traditional uniform B-spline-based continuous-time methods, we introduce an adaptive knot-span adjustment strategy for non-uniform continuous-time trajectory representation. This is accomplished by adjusting control points dynamically based on movement speed. To enable efficient fusion of inertial measurement unit (IMU) and odometer data, we propose an improved extended Kalman filter (EKF) with innovation-based adaptive estimation to provide a short-term accurate motion prior. Furthermore, to address the challenge of achieving a fully observable UWB localization system under few-anchor conditions, a virtual anchor (VA) generation method based on multiple hypotheses is proposed. At the backend, we propose an adaptive sliding window strategy for global trajectory estimation. Comprehensive experiments are conducted on three self-collected datasets with different UWB anchor numbers and motion modes. The results show that the proposed CT-UIO achieves 0.403m, 0.150m, and 0.189m localization accuracy in corridor, exhibition hall, and office environments, yielding 17.2%, 26.1%, and 15.2% improvements compared with competing state-of-the-art UIO systems, respectively. The codebase and datasets of this work will be open-sourced at https://github.com/JasonSun623/CT-UIO.
comment: Accepted to IEEE Transactions on Mobile Computing (TMC). The codebase and datasets will be open-sourced at https://github.com/JasonSun623/CT-UIO
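The adaptive knot-span idea can be illustrated with a simple rule that densifies B-spline control points when the platform moves fast and spreads them out when it moves slowly. The interpolation law and the bounds below (dt_min, dt_max, v_ref) are placeholder assumptions, not the paper's tuning.

    import numpy as np

    def adaptive_knot_spans(speeds, dt_min=0.05, dt_max=0.4, v_ref=1.0):
        """Pick a knot span per window from estimated motion speed: fast motion
        gives dense control points, slow motion gives sparse ones (illustrative
        rule only; the paper's exact adjustment strategy may differ)."""
        speeds = np.maximum(np.asarray(speeds, dtype=float), 1e-6)
        spans = dt_max / (1.0 + speeds / v_ref)    # shrinks as speed grows
        return np.clip(spans, dt_min, dt_max)

    print(adaptive_knot_spans([0.1, 0.5, 2.0, 4.0]))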
Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach
In this paper we present a neurosymbolic architecture for coupling language-guided visual reasoning with robot manipulation. A non-expert human user can prompt the robot using unconstrained natural language, providing a referring expression (REF), a question (VQA), or a grasp action instruction. The system tackles all cases in a task-agnostic fashion through the utilization of a shared library of primitive skills. Each primitive handles an independent sub-task, such as reasoning about visual attributes, spatial relation comprehension, logic and enumeration, as well as arm control. A language parser maps the input query to an executable program composed of such primitives, depending on the context. While some primitives are purely symbolic operations (e.g. counting), others are trainable neural functions (e.g. visual grounding), therefore marrying the interpretability and systematic generalization benefits of discrete symbolic approaches with the scalability and representational power of deep networks. We generate a 3D vision-and-language synthetic dataset of tabletop scenes in a simulation environment to train our approach and perform extensive evaluations in both synthetic and real-world scenes. Results showcase the benefits of our approach in terms of accuracy, sample-efficiency, and robustness to the user's vocabulary, while being transferable to real-world scenes with few-shot visual fine-tuning. Finally, we integrate our method with a robot framework and demonstrate how it can serve as an interpretable solution for an interactive object-picking task, achieving an average success rate of 80.2\%, both in simulation and with a real robot. We make supplementary material available at https://gtziafas.github.io/neurosymbolic-manipulation.
comment: Published in International Journal of Robotics Research (IJRR) (2025)
A Survey of Behavior Foundation Model: Next-Generation Whole-Body Control System of Humanoid Robots
Humanoid robots are drawing significant attention as versatile platforms for complex motor control, human-robot interaction, and general-purpose physical intelligence. However, achieving efficient whole-body control (WBC) in humanoids remains a fundamental challenge due to sophisticated dynamics, underactuation, and diverse task requirements. While learning-based controllers have shown promise for complex tasks, their reliance on labor-intensive and costly retraining for new scenarios limits real-world applicability. To address these limitations, behavior(al) foundation models (BFMs) have emerged as a new paradigm that leverages large-scale pre-training to learn reusable primitive skills and broad behavioral priors, enabling zero-shot or rapid adaptation to a wide range of downstream tasks. In this paper, we present a comprehensive overview of BFMs for humanoid WBC, tracing their development across diverse pre-training pipelines. Furthermore, we discuss real-world applications, current limitations, urgent challenges, and future opportunities, positioning BFMs as a key approach toward scalable and general-purpose humanoid intelligence. Finally, we provide a curated and regularly updated collection of BFM papers and projects to facilitate more subsequent research, which is available at https://github.com/yuanmingqi/awesome-bfm-papers.
comment: 19 pages, 10 figures
WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
Humanoid robots require precise locomotion and dexterous manipulation to perform challenging loco-manipulation tasks. Yet existing approaches, modular or end-to-end, are deficient in manipulation-aware locomotion. This confines the robot to a limited workspace, preventing it from performing large-space loco-manipulation. We attribute this to: (1) the challenge of acquiring loco-manipulation knowledge due to the scarcity of humanoid teleoperation data, and (2) the difficulty of faithfully and reliably executing locomotion commands, stemming from the limited precision and stability of existing RL controllers. To acquire richer loco-manipulation knowledge, we propose a unified latent learning framework that enables a Vision-Language-Action (VLA) system to learn from low-cost action-free egocentric videos. Moreover, an efficient human data collection pipeline is devised to augment the dataset and scale the benefits. To execute the desired locomotion commands more precisely, we present a loco-manipulation-oriented (LMO) RL policy specifically tailored for accurate and stable core loco-manipulation movements, such as advancing, turning, and squatting. Building on these components, we introduce WholeBodyVLA, a unified framework for humanoid loco-manipulation. To the best of our knowledge, WholeBodyVLA is among the first of its kind to enable large-space humanoid loco-manipulation. It is verified via comprehensive experiments on the AgiBot X2 humanoid, outperforming the prior baseline by 21.3%. It also demonstrates strong generalization and high extensibility across a broad range of tasks.
HACTS: a Human-As-Copilot Teleoperation System for Robot Learning
Teleoperation is essential for autonomous robot learning, especially in manipulation tasks that require human demonstrations or corrections. However, most existing systems only offer unilateral robot control and lack the ability to synchronize the robot's status with the teleoperation hardware, preventing real-time, flexible intervention. In this work, we introduce HACTS (Human-As-Copilot Teleoperation System), a novel system that establishes bilateral, real-time joint synchronization between a robot arm and teleoperation hardware. This simple yet effective feedback mechanism, akin to a steering wheel in autonomous vehicles, enables the human copilot to intervene seamlessly while collecting action-correction data for future learning. Implemented using 3D-printed components and low-cost, off-the-shelf motors, HACTS is both accessible and scalable. Our experiments show that HACTS significantly enhances performance in imitation learning (IL) and reinforcement learning (RL) tasks, boosting IL recovery capabilities and data efficiency, and facilitating human-in-the-loop RL. HACTS paves the way for more effective and interactive human-robot collaboration and data-collection, advancing the capabilities of robot manipulation.
DRO-EDL-MPC: Evidential Deep Learning-Based Distributionally Robust Model Predictive Control for Safe Autonomous Driving
Safety is a critical concern in motion planning for autonomous vehicles. Modern autonomous vehicles rely on neural network-based perception, but making control decisions based on these inference results poses significant safety risks due to inherent uncertainties. To address this challenge, we present a distributionally robust optimization (DRO) framework that accounts for both aleatoric and epistemic perception uncertainties using evidential deep learning (EDL). Our approach introduces a novel ambiguity set formulation based on evidential distributions that dynamically adjusts the conservativeness according to perception confidence levels. We integrate this uncertainty-aware constraint into model predictive control (MPC), proposing the DRO-EDL-MPC algorithm with computational tractability for autonomous driving applications. Validation in the CARLA simulator demonstrates that our approach maintains efficiency under high perception confidence while enforcing conservative constraints under low confidence.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
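For context, standard evidential deep learning turns nonnegative per-class evidence into a Dirichlet distribution whose total evidence yields a scalar vacuity-style uncertainty; a distributionally robust layer can then widen its ambiguity set when that uncertainty is high. The snippet below shows this generic bookkeeping with a placeholder linear radius rule; it is not the paper's ambiguity-set formulation.

    import numpy as np

    def edl_confidence(evidence):
        """Dirichlet-based evidential uncertainty (standard EDL bookkeeping).
        evidence : nonnegative per-class evidence from the network head.
        Returns expected class probabilities and a scalar uncertainty in [0, 1]."""
        alpha = np.asarray(evidence, dtype=float) + 1.0
        S = alpha.sum()
        prob = alpha / S                    # expected class probabilities
        uncertainty = len(alpha) / S        # vacuity: high when total evidence is low
        return prob, uncertainty

    # Illustrative use: widen a DRO ambiguity-set radius when the detector is unsure
    # (the linear scaling below is a made-up placeholder).
    prob, u = edl_confidence([12.0, 0.5, 0.3])
    radius = 0.05 + 0.5 * u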
DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes
We introduce DualMap, an online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries. Designed for efficient semantic mapping and adaptability to changing environments, DualMap meets the essential requirements for real-world robot navigation applications. Our proposed hybrid segmentation frontend and object-level status check eliminate the costly 3D object merging required by prior methods, enabling efficient online scene mapping. The dual-map representation combines a global abstract map for high-level candidate selection with a local concrete map for precise goal-reaching, effectively managing and updating dynamic changes in the environment. Through extensive experiments in both simulation and real-world scenarios, we demonstrate state-of-the-art performance in 3D open-vocabulary segmentation, efficient scene mapping, and online language-guided navigation. Project page: https://eku127.github.io/DualMap/
comment: 14 pages, 14 figures. Published in IEEE Robotics and Automation Letters (RA-L), 2025. Code: https://github.com/Eku127/DualMap Project page: https://eku127.github.io/DualMap/
MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation
Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks; nevertheless, they still rely heavily on manual operation or railway systems for movement. The AgriVLN method and the A2A benchmark pioneered the extension of Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position following a natural language instruction. Unlike human binocular vision, most agricultural robots are only given a single camera for monocular vision, which results in limited spatial perception. To bridge this gap, we present Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which the proposed MDE module generates depth features from RGB images to assist the decision-maker in multimodal reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method successfully increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43m to 4.08m, demonstrating state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Humans continuously infer the states, goals, and behaviors of others by perceiving their surroundings in dynamic, real-world social interactions. However, most Theory of Mind (ToM) benchmarks only evaluate static, text-based scenarios, which have a significant gap compared to real interactions. We propose the SoMi-ToM benchmark, designed to evaluate multi-perspective ToM in embodied multi-agent complex social interactions. This benchmark is based on rich multimodal interaction data generated by the interaction environment SoMi, covering diverse crafting goals and social relationships. Our framework supports multi-level evaluation: (1) first-person evaluation provides multimodal (visual, dialogue, action, etc.) input from a first-person perspective during a task for real-time state inference, (2) third-person evaluation provides complete third-person perspective video and text records after a task for goal and behavior inference. This evaluation method allows for a more comprehensive examination of a model's ToM capabilities from both the subjective immediate experience and the objective global observation. We constructed a challenging dataset containing 35 third-person perspective videos, 363 first-person perspective images, and 1225 expert-annotated multiple-choice questions (three options). On this dataset, we systematically evaluated the performance of human subjects and several state-of-the-art large vision-language models (LVLMs). The results show that LVLMs perform significantly worse than humans on SoMi-ToM: the average accuracy gap between humans and models is 40.1% in first-person evaluation and 26.4% in third-person evaluation. This indicates that future LVLMs need to further improve their ToM capabilities in embodied, complex social interactions.
comment: 24 pages, 6 figures
MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization
Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic ambiguity intensifies object misclassification and increases the likelihood of incorrect associations, which in turn can cause significant errors in the estimated pose. Thus, in this letter, we propose a multi-label likelihood-based semantic graph matching framework for object-level global localization. The key idea is to exploit multi-label graph representations, rather than single-label alternatives, to capture and leverage the inherent semantic context of object observations. Based on these representations, our approach enhances semantic correspondence across graphs by combining the likelihood of each node with the maximum likelihood of its neighbors via context-aware likelihood propagation. For rigorous validation, data association and pose estimation performance are evaluated under both closed-set and open-set detection configurations. In addition, we demonstrate the scalability of our approach to large-vocabulary object categories in both real-world indoor scenes and synthetic environments.
comment: Accepted in IEEE Robotics and Automation Letters (2025)
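The context-aware propagation can be pictured as one message-passing round in which each object node's likelihood is combined with the best likelihood among its graph neighbors. The toy example below uses scalar per-node likelihoods and a product combination rule as simplifying assumptions; the paper operates on multi-label distributions.

    def propagate_likelihood(node_lik, neighbors):
        """One round of context-aware likelihood propagation: combine each node's
        own likelihood with the maximum likelihood among its neighbors.
        node_lik : {node_id: float}, neighbors : {node_id: [node_id, ...]}."""
        combined = {}
        for n, lik in node_lik.items():
            nbr_best = max((node_lik[m] for m in neighbors.get(n, [])), default=0.0)
            combined[n] = lik * max(nbr_best, 1e-6)
        return combined

    liks = {"chair": 0.8, "table": 0.6, "plant": 0.3}
    adj = {"chair": ["table"], "table": ["chair", "plant"], "plant": ["table"]}
    print(propagate_likelihood(liks, adj))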
ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning
Behavior-cloning based visuomotor policies enable precise manipulation but often inherit the slow, cautious tempo of human demonstrations, limiting practical deployment. However, prior studies on acceleration methods mainly rely on statistical or heuristic cues that ignore task semantics and can fail across diverse manipulation settings. We present ESPADA, a semantic and spatially aware framework that segments demonstrations using a VLM-LLM pipeline with 3D gripper-object relations, enabling aggressive downsampling only in non-critical segments while preserving precision-critical phases, without requiring extra data or architectural modifications, or any form of retraining. To scale from a single annotated episode to the full dataset, ESPADA propagates segment labels via Dynamic Time Warping (DTW) on dynamics-only features. Across both simulation and real-world experiments with ACT and DP baselines, ESPADA achieves approximately a 2x speed-up while maintaining success rates, narrowing the gap between human demonstrations and efficient robot control.
comment: project page: https://project-espada.github.io/espada/
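The label-propagation step can be sketched with a textbook DTW alignment on 1-D dynamics features: align the annotated episode with a new one, then copy each reference frame's segment label to the frames it aligns with. The absolute-difference cost and scalar features below are simplifying assumptions, not the paper's feature design.

    import numpy as np

    def dtw_path(a, b):
        """Classic O(len(a)*len(b)) dynamic-time-warping alignment on 1-D sequences."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                D[i, j] = abs(a[i - 1] - b[j - 1]) + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        path, (i, j) = [], (n, m)
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)], key=lambda t: D[t])
        return path[::-1]

    def propagate_labels(ref_feats, ref_labels, new_feats):
        """Transfer per-frame segment labels from one annotated episode to a new
        episode by aligning dynamics-only features with DTW (sketch of the idea)."""
        labels = [None] * len(new_feats)
        for i, j in dtw_path(ref_feats, new_feats):
            labels[j] = ref_labels[i]
        return labels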
Multiagent Systems
Fair Coordination in Strategic Scheduling
We consider a scheduling problem of strategic agents representing jobs of different weights. Each agent has to decide on one of a finite set of identical machines to get their job processed. In contrast to the common and exclusive focus on makespan minimization, we want the outcome to be fair under strategic considerations of the agents. Two natural properties are credibility, which ensures that the assignment is a Nash equilibrium and equality, requiring that agents with equal-weight jobs are assigned to machines of equal load. We combine these two with a hierarchy of fairness properties based on envy-freeness together with several relaxations based on the idea that envy seems more justified towards agents with a higher weight. We present a complete complexity landscape for satisfiability and decision versions of these properties, alone or in combination, and study them as structural constraints under makespan optimization. For our positive results, we develop a unified algorithmic approach, where we achieve different properties by fine-tuning key subroutines.
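Two of the named properties have simple operational readings: credibility asks that no job can lower the load it experiences by unilaterally switching machines, and equality asks that equal-weight jobs sit on equally loaded machines. The checks below make these definitions concrete for a given assignment; they are verifiers for illustration, not the paper's algorithms.

    def machine_loads(assignment, weights, m):
        """assignment: {job: machine index}, weights: {job: weight}."""
        loads = [0.0] * m
        for job, mach in assignment.items():
            loads[mach] += weights[job]
        return loads

    def is_credible(assignment, weights, m):
        """Pure Nash equilibrium check: no job benefits from moving to another machine."""
        loads = machine_loads(assignment, weights, m)
        for job, mach in assignment.items():
            for other in range(m):
                if other != mach and loads[other] + weights[job] < loads[mach]:
                    return False
        return True

    def is_equal(assignment, weights, m):
        """Equality check: equal-weight jobs are assigned to machines of equal load."""
        loads = machine_loads(assignment, weights, m)
        for j1, m1 in assignment.items():
            for j2, m2 in assignment.items():
                if weights[j1] == weights[j2] and loads[m1] != loads[m2]:
                    return False
        return True

    w = {"a": 2.0, "b": 2.0, "c": 1.0}
    assign = {"a": 0, "b": 1, "c": 0}
    print(is_credible(assign, w, 2), is_equal(assign, w, 2))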
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows
We introduce a finance & accounting benchmark (Finch) for evaluating AI agents on real-world, enterprise-grade professional workflows -- interleaving data entry, structuring, formatting, web search, cross-file retrieval, calculation, modeling, validation, translation, visualization, and reporting. Finch is sourced from authentic enterprise workspaces at Enron (15,000 spreadsheets and 500,000 emails from 150 employees) and other financial institutions, preserving in-the-wild messiness across multimodal artifacts (text, tables, formulas, charts, code, and images) and spanning diverse domains such as budgeting, trading, and asset management. We propose a workflow construction process that combines LLM-assisted discovery with expert annotation: (1) LLM-assisted, expert-verified derivation of workflows from real-world email threads and version histories of spreadsheet files, and (2) meticulous expert annotation for workflows, requiring over 700 hours of domain-expert effort. This yields 172 composite workflows with 384 tasks, involving 1,710 spreadsheets with 27 million cells, along with PDFs and other artifacts, capturing the intrinsically messy, long-horizon, knowledge-intensive, and collaborative nature of real-world enterprise work. We conduct both human and automated evaluations of frontier AI systems including GPT 5.1, Claude Sonnet 4.5, Gemini 3 Pro, Grok 4, and Qwen 3 Max. GPT 5.1 Pro spends 48 hours in total yet passes only 38.4% of workflows, while Claude Sonnet 4.5 passes just 25.0%. Comprehensive case studies further surface the challenges that real-world enterprise workflows pose for AI agents.
The Optimal Control Algorithm of Connected and Automated Vehicles at Roundabouts with Communication Delay
Connected and automated vehicles (CAVs) rely on wireless communication to exchange state information for distributed control, making communication delays a critical factor that can affect vehicle motion and degrade control performance, particularly in high-speed scenarios. To address these challenges in the complex environment of roundabout intersections, this paper proposes a roundabout control algorithm, which takes into account the uncertainty of interactive information caused by time delays. First, to maintain the required distance between the current vehicle and its preceding and following vehicles, conflicting vehicles are identified based on the time-to-collision (TTC) in the conflict zone. To fully consider communication performance, a vehicle motion model incorporating time delays is established. According to the distributed model predictive control (DMPC) mechanism, the vehicle motion control that satisfies the roundabout constraints is determined. Second, by scheduling the sequence of vehicles entering the roundabout, a multiscale optimization objective is developed by integrating vehicle motion indicators and roundabout system indicators. Traffic density and travel time are embedded into the optimization problem to guide vehicles to enter the roundabout safely and stably. Through a variety of simulation experiments, the effectiveness of the proposed control algorithm is verified by comparing its performance with that of multiple control algorithms under different autonomous vehicle penetration rates and heavy traffic load scenarios.
comment: 9 pages, 4 figures
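A minimal version of the conflict test compares when two vehicles would reach the shared conflict point and flags them as conflicting if the gap is below a threshold. The constant-speed arrival-time model and the 2 s threshold below are illustrative assumptions, not the paper's exact TTC criterion.

    def conflict_by_ttc(dist_a, speed_a, dist_b, speed_b, threshold=2.0):
        """Flag two vehicles as conflicting if their estimated arrival times at the
        shared conflict point differ by less than `threshold` seconds (simple
        constant-speed approximation)."""
        t_a = dist_a / max(speed_a, 1e-6)
        t_b = dist_b / max(speed_b, 1e-6)
        return abs(t_a - t_b) < threshold, (t_a, t_b)

    conflict, (ta, tb) = conflict_by_ttc(30.0, 10.0, 28.0, 9.0)
    print(conflict, ta, tb)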
Quantigence: A Multi-Agent AI Framework for Quantum Security Research
Cryptographically Relevant Quantum Computers (CRQCs) pose a structural threat to the global digital economy. Algorithms like Shor's factoring and Grover's search threaten to dismantle the public-key infrastructure (PKI) securing sovereign communications and financial transactions. While the timeline for fault-tolerant CRQCs remains probabilistic, the "Store-Now, Decrypt-Later" (SNDL) model necessitates immediate migration to Post-Quantum Cryptography (PQC). This transition is hindered by the velocity of research, evolving NIST standards, and heterogeneous deployment environments. To address this, we present Quantigence, a theory-driven multi-agent AI framework for structured quantum-security analysis. Quantigence decomposes research objectives into specialized roles - Cryptographic Analyst, Threat Modeler, Standards Specialist, and Risk Assessor - coordinated by a supervisory agent. Using "cognitive parallelism," agents reason independently to maintain context purity while execution is serialized on resource-constrained hardware (e.g., NVIDIA RTX 2060). The framework integrates external knowledge via the Model Context Protocol (MCP) and prioritizes vulnerabilities using the Quantum-Adjusted Risk Score (QARS), a formal extension of Mosca's Theorem. Empirical validation shows Quantigence achieves a 67% reduction in research turnaround time and superior literature coverage compared to manual workflows, democratizing access to high-fidelity quantum risk assessment.
comment: 13 pages, 2 figures
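Mosca's theorem, which QARS extends, says an asset is already at risk if the time its data must stay secret (x) plus the migration time to PQC (y) exceeds the expected time until a CRQC arrives (z). The snippet below encodes that inequality together with a purely hypothetical value-weighted score named qars_sketch; the paper's actual QARS formula is not reproduced here.

    def mosca_at_risk(shelf_life_x, migration_time_y, crqc_horizon_z):
        """Mosca's theorem: data is at risk if x + y > z, i.e. secrets must remain
        confidential longer than the time left before a cryptographically relevant
        quantum computer arrives, once migration time is accounted for."""
        return shelf_life_x + migration_time_y > crqc_horizon_z

    def qars_sketch(shelf_life_x, migration_time_y, crqc_horizon_z, asset_value):
        """Hypothetical scalar risk score in the spirit of QARS: urgency from the
        Mosca margin, weighted by asset value (illustrative formula only)."""
        margin = shelf_life_x + migration_time_y - crqc_horizon_z
        urgency = max(0.0, margin) / max(crqc_horizon_z, 1e-6)
        return asset_value * (1.0 + urgency)

    print(mosca_at_risk(10, 5, 12), qars_sketch(10, 5, 12, asset_value=8.0))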
Multi-Agent Collaborative Framework for Intelligent IT Operations: An AOI System with Context-Aware Compression and Dynamic Task Scheduling
The proliferation of cloud-native architectures, characterized by microservices and dynamic orchestration, has rendered modern IT infrastructures exceedingly complex and volatile. This complexity generates overwhelming volumes of operational data, leading to critical bottlenecks in conventional systems: inefficient information processing, poor task coordination, and loss of contextual continuity during fault diagnosis and remediation. To address these challenges, we propose AOI (AI-Oriented Operations), a novel multi-agent collaborative framework that integrates three specialized agents with an LLM-based Context Compressor. Its core innovations include: (1) a dynamic task scheduling strategy that adaptively prioritizes operations based on real-time system states, and (2) a three-layer memory architecture comprising Working, Episodic, and Semantic layers that optimizes context retention and retrieval. Extensive experiments on both synthetic and real-world benchmarks demonstrate that AOI effectively mitigates information overload, achieving a 72.4% context compression ratio while preserving 92.8% of critical information and significantly enhances operational efficiency, attaining a 94.2% task success rate and reducing the Mean Time to Repair (MTTR) by 34.4% compared to the best baseline. This work presents a paradigm shift towards scalable, adaptive, and context-aware autonomous operations, enabling robust management of next-generation IT infrastructures with minimal human intervention.
Hierarchical Multi-agent Large Language Model Reasoning for Autonomous Functional Materials Discovery
Artificial intelligence is reshaping scientific exploration, but most methods automate procedural tasks without engaging in scientific reasoning, limiting autonomy in discovery. We introduce Materials Agents for Simulation and Theory in Electronic-structure Reasoning (MASTER), an active learning framework where large language models autonomously design, execute, and interpret atomistic simulations. In MASTER, a multimodal system translates natural language into density functional theory workflows, while higher-level reasoning agents guide discovery through a hierarchy of strategies, including a single agent baseline and three multi-agent approaches: peer review, triage-ranking, and triage-forms. Across two chemical applications, CO adsorption on Cu-surface transition metal (M) adatoms and on M-N-C catalysts, reasoning-driven exploration reduces required atomistic simulations by up to 90% relative to trial-and-error selection. Reasoning trajectories reveal chemically grounded decisions that cannot be explained by stochastic sampling or semantic bias. Altogether, multi-agent collaboration accelerates materials discovery and marks a new paradigm for autonomous scientific exploration.
comment: Keywords: Multi-agent reasoning; Large language models; Active learning; AI-driven simulation; Materials discovery; Density functional theory; Surface chemistry
EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery
Large language models (LLMs) are increasingly used to evolve programs and multi-agent systems, yet most existing approaches rely on overwrite-based mutations that maintain only a single candidate at a time. Such methods discard useful variants, suffer from destructive edits, and explore a brittle search space prone to structural failure. We introduce EvoLattice, a framework that represents an entire population of candidate programs or agent behaviors within a single directed acyclic graph. Each node stores multiple persistent alternatives, and every valid path through the graph defines a distinct executable candidate, yielding a large combinatorial search space without duplicating structure. EvoLattice enables fine-grained alternative-level evaluation by scoring each alternative across all paths in which it appears, producing statistics that reveal how local design choices affect global performance. These statistics provide a dense, data-driven feedback signal for LLM-guided mutation, recombination, and pruning, while preserving successful components. Structural correctness is guaranteed by a deterministic self-repair mechanism that enforces acyclicity and dependency consistency independently of the LLM. EvoLattice naturally extends to agent evolution by interpreting alternatives as prompt fragments or sub-agent behaviors. Across program synthesis (proxy and optimizer meta-learning), EvoLattice yields more stable evolution, greater expressivity, and stronger improvement trajectories than prior LLM-guided methods. The resulting dynamics resemble quality-diversity optimization, emerging implicitly from EvoLattice's internal multi-alternative representation rather than an explicit external archive.
MALLM: Multi-Agent Large Language Models Framework EMNLP 2025
Multi-agent debate (MAD) has demonstrated the ability to augment collective intelligence by scaling test-time compute and leveraging expertise. Current frameworks for multi-agent debate are often designed towards tool use, lack integrated evaluation, or provide limited configurability of agent personas, response generators, discussion paradigms, and decision protocols. We introduce MALLM (Multi-Agent Large Language Models), an open-source framework that enables systematic analysis of MAD components. MALLM offers more than 144 unique configurations of MAD, including (1) agent personas (e.g., Expert, Personality), (2) response generators (e.g., Critical, Reasoning), (3) discussion paradigms (e.g., Memory, Relay), and (4) decision protocols (e.g., Voting, Consensus). MALLM uses simple configuration files to define a debate. Furthermore, MALLM can load any textual Hugging Face dataset (e.g., MMLU-Pro, WinoGrande) and provides an evaluation pipeline for easy comparison of MAD configurations. MALLM enables researchers to systematically configure, run, and evaluate debates for their problems, facilitating the understanding of the components and their interplay.
comment: Accepted at EMNLP 2025 (Demo)
AI Agents with Decentralized Identifiers and Verifiable Credentials
A fundamental limitation of current LLM-based AI agents is their inability to build differentiated trust among each other at the onset of an agent-to-agent dialogue. However, autonomous and interoperable trust establishment becomes essential once agents start to operate beyond isolated environments and engage in dialogues across individual or organizational boundaries. A promising way to fill this gap in Agentic AI is to equip agents with long-lived digital identities and introduce tamper-proof and flexible identity-bound attestations of agents, provisioned by commonly trusted third parties and designed for cross-domain verifiability. This article presents a conceptual framework and a prototypical multi-agent system, where each agent is endowed with a self-sovereign digital identity. It combines a unique and ledger-anchored W3C Decentralized Identifier (DID) of an agent with a set of third-party issued W3C Verifiable Credentials (VCs). This enables agents at the start of a dialog to prove ownership of their self-controlled DIDs for authentication purposes and to establish various cross-domain trust relationships through the spontaneous exchange of their self-hosted DID-bound VCs. A comprehensive evaluation of the prototypical implementation demonstrates technical feasibility but also reveals limitations once an agent's LLM is in sole charge to control the respective security procedures.
comment: Accepted for presentation at the 18th International Conference on Agents and Artificial Intelligence 2026
Unsupervised Acquisition of Discrete Grammatical Categories
This article presents experiments performed using a computational laboratory environment for language acquisition experiments. It implements a multi-agent system consisting of two agents: an adult language model and a daughter language model that aims to learn the mother language. Crucially, the daughter agent does not have access to the internal knowledge of the mother language model but only to the language exemplars the mother agent generates. These experiments illustrate how this system can be used to acquire abstract grammatical knowledge. We demonstrate how statistical analyses of patterns in the input data corresponding to grammatical categories yield discrete grammatical rules. These rules are subsequently added to the grammatical knowledge of the daughter language model. To this end, hierarchical agglomerative cluster analysis was applied to the utterances consecutively generated by the mother language model. It is argued that this procedure can be used to acquire structures resembling grammatical categories proposed by linguists for natural languages. Thus, it is established that non-trivial grammatical knowledge has been acquired. Moreover, the parameter configuration of this computational laboratory environment determined using training data generated by the mother language model is validated in a second experiment with a test set similarly resulting in the acquisition of non-trivial categories.
comment: 34 pages, 3 figures, 7 tables. Related work: Shakouri et al., arXiv:2512.02195
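The category-induction step can be approximated generically: build a left/right-neighbor co-occurrence vector for each word type in the mother agent's utterances and run hierarchical agglomerative clustering on those vectors. The feature set, linkage method, and cluster count below are assumptions for illustration, not the article's exact configuration.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def induce_categories(utterances, n_categories=4):
        """Cluster word types by their neighbor contexts with hierarchical
        agglomerative clustering (generic sketch of category induction)."""
        vocab = sorted({w for u in utterances for w in u.split()})
        idx = {w: i for i, w in enumerate(vocab)}
        ctx = np.zeros((len(vocab), 2 * len(vocab)))
        for u in utterances:
            toks = u.split()
            for k, w in enumerate(toks):
                if k > 0:
                    ctx[idx[w], idx[toks[k - 1]]] += 1                 # left neighbor
                if k < len(toks) - 1:
                    ctx[idx[w], len(vocab) + idx[toks[k + 1]]] += 1    # right neighbor
        Z = linkage(ctx, method="ward")
        labels = fcluster(Z, t=n_categories, criterion="maxclust")
        return dict(zip(vocab, labels))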
Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives
We develop and analyze a theoretical framework for agent-to-agent interactions in a simplified in-context linear regression setting. In our model, each agent is instantiated as a single-layer transformer with linear self-attention (LSA) trained to implement gradient-descent-like updates on a quadratic regression objective from in-context examples. We then study the coupled dynamics when two such LSA agents alternately update from each other's outputs under potentially misaligned fixed objectives. Within this framework, we characterize the generation dynamics and show that misalignment leads to a biased equilibrium where neither agent reaches its target, with residual errors predictable from the objective gap and the prompt-induced geometry. We further contrast this fixed objective regime with an adaptive multi-agent setting, wherein a helper agent updates a turn-based objective to implement a Newton-like step for the main agent, eliminating the plateau and accelerating its convergence. Experiments with trained LSA agents, as well as black-box GPT-5-mini runs on in-context linear regression tasks, are consistent with our theoretical predictions within this simplified setting. We view our framework as a mechanistic framework that links prompt geometry and objective misalignment to stability, bias, and robustness, and as a stepping stone toward analyzing more realistic multi-agent LLM systems.
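The biased-equilibrium phenomenon can be reproduced with a toy stand-in for the trained agents: two agents alternately pull a shared parameter toward their own quadratic targets with gradient-descent-like steps, and with misaligned targets the iterate converges strictly between them, so neither objective is reached. The scalar learning rate and target vectors below are illustrative, not the paper's LSA parameterization.

    import numpy as np

    def alternating_dynamics(theta_a, theta_b, steps=200, lr=0.1):
        """Two agents alternately nudge a shared estimate toward their own targets
        (stand-ins for gradient-descent-like in-context updates). With misaligned
        targets the fixed point lies between them: a biased equilibrium."""
        w = np.zeros_like(theta_a)
        for _ in range(steps):
            w = w - lr * (w - theta_a)   # agent A's update toward its objective
            w = w - lr * (w - theta_b)   # agent B's update toward its objective
        return w

    theta_a, theta_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    print(alternating_dynamics(theta_a, theta_b))   # lands strictly between the two targets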
Systems and Control (EESS)
Bidding Aggregated Flexibility in European Electricity Auctions
Bidding flexibility in day-ahead and intraday auctions would enable decentralized flexible resources, such as electric vehicles and heat pumps, to efficiently align their consumption with the intermittent generation of renewable energy. However, because these resources are individually too small to participate in those auctions directly, an aggregator (e.g., a utility) must act on their behalf. This requires aggregating many decentralized resources, which is a computationally challenging task. In this paper, we propose a computationally efficient and highly accurate method that is readily applicable to European day-ahead and intraday auctions. Distinct from existing methods, we aggregate only economically relevant power profiles, identified through price forecasts. The resulting flexibility is then conveyed to the market operator via exclusive groups of block bids. We evaluate our method for a utility serving the Swiss town of Losone, where flexibility from multiple heat pumps distributed across the grid must be aggregated and bid in the Swiss day-ahead auction. Results show that our method aggregates accurately, achieving 98% of the theoretically possible cost savings. This aggregation accuracy remains stable even as the number of heat pumps increases, while computation time grows only linearly, demonstrating strong scalability.
Competent Discrete Time Modeling For analogue controlled PWM Converter Considering State-Feedback
Ever since R. D. Middlebrook proposed the state-space averaging notion, the small-signal model has been widely used as a design tool to tune control parameters. As Moore's law continues and AI chips place high demands on power delivery and dynamic response, control bandwidth needs to be boosted. However, the averaged model rests on two basic assumptions: the low-frequency assumption and the small-ripple assumption. In high-bandwidth design, these two assumptions are violated. To address this, various methods have been proposed. This paper gives a comprehensive overview of the existing small-signal models for PWM converters from the following perspectives: 1) model fidelity, 2) analytical tractability, 3) complexity of the derivation process and result, and 4) generality.
An $H_2$-norm approach to performance analysis of networked control systems under multiplicative routing transformations
This paper investigates the performance of networked control systems subject to multiplicative routing transformations that alter measurement pathways without directly injecting signals. Such transformations, arising from faults or adversarial actions, modify the feedback structure and can degrade performance while remaining stealthy. An $H_2$-norm framework is proposed to quantify the impact of these transformations by evaluating the ratio between the steady-state energies of performance and residual outputs. Equivalent linear matrix inequality (LMI) formulations are derived for computational assessment, and analytical upper bounds are established to estimate the worst-case degradation. The results provide structural insight into how routing manipulations influence closed-loop behavior and reveal conditions for stealthy multiplicative attacks.
comment: This work has been submitted to the IEEE for possible publication
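For reference, the basic H2-norm computation underlying such a framework solves a Lyapunov equation for the controllability Gramian and takes a trace; the paper's performance-to-residual energy ratio is built from quantities of this kind. The sketch below shows only the standard computation for a stable LTI triple (A, B, C) with made-up numbers, not the paper's LMI formulation.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def h2_norm(A, B, C):
        """H2 norm of a stable LTI system (A, B, C): solve A P + P A^T + B B^T = 0
        for the controllability Gramian P and return sqrt(trace(C P C^T))."""
        P = solve_continuous_lyapunov(A, -B @ B.T)
        return float(np.sqrt(np.trace(C @ P @ C.T)))

    A = np.array([[-1.0, 0.5], [0.0, -2.0]])
    B = np.array([[1.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    print(h2_norm(A, B, C))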
Identification of Technical Design Constraints and Considerations for Transmission Grid Expansion Planning Projects
The large-scale deployment of renewable energy sources, particularly offshore wind, requires large-scale transmission grid expansion projects to transmit the produced low-carbon power to the main demand centers. However, the planning and design of such complex projects currently lack a transparent and systematic process that system operators can follow when considering such investments in their grids. This paper identifies and classifies the main technical design constraints and considerations relevant to the planning of transmission grid expansion projects, and more specifically, electrical energy hubs. Seven key areas of interest are identified, namely network integration, HVDC technologies, costs (CAPEX, OPEX, and space requirements), electricity market design, future proofness and modular expandability, reliability-availability-maintainability, and sustainability. Each area of interest is analyzed in terms of its technical and operational relevance, with technical design constraints and considerations derived from such analysis. In addition, a hierarchical classification of the identified constraints and considerations (and therefore areas of interest) is introduced, distinguishing them between three criticality classes, namely hard constraints, main drivers, and key considerations. The dependencies between the different areas are discussed, too. Therefore, this work provides system operators and policymakers with a structured basis to support a transparent planning methodology with clear decision hierarchies for investments in transmission grid expansion projects.
Impact analysis of hidden faults in nonlinear control systems using output-to-output gain
Networked control systems (NCSs) are vulnerable to faults and hidden malfunctions in communication channels that can degrade performance or even destabilize the closed loop. Classical metrics in robust control and fault detection typically treat impact and detectability separately, whereas the output-to-output gain (OOG) provides a unified measure of both. While existing results have been limited to linear systems, this paper extends the OOG framework to nonlinear NCSs with quadratically constrained nonlinearities, considering false-injection attacks that can also manipulate sensor measurements through nonlinear transformations. Specifically, we provide computationally efficient linear matrix inequality conditions and complementary frequency-domain tests that yield explicit upper bounds on the OOG of this class of nonlinear systems. Furthermore, we derive frequency-domain conditions for absolute stability of closed-loop systems, generalizing the Yakubovich quadratic criterion.
comment: This work has been submitted to IFAC World Congress 2026 for possible publication
Real-Time AI-Driven Milling Digital Twin Towards Extreme Low-Latency
Digital twin (DT) enables smart manufacturing by leveraging real-time data, AI models, and intelligent control systems. This paper presents a state-of-the-art analysis on the emerging field of DTs in the context of milling. The critical aspects of DT are explored through the lens of virtual models of physical milling, data flow from physical milling to virtual model, and feedback from virtual model to physical milling. Live data streaming protocols and virtual modeling methods are highlighted. A case study showcases the transformative capability of a real-time machine learning-driven live DT of tool-work contact in a milling process. Future research directions are outlined to achieve the goals of Industry 4.0 and beyond.
Evaluating the Navigation Capabilities of a Modified COAST Guidewire Robot in an Anatomical Phantom Model
To address the issues that arise due to the manual navigation of guidewires in endovascular interventions, research in medical robotics has taken a strong interest in developing robotically steerable guidewires, which offer the possibility of enhanced maneuverability and navigation, as the tip of the guidewire can be actively steered. The COaxially Aligned STeerable (COAST) guidewire robot has the ability to generate a wide variety of motions including bending motion with different bending lengths, follow-the-leader motion, and feedforward motion. In our past studies, we have explored different designs of the COAST guidewire robot and developed modeling, control, and sensing strategies for the COAST guidewire robot. In this study, the performance of a modified COAST guidewire robot is evaluated by conducting navigation experiments in an anatomical phantom model with pulsatile flow. The modified COAST guidewire robot is a simplified version of the COAST guidewire robot and consists of two tubes as opposed to three tubes. Through this study, we demonstrate the effectiveness of the modified COAST guidewire robot in navigating the tortuous phantom vasculature.
comment: Presented at the 14th Conference on New Technologies for Computer and Robot Assisted Surgery (CRAS 2025)
Balancing Timeliness and Privacy in Discrete-Time Updating Systems
We study the trade-off between Age of Information (AoI) and maximal leakage (MaxL) in discrete-time status updating systems. A source generates time-stamped update packets that are processed by a server that delivers them to a monitor. An adversary, who eavesdrops on the server-monitor link, wishes to infer the timing of the underlying source update sequence. The server must balance the timeliness of the status information at the monitor against the timing information leaked to the adversary. We consider a model with Bernoulli source updates under two classes of Last-Come-First-Served (LCFS) service policies: (1) Coupled policies that tie the server's deliveries to the update arrival process in a preemptive queue; (2) Decoupled (dumping) policies in which the server transmits its freshest update according to a schedule that is independent of the update arrivals. For each class, we characterize the structure of the optimal policy that minimizes AoI for a given MaxL rate. Our analysis reveals that decoupled dumping policies offer a superior age-leakage trade-off to coupled policies. When subject to a MaxL constraint, we prove that the optimal dumping strategy is achieved by dithering between two adjacent deterministic dump periods.
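For readers who want to experiment with the decoupled dumping policies described above, the following is a minimal Monte Carlo sketch, assuming a Bernoulli source, instantaneous server reception, and a fixed dump period; the function and parameter names are illustrative and not taken from the paper.

```python
# Minimal Monte Carlo sketch of the age of information (AoI) seen by the
# monitor under a decoupled "dumping" policy: the server forwards its
# freshest buffered update every T slots, independently of arrivals.
# All parameter names are illustrative, not taken from the paper.
import random

def average_aoi(p=0.3, dump_period=4, horizon=200_000, seed=0):
    rng = random.Random(seed)
    newest_at_server = 0    # timestamp of freshest update held by the server
    newest_at_monitor = 0   # timestamp of freshest update delivered to the monitor
    aoi_sum = 0
    for t in range(1, horizon + 1):
        if rng.random() < p:          # Bernoulli source update in slot t
            newest_at_server = t
        if t % dump_period == 0:      # scheduled dump, decoupled from arrivals
            newest_at_monitor = newest_at_server
        aoi_sum += t - newest_at_monitor   # instantaneous AoI at the monitor
    return aoi_sum / horizon

if __name__ == "__main__":
    for T in (1, 2, 4, 8):
        print(f"T={T}: mean AoI ~ {average_aoi(dump_period=T):.2f}")
```

Varying the dump period in this toy setup illustrates the basic tension the paper studies: longer periods leak less timing information about the source but increase the average age at the monitor.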
QoS-Aware State-Augmented Learnable Framework for 5G NR-U/Wi-Fi Coexistence: Impact of Parameter Selection and Enhanced Collision Resolution
Unlicensed spectrum supports diverse traffic with stringent Quality-of-Service (QoS) requirements. In NR-U/Wi-Fi coexistence, the values of MAC parameters critically influence delay, collision behavior, and airtime fairness and efficiency. In this paper, we investigate the impact of (i) cost scaling and violation modeling, (ii) choice of MAC parameters, and (iii) an enhanced collision resolution scheme for the Listen-Before-Talk (LBT) mechanism on the performance of a state-augmented constrained reinforcement learning controller for NR-U/Wi-Fi coexistence. Coexistence control is formulated as a constrained Markov decision process with an explicit delay constraint for high-priority traffic and fairness as the optimization goal. Our simulation results show three key findings: (1) signed, threshold-invariant cost scaling with temporal smoothing stabilizes learning and strengthens long-term constraint adherence; (2) use of the contention window parameter for control provides smoother adaptation and better delay compliance than other MAC parameters; and (3) enhanced LBT significantly reduces collisions and improves airtime efficiency. These findings provide practical insights for achieving robust, QoS-aware coexistence control.
comment: 10 pages, 6 figures, 2 tables
Neural Control Barrier Functions for Signal Temporal Logic Specifications with Input Constraints
Signal Temporal Logic (STL) provides a powerful framework to describe complex tasks involving temporal and logical behavior in dynamical systems. In this work, we address the problem of synthesizing controllers for continuous-time systems under STL specifications with input constraints. We propose a neural network-based framework for synthesizing time-varying control barrier functions (TVCBF) and their corresponding controllers for systems to fulfill STL specifications while respecting input constraints. We formulate barrier conditions incorporating the spatial and temporal logic of the given STL specification. Additionally, we introduce a validity condition to provide formal safety guarantees across the entire state space. Finally, we demonstrate the effectiveness of the proposed approach through several simulation studies considering different STL tasks.
A Multi-Worker Assembly Line Rebalancing with Spatial and Ergonomic Considerations
This work addresses the Assembly Line Rebalancing Problem in manual assembly systems where multiple workers operate in parallel within the same station - an industrially relevant scenario that remains insufficiently explored in the literature. A multi-objective optimization model is proposed that incorporates task reassignment, worker allocation, ergonomic evaluation, and explicit spatial feasibility through work-area constraints. The formulation minimizes deviations from the current configuration while promoting balanced workload and ergonomic conditions among workers. Computational experiments on synthetic problem instances demonstrate that the model consistently generates feasible and human-centered reconfigurations across varying cycle-time conditions, highlighting its potential as a decision-support tool for industrial rebalancing in flexible production environments.
Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation
This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs), with the goal of improving computational speed on parallel architectures. The MAP estimation problem is reformulated as a continuous-time optimal control problem based on the Onsager-Machlup functional. This reformulation enables the use of a previously proposed parallel-in-time solution for optimal control problems, which we adapt to the current problem. The structure of the resulting optimal control problem admits a parallel solution based on parallel associative scan algorithms. In the linear Gaussian special case, it yields a parallel Kalman-Bucy filter and a parallel continuous-time Rauch-Tung-Striebel smoother. These linear computational methods are further extended to nonlinear continuous-time state-space models through Taylor expansions. We also present the corresponding parallel two-filter smoother. The graphics processing unit (GPU) experiments on linear and nonlinear models demonstrate that the proposed framework achieves a significant speedup in computations while maintaining the accuracy of sequential algorithms.
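The enabling idea behind the parallel-in-time treatment is that per-step operations forming a monoid under an associative operator can be combined by a parallel prefix scan with logarithmic depth. The sketch below illustrates this with simple affine one-step maps rather than the paper's Kalman/Rauch-Tung-Striebel elements; the element definition, the toy model, and all names are assumptions made for illustration.

```python
# A minimal sketch of the parallel-in-time idea: the one-step affine maps
# x_{k+1} = A_k x_k + b_k of a (linearized) model form a monoid under
# composition, so all prefixes can be computed by an associative scan with
# O(log N) parallel depth. The element and operator below are illustrative
# and much simpler than the Kalman/RTS elements used in the paper.
import numpy as np

def compose(left, right):
    """Apply `left` first, then `right`; (A, b) acts as x -> A x + b."""
    A_l, b_l = left
    A_r, b_r = right
    return A_r @ A_l, A_r @ b_l + b_r

def inclusive_scan(elems, op):
    """Hillis-Steele scan; each while-iteration is embarrassingly parallel."""
    out, step = list(elems), 1
    while step < len(out):
        out = [op(out[i - step], out[i]) if i >= step else out[i]
               for i in range(len(out))]
        step *= 2
    return out

rng = np.random.default_rng(0)
N, n = 64, 2
elems = [(np.eye(n) + 0.01 * rng.standard_normal((n, n)),
          0.1 * rng.standard_normal(n)) for _ in range(N)]

x0 = np.array([1.0, -1.0])
# Reference: plain sequential rollout.
xs_seq, x = [], x0
for A, b in elems:
    x = A @ x + b
    xs_seq.append(x)
# Scan-based rollout: prefix k gives the composed map from x0 to x_{k+1}.
xs_scan = [A @ x0 + b for A, b in inclusive_scan(elems, compose)]
print(max(np.linalg.norm(a - b) for a, b in zip(xs_seq, xs_scan)))  # ~1e-15
```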
An End-to-End Neural Network Transceiver Design for OFDM System with FPGA-Accelerated Implementation
The evolution toward sixth-generation (6G) wireless networks demands high-performance transceiver architectures capable of handling complex and dynamic environments. Conventional orthogonal frequency-division multiplexing (OFDM) receivers rely on cascaded discrete Fourier transform (DFT) and demodulation blocks, which are prone to inter-stage error propagation and suboptimal global performance. In this work, we propose two neural network (NN) models, DFT-Net and Demodulation-Net (Demod-Net), to jointly replace the IDFT/DFT and demodulation modules in an OFDM transceiver. The models are trained end-to-end (E2E) to minimize bit error rate (BER) while preserving operator equivalence for hybrid deployment. A customized DFT-Demodulation Net Accelerator (DDNA) is further developed to efficiently map the proposed networks onto field-programmable gate array (FPGA) platforms. Leveraging fine-grained pipelining and block matrix operations, DDNA achieves high throughput and flexibility under stringent latency constraints. Experimental results show that the DL-based transceiver consistently outperforms the conventional OFDM system across multiple modulation schemes. With only a modest increase in hardware resource usage, it achieves approximately 1.5 dB BER gain and up to 66\% lower execution time.
Measurement of Material Volume Fractions in a Microwave Resonant Cavity Sensor Using Convolutional Neural Network
A non-destructive, real-time method for estimating the volume fraction of a dielectric mixture inside a resonant cavity is presented. A convolutional neural network (CNN)-based approach is used to estimate the fractional composition of two-phase dielectric mixtures inside a resonant cavity using scattering parameter (S-parameter) measurements. A rectangular cavity sensor with a strip feed structure is characterized using a vector network analyzer (VNA) from 0.01--20~GHz. The CNN is trained using both simulated and experimentally measured S-parameters and achieves high predictive accuracy even without de-embedding or filtering, demonstrating robustness to measurement imperfections. The simulation results achieve a coefficient of determination ($R^2$) of 0.99 using $k$-fold cross-validation, while the experimental model using raw data achieves an $R^2=0.94$ with a mean absolute error (MAE) below 6\%. Data augmentation further improves the accuracy of the experimental prediction to above $R^2=0.998$ (MAE$<$0.72\%). The proposed method enables rapid, non-destructive, accurate, low-cost, and real-time estimation of material fractions, illustrating strong potential for sensing applications in microwave material characterization.
Plant Equivalent Controller Realizations for Attack-Resilient Cyber-Physical Systems
As cyber-physical systems (CPSs) become more dependent on data and communication networks, their vulnerability to false data injection (FDI) attacks has raised significant concerns. Among these, stealthy attacks, those that evade conventional detection mechanisms, pose a critical threat to closed-loop performance. This paper introduces a controller-oriented method to enhance CPS resiliency against such attacks without compromising nominal closed-loop behavior. Specifically, we propose the concept of plant equivalent controller (PEC) realizations, representing a class of dynamic output-feedback controllers that preserve the input-output behavior of a given base controller while exhibiting distinct robustness properties in the presence of disturbances and sensor attacks. To quantify and improve robustness, we employ reachable set analysis to assess the impact of stealthy attacks on the closed-loop dynamics. Building on this analysis, we provide mathematical tools (in terms of linear matrix inequalities) to synthesize the optimal PEC realization that minimizes the reachable set under peak-bounded disturbances. The proposed framework thus provides systematic analysis and synthesis tools to enhance the attack resilience of CPSs while maintaining the desired nominal performance. The effectiveness of the approach is demonstrated on the quadruple-tank process subject to stealthy sensor attacks.
Differentiable Material Point Method for the Control of Deformable Objects
Controlling the deformation of flexible objects is challenging due to their non-linear dynamics and high-dimensional configuration space. This work presents a differentiable Material Point Method (MPM) simulator targeted at control applications. We exploit the differentiability of the simulator to optimize a control trajectory in an active damping problem for a hyperelastic rope. The simulator effectively minimizes the kinetic energy of the rope around 2$\times$ faster than a baseline MPPI method and to a 20% lower energy level, while using about 3% of the computation time.
comment: 7 Pages, 4 Figures, 1 Table
Efficient Generation of Smooth Paths with Curvature Guarantees by Mollification
Most path following and trajectory tracking algorithms in mobile robotics require the desired path or trajectory to be defined by at least twice continuously differentiable functions to guarantee key properties such as global convergence, especially for nonholonomic robots like unicycles with speed constraints. Consequently, these algorithms typically exclude continuous but non-differentiable paths, such as piecewise functions. Despite this exclusion, such paths provide convenient high-level inputs for describing robot missions or behavior. While techniques such as spline interpolation or optimization-based methods are commonly used to smooth non-differentiable paths or create feasible ones from sequences of waypoints, they either can produce unnecessarily complex trajectories or are computationally expensive. In this work, we present a method to regularize non-differentiable functions and generate feasible paths through mollification. Specifically, we approximate an arbitrary path with a differentiable function that can converge to it with arbitrary precision. Additionally, we provide a systematic method for bounding the curvature of generated paths, which we demonstrate by applying it to paths resulting from linking a sequence of waypoints with segments. The proposed approach is computationally efficient, enabling real-time implementation on microcontrollers and compatibility with standard trajectory tracking and path following algorithms.
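As a rough illustration of the mollification step, the sketch below convolves a densely sampled piecewise-linear waypoint path with a compactly supported bump kernel of width eps; the waypoints, kernel width, and sampling step are illustrative assumptions, and the paper's curvature-bounding procedure is not reproduced.

```python
# A minimal sketch of path smoothing by mollification: a piecewise-linear
# path through waypoints is convolved with a compactly supported "bump"
# kernel of width eps, giving a smooth approximation that tightens toward
# the original path as eps shrinks. Waypoints and parameters are illustrative.
import numpy as np

def mollifier(eps, ds):
    half = int(round(eps / ds))
    t = np.arange(-half, half + 1) * ds          # odd-length, symmetric support
    k = np.zeros_like(t)
    inside = np.abs(t) < eps
    k[inside] = np.exp(-1.0 / (1.0 - (t[inside] / eps) ** 2))
    return k / k.sum()                           # discrete normalization

def mollify_path(waypoints, eps=0.5, ds=0.01):
    # Densely sample the piecewise-linear path by arc length, then convolve.
    pts = np.asarray(waypoints, dtype=float)
    seg_len = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s_knots = np.concatenate([[0.0], np.cumsum(seg_len)])
    s = np.arange(0.0, s_knots[-1], ds)
    dense = np.column_stack([np.interp(s, s_knots, pts[:, d]) for d in range(pts.shape[1])])
    k = mollifier(eps, ds)
    pad = len(k) // 2
    padded = np.pad(dense, ((pad, pad), (0, 0)), mode="edge")  # avoid endpoint shrinkage
    smooth = np.column_stack([np.convolve(padded[:, d], k, mode="valid")
                              for d in range(dense.shape[1])])
    return s, smooth

s, smooth = mollify_path([(0, 0), (2, 0), (2, 2), (4, 2)], eps=0.5)
```

Shrinking eps tightens the approximation to the original polyline at the cost of higher curvature near the corners, which is the trade-off the paper's curvature bounds make explicit.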
Iterative Tuning of Nonlinear Model Predictive Control for Robotic Manufacturing Tasks
Manufacturing processes are often perturbed by drifts in the environment and wear in the system, requiring control re-tuning even in the presence of repetitive operations. This paper presents an iterative learning framework for automatic tuning of Nonlinear Model Predictive Control (NMPC) weighting matrices based on task-level performance feedback. Inspired by norm-optimal Iterative Learning Control (ILC), the proposed method adaptively adjusts the NMPC weights Q and R across task repetitions to minimize key performance indicators (KPIs) related to tracking accuracy, control effort, and saturation. Unlike gradient-based approaches that require differentiating through the NMPC solver, we construct an empirical sensitivity matrix, enabling structured weight updates without analytic derivatives. The framework is validated through simulation on a UR10e robot performing carbon fiber winding on a tetrahedral core. Results demonstrate that the proposed approach converges to near-optimal tracking performance (RMSE within 0.3% of offline Bayesian Optimization (BO)) in just 4 online repetitions, compared to the 100 offline evaluations required by the BO algorithm. The method offers a practical solution for adaptive NMPC tuning in repetitive robotic tasks, combining the precision of carefully optimized controllers with the flexibility of online adaptation.
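A minimal sketch of such an ILC-style outer tuning loop is given below, assuming the KPIs are collected into a vector to be driven toward zero and the weights are updated by a damped least-squares step on an empirical sensitivity matrix. The hypothetical run_task function stands in for one task repetition, and the finite-difference probes used here to build the sensitivity are purely illustrative; the paper constructs its sensitivity information from task repetitions rather than explicit probes.

```python
# Minimal sketch of an ILC-style outer loop for tuning diagonal NMPC weights
# from task-level KPIs. `run_task(theta)` (hypothetical) executes one task
# repetition with positive weights theta and returns a KPI vector (e.g.
# tracking RMSE, control effort, saturation time) to be driven toward zero.
# The empirical sensitivity matrix replaces gradients through the NMPC solver.
import numpy as np

def tune_weights(run_task, theta0, iters=4, fd_step=0.1, damping=1e-2, lr=0.5):
    theta = np.log(np.asarray(theta0, dtype=float))   # log-space keeps weights > 0
    for _ in range(iters):
        k0 = np.asarray(run_task(np.exp(theta)))
        # Empirical sensitivity of the KPI vector w.r.t. each log-weight.
        S = np.zeros((k0.size, theta.size))
        for j in range(theta.size):
            pert = theta.copy()
            pert[j] += fd_step
            S[:, j] = (np.asarray(run_task(np.exp(pert))) - k0) / fd_step
        # Damped least-squares (norm-optimal-ILC-like) update toward zero KPIs.
        step = np.linalg.solve(S.T @ S + damping * np.eye(theta.size), S.T @ k0)
        theta -= lr * step
    return np.exp(theta)
```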
On the Complementarity of Shared Electric Mobility and Renewable Energy Communities
Driven by the ongoing energy transition, shared mobility service providers are emerging actors in electrical power systems that aim to shift combustion-based mobility to an electric paradigm. In the meantime, Energy Communities are being deployed to enhance the local usage of distributed renewable production. As both actors share the same goal of satisfying demand at the lowest cost, they could take advantage of their complementarity and coordinate their decisions to enhance each other's operation. This paper presents an original Mixed-Integer Second-Order Cone Programming long-term Electric Vehicle fleet planning optimization problem that integrates coordination with a Renewable Energy Community and Vehicle-to-Grid capability. This model is used to assess the economic, energy, and grid performance of their collaboration in a 21-bus low-voltage distribution network. Key results show that coordination between the two actors can reduce the yearly cost by up to 11.3 % compared to their stand-alone situation and that it may reduce the stress on the substation transformer by 46 % through the activation of the inherent EV flexibility when subject to peak penalties from the grid operator.
Safe Control of Multi-Agent Systems with Minimal Communication
In many multi-agent systems, communication is limited by bandwidth, latency, and energy constraints. Designing controllers that achieve coordination and safety with minimal communication is critical for scalable and reliable deployment. This paper presents a method for designing controllers that minimize inter-agent communication in multi-agent systems while satisfying safety and coordination requirements and conforming to communication delay constraints. The control synthesis problem is cast as a rank minimization problem, where a convex relaxation is obtained via system level synthesis. Simulation results on various tasks, including trajectory tracking with relative and heterogeneous sensing, demonstrate that the proposed method significantly reduces inter-agent transmission compared to baseline approaches.
comment: to appear at 2025 IEEE Conference on Decision and Control (CDC)
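A minimal sketch of the convex-relaxation idea behind the entry above is shown below: a finite-horizon system-level-synthesis controller is synthesized while a nuclear norm, the usual convex surrogate for rank, penalizes the cross-agent blocks of the control response. The two-agent double-integrator model, the placement of the rank surrogate, and the weights are illustrative assumptions rather than the paper's exact program.

```python
# Finite-horizon SLS with a nuclear-norm (convex rank surrogate) penalty on
# the cross-agent blocks of the control response, to discourage inter-agent
# communication. Dimensions and weights are illustrative assumptions.
import cvxpy as cp
import numpy as np

n, m, T = 4, 2, 8                      # two "agents" with 2 states, 1 input each
A = np.kron(np.eye(2), np.array([[1.0, 0.1], [0.0, 1.0]]))
B = np.kron(np.eye(2), np.array([[0.0], [0.1]]))

Phi_x = [cp.Variable((n, n)) for _ in range(T)]
Phi_u = [cp.Variable((m, n)) for _ in range(T)]

# SLS achievability: Phi_x(1) = I, Phi_x(t+1) = A Phi_x(t) + B Phi_u(t), FIR at T.
constraints = [Phi_x[0] == np.eye(n)]
constraints += [Phi_x[t + 1] == A @ Phi_x[t] + B @ Phi_u[t] for t in range(T - 1)]
constraints += [A @ Phi_x[T - 1] + B @ Phi_u[T - 1] == 0]

# H2-like performance plus a nuclear-norm penalty on the blocks where one
# agent's input responds to the other agent's state.
perf = sum(cp.sum_squares(Phi_x[t]) + cp.sum_squares(Phi_u[t]) for t in range(T))
coupling = sum(cp.normNuc(Phi_u[t][0:1, 2:4]) + cp.normNuc(Phi_u[t][1:2, 0:2])
               for t in range(T))
prob = cp.Problem(cp.Minimize(perf + 10.0 * coupling), constraints)
prob.solve()
print("status:", prob.status, " coupling penalty:", coupling.value)
```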
Large Language Models for Power System Applications: A Comprehensive Literature Survey
This comprehensive literature review examines the emerging applications of Large Language Models (LLMs) in power system engineering. Through a systematic analysis of recent research published between 2020 and 2025, we explore how LLMs are being integrated into various aspects of power system operations, planning, and management. The review covers key application areas including fault diagnosis, load forecasting, cybersecurity, control and optimization, system planning, simulation, and knowledge management. Our findings indicate that while LLMs show promising potential in enhancing power system operations through their advanced natural language processing and reasoning capabilities, significant challenges remain in their practical implementation. These challenges include limited domain-specific training data, concerns about reliability and safety in critical infrastructure, and the need for enhanced explainability. The review also highlights emerging trends such as the development of power system-specific LLMs and hybrid approaches combining LLMs with traditional power engineering methods. We identify crucial research directions for advancing the field, including the development of specialized architectures, improved security frameworks, and enhanced integration with existing power system tools. This survey provides power system researchers and practitioners with a comprehensive overview of the current state of LLM applications in the field and outlines future pathways for research and development.
comment: 17 pages, 1 table, 1 figure
Headroom as A Grid Service in Software-Defined Power Grids: A Peak-to-Peak Control Design Approach
To address system frequency challenges driven by the integration of renewable generation, advanced control strategies are designed at the device level to provide effective frequency support following disturbances. However, because they typically rely on energy-based performance metrics, these methods cannot guarantee satisfaction of system frequency constraints such as the frequency nadir and the maximum Rate-of-Change-of-Frequency (RoCoF). Moreover, locally designed frequency support cannot minimize the overall system cost of maintaining frequency stability. On the other hand, the concept of frequency-constrained system scheduling has been introduced, which incorporates frequency dynamic constraints into the system economic optimization so that frequency requirements can be maintained at minimum cost. However, these works rely on analytical approximations of the frequency dynamic metrics, which are mathematically complicated and tend to be over-conservative when approximating IBR headroom requirements.
Information-Optimal Formation Geometry Design for Multimodal UAV Cooperative Perception
The efficacy of UAV swarm cooperative perception fundamentally depends on three-dimensional (3D) formation geometry, which governs target observability and sensor complementarity. In the literature, the exploitation of formation geometry and its impact on UAV sensing have rarely been studied; this gap can significantly degrade multimodal cooperative perception in scenarios where heterogeneous payloads (vision cameras and LiDAR) must be geometrically arranged to exploit their complementary strengths while managing communication interference and hardware budgets. To bridge this critical gap, we propose an information-theoretic optimization framework that jointly handles the allocation of UAVs and multimodal sensors, the configuration of formation geometries, and flight control. The UAV-sensor allocation is optimized by maximizing the Fisher Information Matrix (FIM) determinant. Under this framework, we introduce an equivalent formation transition strategy that enhances field-of-view (FOV) coverage without compromising perception accuracy or increasing communication interference. Furthermore, we design a novel Lyapunov-stable flight control scheme with logarithmic potential fields to generate energy-efficient trajectories for formation transitions. Extensive simulations demonstrate that our formation-aware design achieves a 25.0\% improvement in FOV coverage, a 104.2\% enhancement in communication signal strength, and a 47.2\% reduction in energy consumption compared to conventional benchmarks. This work establishes that task-driven geometric configuration represents a foundational rather than incidental component of next-generation UAV swarm systems.
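The FIM-determinant (D-optimal) allocation step can be illustrated with a small convex relaxation, sketched below under strong simplifying assumptions: a single static target, bearing-only measurements, and a fractional selection weight per candidate UAV position in place of the paper's full multimodal allocation model.

```python
# D-optimal (log-det of the Fisher information) selection of candidate UAV
# viewpoints via a convex relaxation of the 0/1 allocation. The bearing-only
# FIM model, noise level and budget are illustrative assumptions.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
target = np.zeros(2)
cand = rng.uniform(-10, 10, size=(12, 2))          # candidate UAV positions
sigma = 0.05                                        # bearing noise [rad]

fims = []
for p in cand:
    d = p - target
    r = np.linalg.norm(d)
    u_perp = np.array([-d[1], d[0]]) / r            # direction a bearing constrains
    fims.append(np.outer(u_perp, u_perp) / (sigma ** 2 * r ** 2))

w = cp.Variable(len(cand))                          # relaxed selection weights
F = sum(w[i] * fims[i] for i in range(len(cand)))
prob = cp.Problem(cp.Maximize(cp.log_det(F)),
                  [w >= 0, w <= 1, cp.sum(w) <= 4]) # budget: ~4 active UAVs
prob.solve()
print(np.round(w.value, 2))
```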
Data-Driven Control via Conditional Mean Embeddings: Formal Guarantees via Uncertain MDP Abstraction
Controlling stochastic systems with unknown dynamics and under complex specifications is especially challenging in safety-critical settings, where performance guarantees are essential. We propose a data-driven policy synthesis framework that yields formal performance guarantees for such systems using conditional mean embeddings (CMEs) and uncertain Markov decision processes (UMDPs). From trajectory data, we learn the system's transition kernel as a CME, then construct a finite-state UMDP abstraction whose transition uncertainties capture learning and discretization errors. Next, we generate a policy with formal performance bounds through robust dynamic programming. We demonstrate and empirically validate our method through a temperature regulation benchmark.
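The transition-kernel learning step can be sketched with a standard kernel-ridge estimate of the conditional mean embedding, as below; the toy scalar dynamics, the Gaussian kernel, and the regularization constant are illustrative assumptions, and the UMDP abstraction and robust dynamic programming are not shown.

```python
# Estimating a conditional mean embedding (CME) of a one-step transition
# kernel from trajectory data, then using it to predict conditional
# expectations E[f(x_{k+1}) | x_k, u_k]. All modeling choices are illustrative.
import numpy as np

def rbf(A, B, gamma=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
# Toy stochastic system: x+ = 0.9 x + 0.5 u + noise (stand-in for unknown dynamics).
X = rng.uniform(-1, 1, size=(300, 2))               # columns: state, input
Xp = 0.9 * X[:, :1] + 0.5 * X[:, 1:] + 0.05 * rng.standard_normal((300, 1))

lam = 1e-3
K = rbf(X, X)
W = np.linalg.solve(K + len(X) * lam * np.eye(len(X)), np.eye(len(X)))

def conditional_expectation(f, x, u):
    """E[f(x_+) | x, u] estimated through the CME weights beta(x, u)."""
    beta = W @ rbf(X, np.array([[x, u]]))            # (n, 1) weight vector
    return (f(Xp).T @ beta).item()

print(conditional_expectation(lambda s: s, 0.5, 0.2))       # ~ 0.9*0.5 + 0.5*0.2 = 0.55
print(conditional_expectation(lambda s: s ** 2, 0.5, 0.2))  # ~ 0.55^2 + noise variance
```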
Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences
Dynamical decoupling seeks to mitigate phase decoherence in qubits by applying a carefully designed sequence of effectively instantaneous electromagnetic pulses. Although analytic solutions exist for pulse timings that are optimal under specific noise regimes, identifying the optimal timings for a realistic noise spectrum remains challenging. We propose a reinforcement learning (RL)-based method for designing pulse sequences on qubits. Our novel action set enables the RL agent to efficiently navigate this inherently non-convex optimization landscape. The action set, derived from Thompson's group $F$, is applicable to a broad class of sequential decision problems whose states can be represented as bounded sequences. We demonstrate that our RL agent can learn pulse sequences that minimize dephasing without requiring explicit knowledge of the underlying noise spectrum. This work opens the possibility for real-time learning of optimal dynamical decoupling sequences on qubits which are dephasing-limited. The model-free nature of our algorithm suggests that the agent may ultimately learn optimal pulse sequences even in the presence of unmodeled physical effects, such as pulse errors or non-Gaussian noise.
A Fair, Flexible, Zero-Waste Digital Electricity Market: A First-Principles Approach Combining Automatic Market Making, Holarchic Architectures and Shapley Theory
This thesis presents a fundamental rethink of electricity market design at the wholesale and balancing layers. Rather than treating markets as static spot-clearing mechanisms, it reframes them as a continuously online, event-driven dynamical control system: a two-sided marketplace operating directly on grid physics. Existing energy-only, capacity-augmented, and zonal market designs are shown to admit no shock-robust Nash equilibrium under realistic uncertainty, instead relying on price caps, uplift, and regulatory intervention to preserve solvency and security. In response, the thesis develops a holarchic Automatic Market Maker (AMM) in which prices are bounded, exogenous control signals derived from physical tightness rather than emergent equilibrium outcomes. The AMM generalises nodal and zonal pricing through nested scarcity layers, from node to cluster to zone to region to system, such that participant-facing prices inherit from the tightest binding constraint. Nodal and zonal pricing therefore emerge as special cases of a unified scarcity propagation rule. Beyond pricing, the AMM functions as a scarcity-aware control system and a digitally enforceable rulebook for fair access and proportional allocation under shortage. Fuel costs are recovered through pay-as-bid energy dispatch consistent with merit order, while non-fuel operating and capital costs are allocated according to adequacy, flexibility, and locational contribution. Large-scale simulations demonstrate bounded-input bounded-output stability, controllable procurement costs, zero structural waste, and improved distributional outcomes. The architecture is climate-aligned and policy-configurable, but requires a managed transition and new operational tools for system operators and market participants.
comment: PhD thesis
Simultaneous and Proportional Finger Motion Decoding Using Spatial Features from High-Density Surface Electromyography
Restoring natural and intuitive hand function requires simultaneous and proportional control (SPC) of multiple degrees of freedom (DoFs). This study systematically evaluated the multichannel linear descriptors-based block field method (MLD-BFM) for continuous decoding of five finger-joint DoFs by leveraging the rich spatial information of high-density surface electromyography (HD sEMG). Twenty-one healthy participants performed dynamic sinusoidal finger movements while HD sEMG signals were recorded from the \textit{extensor digitorum communis} (EDC) and \textit{flexor digitorum superficialis} (FDS) muscles. MLD-BFM extracted region-specific spatial features, including effective field strength ($Σ$), field-strength variation rate ($Φ$), and spatial complexity ($Ω$). Model performance was optimized (block size: $2 \times 2$; window: 0.15 s) and compared with conventional time-domain features and dimensionality reduction approaches when applied to multi-output regression models. MLD-BFM consistently achieved the highest $\mathrm{R}^2_{\mathrm{vw}}$ values across all models. The multilayer perceptron (MLP) combined with MLD-BFM yielded the best performance ($\mathrm{R}^2_{\mathrm{vw}} = 86.68\% \pm 0.33$). Time-domain features also showed strong predictive capability and were statistically comparable to MLD-BFM in some models, whereas dimensionality reduction techniques exhibited lower accuracy. Decoding accuracy was higher for the middle and ring fingers than for the thumb. Overall, MLD-BFM improved continuous finger movement decoding accuracy, underscoring the importance of taking advantage of the spatial richness of HD sEMG. These findings suggest that spatially structured features enhance SPC and provide practical guidance for designing robust, real-time, and responsive myoelectric interfaces.
comment: 39 pages, 13 figures, 2 tables
Safe Online Control-Informed Learning
This paper proposes a Safe Online Control-Informed Learning framework for safety-critical autonomous systems. The framework unifies optimal control, parameter estimation, and safety constraints into an online learning process. It employs an extended Kalman filter to incrementally update system parameters in real time, enabling robust and data-efficient adaptation under uncertainty. A softplus barrier function enforces constraint satisfaction during learning and control while eliminating the dependence on high-quality initial guesses. Theoretical analysis establishes convergence and safety guarantees, and the framework's effectiveness is demonstrated on cart-pole and robot-arm systems.
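The softplus barrier mentioned above can be sketched in a few lines: it is a smooth surrogate for the exact penalty on a constraint h(x) >= 0, so optimization can be started from infeasible guesses. The scalar constraint, the sharpness beta, and the penalty weight below are illustrative assumptions, not the paper's formulation.

```python
# A softplus barrier for a constraint h(x) >= 0: smooth everywhere (so
# optimization can start from infeasible guesses) and sharpening toward the
# exact penalty as beta grows. The example constraint is illustrative.
import numpy as np

def softplus_barrier(h, beta=50.0):
    # log(1 + exp(-beta*h)) / beta  ->  ~0 for h >> 0, ~ -h for h << 0
    return np.logaddexp(0.0, -beta * h) / beta

# Example: penalize violation of a box constraint |x| <= 1 inside a cost.
def cost(x, beta=50.0, weight=50.0):
    task = (x - 1.8) ** 2                      # wants x = 1.8 (infeasible)
    h = 1.0 - np.abs(x)                        # h >= 0 encodes |x| <= 1
    return task + weight * softplus_barrier(h, beta)

xs = np.linspace(-2, 3, 5001)
# The infeasible target x = 1.8 is pulled back inside the feasible set.
print("approx. minimizer:", xs[np.argmin(cost(xs))])
```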
A Convex Obstacle Avoidance Formulation
Autonomous driving requires reliable collision avoidance in dynamic environments. Nonlinear Model Predictive Controllers (NMPCs) are suitable for this task, but struggle in time-critical scenarios requiring high frequency. To meet this demand, optimization problems are often simplified via linearization, narrowing the horizon window, or reduced temporal nodes, each compromising accuracy or reliability. This work presents the first general convex obstacle avoidance formulation, enabled by a novel approach to integrating logic. This facilitates the incorporation of an obstacle avoidance formulation into convex MPC schemes, enabling a convex optimization framework with substantially improved computational efficiency relative to conventional nonconvex methods. A key property of the formulation is that obstacle avoidance remains effective even when obstacles lie outside the prediction horizon, allowing shorter horizons for real-time deployment. In scenarios where nonconvex formulations are unavoidable, the proposed method meets or exceeds the performance of representative nonconvex alternatives. The method is evaluated in autonomous vehicle applications, where system dynamics are highly nonlinear.
comment: 18 pages, 17 figures
Delay Optimization in a Simple Offloading System: Extended Version
We consider a computation offloading system where jobs are processed sequentially at a local server followed by a higher-capacity cloud server. The system offers two service modes, differing in how the processing is split between the servers. Our goal is to design an optimal policy for assigning jobs to service modes and partitioning server resources in order to minimize delay. We begin by characterizing the system's stability region and establishing design principles for service modes that maximize throughput. For any given job assignment strategy, we derive the optimal resource partitioning and present a closed-form expression for the resulting delay. Moreover, we establish that the delay-optimal assignment policy exhibits a distinct breakaway structure: at low system loads, it is optimal to route all jobs through a single service mode, whereas beyond a critical load threshold, jobs must be assigned across both modes. We conclude by validating these theoretical insights through numerical evaluation.
Spacecraft Angular Rate Estimation via Event-Based Camera Sensing
This paper presents a method for determining spacecraft angular rates using event-based camera sensing. This is achieved by analyzing the temporal distribution of brightness events triggered by the apparent motion of stars. The location and polarity of the events are used to infer the apparent motion field of the stars, which is, in turn, employed to estimate the observer angular velocity in the camera frame. This can be converted to the spacecraft angular rates provided an attitude reference. The method is validated through numerical simulation for a synthetic dataset of event streams generated on random spacecraft pointing and rates conditions. The accuracy of the method is assessed, demonstrating its potential to complement or replace conventional rate sensors in spacecraft systems using event camera sensing.
A $μ$-Analysis and Synthesis Framework for Partial Integral Equations using IQCs
We develop a $μ$-analysis and synthesis framework for infinite-dimensional systems that leverages Integral Quadratic Constraints (IQCs) to compute an upper bound on the structured singular value. The methodology formulates robust stability and performance conditions jointly as Linear Partial Integral Inequalities within the Partial Integral Equation framework, establishing connections between IQC multipliers and $μ$-theory. An implementation in PIETOOLS provides computational tools that are practically applicable to spatially distributed infinite-dimensional systems. Illustrative examples based on Partial and Delay Differential Equations validate the effectiveness of the framework, showing a significant reduction in conservatism compared to unstructured methods and providing systematic tools for stability-performance trade-off analysis.
Hybrid Control as a Proxy for Detection and Mitigation of Sensor Attacks in Cooperative Driving
We propose a real-time hybrid controller scheme to detect and mitigate False-Data Injection (FDI) attacks on Cooperative Adaptive Cruise Control (CACC). Our method uses sensor redundancy to create equivalent controller realizations, each driven by distinct sensor subsets but producing identical control inputs when no attack occurs. By comparing control signals and measurements via majority voting, the scheme identifies compromised sensors in real-time and switches to a healthy controller. The hybrid controller uses attack-dependent flow and jump sets, and resets compromised controllers' states. Simulation results demonstrate the effectiveness of this approach.
comment: 6 pages, submitted to the IFAC World Congress 2026
GAPses: Versatile smart glasses for comfortable and fully-dry acquisition and parallel ultra-low-power processing of EEG and EOG
Recent advancements in head-mounted wearable technology are revolutionizing the field of biopotential measurement, but the integration of these technologies into practical, user-friendly devices remains challenging due to issues with design intrusiveness, comfort, and data privacy. To address these challenges, this paper presents GAPses, a novel smart glasses platform designed for unobtrusive, comfortable, and secure acquisition and processing of electroencephalography (EEG) and electrooculography (EOG) signals. We introduce a direct electrode-electronics interface with custom dry soft electrodes to enhance comfort for long wear. An integrated parallel ultra-low-power RISC-V processor (GAP9, Greenwaves Technologies) processes data at the edge, thereby eliminating the need for continuous data streaming through a wireless link, enhancing privacy, and increasing system reliability in adverse channel conditions. We demonstrate the broad applicability of the designed prototype through validation in a number of EEG-based interaction tasks, including alpha waves, steady-state visual evoked potential analysis, and motor movement classification. Furthermore, we demonstrate an EEG-based biometric subject recognition task, where we reach a sensitivity and specificity of 98.87% and 99.86% respectively, with only 8 EEG channels and an energy consumption per inference on the edge as low as 121 $μJ$. Moreover, in an EOG-based eye movement classification task, we reach an accuracy of 96.68% on 11 classes, resulting in an information transfer rate of 94.78 bit/min, which can be further increased to 161.43 bit/min by reducing the accuracy to 81.43%. The deployed implementation has an energy consumption of 40 $μJ$ per inference and a total system power of only 12.4 mW, of which only 1.61% is used for classification, allowing for continuous operation of more than 22 h with a small 75 mAh battery.
comment: 11 pages, 6 figures
An Input-Output Data-Driven Dissipativity Approach for Compositional Stability Certification of Interconnected LTI MIMO Systems
We propose an input-output data-driven framework for certifying the stability of interconnected multiple-input-multiple-output linear time-invariant discrete-time systems via QSR-dissipativity. That is, by using measured input-output trajectories of each subsystem, we verify dissipative properties and extract local passivity indices without requiring explicit model identification. These passivity indices are then used to derive conditions under which the equilibrium of the interconnected system is stable. In particular, the framework identifies how the lack of passivity in some subsystems can be compensated by surpluses in others. The proposed approach enables a compositional stability analysis by combining subsystem-level conditions into a criterion valid for the overall interconnected system. We illustrate, via a numerical case study, how to compute channel-wise passivity indices and infer stability guarantees directly from data with the proposed method.
BioGAP-Ultra: A Modular Edge-AI Platform for Wearable Multimodal Biosignal Acquisition and Processing
The growing demand for continuous physiological monitoring and human-machine interaction in real-world settings calls for wearable platforms that are flexible, low-power, and capable of on-device intelligence. This work presents BioGAP-Ultra, an advanced multimodal biosensing platform that supports synchronized acquisition of diverse electrophysiological and hemodynamic signals such as EEG, EMG, ECG, and PPG while enabling embedded AI processing at state-of-the-art energy efficiency. BioGAP-Ultra is a major extension of our previous BioGAP design aimed at meeting the rapidly growing requirements of wearable biosensing applications. It features (i) increased on-device storage (x2 SRAM, x4 FLASH), (ii) improved wireless connectivity (supporting up to 1.4 Mbit/s bandwidth, x4 higher than BioGAP), (iii) enhanced number of signal modalities (from 3 to 5) and analog input channels (x2). Further, it is accompanied by a real-time visualization and analysis software suite that supports the hardware design, providing access to raw data and real-time configurability on a mobile phone. Finally, we demonstrate the system's versatility through integration into various wearable form factors: an EEG-PPG headband consuming 32.8 mW, an EMG sleeve at 26.7 mW, and an ECG-PPG chestband requiring only 9.3 mW for continuous acquisition and streaming, tailored for diverse biosignal applications. To showcase its edge-AI capabilities, we further deploy two representative on-device applications: (1) ECG-PPG-based PAT estimation at 8.6 mW, and (2) EMG-ACC-based classification of reach-and-grasp motion phases, achieving 79.9 % $\pm$ 5.7 % accuracy at 23.6 mW. All hardware and software design files are also released open-source with a permissive license.
comment: 17 pages, 15 figures
Physics-informed Modularized Neural Network for Advanced Building Control by Deep Reinforcement Learning
Physics-informed machine learning (PIML) provides a promising solution for building energy modeling and can serve as a virtual environment to enable reinforcement learning (RL) agents to interact and learn. However, challenges remain in efficiently integrating physics priors, evaluating the effectiveness of physics constraints, balancing model accuracy and physics consistency, and enabling real-world implementation. To address these gaps, this study introduces a Physics-Informed Modularized Neural Network (PI-ModNN), which incorporates physics priors through a physics-informed model structure, loss functions, and hard constraints. A new evaluation metric called "temperature response violation" is developed to quantify the physical consistency of data-driven building dynamic models under varying control inputs and training data sizes. Additionally, a physics prior evaluation framework based on rule importance is proposed to assess the contribution of each individual physics prior, offering guidance on selecting appropriate PIML techniques. Results indicate that incorporating physical priors does not always improve model performance; inappropriate priors may decrease model accuracy and consistency. However, hard constraints are effective in enforcing model consistency. Furthermore, we present a general workflow for developing control-oriented PIML models and integrating them with deep reinforcement learning (DRL). Following this framework, a case study implementing DRL in an office space over three months demonstrates potential energy savings of 31.4%. Finally, we provide a general guideline for integrating data-driven models with advanced building control through a four-step evaluation framework, paving the way for reliable and scalable deployment of advanced building controls.
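The "temperature response violation" idea can be sketched as a simple directional consistency check on a trained surrogate, as below; the hypothetical predict_next_temp signature, the sampling ranges, and the perturbation size are assumptions made for illustration, not the metric's exact definition in the paper.

```python
# A "temperature response violation" style check for a data-driven building
# model: perturb the control input upward and count how often the predicted
# zone temperature moves in the physically wrong direction.
# `predict_next_temp` is a hypothetical stand-in for the learned model.
import numpy as np

def response_violation_rate(predict_next_temp, n_samples=2000, du=0.1, seed=0):
    rng = np.random.default_rng(seed)
    zone_temp = rng.uniform(18.0, 26.0, n_samples)     # [deg C]
    outdoor = rng.uniform(-5.0, 35.0, n_samples)       # [deg C]
    heating = rng.uniform(0.0, 0.9, n_samples)         # normalized heating rate
    base = predict_next_temp(zone_temp, outdoor, heating)
    more = predict_next_temp(zone_temp, outdoor, heating + du)
    # More heating should never lower the predicted next temperature.
    return float(np.mean(more < base))

# Example with a toy (physically consistent) model: violation rate ~ 0.
toy = lambda T, To, u: T + 0.1 * (To - T) + 2.0 * u
print(response_violation_rate(toy))
```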
Estimating Hormone Concentrations in the Pituitary-Thyroid Feedback Loop from Irregularly Sampled Measurements
Model-based control techniques have recently been investigated for the recommendation of medication dosages to address thyroid diseases. These techniques often rely on knowledge of internal hormone concentrations that cannot be measured from blood samples. Moreover, the measurable concentrations are typically only obtainable at irregular sampling times. In this work, we empirically verify a notion of sample-based detectability that accounts for irregular sampling of the measurable concentrations on two pituitary-thyroid loop models representing patients with hypo- and hyperthyroidism, respectively, and include the internal concentrations as states. We then implement sample-based moving horizon estimation for the models, and test its performance on virtual patients across a range of sampling schemes. Our study shows robust stability of the estimator across all scenarios, and that more frequent sampling leads to less estimation error in the presence of model uncertainty and misreported dosages.
comment: 8 pages; This work has been submitted to IFAC for possible publication
Off-grid solar energy storage system with lithium iron phosphate (LFP) batteries in high mountains: a case report of Tianchi Lodge in Taiwan
Mountain huts are buildings located at high altitude, providing hikers with shelter and a place to stay. Energy supply for mountain huts remains an open issue, and using renewable energies could be an appropriate solution. Tianchi Lodge, a famous mountain hut in Taiwan, has operated an off-grid solar energy storage system with lithium iron phosphate (LFP) batteries since 2020. In this case report, the energy architecture, detailed descriptions, and historical status of the system are provided.
comment: 6 pages, 3 figures, 1 table
On Structural Properties of Risk-Averse Optimal Stopping Problems
We establish structural properties of optimal stopping problems under time-consistent dynamic (coherent) risk measures, focusing on value function monotonicity and the existence of control limit (threshold) optimal policies. While such results are well developed for risk-neutral (expected-value) models, they remain underexplored in risk-averse settings. Coherent risk measures typically lack the tower property and are subadditive rather than additive, complicating structural analysis. We show that value function monotonicity mirrors the risk-neutral case. Moreover, if the risk envelope associated with each coherent risk measure admits a minimal element, the risk-averse optimal stopping problem reduces to an equivalent risk-neutral formulation. We also develop a general procedure for identifying control limit optimal policies and use it to derive practical, verifiable conditions on the risk measures and MDP structure that guarantee their existence. We illustrate the theory and verify these conditions through optimal stopping problems arising in operations, marketing, and finance.
Model Predictive Control with High-Probability Safety Guarantee for Nonlinear Stochastic Systems
We present a model predictive control (MPC) framework for nonlinear stochastic systems that ensures safety guarantee with high probability. Unlike most existing stochastic MPC schemes, our method adopts a set-erosion that converts the probabilistic safety constraint into a tractable deterministic safety constraint on a smaller safe set over deterministic dynamics. As a result, our method is compatible with any off-the-shelf deterministic MPC algorithm. The key to the effectiveness of our method is a tight bound on the stochastic fluctuation of a stochastic trajectory around its nominal version. Our method is scalable and can guarantee safety with high probability level (e.g., 99.99%), making it particularly suitable for safety-critical applications involving complex nonlinear dynamics. Rigorous analysis is conducted to establish a theoretical safety guarantee, and numerical experiments are provided to validate the effectiveness of the proposed MPC method.
Can Carbon-Aware Electric Load Shifting Reduce Emissions? An Equilibrium-Based Analysis
An increasing number of electric loads, such as hydrogen producers or data centers, can be characterized as carbon-sensitive, meaning that they are willing to adapt the timing and/or location of their electricity usage in order to minimize carbon footprints. However, the emission reduction efforts of these carbon-sensitive loads rely on carbon intensity information such as average carbon emissions, and it is unclear whether load shifting based on these signals effectively reduces carbon emissions. To address this open question, we design a carbon-aware equilibrium model, which expands the commonly used equilibrium model for standard (carbon-agnostic) electricity market clearing to include carbon-sensitive consumers that adapt their consumption based on average carbon emission signals and carbon costs. This analysis represents an idealized situation for carbon-sensitive consumers, where their carbon preferences are reflected directly in the market clearing, and contrasts with current practice, where carbon emission signals only become known to consumers a posteriori (i.e., after the market has already been cleared). Furthermore, we extend our model to consider temporal load shifting and time-varying maximum renewable generations. We employ illustrative three-bus examples and numerical simulations on the IEEE RTS-GMLC system to reveal the limitations of the widely adopted average carbon emission signal for guiding carbon emission reduction. Our model offers a novel perspective for evaluating the effectiveness of different carbon signals and contributes to new carbon signal design.
comment: 15 pages, 8 figures, accepted by ACM e-Energy 2026. arXiv admin note: text overlap with arXiv:2501.09853
Optimal Satellite Maneuvers for Spaceborne Jamming Attacks
Satellites are becoming exceedingly critical for communication, making them prime targets for cyber-physical attacks. We consider a rogue satellite in low Earth orbit that jams the uplink communication between another satellite and a ground station. To achieve maximal interference with minimal fuel consumption, the jammer carefully maneuvers itself relative to the target satellite's antenna. We cast this maneuvering objective as a two-stage optimal control problem, involving i) repositioning to an efficient jamming position before uplink communication commences; and ii) maintaining an efficient jamming position after communication has started. We obtain the optimal maneuvering trajectories for the jammer and perform simulations to show how they enable the disruption of uplink communication with reasonable fuel consumption.
Adversarial Pursuits in Cislunar Space
Cislunar space is becoming a critical domain for future lunar and interplanetary missions, yet its remoteness, sparse infrastructure, and unstable dynamics create single points of failure. Adversaries in cislunar orbits can exploit these vulnerabilities to pursue and jam co-located communication relays, potentially severing communications between lunar missions and the Earth. We study a pursuit-evasion scenario between two spacecraft in a cislunar orbit, where the evader must avoid a pursuer-jammer while remaining close to its nominal trajectory. We model the evader-pursuer interaction as a zero-sum adversarial differential game cast in the circular restricted three-body problem. This formulation incorporates critical aspects of cislunar orbital dynamics, including autonomous adjustment of the reference orbit phasing to enable aggressive evading maneuvers, and shaping of the evader's cost with the orbit's stable and unstable manifolds. We solve the resulting nonlinear game locally using a continuous-time differential dynamic programming variant, which iteratively applies linear-quadratic approximations to the Hamilton-Jacobi-Isaacs equation. We simulate the evader's behavior against both a worst-case and a linear-quadratic pursuer. Our results pave the way for securing future missions in cislunar space against emerging cyber threats.
comment: 17 pages, 9 figures
Robotics
SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition
Automated road sign recognition is a critical task for intelligent transportation systems, but traditional deep learning methods struggle with the sheer number of sign classes and the impracticality of creating exhaustive labeled datasets. This paper introduces a novel zero-shot recognition framework that adapts the Retrieval-Augmented Generation (RAG) paradigm to address this challenge. Our method first uses a Vision Language Model (VLM) to generate a textual description of a sign from an input image. This description is used to retrieve a small set of the most relevant sign candidates from a vector database of reference designs. Subsequently, a Large Language Model (LLM) reasons over the retrieved candidates to make a final, fine-grained recognition. We validate this approach on a comprehensive set of 303 regulatory signs from the Ohio MUTCD. Experimental results demonstrate the framework's effectiveness, achieving 95.58% accuracy on ideal reference images and 82.45% on challenging real-world road data. This work demonstrates the viability of RAG-based architectures for creating scalable and accurate systems for road sign recognition without task-specific training.
comment: Submitted to IV 2026
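A minimal sketch of the retrieval step in such a RAG pipeline is shown below. The hash-based text embedding and the llm_pick stub are illustrative stand-ins for the VLM caption, the vector database, and the LLM reasoning step; the four MUTCD reference descriptions are simplified examples.

```python
# Retrieval-augmented recognition sketch: a sign description is embedded,
# the top-k most similar reference designs are retrieved by cosine
# similarity, and the short candidate list is handed to an LLM stub for the
# final choice. Embedding and llm_pick are placeholders, not real APIs.
import numpy as np

def embed(text, dim=256):
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

reference_signs = {
    "R1-1": "red octagon with white STOP legend",
    "R1-2": "red and white downward triangle, YIELD legend",
    "R2-1": "white rectangle, black SPEED LIMIT legend with numeral",
    "R5-1": "white circle on red square, DO NOT ENTER legend",
}
ref_matrix = np.stack([embed(d) for d in reference_signs.values()])
ref_ids = list(reference_signs)

def retrieve(description, k=2):
    scores = ref_matrix @ embed(description)
    top = np.argsort(scores)[::-1][:k]
    return [(ref_ids[i], reference_signs[ref_ids[i]]) for i in top]

def llm_pick(description, candidates):
    # Stand-in for the LLM reasoning step; here: return the top candidate.
    return candidates[0][0]

desc = "octagonal red sign with the word STOP in white"
print(llm_pick(desc, retrieve(desc)))   # expected: "R1-1"
```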
Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection
In automotive sensor fusion systems, smart sensors and Vehicle-to-Everything (V2X) modules are commonly utilized. Sensor data from these systems are typically available only as processed object lists rather than raw sensor data from traditional sensors. Instead of processing other raw data separately and then fusing them at the object level, we propose an end-to-end cross-level fusion concept with a Transformer, which integrates highly abstract object list information with raw camera images for 3D object detection. Object lists are fed into the Transformer as denoising queries and propagated together with learnable queries through the subsequent feature aggregation process. Additionally, a deformable Gaussian mask, derived from the position and size priors in the object lists, is explicitly integrated into the Transformer decoder. This directs attention toward the target area of interest and accelerates model training convergence. Furthermore, as there is no public dataset containing object lists as a standalone modality, we propose an approach to generate pseudo object lists from ground-truth bounding boxes by simulating state noise and false positives and negatives. As the first work to conduct cross-level fusion, our approach shows substantial performance improvements over the vision-based baseline on the nuScenes dataset. It also generalizes across diverse noise levels of simulated object lists and real detectors.
comment: 6 pages, 3 figures, accepted at IV2025
MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems
Modern engineering systems, such as autonomous vehicles, flexible robotics, and intelligent aerospace platforms, require controllers that are robust to uncertainties, adaptive to environmental changes, and safety-aware under real-time constraints. Reinforcement learning (RL) offers powerful data-driven adaptability for systems with nonlinear dynamics that interact with uncertain environments. RL, however, lacks built-in mechanisms for dynamic constraint satisfaction during exploration. Model predictive control (MPC) offers structured constraint handling and robustness, but its reliance on accurate models and computationally demanding online optimization may pose significant challenges. This paper proposes an integrated MPC-RL framework that combines the stability and safety guarantees of MPC with the adaptability of RL. During training, MPC defines safe control bounds that guide the RL component and that enable constraint-aware policy learning. At deployment, the learned policy operates in real time with a lightweight safety filter based on Lipschitz continuity to ensure constraint satisfaction without heavy online optimization. The approach, which is validated on a nonlinear aeroelastic wing system, demonstrates improved disturbance rejection, reduced actuator effort, and robust performance under turbulence. The architecture generalizes to other domains with structured nonlinearities and bounded disturbances, offering a scalable solution for safe artificial-intelligence-driven control in engineering applications.
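A minimal sketch of the deployment-time Lipschitz filter is given below, assuming the constraint function, its Lipschitz constant with respect to the action, and MPC-derived safe action bounds are available; the scalar toy constraint and all names are illustrative, not the paper's aeroelastic wing model.

```python
# Lipschitz-based safety filter at deployment time: the learned policy
# proposes an action, and the filter falls back to the MPC-derived safe
# action bounds whenever a Lipschitz bound on the constraint function cannot
# certify a nonnegative margin. The toy constraint is illustrative.
import numpy as np

def lipschitz_safety_filter(u_rl, x, h, L_h, u_safe_lo, u_safe_hi, u_ref):
    """
    u_rl       : action proposed by the RL policy
    h(x, u)    : constraint function, safe iff h >= 0
    L_h        : Lipschitz constant of h w.r.t. the action
    [u_safe_lo, u_safe_hi] : MPC-derived safe action bounds around u_ref
    """
    margin = h(x, u_ref) - L_h * abs(u_rl - u_ref)   # worst-case value of h at u_rl
    if margin >= 0.0:
        return u_rl                                    # certified safe: pass through
    return float(np.clip(u_rl, u_safe_lo, u_safe_hi)) # otherwise fall back to MPC bounds

# Toy example: keep a scalar output y = x + 0.5 u below 1.0.
h = lambda x, u: 1.0 - (x + 0.5 * u)
print(lipschitz_safety_filter(u_rl=2.0, x=0.2, h=h, L_h=0.5,
                              u_safe_lo=-1.0, u_safe_hi=1.0, u_ref=0.0))
```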
SAGA: Open-World Mobile Manipulation via Structured Affordance Grounding
We present SAGA, a versatile and adaptive framework for visuomotor control that can generalize across various environments, task objectives, and user specifications. To efficiently learn such capability, our key idea is to disentangle high-level semantic intent from low-level visuomotor control by explicitly grounding task objectives in the observed environment. Using an affordance-based task representation, we express diverse and complex behaviors in a unified, structured form. By leveraging multimodal foundation models, SAGA grounds the proposed task representation to the robot's visual observation as 3D affordance heatmaps, highlighting task-relevant entities while abstracting away spurious appearance variations that would hinder generalization. These grounded affordances enable us to effectively train a conditional policy on multi-task demonstration data for whole-body control. In a unified framework, SAGA can solve tasks specified in different forms, including language instructions, selected points, and example demonstrations, enabling both zero-shot execution and few-shot adaptation. We instantiate SAGA on a quadrupedal manipulator and conduct extensive experiments across eleven real-world tasks. SAGA consistently outperforms end-to-end and modular baselines by substantial margins. Together, these results demonstrate that structured affordance grounding offers a scalable and effective pathway toward generalist mobile manipulation.
comment: 9 pages, 7 figures
VLG-Loc: Vision-Language Global Localization from Labeled Footprint Maps
This paper presents Vision-Language Global Localization (VLG-Loc), a novel global localization method that uses human-readable labeled footprint maps containing only names and areas of distinctive visual landmarks in an environment. While humans naturally localize themselves using such maps, translating this capability to robotic systems remains highly challenging due to the difficulty of establishing correspondences between observed landmarks and those in the map without geometric and appearance details. To address this challenge, VLG-Loc leverages a vision-language model (VLM) to search the robot's multi-directional image observations for the landmarks noted in the map. The method then identifies robot poses within a Monte Carlo localization framework, where the found landmarks are used to evaluate the likelihood of each pose hypothesis. Experimental validation in simulated and real-world retail environments demonstrates superior robustness compared to existing scan-based methods, particularly under environmental changes. Further improvements are achieved through the probabilistic fusion of visual and scan-based localization.
High Order Control Lyapunov Function - Control Barrier Function - Quadratic Programming Based Autonomous Driving Controller for Bicyclist Safety
Ensuring the safety of Vulnerable Road Users (VRUs) is a critical challenge in the development of advanced autonomous driving systems in smart cities. Among vulnerable road users, bicyclists present unique characteristics that make their safety both critical and manageable. Vehicles often travel at significantly higher relative speeds when interacting with bicyclists than with pedestrians, which makes collision avoidance system design for bicyclist safety more challenging. Yet, bicyclist movements are generally more predictable and governed by clearer traffic rules than the sudden and sometimes erratic motion of pedestrians, offering opportunities for model-based control strategies. To address bicyclist safety in complex traffic environments, this study proposes and develops a High-Order Control Lyapunov Function-High-Order Control Barrier Function-Quadratic Programming (HOCLF-HOCBF-QP) control framework. Within this framework, CLF constraints guarantee system stability so that the vehicle can track its reference trajectory, whereas CBF constraints ensure system safety by keeping the vehicle out of potential collision regions around surrounding obstacles. By solving a QP, an optimal control command that simultaneously satisfies the stability and safety requirements is then computed. Three key bicyclist crash scenarios recorded in the Fatality Analysis Reporting System (FARS) are recreated and used to comprehensively evaluate the proposed autonomous driving bicyclist safety control strategy in a simulation study. Simulation results demonstrate that the HOCLF-HOCBF-QP controller can help the vehicle perform robust, collision-free maneuvers, highlighting its potential for improving bicyclist safety in complex traffic environments.
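To make the QP structure concrete, here is a minimal CLF-CBF quadratic program for a planar double integrator avoiding a single circular region, written with cvxpy; the point-mass dynamics, the class-K gains, and the cost weights are illustrative assumptions and are far simpler than the vehicle model and crash scenarios used in the paper.

```python
# A minimal CLF-HOCBF QP for a planar double integrator (p_dot = v, v_dot = u):
# a relative-degree-two barrier keeps the point mass outside a circle around
# a "bicyclist", while a Lyapunov constraint with slack pulls it to a goal.
import cvxpy as cp
import numpy as np

def clf_hocbf_qp(p, v, p_goal, p_obs, d_min=2.0, a1=2.0, a2=2.0, lam=1.0):
    u = cp.Variable(2)                 # acceleration input
    delta = cp.Variable(nonneg=True)   # CLF slack (safety constraint stays hard)

    # High-order CBF for h = ||p - p_obs||^2 - d_min^2 (relative degree 2):
    # h_ddot + (a1 + a2) h_dot + a1 a2 h >= 0.
    e = p - p_obs
    h = e @ e - d_min ** 2
    h_dot = 2 * e @ v
    hocbf = 2 * v @ v + 2 * e @ u + (a1 + a2) * h_dot + a1 * a2 * h

    # CLF for tracking: V = ||p - p_goal||^2 + ||v||^2, enforce V_dot + lam V <= delta.
    g = p - p_goal
    V = g @ g + v @ v
    V_dot = 2 * g @ v + 2 * v @ u

    prob = cp.Problem(cp.Minimize(cp.sum_squares(u) + 100 * cp.square(delta)),
                      [hocbf >= 0, V_dot + lam * V <= delta, cp.norm(u, "inf") <= 5.0])
    prob.solve()
    return u.value

u = clf_hocbf_qp(p=np.array([-5.0, 0.3]), v=np.array([2.0, 0.0]),
                 p_goal=np.array([5.0, 0.0]), p_obs=np.array([0.0, 0.0]))
print(np.round(u, 3))
```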
Making Robots Play by the Rules: The ROS 2 CLIPS-Executive
CLIPS is a rule-based programming language for building knowledge-driven applications, well suited to the complex task of coordinating autonomous robots. Inspired by the CLIPS-Executive originally developed for the lesser-known Fawkes robotics framework, we present an integration of CLIPS into the ROS 2 ecosystem. Additionally, we demonstrate the flexibility of CLIPS by describing the integration of a PDDL-based planning framework.
HMPCC: Human-Aware Model Predictive Coverage Control
We address the problem of coordinating a team of robots to cover an unknown environment while ensuring safe operation and avoiding collisions with non-cooperative agents. Traditional coverage strategies often rely on simplified assumptions, such as known or convex environments and static density functions, and struggle to adapt to real-world scenarios, especially when humans are involved. In this work, we propose a human-aware coverage framework based on Model Predictive Control (MPC), namely HMPCC, where human motion predictions are integrated into the planning process. By anticipating human trajectories within the MPC horizon, robots can proactively coordinate their actions and adapt to dynamic conditions. The environment is modeled as a Gaussian Mixture Model (GMM), representing regions of interest. Team members operate in a fully decentralized manner, without relying on explicit communication, an essential feature in hostile or communication-limited scenarios. Our results show that human trajectory forecasting enables more efficient and adaptive coverage, improving coordination between human and robotic agents.
Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller
Parameter tuning in real-world experiments is constrained by the limited evaluation budget available on hardware. The path-following controller studied in this paper reflects a typical situation in nonlinear geometric control, where multiple gains influence the dynamics through coupled nonlinear terms. Such interdependence makes manual tuning inefficient and unlikely to yield satisfactory performance within a practical number of trials. To address this challenge, we propose a Bayesian optimization (BO) framework that treats the closed-loop system as a black box and selects controller gains using a Gaussian-process surrogate. BO offers model-free exploration, quantified uncertainty, and data-efficient search, making it well suited for tuning tasks where each evaluation is costly. The framework is implemented on Honda's AI-Formula three-wheeled robot and assessed through repeated full-lap experiments on a fixed test track. The results show that BO improves controller performance within 32 trials, including 15 warm-start initial evaluations, indicating that it can efficiently locate high-performing regions of the parameter space under real-world conditions. These findings demonstrate that BO provides a practical, reliable, and data-efficient tuning approach for nonlinear path-following controllers on real robotic platforms.
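A minimal version of the described tuning loop, assuming a synthetic stand-in for the full-lap cost and a random-candidate acquisition maximizer, might look as follows. Only the 32-trial budget and the 15 warm-start evaluations are taken from the abstract; the kernel, gain bounds, and cost function are placeholders.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Hypothetical stand-in for one full-lap experiment: returns a lap cost (lower
# is better) for a vector of controller gains. On hardware this would be a lap.
def run_lap(gains):
    return np.sum((gains - np.array([1.2, 0.4, 2.0])) ** 2) + 0.01 * np.random.randn()

bounds = np.array([[0.1, 3.0], [0.1, 3.0], [0.1, 3.0]])   # assumed gain ranges
rng = np.random.default_rng(0)

# Warm-start evaluations (the abstract reports 15 of these before BO proper).
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(15, 3))
y = np.array([run_lap(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for trial in range(17):                       # remaining part of the 32-trial budget
    gp.fit(X, y)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, 3))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    # Expected-improvement acquisition (minimization form).
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, run_lap(x_next))

print("best gains:", X[np.argmin(y)], "best cost:", y.min())
```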
Optimized Conflict Management for Urban Air Mobility Using Swarm UAV Networks
Urban Air Mobility (UAM) poses unprecedented traffic coordination challenges, especially with increasing UAV densities in dense urban corridors. This paper introduces a mathematical model and control algorithm that optimize an Edge AI-driven decentralized swarm architecture for intelligent conflict resolution, enabling real-time decision-making with low latency. Using lightweight neural networks, the system leverages edge nodes to perform distributed conflict detection and resolution. A simulation platform was developed to evaluate the scheme under various UAV densities. Results indicate that conflict resolution time is reduced by up to a factor of 3.8 and that accuracy is improved compared to traditional centralized control models. The proposed architecture is highly promising for scalable, efficient, and safe aerial traffic management in future UAM systems.
comment: Preprint. Under review for conference submission
D3D-VLP: Dynamic 3D Vision-Language-Planning Model for Embodied Grounding and Navigation
Embodied agents face a critical dilemma: end-to-end models lack interpretability and explicit 3D reasoning, while modular systems ignore cross-component interdependencies and synergies. To bridge this gap, we propose the Dynamic 3D Vision-Language-Planning Model (D3D-VLP). Our model introduces two key innovations: 1) a Dynamic 3D Chain-of-Thought (3D CoT) that unifies planning, grounding, navigation, and question answering within a single 3D-VLM and CoT pipeline; 2) a Synergistic Learning from Fragmented Supervision (SLFS) strategy, which uses a masked autoregressive loss to learn from massive, partially annotated hybrid data. This allows different CoT components to mutually reinforce and implicitly supervise each other. To support this strategy, we construct a large-scale dataset with 10M hybrid samples from 5K real scans and 20K synthetic scenes, compatible with online learning methods such as RL and DAgger. Our D3D-VLP achieves state-of-the-art results on multiple benchmarks, including Vision-and-Language Navigation (R2R-CE, REVERIE-CE, NavRAG-CE), Object-goal Navigation (HM3D-OVON), and Task-oriented Sequential Grounding and Navigation (SG3D). Real-world mobile manipulation experiments further validate its effectiveness.
LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba
We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.
comment: 14 pages. This paper has been published in Advanced Engineering Informatics. Please cite the journal version: DOI: 10.1016/j.aei.2025.104230
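The token construction described in the abstract above (an MLP embedding for proprioceptive states plus a strided-convolution patchifier for depth images) can be sketched in a few lines of PyTorch. All dimensions are assumed, and the Mamba selective-scan layers that would consume the tokens are deliberately omitted.

```python
import torch
import torch.nn as nn

class TokenBuilder(nn.Module):
    """Builds the token sequence described in the abstract: proprioceptive
    states embedded by an MLP and depth images patchified by a light CNN.
    All sizes are illustrative assumptions; the Mamba layers that would
    consume these tokens are not included here.
    """
    def __init__(self, proprio_dim=48, d_model=128, patch=16):
        super().__init__()
        self.proprio_mlp = nn.Sequential(
            nn.Linear(proprio_dim, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model),
        )
        # A strided conv is a cheap way to patchify a depth image into tokens.
        self.patchify = nn.Conv2d(1, d_model, kernel_size=patch, stride=patch)

    def forward(self, proprio, depth):
        # proprio: (B, proprio_dim), depth: (B, 1, H, W)
        p_tok = self.proprio_mlp(proprio).unsqueeze(1)           # (B, 1, D)
        d_tok = self.patchify(depth).flatten(2).transpose(1, 2)  # (B, HW/p^2, D)
        return torch.cat([p_tok, d_tok], dim=1)                  # (B, 1+N, D)

tokens = TokenBuilder()(torch.randn(2, 48), torch.randn(2, 1, 64, 64))
print(tokens.shape)  # torch.Size([2, 17, 128])
```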
Learning Obstacle Avoidance using Double DQN for Quadcopter Navigation
One of the challenges faced by Autonomous Aerial Vehicles is reliable navigation through urban environments. Factors such as reduced precision of the Global Positioning System (GPS), narrow spaces, and dynamically moving obstacles make the path planning of an aerial robot a complicated task. One of the skills the agent needs to navigate such an environment effectively is the ability to avoid collisions using information from onboard depth sensors. In this paper, we use reinforcement learning to train a virtual quadcopter robot agent equipped with a depth camera to navigate through a simulated urban environment.
comment: Fixed typo in second author's name throughout the paper
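For reference, the Double DQN update named in the title decouples action selection (done by the online network) from action evaluation (done by the target network), which mitigates the overestimation bias of vanilla DQN. A minimal sketch of the target and loss computation is shown below; the network architecture, replay-buffer layout, and discount factor are assumptions.

```python
import torch

def double_dqn_loss(online_net, target_net, batch, gamma=0.99):
    """Double DQN target and loss. `batch` fields are assumed tensors from a
    replay buffer: obs (B, ...), actions (B,) long, rewards (B,),
    next_obs (B, ...), done (B,) float 0/1.
    """
    obs, actions, rewards, next_obs, done = batch
    with torch.no_grad():
        # Action selection with the online network...
        next_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
        # ...value estimation with the target network.
        next_q = target_net(next_obs).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - done) * next_q
    q = online_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    return torch.nn.functional.smooth_l1_loss(q, targets)
```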
Human-Like Robot Impedance Regulation Skill Learning from Human-Human Demonstrations
Humans are experts in physical collaboration, leveraging cognitive abilities such as perception, reasoning, and decision-making to regulate compliance behaviors based on their partners' states and task requirements. Equipping robots with similar cognition-inspired collaboration skills can significantly enhance the efficiency and adaptability of human-robot collaboration (HRC). This paper introduces an innovative Human-Inspired Impedance Regulation Skill Learning framework (HIImpRSL) for robotic systems to achieve leader-follower and mutual adaptation in multiple physical collaborative tasks. The proposed framework enables the robot to adapt its compliance based on human states and reference trajectories derived from human-human demonstrations. By integrating electromyography (EMG) signals and motion data, we extract endpoint impedance profiles and reference trajectories to construct a joint representation via imitation learning. An LSTM-based module then learns task-oriented impedance regulation policies, which are implemented through a whole-body impedance controller for online impedance adaptation. Experimental validation was conducted on collaborative transportation, interactive Tai Chi pushing-hands, and collaborative sawing tasks with multiple human subjects, demonstrating that our framework achieves human-like collaboration skills and superior performance, in terms of interaction forces, compared with four related methods.
comment: 13 pages, 9 figures
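The endpoint impedance profiles mentioned above can be grounded in the standard Cartesian impedance law, in which a stiffness matrix K and a damping matrix D map the tracking error to a commanded endpoint force; in the paper it is the stiffness profile that the learned policy regulates online. The stand-alone law below uses illustrative gains and is not the paper's whole-body controller.

```python
import numpy as np

def impedance_force(x, xd, x_ref, xd_ref, K, D):
    """Standard Cartesian impedance law: the commanded endpoint force pulls
    the robot toward a reference trajectory with stiffness K and damping D.
    In the paper the impedance profile is regulated online by a learned
    policy; here K and D are simply passed in.
    """
    return K @ (x_ref - x) + D @ (xd_ref - xd)

K = np.diag([400.0, 400.0, 200.0])   # N/m, illustrative stiffness
D = 2.0 * np.sqrt(K)                 # critically damped choice (diagonal case)
f = impedance_force(np.zeros(3), np.zeros(3),
                    np.array([0.05, 0.0, 0.0]), np.zeros(3), K, D)
print(f)  # roughly [20, 0, 0] N toward the reference
```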
EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI AAAI 2026
Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling inefficiency, incurring slow action generation during inference. We introduce EfficientFlow, a unified framework for efficient embodied AI with flow-based policy learning. To enhance data efficiency, we bring equivariance into flow matching. We theoretically prove that when using an isotropic Gaussian prior and an equivariant velocity prediction network, the resulting action distribution remains equivariant, leading to improved generalization and substantially reduced data demands. To accelerate sampling, we propose a novel acceleration regularization strategy. As direct computation of acceleration is intractable for marginal flow trajectories, we derive a novel surrogate loss that enables stable and scalable training using only conditional trajectories. Across a wide range of robotic manipulation benchmarks, the proposed algorithm achieves competitive or superior performance under limited data while offering dramatically faster inference. These results highlight EfficientFlow as a powerful and efficient paradigm for high-performance embodied AI.
comment: Accepted by AAAI 2026. Project Page: https://efficientflow.github.io/
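For orientation, the base conditional flow-matching objective that such policies build on trains a velocity network on straight-line probability paths between Gaussian noise and expert actions. The sketch below shows only this base loss; EfficientFlow's equivariant network design and acceleration regularizer are not reproduced, and the `velocity_net(x_t, t, obs)` interface is an assumption.

```python
import torch

def conditional_flow_matching_loss(velocity_net, obs, actions):
    """Base conditional flow-matching loss on straight-line probability paths:
    x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I) and target velocity x1 - x0.
    EfficientFlow's equivariant architecture and acceleration regularizer are
    not reproduced here; `velocity_net(x_t, t, obs)` is an assumed interface.
    """
    x1 = actions                                  # expert actions, shape (B, A)
    x0 = torch.randn_like(x1)                     # isotropic Gaussian prior sample
    t = torch.rand(x1.shape[0], 1, device=x1.device)
    x_t = (1.0 - t) * x0 + t * x1
    target_v = x1 - x0
    pred_v = velocity_net(x_t, t, obs)
    return ((pred_v - target_v) ** 2).mean()
```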
Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics
Learning robust and generalizable world models is crucial for enabling efficient and scalable robotic control in real-world environments. In this work, we introduce a novel framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics. The proposed method employs a dual-autoregressive mechanism and self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, ensuring adaptability across diverse robotic tasks. We further propose a policy optimization framework that leverages world models for efficient training in imagined environments and seamless deployment in real-world systems. This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.
Enhancing Sample Efficiency and Exploration in Reinforcement Learning through the Integration of Diffusion Models and Proximal Policy Optimization
Proximal Policy Optimization (PPO) is widely used in continuous control due to its robustness and stable training, yet it remains sample-inefficient in tasks with expensive interactions and high-dimensional action spaces. This paper proposes PPO-DAP (PPO with Diffusion Action Prior), a strictly on-policy framework that improves exploration quality and learning efficiency without modifying the PPO objective. PPO-DAP follows a two-stage protocol. Offline, we pretrain a conditional diffusion action prior on logged trajectories to cover the action distribution supported by the behavior policy. Online, PPO updates the actor-critic only using newly collected on-policy rollouts, while the diffusion prior is adapted around the on-policy state distribution via parameter-efficient tuning (Adapter/LoRA) over a small parameter subset. For each on-policy state, the prior generates multiple action proposals and concentrates them toward high-value regions using critic-based energy reweighting and in-denoising gradient guidance. These proposals affect the actor only through a low-weight imitation loss and an optional soft KL regularizer to the prior; importantly, PPO gradients are never backpropagated through offline logs or purely synthetic trajectories. We further analyze the method from a dual-proximal perspective and derive a one-step performance lower bound. Across eight MuJoCo continuous-control tasks under a unified online budget of 1.0M environment steps, PPO-DAP consistently improves early learning efficiency (area under the learning curve over the first 40 epochs, ALC@40) and matches or exceeds the strongest on-policy baselines in final return on 6/8 tasks, with modest overhead (1.18+/-0.04x wall-clock time and 1.05+/-0.02x peak GPU memory relative to PPO).
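One way to read the critic-based energy reweighting of diffusion proposals described above is as a softmax over critic values, with the weighted proposals entering a low-weight imitation term. The sketch below is a loose illustration under that reading; the temperature, the critic interface, and the exact form of the imitation loss are assumptions, not the paper's definitions.

```python
import torch

def reweight_proposals(critic, state, proposals, temperature=1.0):
    """Energy reweighting of K action proposals from the diffusion prior.

    proposals: (K, A) candidate actions for a single state (shape (S,)).
    Returns softmax weights concentrating mass on proposals the critic scores
    highly; the critic(state, action) interface and temperature are assumed.
    """
    with torch.no_grad():
        q = critic(state.expand(proposals.shape[0], -1), proposals).squeeze(-1)
    return torch.softmax(q / temperature, dim=0)

def imitation_loss(actor_mean, proposals, weights, coeff=0.05):
    """Low-weight imitation term pulling the actor mean toward high-value
    proposals without altering the PPO objective itself (assumed form)."""
    sq = ((actor_mean.unsqueeze(0) - proposals) ** 2).sum(dim=-1)
    return coeff * (weights * sq).sum()
```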
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we note that it is a misconception to think that humans learn fast. The learning process takes time. Babies start learning to move in the restricted, fluid-filled environment of the womb. Children are often limited by their underdeveloped bodies. Even adults are not allowed to enter complex competitions right away. With robots, however, when learning from scratch, we often do not have the privilege of waiting for dozens of millions of steps. "Swaddling" regularization restrains the agent during rapid but unstable development by penalizing action strength in a specific way that does not affect the actions directly. Symphony, a Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas for training humanoid robots from scratch with Sample Efficiency, Sample Proximity, and Safety of Actions in mind. It is no secret that a continuous increase in Gaussian noise without appropriate smoothing is harmful to motors and gearboxes. Compared to stochastic algorithms, we set a limited parametric noise and promote reduced action strength, safely increasing entropy, since the actions are effectively immersed in weaker noise. When actions require more extreme values, they rise above the weak noise. Training becomes empirically much safer both for the surrounding environment and for the robot's mechanisms. We use a Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability so that the memory contains both a recent memory and a long-term memory trail. The Fading Replay Buffer allows us to use a Temporal Advantage, obtained by improving the current Critic network prediction relative to its exponential moving average. The Temporal Advantage lets us update the Actor and Critic in one pass, combine the Actor and Critic in one object, and implement their losses in one line.
comment: https://github.com/SuspensionRailway/symphony
Multiagent Systems
Beyond Task Completion: An Assessment Framework for Evaluating Agentic AI Systems
Recent advances in agentic AI have shifted the focus from standalone Large Language Models (LLMs) to integrated systems that combine LLMs with tools, memory, and other agents to perform complex tasks. These multi-agent architectures enable coordinated reasoning, planning, and execution across diverse domains, allowing agents to collaboratively automate complex workflows. Despite these advances, evaluation and assessment of LLM agents and the multi-agent systems they constitute remain a fundamental challenge. Although various approaches have been proposed in the software engineering literature for evaluating conventional software components, existing methods for AI-based systems often overlook the non-deterministic nature of models. This non-determinism introduces behavioral uncertainty during execution, yet existing evaluations rely on binary task completion metrics that fail to capture it. Evaluating agentic systems therefore requires examining additional dimensions, including an agent's ability to invoke tools, ingest and retrieve memory, collaborate with other agents, and interact effectively with its environment. We propose an end-to-end Agent Assessment Framework with four evaluation pillars encompassing LLMs, Memory, Tools, and Environment. We validate the framework on a representative Autonomous CloudOps use case, where experiments reveal behavioral deviations overlooked by conventional metrics, demonstrating its effectiveness in capturing runtime uncertainties.
Value-Aware Multiagent Systems
This paper introduces the concept of value awareness in AI, which goes beyond the traditional value-alignment problem. Our definition of value awareness presents us with a concise and simplified roadmap for engineering value-aware AI. The roadmap is structured around three core pillars: (1) learning and representing human values using formal semantics, (2) ensuring the value alignment of both individual agents and multiagent systems, and (3) providing value-based explainability on behaviour. The paper presents a selection of our ongoing work on some of these topics, along with applications to real-life domains.
EcoNet: Multiagent Planning and Control Of Household Energy Resources Using Active Inference
Advances in automated systems afford new opportunities for intelligent management of energy at household, local area, and utility scales. Home Energy Management Systems (HEMS) can play a role by optimizing the schedule and use of household energy devices and resources. One challenge is that the goals of a household can be complex and conflicting. For example, a household might wish to reduce energy costs and grid-associated greenhouse gas emissions, yet keep room temperatures comfortable. Another challenge is that an intelligent HEMS agent must make decisions under uncertainty. An agent must plan actions into the future, but weather and solar generation forecasts, for example, provide inherently uncertain estimates of future conditions. This paper introduces EcoNet, a Bayesian approach to household and neighborhood energy management that is based on active inference. The aim is to improve energy management and coordination, while accommodating uncertainties and taking into account potentially conditional and conflicting goals and preferences. Simulation results are presented and discussed.
comment: 17 pages, 9 figures
Social welfare optimisation in well-mixed and structured populations
Research on promoting cooperation among autonomous, self-regarding agents has often focused on the bi-objective optimisation problem: minimising the total incentive cost while maximising the frequency of cooperation. However, the optimal value of social welfare under such constraints remains largely unexplored. In this work, we hypothesise that achieving maximal social welfare is not guaranteed at the minimal incentive cost required to drive agents to a desired cooperative state. To address this gap, we adopt a single-objective approach focused on maximising social welfare, building upon foundational evolutionary game theory models that examined cost efficiency in finite populations, in both well-mixed and structured population settings. Our analytical model and agent-based simulations show how different interference strategies, including rewarding local versus global behavioural patterns, affect social welfare and the dynamics of cooperation. Our results reveal a significant gap in the per-individual incentive cost between optimising for pure cost efficiency or cooperation frequency and optimising for maximal social welfare. Overall, our findings indicate that incentive design, policy, and benchmarking in multi-agent systems and human societies should prioritise welfare-centric objectives over proxy targets of cost or cooperation frequency.
Fast and Robust Flocking of Protesters on Street Networks
We present a simple model of protesters scattered throughout a city who want to gather into large and mobile groups. This model relies on random walkers on a street network that follow tactics built from a set of basic rules. Our goal is to identify the most important rules for fast and robust flocking of walkers. We explore a wide set of tactics and show the central importance of a specific rule based on alignment. Other rules alone perform poorly, but our experiments show that combining alignment with them enhances flocking, and that the resulting groups are remarkably robust.
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
As AI systems increasingly assume roles where trust and alignment with human values are essential, understanding when and why they engage in deception has become a critical research priority. We introduce The Traitors, a multi-agent simulation framework inspired by social deduction games, designed to probe deception, trust formation, and strategic communication among large language model (LLM) agents under asymmetric information. A minority of agents, the traitors, seek to mislead the majority, while the faithful must infer hidden identities through dialogue and reasoning. Our contributions are: (1) we ground the environment in formal frameworks from game theory, behavioral economics, and social cognition; (2) we develop a suite of evaluation metrics capturing deception success, trust dynamics, and collective inference quality; (3) we implement a fully autonomous simulation platform where LLMs reason over persistent memory and evolving social dynamics, with support for heterogeneous agent populations, specialized traits, and adaptive behaviors. Our initial experiments across DeepSeek-V3, GPT-4o-mini, and GPT-4o (10 runs per model) reveal a notable asymmetry: advanced models like GPT-4o demonstrate superior deceptive capabilities yet exhibit disproportionate vulnerability to others' falsehoods. This suggests deception skills may scale faster than detection abilities. Overall, The Traitors provides a focused, configurable testbed for investigating LLM behavior in socially nuanced interactions. We position this work as a contribution toward more rigorous research on deception mechanisms, alignment challenges, and the broader social reliability of AI systems.
comment: 9 main pages, 31 pages
Climate Driven Interactions Between Malaria Transmission and Diabetes Prevalence
Climate change is intensifying infectious diseases such as malaria and chronic diseases such as diabetes, especially among vulnerable populations. Global temperatures have risen by approximately $0.6^\circ$C since 1950, extending the window of transmission for mosquito-borne infections and worsening outcomes in diabetes due to heat-induced metabolic stress. People living with diabetes already have weakened immune defenses and are therefore at an alarmingly increased risk of contracting malaria. However, most models rarely capture this two-way interaction under changing climate conditions. In this paper, we introduce a new compartmental epidemiological model based on synthetic data fitted to disease patterns in India from 2019 to 2021. The framework captures temperature-dependent transmission parameters, seasonal variability, and distinct disease dynamics between diabetic and non-diabetic groups within a three-compartment system. Model calibration using Multi-Start optimization combined with Sequential Quadratic Programming reveals notable differences between the two populations. The odds of malaria infection in diabetic individuals were found to be 1.8--4.0 times higher, with peak infection levels of 35--36\%, compared with 20--21\% in the non-diabetic group. The fitted model captured the observed epidemiological patterns well, with the basic reproduction number averaging around 2.3 and ranging from 0.31 to 2.75 across seasons. Given that India's diabetic population is projected to reach about 157 million people by 2050, these findings point to a pressing need for concerted, climate-informed health strategies and monitoring systems that address malaria and diabetes jointly.
comment: 21 pages, 6 figures
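A minimal illustration of the modeling ingredients named above (temperature-driven transmission, seasonality, and an elevated-risk diabetic group) is a pair of coupled SIS-type groups with a sinusoidally forced transmission rate, as sketched below. The compartment structure, parameter values, and risk ratio are placeholders and do not reproduce the paper's calibrated three-compartment model.

```python
import numpy as np
from scipy.integrate import solve_ivp

def beta(t, beta0=0.35, amp=0.3):
    """Seasonally forced, temperature-driven transmission rate (illustrative)."""
    return beta0 * (1.0 + amp * np.sin(2.0 * np.pi * t / 365.0))

def rhs(t, y, gamma=0.1, risk_ratio=2.5):
    """Two coupled SIS groups sharing one force of infection; the diabetic
    group has an elevated susceptibility (risk_ratio). All values are assumed.
    """
    s_n, i_n, s_d, i_d = y
    foi = beta(t) * (i_n + i_d)              # shared force of infection
    return [
        -foi * s_n + gamma * i_n,
         foi * s_n - gamma * i_n,
        -risk_ratio * foi * s_d + gamma * i_d,
         risk_ratio * foi * s_d - gamma * i_d,
    ]

# 90% non-diabetic, 10% diabetic, small initial infection in each group.
sol = solve_ivp(rhs, (0, 3 * 365), [0.89, 0.01, 0.09, 0.01], max_step=1.0)
print("peak infected share, non-diabetic group:", (sol.y[1] / 0.90).max())
print("peak infected share, diabetic group:    ", (sol.y[3] / 0.10).max())
```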
Systems and Control (EESS)
On the embedding transformation for optimal control of multi-mode switched systems
This paper develops an embedding-based approach to solve switched optimal control problems (SOCPs) with an arbitrary number of subsystems. Initially, the discrete switching signal is represented by a set of binary variables, encoding each mode in binary format. An embedded optimal control problem (EOCP) is then formulated by replacing these binary variables with continuous embedded variables that can take intermediate values between zero and one. Although embedding allows SOCPs to be addressed using conventional techniques, the optimal solutions of EOCPs often yield intermediate values for the binary variables, which may not be feasible for the original SOCP. To address this challenge, a modified EOCP (MEOCP) is introduced by adding a concave auxiliary cost function of appropriate dimensionality to the main cost function. This addition ensures that the optimal solution of the MEOCP is bang-bang and, as a result, feasible for the original SOCP.
Heavy-Duty Electric Vehicles Contribution for Frequency Response in Power Systems with V2G
The integration of heavy-duty electric vehicles (EVs) with Vehicle-to-Grid (V2G) capability offers a promising solution to enhance grid stability by providing primary frequency response in power systems. This paper investigates the potential of heavy-duty EVs to support the California power grid under different charging strategies: immediate, delayed, and constant minimum power charging. Simulation results demonstrate that EVs operating in both V2G and non-V2G modes have great potential to provide primary frequency response, with V2G-capable EVs exhibiting especially strong contributions. The study highlights the influence of charging strategies, control modes, and grid conditions on EV contributions to grid stability, emphasizing their critical role in mitigating the adverse effects of renewable energy penetration.
comment: Accepted at The 16th IEEE International Conference on Green Energy and Smart Systems, 2025
CapOptix: An Options-Framework for Capacity Market Pricing
Electricity markets are under increasing pressure to maintain reliability amidst rising renewable penetration, demand variability, and occasional price shocks. Traditional capacity market designs often fall short in addressing this by relying on expected-value metrics of energy unserved, which overlook risk exposure in such systems. In this work, we present CapOptix, a capacity pricing framework that interprets capacity commitments as reliability options, i.e., financial derivatives of wholesale electricity prices. CapOptix characterizes the capacity premia charged by accounting for structural price shifts modeled by a Markov regime-switching process. We apply the framework to historical price data from multiple electricity markets and compare the resulting premium ranges with existing capacity remuneration mechanisms.
MPC-Guided Safe Reinforcement Learning and Lipschitz-Based Filtering for Structured Nonlinear Systems
Modern engineering systems, such as autonomous vehicles, flexible robotics, and intelligent aerospace platforms, require controllers that are robust to uncertainties, adaptive to environmental changes, and safety-aware under real-time constraints. RL offers powerful data-driven adaptability for systems with nonlinear dynamics that interact with uncertain environments. RL, however, lacks built-in mechanisms for dynamic constraint satisfaction during exploration. MPC offers structured constraint handling and robustness, but its reliance on accurate models and computationally demanding online optimization may pose significant challenges. This paper proposes an integrated MPC-RL framework that combines stability and safety guarantees of MPC with the adaptability of RL. During training, MPC defines safe control bounds that guide the RL component and that enable constraint-aware policy learning. At deployment, the learned policy operates in real time with a lightweight safety filter based on Lipschitz continuity to ensure constraint satisfaction without heavy online optimizations. The approach, which is validated on a nonlinear aeroelastic wing system, demonstrates improved disturbance rejection, reduced actuator effort, and robust performance under turbulence. The architecture generalizes to other domains with structured nonlinearities and bounded disturbances, offering a scalable solution for safe artificial-intelligence-driven control in engineering applications.
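The deployment-time safety filter described above can be pictured as limiting, at each step, how far the RL action may deviate from a known-safe action, with the admissible deviation shrinking as the constraint margin shrinks according to a Lipschitz bound. The sketch below is a loose illustration of that idea; the MPC-derived safe action, the constraint margin h, and the Lipschitz constant L are assumptions rather than the paper's construction.

```python
import numpy as np

def lipschitz_safety_filter(a_rl, a_safe, h_value, L, dt):
    """Illustrative deployment-time filter: keep the RL action within a box
    around a known-safe action. The box radius shrinks with the constraint
    margin h_value, using a Lipschitz bound L on the closed-loop constraint
    dynamics so that one step cannot drive the margin below zero.
    """
    radius = max(h_value / (L * dt), 0.0)      # admissible deviation this step
    delta = np.clip(a_rl - a_safe, -radius, radius)
    return a_safe + delta

# Large margin: the RL action passes through essentially unchanged.
print(lipschitz_safety_filter(np.array([0.9]), np.array([0.1]), h_value=2.0, L=1.0, dt=0.05))
# Tiny margin: the action is pulled back toward the safe reference.
print(lipschitz_safety_filter(np.array([0.9]), np.array([0.1]), h_value=0.01, L=1.0, dt=0.05))
```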
KANELÉ: Kolmogorov-Arnold Networks for Efficient LUT-based Evaluation
Low-latency, resource-efficient neural network inference on FPGAs is essential for applications demanding real-time capability and low power. Lookup table (LUT)-based neural networks are a common solution, combining strong representational power with efficient FPGA implementation. In this work, we introduce KANELÉ, a framework that exploits the unique properties of Kolmogorov-Arnold Networks (KANs) for FPGA deployment. Unlike traditional multilayer perceptrons (MLPs), KANs employ learnable one-dimensional splines with fixed domains as edge activations, a structure naturally suited to discretization and efficient LUT mapping. We present the first systematic design flow for implementing KANs on FPGAs, co-optimizing training with quantization and pruning to enable compact, high-throughput, and low-latency KAN architectures. Our results demonstrate up to a 2700x speedup and orders of magnitude resource savings compared to prior KAN-on-FPGA approaches. Moreover, KANELÉ matches or surpasses other LUT-based architectures on widely used benchmarks, particularly for tasks involving symbolic or physical formulas, while balancing resource usage across FPGA hardware. Finally, we showcase the versatility of the framework by extending it to real-time, power-efficient control systems.
comment: International Symposium on Field-Programmable Gate Arrays 2026 (ISFPGA'2026)
Data-driven Supervisory Control under Attacks via Spectral Learning
The technological advancements facilitating the rapid development of cyber-physical systems (CPS) also render such systems vulnerable to cyber attacks with devastating effects. Supervisory control is a commonly used control method to neutralize attacks on CPS. The supervisor strives to confine the (symbolic) paths of the system to a desired language via sensors and actuators in a closed control loop, even when attackers can manipulate the symbols received by the sensors and actuators. Currently, supervisory control methods face limitations when effectively identifying and mitigating unknown, broad-spectrum attackers. In order to capture the behavior of broad-spectrum attacks on both sensing and actuation channels we model the plant, supervisors, and attackers with finite-state transducers (FSTs). Our general method for addressing unknown attackers involves constructing FST models of the attackers from spectral analysis of their input and output symbol sequences recorded from a history of attack behaviors observed in a supervisory control loop. To construct these FST models, we devise a novel learning method based on the recorded history of attack behaviors. A supervisor is synthesized using such models to neutralize the attacks.
comment: 10 pages
Sensitivity increase of 3D printed, self-sensing, carbon fibers structures with conductive filament matrix due to flexural loading
The excellent structural and piezoresistive properties of continuous carbon fiber make it suitable for both structural and sensing applications. This work studies the use of 3D printed, continuous carbon fiber reinforced beams as self-sensing structures. It is demonstrated how the sensitivity of these carbon fiber strain gauges can be increased irreversibly by means of a pretreatment that ``breaks in'' the sensors with a large compressive bending load. The increase in the gauge factor is attributed to local progressive fiber failure, due to the combination of the thermal residual stress from the printing process and external loading. The coextrusion of conductive filament around the carbon fibers is demonstrated as a means of improving the reliability, noise characteristics, and electrical connection of the sensors. A micrograph of the sensor cross section shows that the conductive filament contacts the various carbon fiber bundles. All in all, the use of ``broken-in'' carbon fiber strain gauges in combination with coextrusion of conductive filament holds promise for 3D printed structural sensors with a high sensitivity.
comment: 28 pages, 9 figures, 4 tables
Distributed Reinforcement Learning using Local Smart Meter Data for Voltage Regulation in Distribution Networks
Centralised reinforcement learning (RL) for voltage magnitude regulation in distribution networks typically involves numerous agent-environment interactions and power flow (PF) calculations, inducing computational overhead and privacy concerns over shared data. Thus, we propose a distributed RL algorithm to regulate voltage magnitude. First, a dynamic Thevenin equivalent model is integrated within smart meters (SM), enabling local voltage magnitude estimation using local SM data for RL agent training, and mitigating the dependency on synchronised data collection and centralised PF calculations. To mitigate estimation errors induced by Thevenin model inaccuracies, a voltage magnitude correction strategy that combines piecewise functions and neural networks is introduced. The piecewise function corrects large errors in the estimated voltage magnitude, while a neural network mimics the grid's sensitivity to control actions, improving action adjustment precision. Second, a coordination strategy is proposed to refine local RL agent actions online, preventing voltage magnitude violations induced by excessive actions from multiple independently trained agents. Case studies on energy storage systems validate the feasibility and effectiveness of the proposed approach, demonstrating its potential to improve voltage regulation in distribution networks.
A Rule-Aware Prompt Framework for Structured Numeric Reasoning in Cyber-Physical Systems
Many cyber-physical systems (CPS) rely on high-dimensional numeric telemetry and explicit operating rules to maintain safe and efficient operation. Recent large language models (LLMs) are increasingly considered as decision-support components in such systems, yet most deployments focus on textual inputs and do not directly address rule-grounded reasoning over numeric measurements. This paper proposes a rule-aware prompt framework that systematically encodes CPS domain context, numeric normalization, and decision rules into a modular prompt architecture for LLMs. The framework decomposes prompts into five reusable modules, including role specification, CPS domain context, numeric normalization, rule-aware reasoning, and output schema, and exposes an interface for plugging in diverse rule sets. A key design element is separating rule specification from the representation of normalized numeric deviations, which enables concise prompts that remain aligned with domain rules. We analyze how different normalization strategies and prompt configurations influence rule adherence, interpretability, and token efficiency. The framework is model-agnostic and applicable across CPS domains. To illustrate its behavior, we instantiate it on numeric anomaly assessment in an IEEE 118-bus electric power transmission network and evaluate several prompting and adaptation regimes. The results show that rule-aware, z-score-based value blocks and a hybrid LLM-detector architecture can substantially improve consistency with CPS rules and anomaly detection performance while reducing token usage, providing a reusable bridge between numeric telemetry and general-purpose LLMs.
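The five-module prompt decomposition and the z-score value block described above can be assembled mechanically, as in the illustrative sketch below. The wording of each module, the example rules, the telemetry names, and the output schema are invented placeholders, not the framework's actual templates.

```python
import json

def zscore_block(values, stats):
    """Normalize raw telemetry into z-scores so the rule text can refer to
    deviations instead of raw units (one of the stated design choices)."""
    return {k: round((v - stats[k]["mean"]) / stats[k]["std"], 2)
            for k, v in values.items()}

def build_prompt(role, domain_context, values, stats, rules, schema):
    """Assemble the five modules named in the abstract into one prompt.
    Concrete wording, rule set, and schema are illustrative."""
    return "\n\n".join([
        f"ROLE: {role}",
        f"DOMAIN CONTEXT: {domain_context}",
        "NORMALIZED MEASUREMENTS (z-scores):\n" + json.dumps(zscore_block(values, stats)),
        "DECISION RULES:\n" + "\n".join(f"- {r}" for r in rules),
        "OUTPUT SCHEMA (JSON):\n" + json.dumps(schema),
    ])

prompt = build_prompt(
    role="You are a grid operations assistant.",
    domain_context="IEEE 118-bus transmission network, steady-state telemetry.",
    values={"bus_12_voltage_pu": 1.08, "line_45_loading_pct": 97.0},
    stats={"bus_12_voltage_pu": {"mean": 1.0, "std": 0.02},
           "line_45_loading_pct": {"mean": 60.0, "std": 15.0}},
    rules=["Flag any measurement with |z| > 3 as anomalous.",
           "Report the single most severe anomaly first."],
    schema={"anomalies": [{"signal": "str", "z": "float", "severity": "str"}]},
)
print(prompt)
```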
High Order Control Lyapunov Function - Control Barrier Function - Quadratic Programming Based Autonomous Driving Controller for Bicyclist Safety
Ensuring the safety of Vulnerable Road Users (VRUs) is a critical challenge in the development of advanced autonomous driving systems in smart cities. Among VRUs, bicyclists present unique characteristics that make their safety both critical and manageable. Vehicles often travel at significantly higher relative speeds when interacting with bicyclists than with pedestrians, which makes collision avoidance system design for bicyclist safety more challenging. Yet bicyclist movements are generally more predictable and governed by clearer traffic rules than sudden and sometimes erratic pedestrian motion, offering opportunities for model-based control strategies. To address bicyclist safety in complex traffic environments, this study develops a High Order Control Lyapunov Function High Order Control Barrier Function Quadratic Programming (HOCLF HOCBF QP) control framework. Within this framework, CLF constraints guarantee system stability so that the vehicle can track its reference trajectory, whereas CBF constraints ensure safety by keeping the vehicle out of potential collision regions around surrounding obstacles. Solving a QP then yields an optimal control command that simultaneously satisfies the stability and safety requirements. Three key bicyclist crash scenarios recorded in the Fatality Analysis Reporting System (FARS) are recreated and used to comprehensively evaluate the proposed autonomous driving bicyclist safety control strategy in a simulation study. Simulation results demonstrate that the HOCLF HOCBF QP controller enables robust, collision-free maneuvers, highlighting its potential for improving bicyclist safety in complex traffic environments.
Lexicographic Multi-Objective Stochastic Shortest Path with Mixed Max-Sum Costs
We study the Stochastic Shortest Path (SSP) problem for autonomous systems with mixed max-sum cost aggregations under Linear Temporal Logic constraints. Classical SSP formulations rely on sum-aggregated costs, which are suitable for cumulative quantities such as time or energy but fail to capture bottleneck-style objectives such as avoiding high-risk transitions, where performance is determined by the worst single event along a trajectory. Such objectives are particularly important in safety-critical systems, where even one hazardous transition can be unacceptable. To address this limitation, we introduce max-aggregated objectives that minimize the bottleneck cost, i.e., the maximum one-step cost along a trajectory. We show that standard Bellman equations on the original state space do not apply in this setting and propose an augmented MDP with a state variable tracking the running maximum cost, together with a value iteration algorithm. We further identify a cyclic policy phenomenon, where zero-marginal-cost cycles prevent goal reaching under max-aggregation, and resolve it via a finite-horizon formulation. To handle richer task requirements, linear temporal logic specifications are translated into deterministic finite automata and combined with the system to construct a product MDP. We propose a lexicographic value iteration algorithm that handles mixed max-sum objectives under lexicographic ordering on this product MDP. Gridworld case studies demonstrate the effectiveness of the proposed framework.
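The augmented-MDP construction can be illustrated with a small value iteration in which each state is paired with the running maximum one-step cost and backups propagate the bottleneck value back from the goal. The sketch below handles only the pure max-aggregated objective on a toy tabular MDP; the LTL product construction, the mixed max-sum lexicographic ordering, and the cyclic-policy fix are omitted, and the `P`/`C` data layout is an assumption.

```python
import numpy as np

def max_cost_value_iteration(P, C, goal, cost_levels, iters=200):
    """Value iteration for a bottleneck (max-aggregated) objective on an
    augmented state (s, m), where m is the running maximum one-step cost.

    P[a][s]    : list of (next_state, prob) pairs (toy MDP layout, assumed)
    C[a][s]    : one-step cost of taking action a in state s
    cost_levels: sorted grid of values the running maximum can take
    """
    nS, nA, nM = len(P[0]), len(P), len(cost_levels)
    V = np.full((nS, nM), np.inf)
    V[goal, :] = cost_levels                 # terminal value = bottleneck so far
    for _ in range(iters):
        Q = np.full((nS, nM, nA), np.inf)
        for s in range(nS):
            if s == goal:
                continue
            for mi, m in enumerate(cost_levels):
                for a in range(nA):
                    # Update the running max, then snap it onto the grid.
                    mi_new = min(int(np.searchsorted(cost_levels, max(m, C[a][s]))), nM - 1)
                    Q[s, mi, a] = sum(p * V[s2, mi_new] for s2, p in P[a][s])
        V_new = np.minimum(V, Q.min(axis=2))
        V_new[goal, :] = cost_levels
        if np.allclose(V_new, V):
            break
        V = V_new
    return V

# Toy 3-state chain with goal state 2: action 0 is cheap, action 1 is costly
# in state 0, so the optimal bottleneck value from state 0 is 1.0.
P = [[[(1, 1.0)], [(2, 1.0)], [(2, 1.0)]],
     [[(1, 1.0)], [(2, 1.0)], [(2, 1.0)]]]
C = [[1.0, 1.0, 0.0], [3.0, 0.5, 0.0]]
V = max_cost_value_iteration(P, C, goal=2, cost_levels=np.array([0.0, 0.5, 1.0, 3.0]))
print(V[0, 0])   # expected bottleneck cost starting at state 0: 1.0
```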
From Information Freshness to Semantics of Information and Goal-oriented Communications
Future wireless networks must support real-time, data-driven cyber-physical systems in which communication is tightly coupled with sensing, inference, control, and decision-making. Traditional communication paradigms centered on accuracy, throughput, and latency are increasingly inadequate for these systems, where the value of information depends on its semantic relevance to a specific task. This paper provides a unified exposition of the progression from classical distortion-based frameworks, through information freshness metrics such as the Age of Information (AoI) and its variants, to the emerging paradigm of goal-oriented semantics-aware communication. We organize and systematize existing semantics-aware metrics, including content- and version-aware measures, context-dependent distortion formulations, and history-dependent error persistence metrics that capture lasting impact and urgency. Within this framework, we highlight how these metrics address the limitations of purely accuracy- or freshness-centric designs, and how they collectively enable the selective generation and transmission of only task-relevant information. We further review analytical tools based on Markov decision process (MDP) and Lyapunov optimization methods that have been employed to characterize optimal or near-optimal timing and scheduling policies under semantic performance criteria and communication constraints. By synthesizing these developments into a coherent framework, the paper clarifies the design principles underlying goal-oriented, semantics-aware communication systems. It illustrates how they can significantly improve efficiency, reliability, and task performance. The presented perspective aims to serve as a bridge between information-theoretic, control-theoretic, and networking viewpoints, and to guide the design of semantic communication architectures for 6G and beyond.
comment: Submitted to IEEE as invited paper
An End-to-End Approach for Microgrid Probabilistic Forecasting and Robust Operation via Decision-focused Learning
High penetration of renewable energy sources (RES) introduces significant uncertainty and intermittency into microgrid operations, posing challenges to economic and reliable scheduling. To address this, this paper proposes an end-to-end decision-focused framework that jointly optimizes probabilistic forecasting and robust operation for microgrids. A multilayer encoder-decoder (MED) probabilistic forecasting model is integrated with a two-stage robust optimization (TSRO) model involving direct load control (DLC) through a differentiable decision pathway, enabling gradient-based feedback from operational outcomes to improve forecasting performance. Unlike conventional sequential approaches, the proposed method aligns forecasting accuracy with operational objectives by directly minimizing decision regret via a surrogate smart predict-then-optimize (SPO) loss function. This integration ensures that probabilistic forecasts are optimized for downstream decisions, enhancing both economic efficiency and robustness. Case studies on modified IEEE 33-bus and 69-bus systems demonstrate that the proposed framework achieves superior forecasting accuracy and operational performance, reducing total and net operation costs by up to 18% compared with conventional forecasting and optimization combinations. The results verify the effectiveness and scalability of the end-to-end decision-focused approach for resilient and cost-efficient microgrid management under uncertainty.
comment: 10 pages
Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller
Parameter tuning in real-world experiments is constrained by the limited evaluation budget available on hardware. The path-following controller studied in this paper reflects a typical situation in nonlinear geometric controller, where multiple gains influence the dynamics through coupled nonlinear terms. Such interdependence makes manual tuning inefficient and unlikely to yield satisfactory performance within a practical number of trials. To address this challenge, we propose a Bayesian optimization (BO) framework that treats the closed-loop system as a black box and selects controller gains using a Gaussian-process surrogate. BO offers model-free exploration, quantified uncertainty, and data-efficient search, making it well suited for tuning tasks where each evaluation is costly. The framework is implemented on Honda's AI-Formula three-wheeled robot and assessed through repeated full-lap experiments on a fixed test track. The results show that BO improves controller performance within 32 trials, including 15 warm-start initial evaluations, indicating that it can efficiently locate high-performing regions of the parameter space under real-world conditions. These findings demonstrate that BO provides a practical, reliable, and data-efficient tuning approach for nonlinear path-following controllers on real robotic platforms.
Electric Road Systems for Smart Cities: A Scalable Infrastructure Framework for Dynamic Wireless Charging
The transition to electric transportation is a key enabler for intelligent and sustainable cities; however, inadequate charging infrastructure remains a major barrier to large-scale electric vehicle (EV) adoption. This paper presents a scalable Electric Road System (ERS) architecture that enables Dynamic Wireless Charging (DWC) of EVs during motion. The proposed framework integrates inductive charging coils embedded in road pavement, real-time vehicle-to-infrastructure (V2I) communication, and adaptive energy management coordinated with smart grid systems. Modular road segments with a standardized charging process are employed to ensure scalability across urban corridors and interoperability among different EV platforms. System performance is evaluated using a co-simulation framework combining MATLAB-based power analysis with traffic inputs generated in SUMO. Key performance metrics include charging efficiency, energy cost per kilometer, and battery lifecycle improvement. Simulation results indicate a potential reduction in range anxiety and an increase in battery lifespan due to frequent shallow charging cycles. The study further discusses deployment challenges, policy considerations, and energy distribution strategies aligned with climate-resilient urban development. A case study of a tier-1 Indian city is presented to analyze the cost-benefit trade-offs of retrofitting high-density urban corridors with ERS. The proposed framework provides a practical foundation for next-generation EV infrastructure planning in smart cities.
comment: Preprint. Under review for conference submission. Simulation-based study
Quadratic-Programming-based Control of Multi-Robot Systems for Cooperative Object Transport
This paper investigates the control problem of steering a group of spherical mobile robots to cooperatively transport a spherical object. By controlling the movements of the robots to exert appropriate contact (pushing) forces, it is desired that the object follows a velocity command. To solve the problem, we first treat the robots' positions as virtual control inputs of the object, and propose a velocity-tracking controller based on quadratic programming (QP), enabling the robots to cooperatively generate desired contact forces while minimizing the sum of the contact-force magnitudes. Then, we design position-tracking controllers for the robots. By appropriately designing the objective function and the constraints for the QP, it is guaranteed that the QP admits a unique solution and the QP-based velocity-tracking controller is Lipschitz continuous. Finally, we consider the closed-loop system as an interconnection of two subsystems, corresponding to the velocity-tracking error of the object and the position-tracking error of the robots, and employ nonlinear small-gain techniques for stability analysis. The effectiveness of the proposed design is demonstrated through numerical simulations.
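The force-allocation layer described above can be pictured as a small QP: choose non-negative pushing-force magnitudes along the contact normals so that their resultant equals the net force needed to track the commanded object velocity. The cvxpy sketch below minimizes the sum of squared magnitudes as a convenient quadratic surrogate for the magnitude sum used in the paper; the contact geometry and the desired net force are illustrative.

```python
import cvxpy as cp
import numpy as np

def allocate_contact_forces(normals, f_desired):
    """Allocate pushing forces among robots contacting a spherical object.

    normals  : (k, 2) inward unit normals at the contact points
    f_desired: (2,) net force required to track the commanded object velocity
    Each robot can only push (non-negative magnitude along its normal); the
    QP minimizes the sum of squared magnitudes, a quadratic surrogate for the
    magnitude sum in the paper.
    """
    k = normals.shape[0]
    m = cp.Variable(k, nonneg=True)           # pushing-force magnitudes
    net = normals.T @ m                        # resulting net force on the object
    prob = cp.Problem(cp.Minimize(cp.sum_squares(m)), [net == f_desired])
    prob.solve()
    return m.value

# Three robots evenly spaced around the sphere, desired net force along +x.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
normals = -np.stack([np.cos(angles), np.sin(angles)], axis=1)  # pointing inward
print(allocate_contact_forces(normals, np.array([1.0, 0.0])))
```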
Feasible-Set Reshaping for Constraint Qualification in Optimization-Based Control
This paper presents a novel feasible-set reshaping technique for optimization-based control with ensured constraint qualification. In our problem setting, the feasible set of admissible control inputs depends on the real-time state of the plant, and the linear independence constraint qualification (LICQ) may not be satisfied in some regions of interest. By feasible-set reshaping, we project the constraints of the original feasible set onto an appropriately chosen constant matrix whose rows form a positive span of the space of the optimization variable. It is proved that the reshaped feasible set is nonempty and satisfies LICQ, as long as the original feasible set is nonempty. The effectiveness of the proposed method is verified by constructing Lipschitz continuous quadratic-program-based (QP-based) controllers based on the reshaped feasible sets.
Robustness analysis in static and dynamic quantum state tomography
Quantum state tomography is a core task in quantum system identification. Real experimental conditions often deviate from nominal designs, introducing errors in both the measurement devices and the Hamiltonian governing the system's dynamics. In this paper, we investigate the robustness of quantum state tomography against such perturbations in both static and dynamic settings using linear regression estimation. We derive explicit bounds that quantify how bounded errors in the measurement devices and the Hamiltonian affect the mean squared error (MSE) upper bound in each scenario. Numerical simulations for qubit systems illustrate how these bounds scale with resources.
comment: 7 pages, 3 figures
A Unified Low-rank ADI Framework with Shared Linear Solves for Simultaneously Solving Multiple Lyapunov, Sylvester, and Riccati Equations
The alternating direction implicit (ADI) methods are computationally efficient and numerically effective tools for computing low-rank solutions of large-scale linear matrix equations. It is known in the literature that the low-rank ADI method for Lyapunov equations is a Petrov-Galerkin projection algorithm that implicitly performs model order reduction. It recursively enforces interpolation at the mirror images of the ADI shifts and places the poles of the reduced-order models at the ADI shifts. In this paper, we show that the low-rank ADI methods for Sylvester and Riccati equations are also Petrov-Galerkin projection algorithms that implicitly perform model order reduction. These methods likewise enforce interpolation at the mirror images of the ADI shifts; however, they do not place the poles at the mirror images of the interpolation points. Instead, their pole placement ensures that the projected Sylvester and Riccati equations they implicitly solve admit a unique solution. By observing that the ADI methods for Lyapunov, Sylvester, and Riccati equations differ only in pole placement and not in their interpolatory nature, we show that the shifted linear solves, which constitute the bulk of the computational cost, can be shared. The pole-placement step involves only small-scale operations and is therefore inexpensive. We propose a unified ADI framework that requires only two shifted linear solves per iteration to simultaneously solve six Lyapunov equations, one Sylvester equation, and ten Riccati equations, thus substantially increasing the return on investment for the computational cost spent on the linear solves. All operations needed to extract the individual solutions from these shared linear solves are small-scale and inexpensive. Three numerical examples are presented to demonstrate the effectiveness of the proposed ADI framework.
LocoMamba: Vision-Driven Locomotion via End-to-End Deep Reinforcement Learning with Mamba
We introduce LocoMamba, a vision-driven cross-modal DRL framework built on selective state-space models, specifically leveraging Mamba, that achieves near-linear-time sequence modeling, effectively captures long-range dependencies, and enables efficient training with longer sequences. First, we embed proprioceptive states with a multilayer perceptron and patchify depth images with a lightweight convolutional neural network, producing compact tokens that improve state representation. Second, stacked Mamba layers fuse these tokens via near-linear-time selective scanning, reducing latency and memory footprint, remaining robust to token length and image resolution, and providing an inductive bias that mitigates overfitting. Third, we train the policy end-to-end with Proximal Policy Optimization under terrain and appearance randomization and an obstacle-density curriculum, using a compact state-centric reward that balances progress, smoothness, and safety. We evaluate our method in challenging simulated environments with static and moving obstacles as well as uneven terrain. Compared with state-of-the-art baselines, our method achieves higher returns and success rates with fewer collisions, exhibits stronger generalization to unseen terrains and obstacle densities, and improves training efficiency by converging in fewer updates under the same compute budget.
comment: 14 pages. This paper has been published in Advanced Engineering Informatics. Please cite the journal version: DOI: 10.1016/j.aei.2025.104230
SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling
Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and the demand for high-quality content (e.g., 2K/4K videos) is rapidly increasing across various domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging due to the excessive re-training requirements and prohibitively high computational and memory costs. To this end, we introduce SUPERGEN, an efficient tile-based framework for ultra-high-resolution video generation. SUPERGEN features a novel training-free tiling scheme that supports a wide range of resolutions while significantly reducing both memory footprint and computational complexity. Moreover, SUPERGEN incorporates a tile-tailored, adaptive, region-aware caching strategy that accelerates video generation by exploiting redundancy across denoising steps and spatial regions. SUPERGEN also integrates cache-guided, communication-minimized tile parallelism for enhanced throughput and minimized latency. Evaluations show that SUPERGEN maximizes performance gains while achieving high output quality across various benchmarks.
Major Space Weather Risks Identified via Coupled Physics-Engineering-Economic Modeling
Space weather poses an important but under-quantified threat to critical infrastructure, the economy, and society. While extreme geomagnetic storms are recognized as potential global catastrophes, their socio-economic impacts remain poorly quantified. Here we present a novel physics-engineering-economic framework that links geophysical drivers of geomagnetic storms to power grid geoelectric fields, transformer vulnerability, and macroeconomic consequences. Using the United States as an example, we estimate daily economic losses from transformer thermal heating of 1.37 billion USD (95 percent confidence interval: 1.16 to 1.58 billion USD) for a 100-year geomagnetic storm, with power outages affecting 4.1 million people and 101,000 businesses. A 250-year event could raise losses to 2.09 billion USD per day (95 percent confidence interval: 1.84 to 2.34 billion USD), disrupting power for more than 6 million people and 155,000 businesses. Crucially, the framework is scalable and transferable, offering a template for assessing space weather risk to critical infrastructure in other countries. This integrative approach provides the first end-to-end quantification of space weather socio-economic impacts, bridging space physics through to policy-relevant metrics. Our results demonstrate that coupled socio-economic modeling of space weather is both feasible and essential, enabling evidence-based decision making in infrastructure protection and global risk management.
Adaptive Federated Learning to Optimize Integrated Flows in Cyber-Physical Data Centers
Data centers play an increasingly critical role in societal digitalization, yet their rapidly growing energy demand poses significant challenges for sustainable operation. To enhance the energy efficiency of geographically distributed data centers, this paper formulates a multi-period optimization model that captures the interdependence of electricity, heat, and data flows. The optimization of such integrated multi-domain flows inherently involves mixed-integer formulations and the access to proprietary or sensitive datasets, which correspondingly exacerbate computational complexity and raise data-privacy concerns. To address these challenges, an adaptive federated learning-to-optimization approach is proposed, accounting for the heterogeneity of datasets across distributed data centers. To safeguard privacy, cryptography techniques are leveraged in both the learning and optimization processes. A model acceptance criterion with convergence guarantee is developed to improve learning performance and filter out potentially contaminated data, while a verifiable double aggregation mechanism is further proposed to simultaneously ensure privacy and integrity of shared data during optimization. Theoretical analysis and numerical simulations demonstrate that the proposed approach preserves the privacy and integrity of shared data, achieves near-optimal performance, and exhibits high computational efficiency, making it suitable for large-scale data center optimization under privacy constraints.
Robotics
Autonomously Unweaving Multiple Cables Using Visual Feedback
Many cable management tasks involve separating out the different cables and removing tangles. Automating this task is challenging because cables are deformable and can have combinations of knots and multiple interwoven segments. Prior works have focused on untying knots in one cable, which is one subtask of cable management. However, in this paper, we focus on a different subtask called multi-cable unweaving, which refers to removing the intersections among multiple interwoven cables to separate them and facilitate further manipulation. We propose a method that utilizes visual feedback to unweave a bundle of loosely entangled cables. We formulate cable unweaving as a pick-and-place problem, where the grasp position is selected from discrete nodes in a graph-based cable state representation. Our cable state representation encodes both topological and geometric information about the cables from the visual image. To predict future cable states and identify valid actions, we present a novel state transition model that takes into account the straightening and bending of cables during manipulation. Using this state transition model, we select between two high-level action primitives and calculate predicted immediate costs to optimize the lower-level actions. We experimentally demonstrate that iterating the above perception-planning-action process enables unweaving electric cables and shoelaces with an 84% success rate on average.
comment: 6 pages, 5 figures
Sim2Real Reinforcement Learning for Soccer skills
This thesis presents a more efficient and effective approach to training control tasks for humanoid robots using Reinforcement Learning (RL). Traditional RL methods struggle to adapt to real-world environments, complexity, and natural motions, but the proposed approach overcomes these limitations by using curriculum training and the Adversarial Motion Priors (AMP) technique. The results show that the learned RL policies for kicking, walking, and jumping are more dynamic and adaptive, and outperform previous methods. However, transfer of the learned policies from simulation to the real world was unsuccessful, highlighting the limitations of current RL methods in fully adapting to real-world scenarios.
comment: Undergrad Thesis
Unifying Quadrotor Motion Planning and Control by Chaining Different Fidelity Models
Many aerial tasks involving quadrotors demand both instant reactivity and long-horizon planning. High-fidelity models enable accurate control but are too slow for long horizons; low-fidelity planners scale but degrade closed-loop performance. We present Unique, a unified MPC that cascades models of different fidelity within a single optimization: a short-horizon, high-fidelity model for accurate control, and a long-horizon, low-fidelity model for planning. We align costs across horizons, derive feasibility-preserving thrust and body-rate constraints for the point-mass model, and introduce transition constraints that match the different states, thrust-induced acceleration, and jerk-body-rate relations. To prevent local minima emerging from nonsmooth clutter, we propose a 3D progressive smoothing schedule that morphs norm-based obstacles along the horizon. In addition, we deploy parallel randomly initialized MPC solvers to discover lower-cost local minima on the long, low-fidelity horizon. In simulation and real flights, under equal computational budgets, Unique improves closed-loop position or velocity tracking by up to 75% compared with standard MPC and hierarchical planner-tracker baselines. Ablations and Pareto analyses confirm robust gains across horizon variations, constraint approximations, and smoothing schedules.
INDOOR-LiDAR: Bridging Simulation and Reality for Robot-Centric 360 degree Indoor LiDAR Perception -- A Robot-Centric Hybrid Dataset
We present INDOOR-LIDAR, a comprehensive hybrid dataset of indoor 3D LiDAR point clouds designed to advance research in robot perception. Existing indoor LiDAR datasets often suffer from limited scale, inconsistent annotation formats, and human-induced variability during data collection. INDOOR-LIDAR addresses these limitations by integrating simulated environments with real-world scans acquired using autonomous ground robots, providing consistent coverage and realistic sensor behavior under controlled variations. Each sample consists of dense point cloud data enriched with intensity measurements and KITTI-style annotations. The annotation schema encompasses common indoor object categories within various scenes. The simulated subset enables flexible configuration of layouts, point densities, and occlusions, while the real-world subset captures authentic sensor noise, clutter, and domain-specific artifacts characteristic of real indoor settings. INDOOR-LIDAR supports a wide range of applications including 3D object detection, bird's-eye-view (BEV) perception, SLAM, semantic scene understanding, and domain adaptation between simulated and real indoor domains. By bridging the gap between synthetic and real-world data, INDOOR-LIDAR establishes a scalable, realistic, and reproducible benchmark for advancing robotic perception in complex indoor environments.
Programmable Deformation Design of Porous Soft Actuator through Volumetric-Pattern-Induced Anisotropy
Conventional soft pneumatic actuators, typically based on hollow elastomeric chambers, often provide limited structural support and require costly geometry-specific redesigns for multimodal functionality. Porous materials such as foam, filled into the chambers, can provide structural stability for the actuators. However, methods to achieve programmable deformation by tailoring the porous body itself remain underexplored. In this paper, a novel design method is presented to realize soft porous actuators with programmable deformation by incising specific patterns into the porous foam body. This approach introduces localized structural anisotropy into the foam, guiding the material's deformation under a global vacuum input. Furthermore, three fundamental patterns on a cylindrical foam substrate are discussed: transverse for bending, longitudinal for tilting, and diagonal for twisting. A computational model is built with Finite Element Analysis (FEA) to investigate the mechanism of the incision-patterning method. Experiments demonstrate that, with an appropriate choice of the pattern array number N, actuators can achieve bending up to $80^{\circ}$ (N=2), tilting of $18^{\circ}$ (N=1), and twisting of $115^{\circ}$ (N=8). The versatility of our approach is demonstrated via pattern transferability, scalability, and mold-less rapid prototyping of complex designs. As a comprehensive application, we translate the human hand crease map into a functional incision pattern, creating a bio-inspired soft robot hand capable of human-like adaptive grasping. Our work provides a new, efficient, and scalable paradigm for the design of multi-functional soft porous robots.
From Human Intention to Action Prediction: A Comprehensive Benchmark for Intention-driven End-to-End Autonomous Driving
Current end-to-end autonomous driving systems operate at a level of intelligence akin to following simple steering commands. However, achieving genuinely intelligent autonomy requires a paradigm shift: moving from merely executing low-level instructions to understanding and fulfilling high-level, abstract human intentions. This leap from a command-follower to an intention-fulfiller, as illustrated in our conceptual framework, is hindered by a fundamental challenge: the absence of a standardized benchmark to measure and drive progress on this complex task. To address this critical gap, we introduce Intention-Drive, the first comprehensive benchmark designed to evaluate the ability to translate high-level human intent into safe and precise driving actions. Intention-Drive features two core contributions: (1) a new dataset of complex scenarios paired with corresponding natural language intentions, and (2) a novel evaluation protocol centered on the Intent Success Rate (ISR), which assesses the semantic fulfillment of the human's goal beyond simple geometric accuracy. Through an extensive evaluation of a spectrum of baseline models on Intention-Drive, we reveal a significant performance deficit, showing that baseline models struggle to achieve the comprehensive scene and intention understanding required for this advanced task.
CAR-CHASE: Car-Like Robot Conflict-Aware Heuristic Adaptive Search Enhancement
Multi-Agent Path Finding (MAPF) for car-like robots, addressed by algorithms such as Conflict-Based Search with Continuous Time (CL-CBS), faces significant computational challenges due to expensive kinematic heuristic calculations. Traditional heuristic caching assumes that the heuristic function depends only on the state, which is incorrect in CBS where constraints from conflict resolution make the search space context-dependent. We propose \textbf{CAR-CHASE} (Car-Like Robot Conflict-Aware Heuristic Adaptive Search Enhancement), a novel approach that combines \textbf{conflict-aware heuristic caching} -- which caches heuristic values based on both state and relevant constraint context -- with an \textbf{adaptive hybrid heuristic} that intelligently switches between fast approximate and exact computations. Our key innovations are (1) a compact \emph{conflict fingerprint} that efficiently encodes which constraints affect a state's heuristic, (2) a relevance filter using spatial, temporal, and geometric criteria, and (3) an adaptive switching strategy with theoretical quality bounds. Experimental evaluation on 480 benchmark instances with varying agent counts (10 to 30) and obstacle densities (0\% and 50\%) demonstrates a geometric mean speedup of 2.46$\times$ over the baseline CL-CBS implementation while maintaining solution optimality. The optimizations improve success rate from 77.9\% to 84.8\% (+6.9 percentage points), reduce total runtime by 70.1\%, and enable solving 33 additional instances that previously timed out. Performance gains scale with problem complexity, reaching up to 4.06$\times$ speedup for challenging 30-agent obstacle scenarios. Our techniques are general and applicable to other CBS variants.
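The conflict-aware caching idea above can be sketched as keying the heuristic cache on both the state and a fingerprint of the constraints deemed relevant to it; the relevance test and data layout below are hypothetical placeholders, not the paper's implementation.

```python
# Sketch of conflict-aware heuristic caching; state/constraint encodings and the
# relevance test are hypothetical placeholders, not the paper's implementation.
def relevant(constraint, state, radius=5.0, horizon=10.0):
    """Toy relevance filter: keep constraints spatially and temporally close to the state."""
    cx, cy, ct = constraint
    x, y, t = state[:3]
    return abs(ct - t) <= horizon and (cx - x) ** 2 + (cy - y) ** 2 <= radius ** 2

def fingerprint(state, constraints):
    """Compact, order-independent encoding of the constraints that can affect this state."""
    return frozenset(c for c in constraints if relevant(c, state))

_cache = {}

def cached_heuristic(state, constraints, exact_h):
    """Cache keyed on (state, conflict fingerprint) rather than on the state alone."""
    key = (state, fingerprint(state, constraints))
    if key not in _cache:
        _cache[key] = exact_h(state, constraints)  # expensive kinematic heuristic
    return _cache[key]

# Example: a constraint far away from the state leaves the fingerprint empty,
# so later queries with other irrelevant constraints reuse the same cache entry.
h = lambda s, c: sum(s[:2])  # stand-in heuristic
print(cached_heuristic((0.0, 0.0, 0.0, 0.0), {(50.0, 50.0, 3.0)}, h))  # 0.0
print(len(_cache))  # 1
```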
Robust Underwater Localization of Buoyancy Driven microFloats Using Acoustic Time-of-Flight Measurements
Accurate underwater localization remains a challenge for inexpensive autonomous platforms that require high-frequency position updates. In this paper, we present a robust, low-cost localization pipeline for buoyancy-driven microFloats operating in coastal waters. We build upon previous work by introducing a bidirectional acoustic Time-of-Flight (ToF) localization framework, which incorporates both float-to-buoy and buoy-to-float transmissions, thereby increasing the number of usable measurements. The method integrates nonlinear trilateration with filtering of the computed position estimates based on geometric cost and Cramér-Rao Lower Bounds (CRLB). This approach removes outliers caused by multipath effects and other acoustic errors from the ToF estimation and improves localization robustness without relying on heavy smoothing. We validate the framework in two field deployments in Puget Sound, Washington, USA. The localization pipeline achieves median positioning errors below 4 m relative to GPS positions. The filtering technique shows a reduction in mean error from 139.29 m to 12.07 m, and improved alignment of trajectories with GPS paths. Additionally, we demonstrate a Time-Difference-of-Arrival (TDoA) localization for unrecovered floats that were transmitting during the experiment. Range-based acoustic localization techniques are widely used and generally agnostic to hardware; this work aims to maximize their utility by improving positioning frequency and robustness through careful algorithmic design.
comment: 9 pages
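Nonlinear trilateration from range measurements, as used in the pipeline above, can be posed as a small least-squares problem; the sketch below uses a hypothetical buoy layout and omits the paper's CRLB- and cost-based outlier filtering.

```python
# Nonlinear trilateration sketch (hypothetical buoy layout; CRLB-based outlier filtering omitted).
import numpy as np
from scipy.optimize import least_squares

def trilaterate(buoys, ranges, x0=None):
    """Estimate a 2D float position from ranges to known buoy positions."""
    buoys = np.asarray(buoys, dtype=float)
    ranges = np.asarray(ranges, dtype=float)
    residual = lambda p: np.linalg.norm(buoys - p, axis=1) - ranges
    x0 = buoys.mean(axis=0) if x0 is None else x0
    sol = least_squares(residual, x0)
    return sol.x, sol.cost  # geometric cost could then be thresholded to reject multipath outliers

buoys = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
truth = np.array([40.0, 25.0])
ranges = np.linalg.norm(np.asarray(buoys) - truth, axis=1) + np.random.normal(0, 1.0, 3)
pos, cost = trilaterate(buoys, ranges)
print(pos)  # close to (40, 25)
```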
Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy
Fall recovery is a critical skill for humanoid robots in dynamic environments such as RoboCup, where prolonged downtime often decides the match. Recent techniques using deep reinforcement learning (DRL) have produced robust get-up behaviors, yet existing methods require training of separate policies for each robot morphology. This paper presents a single DRL policy capable of recovering from falls across seven humanoid robots with diverse heights (0.48 - 0.81 m), weights (2.8 - 7.9 kg), and dynamics. Trained with CrossQ, the unified policy transfers zero-shot up to 86 +/- 7% (95% CI [81, 89]) on unseen morphologies, eliminating the need for robot-specific training. Comprehensive leave-one-out experiments, morph scaling analysis, and diversity ablations show that targeted morphological coverage improves zero-shot generalization. In some cases, the shared policy even surpasses the specialist baselines. These findings illustrate the practicality of morphology-agnostic control for fall recovery, laying the foundation for generalist humanoid control. The software is open-source and available at: https://github.com/utra-robosoccer/unified-humanoid-getup
comment: Accepted at 28th RoboCup International Symposium
Semantic Zone based 3D Map Management for Mobile Robot
Mobile robots in large-scale indoor environments, such as hospitals and logistics centers, require accurate 3D spatial representations. However, 3D maps consume substantial memory, making it difficult to maintain complete map data within limited computational resources. Existing SLAM frameworks typically rely on geometric distance or temporal metrics for memory management, often resulting in inefficient data retrieval in spatially compartmentalized environments. To address this, we propose a semantic zone-based 3D map management method that shifts the paradigm from geometry-centric to semantics-centric control. Our approach partitions the environment into meaningful spatial units (e.g., lobbies, hallways) and designates these zones as the primary unit for memory management. By dynamically loading only task-relevant zones into Working Memory (WM) and offloading inactive zones to Long-Term Memory (LTM), the system strictly enforces user-defined memory thresholds. Implemented within the RTAB-Map framework, our method demonstrates substantial reductions in unnecessary signature load/unload cycles and cumulative memory utilization compared to standard approaches. The results confirm that semantic zone-based management ensures stable, predictable memory usage while preserving map availability for navigation. Code is available at: https://github.com/huichangs/rtabmap/tree/segment
comment: 12 pages, 11 figures
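A minimal sketch of the zone-based memory policy described above, with hypothetical zone names and sizes; the actual RTAB-Map integration manages map signatures rather than the toy structures shown here.

```python
# Sketch of semantic-zone memory management (hypothetical zone names and sizes, in MB).
class ZoneMapManager:
    def __init__(self, zones, wm_budget_mb):
        self.zones = zones            # {zone_name: size_mb}
        self.budget = wm_budget_mb
        self.wm = set()               # zones currently in Working Memory (WM)
        # everything not in self.wm is treated as offloaded to Long-Term Memory (LTM)

    def load_for_task(self, relevant_zones):
        """Keep only task-relevant zones in WM, evicting zones until the budget holds."""
        self.wm = {z for z in relevant_zones if z in self.zones}
        while sum(self.zones[z] for z in self.wm) > self.budget and self.wm:
            # Evict the largest loaded zone first (a simple, replaceable policy).
            self.wm.remove(max(self.wm, key=self.zones.get))
        return sorted(self.wm)

mgr = ZoneMapManager({"lobby": 120, "hallway_1": 80, "ward_A": 200, "ward_B": 220},
                     wm_budget_mb=300)
print(mgr.load_for_task(["lobby", "hallway_1", "ward_A"]))  # ['hallway_1', 'lobby'] after evicting ward_A
```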
Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving
Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, current evaluation practices still rely on error-based metrics (e.g., ADE, FDE), which capture accuracy from a post-hoc view but ignore the actual effect the predictor has on the self-driving vehicles (SDVs), especially in complex interactive scenarios: a high-quality predictor should not only pursue accuracy but also capture all plausible directions a neighboring agent might move in, to support the SDVs' cautious decision-making. Given that existing metrics hardly account for this standard, we propose a comprehensive pipeline that adaptively evaluates a predictor's performance along two dimensions: accuracy and diversity. Based on the criticality of the driving scenario, these two dimensions are dynamically combined into a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics by better reflecting the correlation between the predictors' evaluation and the autonomous vehicles' driving performance. This evaluation pipeline offers a robust way to select a predictor that potentially contributes most to the SDV's driving performance.
comment: 9 Pages, 8 Figures
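The abstract does not give the exact rule for combining accuracy and diversity, so the snippet below is only one plausible criticality-weighted blend, with all quantities assumed normalized to [0, 1]; it is not the paper's scoring function.

```python
# Hypothetical combination rule: the paper's exact weighting is not given in the abstract.
def predictor_score(accuracy, diversity, criticality):
    """Blend accuracy and diversity; more critical scenarios weight diversity more heavily.
    accuracy, diversity, criticality are assumed normalized to [0, 1]."""
    w = criticality
    return (1.0 - w) * accuracy + w * diversity

# In a benign scenario accuracy dominates; in a critical interaction diversity dominates.
print(predictor_score(accuracy=0.9, diversity=0.4, criticality=0.1))  # 0.85
print(predictor_score(accuracy=0.9, diversity=0.4, criticality=0.9))  # 0.45
```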
A Hybrid Deep Learning Framework for Emotion Recognition in Children with Autism During NAO Robot-Mediated Interaction
Understanding emotional responses in children with Autism Spectrum Disorder (ASD) during social interaction remains a critical challenge in both developmental psychology and human-robot interaction. This study presents a novel deep learning pipeline for emotion recognition in autistic children in response to a name-calling event by a humanoid robot (NAO), under controlled experimental settings. The dataset comprises around 50,000 facial frames extracted from video recordings of 15 children with ASD. A hybrid model, combining a fine-tuned ResNet-50-based Convolutional Neural Network (CNN) and a three-layer Graph Convolutional Network (GCN), was trained on both visual and geometric features extracted from MediaPipe FaceMesh landmarks. Emotions were probabilistically labeled using a weighted ensemble of two models, DeepFace and FER, each contributing to soft-label generation across seven emotion classes. Final classification leveraged a fused embedding optimized via Kullback-Leibler divergence. The proposed method demonstrates robust performance in modeling subtle affective responses and offers significant promise for affective profiling of children with ASD in clinical and therapeutic human-robot interaction contexts, as the pipeline effectively captures micro-level emotional cues in neurodivergent children, addressing a major gap in autism-specific HRI research. This work represents the first such large-scale, real-world dataset and pipeline from India on autism-focused emotion analysis using social robotics, contributing an essential foundation for future personalized assistive technologies.
comment: 12 pages, journal paper
Navigation Around Unknown Space Objects Using Visible-Thermal Image Fusion SC
As the popularity of on-orbit operations grows, so does the need for precise navigation around unknown resident space objects (RSOs) such as other spacecraft, orbital debris, and asteroids. The use of Simultaneous Localization and Mapping (SLAM) algorithms is often studied as a method to map out the surface of an RSO and find the inspector's relative pose using a lidar or conventional camera. However, conventional cameras struggle during eclipse or shadowed periods, and lidar, though robust to lighting conditions, tends to be heavier, bulkier, and more power-intensive. Thermal-infrared cameras can track the target RSO throughout difficult illumination conditions without these limitations. While useful, thermal-infrared imagery lacks the resolution and feature-richness of visible cameras. In this work, images of a target satellite in low Earth orbit are photo-realistically simulated in both visible and thermal-infrared bands. Pixel-level fusion methods are used to create visible/thermal-infrared composites that leverage the best aspects of each camera. Navigation errors from a monocular SLAM algorithm are compared across visible, thermal-infrared, and fused imagery under various lighting conditions and trajectories. Fused imagery yields substantially improved navigation performance over visible-only and thermal-only methods.
comment: 18 pages, 11 figures. To be published in proceedings of AIAA SCITECH 2026 Forum
B-ActiveSEAL: Scalable Uncertainty-Aware Active Exploration with Tightly Coupled Localization-Mapping
Active robot exploration requires decision-making processes that integrate localization and mapping under tightly coupled uncertainty. However, managing these interdependent uncertainties over long-term operations in large-scale environments rapidly becomes computationally intractable. To address this challenge, we propose B-ActiveSEAL, a scalable information-theoretic active exploration framework that explicitly incorporates coupled uncertainties, from perception through mapping, into the decision-making process. Our framework (i) adaptively balances map uncertainty (exploration) and localization uncertainty (exploitation), (ii) accommodates a broad class of generalized entropy measures, enabling flexible and uncertainty-aware active exploration, and (iii) establishes Behavioral Entropy (BE) as an effective information measure for active exploration by enabling intuitive and adaptive decision-making under coupled uncertainties. We establish a theoretical foundation for propagating coupled uncertainties and integrating them into general entropy formulations, enabling uncertainty-aware active exploration under tightly coupled localization-mapping. The effectiveness of the proposed approach is validated through rigorous theoretical analysis and extensive experiments on open-source maps and ROS-Unity simulations across diverse and complex environments. The results demonstrate that B-ActiveSEAL achieves a well-balanced exploration-exploitation trade-off and produces diverse, adaptive exploration behaviors across environments, highlighting clear advantages over representative baselines.
comment: 18 pages, 17 figures
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes
Video diffusion techniques have advanced significantly in recent years; however, they struggle to generate realistic imagery of car crashes due to the scarcity of accident events in most driving datasets. Improving traffic safety requires realistic and controllable accident simulations. To tackle the problem, we propose Ctrl-Crash, a controllable car crash video generation model that conditions on signals such as bounding boxes, crash types, and an initial image frame. Our approach enables counterfactual scenario generation where minor variations in input can lead to dramatically different crash outcomes. To support fine-grained control at inference time, we leverage classifier-free guidance with independently tunable scales for each conditioning signal. Ctrl-Crash achieves state-of-the-art performance across quantitative video quality metrics (e.g., FVD and JEDi) and qualitative measurements based on a human evaluation of physical realism and video quality, compared to prior diffusion-based methods.
comment: Under review at Pattern Recognition Letters
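Classifier-free guidance with independently tunable scales per conditioning signal is a standard construction; one common variant is sketched below with a toy stand-in model, and it is not necessarily the exact guidance rule used in Ctrl-Crash.

```python
# One common multi-condition classifier-free guidance form (not necessarily Ctrl-Crash's exact rule).
import torch

def guided_eps(model, x_t, t, conds, scales, null):
    """Combine an unconditional prediction with per-condition guidance terms.

    conds  : list of conditioning inputs (e.g., bounding boxes, crash type, first frame)
    scales : one guidance scale per condition
    null   : the "dropped" (null) conditioning used during CFG training
    """
    eps_uncond = model(x_t, t, null)
    eps = eps_uncond.clone()
    for c, s in zip(conds, scales):
        eps = eps + s * (model(x_t, t, c) - eps_uncond)
    return eps

# Toy stand-in model so the sketch runs end to end.
model = lambda x, t, c: x * 0.1 + (0.0 if c is None else 0.01)
x = torch.randn(1, 4, 8, 8)
print(guided_eps(model, x, t=10, conds=["bbox", "crash_type"], scales=[3.0, 1.5], null=None).shape)
```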
End-to-End Dexterous Arm-Hand VLA Policies via Shared Autonomy: VR Teleoperation Augmented by Autonomous Hand VLA Policy for Efficient Data Collection
Achieving human-like dexterous manipulation remains a major challenge for general-purpose robots. While Vision-Language-Action (VLA) models show potential in learning skills from demonstrations, their scalability is limited by scarce high-quality training data. Existing data collection methods face inherent constraints: manual teleoperation overloads human operators, while automated planning often produces unnatural motions. We propose a Shared Autonomy framework that divides control between macro and micro motions. A human operator guides the robot's arm pose through intuitive VR teleoperation, while an autonomous DexGrasp-VLA policy handles fine-grained hand control using real-time tactile and visual feedback. This division significantly reduces cognitive load and enables efficient collection of high-quality coordinated arm-hand demonstrations. Using this data, we train an end-to-end VLA policy enhanced with our novel Arm-Hand Feature Enhancement module, which captures both distinct and shared representations of macro and micro movements for more natural coordination. Our Corrective Teleoperation system enables continuous policy improvement through human-in-the-loop failure recovery. Experiments demonstrate that our framework generates high-quality data with minimal manpower and achieves a 90% success rate across diverse objects, including unseen instances. Comprehensive evaluations validate the system's effectiveness in developing dexterous manipulation capabilities.
Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains
Learning underlies both biological and artificial systems when it comes to producing intelligent behaviors. Building on classical PPO (Proximal Policy Optimization), a series of deep reinforcement learning algorithms is widely used for training locomotion policies for quadrupedal robots because of their stability and sample efficiency. However, across these variants, training often converges prematurely, leading to suboptimal locomotion and reduced task performance. In this paper, we therefore introduce Entropy-Controlled Intrinsic Motivation (ECIM), an entropy-based reinforcement learning algorithm that reduces premature convergence by combining intrinsic motivation with adaptive exploration. To enable comparison with other baselines, we evaluate it in Isaac Gym across six commonly used terrain categories: upward slopes, downward slopes, uneven rough terrain, ascending stairs, descending stairs, and flat ground. Our experiments consistently achieve better performance: task rewards increase by 4--12%, peak body pitch oscillation is reduced by 23--29%, joint acceleration decreases by 20--32%, and joint torque consumption declines by 11--20%. Overall, by combining entropy control and intrinsic motivation, ECIM achieves more stable quadrupedal locomotion across different terrains while reducing energetic cost, making it a practical choice for complex robotic control tasks.
BEASST: Behavioral Entropic Gradient based Adaptive Source Seeking for Mobile Robots
This paper presents BEASST (Behavioral Entropic Gradient-based Adaptive Source Seeking for Mobile Robots), a novel framework for robotic source seeking in complex, unknown environments. Our approach enables mobile robots to efficiently balance exploration and exploitation by modeling normalized signal strength as a surrogate probability of source location. Building on Behavioral Entropy (BE) with Prelec's probability weighting function, we define an objective function that adapts robot behavior from risk-averse to risk-seeking based on signal reliability and mission urgency. The framework provides theoretical convergence guarantees under unimodal signal assumptions and practical stability under bounded disturbances. Experimental validation across DARPA SubT and multi-room scenarios demonstrates that BEASST consistently outperforms state-of-the-art methods, achieving a 15% reduction in path length and 20% faster source localization through intelligent uncertainty-driven navigation that dynamically transitions between aggressive pursuit and cautious exploration.
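Prelec's probability weighting function, which the abstract builds on, has a standard form; the weighted-surprise measure shown alongside it is a generic behavioral-entropy-style construction, not necessarily BEASST's exact objective.

```latex
% Prelec's probability weighting function (standard form) and a generic
% weighted-surprise measure of the kind behavioral-entropy methods build on.
\[
  w(p) = \exp\!\bigl(-\beta\,(-\ln p)^{\alpha}\bigr), \qquad \alpha > 0,\ \beta > 0,\ p \in (0,1],
\]
\[
  H_{w}(p_1,\dots,p_n) \;=\; -\sum_{i=1}^{n} w(p_i)\,\ln w(p_i),
\]
% where the choice of \alpha and \beta shifts the weighting between
% risk-averse and risk-seeking behavior.
```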
Camera Calibration via Circular Patterns: A Comprehensive Framework with Detection Uncertainty and Unbiased Projection Model
Camera calibration using planar targets has been widely favored, and two types of control points have been mainly considered as measurements: the corners of the checkerboard and the centroid of circles. Since a centroid is derived from numerous pixels, the circular pattern provides more precise measurements than the checkerboard. However, the existing projection model of circle centroids is biased under lens distortion, resulting in low performance. To surmount this limitation, we propose an unbiased projection model of the circular pattern and demonstrate its superior accuracy compared to the checkerboard. Complementing this, we introduce uncertainty into circular patterns to enhance calibration robustness and completeness. Defining centroid uncertainty improves the performance of calibration components, including pattern detection, optimization, and evaluation metrics. We also provide guidelines for performing good camera calibration based on the evaluation metric. The core concept of this approach is to model the boundary points of a two-dimensional shape as a Markov random field, considering its connectivity. The shape distribution is propagated to the centroid uncertainty through an appropriate shape representation based on Green's theorem. Consequently, the resulting framework achieves marked gains in calibration accuracy and robustness. The complete source code and demonstration video are available at https://github.com/chaehyeonsong/discocal.
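The boundary-to-centroid propagation mentioned above rests on standard Green's-theorem identities that express a region's area and centroid purely through boundary integrals (counterclockwise orientation); these identities are the generic mechanism by which a distribution over boundary points can be pushed forward to centroid statistics.

```latex
% Standard Green's-theorem identities over the boundary \partial\Omega of a planar region \Omega:
\[
  A = \frac{1}{2}\oint_{\partial\Omega} \bigl(x\,\mathrm{d}y - y\,\mathrm{d}x\bigr),
  \qquad
  \bar{x} = \frac{1}{2A}\oint_{\partial\Omega} x^{2}\,\mathrm{d}y,
  \qquad
  \bar{y} = -\frac{1}{2A}\oint_{\partial\Omega} y^{2}\,\mathrm{d}x .
\]
```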
Multiagent Systems
Emergence: Overcoming Privileged Information Bias in Asymmetric Embodied Agents via Active Querying
Large Language Models (LLMs) act as powerful reasoning engines but struggle with "symbol grounding" in embodied environments, particularly when information is asymmetrically distributed. We investigate the Privileged Information Bias (or "Curse of Knowledge"), where a knowledgeable "Leader" agent fails to guide a sensor-limited "Follower" due to a lack of Theory of Mind. To quantify this phenomenon, we propose a novel Asymmetric Assistive Reasoning framework within AI2-THOR. Our experiments reveal a significant "Success Gap": while the Leader successfully perceives the target in 35.0% of episodes, the collaborative team succeeds only 17.0% of the time, implying that nearly 50% of feasible plans fail solely due to communicative grounding errors. We demonstrate that a "Pull-based" protocol (active querying) is significantly more robust than standard "Push-based" instruction, with successful episodes featuring 2x the frequency of clarification requests. This research isolates the mechanism of active uncertainty reduction as a prerequisite for safe human-AI and robot-robot collaboration.
comment: 12 pages, 9 pages of content, 6 tables, 5 figures
Bidirectional human-AI collaboration in brain tumour assessments improves both expert human and AI agent performance
The benefits of human-artificial intelligence (AI) partnerships, in which AI agents enhance expert human performance, are increasingly studied. Though rarely evaluated in healthcare, an inverse approach is possible: AI benefiting from the support of an expert human agent. Here, we investigate both human-AI clinical partnership paradigms in the magnetic resonance imaging-guided characterisation of patients with brain tumours. We reveal that human-AI partnerships improve accuracy and metacognitive ability not only for radiologists supported by AI, but also for AI agents supported by radiologists. Moreover, the greatest patient benefit was evident with an AI agent supported by a human one. Synergistic improvements in agent accuracy, metacognitive performance, and inter-rater agreement suggest that AI can create more capable, confident, and consistent clinical agents, whether human or model-based. Our work suggests that the maximal value of AI in healthcare could emerge not from replacing human intelligence, but from AI agents that routinely leverage and amplify it.
comment: 38 pages, 6 figures, 7 supplementary figures
Trusted AI Agents in the Cloud
AI agents powered by large language models are increasingly deployed as cloud services that autonomously access sensitive data, invoke external tools, and interact with other agents. However, these agents run within a complex multi-party ecosystem, where untrusted components can lead to data leakage, tampering, or unintended behavior. Existing Confidential Virtual Machines (CVMs) provide only per-binary protection and offer no guarantees for cross-principal trust, accelerator-level isolation, or supervised agent behavior. We present Omega, a system that enables trusted AI agents by enforcing end-to-end isolation, establishing verifiable trust across all contributing principals, and supervising every external interaction with accountable provenance. Omega builds on Confidential VMs and Confidential GPUs to create a Trusted Agent Platform that hosts many agents within a single CVM using nested isolation. It also provides efficient multi-agent orchestration with cross-principal trust establishment via differential attestation, and a policy specification and enforcement framework that governs data access, tool usage, and inter-agent communication for data protection and regulatory compliance. Implemented on AMD SEV-SNP and NVIDIA H100, Omega fully secures agent state across CVM-GPU, and achieves high performance while enabling high-density, policy-compliant multi-agent deployments at cloud scale.
Regret Lower Bounds for Decentralized Multi-Agent Stochastic Shortest Path Problems NeurIPS 2025
Multi-agent systems (MAS) are central to applications such as swarm robotics and traffic routing, where agents must coordinate in a decentralized manner to achieve a common objective. Stochastic Shortest Path (SSP) problems provide a natural framework for modeling decentralized control in such settings. While the problem of learning in SSP has been extensively studied in single-agent settings, the decentralized multi-agent variant remains largely unexplored. In this work, we take a step towards addressing that gap. We study decentralized multi-agent SSPs (Dec-MASSPs) under linear function approximation, where the transition dynamics and costs are represented using linear models. Applying novel symmetry-based arguments, we identify the structure of optimal policies. Our main contribution is the first regret lower bound for this setting based on the construction of hard-to-learn instances for any number of agents, $n$. Our regret lower bound of $\Omega(\sqrt{K})$, over $K$ episodes, highlights the inherent learning difficulty in Dec-MASSPs. These insights clarify the learning complexity of decentralized control and can further guide the design of efficient learning algorithms in multi-agent systems.
comment: To appear in 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Systems and Control (EESS)
Improved Directional State Transition Tensors for Accurate Aerocapture Performance Analysis
Aerocapture is a unique challenge for semi-analytical propagation because its nonconservative dynamics lead to force magnitudes that vary substantially across the trajectory. State transition tensors (STTs), higher-order Taylor series expansions of the solution flow, have been widely used as a computationally efficient semi-analytical propagation method for orbital scenarios, but have not previously been applied to aerocapture. However, obtaining the higher-order STTs requires integrating exponentially more equations. Directional state transition tensors (DSTTs) mitigate this cost by projecting the state into a reduced-dimension basis. This work develops novel dynamics analysis techniques to identify effective bases for this reduction, including augmented higher-order Cauchy-Green tensors tailored to quantities of interest such as apoapsis radius. Results show that DSTTs constructed along these bases significantly reduce computational cost while maintaining accuracy in apoapsis and energy prediction. In particular, some of these DSTTs outperform traditional DSTTs in nonlinear perturbation propagation for key state subsets and quantities of interest. These results establish STTs and DSTTs as practical tools for aerocapture performance analysis to enable robust guidance and navigation.
comment: Submitted to AIAA Journal of Guidance, Control, and Dynamics for publication
AI-Driven Real-Time Kick Classification in Olympic Taekwondo Using Sensor Fusion
Olympic Taekwondo has faced challenges in spectator engagement due to static, defensive gameplay and contentious scoring. Current Protector and Scoring Systems (PSS) rely on impact sensors and simplistic logic, encouraging safe strategies that diminish the sport's dynamism. This paper proposes an AI-powered scoring system that integrates existing PSS sensors with additional accelerometers, gyroscopes, magnetic/RFID, and impact force sensors in a sensor fusion framework. The system classifies kicks in real-time to identify technique type, contact location, impact force, and even the part of the foot used. A machine learning pipeline employing sensor fusion and Support Vector Machines (SVMs) is detailed, enabling automatic kick technique recognition for scoring. We present a novel kick scoring rubric that awards points based on specific kick techniques (e.g., turning and spinning kicks) to incentivize dynamic attacks. Drawing on a 2024 study achieving 96-98% accuracy, we validate the feasibility of real-time kick classification and further propose enhancements to this methodology, such as ensemble SVM classifiers and expanded datasets, to achieve the high-stakes accuracy required by the sport. We analyze how the proposed system can improve scoring fairness, reduce rule exploitation and illegitimate tactics, encourage more dynamic techniques, and enhance spectator understanding and excitement. The paper includes system design illustrations, a kick scoring table from an AI-augmented rule set, and discusses anticipated impacts on Olympic Taekwondo.
comment: 13 pages, 4 figures
Unifying Quadrotor Motion Planning and Control by Chaining Different Fidelity Models
Many aerial tasks involving quadrotors demand both instant reactivity and long-horizon planning. High-fidelity models enable accurate control but are too slow for long horizons; low-fidelity planners scale but degrade closed-loop performance. We present Unique, a unified MPC that cascades models of different fidelity within a single optimization: a short-horizon, high-fidelity model for accurate control, and a long-horizon, low-fidelity model for planning. We align costs across horizons, derive feasibility-preserving thrust and body-rate constraints for the point-mass model, and introduce transition constraints that match the different states, thrust-induced acceleration, and jerk-body-rate relations. To prevent local minima emerging from nonsmooth clutter, we propose a 3D progressive smoothing schedule that morphs norm-based obstacles along the horizon. In addition, we deploy parallel randomly initialized MPC solvers to discover lower-cost local minima on the long, low-fidelity horizon. In simulation and real flights, under equal computational budgets, Unique improves closed-loop position or velocity tracking by up to 75% compared with standard MPC and hierarchical planner-tracker baselines. Ablations and Pareto analyses confirm robust gains across horizon variations, constraint approximations, and smoothing schedules.
Data-Selective Online Battery Identification Using Extended Time Regular Expressions
In this paper, we propose a data-efficient online battery identification method which targets highly informative battery cell data segments based on the driving pattern of the vehicle. We consider the case of a vehicle driving on/off a motorway and construct an Extended Time Regular Expression (ETRE) to detect data segments fitting these driving patterns. Simulation results indicate that by only using up to 10.71% of the data on average, the proposed method provides a low-bias and low-variance estimator under non-negligible current and voltage noise compared to other conventional estimation algorithms.
comment: This work has been submitted to IFAC for possible publication
Evaluating Asynchronous Semantics in Trace-Discovered Resilience Models: A Case Study on the OpenTelemetry Demo
While distributed tracing and chaos engineering are becoming standard for microservices, resilience models remain largely manual and bespoke. We revisit a trace-discovered connectivity model that derives a service dependency graph from traces and uses Monte Carlo simulation to estimate endpoint availability under fail-stop service failures. Compared to earlier work, we (i) derive the graph directly from raw OpenTelemetry traces, (ii) attach endpoint-specific success predicates, and (iii) add a simple asynchronous semantics that treats Kafka edges as non-blocking for immediate HTTP success. We apply this model to the OpenTelemetry Demo ("Astronomy Shop") using a GitHub Actions workflow that discovers the graph, runs simulations, and executes chaos experiments that randomly kill microservices in a Docker Compose deployment. Across the studied failure fractions, the model reproduces the overall availability degradation curve, while asynchronous semantics for Kafka edges change predicted availabilities by at most about 10^(-5) (0.001 percentage points). This null result suggests that for immediate HTTP availability in this case study, explicitly modeling asynchronous dependencies is not warranted, and a simpler connectivity-only model is sufficient.
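A minimal Monte Carlo sketch of the connectivity model described above, with a hypothetical service graph: services fail independently (fail-stop), and Kafka-style asynchronous edges are marked non-blocking so they do not affect immediate HTTP success.

```python
# Minimal Monte Carlo availability sketch (hypothetical service graph; not the paper's full model).
import random

# Dependency edges: service -> list of (dependency, blocking?) pairs.
DEPS = {
    "frontend": [("checkout", True), ("ads", True)],
    "checkout": [("payment", True), ("kafka", False)],  # asynchronous, non-blocking edge
    "payment": [],
    "ads": [],
    "kafka": [],
}

def endpoint_ok(entry, failed):
    """An endpoint succeeds if the entry service and all blocking dependencies are up."""
    if entry in failed:
        return False
    return all(endpoint_ok(dep, failed) for dep, blocking in DEPS[entry] if blocking)

def availability(entry, fail_fraction, trials=10000, seed=0):
    rng = random.Random(seed)
    services = list(DEPS)
    ok = sum(
        endpoint_ok(entry, {s for s in services if rng.random() < fail_fraction})
        for _ in range(trials)
    )
    return ok / trials

print(availability("frontend", fail_fraction=0.1))  # roughly 0.9**4, since kafka is non-blocking
```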
Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving
Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, current evaluation practices still rely on error-based metrics (e.g., ADE, FDE), which capture accuracy from a post-hoc view but ignore the actual effect the predictor has on the self-driving vehicles (SDVs), especially in complex interactive scenarios: a high-quality predictor should not only pursue accuracy but also capture all plausible directions a neighboring agent might move in, to support the SDVs' cautious decision-making. Given that existing metrics hardly account for this standard, we propose a comprehensive pipeline that adaptively evaluates a predictor's performance along two dimensions: accuracy and diversity. Based on the criticality of the driving scenario, these two dimensions are dynamically combined into a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics by better reflecting the correlation between the predictors' evaluation and the autonomous vehicles' driving performance. This evaluation pipeline offers a robust way to select a predictor that potentially contributes most to the SDV's driving performance.
comment: 9 Pages, 8 Figures
Braess' Paradoxes in Coupled Power and Transportation Systems
Transportation electrification introduces strong coupling between the power and transportation systems. In this paper, we generalize the classical notion of Braess' paradox to coupled power and transportation systems, and examine how the cross-system coupling induces new types of Braess' paradoxes. To this end, we model the power and transportation networks as graphs, coupled with charging points connecting to nodes in both graphs. The power system operation is characterized by the economic dispatch optimization, while the transportation system user equilibrium models travelers' route and charging choices. By analyzing simple coupled systems, we demonstrate that capacity expansion in either transportation or power system can deteriorate the performance of both systems, and uncover the fundamental mechanisms for such new Braess' paradoxes to occur. We also provide necessary and sufficient conditions of the occurrences of Braess' paradoxes for general coupled systems, leading to managerial insights for infrastructure planners. For general networks, through characterizing the generalized user equilibrium of the coupled systems, we develop efficient algorithms to detect Braess' paradoxes and novel charging pricing policies to mitigate them.
comment: 61 pages, 17 figures
Realizing Space-oriented Control in Smart Buildings via Word Embeddings
This paper presents a novel framework for implementing space-oriented control systems in smart buildings. In contrast to conventional device-oriented approaches, which often suffer from issues related to development efficiency and portability, our framework adopts a space-oriented paradigm that leverages natural language processing and word embedding techniques. The proposed framework features a chat-based graphical user interface (GUI) that converts natural language inputs into actionable OpenAI API calls, thereby enabling intuitive space-level (e.g., room-level) control within smart environments. To support efficient embedding-based search and metadata retrieval, the framework integrates a vector database powered by Elasticsearch. This ensures the accurate identification and invocation of appropriate smart building APIs. A prototype implementation has been tested in a smart building environment at the University of Tokyo, demonstrating the feasibility of the approach.
comment: 4 pages, 4 figures, 1 table, accepted by IEEE GCCE 2025
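The retrieval step described above can be sketched as embedding the user utterance and the API descriptions and ranking by cosine similarity; embed() below is a self-contained placeholder for the real embedding model, and the in-memory dictionary stands in for the Elasticsearch vector index.

```python
# Generic embedding-retrieval sketch: embed() is a placeholder for the real embedding
# call, and the in-memory dictionary stands in for the Elasticsearch vector index.
import numpy as np

def embed(text):
    # Placeholder: deterministic pseudo-embedding so the sketch is self-contained.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

API_CATALOG = {
    "set_room_temperature": "set the air conditioning temperature of a room",
    "turn_on_lights": "turn on the lighting in a given space",
    "close_blinds": "close the window blinds of a room",
}
INDEX = {name: embed(desc) for name, desc in API_CATALOG.items()}

def best_api(utterance):
    """Return the catalog entry whose description embedding is closest to the utterance."""
    q = embed(utterance)
    return max(INDEX, key=lambda name: float(np.dot(q, INDEX[name])))

print(best_api("make meeting room 3 cooler"))  # maps to one of the catalog APIs
```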
Fractional Calculus in Optimal Control and Game Theory: Theory, Numerics, and Applications -- A Survey
Many physical, biological, and engineered systems exhibit memory effects that challenge Markovian models. Fractional calculus provides nonlocal operators to capture hereditary dynamics. This survey connects modeling, analysis, and controller/game design for systems with memory. We unify notation for Caputo, Riemann-Liouville, and Grünwald-Letnikov derivatives and relate them to practical approximations, including diffusive (sum-of-exponentials) state augmentation and frequency-domain realizations (e.g., Oustaloup). We review fractional extensions of the calculus of variations and the Pontryagin maximum principle, and dynamic-programming formulations with memory, including path-dependent HJB for optimal control and HJI for zero-sum games. We cover design tools such as LQR, MPC, and fractional-order PID, as well as fractional differential games with Nash, Stackelberg, and minimax equilibria. Computational approaches are compared across time-domain schemes, frequency-domain approximations, and diffusive augmentations, highlighting accuracy-complexity trade-offs and remedies for the curse of history (windowing and sum-of-exponentials). We conclude with applications and open problems on equilibria with memory, Isaacs-type conditions, constraint handling, and scalable solvers.
comment: 42 pages, 1 figure, comprehensive survey on fractional calculus in optimal control and game theory
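For reference, the three derivative definitions whose notation the survey unifies are standard; for $n-1 < \alpha < n$ with integer $n$:

```latex
\[
  {}^{C}D_{0}^{\alpha} f(t)
    = \frac{1}{\Gamma(n-\alpha)} \int_{0}^{t} \frac{f^{(n)}(\tau)}{(t-\tau)^{\alpha+1-n}}\,\mathrm{d}\tau
  \quad \text{(Caputo)},
\]
\[
  {}^{RL}D_{0}^{\alpha} f(t)
    = \frac{1}{\Gamma(n-\alpha)} \frac{\mathrm{d}^{n}}{\mathrm{d}t^{n}}
      \int_{0}^{t} \frac{f(\tau)}{(t-\tau)^{\alpha+1-n}}\,\mathrm{d}\tau
  \quad \text{(Riemann-Liouville)},
\]
\[
  {}^{GL}D_{0}^{\alpha} f(t)
    = \lim_{h \to 0^{+}} \frac{1}{h^{\alpha}} \sum_{k=0}^{\lfloor t/h \rfloor}
      (-1)^{k} \binom{\alpha}{k} f(t - kh)
  \quad \text{(Gr\"unwald-Letnikov)}.
\]
```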
A Framework for Scalable Digital Twin Deployment in Smart Campus Building Facility Management
Digital twins (DTs) offer significant opportunities for enhancing facility management (FM) in campus environments. However, existing research often focuses narrowly on isolated domains, such as point-cloud geometry or energy analytics, without providing a scalable and interoperable workflow that integrates building geometry, equipment metadata, and operational data into a unified FM platform. This study proposes a comprehensive framework for scalable digital-twin deployment in smart campus buildings by integrating 3D laser scanning, BIM modeling, and IoT-enabled data visualization to support facility operations and maintenance. The methodology includes: (1) reality capture using terrestrial laser scanning and structured point-cloud processing; (2) development of an enriched BIM model incorporating architectural, mechanical, electrical, plumbing, conveying, and sensor systems; and (3) creation of a digital-twin environment that links equipment metadata, maintenance policies, and simulated IoT data within a digital-twin management platform. A case study of the Price Gilbert Building at Georgia Tech demonstrates the implementation of this workflow. A total of 509 equipment items were modeled and embedded with OmniClass classifications into the digital twin. Ten interactive dashboards were developed to visualize system performance. Results show that the proposed framework enables centralized asset documentation, improved system visibility, and enhanced preventive and reactive maintenance workflows. Although most IoT data were simulated due to limited existing sensor infrastructure, the prototype validates the feasibility of a scalable digital twin for facility management and establishes a reference model for real-time monitoring, analytics integration, and future autonomous building operations.
Neuromorphic Photonic Computing with an Electro-Optic Analog Memory
In neuromorphic photonic systems, device operations are typically governed by analog signals, necessitating digital-to-analog converters (DACs) and analog-to-digital converters (ADCs). However, data movement between memory and these converters in conventional von Neumann architectures incurs significant energy costs. We propose an analog electronic memory co-located with photonic computing units to eliminate repeated long-distance data movement. Here, we demonstrate a monolithically integrated neuromorphic photonic circuit with on-chip capacitive analog memory and evaluate its performance in machine learning for in situ training and inference using the MNIST dataset. Our analysis shows that integrating analog memory into a neuromorphic photonic architecture can achieve over 26x power savings compared to conventional SRAM-DAC architectures. Furthermore, maintaining a minimum analog memory retention-to-network-latency ratio of 100 preserves >90% inference accuracy, enabling leaky analog memories without substantial performance degradation. This approach reduces reliance on DACs, minimizes data movement, and offers a scalable pathway toward energy-efficient, high-speed neuromorphic photonic computing.
Cyber-Resilient Data-Driven Event-Triggered Secure Control for Autonomous Vehicles Under False Data Injection Attacks
This paper proposes a cyber-resilient secure control framework for autonomous vehicles (AVs) subject to false data injection (FDI) threats as actuator attacks. The framework integrates data-driven modeling, event-triggered communication, and fractional-order sliding mode control (FSMC) to enhance the resilience against adversarial interventions. A dynamic mode decomposition (DMD)-based methodology is employed to extract the lateral dynamics from real-world data, eliminating the reliance on conventional mechanistic modeling. To optimize communication efficiency, an event-triggered transmission scheme is designed to reduce the redundant transmissions while ensuring system stability. Furthermore, an extended state observer (ESO) is developed for real-time estimation and mitigation of actuator attack effects. Theoretical stability analysis, conducted using Lyapunov methods and linear matrix inequality (LMI) formulations, guarantees exponential error convergence. Extensive simulations validate the proposed event-triggered secure control framework, demonstrating substantial improvements in attack mitigation, communication efficiency, and lateral tracking performance. The results show that the framework effectively counteracts actuator attacks while optimizing communication-resource utilization, making it highly suitable for safety-critical AV applications.
comment: 14 pages, 8 figures
Learning Dissipative Chaotic Dynamics with Boundedness Guarantees
Chaotic dynamics, commonly seen in weather systems and fluid turbulence, are characterized by their sensitivity to initial conditions, which makes accurate prediction challenging. Recent approaches have focused on developing data-driven models that attempt to preserve invariant statistics over long horizons since many chaotic systems exhibit dissipative behaviors and ergodicity. Despite the recent progress in such models, they are still often prone to generating unbounded trajectories, leading to invalid statistics evaluation. To address this fundamental challenge, we introduce a modular framework that provides formal guarantees of trajectory boundedness for neural network chaotic dynamics models. Our core contribution is a dissipative projection layer that leverages control-theoretic principles to ensure the learned system is dissipative. Specifically, our framework simultaneously learns a dynamics emulator and an energy-like function, where the latter is used to construct an algebraic dissipative constraint within the projection layer. Furthermore, the learned invariant level set provides an outer estimate for the system's strange attractor, which is known to be difficult to characterize due to its complex geometry. We demonstrate our model's ability to produce bounded long-horizon forecasts that preserve invariant statistics for chaotic dynamical systems, including Lorenz 96 and a reduced-order model of the Kuramoto-Sivashinsky equation.
Nash Equilibrium Learning In Large Populations With First-Order Payoff Modifications
We establish Nash equilibrium learning in large populations of noncooperative, strategic agents. Our analysis considers the broadest class to date of payoff mechanisms with first-order modifications, capable of modeling bounded rationality and anticipatory effects, averaging, or Padé delay approximations. We propose a framework that, for the first time, combines two nonstandard system-theoretic passivity notions. Our results hold for discontinuous best response dynamics alongside continuous learning rules, significantly extending prior work.
comment: 6 pages, 3 figures
Optimal Rates of Convergence for Entropy Regularization in Discounted Markov Decision Processes
We study the error introduced by entropy regularization in infinite-horizon discrete discounted Markov decision processes. We show that this error decreases exponentially in the inverse regularization strength, both in a weighted KL-divergence and in value with a problem-specific exponent. This is in contrast to previously known estimates, of the order $O(\tau)$, where $\tau$ is the regularization strength. We provide a lower bound that matches our upper bound up to a polynomial term, thereby characterizing the exponential convergence rate for entropy regularization. Our proof relies on the observation that the solutions of entropy-regularized Markov decision processes solve a gradient flow of the unregularized reward with respect to a Riemannian metric common in natural policy gradient methods. This correspondence allows us to identify the limit of this gradient flow as the generalized maximum entropy optimal policy, thereby characterizing the implicit bias of this gradient flow, which corresponds to a time-continuous version of the natural policy gradient method. We use our improved error estimates to show that for entropy-regularized natural policy gradient methods, the overall error decays exponentially in the square root of the number of iterations, improving over existing sublinear guarantees. Finally, we extend our analysis to settings beyond the entropy. In particular, we characterize the implicit bias regarding general convex potentials and their resulting generalized natural policy gradients.
comment: 32 pages, 1 figure
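For context, the standard entropy-regularized objective that the above error estimates refer to, with regularization strength $\tau > 0$ and discount $\gamma \in (0,1)$, is:

```latex
\[
  V_{\tau}^{\pi}(s) \;=\; \mathbb{E}^{\pi}\!\Bigl[\,\sum_{t=0}^{\infty} \gamma^{t}
    \bigl( r(s_t, a_t) + \tau\, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \bigr) \,\Big|\, s_0 = s \Bigr],
  \qquad
  \mathcal{H}(p) = -\sum_{a} p(a) \ln p(a).
\]
% The abstract's claim is that the regularization error between the optimal regularized and
% unregularized values decays like exp(-c/\tau) for a problem-specific c > 0, rather than
% the classical O(\tau) bound.
```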
Inverse Optimal Control for Linear Quadratic Tracking with Unknown Target States
This paper addresses the inverse optimal control for the linear quadratic tracking problem with a fixed but unknown target state, which aims to estimate the possible triplets comprising the target state, the state weight matrix, and the input weight matrix from observed optimal control inputs and the corresponding state trajectories. Sufficient conditions are provided for the unique determination of both the linear quadratic cost function and the target state. A computationally efficient and numerically reliable parameter identification algorithm is proposed by equating optimal control strategies with a system of linear equations, and the associated relative error upper bound is derived in terms of data volume and signal-to-noise ratio. Moreover, the proposed inverse optimal control algorithm is applied for the joint cluster coordination and intent identification of a multi-agent system. By incorporating the structural constraint of the Laplace matrix, the relative error upper bound can be reduced accordingly. Finally, the algorithm's efficiency and accuracy are validated by a vehicle-on-a-lever example and a multi-agent formation control example.
Variational Bayesian Inference for Multiple Extended Targets or Unresolved Group Targets Tracking
In this work, we propose a method for tracking multiple extended targets or unresolvable group targets in a cluttered environment. Firstly, based on the Random Matrix Model (RMM), the joint state of the target is modeled as the Gamma Gaussian Inverse Wishart (GGIW) distribution. Considering the uncertainty of measurement origin caused by clutter, we adopt the idea of probabilistic data association and describe the joint association event as an unknown parameter in the joint prior distribution. Then Variational Bayesian Inference (VBI) is employed to approximately solve the non-analytical posterior distribution. Furthermore, to ensure the practicability of the proposed method, we provide two potential lightweight schemes to reduce its computational complexity. One of them is based on clustering, which effectively prunes the joint association events. The other is a simplification of the variational posterior through marginal association probabilities. Finally, the effectiveness of the proposed method is demonstrated by simulation and real data experiments, and we show that the proposed method outperforms current state-of-the-art methods in terms of accuracy and adaptability.
comment: 21 pages, 14 figures, 4 tables
Quantum Encrypted Control of Networked Systems
Encrypted control has been extensively studied to ensure the confidentiality of system states and control inputs for networked control systems. This paper presents a computationally efficient encrypted control framework for networked systems enabled by quantum communication. A quantum channel between sensors and actuators is used to generate identical secret keys, whose security is further enhanced through quantum key distribution. These keys enable lightweight encryption and decryption while preserving confidentiality and control accuracy. We develop a novel encryption-decryption architecture for state-feedback control of linear systems based on quantum keys, and characterize the impact of quantum state errors on closed-loop stability. In particular, we establish the existence of a critical threshold on intrinsic quantum noise below which stability is guaranteed. In contrast to classical encrypted control schemes, which may collapse under a single key-bit error, the proposed quantum encrypted control exhibits strong robustness to key imperfections. We further adopt quantization techniques to address the scenarios with limited communication bits in practical situations, and implement privacy protection for quantum keys based on a stochastic quantizer. These results demonstrate that integrating quantum technologies into control systems in a nontrivial and principled manner, even at their current level of maturity, can yield substantial performance gains in reducing computational complexity and improving resilience to key errors while ensuring security against multiple eavesdropping sources.
Fixed Horizon Linear Quadratic Covariance Steering in Continuous Time with Hilbert-Schmidt Terminal Cost
We formulate and solve the fixed horizon linear quadratic covariance steering problem in continuous time with a terminal cost measured in Hilbert-Schmidt (i.e., Frobenius) norm error between the desired and the controlled terminal covariances. For this problem, the necessary conditions of optimality become a coupled matrix ODE two-point boundary value problem. To solve this system of equations, we design a matricial recursive algorithm and prove its convergence. The proposed algorithm and its analysis make use of the linear fractional transforms parameterized by the state transition matrix of the associated Hamiltonian matrix. To illustrate the results, we provide two numerical examples: one with a two-dimensional and another with a six-dimensional state space.
AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Soccer Substitutions
In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or predictive models that merely mimic historical biases. This paper introduces a Fuzzy Logic-based Decision Support System (DSS) designed for real-time, prescriptive game management. Unlike traditional Machine Learning approaches that encounter a predictive ceiling by attempting to replicate human behavior, our system audits performance through an objective, rule-based inference engine. We propose a methodological advancement by reformulating the PlayeRank metric into a Cumulative Mean with Role-Aware Normalization, eliminating the play-time exposure bias inherent in cumulative-sum models to enable accurate intra-match comparison. The system integrates this refined metric with physiological proxies (fatigue) and contextual variables (disciplinary risk modulated by tactical role) to calculate a dynamic Substitution Priority (P_final). Validation via a case study of the 2018 FIFA World Cup match between Brazil and Belgium demonstrates the system's ecological validity: it not only aligned with expert consensus on executed substitutions (for example, Gabriel Jesus) but, crucially, identified high-risk scenarios ignored by human decision makers. Specifically, the model flagged the "FAGNER Paradox" - a maximum-priority defensive risk - minutes before a critical yellow card, and detected the "Lukaku Paradox", where an isolated assist masked a severe drop in participation. These results confirm that Fuzzy Logic offers a transparent, explainable, and superior alternative to black-box models for optimizing real-time tactical decisions.
comment: 34 pages, 7 figures
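As a rough, self-contained illustration of the rule-based inference described above (not the authors' membership functions, rule base, or PlayeRank-derived inputs), the following toy sketch combines normalized performance, fatigue, and disciplinary-risk inputs into a substitution-priority score; all thresholds and weights are hypothetical.
```python
# Toy fuzzy-inference sketch for a substitution-priority score.
# All membership functions, rules, and weights are hypothetical; the paper's
# actual PlayeRank-based metric and rule base are not reproduced here.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function on [a, c] peaking at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

def substitution_priority(performance, fatigue, risk):
    """Inputs are assumed normalized to [0, 1]; returns a priority in [0, 1]."""
    low_perf  = tri(performance, -0.2, 0.0, 0.5)
    high_fat  = tri(fatigue,      0.5, 1.0, 1.2)
    high_risk = tri(risk,         0.5, 1.0, 1.2)

    # Hypothetical rule base: each rule fires with a strength (min/max of its
    # antecedents) and recommends a priority level; defuzzify by weighted average.
    rules = [
        (min(low_perf, high_fat),  0.9),   # poor performance and tired -> high priority
        (max(high_risk, high_fat), 0.7),   # at booking risk or tired   -> elevated priority
        (1.0 - max(low_perf, high_fat, high_risk), 0.1),  # otherwise   -> low priority
    ]
    strengths = np.array([s for s, _ in rules])
    levels    = np.array([p for _, p in rules])
    return float((strengths * levels).sum() / (strengths.sum() + 1e-9))

print(substitution_priority(performance=0.3, fatigue=0.8, risk=0.9))
```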
Robotics
AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis
The collection of large-scale and diverse robot demonstrations remains a major bottleneck for imitation learning, as real-world data acquisition is costly and simulators offer limited diversity and fidelity with pronounced sim-to-real gaps. While generative models present an attractive solution, existing methods often alter only visual appearances without creating new behaviors, or suffer from embodiment inconsistencies that yield implausible motions. To address these limitations, we introduce AnchorDream, an embodiment-aware world model that repurposes pretrained video diffusion models for robot data synthesis. AnchorDream conditions the diffusion process on robot motion renderings, anchoring the embodiment to prevent hallucination while synthesizing objects and environments consistent with the robot's kinematics. Starting from only a handful of human teleoperation demonstrations, our method scales them into large, diverse, high-quality datasets without requiring explicit environment modeling. Experiments show that the generated data leads to consistent improvements in downstream policy learning, with relative gains of 36.4% in simulator benchmarks and nearly double performance in real-world studies. These results suggest that grounding generative world models in robot motion provides a practical path toward scaling imitation learning.
comment: Project page: https://jay-ye.github.io/AnchorDream/
Toward a Decision Support System for Energy-Efficient Ferry Operation on Lake Constance based on Optimal Control
The maritime sector is undergoing a disruptive technological change driven by three main factors: autonomy, decarbonization, and digital transformation. Addressing these factors necessitates a reassessment of inland vessel operations. This paper presents the design and development of a decision support system for ferry operations based on a shrinking-horizon optimal control framework. The problem formulation incorporates a mathematical model of the ferry's dynamics and environmental disturbances, specifically water currents and wind, which can significantly influence the dynamics. Real-world data and illustrative scenarios demonstrate the potential of the proposed system to effectively support ferry crews by providing real-time guidance. This enables enhanced operational efficiency while maintaining predefined maneuver durations. The findings suggest that optimal control applications hold substantial promise for advancing future ferry operations on inland waters. A video of the real-world ferry MS Insel Mainau operating on Lake Constance is available at: https://youtu.be/i1MjCdbEQyE
comment: 6 pages, 8 figures
Agile Flight Emerges from Multi-Agent Competitive Racing
Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, e.g., progress on the raceline, in particular when the complexity of the environment increases, e.g., in the presence of obstacles. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, despite the two methods using the same simulation environment, randomization strategy, and hardware. In addition to improved sim-to-real transfer, the multi-agent policies also exhibit some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world. Code: https://github.com/Jirl-upenn/AgileFlight_MultiAgent
ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics
Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE. Our approach utilizes an ensemble of MDE models to predict dense depth maps conditioned on both RGB images and on a sparse set of known depth measurements obtained via proprioception, where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and measure the gradient of the uncertainty with respect to candidate measurement locations. To prevent mode collapse while selecting maximally informative locations to propriocept (touch), we leverage Stein Variational Gradient Descent (SVGD) over this gradient map. We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements.
comment: 9 pages, 5 figures
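The Stein Variational Gradient Descent step mentioned above can be sketched in a few lines; here the uncertainty gradient is replaced by a synthetic stand-in (grad_log_u, a quadratic bowl), not the paper's ensemble-variance gradient, so this only illustrates how the kernel term keeps candidate probe locations from collapsing onto a single point.
```python
# Minimal SVGD sketch: move K candidate probe locations toward regions of high
# (synthetic) "uncertainty" while a kernel repulsion term keeps them spread out.
# A real system would replace grad_log_u with the gradient of ensemble variance
# with respect to the measurement location.
import numpy as np

def grad_log_u(x):
    """Stand-in score gradient: a quadratic bowl centered at (1, 1)."""
    return -(x - np.array([1.0, 1.0]))

def rbf_kernel(X, h=0.5):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h ** 2))
    # Repulsive SVGD term: for each particle, the summed kernel gradients
    # taken with respect to the other particles' positions.
    gradK = -(X[:, None, :] - X[None, :, :]) / h ** 2 * K[..., None]
    return K, gradK.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))          # 8 candidate probe locations in 2D
for _ in range(200):                 # SVGD iterations
    K, gK = rbf_kernel(X)
    phi = (K @ grad_log_u(X) + gK) / len(X)
    X += 0.1 * phi                   # step size 0.1
print(X.round(2))                    # particles cluster near (1, 1) but stay spread out
```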
BLURR: A Boosted Low-Resource Inference for Vision-Language-Action Models
Vision-language-action (VLA) models enable impressive zero-shot manipulation, but their inference stacks are often too heavy for responsive web demos or high-frequency robot control on commodity GPUs. We present BLURR, a lightweight inference wrapper that can be plugged into existing VLA controllers without retraining or changing model checkpoints. Instantiated on the pi-zero VLA controller, BLURR keeps the original observation interfaces and accelerates control by combining an instruction-prefix key-value cache, mixed-precision execution, and a single-step rollout schedule that reduces per-step computation. In our SimplerEnv-based evaluation, BLURR maintains task success rates comparable to the original controller while significantly lowering effective FLOPs and wall-clock latency. We also build an interactive web demo that allows users to switch between controllers and toggle inference options in real time while watching manipulation episodes. This highlights BLURR as a practical approach for deploying modern VLA policies under tight compute budgets.
comment: 10 pages, 3 figures. Code and integration scripts will be released at https://github.com/JijiKing-Sam/BLURR-A-Boosted-Low-Resource-Inference-for-Vision-Language-Action-Model
The Influence of Human-like Appearance on Expected Robot Explanations
A robot's appearance is a known factor influencing users' mental models and human-robot interaction, but it has not been studied in the context of its influence on expected robot explanations. In this study, we investigate whether and to what extent the human-like appearance of robots elicits anthropomorphism, which is conceptualised as an attribution of mental capacities, and how the level of anthropomorphism is revealed in explanations that people expect to receive. We designed a between-subjects study comprising conditions with visual stimuli of three domestic service robots with varying human-like appearance, and we prompted respondents to provide explanations they would expect to receive from the robot for the same robot actions. We found that most explanations were anthropomorphic across all conditions. However, there is a positive correlation between anthropomorphic explanations and human-like appearance. We also report on more nuanced trends observed in non-anthropomorphic explanations and trends in robot descriptions.
Bench-Push: Benchmarking Pushing-based Navigation and Manipulation Tasks for Mobile Robots ICRA 2026
Mobile robots are increasingly deployed in cluttered environments with movable objects, posing challenges for traditional methods that prohibit interaction. In such settings, the mobile robot must go beyond traditional obstacle avoidance, leveraging pushing or nudging strategies to accomplish its goals. While research in pushing-based robotics is growing, evaluations rely on ad hoc setups, limiting reproducibility and cross-comparison. To address this, we present Bench-Push, the first unified benchmark for pushing-based mobile robot navigation and manipulation tasks. Bench-Push includes multiple components: 1) a comprehensive range of simulated environments that capture the fundamental challenges in pushing-based tasks, including navigating a maze with movable obstacles, autonomous ship navigation in ice-covered waters, box delivery, and area clearing, each with varying levels of complexity; 2) novel evaluation metrics to capture efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-Push to evaluate example implementations of established baselines across environments. Bench-Push is open-sourced as a Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.
comment: Under review for ICRA 2026
Two-dimensional Decompositions of High-dimensional Configurations for Efficient Multi-vehicle Coordination at Intelligent Intersections
For multi-vehicle complex traffic scenarios in shared spaces such as intelligent intersections, safe coordination and trajectory planning are challenging due to computational complexity. To meet this challenge, we introduce a computationally efficient method for generating collision-free trajectories along predefined vehicle paths. We reformulate a constrained minimum-time trajectory planning problem as a problem in a high-dimensional configuration space, where conflict zones are modeled by high-dimensional polyhedra constructed from two-dimensional rectangles. Still, in such a formulation, as the number of vehicles involved increases, the computational complexity increases significantly. To address this, we propose two algorithms for near-optimal local optimization that significantly reduce the computational complexity by decomposing the high-dimensional problem into a sequence of 2D graph search problems. The resulting trajectories are then incorporated into a Nonlinear Model Predictive Control (NMPC) framework to ensure safe and smooth vehicle motion. We furthermore show in numerical evaluation that this approach significantly outperforms existing MILP-based time scheduling, both in terms of objective value and computation time.
Architecting Large Action Models for Human-in-the-Loop Intelligent Robots
The realization of intelligent robots, operating autonomously and interacting with other intelligent agents, human or artificial, requires the integration of environment perception, reasoning, and action. Classic Artificial Intelligence techniques for this purpose, focusing on symbolic approaches, have long ago hit the scalability wall on compute and memory costs. Advances in Large Language Models in the past decade (neural approaches) have resulted in unprecedented displays of capability, at the cost of control, explainability, and interpretability. Large Action Models aim at extending Large Language Models to encompass the full perception, reasoning, and action cycle; however, they typically require substantially more comprehensive training and suffer from the same deficiencies in reliability. Here, we show it is possible to build competent Large Action Models by composing off-the-shelf foundation models, and that their control, interpretability, and explainability can be effected by incorporating symbolic wrappers and associated verification on their outputs, achieving verifiable neuro-symbolic solutions for intelligent robots. Our experiments on a multi-modal robot demonstrate that Large Action Model intelligence does not require massive end-to-end training, but can be achieved by integrating efficient perception models with a logic-driven core. We find that driving action execution through the generation of Planning Domain Definition Language (PDDL) code enables a human-in-the-loop verification stage that effectively mitigates action hallucinations. These results can support practitioners in the design and development of robotic Large Action Models across novel industries, and shed light on the ongoing challenges that must be addressed to ensure safety in the field.
UniBYD: A Unified Framework for Learning Robotic Manipulation Across Embodiments Beyond Imitation of Human Demonstrations
In embodied intelligence, the embodiment gap between robotic and human hands brings significant challenges for learning from human demonstrations. Although some studies have attempted to bridge this gap using reinforcement learning, they remain confined to merely reproducing human manipulation, resulting in limited task performance. In this paper, we propose UniBYD, a unified framework that uses a dynamic reinforcement learning algorithm to discover manipulation policies aligned with the robot's physical characteristics. To enable consistent modeling across diverse robotic hand morphologies, UniBYD incorporates a unified morphological representation (UMR). Building on UMR, we design a dynamic PPO with an annealed reward schedule, enabling reinforcement learning to transition from imitating human demonstrations to exploring policies better adapted to diverse robotic morphologies, thereby going beyond mere imitation of human hands. To address the frequent failures of learning human priors in the early training stage, we design a hybrid Markov-based shadow engine that enables reinforcement learning to imitate human manipulations in a fine-grained manner. To evaluate UniBYD comprehensively, we propose UniManip, the first benchmark encompassing robotic manipulation tasks spanning multiple hand morphologies. Experiments demonstrate a 67.90% improvement in success rate over the current state-of-the-art. Upon acceptance of the paper, we will release our code and benchmark at https://github.com/zhanheng-creator/UniBYD.
Atomic Action Slicing: Planner-Aligned Options for Generalist VLA Agents
Current vision-language-action (VLA) models generalize poorly, particularly when tasks require new compositions of skills or objects. We introduce Atomic Action Slicing (AAS), a planner-aligned approach that decomposes long-horizon demonstrations into short, typed atomic actions that are easier for planners to use and policies to learn. Using LIBERO demonstrations, AAS produces a validated dataset of 2,124 atomic segments labeled with action type, temporal span, and confidence. A stronger segmenter (Gemini 2.5 Pro) closely matches planner-defined plans and remains robust under keyframe jitter, while smaller models perform worse on multi-object tasks. Fine-tuning CLIP-RT+ on our atomic dataset improves task success from 94.2% to 95.3% on LIBERO-Goal and 83.8% to 88.8% on LIBERO-Long. We publicly release the GATE-VLAP dataset on HuggingFace (https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets).
comment: The 41st ACM/SIGAPP Symposium On Applied Computing
Cross-Entropy Optimization of Physically Grounded Task and Motion Plans
Autonomously performing tasks often requires robots to plan high-level discrete actions and continuous low-level motions to realize them. Previous TAMP algorithms have focused mainly on computational performance, completeness, or optimality by making the problem tractable through simplifications and abstractions. However, this comes at the cost of the resulting plans potentially failing to account for the dynamics or complex contacts necessary to reliably perform the task when object manipulation is required. Additionally, approaches that ignore effects of the low-level controllers may not obtain optimal or feasible plan realizations for the real system. We investigate the use of a GPU-parallelized physics simulator to compute realizations of plans with motion controllers, explicitly accounting for dynamics, and considering contacts with the environment. Using cross-entropy optimization, we sample the parameters of the controllers, or actions, to obtain low-cost solutions. Since our approach uses the same controllers as the real system, the robot can directly execute the computed plans. We demonstrate our approach for a set of tasks where the robot is able to exploit the environment's geometry to move an object. Website and code: https://andreumatoses.github.io/research/parallel-realization
comment: Preprint
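A generic cross-entropy optimization loop of the kind referenced above looks as follows; rollout_cost is a placeholder for the GPU-parallelized physics rollouts the paper uses, and the parameter dimension and hyperparameters are arbitrary.
```python
# Generic cross-entropy method (CEM) sketch over controller parameters.
# rollout_cost is a placeholder; the paper evaluates candidates by rolling out
# actual motion controllers in a GPU-parallelized physics simulator.
import numpy as np

def rollout_cost(params):
    """Hypothetical cost: distance of a 4-D parameter vector to a target setting."""
    target = np.array([0.5, -1.0, 2.0, 0.0])
    return np.sum((params - target) ** 2, axis=-1)

rng = np.random.default_rng(0)
mean, std = np.zeros(4), np.ones(4) * 2.0
n_samples, n_elite = 64, 8

for it in range(30):
    candidates = rng.normal(mean, std, size=(n_samples, 4))
    costs = rollout_cost(candidates)
    elite = candidates[np.argsort(costs)[:n_elite]]            # keep lowest-cost samples
    mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3   # refit the sampling distribution

print(mean.round(3))   # converges near the target parameters
```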
CarlaNCAP: A Framework for Quantifying the Safety of Vulnerable Road Users in Infrastructure-Assisted Collective Perception Using EuroNCAP Scenarios
The growing number of road users has significantly increased the risk of accidents in recent years. Vulnerable Road Users (VRUs) are particularly at risk, especially in urban environments where they are often occluded by parked vehicles or buildings. Autonomous Driving (AD) and Collective Perception (CP) are promising solutions to mitigate these risks. In particular, infrastructure-assisted CP, where sensor units are mounted on infrastructure elements such as traffic lights or lamp posts, can help overcome perceptual limitations by providing enhanced points of view, which significantly reduces occlusions. To encourage decision makers to adopt this technology, comprehensive studies and datasets demonstrating safety improvements for VRUs are essential. In this paper, we propose a framework for evaluating the safety improvement from infrastructure-based CP specifically targeted at VRUs, including a dataset of safety-critical EuroNCAP scenarios (CarlaNCAP) with 11k frames. Using this dataset, we conduct an in-depth simulation study and demonstrate that infrastructure-assisted CP can significantly reduce accident rates in safety-critical scenarios, achieving up to 100% accident avoidance, compared to only 33% for a vehicle relying solely on its onboard sensors. Code is available at https://github.com/ekut-es/carla_ncap
Mirror Skin: In Situ Visualization of Robot Touch Intent on Robotic Skin
Effective communication of robotic touch intent is a key factor in promoting safe and predictable physical human-robot interaction (pHRI). While intent communication has been widely studied, existing approaches lack the spatial specificity and semantic depth necessary to convey robot touch actions. We present Mirror Skin, a cephalopod-inspired concept that utilizes high-resolution, mirror-like visual feedback on robotic skin. By mapping in-situ visual representations of a human's body parts onto the corresponding robot's touch region, Mirror Skin communicates who shall initiate touch, where it will occur, and when it is imminent. To inform the design of Mirror Skin, we conducted a structured design exploration with experts in virtual reality (VR), iteratively refining six key dimensions. A subsequent controlled user study demonstrated that Mirror Skin significantly enhances accuracy and reduces response times for interpreting touch intent. These findings highlight the potential of visual feedback on robotic skin to communicate human-robot touch interactions.
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges
Vision-Language-Action (VLA) models are driving a revolution in robotics, enabling machines to understand instructions and interact with the physical world. This field is exploding with new models and datasets, making it both exciting and challenging to keep pace with. This survey offers a clear and structured guide to the VLA landscape. We design it to follow the natural learning path of a researcher: we start with the basic Modules of any VLA model, trace the history through key Milestones, and then dive deep into the core Challenges that define the recent research frontier. Our main contribution is a detailed breakdown of the five biggest challenges: (1) Representation, (2) Execution, (3) Generalization, (4) Safety, and (5) Dataset and Evaluation. This structure mirrors the developmental roadmap of a generalist agent: establishing the fundamental perception-action loop, scaling capabilities across diverse embodiments and environments, and finally ensuring trustworthy deployment, all supported by the essential data infrastructure. For each of them, we review existing approaches and highlight future opportunities. We position this paper as both a foundational guide for newcomers and a strategic roadmap for experienced researchers, with the dual aim of accelerating learning and inspiring new ideas in embodied intelligence. A live version of this survey, with continuous updates, is maintained on our project page: https://suyuz1.github.io/Survery/
Incremental Validation of Automated Driving Functions using Generic Volumes in Micro-Operational Design Domains
The validation of highly automated, perception-based driving systems must ensure that they function correctly under the full range of real-world conditions. Scenario-based testing is a prominent approach to addressing this challenge, as it involves the systematic simulation of objects and environments. Operational Design Domains (ODDs) are usually described using a taxonomy of qualitative designations for individual objects. However, the process of transitioning from taxonomy to concrete test cases remains unstructured, and completeness is theoretical. This paper introduces a structured method of subdividing the ODD into manageable sections, termed micro-ODDs (mODDs), and deriving test cases with abstract object representations. This concept is demonstrated using a one-dimensional, laterally guided manoeuvre involving a shunting locomotive within a constrained ODD. In this example, mODDs are defined and refined into narrow taxonomies that enable test case generation. Obstacles are represented as generic cubes of varying sizes, providing a simplified yet robust means of evaluating perception performance. A series of tests were conducted in a closed-loop, co-simulated virtual environment featuring photorealistic rendering and simulated LiDAR, GNSS and camera sensors. The results demonstrate how edge cases in obstacle detection can be systematically explored and how perception quality can be evaluated based on observed vehicle behaviour, using crash versus safe stop as the outcome metrics. These findings support the development of a standardised framework for safety argumentation and offer a practical step towards the validation and authorisation of automated driving functions.
Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits
Equivariant diffusion policies (EDPs) combine the generative expressivity of diffusion models with the strong generalization and sample efficiency afforded by geometric symmetries. While steering these policies with reinforcement learning (RL) offers a promising mechanism for fine-tuning beyond demonstration data, directly applying standard (non-equivariant) RL can be sample-inefficient and unstable, as it ignores the symmetries that EDPs are designed to exploit. In this paper, we theoretically establish that the diffusion process of an EDP is equivariant, which in turn induces a group-invariant latent-noise MDP that is well-suited for equivariant diffusion steering. Building on this theory, we introduce a principled symmetry-aware steering framework and compare standard, equivariant, and approximately equivariant RL strategies through comprehensive experiments across tasks with varying degrees of symmetry. While we identify the practical boundaries of strict equivariance under symmetry breaking, we show that exploiting symmetry during the steering process yields substantial benefits: enhancing sample efficiency, preventing value divergence, and achieving strong policy improvements even when EDPs are trained from extremely limited demonstrations.
Towards Logic-Aware Manipulation: A Knowledge Primitive for VLM-Based Assistants in Smart Manufacturing
Existing pipelines for vision-language models (VLMs) in robotic manipulation prioritize broad semantic generalization from images and language, but typically omit execution-critical parameters required for contact-rich actions in manufacturing cells. We formalize an object-centric manipulation-logic schema, serialized as an eight-field tuple τ, which exposes object, interface, trajectory, tolerance, and force/impedance information as a first-class knowledge signal between human operators, VLM-based assistants, and robot controllers. We instantiate τ and a small knowledge base (KB) on a 3D-printer spool-removal task in a collaborative cell, and analyze τ-conditioned VLM planning using plan-quality metrics adapted from recent VLM/LLM planning benchmarks, while demonstrating how the same schema supports taxonomy-tagged data augmentation at training time and logic-aware retrieval-augmented prompting at test time as a building block for assistant systems in smart manufacturing enterprises.
comment: 8 pages, 2 figures, submitted to the 2026 IFAC World Congress
Optimal Control and Structurally-Informed Gradient Optimization of a Custom 4-DOF Rigid-Body Manipulator
This work develops a control-centric framework for a custom 4-DOF rigid-body manipulator by coupling a reduced-order Pontryagin's Maximum Principle (PMP) controller with a physics-informed Gradient Descent stage. The reduced PMP model provides a closed-form optimal control law for the joint accelerations, while the Gradient Descent module determines the corresponding time horizons by minimizing a cost functional built directly from the full Rigid-Body Dynamics. Structural-mechanics reaction analysis is used only to initialize feasible joint velocities, most critically the azimuthal component, ensuring that the optimizer begins in a physically admissible region. The resulting kinematic trajectories and dynamically consistent time horizons are then supplied to the symbolic Euler-Lagrange model to yield closed-form inverse-dynamics inputs. This pipeline preserves a strict control-theoretic structure while embedding the physical constraints and loading behavior of the manipulator in a computationally efficient way.
comment: 6 pages + 18 page appendix
Elevation Aware 2D/3D Co-simulation Framework for Large-scale Traffic Flow and High-fidelity Vehicle Dynamics
Reliable testing of autonomous driving systems requires simulation environments that combine large-scale traffic modeling with realistic 3D perception and terrain. Existing tools rarely capture real-world elevation, limiting their usefulness in cities with complex topography. This paper presents an automated, elevation-aware co-simulation framework that integrates SUMO with CARLA using a pipeline that fuses OpenStreetMap road networks and USGS elevation data into physically consistent 3D environments. The system generates smooth elevation profiles, validates geometric accuracy, and enables synchronized 2D-3D simulation across platforms. Demonstrations on multiple regions of San Francisco show the framework's scalability and ability to reproduce steep and irregular terrain. The result is a practical foundation for high-fidelity autonomous vehicle testing in realistic, elevation-rich urban settings.
Seeing to Act, Prompting to Specify: A Bayesian Factorization of Vision Language Action Policy
The pursuit of out-of-distribution generalization in Vision-Language-Action (VLA) models is often hindered by catastrophic forgetting of the Vision-Language Model (VLM) backbone during fine-tuning. While co-training with external reasoning data helps, it requires experienced tuning and data-related overhead. Beyond such external dependencies, we identify an intrinsic cause within VLA datasets: modality imbalance, where language diversity is much lower than visual and action diversity. This imbalance biases the model toward visual shortcuts and language forgetting. To address this, we introduce BayesVLA, a Bayesian factorization that decomposes the policy into a visual-action prior, supporting seeing-to-act, and a language-conditioned likelihood, enabling prompt-to-specify. This inherently preserves generalization and promotes instruction following. We further incorporate pre- and post-contact phases to better leverage pre-trained foundation models. Information-theoretic analysis formally validates our effectiveness in mitigating shortcut learning. Extensive experiments show superior generalization to unseen instructions, objects, and environments compared to existing methods. Project page is available at: https://xukechun.github.io/papers/BayesVLA.
A Stochastic Approach to Terrain Maps for Safe Lunar Landing
Safely landing on the lunar surface is a challenging task, especially in the heavily-shadowed South Pole region where traditional vision-based hazard detection methods are not reliable. The potential existence of valuable resources at the lunar South Pole has made landing in that region a high priority for many space agencies and commercial companies. However, relying on a LiDAR for hazard detection during descent is risky, as this technology is fairly untested in the lunar environment. There exists a rich log of lunar surface data from the Lunar Reconnaissance Orbiter (LRO), which could be used to create informative prior maps of the surface before descent. In this work, we propose a method for generating stochastic elevation maps from LRO data using Gaussian processes (GPs), which are a powerful Bayesian framework for non-parametric modeling that produce accompanying uncertainty estimates. In high-risk environments such as autonomous spaceflight, interpretable estimates of terrain uncertainty are critical. However, no previous approaches to stochastic elevation mapping have taken LRO Digital Elevation Model (DEM) confidence maps into account, despite this data containing key information about the quality of the DEM in different areas. To address this gap, we introduce a two-stage GP model in which a secondary GP learns spatially varying noise characteristics from DEM confidence data. This heteroscedastic information is then used to inform the noise parameters for the primary GP, which models the lunar terrain. Additionally, we use stochastic variational GPs to enable scalable training. By leveraging GPs, we are able to more accurately model the impact of heteroscedastic sensor noise on the resulting elevation map. As a result, our method produces more informative terrain uncertainty, which can be used for downstream tasks such as hazard detection and safe landing site selection.
comment: Accepted to IEEE Aerospace 2026
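A minimal 1-D sketch of heteroscedastic GP regression conveys the core idea: a per-point noise variance (here a hand-coded stand-in for the secondary GP fit to DEM confidence data) enters the diagonal of the training covariance, so low-confidence regions inflate the posterior uncertainty of the terrain model. The kernel, lengthscale, and confidence pattern below are all assumptions.
```python
# Minimal heteroscedastic GP regression sketch in 1-D. A per-point noise
# variance (a hand-coded function standing in for a secondary GP fit to DEM
# confidence data) enters the diagonal of the training covariance, so
# low-confidence regions widen the posterior uncertainty of the terrain GP.
import numpy as np

def rbf(A, B, length=0.5, var=1.0):
    d2 = (A[:, None] - B[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 40)
noise_var = 0.01 + 0.3 * (X > 3)              # stand-in: poor DEM confidence for x > 3
y = np.sin(X) + rng.normal(0, np.sqrt(noise_var))

Xs = np.linspace(0, 5, 200)                   # query locations
K = rbf(X, X) + np.diag(noise_var)            # heteroscedastic noise on the diagonal
Ks = rbf(Xs, X)
mean = Ks @ np.linalg.solve(K, y)
cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
std = np.sqrt(np.clip(np.diag(cov), 0, None)) # wider where confidence is low
print(std[:5].round(3), std[-5:].round(3))
```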
Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.
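One plausible way to impose an Eikonal-style constraint on a learned goal-conditioned distance is a soft penalty on the gradient norm, sketched below with a placeholder network; this is not the paper's quasimetric parametrization or full objective.
```python
# Sketch of an Eikonal-style penalty: encourage a learned distance d(s, g) to
# satisfy ||grad_s d(s, g)|| ~= 1 on sampled states and goals. The network,
# dimensions, and loss weighting are placeholders, not the paper's architecture.
import torch
import torch.nn as nn

class DistanceNet(nn.Module):
    def __init__(self, dim=4):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, s, g):
        return self.net(torch.cat([s, g], dim=-1)).squeeze(-1)

d = DistanceNet()
s = torch.randn(128, 4, requires_grad=True)   # sampled states
g = torch.randn(128, 4)                        # sampled goals

dist = d(s, g)
grad_s = torch.autograd.grad(dist.sum(), s, create_graph=True)[0]
eikonal_loss = ((grad_s.norm(dim=-1) - 1.0) ** 2).mean()   # ||grad_s d|| should be ~1
print(float(eikonal_loss))
```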
Modified Hybrid A* Collision-Free Path-Planning for Automated Reverse Parking
Parking a vehicle in tight spaces is a challenging task to perform due to the scarcity of feasible paths that are also collision-free. This paper presents a strategy to tackle this kind of maneuver with a modified Hybrid A* path-planning algorithm that combines the feasibility guarantee inherent in the standard Hybrid A* algorithm with the addition of static obstacle collision avoidance. A kinematic single-track model is derived to describe the low-speed motion of the vehicle, which is subsequently used as the motion model in the Hybrid A* path-planning algorithm to generate feasible motion primitive branches. The model states are also used to reconstruct the vehicle centerline, which, in conjunction with an inflated binary occupancy map, facilitates static obstacle collision avoidance functions. A simulation study and animation are set up to test the efficacy of the approach, and the proposed algorithm consistently provides kinematically feasible trajectories that are also collision-free.
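For illustration, the kinematic single-track expansion step used by Hybrid A*-style planners can be sketched as follows; the wheelbase, arc-length step, and steering set are hypothetical, and the occupancy-map collision check is omitted.
```python
# Kinematic single-track (bicycle) motion-primitive sketch of the kind used to
# expand nodes in a Hybrid A* search. Wheelbase, step length, and steering set
# are hypothetical; collision checking against the occupancy map is omitted.
import numpy as np

WHEELBASE = 2.7   # [m]
STEP      = 0.5   # [m] arc length per primitive
STEERS    = np.deg2rad([-30, -15, 0, 15, 30])

def expand(state, direction=+1):
    """Return successor states (x, y, heading) for forward (+1) or reverse (-1) motion."""
    x, y, th = state
    children = []
    for delta in STEERS:
        ds = direction * STEP
        th_new = th + ds * np.tan(delta) / WHEELBASE
        x_new  = x + ds * np.cos(th)
        y_new  = y + ds * np.sin(th)
        children.append((x_new, y_new, th_new))
    return children

print(expand((0.0, 0.0, 0.0), direction=-1))   # reverse primitives from the origin
```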
Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus
The development of robust Autonomous Vehicles (AVs) is bottlenecked by the scarcity of "Long-Tail" training data. While fleets collect petabytes of video logs, identifying rare safety-critical events (e.g., erratic jaywalking, construction diversions) remains a manual, cost-prohibitive process. Existing solutions rely on coarse metadata search, which lacks precision, or cloud-based VLMs, which are privacy-invasive and expensive. We introduce Semantic-Drive, a local-first, neuro-symbolic framework for semantic data mining. Our approach decouples perception into two stages: (1) Symbolic Grounding via a real-time open-vocabulary detector (YOLOE) to anchor attention, and (2) Cognitive Analysis via a Reasoning VLM that performs forensic scene analysis. To mitigate hallucination, we implement a "System 2" inference-time alignment strategy, utilizing a multi-model "Judge-Scout" consensus mechanism. Benchmarked on the nuScenes dataset against the Waymo Open Dataset (WOD-E2E) taxonomy, Semantic-Drive achieves a Recall of 0.966 (vs. 0.475 for CLIP) and reduces Risk Assessment Error by 40% compared to single models. The system runs entirely on consumer hardware (NVIDIA RTX 3090), offering a privacy-preserving alternative to the cloud.
Taylor-Lagrange Control for Safety-Critical Systems
This paper proposes a novel Taylor-Lagrange Control (TLC) method for nonlinear control systems to ensure safety and stability through Taylor's theorem with the Lagrange remainder. To achieve this, we expand a safety or stability function with respect to time along the system dynamics using the Lie derivative and Taylor's theorem. This expansion enables the control input to appear in the Taylor series at an order equivalent to the relative degree of the function. We show that the proposed TLC provides necessary and sufficient conditions for system safety and is applicable to systems and constraints of arbitrary relative degree. The TLC exhibits connections with existing Control Barrier Function (CBF) and Control Lyapunov Function (CLF) methods, and it further extends the CBF and CLF methods to the complex domain, especially for higher-order cases. Compared to High-Order CBFs (HOCBFs), TLC is less restrictive, as it does not require forward invariance of the intersection of a set of safe sets, while HOCBFs do. We employ TLC to reformulate a constrained optimal control problem as a sequence of quadratic programs with a zero-order hold implementation method, and demonstrate the safety of zero-order hold TLC using an event-triggered control method to address inter-sampling effects. Finally, we illustrate the effectiveness of the proposed TLC method through an adaptive cruise control system and a robot control problem, and compare it with existing CBF methods.
comment: 13 pages
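A generic form of the expansion the abstract refers to, for a function h of relative degree m along dynamics dx/dt = f(x) + g(x)u, is sketched below; the paper's concrete TLC conditions built on the remainder term are not reproduced.
```latex
% Taylor expansion of h along the closed-loop dynamics with Lagrange remainder;
% the control input u enters through the m-th order term when h has relative
% degree m. This is a generic statement, not the paper's TLC condition itself.
\[
  h\big(x(t+\Delta t)\big) = h\big(x(t)\big)
    + \sum_{k=1}^{m-1}\frac{\Delta t^{k}}{k!}\,L_f^{k}h\big(x(t)\big)
    + \frac{\Delta t^{m}}{m!}\Big(L_f^{m}h + L_gL_f^{m-1}h\,u\Big)\Big|_{x(\xi)},
  \qquad \xi\in(t,\,t+\Delta t).
\]
```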
A Review of Learning-Based Motion Planning: Toward a Data-Driven Optimal Control Approach
Motion planning for high-level autonomous driving is constrained by a fundamental trade-off between the transparent, yet brittle, nature of pipeline methods and the adaptive, yet opaque, "black-box" characteristics of modern learning-based systems. This paper critically synthesizes the evolution of the field -- from pipeline methods through imitation learning, reinforcement learning, and generative AI -- to demonstrate how this persistent dilemma has hindered the development of truly trustworthy systems. To resolve this impasse, we conduct a comprehensive review of learning-based motion planning methods. Based on this review, we outline a data-driven optimal control paradigm as a unifying framework that synergistically integrates the verifiable structure of classical control with the adaptive capacity of machine learning, leveraging real-world data to continuously refine key components such as system dynamics, cost functions, and safety constraints. We explore this framework's potential to enable three critical next-generation capabilities: "Human-Centric" customization, "Platform-Adaptive" dynamics adaptation, and "System Self-Optimization" via self-tuning. We conclude by proposing future research directions based on this paradigm, aimed at developing intelligent transportation systems that are simultaneously safe, interpretable, and capable of human-like autonomy.
comment: 34 pages, 11 figures
Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge
The 2025 BEHAVIOR Challenge is designed to rigorously track progress toward solving long-horizon tasks by physical agents in simulated environments. BEHAVIOR-1K focuses on everyday household tasks that people most want robots to assist with, and these tasks introduce long-horizon mobile manipulation challenges in realistic settings, bridging the gap between current research and real-world, human-centric applications. This report presents our solution to the 2025 BEHAVIOR Challenge, which finished in a very close 2nd place and substantially outperformed the rest of the submissions. Building on $\pi_{0.5}$, we focus on systematically building our solution by studying the effects of training techniques and data. Through careful ablations, we show the scaling power of pre-training and post-training phases for competitive performance. We summarize our practical lessons and design recommendations that we hope will provide actionable insights for the broader embodied AI community when adapting powerful foundation models to complex embodied scenarios.
comment: preprint
Decoupled Q-Chunking
Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to bootstrapping bias, where the errors in the value targets accumulate across steps and result in biased value estimates. Recent work has proposed to use chunked critics, which estimate the value of short action sequences ("chunks") rather than individual actions, speeding up value backup. However, extracting policies from chunked critics is challenging: policies must output the entire action chunk open-loop, which can be sub-optimal for environments that require policy reactivity and also challenging to model especially when the chunk length grows. Our key insight is to decouple the chunk length of the critic from that of the policy, allowing the policy to operate over shorter action chunks. We propose a novel algorithm that achieves this by optimizing the policy against a distilled critic for partial action chunks, constructed by optimistically backing up from the original chunked critic to approximate the maximum value achievable when a partial action chunk is extended to a complete one. This design retains the benefits of multi-step value propagation while sidestepping both the open-loop sub-optimality and the difficulty of learning action chunking policies for long action chunks. We evaluate our method on challenging, long-horizon offline goal-conditioned tasks and show that it reliably outperforms prior methods. Code: github.com/ColinQiyangLi/dqc.
comment: 76 pages, 14 figures
From "Thumbs Up" to "10 out of 10": Reconsidering Scalar Feedback in Interactive Reinforcement Learning
Learning from human feedback is an effective way to improve robotic learning in exploration-heavy tasks. Compared to the wide application of binary human feedback, scalar human feedback has been used less because it is believed to be noisy and unstable. In this paper, we compare scalar and binary feedback, and demonstrate that scalar feedback benefits learning when properly handled. We collected binary or scalar feedback respectively from two groups of crowdworkers on a robot task. We found that when considering how consistently a participant labeled the same data, scalar feedback led to less consistency than binary feedback; however, the difference vanishes if small mismatches are allowed. Additionally, scalar and binary feedback show no significant differences in their correlations with key Reinforcement Learning targets. We then introduce Stabilizing TEacher Assessment DYnamics (STEADY) to improve learning from scalar feedback. Based on the idea that scalar feedback is multi-distributional, STEADY reconstructs underlying positive and negative feedback distributions and rescales scalar feedback based on feedback statistics. We show that models trained with scalar feedback + STEADY outperform baselines, including binary feedback and raw scalar feedback, in a robot reaching task with non-expert human feedback. Our results show that both binary feedback and scalar feedback are dynamic, and scalar feedback is a promising signal for use in interactive Reinforcement Learning.
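The rescaling idea can be illustrated with a toy example: split raw scalar feedback into assumed positive and negative sub-populations and normalize each sample against its sub-population statistics. This is only a rough illustration of the underlying idea, not the STEADY procedure itself, which is specified in the paper.
```python
# Toy illustration of rescaling scalar human feedback using statistics of
# (assumed) positive and negative feedback sub-populations. Not the STEADY
# algorithm itself; all values and the split heuristic are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw scalar feedback on a 0-10 scale from two latent modes.
raw = np.concatenate([rng.normal(3.0, 1.5, 200), rng.normal(8.0, 1.0, 200)])

# Split around the overall mean as a crude proxy for the two latent distributions,
# then center each sample on its sub-population mean and rescale.
split = raw.mean()
lo, hi = raw[raw < split], raw[raw >= split]
scale = lo.std() + hi.std()
rescaled = np.where(raw >= split,
                    (raw - hi.mean()) / scale + 1.0,
                    (raw - lo.mean()) / scale - 1.0)
print(rescaled.min().round(2), rescaled.max().round(2))   # roughly symmetric signal
```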
Iterative Compositional Data Generation for Robot Control
Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.
comment: Corrected reference chronological order and added acknowledgements; results unchanged
An effective control of large systems of active particles: An application to evacuation problem
Manipulation of large systems of active particles is a serious challenge across diverse domains, including crowd management, control of robotic swarms, and coordinated material transport. The development of advanced control strategies for complex scenarios is hindered, however, by the lack of scalability and robustness of existing methods, in particular due to the need for individual control of each agent. One possible solution involves controlling a system through a leader or a group of leaders, which other agents tend to follow. Using such an approach, we develop an effective control strategy for a leader, combining reinforcement learning (RL) with artificial forces acting on the system. To describe the guidance of active particles by a leader, we introduce the generalized Vicsek model. This novel method is then applied to the problem of effective evacuation of large groups of people from hazardous places by a robot-rescuer (leader). We demonstrate that, while a straightforward application of RL yields suboptimal results even for advanced architectures, our approach provides a robust and efficient evacuation strategy. The source code supporting this study is publicly available at: https://github.com/cinemere/evacuation.
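A minimal Vicsek-style update in which agents align with neighbors while being biased toward a leader position gives a feel for this kind of model; the alignment radius, leader weight, noise level, and leader placement below are hypothetical, and the RL policy that moves the leader is not included.
```python
# Minimal Vicsek-style sketch: agents align their headings with neighbors and
# are additionally attracted toward a leader position. Weights, radius, and
# noise are hypothetical; the RL policy that moves the leader is not included.
import numpy as np

rng = np.random.default_rng(0)
N, R, SPEED, NOISE, LEADER_W = 100, 1.0, 0.05, 0.1, 0.5
pos = rng.uniform(0, 10, size=(N, 2))
theta = rng.uniform(-np.pi, np.pi, size=N)
leader = np.array([9.0, 9.0])                      # e.g., near the exit

for _ in range(500):
    # Mean heading of neighbors within radius R (including self).
    d2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(-1)
    nb = d2 < R ** 2
    mean_dir = np.arctan2((np.sin(theta)[None, :] * nb).sum(1),
                          (np.cos(theta)[None, :] * nb).sum(1))
    # Blend the alignment direction with the direction toward the leader.
    to_leader = np.arctan2(leader[1] - pos[:, 1], leader[0] - pos[:, 0])
    theta = np.angle((1 - LEADER_W) * np.exp(1j * mean_dir)
                     + LEADER_W * np.exp(1j * to_leader)) + NOISE * rng.normal(size=N)
    pos += SPEED * np.stack([np.cos(theta), np.sin(theta)], axis=1)

print(np.linalg.norm(pos - leader, axis=1).mean().round(2))  # agents end up near the leader
```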
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning AAAI 2026
Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action execution trajectories through continuous co-learning across vision, language, and proprioceptive inputs (e.g., robot internal states). Meanwhile, we anchor language semantics to visuomotor representations by a bidirectional cross-attention to learn contextual information for action generation, successfully overcoming the problem of semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL's generalization under unseen and noisy object states.
comment: Accepted at AAAI 2026, the Project website is available at https://qhemu.github.io/CCoL/
Fast Multi-Party Open-Ended Conversation with a Social Robot
Multi-party open-ended conversation remains a major challenge in human-robot interaction, particularly when robots must recognise speakers, allocate turns, and respond coherently under overlapping or rapidly shifting dialogue. This paper presents a multi-party conversational system that combines multimodal perception (voice direction of arrival, speaker diarisation, face recognition) with a large language model for response generation. Implemented on the Furhat robot, the system was evaluated with 30 participants across two scenarios: (i) parallel, separate conversations and (ii) shared group discussion. Results show that the system maintains coherent and engaging conversations, achieving high addressee accuracy in parallel settings (92.6%) and strong face recognition reliability (80-94%). Participants reported clear social presence and positive engagement, although technical barriers such as audio-based speaker recognition errors and response latency affected the fluidity of group interactions. The results highlight both the promise and limitations of LLM-based multi-party interaction and outline concrete directions for improving multimodal cue integration and responsiveness in future social robots.
comment: 15 pages, 5 figures, 4 tables; 2 appendices
Feedback-MPPI: Fast Sampling-Based MPC via Rollout Differentiation -- Adios low-level controllers
Model Predictive Path Integral control is a powerful sampling-based approach suitable for complex robotic tasks due to its flexibility in handling nonlinear dynamics and non-convex costs. However, its applicability in real-time, high-frequency robotic control scenarios is limited by computational demands. This paper introduces Feedback-MPPI (F-MPPI), a novel framework that augments standard MPPI by computing local linear feedback gains derived from sensitivity analysis inspired by Riccati-based feedback used in gradient-based MPC. These gains allow for rapid closed-loop corrections around the current state without requiring full re-optimization at each timestep. We demonstrate the effectiveness of F-MPPI through simulations and real-world experiments on two robotic platforms: a quadrupedal robot performing dynamic locomotion on uneven terrain and a quadrotor executing aggressive maneuvers with onboard computation. Results illustrate that incorporating local feedback significantly improves control performance and stability, enabling robust, high-frequency operation suitable for complex robotic systems.
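For reference, a bare-bones MPPI loop on a toy 1-D double integrator is sketched below; it shows the sampling and exponentially weighted update that F-MPPI builds on, but not the sensitivity-based feedback gains that are the paper's contribution. All dynamics, costs, and hyperparameters are illustrative.
```python
# Minimal MPPI sketch on a 1-D double integrator: sample control perturbations,
# roll out, and take an exponentially weighted average. The local feedback
# gains that F-MPPI derives via sensitivity analysis are not reproduced here.
import numpy as np

H, K, DT, LAM, SIGMA = 30, 256, 0.05, 1.0, 0.5
rng = np.random.default_rng(0)

def rollout(x, v, u_seq):
    cost = 0.0
    for u in u_seq:
        v += u * DT
        x += v * DT
        cost += x ** 2 + 0.1 * v ** 2 + 0.01 * u ** 2   # regulate position and velocity to zero
    return cost

u_nom = np.zeros(H)
x0, v0 = 2.0, 0.0
for _ in range(50):                                     # receding-horizon iterations
    eps = rng.normal(0.0, SIGMA, size=(K, H))
    costs = np.array([rollout(x0, v0, u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / LAM)
    u_nom = u_nom + (w[:, None] * eps).sum(0) / w.sum() # importance-weighted update
    # Apply the first control, then shift the nominal sequence.
    v0 += u_nom[0] * DT; x0 += v0 * DT
    u_nom = np.roll(u_nom, -1); u_nom[-1] = 0.0

print(round(x0, 3), round(v0, 3))                       # state driven toward the origin
```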
Real-Time QP Solvers: A Concise Review and Practical Guide Towards Legged Robots
Quadratic programming (QP) underpins real-time robotics by enabling efficient, constrained optimization in state estimation, motion planning, and control. In legged locomotion and manipulation, essential modules like inverse dynamics, Model Predictive Control (MPC), and Whole-Body Control (WBC) are inherently QP-based, demanding reliable solutions amid tight timing, energy, and computational resources on embedded platforms. This paper presents a comprehensive analysis and benchmarking study of QP solvers for legged robotics. We begin by formulating the standard convex QP and then classify solvers into principal algorithmic approaches: interior-point methods, active-set strategies, operator-splitting schemes, and augmented Lagrangian/proximal approaches, while also discussing solver code generation for fixed-structure QPs. Each solver is examined in terms of algorithmic structure, computational characteristics, and its ability to exploit problem structure and warm-starting. Performance is reviewed using publicly available benchmarks, with a focus on metrics such as computation time, constraint satisfaction, and robustness under perturbations. Unified comparison tables yield practical guidance for solver selection, underscoring trade-offs in speed, accuracy, and energy efficiency. Our findings emphasize the synergy between solvers, tasks, and hardware -- e.g., sparse structured IPMs for long-horizon MPC and dense active-set methods for high-frequency WBC -- to advance agile, autonomous legged systems, with emerging trends toward ill-conditioned, conic, and code-generated deployments.
comment: 12 pages, 1 figure, 2 tables
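The standard convex QP referred to above has the form min_x 0.5 x^T P x + q^T x subject to l <= A x <= u. The snippet below solves a tiny instance with OSQP, one of the operator-splitting solvers typically covered in such reviews; it assumes the osqp and scipy packages are installed, and the matrices are arbitrary illustrative values.
```python
# Solve a tiny standard-form QP, min 0.5 x'Px + q'x  s.t.  l <= Ax <= u, with OSQP.
# Matrices are arbitrary illustrative values, not taken from the paper.
import numpy as np
import osqp
from scipy import sparse

P = sparse.csc_matrix([[4.0, 1.0], [1.0, 2.0]])   # positive semidefinite cost matrix
q = np.array([1.0, 1.0])
A = sparse.csc_matrix([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

prob = osqp.OSQP()
prob.setup(P, q, A, l, u, verbose=False)   # warm-starting across iterations suits MPC/WBC loops
res = prob.solve()
print(res.x)                               # optimal primal solution
```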
Driving Through Uncertainty: Risk-Averse Control with LLM Commonsense for Autonomous Driving under Perception Deficits
Partial perception deficits can compromise autonomous vehicle safety by disrupting environmental understanding. Existing protocols typically default to entirely risk-avoidant actions such as immediate stops, which are detrimental to navigation goals and lack flexibility for rare driving scenarios. Yet, in cases of minor risk, halting the vehicle may be unnecessary, and more adaptive responses are preferable. In this paper, we propose LLM-RCO, a risk-averse framework leveraging large language models (LLMs) to integrate human-like driving commonsense into autonomous systems facing perception deficits. LLM-RCO features four key modules interacting with the dynamic driving environment: hazard inference, short-term motion planner, action condition verifier, and safety constraint generator, enabling proactive and context-aware actions in such challenging conditions. To enhance the driving decision-making of LLMs, we construct DriveLM-Deficit, a dataset of 53,895 video clips featuring deficits of safety-critical objects, annotated for LLM fine-tuning in hazard detection and motion planning. Extensive experiments in adverse driving conditions with the CARLA simulator demonstrate that LLM-RCO promotes proactive maneuvers over purely risk-averse actions in perception deficit scenarios, underscoring its value for boosting autonomous driving resilience against perception loss challenges.
Social Mediation through Robots -- A Scoping Review on Improving Group Interactions through Directed Robot Action using an Extended Group Process Model
Group processes refer to the dynamics that occur within a group and are critical for understanding how groups function. With robots being increasingly placed within small groups, improving these processes has emerged as an important application of social robotics. Social Mediation Robots elicit behavioral change within groups by deliberately influencing the processes of groups. While research in this field has demonstrated that robots can effectively affect interpersonal dynamics, there is a notable gap in integrating these insights to develop coherent understanding and theory. We present a scoping review of literature targeting changes in social interactions between multiple humans through intentional action from robotic agents. To guide our review, we adapt the classical Input-Process-Output (I-P-O) models that we call "Mediation I-P-O model". We evaluated 1633 publications, which yielded 89 distinct social mediation concepts. We construct 11 mediation approaches robots can use to shape processes in small groups and teams. This work strives to produce generalizable insights and evaluate the extent to which the potential of social mediation through robots has been realized thus far. We hope that the proposed framework encourages a holistic approach to the study of social mediation and provides a foundation to standardize future reporting in the domain.
comment: Early version of the published journal paper: Weisswange, T. H., Javed, H., Dietrich, M., Jung, M. F., & Jamali, N. (2025). Design Implications for Robots that Facilitate Groups-A Scoping Review on Improving Group Interactions through Directed Robot Action. ACM Transactions on Human-Robot Interaction
WARPD: World model Assisted Reactive Policy Diffusion NeurIPS 2025
With the increasing availability of open-source robotic data, imitation learning has become a promising approach for both manipulation and locomotion. Diffusion models are now widely used to train large, generalized policies that predict controls or trajectories, leveraging their ability to model multimodal action distributions. However, this generality comes at the cost of larger model sizes and slower inference, an acute limitation for robotic tasks requiring high control frequencies. Moreover, Diffusion Policy (DP), a popular trajectory-generation approach, suffers from a trade-off between performance and action horizon: fewer diffusion queries lead to larger trajectory chunks, which in turn accumulate tracking errors. To overcome these challenges, we introduce WARPD (World model Assisted Reactive Policy Diffusion), a method that generates closed-loop policies (weights for neural policies) directly, instead of open-loop trajectories. By learning behavioral distributions in parameter space rather than trajectory space, WARPD offers two major advantages: (1) extended action horizons with robustness to perturbations, while maintaining high task performance, and (2) significantly reduced inference costs. Empirically, WARPD outperforms DP in long-horizon and perturbed environments, and achieves multitask performance on par with DP while requiring only ~ 1/45th of the inference-time FLOPs per step.
comment: Outstanding Paper Award at the Embodied World Models for Decision Making Workshop at NeurIPS 2025
Communication-Efficient Module-Wise Federated Learning for Grasp Pose Detection in Cluttered Environments
Grasp pose detection (GPD) is a fundamental capability for robotic autonomy, but its reliance on large, diverse datasets creates significant data privacy and centralization challenges. Federated Learning (FL) offers a privacy-preserving solution, but its application to GPD is hindered by the substantial communication overhead of large models, a key issue for resource-constrained robots. To address this, we propose a novel module-wise FL framework that begins by analyzing the learning dynamics of the GPD model's functional components. This analysis identifies slower-converging modules, to which our framework then allocates additional communication effort. This is realized through a two-phase process: a standard full-model training phase is followed by a communication-efficient phase where only the identified subset of slower-converging modules is trained and their partial updates are aggregated. Extensive experiments on the GraspNet-1B dataset demonstrate that our method outperforms standard FedAvg and other baselines, achieving higher accuracy for a given communication budget. Furthermore, real-world experiments on a physical robot validate our approach, showing a superior grasp success rate compared to baseline methods in cluttered scenes. Our work presents a communication-efficient framework for training robust, generalized GPD models in a decentralized manner, effectively improving the trade-off between communication cost and model performance.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L). 8 pages, 5 figures
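A minimal sketch of the module-wise aggregation step described above: during the communication-efficient phase, only the parameters of a pre-identified set of slower-converging modules are averaged across clients. The module names, weighting, and data structures are hypothetical and not taken from the paper.

```python
import numpy as np

def fedavg_modules(client_params, slow_modules, client_sizes):
    """Aggregate only the listed modules; all other parameters stay local.

    client_params: list of dicts mapping 'module.param' names to numpy arrays.
    slow_modules:  module-name prefixes identified as slower-converging.
    client_sizes:  number of local samples per client (FedAvg weights).
    """
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    aggregated = {}
    for name in client_params[0]:
        if any(name.startswith(m) for m in slow_modules):
            aggregated[name] = sum(w * p[name]
                                   for w, p in zip(weights, client_params))
    return aggregated  # broadcast back; clients overwrite only these entries

# Example with two clients; only the hypothetical 'grasp_head' module is shared.
clients = [{"backbone.w": np.ones((2, 2)), "grasp_head.w": np.ones(3) * v}
           for v in (1.0, 3.0)]
print(fedavg_modules(clients, slow_modules=["grasp_head"], client_sizes=[100, 300]))
```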
MOFU: Development of a MOrphing Fluffy Unit with Expansion and Contraction Capabilities and Evaluation of the Animacy of Its Movements
Robots designed for therapy and social interaction aim to evoke a sense of animacy in humans. While many studies have focused on life-like appearance or joint-based movements, the effect of whole-body volume-changing movements commonly observed in living organisms has received little attention. In this study, we developed MOFU (MOrphing Fluffy Unit), a mobile robot capable of whole-body expansion and contraction using a single motor enclosed in a fluffy exterior. MOFU employs a Jitterbug geometric transformation mechanism that enables smooth diameter changes from approximately 210 mm to 280 mm with a single actuator, and is equipped with a differential two-wheel drive mechanism for locomotion. We conducted an online survey using videos of MOFU behaviors and evaluated perceived animacy using the Godspeed Questionnaire Series. First, we compared stationary conditions with and without expansion-contraction and with and without rotational motion. Both expansion-contraction and rotation independently increased perceived animacy. Second, we examined whether presenting two MOFUs simultaneously would further enhance animacy perception, but no significant difference was observed. Exploratory analyses were also conducted across four dual-robot motion conditions. Third, when expansion-contraction was combined with locomotion, animacy ratings were higher than for locomotion alone. These results suggest that whole-body volume-changing movements enhance perceived animacy in robots, indicating that physical volume change is an important design element for future social and therapeutic robots.
Bio-inspired reconfigurable stereo vision for robotics using omnidirectional cameras ICRA 2025
This work introduces a novel bio-inspired reconfigurable stereo vision system for robotics, leveraging omnidirectional cameras and a novel algorithm to achieve flexible visual capabilities. Inspired by the adaptive vision of various species, our visual system addresses traditional stereo vision limitations, i.e., immutable camera alignment with narrow fields of view, by introducing a reconfigurable stereo vision system to robotics. Our key innovations include the reconfigurable stereo vision strategy that allows dynamic camera alignment, a robust depth measurement system utilizing a nonrectified geometrical method combined with a deep neural network for feature matching, and a geometrical compensation technique to enhance visual accuracy. Implemented on a metamorphic robot, this vision system demonstrates its great adaptability to various scenarios by switching its configurations of 316° monocular with 79° binocular field for fast target seeking and 242° monocular with 150° binocular field for detailed close inspection.
comment: 7 pages, 8 figures, submitted to IEEE ICRA 2025
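The abstract mentions a nonrectified geometrical depth method combined with learned feature matching. One standard way to recover depth without rectification is midpoint triangulation between two viewing rays, sketched below under the assumption that matched feature directions and camera centers are already available; the paper's actual algorithm may differ.

```python
import numpy as np

def midpoint_triangulation(c1, d1, c2, d2):
    """Midpoint of the shortest segment between two rays c + t*d (nonrectified
    stereo): solve for the closest point on each ray and average them."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    b = c2 - c1
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    t1, t2 = np.linalg.solve(A, np.array([d1 @ b, d2 @ b]))
    return 0.5 * ((c1 + t1 * d1) + (c2 + t2 * d2))   # estimated 3D point

# Two cameras with an arbitrary (non-parallel) relative pose observing a point.
point = np.array([0.3, 0.1, 2.0])
c1, c2 = np.zeros(3), np.array([0.2, 0.0, 0.0])
print(midpoint_triangulation(c1, point - c1, c2, point - c2))  # ~ [0.3 0.1 2.0]
```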
Adaptive Compressive Tactile Subsampling: Enabling High Spatiotemporal Resolution in Scalable Robotic Skin
Robots require full-body, high-resolution tactile sensing to operate safely in unstructured environments, enabling reflexive responses and closed-loop control. However, the pixel counts needed for dense, large-area coverage limit readout rates of most tactile arrays to <100 Hz, hindering their use in high-speed tasks. We present Adaptive Compressive Tactile Subsampling (ACTS), a scalable and data-driven method that greatly enhances traditional tactile matrices by leveraging adaptive sensor sampling and sparse recovery. By adaptively allocating measurements to informative regions, ACTS is especially effective for spatially sparse signals common in real-world interactions. Tested on a 1024-pixel tactile sensor array (32x32), ACTS achieved frame rates up to 1,000 Hz, an 18X improvement over conventional raster scanning, with minimal reconstruction error. For the first time, ACTS enables wearable, large-area, high-density tactile sensing systems that can deliver high-speed results. We demonstrate rapid object classification within 20 ms of contact, high-speed projectile detection, ricochet angle estimation, and soft deformation tracking, in tactile and robotics applications, all using flexible, high-density tactile arrays. These include high-resolution tactile gloves, pressure insoles, and full-body configurations covering robotic arms and human-sized mannequins. We further showcase tactile-based closed-loop control by guiding a metallic ball to trace letters using tactile feedback and by executing tactile-only whole-hand reflexes on a fully sensorized LEAP hand to stabilize grasps, prevent slip, and avoid sharp objects, validating ACTS for real-time interaction and motion control. ACTS transforms standard, low-cost, and robust tactile sensors into high-speed systems enabling scalable, responsive, and adaptive tactile perception for robots and wearables operating in dynamic environments.
comment: 51 pages, 11 main figures, 16 supplemental figures, Videos can be accessed at https://tinyurl.com/ACTS-videos
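The sparse-recovery step can be illustrated with a generic compressed-sensing reconstruction: a heavily subsampled tactile frame is recovered by iterative soft thresholding (ISTA) under a sparsity prior. The sampling operator, sparsity level, and parameters below are illustrative stand-ins, not the ACTS pipeline itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth "tactile frame": 32x32, spatially sparse contact pattern.
n = 32 * 32
x_true = np.zeros(n)
x_true[rng.choice(n, size=10, replace=False)] = rng.uniform(0.5, 1.0, 10)

# Subsampling: only m << n measurements are taken this frame (random stand-in).
m = 200
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x_true

def ista(y, Phi, lam=0.02, iters=300):
    """ISTA: x <- soft(x + step * Phi^T (y - Phi x), lam * step)."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2      # 1 / Lipschitz constant
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = x + step * Phi.T @ (y - Phi @ x)
        x = np.sign(g) * np.maximum(np.abs(g) - lam * step, 0.0)
    return x

x_hat = ista(y, Phi)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```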
In-situ Value-aligned Human-Robot Interactions with Physical Constraints IROS 2025
Equipped with Large Language Models (LLMs), human-centered robots are now capable of performing a wide range of tasks that were previously deemed challenging or unattainable. However, merely completing tasks is insufficient for cognitive robots, which should learn and apply human preferences to future scenarios. In this work, we propose a framework that combines human preferences with physical constraints, requiring robots to complete tasks while considering both. First, we developed a benchmark of everyday household activities, which are often evaluated based on specific preferences. We then introduced In-Context Learning from Human Feedback (ICLHF), where human feedback comes from direct instructions and adjustments made intentionally or unintentionally in daily life. Extensive experiments, testing the ability of ICLHF to generate task plans and balance physical constraints with preferences, have demonstrated the efficiency of our approach. Project page: https://iclhf.github.io.
comment: 8 pages, 7 figures. Accepted by IROS 2025
UMArm: Untethered, Modular, Portable, Soft Pneumatic Arm
Robotic arms are essential to modern industries; however, their adaptability to unstructured environments remains limited. Soft robotic arms, particularly those actuated pneumatically, offer greater adaptability in unstructured environments and enhanced safety for human-robot interaction. However, current pneumatic soft arms are constrained by limited degrees of freedom, precision, payload capacity, and reliance on bulky external pressure regulators. In this work, a novel pneumatically driven rigid-soft hybrid arm, "UMArm", is presented. The shortcomings of pneumatically actuated soft arms are addressed by densely integrating high-force-to-weight-ratio, self-regulated McKibben actuators onto a lightweight rigid spine structure. The modified McKibben actuators incorporate valves and controllers directly inside, eliminating the need for individual pressure lines and external regulators, significantly reducing system weight and complexity. Full untethered operation, high payload capacity, precision, and directionally tunable compliance are achieved by the UMArm. Portability is demonstrated through a wearable assistive arm experiment, and versatility is showcased by reconfiguring the system into an inchworm robot. The results of this work show that high-degree-of-freedom, external-regulator-free pneumatically driven arm systems like the UMArm possess great potential for real-world unstructured environments.
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Visual Reinforcement Learning (RL) agents must learn to act based on high-dimensional image data where only a small fraction of the pixels is task-relevant. This forces agents to waste exploration and computational resources on irrelevant features, leading to sample-inefficient and unstable learning. To address this, inspired by human visual foveation, we introduce Gaze on the Prize. This framework augments visual RL with a learnable foveal attention mechanism (Gaze), guided by a self-supervised signal derived from the agent's experience pursuing higher returns (the Prize). Our key insight is that return differences reveal what matters most: If two similar representations produce different outcomes, their distinguishing features are likely task-relevant, and the gaze should focus on them accordingly. This is realized through return-guided contrastive learning that trains the attention to distinguish between the features relevant to success and failure. We group similar visual representations into positives and negatives based on their return differences and use the resulting labels to construct contrastive triplets. These triplets provide the training signal that teaches the attention mechanism to produce distinguishable representations for states associated with different outcomes. Our method achieves up to 2.52x improvement in sample efficiency and can solve challenging tasks from the ManiSkill3 benchmark that the baseline fails to learn, without modifying the underlying algorithm or hyperparameters.
comment: Project page: https://andrewcwlee.github.io/gaze-on-the-prize
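A minimal sketch of a return-guided triplet construction consistent with the description above: visually similar embeddings are grouped by return difference, and a standard triplet margin loss pushes apart the features of states associated with different outcomes. The thresholds, margin, and data are hypothetical, not the paper's exact formulation.

```python
import numpy as np

def build_triplets(embeddings, returns, sim_thresh=0.5, ret_thresh=1.0):
    """Anchor/positive: similar embeddings with similar returns.
    Negative: a similar embedding whose return differs substantially."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T
    triplets, n = [], len(returns)
    for a in range(n):
        for p in range(n):
            for q in range(n):
                if len({a, p, q}) < 3:
                    continue
                if (sim[a, p] > sim_thresh and abs(returns[a] - returns[p]) < ret_thresh
                        and sim[a, q] > sim_thresh
                        and abs(returns[a] - returns[q]) >= ret_thresh):
                    triplets.append((a, p, q))
    return triplets

def triplet_margin_loss(feats, triplets, margin=0.5):
    loss = 0.0
    for a, p, q in triplets:
        loss += max(0.0, np.linalg.norm(feats[a] - feats[p])
                    - np.linalg.norm(feats[a] - feats[q]) + margin)
    return loss / max(len(triplets), 1)

feats = np.random.randn(16, 8)          # stand-in attention features
returns = np.random.randn(16) * 2.0     # per-sample returns
# Loose similarity threshold only so this random demo produces triplets.
trips = build_triplets(feats, returns, sim_thresh=-1.0)
print(len(trips), triplet_margin_loss(feats, trips))
```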
STITCHER: Constrained Trajectory Planning in Complex Environments with Real-Time Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Modern trajectory planning techniques primarily use numerical optimization, as they enable the systematic computation of high-quality, expressive trajectories that satisfy various constraints. However, stringent requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework called STITCHER that stitches short trajectory segments together with graph search to compute long-range, expressive, and near-optimal trajectories in real-time. STITCHER outperforms modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is performed to analyze the algorithmic components that make up STITCHER, along with a thorough comparison with two state-of-the-art optimization planners. Simulation tests show that safe trajectories can be created within a few milliseconds for paths that span the entirety of two 50 m x 50 m environments. Hardware tests with a custom quadrotor verify that STITCHER can produce trackable paths in real-time while respecting nonconvex constraints, such as limits on tilt angle and motor forces, which are otherwise hard to include in optimization-based planners.
STITCHER: Real-Time Trajectory Planning with Motion Primitive Search
Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy constraints. Most modern trajectory planning techniques rely on numerical optimization because high-quality, expressive trajectories that satisfy constraints can be systematically computed. However, strict requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical situations. This work presents an optimization-free planning framework called STITCHER that leverages graph search to generate long-range trajectories by stitching short trajectory segments together in real time. STITCHER is shown to outperform modern optimization-based planners through its innovative planning architecture and several algorithmic developments that make real-time planning possible. Simulation results show safe trajectories through complex environments can be generated in milliseconds that cover tens of meters.
Multiagent Systems
Agile Flight Emerges from Multi-Agent Competitive Racing
Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) and strategy (e.g., overtaking or blocking) emerge from agents trained with reinforcement learning. We provide evidence in both simulation and the real world that this approach outperforms the common paradigm of training agents in isolation with rewards that prescribe behavior, e.g., progress on the raceline, in particular when the complexity of the environment increases, e.g., in the presence of obstacles. Moreover, we find that multi-agent competition yields policies that transfer more reliably to the real world than policies trained with a single-agent progress-based reward, despite the two methods using the same simulation environment, randomization strategy, and hardware. In addition to improved sim-to-real transfer, the multi-agent policies also exhibit some degree of generalization to opponents unseen at training time. Overall, our work, following in the tradition of multi-agent competitive game-play in digital domains, shows that sparse task-level rewards are sufficient for training agents capable of advanced low-level control in the physical world. Code: https://github.com/Jirl-upenn/AgileFlight_MultiAgent
Evaluating Cooperative Resilience in Multiagent Systems: A Comparison Between Humans and LLMs
This paper presents a comparative analysis of cooperative resilience in multi-agent systems, defined as the ability to anticipate, resist, recover from, and transform to disruptive events that affect collective well-being. We focus on mixed-motive social dilemmas instantiated as a Tragedy of the Commons environment from the Melting Pot suite, where we systematically compare human groups and Large Language Model (LLM)-based agents, each evaluated with and without explicit communication. Cooperative resilience is assessed under a continuously disruptive condition induced by a persistent unsustainable consumption bot, together with intermittent environmental shocks implemented as stochastic removal of shared resources across scenarios. This experimental design establishes a benchmark for cooperative resilience across agent architectures and interaction modalities, constituting a key step toward systematically comparing humans and LLM-based agents. Using this framework, we find that human groups with communication achieve the highest cooperative resilience compared to all other groups. Communication also improves the resilience of LLM agents, but their performance remains below human levels. Motivated by the performance of humans, we further examine a long-horizon setting with harsher environmental conditions, where humans sustain the shared resource and maintain high resilience in diverse disruption scenarios. Together, these results suggest that human decision-making under adverse social conditions can inform the design of artificial agents that promote prosocial and resilient behaviors.
comment: Supplementary material in https://github.com/mavivi95/resilience_humans_vs_LLM/blob/main/Supplementary_File.pdf
AutoFSM: A Multi-agent Framework for FSM Code Generation with IR and SystemC-Based Testing
With the rapid advancement of large language models (LLMs) in code generation, their applications in hardware design are receiving growing attention. However, existing LLMs face several challenges when generating Verilog code for finite state machine (FSM) control logic, including frequent syntax errors, low debugging efficiency, and heavy reliance on test benchmarks. To address these challenges, this paper proposes AutoFSM, a multi-agent collaborative framework designed for FSM code generation tasks. AutoFSM introduces a structurally clear intermediate representation (IR) to reduce syntax error rate during code generation and provides a supporting toolchain to enable automatic translation from IR to Verilog. Furthermore, AutoFSM is the first to integrate SystemC-based modeling with automatic testbench generation, thereby improving debugging efficiency and feedback quality. To systematically evaluate the framework's performance, we construct SKT-FSM, the first hierarchical FSM benchmark in the field, comprising 67 FSM samples across different complexity levels. Experimental results show that, under the same base LLM, AutoFSM consistently outperforms the open-source framework MAGE on the SKT-FSM benchmark, achieving up to an 11.94% improvement in pass rate and up to a 17.62% reduction in syntax error rate. These results demonstrate the potential of combining LLMs with structured IR and automated testing to improve the reliability and scalability of register-transfer level (RTL) code generation.
comment: This version corrects a typo in the section title ("Intruction" -> "Introduction") that appears in the published version
Elevation Aware 2D/3D Co-simulation Framework for Large-scale Traffic Flow and High-fidelity Vehicle Dynamics
Reliable testing of autonomous driving systems requires simulation environments that combine large-scale traffic modeling with realistic 3D perception and terrain. Existing tools rarely capture real-world elevation, limiting their usefulness in cities with complex topography. This paper presents an automated, elevation-aware co-simulation framework that integrates SUMO with CARLA using a pipeline that fuses OpenStreetMap road networks and USGS elevation data into physically consistent 3D environments. The system generates smooth elevation profiles, validates geometric accuracy, and enables synchronized 2D-3D simulation across platforms. Demonstrations on multiple regions of San Francisco show the framework's scalability and ability to reproduce steep and irregular terrain. The result is a practical foundation for high-fidelity autonomous vehicle testing in realistic, elevation-rich urban settings.
Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control
Effective mixed traffic control requires balancing efficiency, fairness, and safety. Existing approaches excel at optimizing efficiency and enforcing safety constraints but lack mechanisms to ensure equitable service, resulting in systematic starvation of vehicles on low-demand approaches. We propose a hierarchical framework combining multi-objective reinforcement learning for local intersection control with strategic routing for network-level coordination. Our approach introduces a Conflict Threat Vector that provides agents with explicit risk signals for proactive conflict avoidance, and a queue parity penalty that ensures equitable service across all traffic streams. Extensive experiments on a real-world network across different robot vehicle (RV) penetration rates demonstrate substantial improvements: up to 53% reduction in average wait time, up to 86% reduction in maximum starvation, and up to 86% reduction in conflict rate compared to baselines, while maintaining fuel efficiency. Our analysis reveals that strategic routing effectiveness scales with RV penetration, becoming increasingly valuable at higher autonomy levels. The results demonstrate that multi-objective optimization through well-curated reward functions paired with strategic RV routing yields significant benefits in fairness and safety metrics critical for equitable mixed-autonomy deployment.
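A toy sketch of how the two reward terms described above might be combined for a single intersection agent; the feature definitions and weights are invented for illustration and are not the paper's exact formulation.

```python
import numpy as np

def intersection_reward(queues, threat_vector, throughput,
                        w_eff=1.0, w_fair=0.5, w_safe=2.0):
    """Multi-objective reward: efficiency minus a queue-parity penalty minus a
    conflict-threat penalty (all terms illustrative placeholders).

    queues:        per-approach queue lengths.
    threat_vector: per-conflict-point risk scores in [0, 1].
    throughput:    vehicles discharged during this step.
    """
    queues = np.asarray(queues, dtype=float)
    parity_penalty = queues.max() - queues.min()   # starvation gap across approaches
    threat_penalty = float(np.max(threat_vector))  # worst imminent conflict
    return w_eff * throughput - w_fair * parity_penalty - w_safe * threat_penalty

print(intersection_reward(queues=[12, 2, 0, 7],
                          threat_vector=[0.1, 0.8, 0.0],
                          throughput=5))
```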
How AI Agents Follow the Herd of AI? Network Effects, History, and Machine Optimism
Understanding decision-making in multi-AI-agent frameworks is crucial for analyzing strategic interactions in network-effect-driven contexts. This study investigates how AI agents navigate network-effect games, where individual payoffs depend on peer participation -- a context underexplored in multi-agent systems despite its real-world prevalence. We introduce a novel workflow design using large language model (LLM)-based agents in repeated decision-making scenarios, systematically manipulating price trajectories (fixed, ascending, descending, random) and network-effect strength. Our key findings include: First, without historical data, agents fail to infer equilibrium. Second, ordered historical sequences (e.g., escalating prices) enable partial convergence under weak network effects, but strong effects trigger persistent "AI optimism" -- agents overestimate participation despite contradictory evidence. Third, randomized history disrupts convergence entirely, demonstrating that temporal coherence in data shapes LLMs' reasoning, unlike in humans. These results highlight a paradigm shift: in AI-mediated systems, equilibrium outcomes depend not just on incentives, but on how history is curated, which is impossible for humans.
comment: 7 pages, 5 figures
The Agentic Regulator: Risks for AI in Finance and a Proposed Agent-based Framework for Governance
Generative and agentic artificial intelligence is entering financial markets faster than existing governance can adapt. Current model-risk frameworks assume static, well-specified algorithms and one-time validations; large language models and multi-agent trading systems violate those assumptions by learning continuously, exchanging latent signals, and exhibiting emergent behavior. Drawing on complex adaptive systems theory, we model these technologies as decentralized ensembles whose risks propagate along multiple time-scales. We then propose a modular governance architecture. The framework decomposes oversight into four layers of "regulatory blocks": (i) self-regulation modules embedded beside each model, (ii) firm-level governance blocks that aggregate local telemetry and enforce policy, (iii) regulator-hosted agents that monitor sector-wide indicators for collusive or destabilizing patterns, and (iv) independent audit blocks that supply third-party assurance. Eight design strategies enable the blocks to evolve as fast as the models they police. A case study on emergent spoofing in multi-agent trading shows how the layered controls quarantine harmful behavior in real time while preserving innovation. The architecture remains compatible with today's model-risk rules yet closes critical observability and control gaps, providing a practical path toward resilient, adaptive AI governance in financial systems.
Osprey: Production-Ready Agentic AI for Safety-Critical Control Systems
Operating large-scale scientific facilities requires coordinating diverse subsystems, translating operator intent into precise hardware actions, and maintaining strict safety oversight. Language model-driven agents offer a natural interface for these tasks, but most existing approaches are not yet reliable or safe enough for production use. In this paper, we introduce Osprey, a framework for using agentic AI in large, safety-critical facility operations. Osprey is built around the needs of control rooms and addresses these challenges in four ways. First, it uses a plan-first orchestrator that generates complete execution plans, including all dependencies, for human review before any hardware is touched. Second, a coordination layer manages complex data flows, keeps data types consistent, and automatically downsamples large datasets when needed. Third, a classifier dynamically selects only the tools required for a given task, keeping prompts compact as facilities add capabilities. Fourth, connector abstractions and deployment patterns work across different control systems and are ready for day-to-day use. We demonstrate the framework through two case studies: a control-assistant tutorial showing semantic channel mapping and historical data integration, and a production deployment at the Advanced Light Source, where Osprey manages real-time operations across hundreds of thousands of control channels. These results establish Osprey as a production-ready framework for deploying agentic AI in complex, safety-critical environments.
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Ensuring cognitive stability in autonomous multi-agent systems (MAS) is a central challenge for large-scale, distributed AI. While existing observability tools monitor system outputs, they cannot quantify how rapidly agentic workflows recover once reasoning coherence has been lost. We adapt classical reliability metrics -- Mean Time-to-Recovery (MTTR), Mean Time Between Failures (MTBF), and related ratios -- into the cognitive domain, defining MTTR-A (Mean Time-to-Recovery for Agentic Systems) as a runtime measure of cognitive recovery latency. MTTR-A quantifies the time required for a MAS to detect reasoning drift and restore consistent operation, capturing the recovery of reasoning coherence rather than infrastructural repair. A benchmark simulation using the AG News corpus and the LangGraph orchestration framework was conducted, modeling recovery latencies across multiple reflex modes. Automated reflexes restored stability within approximately 6 s on average, while human-approval interventions required about 12 s. Across 200 runs, the median simulated MTTR-A was 6.21 ± 2.14 s, MTBF = 6.7 ± 2.14 s, and NRR = 0.08, demonstrating measurable runtime resilience across reflex strategies. By formalizing recovery latency as a quantifiable property of distributed reasoning -- and deriving reliability bounds linking recovery time and cognitive uptime -- this work establishes a foundation for runtime dependability in agentic cognition, transforming cognitive recovery from an ad-hoc process into a standardized, interpretable performance metric.
comment: preprint
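A minimal sketch of how MTTR-A, MTBF, and a no-recovery ratio could be computed from logged drift-detection and recovery timestamps; the event format and the simplified MTBF definition here are hypothetical illustrations, not the paper's exact estimators.

```python
from statistics import mean, median

# Each incident: (time drift was detected, time coherence was restored, or None
# if the workflow never recovered within the run).
incidents = [(10.0, 16.4), (42.5, 48.1), (90.0, None), (130.0, 137.2)]
run_length = 200.0  # seconds of total operation

recoveries = [end - start for start, end in incidents if end is not None]
mttr_a = mean(recoveries)                 # mean cognitive recovery latency
mttr_a_median = median(recoveries)
mtbf = run_length / len(incidents)        # simplified: run time per drift event
nrr = sum(1 for _, end in incidents if end is None) / len(incidents)

print(f"MTTR-A = {mttr_a:.2f} s (median {mttr_a_median:.2f} s), "
      f"MTBF = {mtbf:.2f} s, NRR = {nrr:.2f}")
```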
The Emergence of Complex Behavior in Large-Scale Ecological Environments
We explore how physical scale and population size shape the emergence of complex behaviors in open-ended ecological environments. In our setting, agents are unsupervised and have no explicit rewards or learning objectives but instead evolve over time according to reproduction, mutation, and selection. As they act, agents also shape their environment and the population around them in an ongoing dynamic ecology. Our goal is not to optimize a single high-performance policy, but instead to examine how behaviors emerge and evolve across large populations due to natural competition and environmental pressures. We use modern hardware along with a new multi-agent simulator to scale the environment and population to sizes much larger than previously attempted, reaching populations of over 60,000 agents, each with their own evolved neural network policy. We identify various emergent behaviors such as long-range resource extraction, vision-based foraging, and predation that arise under competitive and survival pressures. We examine how sensing modalities and environmental scale affect the emergence of these behaviors and find that some of them appear only in sufficiently large environments and populations, and that larger scales increase the stability and consistency of these emergent behaviors. While there is a rich history of research in evolutionary settings, our scaling results on modern hardware provide promising new directions to explore ecology as an instrument of machine learning in an era of increasingly abundant computational resources and efficient machine frameworks. Experimental code is available at https://github.com/jbejjani2022/ecological-emergent-behavior.
comment: 33 pages, 23 figures, 12 tables, experiment code available at https://github.com/jbejjani2022/ecological-emergent-behavior
CREW-WILDFIRE: Benchmarking Agentic Multi-Agent Collaborations at Scale
Despite rapid progress in large language model (LLM)-based multi-agent systems, current benchmarks fall short in evaluating their scalability, robustness, and coordination capabilities in complex, dynamic, real-world tasks. Existing environments typically focus on small-scale, fully observable, or low-complexity domains, limiting their utility for developing and assessing next-generation multi-agent Agentic AI frameworks. We introduce CREW-Wildfire, an open-source benchmark designed to close this gap. Built atop the human-AI teaming CREW simulation platform, CREW-Wildfire offers procedurally generated wildfire response scenarios featuring large maps, heterogeneous agents, partial observability, stochastic dynamics, and long-horizon planning objectives. The environment supports both low-level control and high-level natural language interactions through modular Perception and Execution modules. We implement and evaluate several state-of-the-art LLM-based multi-agent Agentic AI frameworks, uncovering significant performance gaps that highlight the unsolved challenges in large-scale coordination, communication, spatial reasoning, and long-horizon planning under uncertainty. By providing more realistic complexity, scalable architecture, and behavioral evaluation metrics, CREW-Wildfire establishes a critical foundation for advancing research in scalable multi-agent Agentic intelligence. All code, environments, data, and baselines will be released to support future research in this emerging domain.
comment: Our project website is at: http://generalroboticslab.com/CREW-Wildfire
Hamiltonian of polymatrix zero-sum games
The understanding of a dynamical system's properties can be significantly advanced by establishing it as a Hamiltonian system and then systematically exploring its inherent symmetries. By formulating agents' strategies and cumulative payoffs as canonically conjugate variables, we identify the Hamiltonian function that generates the dynamics of poly-matrix zero-sum games. We reveal the symmetries of our Hamiltonian and derive the associated conserved quantities, showing how the conservation of probability and the invariance of the Fenchel coupling are intrinsically encoded within the system. Furthermore, we propose the dissipation FTRL (DFTRL) dynamics by introducing a perturbation that dissipates the Fenchel coupling, proving convergence to the Nash equilibrium and linking DFTRL to last-iterate convergent algorithms. Our results highlight the potential of Hamiltonian dynamics in uncovering the structural properties of learning dynamics in games, and pave the way for broader applications of Hamiltonian dynamics in game theory and machine learning.
comment: 28 pages. v2: minor corrections. v3: minor improvements and references added. Published version
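As a numerical companion to the conservation results described above, the sketch below integrates replicator dynamics (FTRL with an entropic regularizer) for a 2x2 zero-sum game and checks that the summed KL divergence to the interior equilibrium, which coincides with the Fenchel coupling in this setting, stays essentially constant along trajectories. This is a standard two-player illustration, not the paper's general polymatrix construction.

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[1.0, -1.0], [-1.0, 1.0]])     # matching pennies (zero-sum)
x_star = y_star = np.array([0.5, 0.5])       # interior Nash equilibrium

def replicator(t, z):
    x, y = z[:2], z[2:]
    u, v = A @ y, -A.T @ x                   # payoff vectors of the two players
    return np.concatenate([x * (u - x @ u), y * (v - y @ v)])

def kl_to_equilibrium(z):
    x, y = z[:2], z[2:]
    return float(x_star @ np.log(x_star / x) + y_star @ np.log(y_star / y))

z0 = np.array([0.8, 0.2, 0.3, 0.7])
sol = solve_ivp(replicator, (0.0, 50.0), z0, rtol=1e-10, atol=1e-12,
                t_eval=np.linspace(0.0, 50.0, 6))
print([round(kl_to_equilibrium(z), 6) for z in sol.y.T])  # ~constant along the orbit
```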
Computing Evolutionarily Stable Strategies in Imperfect-Information Games
We present an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. Our main algorithm is for two-player games, and we describe how it can be extended to multiplayer games. The algorithm is sound and computes all ESSs in nondegenerate games and a subset of them in degenerate games which contain an infinite continuum of symmetric Nash equilibria. The algorithm is anytime and can be stopped early to find one or more ESSs. We experiment on an imperfect-information cancer signaling game as well as random games to demonstrate scalability.
Systems and Control (EESS)
Toward a Decision Support System for Energy-Efficient Ferry Operation on Lake Constance based on Optimal Control
The maritime sector is undergoing a disruptive technological change driven by three main factors: autonomy, decarbonization, and digital transformation. Addressing these factors necessitates a reassessment of inland vessel operations. This paper presents the design and development of a decision support system for ferry operations based on a shrinking-horizon optimal control framework. The problem formulation incorporates a mathematical model of the ferry's dynamics and environmental disturbances, specifically water currents and wind, which can significantly influence the dynamics. Real-world data and illustrative scenarios demonstrate the potential of the proposed system to effectively support ferry crews by providing real-time guidance. This enables enhanced operational efficiency while maintaining predefined maneuver durations. The findings suggest that optimal control applications hold substantial promise for advancing future ferry operations on inland waters. A video of the real-world ferry MS Insel Mainau operating on Lake Constance is available at: https://youtu.be/i1MjCdbEQyE
comment: 6 pages, 8 figures
LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems AAAI 2026
Ensuring the safety of AI-enabled systems, particularly in high-stakes domains such as autonomous driving and healthcare, has become increasingly critical. Traditional formal verification tools fall short when faced with systems that embed both opaque, black-box AI components and complex stochastic dynamics. To address these challenges, we introduce LUCID (Learning-enabled Uncertainty-aware Certification of stochastIc Dynamical systems), a verification engine for certifying safety of black-box stochastic dynamical systems from a finite dataset of random state transitions. As such, LUCID is the first known tool capable of establishing quantified safety guarantees for such systems. Thanks to its modular architecture and extensive documentation, LUCID is designed for easy extensibility. LUCID employs a data-driven methodology rooted in control barrier certificates, which are learned directly from system transition data, to ensure formal safety guarantees. We use conditional mean embeddings to embed data into a reproducing kernel Hilbert space (RKHS), where an RKHS ambiguity set is constructed that can be inflated to robustify the result to out-of-distribution behavior. A key innovation within LUCID is its use of a finite Fourier kernel expansion to reformulate a semi-infinite non-convex optimization problem into a tractable linear program. The resulting spectral barrier allows us to leverage the fast Fourier transform to generate the relaxed problem efficiently, offering a scalable yet distributionally robust framework for verifying safety. LUCID thus offers a robust and efficient verification framework, able to handle the complexities of modern black-box systems while providing formal guarantees of safety. These unique capabilities are demonstrated on challenging benchmarks.
comment: The manuscript has been accepted for publication in the main track of AAAI 2026
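The finite Fourier kernel expansion mentioned above can be illustrated with standard random Fourier features for an RBF kernel, which turn kernel evaluations into finite-dimensional inner products and thereby make downstream optimization (such as a barrier-certificate linear program) finite-dimensional. The feature count, bandwidth, and data below are placeholders, and this is not the LUCID implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def rff_features(X, n_features=200, lengthscale=0.5):
    """Random Fourier features approximating the RBF kernel
    k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    d = X.shape[1]
    W = rng.standard_normal((d, n_features)) / lengthscale
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = rng.standard_normal((5, 2))
Phi = rff_features(X)
K_approx = Phi @ Phi.T
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (2 * 0.5 ** 2))
print(np.abs(K_approx - K_exact).max())   # shrinks as n_features grows
```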
Model Error Resonance: The Geometric Nature of Error Dynamics
This paper introduces a geometric theory of model error, treating true and model dynamics as geodesic flows generated by distinct affine connections on a smooth manifold. When these connections differ, the resulting trajectory discrepancy--termed the Latent Error Dynamic Response (LEDR)--acquires an intrinsic dynamical structure governed by curvature. We show that the LEDR satisfies a Jacobi-type equation, where curvature mismatch acts as an explicit forcing term. In the important case of a flat model connection, the LEDR reduces to a classical Jacobi field on the true manifold, causing Model Error Resonance (MER) to emerge under positive sectional curvature. The theory is extended to a discrete-time analogue, establishing that this geometric structure and its resonant behavior persist in sampled systems. A closed-form analysis of a sphere--plane example demonstrates that curvature can be inferred directly from the LEDR evolution. This framework provides a unified geometric interpretation of structured error dynamics and offers foundational tools for curvature-informed model validation.
Two-dimensional Decompositions of High-dimensional Configurations for Efficient Multi-vehicle Coordination at Intelligent Intersections
For multi-vehicle complex traffic scenarios in shared spaces such as intelligent intersections, safe coordination and trajectory planning are challenging due to computational complexity. To meet this challenge, we introduce a computationally efficient method for generating collision-free trajectories along predefined vehicle paths. We reformulate a constrained minimum-time trajectory planning problem as a problem in a high-dimensional configuration space, where conflict zones are modeled by high-dimensional polyhedra constructed from two-dimensional rectangles. Still, in such a formulation, the computational complexity increases significantly as the number of vehicles involved grows. To address this, we propose two algorithms for near-optimal local optimization that significantly reduce the computational complexity by decomposing the high-dimensional problem into a sequence of 2D graph search problems. The resulting trajectories are then incorporated into a Nonlinear Model Predictive Control (NMPC) framework to ensure safe and smooth vehicle motion. We furthermore show in a numerical evaluation that this approach significantly outperforms existing MILP-based time scheduling, both in terms of objective value and computation time.
High-Dimensional Surrogate Modeling for Closed-Loop Learning of Neural-Network-Parameterized Model Predictive Control
Learning controller parameters from closed-loop data has been shown to improve closed-loop performance. Bayesian optimization, a widely used black-box and sample-efficient learning method, constructs a probabilistic surrogate of the closed-loop performance from few experiments and uses it to select informative controller parameters. However, it typically struggles with dense high-dimensional controller parameterizations, as they may appear, for example, in tuning model predictive controllers, because standard surrogate models fail to capture the structure of such spaces. This work suggests that the use of Bayesian neural networks as surrogate models may help to mitigate this limitation. Through a comparison between Gaussian processes with Matern kernels, finite-width Bayesian neural networks, and infinite-width Bayesian neural networks on a cart-pole task, we find that Bayesian neural network surrogate models achieve faster and more reliable convergence of the closed-loop cost and enable successful optimization of parameterizations with hundreds of dimensions. Infinite-width Bayesian neural networks also maintain performance in settings with more than one thousand parameters, whereas Matern-kernel Gaussian processes rapidly lose effectiveness. These results indicate that Bayesian neural network surrogate models may be suitable for learning dense high-dimensional controller parameterizations and offer practical guidance for selecting surrogate models in learning-based controller design.
comment: 7 pages, 2 figures
Architecting Large Action Models for Human-in-the-Loop Intelligent Robots
The realization of intelligent robots, operating autonomously and interacting with other intelligent agents, human or artificial, requires the integration of environment perception, reasoning, and action. Classic Artificial Intelligence techniques for this purpose, focusing on symbolic approaches, have long ago hit the scalability wall on compute and memory costs. Advances in Large Language Models in the past decade (neural approaches) have resulted in unprecedented displays of capability, at the cost of control, explainability, and interpretability. Large Action Models aim at extending Large Language Models to encompass the full perception, reasoning, and action cycle; however, they typically require substantially more comprehensive training and suffer from the same deficiencies in reliability. Here, we show it is possible to build competent Large Action Models by composing off-the-shelf foundation models, and that their control, interpretability, and explainability can be effected by incorporating symbolic wrappers and associated verification on their outputs, achieving verifiable neuro-symbolic solutions for intelligent robots. Our experiments on a multi-modal robot demonstrate that Large Action Model intelligence does not require massive end-to-end training, but can be achieved by integrating efficient perception models with a logic-driven core. We find that driving action execution through the generation of Planning Domain Definition Language (PDDL) code enables a human-in-the-loop verification stage that effectively mitigates action hallucinations. These results can support practitioners in the design and development of robotic Large Action Models across novel industries, and shed light on the ongoing challenges that must be addressed to ensure safety in the field.
A Modeling and Optimization Framework for Fostering Modal Shift through the Integration of Tradable Credits and Demand-Responsive Autonomous Shuttles
Tradable Credit Schemes (TCS) promote the use of public and shared transport by capping private car usage while maintaining fair welfare outcomes by allowing credit trading. However, most existing studies assume unlimited public transit capacity or a fixed occupancy of shared modes, often neglecting waiting time and oversimplifying time-based costs by depending solely on in-vehicle travel time. These assumptions can overstate the system's performance with TCS regulation, especially when there are insufficient public or shared transport supplies. To address this, we develop a dynamic multimodal equilibrium model to capture operation constraints and induced waiting times under TCS regulation. The model integrates travelers' mode choices, credit trading, traffic dynamics, and waiting time, which depend on key operational features of service vehicles such as fleet size and capacity. Besides, most TCS studies assume fixed transport supply, overlooking supply-side responses triggered by demand shifts. Therefore, we further propose integrating adaptive supply management through the deployment of Demand-Responsive Autonomous Shuttles (DRAS) and developing a bi-level optimization framework that incorporates the equilibrium model to jointly optimize TCS design and operational strategies for the DRAS. We apply the framework to a section of the A10 highway near Paris, France, to examine demand-supply interactions and assess the potential benefits of jointly implementing TCS and DRAS. Numerical results demonstrate the importance of modeling operational features within multimodal equilibrium and incorporating flexible supply in TCS policies for mitigating overall generalized cost.
Safe Bayesian optimization across noise models via scenario programming
Safe Bayesian optimization (BO) with Gaussian processes is an effective tool for tuning control policies in safety-critical real-world systems, specifically due to its sample efficiency and safety guarantees. However, most safe BO algorithms assume homoscedastic sub-Gaussian measurement noise, an assumption that does not hold in many relevant applications. In this article, we propose a straightforward yet rigorous approach for safe BO across noise models, including homoscedastic sub-Gaussian and heteroscedastic heavy-tailed distributions. We provide a high-probability bound on the measurement noise via the scenario approach, integrate these bounds into high probability confidence intervals, and prove safety and optimality for our proposed safe BO algorithm. We deploy our algorithm in synthetic examples and in tuning a controller for the Franka Emika manipulator in simulation.
comment: Accepted for publication (IEEE Control System Letters)
Data-driven control-oriented modelling for MPC-based control of urban drainage systems
This article presents a data-driven, control-oriented modelling methodology for urban drainage systems (UDS). The proposed framework requires three main components: input-output data from the element to be modelled, expert knowledge to define the model structure, and data-fitting techniques to obtain optimal parameters. The methodology is evaluated using a realistic benchmark of a UDS in Madrid, Spain. The results show high model accuracy and improved performance within an MPC scheme, reducing discharge and increasing the utilization of treatment facilities.
comment: This work has been submitted to IFAC WC 2026 for review. It has 7 pages and 2 figures
Shared Situational Awareness Using Hybrid Zonotopes with Confidence Metric
Situational awareness for connected and automated vehicles describes the ability to perceive and predict the behavior of other road-users in the near surroundings. However, pedestrians can become occluded by vehicles or infrastructure, creating significant safety risks due to limited visibility. Vehicle-to-everything communication enables the sharing of perception data between connected road-users, allowing for a more comprehensive awareness. The main challenge is how to fuse perception data when measurements are inconsistent with the true locations of pedestrians. Inconsistent measurements can occur due to sensor noise, false positives, or communication issues. This paper employs set-based estimation with constrained zonotopes to compute a confidence metric for the measurement set from each sensor. These sets and their confidences are then fused using hybrid zonotopes. This method can account for inconsistent measurements, enabling reliable and robust fusion of the sensor data. The effectiveness of the proposed method is demonstrated in both simulation and real experiments.
Optimal Delay Compensation in Networked Predictive Control
Networked Predictive Control is widely used to mitigate the effect of delays and dropouts in Networked Control Systems, particularly when these exceed the sampling time. A key design choice of these methods is the delay bound, which determines the prediction horizon and the robustness to information loss. This work develops a systematic method to select the optimal bound by quantifying the trade-off between prediction errors and open-loop operation caused by communication losses. Simulation studies demonstrate the performance gains achieved with the optimal bound.
comment: Submitted to the 2026 IFAC World Congress
A Robust Model Predictive Control Method for Networked Control Systems
Robustly compensating network constraints such as delays and packet dropouts in networked control systems is crucial for remotely controlling dynamical systems. This work proposes a novel prediction consistent method to cope with delays and packet losses as encountered in UDP-type communication systems. The augmented control system preserves all properties of the original model predictive control method under the network constraints. Furthermore, we propose to use linear tube MPC with the novel method and show that the system converges robustly to the origin under mild conditions. We illustrate this with simulation examples of a cart pole and a continuous stirred tank reactor.
comment: Accepted for publication in the Proceedings of the 63rd IEEE Conference on Decision and Control
An Input-Output Data-Driven Dissipativity Approach for Compositional Stability Certification of Interconnected LTI MIMO Systems
We propose an input-output data-driven framework for certifying the stability of interconnected multiple-input-multiple-output linear time-invariant discrete-time systems via QSR-dissipativity. That is, by using measured input-output trajectories of each subsystem, we verify dissipative properties and extract local passivity indices without requiring explicit model identification. These passivity indices are then used to derive conditions under which the equilibrium of the interconnected system is stable. In particular, the framework identifies how the lack of passivity in some subsystems can be compensated by surpluses in others. The proposed approach enables a compositional stability analysis by combining subsystem-level conditions into a criterion valid for the overall interconnected system. We illustrate, via a numerical case study, how to compute channel-wise passivity indices and infer stability guarantees directly from data with the proposed method.
Results for Global Attractivity of Interior Equilibrium Points for Lotka-Volterra Systems
This paper provides global attractivity results for the interior equilibrium point of a general Lotka-Volterra system with no restriction on the dimension of the system and with no special structure or properties of the interaction matrix. The main result contains as special cases all known general results, including the Volterra-Lyapunov theorem and the recently proposed eigenvector conditions. Moreover, global attractivity of the interior equilibrium point is shown for a three-dimensional example, where none of the existing general results can be applied.
comment: 20 pages
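A small numerical sketch of the kind of system the result concerns: a three-dimensional Lotka-Volterra system dx_i/dt = x_i (r_i + (A x)_i), integrated from several interior initial conditions to observe convergence toward the interior equilibrium x* = -A^{-1} r. The matrix below is an arbitrary Volterra-Lyapunov-stable example chosen for illustration, not one of the paper's examples.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Symmetric negative-definite interaction matrix, so the interior equilibrium
# is globally attractive by the classical Volterra-Lyapunov theorem.
A = np.array([[-2.0, -0.5, -0.5],
              [-0.5, -2.0, -0.5],
              [-0.5, -0.5, -2.0]])
r = np.array([3.0, 3.0, 3.0])
x_star = np.linalg.solve(A, -r)          # interior equilibrium (= [1, 1, 1])

def lotka_volterra(t, x):
    return x * (r + A @ x)

for x0 in ([0.2, 1.5, 0.7], [2.0, 0.1, 0.5]):
    sol = solve_ivp(lotka_volterra, (0.0, 40.0), x0, rtol=1e-8)
    print(np.round(sol.y[:, -1], 4), "->", np.round(x_star, 4))
```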
Incremental Validation of Automated Driving Functions using Generic Volumes in Micro-Operational Design Domains
The validation of highly automated, perception-based driving systems must ensure that they function correctly under the full range of real-world conditions. Scenario-based testing is a prominent approach to addressing this challenge, as it involves the systematic simulation of objects and environments. Operational Design Domains (ODDs) are usually described using a taxonomy of qualitative designations for individual objects. However, the process of transitioning from taxonomy to concrete test cases remains unstructured, and completeness is theoretical. This paper introduces a structured method of subdividing the ODD into manageable sections, termed micro-ODDs (mODDs), and deriving test cases with abstract object representations. This concept is demonstrated using a one-dimensional, laterally guided manoeuvre involving a shunting locomotive within a constrained ODD. In this example, mODDs are defined and refined into narrow taxonomies that enable test case generation. Obstacles are represented as generic cubes of varying sizes, providing a simplified yet robust means of evaluating perception performance. A series of tests were conducted in a closed-loop, co-simulated virtual environment featuring photorealistic rendering and simulated LiDAR, GNSS and camera sensors. The results demonstrate how edge cases in obstacle detection can be systematically explored and how perception quality can be evaluated based on observed vehicle behaviour, using crash versus safe stop as the outcome metrics. These findings support the development of a standardised framework for safety argumentation and offer a practical step towards the validation and authorisation of automated driving functions.
Controlled Evolution-Based Day-Ahead Robust Dispatch Considering Frequency Security with Frequency Regulation Loads and Curtailable Loads
With the extensive integration of volatile and uncertain renewable energy, power systems face significant challenges in primary frequency regulation due to instantaneous power fluctuations. However, the maximum frequency deviation constraint is inherently non-convex, and commonly used two-stage dispatch methods overlook causality, potentially resulting in infeasible day-ahead decisions. This paper presents a controlled evolution-based day-ahead robust dispatch method to address these issues. First, we suggest the convex relaxation technique to transform the maximum frequency deviation constraint to facilitate optimization. Then, an evolution-based robust dispatch framework is introduced to align day-ahead decisions with intraday strategies, ensuring both frequency security and power supply reliability. Additionally, a novel controlled evolution-based algorithm is developed to solve this framework efficiently. Case studies on a modified IEEE 14-bus system demonstrate the superiority of the proposed method in enhancing frequency security and system reliability.
comment: 2025 IEEE Power & Energy Society General Meeting (PESGM)
ALS-U AR RF Equipment Protection System
This paper presents the design and status of the Accumulator Ring (AR) RF Equipment Protection System (EPS) of the Advanced Light Source Upgrade (ALS-U) project at LBNL. The key components of the AR RF EPS include a Master Interlock PLC subsystem handling supervisory control and slow interlocks on the millisecond scale, an FPGA-based LLRF Controller managing fast interlocks on the microsecond scale, a 60 kW high-power amplifier with standalone PLC-based slow (millisecond-scale) and FPGA-based fast (microsecond-scale) protection systems, and an RF Drive Control Chassis acting as the primary RF mitigation device. The design of the AR RF EPS is presented along with its interfaces to internal RF and external AR subsystems.
Gig-work Management System with Chance-Constraints Verification Algorithm
This paper proposes the framework of an efficient gig-work management system. A gig-work management system recommends one-off tasks with information about task hours and wages to gig-workers. To enable effective management, this paper develops a model of gig-workers' decision-making. Then, based on the model, we formulate an optimization problem to determine the optimal task hours and wages. The formulated problem belongs to the class of chance-constrained model predictive control (CC-MPC) problems. To efficiently solve the CC-MPC problem, we develop an approximate solution algorithm with guaranteed confidence levels. Finally, we develop gig-worker models based on data collected through crowdsourcing.
comment: 6 pages, 5 figures, submitted to IFAC World Congress 2026
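A generic sample-based check of a chance constraint of this kind, using a one-sided Hoeffding bound to certify the violation probability with a given confidence, is sketched below; the worker model and numbers are hypothetical, and this is not the paper's specific verification algorithm.

```python
import numpy as np

def verify_chance_constraint(constraint_fn, sampler, n_samples=2000,
                             epsilon=0.05, beta=0.01):
    """Monte-Carlo check that P[constraint violated] <= epsilon with confidence 1 - beta.

    `constraint_fn(w)` returns True when the constraint holds for a disturbance
    sample w drawn from `sampler()`. Uses a one-sided Hoeffding bound; this is a
    generic scenario-style check, not the paper's algorithm.
    """
    violations = sum(not constraint_fn(sampler()) for _ in range(n_samples))
    p_hat = violations / n_samples
    margin = np.sqrt(np.log(1.0 / beta) / (2.0 * n_samples))   # Hoeffding radius
    return p_hat + margin <= epsilon, p_hat

# Example: a worker accepts a task if the offered wage covers a noisy reservation wage
rng = np.random.default_rng(0)
ok, p_hat = verify_chance_constraint(
    constraint_fn=lambda w: 18.0 >= w,              # offered wage 18 covers reservation wage w
    sampler=lambda: rng.normal(15.0, 1.5),          # hypothetical reservation-wage model
    epsilon=0.05,
)
```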
Model Reduction of Multicellular Communication Systems via Singular Perturbation: Sender-Receiver Systems
We investigate multicellular sender-receiver systems embedded in hydrogel beads, where diffusible signals mediate interactions among heterogeneous cells. Such systems are modeled by PDE-ODE couplings that combine three-dimensional diffusion with nonlinear intracellular dynamics, making analysis and simulation challenging. We show that the diffusion dynamics converges exponentially to a quasi-steady spatial profile and use singular perturbation theory to reduce the model to a finite-dimensional multiagent network. A closed-form communication matrix derived from the spherical Green's function captures the effective sender-receiver coupling. Numerical results show the reduced model closely matches the full dynamics while enabling scalable simulation of large cell populations.
Mitigating Dynamic Tip-Over during Mobile Crane Slewing using Input Shaping
Payload swing during rapid slewing of mobile cranes poses a safety risk, as it generates overturning moments that can lead to tip-over accidents. Currently, to limit the risk of tip-over, mobile crane operators are forced to either reduce the slewing speed or reduce the load being carried to limit the induced moments; both approaches lower productivity. This paper seeks to enable rapid slewing without compromising safety by applying input shaping to the crane-slewing commands generated by the operator. A key advantage of this approach is that the input shaper requires only information about the rope length and does not require a detailed mobile-crane dynamics model. Simulations and experiments show that the proposed method reduces residual payload swing and enables significantly higher slewing speeds without tip-over, reducing slewing completion time by at least 38% compared to unshaped control. Human control with input shaping improves task completion time by 13%, reduces the peak swing by 18%, and reduces the potential for collisions by 82% when compared to unshaped control. Moreover, shaped control with a human operator produced no tip-over, whereas large swing led to tip-over without input shaping. The proposed method thereby substantially recovers the operational-safety envelope of mobile cranes (designed to avoid tip-over using static analysis) that would otherwise be lost under dynamic conditions. Videos and demonstrations are available at https://youtu.be/dVy3bbIhrBU.
comment: Submitted to conference
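As a rough illustration of the input-shaping idea described above, the sketch below builds a zero-vibration (ZV) shaper from the rope length alone and convolves it with a sampled slew command; the pendulum model, sampling time, and rope length are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2

def zv_shaper(rope_length, damping_ratio=0.0):
    """Zero-Vibration (ZV) shaper for a pendulum payload of given rope length.

    Returns impulse amplitudes and times; convolving any slewing command with
    these impulses suppresses residual swing at the pendulum frequency.
    """
    wn = np.sqrt(G / rope_length)                  # natural frequency of payload swing
    wd = wn * np.sqrt(1.0 - damping_ratio**2)      # damped frequency
    K = np.exp(-damping_ratio * np.pi / np.sqrt(1.0 - damping_ratio**2))
    amps = np.array([1.0, K]) / (1.0 + K)          # impulse amplitudes (sum to 1)
    times = np.array([0.0, np.pi / wd])            # impulse times (half damped period)
    return amps, times

def shape_command(command, dt, rope_length):
    """Convolve a sampled slew-rate command with the ZV impulse sequence."""
    amps, times = zv_shaper(rope_length)
    shaped = np.zeros(len(command) + int(round(times[-1] / dt)))
    for a, t in zip(amps, times):
        k = int(round(t / dt))
        shaped[k:k + len(command)] += a * np.asarray(command)
    return shaped[:len(command)]

# Example: shape a step slew command for a 10 m hoist rope
cmd = np.ones(500)                                 # 5 s step at dt = 0.01 s
shaped_cmd = shape_command(cmd, dt=0.01, rope_length=10.0)
```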
Congestion Reduction in EV Charger Placement Using Traffic Equilibrium Models
Growing EV adoption can worsen traffic conditions if chargers are sited without regard to their impact on congestion. We study how to strategically place EV chargers to reduce congestion using two equilibrium models: one based on congestion games and one based on an atomic queueing simulation. We apply both models within a scalable greedy station-placement algorithm. Experiments show that this greedy scheme yields optimal or near-optimal congestion outcomes in realistic networks, even though global optimality is not guaranteed as we show with a counterexample. We also show that the queueing-based approach yields more realistic results than the congestion-game model, and we present a unified methodology that calibrates congestion delays from queue simulation and solves equilibrium in link-space.
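A minimal sketch of the greedy station-placement loop described above, assuming a black-box evaluate_congestion function that runs either equilibrium model (congestion game or queueing simulation) and returns a total-delay metric; all names are hypothetical.

```python
def greedy_charger_placement(candidate_nodes, budget, evaluate_congestion):
    """Greedily pick charger sites that most reduce an equilibrium congestion metric.

    `evaluate_congestion(placement)` is an assumed black box that solves the chosen
    traffic equilibrium model for a given charger placement and returns total delay.
    """
    placement = set()
    for _ in range(budget):
        best_node, best_cost = None, float("inf")
        for node in set(candidate_nodes) - placement:
            cost = evaluate_congestion(placement | {node})
            if cost < best_cost:
                best_node, best_cost = node, cost
        if best_node is None:
            break                      # no candidate improves the objective
        placement.add(best_node)
    return placement
```

As the abstract notes, this greedy scheme carries no global optimality guarantee, but in the reported experiments it reaches optimal or near-optimal congestion outcomes.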
Goal Reaching with Eikonal-Constrained Hierarchical Quasimetric Reinforcement Learning
Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching temporal-difference methods.
DT-MPC: Synthesizing Derivation-Free Model Predictive Control from Power Converter Netlists via Physics-Informed Neural Digital Twins
Model Predictive Control (MPC) is a powerful control strategy for power electronics, but it relies heavily on manually derived, topology-specific analytical models, which are labor-intensive and time-consuming to obtain in practical designs. To overcome this bottleneck, this paper introduces a Digital-Twin-based MPC (DT-MPC) framework for generic power converters that can systematically translate a high-level circuit netlist into an objective-aware control policy by leveraging a DT as a high-fidelity system model. Furthermore, a physics-informed neural surrogate predictor is proposed to accelerate the DT's predictions and enable real-time operation. A gradient-free simplex search optimizer is also introduced to efficiently handle complex multi-objective optimization. The efficacy of the framework has been validated through a cloud-to-edge deployment on a 1500 W dual active bridge converter. Experimental results show that the synthesized predictive model achieves an inference speed over 7 times faster than real time, the DT-MPC controller outperforms several human-designed counterparts, and the overall framework reduces engineering design time by over 95%, verifying the superiority of DT-MPC on generalized power converters.
comment: 15 pages, 15 figures
Modified Hybrid A* Collision-Free Path-Planning for Automated Reverse Parking
Parking a vehicle in tight spaces is a challenging task due to the scarcity of feasible paths that are also collision-free. This paper presents a strategy to tackle this kind of maneuver with a modified Hybrid A* path-planning algorithm that combines the feasibility guarantee inherent in the standard Hybrid A* algorithm with static obstacle collision avoidance. A kinematic single-track model is derived to describe the low-speed motion of the vehicle and is subsequently used as the motion model in the Hybrid A* path-planning algorithm to generate feasible motion-primitive branches. The model states are also used to reconstruct the vehicle centerline, which, in conjunction with an inflated binary occupancy map, facilitates static obstacle collision avoidance. A simulation study and animation are set up to test the efficacy of the approach, and the proposed algorithm consistently provides kinematically feasible trajectories that are also collision-free.
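The node expansion in such a planner can be sketched as follows, using the kinematic single-track (bicycle) model to roll out constant-steering motion primitives; the wheelbase, speed, and steering grid are illustrative values, not those used in the paper.

```python
import numpy as np

def expand_motion_primitives(x, y, theta, wheelbase=2.7, v=1.0, dt=0.2, steps=5):
    """Generate successor poses from a kinematic single-track (bicycle) model.

    Each primitive holds a constant steering angle and direction (forward or
    reverse) for `steps` integration steps, as in Hybrid A* node expansion.
    """
    primitives = []
    for direction in (+1.0, -1.0):                          # forward / reverse
        for steer in np.deg2rad([-30.0, -15.0, 0.0, 15.0, 30.0]):
            xi, yi, th = x, y, theta
            path = []
            for _ in range(steps):
                xi += direction * v * np.cos(th) * dt       # rear-axle kinematics
                yi += direction * v * np.sin(th) * dt
                th += direction * v / wheelbase * np.tan(steer) * dt
                path.append((xi, yi, th))
            primitives.append({"steer": steer, "direction": direction, "path": path})
    return primitives

# Example: expand from the origin pose; each returned path would then be
# checked against the inflated occupancy map before insertion into the open set.
branches = expand_motion_primitives(0.0, 0.0, 0.0)
```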
Online Full ZVS Optimization for Modular Multi-Active Bridge Converter in MV PET
Multi-active bridge (MAB) converters, the core of state-of-the-art medium-voltage power electronic transformers, can flexibly connect multiple DC ports among distributed DC grids and loads, but suffer from hard switching under conventional single phase-shift control, especially under unbalanced voltage conversion ratios and light load conditions. Although some offline methods improve efficiency through complex optimization structures, simple yet effective online optimization methods are lacking due to the strong coupling among the converter's ports. By leveraging the time-domain model of the MAB converter under the multiple phase-shift modulation scheme, this paper simplifies the optimization process and proposes an online optimization method that can achieve full zero-voltage switching (ZVS) operation regardless of the load conditions. The proposed method has simple solutions involving only the voltage conversion ratios and can be implemented within a wide operating range without additional sensors or advanced controllers. A four-port MAB converter is constructed as the prototype. Simulation and experimental results verify the feasibility and superiority of the proposed online strategy in terms of ZVS operation, dynamic response, and efficiency improvement.
comment: 8 pages, 11 figures. Accepted by APEC 2026
Bandit-Based Rate Adaptation for a Single-Server Queue
This paper considers the problem of obtaining bounded time-average expected queue sizes in a single-queue system with a partial-feedback structure. Time is slotted; in slot $t$ the transmitter chooses a rate $V(t)$ from a continuous interval. Transmission succeeds if and only if $V(t)\le C(t)$, where channel capacities $\{C(t)\}$ and arrivals are i.i.d. draws from fixed but unknown distributions. The transmitter observes only binary acknowledgments (ACK/NACK) indicating success or failure. Let $\varepsilon>0$ denote a sufficiently small lower bound on the slack between the arrival rate and the capacity region. We propose a phased algorithm that progressively refines a discretization of the uncountably infinite rate space and, without knowledge of $\varepsilon$, achieves an $\mathcal{O}\!\big(\log^{3.5}(1/\varepsilon)/\varepsilon^{3}\big)$ time-average expected queue size uniformly over the horizon. We also prove a converse result showing that for any rate-selection algorithm, regardless of whether $\varepsilon$ is known, there exists an environment in which the worst-case time-average expected queue size is $\Omega(1/\varepsilon^{2})$. Thus, while a gap remains in the setting without knowledge of $\varepsilon$, we show that if $\varepsilon$ is known, a simple single-stage UCB-type policy with a fixed discretization of the rate space achieves $\mathcal{O}\!\big(\log(1/\varepsilon)/\varepsilon^{2}\big)$, matching the converse up to logarithmic factors.
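For the known-slack case mentioned in the last sentence, a single-stage UCB policy over a fixed rate grid with ACK/NACK-only feedback might look like the following sketch; the goodput-based index and the grid spacing tied to the slack are assumptions for illustration, not the paper's exact policy.

```python
import numpy as np

def ucb_rate_adaptation(channel_sample, arrivals, horizon, eps, rate_max=1.0):
    """Single-stage UCB over a fixed rate grid with ACK/NACK feedback (sketch).

    `channel_sample()` draws C(t); `arrivals()` draws the per-slot arrival.
    The grid spacing is tied to the (assumed known) slack `eps`.
    """
    n_arms = int(np.ceil(rate_max / eps))
    rates = np.linspace(eps, rate_max, n_arms)        # fixed discretization
    pulls = np.zeros(n_arms)
    goodput = np.zeros(n_arms)                        # cumulative successful rate
    queue, queue_trace = 0.0, []
    for t in range(1, horizon + 1):
        means = np.where(pulls > 0, goodput / np.maximum(pulls, 1), 0.0)
        bonus = rate_max * np.sqrt(2 * np.log(t) / np.maximum(pulls, 1))
        index = np.where(pulls > 0, means + bonus, np.inf)   # explore unpulled arms first
        arm = int(np.argmax(index))
        v = rates[arm]
        ack = v <= channel_sample()                   # binary ACK/NACK feedback only
        pulls[arm] += 1
        goodput[arm] += v if ack else 0.0
        queue = max(queue + arrivals() - (v if ack else 0.0), 0.0)
        queue_trace.append(queue)
    return np.mean(queue_trace)

# Example: uniform channel capacity and light Bernoulli arrivals, slack eps = 0.05
rng = np.random.default_rng(0)
avg_queue = ucb_rate_adaptation(lambda: rng.uniform(0.3, 1.0),
                                lambda: 0.4 * float(rng.random() < 0.5),
                                horizon=20000, eps=0.05)
```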
Taylor-Lagrange Control for Safety-Critical Systems
This paper proposes a novel Taylor-Lagrange Control (TLC) method for nonlinear control systems to ensure the safety and stability through Taylor's theorem with Lagrange remainder. To achieve this, we expand a safety or stability function with respect to time along the system dynamics using the Lie derivative and Taylor's theorem. This expansion enables the control input to appear in the Taylor series at an order equivalent to the relative degree of the function. We show that the proposed TLC provides necessary and sufficient conditions for system safety and is applicable to systems and constraints of arbitrary relative degree. The TLC exhibits connections with existing Control Barrier Function (CBF) and Control Lyapunov Function (CLF) methods, and it further extends the CBF and CLF methods to the complex domain, especially for higher order cases. Compared to High-Order CBFs (HOCBFs), TLC is less restrictive as it does not require forward invariance of the intersection of a set of safe sets while HOCBFs do. We employ TLC to reformulate a constrained optimal control problem as a sequence of quadratic programs with a zero-order hold implementation method, and demonstrate the safety of zero-order hold TLC using an event-triggered control method to address inter-sampling effects. Finally, we illustrate the effectiveness of the proposed TLC method through an adaptive cruise control system and a robot control problem, and compare it with existing CBF methods.
comment: 13 pages
Pivot-Only Azimuthal Control and Attitude Estimation of Balloon-borne Payloads SC
This paper presents an attitude estimation and yaw-rate control framework for balloon-borne payloads using pivot-only actuation, motivated by the Taurus experiment. Taurus is a long-duration balloon instrument designed for rapid azimuthal scanning at approximately 30 deg/s using a motorized pivot at the flight-train connection, without a reaction wheel. We model the gondola as a rigid body subject to realistic disturbances and sensing limitations, and implement a Multiplicative Extended Kalman Filter (MEKF) that estimates attitude and gyroscope bias by fusing inertial and vector-camera measurements. A simple PI controller uses the estimated states to regulate yaw rate. Numerical simulations incorporating representative disturbance and measurement noise levels are used to evaluate closed-loop control performance and MEKF behavior under flight-like conditions. Experimental tests on the Taurus gondola validate the pivot-only approach, demonstrating stable high-rate tracking under realistic hardware constraints. The close agreement between simulation and experiment indicates that the simplified rigid-body model captures the dominant dynamics relevant for controller design and integrated estimation-and-control development.
comment: AIAA SCITECH 2026 Forum
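A toy version of the pivot-only yaw-rate loop, with a PI law acting on a single-axis rigid-body model under a slow disturbance torque, is sketched below; the gains, inertia, and disturbance are placeholder values rather than Taurus parameters, and the MEKF state estimate is replaced by the true rate.

```python
import numpy as np

def simulate_yaw_rate_pi(kp=200.0, ki=50.0, target_rate=np.deg2rad(30.0),
                         inertia=500.0, dt=0.01, duration=20.0):
    """Toy closed-loop yaw-rate regulation with a PI controller (sketch).

    A single-axis rigid body driven by pivot torque; values are illustrative.
    """
    n = int(duration / dt)
    rate, integral = 0.0, 0.0
    history = np.zeros(n)
    for k in range(n):
        error = target_rate - rate
        integral += error * dt
        torque = kp * error + ki * integral                 # PI law on yaw rate
        torque += 5.0 * np.sin(0.2 * 2 * np.pi * k * dt)    # slow disturbance torque
        rate += (torque / inertia) * dt                     # rigid-body yaw dynamics
        history[k] = rate
    return history

# The trace should converge to ~0.52 rad/s (30 deg/s) despite the disturbance.
yaw_rate_trace = simulate_yaw_rate_pi()
```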
MTTR-A: Measuring Cognitive Recovery Latency in Multi-Agent Systems
Ensuring cognitive stability in autonomous multi-agent systems (MAS) is a central challenge for large-scale, distributed AI. While existing observability tools monitor system outputs, they cannot quantify how rapidly agentic workflows recover once reasoning coherence has been lost. We adapt classical reliability metrics, namely Mean Time-to-Recovery (MTTR), Mean Time Between Failures (MTBF), and related ratios, to the cognitive domain, defining MTTR-A (Mean Time-to-Recovery for Agentic Systems) as a runtime measure of cognitive recovery latency. MTTR-A quantifies the time required for a MAS to detect reasoning drift and restore consistent operation, capturing the recovery of reasoning coherence rather than infrastructural repair. A benchmark simulation using the AG News corpus and the LangGraph orchestration framework was conducted, modeling recovery latencies across multiple reflex modes. Automated reflexes restored stability within approximately 6 s on average, while human-approval interventions required about 12 s. Across 200 runs, the median simulated MTTR-A was 6.21 ± 2.14 s, MTBF = 6.7 ± 2.14 s, and NRR = 0.08, demonstrating measurable runtime resilience across reflex strategies. By formalizing recovery latency as a quantifiable property of distributed reasoning, and deriving reliability bounds linking recovery time and cognitive uptime, this work establishes a foundation for runtime dependability in agentic cognition, transforming cognitive recovery from an ad-hoc process into a standardized, interpretable performance metric.
comment: preprint
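A minimal computation of MTTR-A-style statistics from an incident log is sketched below; the record format and the NRR definition used here (share of unrecovered incidents) are assumptions, since the abstract does not spell them out.

```python
from statistics import mean, median

def recovery_metrics(incidents):
    """Compute MTTR-A-style statistics from a list of incident records (sketch).

    Each incident is a dict with 'drift_detected' and 'recovered' timestamps in
    seconds, plus 'recovered_ok' (False if no recovery occurred).
    """
    recovered = [i for i in incidents if i["recovered_ok"]]
    recovery_times = [i["recovered"] - i["drift_detected"] for i in recovered]
    starts = sorted(i["drift_detected"] for i in incidents)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return {
        "MTTR_A_median": median(recovery_times),
        "MTTR_A_mean": mean(recovery_times),
        "MTBF": mean(gaps) if gaps else float("nan"),
        "NRR": 1.0 - len(recovered) / len(incidents),   # assumed definition
    }

# Example with three simulated incidents
log = [
    {"drift_detected": 0.0,  "recovered": 6.1,  "recovered_ok": True},
    {"drift_detected": 7.0,  "recovered": 13.2, "recovered_ok": True},
    {"drift_detected": 15.0, "recovered": None, "recovered_ok": False},
]
print(recovery_metrics(log))
```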
Convex computation of regions of attraction from data using Sums-of-Squares programming
This paper focuses on the analysis of the Region of Attraction (RoA) for unknown autonomous dynamical systems. A data-driven approach based on the moment-Sum-of-Squares (SoS) hierarchy is proposed, enabling novel RoA outer approximations despite the reduced information on the dynamics. The main contribution consists of bypassing the system model and, hence, the recurring constraint on its polynomial structure. Numerical experiments showcase the influence of data on learned approximating sets, highlighting the potential of this method.
A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research
Data-driven modeling of building thermal dynamics is emerging as an increasingly important field of research for large-scale intelligent building control. However, research in data-driven modeling using machine learning (ML) techniques requires massive amounts of thermal building data, which is not easily available. Neither empirical public datasets nor existing data generators meet the needs of ML research in terms of data quality and quantity. Moreover, existing data generation approaches typically require expert knowledge in building simulation. To fill this gap, we present a thermal building data generation framework which we call BuilDa. BuilDa is designed to produce synthetic data of adequate quality and quantity for ML research. The framework does not require profound building simulation knowledge to generate large volumes of data. BuilDa uses a single-zone Modelica model that is exported as a Functional Mock-up Unit (FMU) and simulated in Python. We demonstrate BuilDa by generating data and utilizing it for a transfer learning study involving the fine-tuning of 486 data-driven models.
comment: Under Review
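A data-generation loop of this kind, i.e., randomized parameter sweeps over an FMU simulated from Python with the fmpy package, could look like the sketch below; the FMU file name, parameter names, and sampling choices are hypothetical and not BuilDa's actual interface.

```python
import numpy as np
from fmpy import simulate_fmu  # assumes the fmpy package is installed

# Hypothetical single-zone building FMU and variable names; the real BuilDa
# model exposes its own parameters and outputs.
FMU_PATH = "SingleZoneBuilding.fmu"

def generate_dataset(n_variants=10, seed=0):
    """Simulate randomized building variants and collect their trajectories."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(n_variants):
        start_values = {
            "wallInsulationThickness": float(rng.uniform(0.05, 0.3)),  # m (hypothetical)
            "windowArea": float(rng.uniform(2.0, 12.0)),               # m^2 (hypothetical)
        }
        result = simulate_fmu(
            FMU_PATH,
            start_time=0.0,
            stop_time=7 * 24 * 3600.0,          # one simulated week
            output_interval=900.0,              # 15-minute samples
            start_values=start_values,
        )
        runs.append((start_values, result))     # structured array of recorded outputs
    return runs
```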
Higher-Order Meta Distribution Reliability Analysis of Wireless Networks
Communication reliability, as defined by 3GPP, is the probability of achieving a desired quality of service (QoS). Traditionally, this metric is evaluated by averaging the QoS success indicator over spatiotemporal random variables. Recently, the meta distribution (MD) has emerged as a two-level analysis tool that characterizes system-level reliability as a function of link-level reliability thresholds. However, existing MD studies have two limitations. First, they focus exclusively on spatial and temporal randomness corresponding to node distribution and fading channels, respectively, leaving stochastic behaviors in other domains largely unexplored. Second, they are restricted to first-order MDs with two randomness levels, restricting applicability to scenarios requiring higher-order MD characterization. To address these gaps, we propose a hierarchical framework for higher-order MD reliability in wireless networks, where each layer's success probability is formulated and fed into the next layer, yielding overall MD reliability at the highest level. We apply this framework to wireless networks by capturing three levels of temporal dynamics representing fast, slow, and static random elements, and provide a comprehensive second-order MD reliability analysis for two application scenarios. The effectiveness of the proposed approach is demonstrated via these representative scenarios, supported by detailed analytical and numerical evaluations. Our results highlight the value of hierarchical MD representations across multiple domains and reveal the significant influence of inner-layer target reliabilities on overall performance.
Stabilizing Rate of Stochastic Control Systems
This paper develops a quantitative framework for analyzing the mean-square exponential stabilization of stochastic linear systems with multiplicative noise, focusing specifically on the optimal stabilizing rate, which characterizes the fastest exponential stabilization achievable under admissible control policies. Our contributions are twofold. First, we extend norm-based techniques from deterministic switched systems to the stochastic setting, deriving a verifiable necessary and sufficient condition for the exact attainability of the optimal stabilizing rate, together with computable upper and lower bounds. Second, by restricting attention to state-feedback policies, we reformulate the optimal stabilizing rate problem as an optimal control problem with a nonlinear cost function and derive a Bellman-type equation. Since this Bellman-type equation is not directly tractable, we recast it as a nonlinear matrix eigenvalue problem whose valid solutions require strictly positive-definite matrices. To ensure the existence of such solutions, we introduce a regularization scheme and develop a Regularized Normalized Value Iteration (RNVI) algorithm, which in turn generates strictly positive-definite fixed points for a perturbed version of the original nonlinear matrix eigenvalue problem while producing feedback controllers. Evaluating these regularized solutions further yields certified lower and upper bounds for the optimal stabilizing rate, resulting in a constructive and verifiable framework for determining the fastest achievable mean-square stabilization under multiplicative noise.
comment: 30 pages
Linear Quadratic Regulators: A New Look
Linear time-invariant control systems can be considered as finitely generated modules over the commutative principal ideal ring $\mathbb{R}[\frac{d}{dt}]$ of linear differential operators with respect to the time derivative. The Kalman controllability in this algebraic language is translated as the freeness of the system module. Linear quadratic regulators rely on quadratic Lagrangians, or cost functions. Any flat output, i.e., any basis of the corresponding free module leads to an open-loop control strategy via an Euler-Lagrange equation, which becomes here a linear ordinary differential equation with constant coefficients. In this approach, the two-point boundary value problem, including the control variables, becomes tractable. It yields notions of optimal time horizon, optimal parameter design and optimal rest-to-rest trajectories. The loop is closed via an intelligent controller derived from model-free control, which is known to exhibit excellent performance concerning model mismatches and disturbances.
Multiport Analytical Pixel Electromagnetic Simulator (MAPES) for AI-assisted RFIC and Microwave Circuit Design
This paper proposes a novel analytical framework, termed the Multiport Analytical Pixel Electromagnetic Simulator (MAPES). MAPES enables efficient and accurate prediction of the electromagnetic (EM) performance of arbitrary pixel-based microwave (MW) and RFIC structures. Inspired by the Integrated Internal Multiport Method (IMPM), MAPES extends the concept to the pixel presence/absence domain used in AI-assisted EM design. By introducing virtual pixels and diagonal virtual pixels and inserting virtual ports at critical positions, MAPES captures all horizontal, vertical, and diagonal electromagnetic couplings within a single multiport impedance matrix. Only a small set of full-wave simulations (typically about 1% of the datasets required by AI-assisted EM simulators) is needed to construct this matrix. Subsequently, any arbitrary pixel configuration can be evaluated analytically using a closed-form multiport relation without additional full-wave calculations. The proposed approach eliminates data-driven overfitting and ensures accurate results across all design variations. Comprehensive examples for single- and double-layer CMOS processes (180 nm and 65 nm) and PCBs confirm that MAPES achieves high prediction accuracy with 600-2000x speed improvement compared to CST simulations. Owing to its efficiency, scalability and reliability, MAPES provides a practical and versatile tool for AI-assisted MW circuit and RFIC design across diverse fabrication technologies.
Adaptive Compressive Tactile Subsampling: Enabling High Spatiotemporal Resolution in Scalable Robotic Skin
Robots require full-body, high-resolution tactile sensing to operate safely in unstructured environments, enabling reflexive responses and closed-loop control. However, the pixel counts needed for dense, large-area coverage limit readout rates of most tactile arrays to <100 Hz, hindering their use in high-speed tasks. We present Adaptive Compressive Tactile Subsampling (ACTS), a scalable and data-driven method that greatly enhances traditional tactile matrices by leveraging adaptive sensor sampling and sparse recovery. By adaptively allocating measurements to informative regions, ACTS is especially effective for spatially sparse signals common in real-world interactions. Tested on a 1024-pixel tactile sensor array (32x32), ACTS achieved frame rates up to 1,000 Hz, an 18X improvement over conventional raster scanning, with minimal reconstruction error. For the first time, ACTS enables wearable, large-area, high-density tactile sensing systems that can deliver high-speed results. We demonstrate rapid object classification within 20 ms of contact, high-speed projectile detection, ricochet angle estimation, and soft deformation tracking, in tactile and robotics applications, all using flexible, high-density tactile arrays. These include high-resolution tactile gloves, pressure insoles, and full-body configurations covering robotic arms and human-sized mannequins. We further showcase tactile-based closed-loop control by guiding a metallic ball to trace letters using tactile feedback and by executing tactile-only whole-hand reflexes on a fully sensorized LEAP hand to stabilize grasps, prevent slip, and avoid sharp objects, validating ACTS for real-time interaction and motion control. ACTS transforms standard, low-cost, and robust tactile sensors into high-speed systems enabling scalable, responsive, and adaptive tactile perception for robots and wearables operating in dynamic environments.
comment: 51 pages, 11 main figures, 16 supplemental figures, Videos can be accessed at https://tinyurl.com/ACTS-videos
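The adaptive-allocation idea can be illustrated with the simplified sketch below, which spends a fixed per-frame read budget on the most active taxels and carries unread taxels over from the previous frame; this stands in for, but does not reproduce, the sparse-recovery reconstruction used in ACTS.

```python
import numpy as np

def adaptive_subsample(prev_frame, read_pixel, budget):
    """Adaptively spend a fixed pixel-read budget on the most active regions.

    `prev_frame` is the last reconstructed (H, W) tactile image; `read_pixel(i, j)`
    reads one taxel from hardware. Unread taxels keep their previous values
    (a crude stand-in for the sparse-recovery step in ACTS).
    """
    h, w = prev_frame.shape
    activity = np.abs(prev_frame).ravel() + 1e-6       # bias sampling toward active taxels
    probs = activity / activity.sum()
    idx = np.random.choice(h * w, size=budget, replace=False, p=probs)
    frame = prev_frame.copy()
    for flat in idx:
        i, j = divmod(int(flat), w)
        frame[i, j] = read_pixel(i, j)
    return frame

# Example with a synthetic 32x32 array and a fake hardware read function
truth = np.zeros((32, 32))
truth[10:14, 20:24] = 1.0                              # localized contact patch
prev = np.full((32, 32), 0.01)                         # last reconstructed frame
frame = adaptive_subsample(prev, lambda i, j: truth[i, j], budget=128)
```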
Local Dissipativity Analysis of Nonlinear Systems
Dissipativity is an input-output (IO) characterization of nonlinear systems that enables compositional robust control through Vidyasagar's Network Dissipativity Theorem (VDNT). However, determining the dissipativity of a system is an involved and, often, model-specific process. We present a general method to determine the local dissipativity properties of smooth, nonlinear, control-affine systems. We simultaneously search for the optimal IO characterization of a system and synthesize a continuous piecewise affine (CPA) storage function via a convex optimization problem. To do so, we reformulate the dissipation inequality as a matrix inequality (MI) and develop novel linear matrix inequality (LMI) bounds for a triangulation to impose the dissipativity conditions on the CPA storage function. Further, we develop a method to synthesize a combined quadratic and CPA storage function to expand the class of systems to which the optimization problem is applicable. Finally, we establish that our method will always find a feasible IO characterization and storage function provided that the system is sufficiently strictly locally dissipative, and demonstrate the efficacy of our method in determining the conic bounds and gain of various nonlinear systems.
Robotics
ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning
Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapid, high-frequency local contact dynamics. Integrating these signals is challenging due to their fundamental frequency and informational disparities. In this work, we propose ImplicitRDP, a unified end-to-end visual-force diffusion policy that integrates visual planning and reactive force control within a single network. We introduce Structural Slow-Fast Learning, a mechanism utilizing causal attention to simultaneously process asynchronous visual and force tokens, allowing the policy to perform closed-loop adjustments at the force frequency while maintaining the temporal coherence of action chunks. Furthermore, to mitigate modality collapse where end-to-end models fail to adjust the weights across different modalities, we propose Virtual-target-based Representation Regularization. This auxiliary objective maps force feedback into the same space as the action, providing a stronger, physics-grounded learning signal than raw force prediction. Extensive experiments on contact-rich tasks demonstrate that ImplicitRDP significantly outperforms both vision-only and hierarchical baselines, achieving superior reactivity and success rates with a streamlined training pipeline. Code and videos will be publicly available at https://implicit-rdp.github.io.
comment: Project page: https://implicit-rdp.github.io
Any4D: Unified Feed-Forward Metric 4D Reconstruction
We present Any4D, a scalable multi-view transformer for metric-scale, dense feed-forward 4D reconstruction. Any4D directly generates per-pixel motion and geometry predictions for N frames, in contrast to prior work that typically focuses on either 2-view dense scene flow or sparse 3D point tracking. Moreover, unlike other recent methods for 4D reconstruction from monocular RGB videos, Any4D can process additional modalities and sensors such as RGB-D frames, IMU-based egomotion, and Radar Doppler measurements, when available. One of the key innovations that allows for such a flexible framework is a modular representation of a 4D scene; specifically, per-view 4D predictions are encoded using a variety of egocentric factors (depthmaps and camera intrinsics) represented in local camera coordinates, and allocentric factors (camera extrinsics and scene flow) represented in global world coordinates. We achieve superior performance across diverse setups - both in terms of accuracy (2-3X lower error) and compute efficiency (15X faster), opening avenues for multiple downstream applications.
comment: Project Website: https://any-4d.github.io/
Curriculum-Based Reinforcement Learning for Autonomous UAV Navigation in Unknown Curved Tubular Conduit
Autonomous drone navigation in confined tubular environments remains a major challenge due to the constraining geometry of the conduits, the proximity of the walls, and the perceptual limitations inherent to such scenarios. We propose a reinforcement learning approach enabling a drone to navigate unknown three-dimensional tubes without any prior knowledge of their geometry, relying solely on local observations from LiDAR and a conditional visual detection of the tube center. In contrast, the Pure Pursuit algorithm, used as a deterministic baseline, benefits from explicit access to the centerline, creating an information asymmetry designed to assess the ability of RL to compensate for the absence of a geometric model. The agent is trained through a progressive Curriculum Learning strategy that gradually exposes it to increasingly curved geometries, where the tube center frequently disappears from the visual field. A turning-negotiation mechanism, based on the combination of direct visibility, directional memory, and LiDAR symmetry cues, proves essential for ensuring stable navigation under such partial observability conditions. Experiments show that the PPO policy acquires robust and generalizable behavior, consistently outperforming the deterministic controller despite its limited access to geometric information. Validation in a high-fidelity 3D environment further confirms the transferability of the learned behavior to continuous physical dynamics. The proposed approach thus provides a complete framework for autonomous navigation in unknown tubular environments and opens perspectives for industrial, underground, or medical applications where progressing through narrow conduits under limited perception represents a central challenge.
Decoupled Q-Chunking
Temporal-difference (TD) methods learn state and action values efficiently by bootstrapping from their own future value predictions, but such a self-bootstrapping mechanism is prone to bootstrapping bias, where the errors in the value targets accumulate across steps and result in biased value estimates. Recent work has proposed to use chunked critics, which estimate the value of short action sequences ("chunks") rather than individual actions, speeding up value backup. However, extracting policies from chunked critics is challenging: policies must output the entire action chunk open-loop, which can be sub-optimal for environments that require policy reactivity and also challenging to model especially when the chunk length grows. Our key insight is to decouple the chunk length of the critic from that of the policy, allowing the policy to operate over shorter action chunks. We propose a novel algorithm that achieves this by optimizing the policy against a distilled critic for partial action chunks, constructed by optimistically backing up from the original chunked critic to approximate the maximum value achievable when a partial action chunk is extended to a complete one. This design retains the benefits of multi-step value propagation while sidestepping both the open-loop sub-optimality and the difficulty of learning action chunking policies for long action chunks. We evaluate our method on challenging, long-horizon offline goal-conditioned tasks and show that it reliably outperforms prior methods. Code: github.com/ColinQiyangLi/dqc.
comment: 76 pages, 14 figures
Digital Twin Supervised Reinforcement Learning Framework for Autonomous Underwater Navigation
Autonomous navigation in underwater environments remains a major challenge due to the absence of GPS, degraded visibility, and the presence of submerged obstacles. This article investigates these issues through the case of the BlueROV2, an open platform widely used for scientific experimentation. We propose a deep reinforcement learning approach based on the Proximal Policy Optimization (PPO) algorithm, using an observation space that combines target-oriented navigation information, a virtual occupancy grid, and ray-casting along the boundaries of the operational area. The learned policy is compared against a reference deterministic kinematic planner, the Dynamic Window Approach (DWA), commonly employed as a robust baseline for obstacle avoidance. The evaluation is conducted in a realistic simulation environment and complemented by validation on a physical BlueROV2 supervised by a 3D digital twin of the test site, helping to reduce risks associated with real-world experimentation. The results show that the PPO policy consistently outperforms DWA in highly cluttered environments, notably thanks to better local adaptation and reduced collisions. Finally, the experiments demonstrate the transferability of the learned behavior from simulation to the real world, confirming the relevance of deep RL for autonomous navigation in underwater robotics.
Iterative Compositional Data Generation for Robot Control
Collecting robotic manipulation data is expensive, making it impractical to acquire demonstrations for the combinatorially large space of tasks that arise in multi-object, multi-robot, and multi-environment settings. While recent generative models can synthesize useful data for individual tasks, they do not exploit the compositional structure of robotic domains and struggle to generalize to unseen task combinations. We propose a semantic compositional diffusion transformer that factorizes transitions into robot-, object-, obstacle-, and objective-specific components and learns their interactions through attention. Once trained on a limited subset of tasks, we show that our model can zero-shot generate high-quality transitions from which we can learn control policies for unseen task combinations. Then, we introduce an iterative self-improvement procedure in which synthetic data is validated via offline reinforcement learning and incorporated into subsequent training rounds. Our approach substantially improves zero-shot performance over monolithic and hard-coded compositional baselines, ultimately solving nearly all held-out tasks and demonstrating the emergence of meaningful compositional structure in the learned representations.
V-OCBF: Learning Safety Filters from Offline Data via Value-Guided Offline Control Barrier Functions
Ensuring safety in autonomous systems requires controllers that satisfy hard, state-wise constraints without relying on online interaction. While existing Safe Offline RL methods typically enforce soft expected-cost constraints, they do not guarantee forward invariance. Conversely, Control Barrier Functions (CBFs) provide rigorous safety guarantees but usually depend on expert-designed barrier functions or full knowledge of the system dynamics. We introduce Value-Guided Offline Control Barrier Functions (V-OCBF), a framework that learns a neural CBF entirely from offline demonstrations. Unlike prior approaches, V-OCBF does not assume access to the dynamics model; instead, it derives a recursive finite-difference barrier update, enabling model-free learning of a barrier that propagates safety information over time. Moreover, V-OCBF incorporates an expectile-based objective that avoids querying the barrier on out-of-distribution actions and restricts updates to the dataset-supported action set. The learned barrier is then used with a Quadratic Program (QP) formulation to synthesize real-time safe control. Across multiple case studies, V-OCBF yields substantially fewer safety violations than baseline methods while maintaining strong task performance, highlighting its scalability for offline synthesis of safety-critical controllers without online interaction or hand-engineered barriers.
comment: 23 pages, 8 figure, 7 tables
AERMANI-Diffusion: Regime-Conditioned Diffusion for Dynamics Learning in Aerial Manipulators
Aerial manipulators undergo rapid, configuration-dependent changes in inertial coupling forces and aerodynamic forces, making accurate dynamics modeling a core challenge for reliable control. Analytical models lose fidelity under these nonlinear and nonstationary effects, while standard data-driven methods such as deep neural networks and Gaussian processes cannot represent the diverse residual behaviors that arise across different operating conditions. We propose a regime-conditioned diffusion framework that models the full distribution of residual forces using a conditional diffusion process and a lightweight temporal encoder. The encoder extracts a compact summary of recent motion and configuration, enabling consistent residual predictions even through abrupt transitions or unseen payloads. When combined with an adaptive controller, the framework enables dynamics uncertainty compensation and yields markedly improved tracking accuracy in real-world tests.
Distribution-Free Stochastic MPC for Joint-in-Time Chance-Constrained Linear Systems
This work presents a stochastic model predictive control (MPC) framework for linear systems subject to joint-in-time chance constraints under unknown disturbance distributions. Unlike existing stochastic MPC formulations that rely on parametric or Gaussian assumptions or require expensive offline computations, the proposed method leverages conformal prediction (CP) as a streamlined tool to construct finite-sample confidence regions for the system's stochastic error trajectories with minimal computational effort. These regions enable the relaxation of probabilistic constraints while providing formal guarantees. By employing an indirect feedback mechanism and a probabilistic set-based formulation, we prove recursive feasibility of the relaxed optimization problem and establish chance constraint satisfaction in closed-loop. Furthermore, we extend the approach to the more general output feedback setting with unknown measurement noise distributions. Given available noise samples, we establish satisfaction of the joint chance constraints and recursive feasibility via output measurements alone. Numerical examples demonstrate the effectiveness and advantages of the proposed method compared to existing approaches.
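A stripped-down version of the conformal step, computing per-step confidence radii for sampled error trajectories with the usual finite-sample quantile correction, is sketched below; it is a simplified illustration, not the paper's exact tube construction or constraint relaxation.

```python
import numpy as np

def conformal_error_tube(error_samples, alpha=0.1):
    """Finite-sample confidence radii for error trajectories via split conformal.

    `error_samples` has shape (n_samples, horizon, state_dim); the score at each
    step is the error norm, and the returned per-step radius covers a fresh error
    with probability at least 1 - alpha (marginally, per step).
    """
    n, horizon, _ = error_samples.shape
    scores = np.linalg.norm(error_samples, axis=2)            # shape (n, horizon)
    k = int(np.ceil((n + 1) * (1.0 - alpha)))                 # finite-sample correction
    k = min(k, n)
    radii = np.sort(scores, axis=0)[k - 1, :]                 # per-step radius
    return radii

# Example: 200 sampled disturbance rollouts of a 10-step horizon in 2D
rng = np.random.default_rng(0)
samples = rng.normal(scale=0.1, size=(200, 10, 2)).cumsum(axis=1)
print(conformal_error_tube(samples, alpha=0.1))
```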
On the Stabilization of Rigid Formations on Regular Curves
This work deals with the problem of stabilizing a multi-agent rigid formation on a general class of planar curves. Namely, we seek to stabilize an equilateral polygonal formation on closed planar differentiable curves after a path sweep. The task of finding an inscribed regular polygon centered at the point of interest is solved via a randomized multi-start Newton-Like algorithm for which one is able to ascertain the existence of a minimizer. Then we design a continuous feedback law that guarantees convergence to, and sufficient sweeping of the curve, followed by convergence to the desired formation vertices while ensuring inter-agent avoidance. The proposed approach is validated through numerical simulations for different classes of curves and different rigid formations. Code: https://github.com/mebbaid/paper-elobaid-ifacwc-2026
How to Brake? Ethical Emergency Braking with Deep Reinforcement Learning
Connected and automated vehicles (CAVs) have the potential to enhance driving safety, for example by enabling safe vehicle following and more efficient traffic scheduling. For such future deployments, safety requirements should be addressed, the primary ones being avoidance of vehicle collisions and substantial mitigation of harm when collisions are unavoidable. However, conservative worst-case-based control strategies come at the price of reduced flexibility and may compromise overall performance. In light of this, we investigate how Deep Reinforcement Learning (DRL) can be leveraged to improve safety in multi-vehicle-following scenarios involving emergency braking. Specifically, we investigate how DRL with vehicle-to-vehicle communication can be used to ethically select an emergency braking profile in scenarios where collective three-vehicle harm reduction or collision avoidance is to be achieved, rather than single-vehicle outcomes. As an algorithm, we provide a hybrid approach that combines DRL with a previously published method based on analytical expressions for selecting an optimal constant deceleration. By combining DRL with the previous method, the proposed hybrid approach increases reliability compared to standalone DRL, while achieving superior performance in terms of overall harm reduction and collision avoidance.
Evaluating Gemini Robotics Policies in a Veo World Simulator
Generative world models hold significant potential for simulating interactions with visuomotor policies in varied environments. Frontier video models can enable generation of realistic observations and environment interactions in a scalable and general manner. However, the use of video models in robotics has been limited primarily to in-distribution evaluations, i.e., scenarios that are similar to ones used to train the policy or fine-tune the base video model. In this report, we demonstrate that video models can be used for the entire spectrum of policy evaluation use cases in robotics: from assessing nominal performance to out-of-distribution (OOD) generalization, and probing physical and semantic safety. We introduce a generative evaluation system built upon a frontier video foundation model (Veo). The system is optimized to support robot action conditioning and multi-view consistency, while integrating generative image-editing and multi-view completion to synthesize realistic variations of real-world scenes along multiple axes of generalization. We demonstrate that the system preserves the base capabilities of the video model to enable accurate simulation of scenes that have been edited to include novel interaction objects, novel visual backgrounds, and novel distractor objects. This fidelity enables accurately predicting the relative performance of different policies in both nominal and OOD conditions, determining the relative impact of different axes of generalization on policy performance, and performing red teaming of policies to expose behaviors that violate physical or semantic safety constraints. We validate these capabilities through 1600+ real-world evaluations of eight Gemini Robotics policy checkpoints and five tasks for a bimanual manipulator.
LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
We propose LEO-RobotAgent, a general-purpose language-driven intelligent agent framework for robots. Under this framework, LLMs can operate different types of robots to complete unpredictable complex tasks across various scenarios. This framework features strong generalization, robustness, and efficiency. The application-level system built around it can fully enhance bidirectional human-robot intent understanding and lower the threshold for human-robot interaction. Regarding robot task planning, the vast majority of existing studies focus on the application of large models in single-task scenarios and for single robot types. These algorithms often have complex structures and lack generalizability. Thus, the proposed LEO-RobotAgent framework is designed with as streamlined a structure as possible, enabling large models to independently think, plan, and act within this clear framework. We provide a modular and easily registrable toolset, allowing large models to flexibly call various tools to meet different requirements. Meanwhile, the framework incorporates a human-robot interaction mechanism, enabling the algorithm to collaborate with humans like a partner. Experiments have verified that this framework can be easily adapted to mainstream robot platforms including unmanned aerial vehicles (UAVs), robotic arms, and wheeled robots, and efficiently execute a variety of carefully designed tasks with different complexity levels. Our code is available at https://github.com/LegendLeoChen/LEO-RobotAgent.
Motion Planning for Safe Landing of a Human-Piloted Parafoil
Most skydiving accidents occur during the parafoil-piloting and landing stages and result from human lapses in judgment while piloting the parafoil. Training of novice pilots is protracted due to the lack of functional and easily accessible training simulators. Moreover, work on parafoil trajectory planning suitable for aiding human training remains limited. To bridge this gap, we study the problem of computing safe trajectories for human-piloted parafoil flight and examine how such trajectories fare against human-generated solutions. For the algorithmic part, we adapt the sampling-based motion planner Stable Sparse RRT (SST) by Li et al., to cope with the problem constraints while minimizing the bank angle (control effort) as a proxy for safety. We then compare the computer-generated solutions with data from human-generated parafoil flight, where the algorithm offers a relative cost improvement of 20%-80% over the performance of the human pilot. We observe that human pilots tend to, first, close the horizontal distance to the landing area, and then address the vertical gap by spiraling down to the suitable altitude for starting a landing maneuver. The algorithm considered here makes smoother and more gradual descents, arriving at the landing area at the precise altitude necessary for the final approach while maintaining safety constraints. Overall, the study demonstrates the potential of computer-generated guidelines, rather than traditional rules of thumb, which can be integrated into future simulators to train pilots for safer and more cost-effective flights.
Mr. Virgil: Learning Multi-robot Visual-range Relative Localization IROS
Ultra-wideband (UWB)-vision fusion localization has achieved extensive applications in the domain of multi-agent relative localization. The challenging matching problem between robots and visual detections renders existing methods highly dependent on identity-encoded hardware or delicate tuning algorithms. Overconfident yet erroneous matches may bring about irreversible damage to the localization system. To address this issue, we introduce Mr. Virgil, an end-to-end learning multi-robot visual-range relative localization framework, consisting of a graph neural network for data association between UWB rangings and visual detections, and a differentiable pose graph optimization (PGO) back-end. The graph-based front-end supplies robust matching results, accurate initial position predictions, and credible uncertainty estimates, which are subsequently integrated into the PGO back-end to elevate the accuracy of the final pose estimation. Additionally, a decentralized system is implemented for real-world applications. Experiments spanning varying robot numbers, simulated and real-world settings, and occluded and non-occluded conditions showcase the stability and accuracy of our approach across various scenes compared to conventional methods. Our code is available at: https://github.com/HiOnes/Mr-Virgil.
comment: Accepted by 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Neural Ranging Inertial Odometry ICRA
Ultra-wideband (UWB) has shown promising potential in GPS-denied localization thanks to its lightweight and drift-free characteristics, but its accuracy is limited in real scenarios due to its sensitivity to sensor arrangement and the non-Gaussian error patterns induced by multi-path or multi-signal interference, which commonly occur in typical applications like long tunnels. We introduce a novel neural fusion framework for ranging inertial odometry which involves a graph attention UWB network and a recurrent neural inertial network. Our graph net learns scene-relevant ranging patterns and adapts to any number of anchors or tags, realizing accurate positioning without calibration. Additionally, the integration of least squares and the incorporation of a nominal frame enhance overall performance and scalability. The effectiveness and robustness of our methods are validated through extensive experiments on both public and self-collected datasets, spanning indoor, outdoor, and tunnel environments. The results demonstrate the superiority of our proposed IR-ULSG in handling challenging conditions, including scenarios outside the convex envelope and cases where only a single anchor is available.
comment: Accepted by 2025 IEEE International Conference on Robotics and Automation (ICRA)
Contact SLAM: An Active Tactile Exploration Policy Based on Physical Reasoning Utilized in Robotic Fine Blind Manipulation Tasks
Contact-rich manipulation is difficult for robots to execute and requires accurate perception of the environment. In some scenarios, vision is occluded, and the robot can no longer obtain real-time scene state information through visual feedback; this is called "blind manipulation". In this manuscript, a novel physically-driven contact cognition method, called "Contact SLAM", is proposed. It estimates the state of the environment and achieves manipulation using only tactile sensing and prior knowledge of the scene. To maximize exploration efficiency, this manuscript also designs an active exploration policy that gradually reduces uncertainties in the manipulation scene. Experimental results demonstrate the effectiveness and accuracy of the proposed method in several contact-rich tasks, including the difficult and delicate socket-assembly task and a block-pushing task.
comment: 8 pages, 8 figures
Seamless Outdoor-Indoor Pedestrian Positioning System with GNSS/UWB/IMU Fusion: A Comparison of EKF, FGO, and PF
Accurate and continuous pedestrian positioning across outdoor-indoor environments remains challenging because GNSS, UWB, and inertial PDR are complementary yet individually fragile under signal blockage, multipath, and drift. This paper presents a unified GNSS/UWB/IMU fusion framework for seamless pedestrian localization and provides a controlled comparison of three probabilistic back-ends: an error-state extended Kalman filter, sliding-window factor graph optimization, and a particle filter. The system uses chest-mounted IMU-based PDR as the motion backbone and integrates absolute updates from GNSS outdoors and UWB indoors. To enhance transition robustness and mitigate urban GNSS degradation, we introduce a lightweight map-based feasibility constraint derived from OpenStreetMap building footprints, treating most building interiors as non-navigable while allowing motion inside a designated UWB-instrumented building. The framework is implemented in ROS 2 and runs in real time on a wearable platform, with visualization in Foxglove. We evaluate three scenarios: indoor (UWB+PDR), outdoor (GNSS+PDR), and seamless outdoor-indoor (GNSS+UWB+PDR). Results show that the ESKF provides the most consistent overall performance in our implementation.
comment: 8 pages, 4 figures, submitted to The 17th International Conference on Ambient Systems, Networks and Technologies
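A heavily simplified stand-in for the fusion back-end is sketched below: a 2D position filter that predicts with PDR steps and corrects with absolute fixes, where GNSS and UWB differ only in their assumed measurement noise; it omits the error-state formulation, attitude states, and the map-based feasibility constraint.

```python
import numpy as np

class SimplePositionFusion:
    """Minimal 2D position filter: PDR step prediction + absolute position update."""

    def __init__(self):
        self.x = np.zeros(2)                  # position estimate (east, north)
        self.P = np.eye(2) * 1.0              # covariance

    def predict_pdr(self, step_length, heading, step_noise=0.05):
        # Dead-reckoning displacement from step length and heading.
        self.x += step_length * np.array([np.sin(heading), np.cos(heading)])
        self.P += np.eye(2) * step_noise**2

    def update_fix(self, z, sigma):
        # z: absolute position fix from GNSS (outdoors) or UWB (indoors).
        R = np.eye(2) * sigma**2
        S = self.P + R
        K = self.P @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(2) - K) @ self.P

# Usage: tighter sigma for UWB indoors, looser for degraded urban GNSS.
fusion = SimplePositionFusion()
fusion.predict_pdr(step_length=0.7, heading=np.deg2rad(45))
fusion.update_fix(np.array([0.5, 0.48]), sigma=0.3)
```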
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we hint, albeit not explicitly, that it is a misconception to think that humans learn fast. The learning process takes time: babies start learning to move in the restricted liquid area called the placenta, children are often limited by an underdeveloped body, and even adults are not allowed to participate in complex competitions right away. With robots, however, when learning from scratch, we often do not have the privilege of waiting for dozens of millions of steps. "Swaddling" regularization is responsible for restraining an agent during rapid but unstable development, penalizing action strength in a specific way that does not affect actions directly. Symphony, a Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas that makes it possible to train humanoid robots from scratch with sample efficiency, sample proximity, and safety of actions in mind. It is no secret that a continuous increase in Gaussian noise without appropriate smoothing is harmful for motors and gearboxes. Compared to stochastic algorithms, we set a limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are effectively immersed in weaker noise; when actions require more extreme values, they rise above the weak noise. Training becomes empirically much safer for both the surrounding environment and the robot's mechanisms. We use a Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability so that the memory contains a recent-memory region and a long-term memory trail. The Fading Replay Buffer allows us to use Temporal Advantage, where we improve the current Critic network prediction relative to its exponential moving average. Temporal Advantage allows us to update Actor and Critic in one pass, as well as to combine Actor and Critic in one object and implement their losses in one line.
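A possible reading of the Fading Replay Buffer is sketched below, with a tanh-shaped sampling weight over transition age; the exact formula and hyperparameters in the paper are not given here, so this particular weighting is an assumption.

```python
import numpy as np

class FadingReplayBuffer:
    """Replay buffer whose sampling probability fades with age via tanh (sketch)."""

    def __init__(self, capacity=100_000, recent_fraction=0.2, sharpness=6.0):
        self.capacity = capacity
        self.recent_fraction = recent_fraction   # size of the dense recent-memory region
        self.sharpness = sharpness               # how quickly weights fade with age
        self.storage = []

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)                  # drop the oldest transition
        self.storage.append(transition)

    def sample(self, batch_size):
        n = len(self.storage)
        age = (n - 1 - np.arange(n)) / max(n - 1, 1)         # 0 = newest, 1 = oldest
        # tanh weighting: ~1 for recent transitions, fading toward the long-term trail
        w = 0.5 * (1.0 - np.tanh(self.sharpness * (age - self.recent_fraction)))
        p = w / w.sum()
        idx = np.random.choice(n, size=batch_size, replace=True, p=p)
        return [self.storage[i] for i in idx]
```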
Design and Implementation of a High-Precision Wind-Estimation UAV with Onboard Sensors
Accurate real-time wind vector estimation is essential for enhancing the safety, navigation accuracy, and energy efficiency of unmanned aerial vehicles (UAVs). Traditional approaches rely on external sensors or simplify vehicle dynamics, which limits their applicability during agile flight or in resource-constrained platforms. This paper proposes a real-time wind estimation method based solely on onboard sensors. The approach first estimates external aerodynamic forces using a disturbance observer (DOB), and then maps these forces to wind vectors using a thin-plate spline (TPS) model. A custom-designed wind barrel mounted on the UAV enhances aerodynamic sensitivity, further improving estimation accuracy. The system is validated through comprehensive experiments in wind tunnels, indoor and outdoor flights. Experimental results demonstrate that the proposed method achieves consistently high-accuracy wind estimation across controlled and real-world conditions, with speed RMSEs as low as 0.06 m/s in wind tunnel tests, 0.22 m/s during outdoor hover, and below 0.38 m/s in indoor and outdoor dynamic flights, and direction RMSEs under 7.3° across all scenarios, outperforming existing baselines. Moreover, the method provides vertical wind estimates (unavailable in the baselines) with RMSEs below 0.17 m/s even during fast indoor translations.
comment: https://www.sciencedirect.com/science/article/abs/pii/S0263224125032415?via%3Dihub
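The DOB-to-wind mapping step could be prototyped with SciPy's thin-plate-spline interpolator as below; the calibration data are synthetic and the smoothing value is arbitrary, so this only illustrates the TPS fitting, not the paper's calibration procedure.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Synthetic calibration data standing in for DOB force estimates paired with
# ground-truth wind vectors (e.g., from a wind tunnel). Shapes: (n, 3) each.
rng = np.random.default_rng(1)
forces = rng.normal(size=(200, 3))
winds = 2.0 * forces + 0.05 * rng.normal(size=(200, 3))   # fake near-linear mapping

# Fit one thin-plate-spline map from force estimates to wind vectors.
tps_model = RBFInterpolator(forces, winds, kernel="thin_plate_spline", smoothing=1e-3)

# Online use: map a new force estimate to a wind estimate.
f_hat = np.array([[0.3, -0.1, 0.05]])
wind_estimate = tps_model(f_hat)
print(wind_estimate)
```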
RoboNeuron: A Modular Framework Linking Foundation Models and ROS for Embodied AI
Current embodied AI systems face severe engineering impediments, primarily characterized by poor cross-scenario adaptability, rigid inter-module coupling, and fragmented inference acceleration. To overcome these limitations, we propose RoboNeuron, a universal deployment framework for embodied intelligence. RoboNeuron is the first framework to deeply integrate the cognitive capabilities of Large Language Models (LLMs) and Vision-Language-Action (VLA) models with the real-time execution backbone of the Robot Operating System (ROS). We utilize the Model Context Protocol (MCP) as a semantic bridge, enabling the LLM to dynamically orchestrate underlying robotic tools. The framework establishes a highly modular architecture that strictly decouples sensing, reasoning, and control by leveraging ROS's unified communication interfaces. Crucially, we introduce an automated tool to translate ROS messages into callable MCP functions, significantly streamlining development. RoboNeuron significantly enhances cross-scenario adaptability and component flexibility, while establishing a systematic platform for horizontal performance benchmarking, laying a robust foundation for scalable real-world embodied applications.
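A minimal sketch of the ROS-to-MCP bridging idea described above, assuming a ROS 2 environment with rclpy and the MCP Python SDK; the tool name, topic, and message type are illustrative rather than RoboNeuron's actual interface.

```python
# Exposes a ROS 2 publisher as a callable MCP tool so an LLM client can invoke it.
# Requires a sourced ROS 2 workspace (rclpy, geometry_msgs) and the `mcp` package.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist
from mcp.server.fastmcp import FastMCP

rclpy.init()
node = Node("roboneuron_bridge_demo")
cmd_pub = node.create_publisher(Twist, "/cmd_vel", 10)
mcp = FastMCP("ros-bridge")

@mcp.tool()
def drive(linear_x: float, angular_z: float) -> str:
    """Publish a velocity command so the LLM can orchestrate base motion."""
    msg = Twist()
    msg.linear.x = float(linear_x)
    msg.angular.z = float(angular_z)
    cmd_pub.publish(msg)
    return f"published cmd_vel: v={linear_x}, w={angular_z}"

if __name__ == "__main__":
    mcp.run()   # serves the tool over stdio for an MCP-capable LLM client
```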
CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires robots to follow natural language instructions and navigate complex environments without prior maps. While recent large vision-language models demonstrate strong reasoning abilities, they often underperform task-specific panoramic small models in VLN tasks. To address this, we propose CLASH (Collaborative Large-Small Hierarchy), a VLN-CE framework that integrates a reactive small-model planner (RSMP) with a reflective large-model reasoner (RLMR). RSMP adopts a causal-learning-based dual-branch architecture to enhance generalization, while RLMR leverages panoramic visual prompting with chain-of-thought reasoning to support interpretable spatial understanding and navigation. We further introduce an uncertainty-aware collaboration mechanism (UCM) that adaptively fuses decisions from both models. For obstacle avoidance, in simulation, we replace the rule-based controller with a fully learnable point-goal policy, and in real-world deployment, we design a LiDAR-based clustering module for generating navigable waypoints and pair it with an online SLAM-based local controller. CLASH achieves state-of-the-art (SoTA) results (ranking 1st) on the VLN-CE leaderboard, significantly improving SR and SPL on the test-unseen set over the previous SoTA methods. Real-world experiments demonstrate CLASH's strong robustness, validating its effectiveness in both simulation and deployment scenarios.
Design and Validation of an Under-actuated Robotic Finger with Synchronous Tendon Routing
Tendon-driven under-actuated robotic fingers provide advantages for dexterous manipulation through reduced actuator requirements and simplified mechanical design. However, achieving both high load capacity and adaptive compliance in a compact form remains challenging. This paper presents an under-actuated tendon-driven robotic finger (UTRF) featuring a synchronous tendon routing that mechanically couples all joints with fixed angular velocity ratios, enabling the entire finger to be actuated by a single actuator. This approach significantly reduces the number of actuators required in multi-finger hands, resulting in a lighter and more compact structure without sacrificing stiffness or compliance. The kinematic and static models of the finger are derived, incorporating tendon elasticity to predict structural stiffness. A single-finger prototype was fabricated and tested under static loading, showing an average deflection prediction error of 1.0 mm (0.322% of total finger length) and a measured stiffness of 1.2x10^3 N/m under a 3 kg tip load. Integration into a five-finger robotic hand (UTRF-RoboHand) demonstrates effective object manipulation across diverse scenarios, confirming that the proposed routing achieves predictable stiffness and reliable grasping performance with a minimal actuator count.
comment: 7 pages and 11 figures
Design of a six wheel suspension and a three-axis linear actuation mechanism for a laser weeding robot
Mobile robots are increasingly utilized in agriculture to automate labor-intensive tasks such as weeding, sowing, harvesting and soil analysis. Recently, agricultural robots have been developed to detect and remove weeds using mechanical tools or precise herbicide sprays. Mechanical weeding is inefficient over large fields, and herbicides harm the soil ecosystem. Laser weeding with mobile robots has emerged as a sustainable alternative in precision farming. In this paper, we present an autonomous weeding robot that uses controlled exposure to a low energy laser beam for weed removal. The proposed robot is six-wheeled with a novel double four-bar suspension for higher stability. The laser is guided towards the detected weeds by a three-dimensional linear actuation mechanism. Field tests have demonstrated the robot's capability to navigate agricultural terrains effectively by overcoming obstacles up to 15 cm in height. At an optimal speed of 42.5 cm/s, the robot achieves a weed detection rate of 86.2\% and operating time of 87 seconds per meter. The laser actuation mechanism maintains a minimal mean positional error of 1.54 mm, combined with a high hit rate of 97\%, ensuring effective and accurate weed removal. This combination of speed, accuracy, and efficiency highlights the robot's potential for significantly enhancing precision farming practices.
comment: 15 Pages, 10 figures
Lies We Can Trust: Quantifying Action Uncertainty with Inaccurate Stochastic Dynamics through Conformalized Nonholonomic Lie Groups
We propose Conformal Lie-group Action Prediction Sets (CLAPS), a symmetry-aware conformal prediction-based algorithm that constructs, for a given action, a set guaranteed to contain the resulting system configuration at a user-defined probability. Our assurance holds under both aleatoric and epistemic uncertainty, non-asymptotically, and does not require strong assumptions about the true system dynamics, the uncertainty sources, or the quality of the approximate dynamics model. Typically, uncertainty quantification is tackled by making strong assumptions about the error distribution or magnitude, or by relying on uncalibrated uncertainty estimates - i.e., with no link to frequentist probabilities - which are insufficient for safe control. Recently, conformal prediction has emerged as a statistical framework capable of providing distribution-free probabilistic guarantees on test-time prediction accuracy. While current conformal methods treat robots as Euclidean points, many systems have non-Euclidean configurations, e.g., some mobile robots evolve on SE(2). In this work, we rigorously analyze configuration errors using Lie groups, extending previous Euclidean-space theoretical guarantees to SE(2). Our experiments on a simulated JetBot, and on a real MBot, suggest that by considering the configuration space's structure, our symmetry-informed nonconformity score leads to more volume-efficient prediction regions which represent the underlying uncertainty better than existing approaches.
comment: 13 pages, 7 figures. Under review
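To illustrate the conformal step on a non-Euclidean configuration space, the sketch below computes a split-conformal score threshold with a simple SE(2)-aware nonconformity score (translation error plus wrapped heading error). The weighting of the heading term and the synthetic calibration data are assumptions; CLAPS's symmetry-informed score and guarantees are more general than this stand-in.

```python
import numpy as np

def se2_error(pred, true):
    """Nonconformity score on SE(2): weighted translation error plus wrapped
    heading error. The 0.5 weighting is an assumption, not the paper's score."""
    dx = pred[:, :2] - true[:, :2]
    dtheta = np.arctan2(np.sin(pred[:, 2] - true[:, 2]),
                        np.cos(pred[:, 2] - true[:, 2]))   # wrap to [-pi, pi]
    return np.linalg.norm(dx, axis=1) + 0.5 * np.abs(dtheta)

def conformal_radius(pred_calib, true_calib, alpha=0.1):
    """Split-conformal quantile: with probability >= 1 - alpha, a fresh
    prediction's score falls within this radius of the true configuration."""
    scores = se2_error(pred_calib, true_calib)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))          # finite-sample correction
    return np.sort(scores)[min(k, n) - 1]

# Hypothetical usage: predicted vs. observed (x, y, theta) on a calibration set.
rng = np.random.default_rng(0)
true_cfg = rng.uniform(-1, 1, size=(200, 3))
pred_cfg = true_cfg + 0.05 * rng.standard_normal((200, 3))
r = conformal_radius(pred_cfg, true_cfg, alpha=0.1)
print(f"90% prediction set: all configurations within score radius {r:.3f}")
```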
Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine
This paper presents a reinforcement learning framework that incorporates a Contextual Reward Machine for task-oriented grasping. The Contextual Reward Machine reduces task complexity by decomposing grasping tasks into manageable sub-tasks. Each sub-task is associated with a stage-specific context, including a reward function, an action space, and a state abstraction function. This contextual information enables efficient intra-stage guidance and improves learning efficiency by reducing the state-action space and guiding exploration within clearly defined boundaries. In addition, transition rewards are introduced to encourage or penalize transitions between stages, which guides the model toward desirable stage sequences and further accelerates convergence. When integrated with the Proximal Policy Optimization algorithm, the proposed method achieved a 95% success rate across 1,000 simulated grasping tasks encompassing diverse objects, affordances, and grasp topologies. It outperformed state-of-the-art methods in both learning speed and success rate. The approach was transferred to a real robot, where it achieved a success rate of 83.3% in 60 grasping tasks over six affordances. These experimental results demonstrate superior accuracy, data efficiency, and learning efficiency. They underscore the model's potential to advance task-oriented grasping in both simulated and real-world settings.
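The stage-wise decomposition described above can be pictured with a small reward-machine sketch: each stage carries its own reward function over a stage-specific context, and transition rewards steer the agent toward the desired stage sequence. Stage names, context keys, and reward values here are illustrative assumptions rather than the paper's design.

```python
# Minimal sketch of a contextual reward machine for a grasping task.
class ContextualRewardMachine:
    def __init__(self):
        self.stages = ["reach", "pregrasp", "close", "lift"]
        self.stage = 0
        # Per-stage dense reward functions over a stage-specific context dict.
        self.stage_reward = {
            "reach":    lambda ctx: -ctx["dist_to_object"],
            "pregrasp": lambda ctx: -ctx["wrist_pose_error"],
            "close":    lambda ctx:  ctx["contact_force"],
            "lift":     lambda ctx:  ctx["object_height"],
        }
        # Bonuses/penalties for moving between stages (encourage the desired
        # sequence, penalize regressions).
        self.transition_reward = {("reach", "pregrasp"): 1.0,
                                  ("pregrasp", "close"): 1.0,
                                  ("close", "lift"): 2.0,
                                  ("close", "reach"): -1.0}

    def step(self, ctx, stage_done):
        name = self.stages[self.stage]
        reward = self.stage_reward[name](ctx)
        if stage_done and self.stage + 1 < len(self.stages):
            nxt = self.stages[self.stage + 1]
            reward += self.transition_reward.get((name, nxt), 0.0)
            self.stage += 1
        return reward, self.stages[self.stage]
```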
Latent Chain-of-Thought World Modeling for End-to-End Driving
Recent Vision-Language-Action (VLA) models for autonomous driving explore inference-time reasoning as a way to improve driving performance and safety in challenging scenarios. Most prior work uses natural language to express chain-of-thought (CoT) reasoning before producing driving actions. However, text may not be the most efficient representation for reasoning. In this work, we present Latent-CoT-Drive (LCDrive): a model that expresses CoT in a latent language that captures possible outcomes of the driving actions being considered. Our approach unifies CoT reasoning and decision making by representing both in an action-aligned latent space. Instead of natural language, the model reasons by interleaving (1) action-proposal tokens, which use the same vocabulary as the model's output actions; and (2) world model tokens, which are grounded in a learned latent world model and express future outcomes of these actions. We cold start latent CoT by supervising the model's action proposals and world model tokens based on ground-truth future rollouts of the scene. We then post-train with closed-loop reinforcement learning to strengthen reasoning capabilities. On a large-scale end-to-end driving benchmark, LCDrive achieves faster inference, better trajectory quality, and larger improvements from interactive reinforcement learning compared to both non-reasoning and text-reasoning baselines.
comment: Technical Report
Active Optics for Hyperspectral Imaging of Reflective Agricultural Leaf Sensors
Monitoring plant health increasingly relies on leaf-mounted sensors that provide real-time physiological data, yet efficiently locating and sampling these sensors in complex agricultural environments remains a major challenge. We present an integrated, adaptive, and scalable system that autonomously detects and interrogates plant sensors using a coordinated suite of low-cost optical components including a LiDAR, liquid lens, monochrome camera, filter wheel, and Fast Steering Mirror (FSM). The system first uses LiDAR to identify the distinct reflective signatures of sensors within the field, then dynamically redirects the camera's field of view via the FSM to target each sensor for hyperspectral imaging. The liquid lens continuously adjusts focus to maintain image sharpness across varying depths, enabling precise spectral measurements. We validated the system in controlled indoor experiments, demonstrating accurate detection and tracking of reflective plant sensors and successful acquisition of their spectral data. To our knowledge, no other system currently integrates these sensing and optical modalities for agricultural monitoring. This work establishes a foundation for adaptive, low-cost, and automated plant sensor interrogation, representing a significant step toward scalable, real-time plant health monitoring in precision agriculture.
Learning Category-level Last-meter Navigation from RGB Demonstrations of a Single-instance
Achieving precise positioning of the mobile manipulator's base is essential for successful manipulation actions that follow. Most of the RGB-based navigation systems only guarantee coarse, meter-level accuracy, making them less suitable for the precise positioning phase of mobile manipulation. This gap prevents manipulation policies from operating within the distribution of their training demonstrations, resulting in frequent execution failures. We address this gap by introducing an object-centric imitation learning framework for last-meter navigation, enabling a quadruped mobile manipulator robot to achieve manipulation-ready positioning using only RGB observations from its onboard cameras. Our method conditions the navigation policy on three inputs: goal images, multi-view RGB observations from the onboard cameras, and a text prompt specifying the target object. A language-driven segmentation module and a spatial score-matrix decoder then supply explicit object grounding and relative pose reasoning. Using real-world data from a single object instance within a category, the system generalizes to unseen object instances across diverse environments with challenging lighting and background conditions. To comprehensively evaluate this, we introduce two metrics: an edge-alignment metric, which uses ground truth orientation, and an object-alignment metric, which evaluates how well the robot visually faces the target. Under these metrics, our policy achieves 73.47% success in edge-alignment and 96.94% success in object-alignment when positioning relative to unseen target objects. These results show that precise last-meter navigation can be achieved at a category-level without depth, LiDAR, or map priors, enabling a scalable pathway toward unified mobile manipulation. Project page: https://rpm-lab-umn.github.io/category-level-last-meter-nav/
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching
Stereo foundation models achieve strong zero-shot generalization but remain computationally prohibitive for real-time applications. Efficient stereo architectures, on the other hand, sacrifice robustness for speed and require costly per-domain fine-tuning. To bridge this gap, we present Fast-FoundationStereo, a family of architectures that achieve, for the first time, strong zero-shot generalization at real-time frame rate. We employ a divide-and-conquer acceleration strategy with three components: (1) knowledge distillation to compress the hybrid backbone into a single efficient student; (2) blockwise neural architecture search for automatically discovering optimal cost filtering designs under latency budgets, reducing search complexity exponentially; and (3) structured pruning for eliminating redundancy in the iterative refinement module. Furthermore, we introduce an automatic pseudo-labeling pipeline used to curate 1.4M in-the-wild stereo pairs to supplement synthetic training data and facilitate knowledge distillation. The resulting model can run over 10x faster than FoundationStereo while closely matching its zero-shot accuracy, thus establishing a new state-of-the-art among real-time methods. Project page: https://nvlabs.github.io/Fast-FoundationStereo/
Design and Experimental Validation of Closed-Form CBF-Based Safe Control for Stewart Platform Under Multiple Constraints
This letter presents a closed-form solution of the Control Barrier Function (CBF) framework for enforcing safety constraints on a Stewart robotic platform. The proposed method simultaneously handles multiple position and velocity constraints through an explicit closed-form control law, eliminating the need to solve a Quadratic Program (QP) at every control step and enabling efficient real-time implementation. This letter derives necessary and sufficient conditions under which the closed-form expression remains non-singular, thereby ensuring well-posedness of the CBF solution to the multi-constraint problem. The controller is validated in both simulation and hardware experiments on a custom-built Stewart platform prototype, demonstrating safety-guaranteed performance that is comparable to the QP-based formulation, while reducing computation time by more than an order of magnitude. The results confirm that the proposed approach provides a reliable and computationally lightweight framework for real-time safe control of parallel robotic systems. The experimental videos are available on the project website. (https://nail-uh.github.io/StewartPlatformSafeControl.github.io/)
comment: 9 pages
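For intuition about why a closed form can replace the per-step QP, the sketch below shows the textbook explicit solution of the min-norm CBF filter for a single control-affine constraint; the letter's contribution is a closed form that handles multiple position and velocity constraints simultaneously, which this single-constraint example does not reproduce.

```python
import numpy as np

def cbf_safe_control(u_nom, Lfh, Lgh, h, alpha=5.0):
    """Closed-form min-norm filter for one control-affine CBF constraint
        Lfh + Lgh @ u + alpha * h >= 0.
    If the nominal input already satisfies the constraint it is returned
    unchanged; otherwise it is projected onto the constraint boundary
    (the explicit KKT solution of the one-constraint QP)."""
    psi = Lfh + Lgh @ u_nom + alpha * h          # constraint value at the nominal input
    if psi >= 0.0:
        return u_nom
    return u_nom - (psi / (Lgh @ Lgh)) * Lgh

# Hypothetical single-integrator example: keep x inside the unit ball,
# h(x) = 1 - ||x||^2, so Lfh = 0 and Lgh = -2 x^T.
x = np.array([0.9, 0.3])
u_nominal = np.array([1.0, 0.5])                 # pushes outward
u_safe = cbf_safe_control(u_nominal, Lfh=0.0, Lgh=-2.0 * x, h=1.0 - x @ x)
print(u_safe)
```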
Vision-Language Models for Infrared Industrial Sensing in Additive Manufacturing Scene Description
Many manufacturing environments operate in low-light conditions or within enclosed machines where conventional vision systems struggle. Infrared cameras provide complementary advantages in such environments. Simultaneously, supervised AI systems require large labeled datasets, which makes zero-shot learning frameworks more practical for applications including infrared cameras. Recent advances in vision-language foundation models (VLMs) offer a new path in zero-shot predictions from paired image-text representations. However, current VLMs cannot understand infrared camera data since they are trained on RGB data. This work introduces VLM-IRIS (Vision-Language Models for InfraRed Industrial Sensing), a zero-shot framework that adapts VLMs to infrared data by preprocessing infrared images captured by a FLIR Boson sensor into RGB-compatible inputs suitable for CLIP-based encoders. We demonstrate zero-shot workpiece presence detection on a 3D printer bed where temperature differences between the build plate and workpieces make the task well-suited for thermal imaging. VLM-IRIS converts the infrared images to magma representation and applies centroid prompt ensembling with a CLIP ViT-B/32 encoder to achieve high accuracy on infrared images without any model retraining. These findings demonstrate that the proposed improvements to VLMs can be effectively extended to thermal applications for label-free monitoring.
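A rough sketch of the pipeline described above: a single-channel thermal frame is mapped to RGB via the magma colormap and then classified zero-shot with CLIP ViT-B/32 using averaged ("centroid") prompt embeddings. The prompt wording and the placeholder image are assumptions; only the general recipe follows the abstract.

```python
import numpy as np
import torch
from matplotlib import cm
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# 1) Map a single-channel thermal frame to RGB via the magma colormap.
#    `thermal` stands in for a FLIR Boson frame; here it is random placeholder data.
thermal = np.random.rand(256, 320).astype(np.float32)
norm = (thermal - thermal.min()) / (np.ptp(thermal) + 1e-8)
rgb = (cm.magma(norm)[..., :3] * 255).astype(np.uint8)       # drop the alpha channel
image = Image.fromarray(rgb)

# 2) Zero-shot classification with CLIP ViT-B/32 and centroid prompt ensembling:
#    several prompts per class are encoded and averaged into one class embedding.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
prompts = {
    "workpiece present": ["a thermal image of a printed part on a build plate",
                          "an object on a 3D printer bed"],
    "empty bed":         ["a thermal image of an empty build plate",
                          "a 3D printer bed with nothing on it"],
}
with torch.no_grad():
    img_emb = model.get_image_features(**processor(images=image, return_tensors="pt"))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    scores = {}
    for label, texts in prompts.items():
        txt = model.get_text_features(**processor(text=texts, return_tensors="pt", padding=True))
        centroid = (txt / txt.norm(dim=-1, keepdim=True)).mean(dim=0)
        centroid = centroid / centroid.norm()
        scores[label] = float(img_emb @ centroid)
print(max(scores, key=scores.get), scores)
```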
Taxonomy and Modular Tool System for Versatile and Effective Non-Prehensile Manipulations
General-purpose robotic end-effectors of limited complexity, like the parallel-jaw gripper, are appealing for their balance of simplicity and effectiveness in a wide range of manipulation tasks. However, while many such manipulators offer versatility in grasp-like interactions, they are not optimized for non-prehensile actions like pressing, rubbing, or scraping -- manipulations needed for many common tasks. To perform such tasks, humans use a range of different body parts or tools with different rigidity, friction, etc., according to the properties most effective for a given task. Here, we discuss a taxonomy for the key properties of a non-actuated end-effector, laying the groundwork for a systematic understanding of the affordances of non-prehensile manipulators. We then present a modular tool system, based on the taxonomy, that can be used by a standard two-fingered gripper to extend its versatility and effectiveness in performing such actions. We demonstrate the application of the tool system in aerospace and household scenarios that require a range of non-prehensile and prehensile manipulations.
comment: 34 pages, 10 figures, 2 tables, supplementary videos: https://youtu.be/Hcefy53PY0M, https://youtu.be/nFF9k91hsfU, https://youtu.be/EulPLskNIZQ
WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control
Humanoid robots require precise locomotion and dexterous manipulation to perform challenging loco-manipulation tasks. Yet existing approaches, modular or end-to-end, are deficient in manipulation-aware locomotion. This confines the robot to a limited workspace, preventing it from performing large-space loco-manipulation. We attribute this to: (1) the challenge of acquiring loco-manipulation knowledge due to the scarcity of humanoid teleoperation data, and (2) the difficulty of faithfully and reliably executing locomotion commands, stemming from the limited precision and stability of existing RL controllers. To acquire richer loco-manipulation knowledge, we propose a unified latent learning framework that enables a Vision-Language-Action (VLA) system to learn from low-cost action-free egocentric videos. Moreover, an efficient human data collection pipeline is devised to augment the dataset and scale the benefits. To more precisely execute the desired locomotion commands, we present a loco-manipulation-oriented (LMO) RL policy specifically tailored for accurate and stable core loco-manipulation movements, such as advancing, turning, and squatting. Building on these components, we introduce WholeBodyVLA, a unified framework for humanoid loco-manipulation. To the best of our knowledge, WholeBodyVLA is the first framework of its kind to enable large-space humanoid loco-manipulation. It is verified via comprehensive experiments on the AgiBot X2 humanoid, outperforming the prior baseline by 21.3%. It also demonstrates strong generalization and high extensibility across a broad range of tasks.
Towards Accessible Physical AI: LoRA-Based Fine-Tuning of VLA Models for Real-World Robot Control
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in robotic manipulation, enabling robots to execute natural language commands through end-to-end learning from visual observations. However, deploying large-scale VLA models on affordable robotic platforms remains challenging due to computational constraints and the need for efficient adaptation to new robot embodiments. This paper presents an efficient fine-tuning methodology and real-world deployment analysis for adapting VLA models to low-cost robotic manipulation systems. We propose a resource-efficient fine-tuning strategy using Low-Rank Adaptation (LoRA) and quantization techniques that enable multi-billion parameter VLA models (~3.1B parameters) to run on consumer-grade GPUs with 8GB VRAM. Our methodology addresses the critical challenge of adapting pre-trained VLA models to new robot embodiments with limited demonstration data, focusing on the trade-offs between frozen and unfrozen vision encoders. Through real-world deployment on the SO101 robotic arm for a button-pressing manipulation task, we demonstrate that our approach achieves effective manipulation performance while maintaining computational efficiency. We provide detailed analysis of deployment challenges, failure modes, and the relationship between training data quantity and real-world performance, using a model trained on 200 demonstration episodes. Our results show that with proper fine-tuning methodology, VLA models can be successfully deployed on affordable robotic platforms, making advanced manipulation capabilities accessible beyond expensive research robots.
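A hedged sketch of the kind of parameter-efficient setup the abstract describes, using Hugging Face PEFT with 4-bit quantization. The checkpoint name, target module list, and LoRA hyperparameters are placeholders (the paper's exact backbone and settings are not given here), and whether these module names exist depends on the chosen VLA.

```python
import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint: substitute the actual VLA backbone used.
BASE = "openvla/openvla-7b"  # hypothetical example of an HF-hosted VLA

bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForVision2Seq.from_pretrained(BASE, quantization_config=bnb,
                                               trust_remote_code=True, device_map="auto")

# LoRA adapters on the attention projections; rank/alpha/dropout are illustrative.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # typically well under 1% of weights are trainable
```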
From Generated Human Videos to Physically Plausible Robot Trajectories
Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot control. To realize this potential, a key research question remains open: how can a humanoid execute the human actions from generated videos in a zero-shot manner? This challenge arises because generated videos are often noisy and exhibit morphological distortions that make direct imitation difficult compared to real video. To address this, we introduce a two-stage pipeline. First, we lift video pixels into a 4D human representation and then retarget to the humanoid morphology. Second, we propose GenMimic, a physics-aware reinforcement learning policy conditioned on 3D keypoints and trained with symmetry regularization and keypoint-weighted tracking rewards. As a result, GenMimic can mimic human actions from noisy, generated videos. We curate GenMimicBench, a synthetic human-motion dataset generated using two video generation models across a spectrum of actions and contexts, establishing a benchmark for assessing zero-shot generalization and policy robustness. Extensive experiments demonstrate improvements over strong baselines in simulation and confirm coherent, physically stable motion tracking on a Unitree G1 humanoid robot without fine-tuning. This work offers a promising path to realizing the potential of video generation models as high-level policies for robot control.
comment: For project website, see https://genmimic.github.io
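The keypoint-weighted tracking reward mentioned above typically takes an exponentiated weighted-error form; the sketch below shows one such formulation, with the specific weights and temperature being assumptions rather than GenMimic's exact values.

```python
import numpy as np

def keypoint_tracking_reward(sim_kps, ref_kps, weights=None, sigma=0.25):
    """Keypoint-weighted tracking reward, a common motion-imitation form:
        r = exp( - (1/sigma^2) * sum_k w_k * ||p_k_sim - p_k_ref||^2 ).
    Weights are normalized so the reward stays in (0, 1]."""
    sim_kps, ref_kps = np.asarray(sim_kps), np.asarray(ref_kps)
    if weights is None:
        weights = np.ones(len(sim_kps))
    weights = weights / weights.sum()
    err = np.sum(weights * np.sum((sim_kps - ref_kps) ** 2, axis=-1))
    return float(np.exp(-err / sigma**2))

# Hypothetical usage with 15 body keypoints; hands (indices 9, 12) up-weighted 3x.
w = np.ones(15); w[[9, 12]] = 3.0
reward = keypoint_tracking_reward(np.zeros((15, 3)), 0.05 * np.ones((15, 3)), weights=w)
print(reward)
```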
MaskedManipulator: Versatile Whole-Body Manipulation SIGGRAPH
We tackle the challenges of synthesizing versatile, physically simulated human motions for full-body object manipulation. Unlike prior methods that are focused on detailed motion tracking, trajectory following, or teleoperation, our framework enables users to specify versatile high-level objectives such as target object poses or body poses. To achieve this, we introduce MaskedManipulator, a generative control policy distilled from a tracking controller trained on large-scale human motion capture data. This two-stage learning process allows the system to perform complex interaction behaviors, while providing intuitive user control over both character and object motions. MaskedManipulator produces goal-directed manipulation behaviors that expand the scope of interactive animation systems beyond task-specific solutions.
comment: SIGGRAPH Asia 2025 (Project page: https://research.nvidia.com/labs/par/maskedmanipulator/ )
Reframing Human-Robot Interaction Through Extended Reality: Unlocking Safer, Smarter, and More Empathic Interactions with Virtual Robots and Foundation Models
This perspective reframes human-robot interaction (HRI) through extended reality (XR), arguing that virtual robots powered by large foundation models (FMs) can serve as cognitively grounded, empathic agents. Unlike physical robots, XR-native agents are unbound by hardware constraints and can be instantiated, adapted, and scaled on demand, while still affording embodiment and co-presence. We synthesize work across XR, HRI, and cognitive AI to show how such agents can support safety-critical scenarios, socially and cognitively empathic interaction across domains, and outreaching physical capabilities with XR and AI integration. We then discuss how multimodal large FMs (e.g., large language model, large vision model, and vision-language model) enable context-aware reasoning, affect-sensitive situations, and long-term adaptation, positioning virtual robots as cognitive and empathic mediators rather than mere simulation assets. At the same time, we highlight challenges and potential risks, including overtrust, cultural and representational bias, privacy concerns around biometric sensing, and data governance and transparency. The paper concludes by outlining a research agenda for human-centered, ethically grounded XR agents - emphasizing multi-layered evaluation frameworks, multi-user ecosystems, mixed virtual-physical embodiment, and societal and ethical design practices to envision XR-based virtual agents powered by FMs as reshaping future HRI into a more efficient and adaptive paradigm.
comment: This paper is under review
WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving
We introduce WAM-Flow, a vision-language-action (VLA) model that casts ego-trajectory planning as discrete flow matching over a structured token space. In contrast to autoregressive decoders, WAM-Flow performs fully parallel, bidirectional denoising, enabling coarse-to-fine refinement with a tunable compute-accuracy trade-off. Specifically, the approach combines a metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning, a geometry-aware flow objective and a simulator-guided GRPO alignment that integrates safety, ego progress, and comfort rewards while retaining parallel generation. A multi-stage adaptation converts a pre-trained auto-regressive backbone (Janus-1.5B) from causal decoding to a non-causal flow model and strengthens road-scene competence through continued multimodal pretraining. Thanks to the inherent nature of consistency model training and parallel decoding inference, WAM-Flow achieves superior closed-loop performance against autoregressive and diffusion-based VLA baselines, with 1-step inference attaining 89.1 PDMS and 5-step inference reaching 90.3 PDMS on the NAVSIM v1 benchmark. These results establish discrete flow matching as a promising new paradigm for end-to-end autonomous driving. The code will be publicly available soon.
comment: 18 pages, 11 figures. Code & Model: https://github.com/fudan-generative-vision/WAM-Flow
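As an illustration of metric-aligned tokenization via triplet-margin learning, the sketch below trains an embedding table for discretized trajectory values so that numerically close bins map to nearby embeddings. The bin count, margin, and triplet-sampling rule are assumptions, not WAM-Flow's actual tokenizer.

```python
import torch
import torch.nn as nn

num_bins, dim = 256, 64
emb = nn.Embedding(num_bins, dim)                 # one embedding per numeric bin
triplet = nn.TripletMarginLoss(margin=1.0)
opt = torch.optim.Adam(emb.parameters(), lr=1e-3)

for step in range(200):
    anchor = torch.randint(0, num_bins, (128,))
    sign = torch.randint(0, 2, (128,)) * 2 - 1    # random +/- 1
    # positive: a numerically close bin; negative: a numerically distant bin
    positive = (anchor + torch.randint(-2, 3, (128,))).clamp(0, num_bins - 1)
    negative = (anchor + sign * torch.randint(20, 60, (128,))).clamp(0, num_bins - 1)
    loss = triplet(emb(anchor), emb(positive), emb(negative))
    opt.zero_grad(); loss.backward(); opt.step()
```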
Panoramic Out-of-Distribution Segmentation
Panoramic imaging enables capturing 360° images with an ultra-wide Field-of-View (FoV) for dense omnidirectional perception, which is critical to applications such as autonomous driving and augmented reality. However, current panoramic semantic segmentation methods fail to identify outliers, and pinhole Out-of-distribution Segmentation (OoS) models perform unsatisfactorily in the panoramic domain due to pixel distortions and background clutter. To address these issues, we introduce a new task, Panoramic Out-of-distribution Segmentation (PanOoS), with the aim of achieving comprehensive and safe scene understanding. Furthermore, we propose the first solution, POS, which adapts to the characteristics of panoramic images through text-guided prompt distribution learning. Specifically, POS integrates a disentanglement strategy designed to materialize the cross-domain generalization capability of CLIP. The proposed Prompt-based Restoration Attention (PRA) optimizes semantic decoding by prompt guidance and self-adaptive correction, while Bilevel Prompt Distribution Learning (BPDL) refines the manifold of per-pixel mask embeddings via semantic prototype supervision. Besides, to compensate for the scarcity of PanOoS datasets, we establish two benchmarks: DenseOoS, which features diverse outliers in complex environments, and QuadOoS, captured by a quadruped robot with a panoramic annular lens system. Extensive experiments demonstrate superior performance of POS, with AuPRC improving by 34.25% and FPR95 decreasing by 21.42% on DenseOoS, outperforming state-of-the-art pinhole-OoS methods. Moreover, POS achieves leading closed-set segmentation capabilities and advances the development of panoramic understanding. Code and datasets will be available at https://github.com/MengfeiD/PanOoS.
comment: Code and datasets will be available at https://github.com/MengfeiD/PanOoS
Towards Open-World Human Action Segmentation Using Graph Convolutional Networks IROS25
Human-object interaction segmentation is a fundamental task of daily activity understanding, which plays a crucial role in applications such as assistive robotics, healthcare, and autonomous systems. While most existing learning-based methods excel in closed-world action segmentation, they struggle to generalize to open-world scenarios where novel actions emerge. Collecting exhaustive action categories for training is impractical due to the dynamic diversity of human activities, necessitating models that detect and segment out-of-distribution actions without manual annotation. To address this issue, we formally define the open-world action segmentation problem and propose a structured framework for detecting and segmenting unseen actions. Our framework introduces three key innovations: 1) an Enhanced Pyramid Graph Convolutional Network (EPGCN) with a novel decoder module for robust spatiotemporal feature upsampling; 2) Mixup-based training to synthesize out-of-distribution data, eliminating reliance on manual annotations; and 3) a novel Temporal Clustering loss that groups in-distribution actions while distancing out-of-distribution samples. We evaluate our framework on two challenging human-object interaction recognition datasets: the Bimanual Actions and 2 Hands and Object (H2O) datasets. Experimental results demonstrate significant improvements over state-of-the-art action segmentation models across multiple open-set evaluation metrics, achieving 16.9% and 34.6% relative gains in open-set segmentation (F1@50) and out-of-distribution detection performance (AUROC), respectively. Additionally, we conduct an in-depth ablation study to assess the impact of each proposed component, identifying the optimal framework configuration for open-world action segmentation.
comment: 8 pages, 3 figures, accepted in IROS25, Hangzhou, China
Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation IROS25
Accurate temporal segmentation of human actions is critical for intelligent robots in collaborative settings, where a precise understanding of sub-activity labels and their temporal structure is essential. However, the inherent noise in both human pose estimation and object detection often leads to over-segmentation errors, disrupting the coherence of action sequences. To address this, we propose a Multi-Modal Graph Convolutional Network (MMGCN) that integrates low-frame-rate (e.g., 1 fps) visual data with high-frame-rate (e.g., 30 fps) motion data (skeleton and object detections) to mitigate fragmentation. Our framework introduces three key contributions. First, a sinusoidal encoding strategy that maps 3D skeleton coordinates into a continuous sin-cos space to enhance spatial representation robustness. Second, a temporal graph fusion module that aligns multi-modal inputs with differing resolutions via hierarchical feature aggregation. Third, inspired by the smooth transitions inherent to human actions, we design SmoothLabelMix, a data augmentation technique that mixes input sequences and labels to generate synthetic training examples with gradual action transitions, enhancing temporal consistency in predictions and reducing over-segmentation artifacts. Extensive experiments on the Bimanual Actions Dataset, a public benchmark for human-object interaction understanding, demonstrate that our approach outperforms state-of-the-art methods, especially in action segmentation accuracy, achieving F1@10: 94.5% and F1@25: 92.8%.
comment: 8 pages, 5 figures, accepted in IROS25, Hangzhou, China
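A small sketch of the sinusoidal skeleton encoding mentioned above: each 3D joint coordinate is mapped into a continuous sin-cos feature space at several frequencies. The power-of-two frequency schedule is an assumption; the abstract specifies only that coordinates are embedded in a sin-cos space.

```python
import numpy as np

def sinusoidal_skeleton_encoding(joints_xyz, num_freqs=4):
    """Map raw 3D joint coordinates into a continuous sin-cos feature space.

    joints_xyz: array of shape (J, 3) -> returns (J, 3 * 2 * num_freqs)."""
    joints_xyz = np.asarray(joints_xyz, dtype=np.float32)
    freqs = 2.0 ** np.arange(num_freqs)                      # 1, 2, 4, 8, ...
    scaled = joints_xyz[..., None] * freqs * np.pi           # (J, 3, F)
    feats = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return feats.reshape(joints_xyz.shape[0], -1)

# Example: 17 joints -> a 17 x 24 feature matrix with num_freqs=4.
print(sinusoidal_skeleton_encoding(np.random.rand(17, 3)).shape)
```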
Continuous Vision-Language-Action Co-Learning with Semantic-Physical Alignment for Behavioral Cloning AAAI 2026
Language-conditioned manipulation facilitates human-robot interaction via behavioral cloning (BC), which learns control policies from human demonstrations and serves as a cornerstone of embodied AI. Overcoming compounding errors in sequential action decisions remains a central challenge to improving BC performance. Existing approaches mitigate compounding errors through data augmentation, expressive representation, or temporal abstraction. However, they suffer from physical discontinuities and semantic-physical misalignment, leading to inaccurate action cloning and intermittent execution. In this paper, we present Continuous vision-language-action Co-Learning with Semantic-Physical Alignment (CCoL), a novel BC framework that ensures temporally consistent execution and fine-grained semantic grounding. It generates robust and smooth action execution trajectories through continuous co-learning across vision, language, and proprioceptive inputs (e.g., robot internal states). Meanwhile, we anchor language semantics to visuomotor representations by a bidirectional cross-attention to learn contextual information for action generation, successfully overcoming the problem of semantic-physical misalignment. Extensive experiments show that CCoL achieves an average 8.0% relative improvement across three simulation suites, with up to 19.2% relative gain in human-demonstrated bimanual insertion tasks. Real-world tests on a 7-DoF robot further confirm CCoL's generalization under unseen and noisy object states.
comment: Accepted at AAAI 2026, the Project website is available at https://qhemu.github.io/CCoL/
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50\% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io/
Enhancing Hand Palm Motion Gesture Recognition by Eliminating Reference Frame Bias via Frame-Invariant Similarity Measures
The ability of robots to recognize human gestures facilitates a natural and accessible human-robot collaboration. However, most work in gesture recognition remains rooted in reference frame-dependent representations. This poses a challenge when reference frames vary due to different work cell layouts, imprecise frame calibrations, or other environmental changes. This paper investigated the use of invariant trajectory descriptors for robust hand palm motion gesture recognition under reference frame changes. First, a novel dataset of recorded Hand Palm Motion (HPM) gestures is introduced. The motion gestures in this dataset were specifically designed to be distinguishable without dependence on specific reference frames or directional cues. Afterwards, multiple invariant trajectory descriptor approaches were benchmarked to assess how their performances generalize to this novel HPM dataset. After this offline benchmarking, the best scoring approach is validated for online recognition by developing a real-time Proof of Concept (PoC). In this PoC, hand palm motion gestures were used to control the real-time movement of a manipulator arm. The PoC demonstrated a high recognition reliability in real-time operation, achieving an $F_1$-score of 92.3%. This work demonstrates the effectiveness of the invariant descriptor approach as a standalone solution. Moreover, we believe that the invariant descriptor approach can also be utilized within other state-of-the-art pattern recognition and learning systems to improve their robustness against reference frame variations.
comment: This is the preprint version of a paper accepted for publication at the 2025 IEEE International Conference on Automation Science and Engineering (CASE). The final published version is available at DOI: 10.1109/CASE58245.2025.11163910
Never too Cocky to Cooperate: An FIM and RL-based USV-AUV Collaborative System for Underwater Tasks in Extreme Sea Conditions
This paper develops a novel unmanned surface vehicle (USV)-autonomous underwater vehicle (AUV) collaborative system designed to enhance underwater task performance in extreme sea conditions. The system integrates a dual strategy: (1) high-precision multi-AUV localization enabled by Fisher information matrix-optimized USV path planning, and (2) reinforcement learning-based cooperative planning and control method for multi-AUV task execution. Extensive experimental evaluations in the underwater data collection task demonstrate the system's operational feasibility, with quantitative results showing significant performance improvements over baseline methods. The proposed system exhibits robust coordination capabilities between USV and AUVs while maintaining stability in extreme sea conditions. To facilitate reproducibility and community advancement, we provide an open-source simulation toolkit available at: https://github.com/360ZMEM/USV-AUV-colab .
comment: This paper has been submitted to IEEE Transactions on Mobile Computing, and is currently under minor revision
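To give a concrete sense of the Fisher-information criterion used for USV path planning, the sketch below computes the FIM for range-only localization of a fixed AUV under i.i.d. Gaussian range noise and scores two candidate USV paths by D-optimality (log-determinant). The 2D, single-AUV setting and noise level are simplifying assumptions relative to the paper's system.

```python
import numpy as np

def range_only_fim(usv_positions, auv_position, sigma=0.5):
    """Fisher information matrix for localizing a fixed AUV from range
    measurements taken at several USV waypoints, assuming i.i.d. Gaussian
    range noise with std `sigma`: FIM = (1/sigma^2) * sum_i u_i u_i^T,
    with u_i the unit vector from the AUV to the i-th USV position."""
    diffs = np.asarray(usv_positions) - np.asarray(auv_position)
    units = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    return (units.T @ units) / sigma**2

# A USV path can be scored by D-optimality (log-det of the FIM): spreading
# waypoints around the AUV yields a better-conditioned FIM than a straight pass.
auv = np.array([0.0, 0.0])
spread_path   = np.array([[10, 0], [0, 10], [-10, 0], [0, -10]], dtype=float)
straight_path = np.array([[10, 1], [10, 2], [10, 3], [10, 4]], dtype=float)
for name, path in [("spread", spread_path), ("straight", straight_path)]:
    fim = range_only_fim(path, auv)
    print(name, np.linalg.slogdet(fim)[1])
```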
ShapeForce: Low-Cost Soft Robotic Wrist for Contact-Rich Manipulation
Contact feedback is essential for contact-rich robotic manipulation, as it allows the robot to detect subtle interaction changes and adjust its actions accordingly. Six-axis force-torque sensors are commonly used to obtain contact feedback, but their high cost and fragility have discouraged many researchers from adopting them in contact-rich tasks. To offer a more cost-efficient and easily accessible source of contact feedback, we present ShapeForce, a low-cost, plug-and-play soft wrist that provides force-like signals for contact-rich robotic manipulation. Inspired by how humans rely on relative force changes in contact rather than precise force magnitudes, ShapeForce converts external force and torque into measurable deformations of its compliant core, which are then estimated via marker-based pose tracking and converted into force-like signals. Our design eliminates the need for calibration or specialized electronics to obtain exact values, and instead focuses on capturing force and torque changes sufficient for enabling contact-rich manipulation. Extensive experiments across diverse contact-rich tasks and manipulation policies demonstrate that ShapeForce delivers performance comparable to six-axis force-torque sensors at an extremely low cost.
Transformer Driven Visual Servoing and Dual Arm Impedance Control for Fabric Texture Matching
In this paper, we propose a method to align and place a fabric piece on top of another using a dual-arm manipulator and a grayscale camera, so that their surface textures are accurately matched. We propose a novel control scheme that combines Transformer-driven visual servoing with dual-arm impedance control. This approach enables the system to simultaneously control the pose of the fabric piece and place it onto the underlying one while applying tension to keep the fabric piece flat. Our transformer-based network incorporates pretrained backbones and a newly introduced Difference Extraction Attention Module (DEAM), which significantly enhances pose difference prediction accuracy. Trained entirely on synthetic images generated using rendering software, the network enables zero-shot deployment in real-world scenarios without requiring prior training on specific fabric textures. Real-world experiments demonstrate that the proposed system accurately aligns fabric pieces with different textures.
comment: 8 pages, 11 figures. Accepted to IEEE Robotics and Automation Letters (RA-L)
SEA: Semantic Map Prediction for Active Exploration of Uncertain Areas
In this paper, we propose SEA, a novel approach for active robot exploration through semantic map prediction and a reinforcement learning-based hierarchical exploration policy. Unlike existing learning-based methods that rely on one-step waypoint prediction, our approach enhances the agent's long-term environmental understanding to facilitate more efficient exploration. We propose an iterative prediction-exploration framework that explicitly predicts the missing areas of the map based on current observations. The difference between the actual accumulated map and the predicted global map is then used to guide exploration. Additionally, we design a novel reward mechanism that leverages reinforcement learning to update the long-term exploration strategies, enabling us to construct an accurate semantic map within limited steps. Experimental results demonstrate that our method significantly outperforms state-of-the-art exploration strategies, achieving superior coverage of the global map within the same time constraints.
comment: Project page: https://robo-lavira.github.io/sea-active-exp
Semantic Trajectory Generation for Goal-Oriented Spacecraft Rendezvous SC
Reliable real-time trajectory generation is essential for future autonomous spacecraft. While recent progress in nonconvex guidance and control is paving the way for onboard autonomous trajectory optimization, these methods still rely on extensive expert input (e.g., waypoints, constraints, mission timelines, etc.), which limits the operational scalability in real rendezvous missions. This paper introduces SAGES (Semantic Autonomous Guidance Engine for Space), a trajectory-generation framework that translates natural-language commands into spacecraft trajectories that reflect high-level intent while respecting nonconvex constraints. Experiments in two settings -- fault-tolerant proximity operations with continuous-time constraint enforcement and a free-flying robotic platform -- demonstrate that SAGES reliably produces trajectories aligned with human commands, achieving over 90% semantic-behavioral consistency across diverse behavior modes. Ultimately, this work marks an initial step toward language-conditioned, constraint-aware spacecraft trajectory generation, enabling operators to interactively guide both safety and behavior through intuitive natural-language commands with reduced expert burden.
comment: 28 pages, 12 figures. Submitted to AIAA SCITECH 2026
Development and Testing for Perception Based Autonomous Landing of a Long-Range QuadPlane
QuadPlanes combine the range efficiency of fixed-wing aircraft with the maneuverability of multi-rotor platforms for long-range autonomous missions. In GPS-denied or cluttered urban environments, perception-based landing is vital for reliable operation. Unlike structured landing zones, real-world sites are unstructured and highly variable, requiring strong generalization capabilities from the perception system. Deep neural networks (DNNs) provide a scalable solution for learning landing site features across diverse visual and environmental conditions. While perception-driven landing has been shown in simulation, real-world deployment introduces significant challenges. Payload and volume constraints restrict onboard computing to compact edge AI devices like the NVIDIA Jetson Orin Nano, which are crucial for real-time detection and control. Accurate pose estimation during descent is necessary, especially in the absence of GPS, and relies on dependable visual-inertial odometry. Achieving this with limited edge AI resources requires careful optimization of the entire deployment framework. The flight characteristics of large QuadPlanes further complicate the problem: high inertia, reduced thrust vectoring, and slow response times make stable landing maneuvers harder to achieve. This work presents a lightweight QuadPlane system for efficient vision-based autonomous landing and visual-inertial odometry, specifically developed for long-range QuadPlane operations such as aerial monitoring. It describes the hardware platform, sensor configuration, and embedded computing architecture designed to meet demanding real-time, physical constraints. This establishes a foundation for deploying autonomous landing in dynamic, unstructured, GPS-denied environments.
Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing
Most reinforcement learning (RL)-based methods for drone racing target fixed, obstacle-free tracks, leaving the generalization to unknown, cluttered environments largely unaddressed. This challenge stems from the need to balance racing speed and collision avoidance, limited feasible space causing policy exploration trapped in local optima during training, and perceptual ambiguity between gates and obstacles in depth maps, especially when gate positions are only coarsely specified. To overcome these issues, we propose a two-phase learning framework: an initial soft-collision training phase that preserves policy exploration for high-speed flight, followed by a hard-collision refinement phase that enforces robust obstacle avoidance. An adaptive, noise-augmented curriculum with an asymmetric actor-critic architecture gradually shifts the policy's reliance from privileged gate-state information to depth-based visual input. We further impose Lipschitz constraints and integrate a track-primitive generator to enhance motion stability and cross-environment generalization. We evaluate our framework through extensive simulation and ablation studies, and validate it in real-world experiments on a computationally constrained quadrotor. The system achieves agile flight while remaining robust to gate-position errors, developing a generalizable drone racing framework with the capability to operate in diverse, partially unknown and cluttered environments. https://yufengsjtu.github.io/MasterRacing.github.io/
comment: 8 pages, 9 figures, accepted to Robotics and Automation Letters
Multi-Robot Path Planning Combining Heuristics and Multi-Agent Reinforcement Learning
Multi-robot path finding in dynamic environments is a highly challenging classic problem. In the movement process, robots need to avoid collisions with other moving robots while minimizing their travel distance. Previous methods for this problem either continuously replan paths using heuristic search methods to avoid conflicts or choose appropriate collision avoidance strategies based on learning approaches. The former may result in long travel distances due to frequent replanning, while the latter may have low learning efficiency due to poor sample exploration and utilization, causing high training costs for the model. To address these issues, we propose a path planning method, MAPPOHR, which combines heuristic search, empirical rules, and multi-agent reinforcement learning. The method consists of two layers: a real-time planner based on the multi-agent reinforcement learning algorithm, MAPPO, which embeds empirical rules in the action output layer and reward functions, and a heuristic search planner used to create a global guiding path. During movement, the heuristic search planner replans new paths based on the instructions of the real-time planner. We tested our method in 10 different conflict scenarios. The experiments show that the planning performance of MAPPOHR is better than that of existing learning and heuristic methods. Due to the utilization of empirical knowledge and heuristic search, the learning efficiency of MAPPOHR is higher than that of existing learning methods.
An Introduction to Deep Reinforcement and Imitation Learning
Embodied agents, such as robots and virtual characters, must continuously select actions to execute tasks effectively, solving complex sequential decision-making problems. Given the difficulty of designing such controllers manually, learning-based approaches have emerged as promising alternatives, most notably Deep Reinforcement Learning (DRL) and Deep Imitation Learning (DIL). DRL leverages reward signals to optimize behavior, while DIL uses expert demonstrations to guide learning. This document introduces DRL and DIL in the context of embodied agents, adopting a concise, depth-first approach to the literature. It is self-contained, presenting all necessary mathematical and machine learning concepts as they are needed. It is not intended as a survey of the field; rather, it focuses on a small set of foundational algorithms and techniques, prioritizing in-depth understanding over broad coverage. The material ranges from Markov Decision Processes to REINFORCE and Proximal Policy Optimization (PPO) for DRL, and from Behavioral Cloning to Dataset Aggregation (DAgger) and Generative Adversarial Imitation Learning (GAIL) for DIL.
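Since the document covers REINFORCE among its foundational algorithms, a compact sketch of the REINFORCE loss may help fix ideas; the tiny policy network and made-up rollout below are placeholders, not part of the document itself.

```python
import torch
import torch.nn as nn

# Minimal illustration of the REINFORCE objective:
# maximize E[ sum_t log pi(a_t | s_t) * G_t ], with G_t the discounted return.
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))

def reinforce_loss(states, actions, rewards, gamma=0.99):
    # Discounted returns-to-go G_t, computed backwards over the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns = torch.tensor(list(reversed(returns)))
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen * returns).mean()    # negated for gradient descent

loss = reinforce_loss(torch.randn(5, 4), torch.randint(0, 2, (5,)), [1.0, 0.0, 1.0, 1.0, 0.0])
loss.backward()
```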
Aligning Humans and Robots via Reinforcement Learning from Implicit Human Feedback
Conventional reinforcement learning (RL) approaches often struggle to learn effective policies under sparse reward conditions, necessitating the manual design of complex, task-specific reward functions. To address this limitation, reinforcement learning from human feedback (RLHF) has emerged as a promising strategy that complements hand-crafted rewards with human-derived evaluation signals. However, most existing RLHF methods depend on explicit feedback mechanisms such as button presses or preference labels, which disrupt the natural interaction process and impose a substantial cognitive load on the user. We propose a novel reinforcement learning from implicit human feedback (RLIHF) framework that utilizes non-invasive electroencephalography (EEG) signals, specifically error-related potentials (ErrPs), to provide continuous, implicit feedback without requiring explicit user intervention. The proposed method adopts a pre-trained decoder to transform raw EEG signals into probabilistic reward components, enabling effective policy learning even in the presence of sparse external rewards. We evaluate our approach in a simulation environment built on the MuJoCo physics engine, using a Kinova Gen2 robotic arm to perform a complex pick-and-place task that requires avoiding obstacles while manipulating target objects. The results show that agents trained with decoded EEG feedback achieve performance comparable to those trained with dense, manually designed rewards. These findings validate the potential of using implicit neural feedback for scalable and human-aligned reinforcement learning in interactive robotics.
comment: Accepted to IEEE Int. Conf. Syst., Man, Cybern. (SMC) 2025
3DSGrasp: 3D Shape-Completion for Robotic Grasp
Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed from few and sparse viewpoints before the grasping action, leading to the generation of wrong or inaccurate grasp poses. We propose a novel grasping strategy, named 3DSGrasp, that predicts the missing geometry from the partial PCD to produce reliable grasp poses. Our proposed PCD completion network is a Transformer-based encoder-decoder network with an Offset-Attention layer. Our network is inherently invariant to object pose and point permutation, and generates PCDs that are geometrically consistent and properly completed. Experiments on a wide range of partial PCDs show that 3DSGrasp outperforms the best state-of-the-art method on PCD completion tasks and largely improves the grasping success rate in real-world scenarios. The code and dataset will be made available upon acceptance.
Model-Based Lookahead Reinforcement Learning for in-hand manipulation
In-Hand Manipulation, like many other dexterous tasks, remains a difficult challenge in robotics, combining complex dynamic systems with the capability to control and manoeuvre various objects using its actuators. This work presents the application of a previously developed hybrid Reinforcement Learning (RL) Framework to an In-Hand Manipulation task, verifying that it is capable of improving the performance of the task. The model combines concepts of both Model-Free and Model-Based Reinforcement Learning, by guiding a trained policy with the help of a dynamic model and value-function through trajectory evaluation, as done in Model Predictive Control. This work evaluates the performance of the model by comparing it with the policy that will be guided. To fully explore this, various tests are performed using both fully-actuated and under-actuated simulated robotic hands to manipulate different objects for a given task. The performance of the model is also tested for generalization, by changing the properties of the objects on which both the policy and dynamic model were trained, such as density and size, and additionally by guiding a policy trained on a certain object to perform the same task on a different one. The results of this work show that, given a policy with high average reward and an accurate dynamic model, the hybrid framework improves the performance of in-hand manipulation tasks for most test cases, even when the object properties are changed. However, this improvement comes at the expense of increased computational cost, due to the complexity of trajectory evaluation.
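The policy-guided trajectory evaluation described above can be sketched as an MPC-style lookahead: sample perturbed policy rollouts through the learned dynamics model, score each by predicted reward plus a terminal value estimate, and execute the first action of the best rollout. The candidate count, horizon, noise scale, and toy usage below are assumptions.

```python
import numpy as np

def lookahead_action(state, policy, dynamics, reward_fn, value_fn,
                     num_candidates=16, horizon=5, gamma=0.99, noise=0.1):
    """Pick the first action of the best-scoring rollout through a learned model.
    All function handles (policy, dynamics, reward_fn, value_fn) are assumed
    to be learned components supplied by the user."""
    best_score, best_action = -np.inf, None
    for _ in range(num_candidates):
        s, total, discount, first_action = np.array(state), 0.0, 1.0, None
        for _t in range(horizon):
            a = policy(s)
            a = a + noise * np.random.randn(*np.shape(a))   # exploration around the policy
            first_action = a if first_action is None else first_action
            s_next = dynamics(s, a)                          # learned model prediction
            total += discount * reward_fn(s, a, s_next)
            s, discount = s_next, discount * gamma
        total += discount * value_fn(s)                      # bootstrap beyond the horizon
        if total > best_score:
            best_score, best_action = total, first_action
    return best_action

# Hypothetical usage with toy handles on a 1D double-integrator-like system.
act = lookahead_action(
    state=np.array([0.0, 0.0]),
    policy=lambda s: np.array([0.1]),
    dynamics=lambda s, a: s + np.array([s[1], a[0]]) * 0.1,
    reward_fn=lambda s, a, s2: -np.abs(s2[0] - 1.0),
    value_fn=lambda s: -np.abs(s[0] - 1.0),
)
print(act)
```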
See What I Mean? Expressiveness and Clarity in Robot Display Design
Nonverbal visual symbols and displays play an important role in communication when humans and robots work collaboratively. However, few studies have investigated how different types of non-verbal cues affect objective task performance, especially in a dynamic environment that requires real time decision-making. In this work, we designed a collaborative navigation task where the user and the robot only had partial information about the map on each end and thus the users were forced to communicate with a robot to complete the task. We conducted our study in a public space and recruited 37 participants who randomly passed by our setup. Each participant collaborated with a robot utilizing either animated anthropomorphic eyes and animated icons, or static anthropomorphic eyes and static icons. We found that participants that interacted with a robot with animated displays reported the greatest level of trust and satisfaction; that participants interpreted static icons the best; and that participants with a robot with static eyes had the highest completion success. These results suggest that while animation can foster trust with robots, human-robot communication can be optimized by the addition of familiar static icons that may be easier for users to interpret. We published our code, designed symbols, and collected results online at: https://github.com/mattufts/huamn_Cozmo_interaction.
Multiagent Systems
Thinking While Driving: A Concurrent Framework for Real-Time, LLM-Based Adaptive Routing
We present Thinking While Driving, a concurrent routing framework that integrates LLMs into a graph-based traffic environment. Unlike approaches that require agents to stop and deliberate, our system enables LLM-based route planning while agents are moving, significantly reducing intersection wait times. Under high traffic, agents average just 0.75 seconds of decision latency. To coordinate many agents in real-time, we implement a non-blocking asynchronous architecture using Unity coroutines and a dedicated request manager. The environment is a weighted undirected graph with live congestion metrics, updated continuously by the agents to enable shared perception. Our results show LLM-driven agents can dynamically adapt to traffic, reroute around congestion, and exhibit behaviors beyond static pathfinding, all while maintaining real-time performance. This work provides a reproducible framework for future research in adaptive routing and multi-agent cooperation.
Better Prevent than Tackle: Valuing Defense in Soccer Based on Graph Neural Networks
Evaluating defensive performance in soccer remains challenging, as effective defending is often expressed not through visible on-ball actions such as interceptions and tackles, but through preventing dangerous opportunities before they arise. Existing approaches have largely focused on valuing on-ball actions, leaving much of defenders' true impact unmeasured. To address this gap, we propose DEFCON (DEFensive CONtribution evaluator), a comprehensive framework that quantifies player-level defensive contributions for every attacking situation in soccer. Leveraging Graph Attention Networks, DEFCON estimates the success probability and expected value of each attacking option, along with each defender's responsibility for stopping it. These components yield an Expected Possession Value (EPV) for the attacking team before and after each action, and DEFCON assigns positive or negative credits to defenders according to whether they reduced or increased the opponent's EPV. Trained on 2023-24 and evaluated on 2024-25 Eredivisie event and tracking data, DEFCON's aggregated player credits exhibit strong positive correlations with market valuations. Finally, we showcase several practical applications, including in-game timelines of defensive contributions, spatial analyses across pitch zones, and pairwise summaries of attacker-defender interactions.
Certifying Concavity and Monotonicity in Games via Sum-of-Squares Hierarchies NeurIPS 2025
Concavity and its refinements underpin tractability in multiplayer games, where players independently choose actions to maximize their own payoffs, which depend on other players' actions. In concave games, where players' strategy sets are compact and convex, and their payoffs are concave in their own actions, strong guarantees follow: Nash equilibria always exist and decentralized algorithms converge to equilibria. If the game is furthermore monotone, an even stronger guarantee holds: Nash equilibria are unique under strictness assumptions. Unfortunately, we show that certifying concavity or monotonicity is NP-hard, already for games whose utilities are multivariate polynomials and whose strategy sets are compact, convex, basic semialgebraic sets -- an expressive class that captures extensive-form games with imperfect recall. On the positive side, we develop two hierarchies of sum-of-squares programs that certify concavity and monotonicity of a given game, and each level of the hierarchies can be solved in polynomial time. We show that almost all concave/monotone games are certified at some finite level of the hierarchies. Subsequently, we introduce SOS-concave/monotone games, which globally approximate concave/monotone games, and show that for any given game we can compute the closest SOS-concave/monotone game in polynomial time. Finally, we apply our techniques to canonical examples of imperfect recall extensive-form games.
comment: NeurIPS 2025
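For intuition only, the lowest level of such a certificate can be illustrated on a toy univariate quartic payoff, where concavity on the real line reduces to positive semidefiniteness of a 2x2 Gram matrix for -u''(x). This sketch is unrelated to the paper's actual hierarchies or implementation.

```python
import numpy as np

def is_concave_quartic(c, tol=1e-9):
    """Toy single-variable SOS concavity certificate.
    u(x) = c[0] + c[1] x + c[2] x^2 + c[3] x^3 + c[4] x^4 is concave iff
    -u''(x) = -(2 c[2] + 6 c[3] x + 12 c[4] x^2) is a sum of squares, i.e. iff
    its Gram matrix in the basis [1, x] is positive semidefinite."""
    gram = np.array([[-2.0 * c[2], -3.0 * c[3]],
                     [-3.0 * c[3], -12.0 * c[4]]])
    return bool(np.all(np.linalg.eigvalsh(gram) >= -tol))

print(is_concave_quartic([0, 0, -1, 0, -1]))  # True:  u(x) = -x^2 - x^4
print(is_concave_quartic([0, 0, 0, 0, 1]))    # False: u(x) = x^4
```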
Computing Evolutionarily Stable Strategies in Imperfect-Information Games
We present an algorithm for computing evolutionarily stable strategies (ESSs) in symmetric perfect-recall extensive-form games of imperfect information. Our main algorithm is for two-player games, and we describe how it can be extended to multiplayer games. The algorithm is sound and computes all ESSs in nondegenerate games and a subset of them in degenerate games which contain an infinite continuum of symmetric Nash equilibria. The algorithm is anytime and can be stopped early to find one or more ESSs. We experiment on an imperfect-information cancer signaling game as well as random games to demonstrate scalability.
AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding
Evaluating large language models (LLMs) has recently emerged as a critical issue for safe and trustworthy application of LLMs in the medical domain. Although a variety of static medical question-answering (QA) benchmarks have been proposed, many aspects remain underexplored, such as the effectiveness of LLMs in generating responses in dynamic, interactive clinical multi-turn conversation situations and the identification of multi-faceted evaluation strategies beyond simple accuracy. However, formally evaluating a dynamic, interactive clinical situation is hindered by its vast combinatorial space of possible patient states and interaction trajectories, making it difficult to standardize and quantitatively measure such scenarios. Here, we introduce AutoMedic, a multi-agent simulation framework that enables automated evaluation of LLMs as clinical conversational agents. AutoMedic transforms off-the-shelf static QA datasets into virtual patient profiles, enabling realistic and clinically grounded multi-turn clinical dialogues between LLM agents. The performance of various clinical conversational agents is then assessed based on our CARE metric, which provides a multi-faceted evaluation standard of clinical conversational accuracy, efficiency/strategy, empathy, and robustness. Our findings, validated by human experts, demonstrate the validity of AutoMedic as an automated evaluation framework for clinical conversational agents, offering practical guidelines for the effective development of LLMs in conversational medical applications.
Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning AAMAS 2026
Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent methods excel at learning sparse coordination graphs, i.e., determining who communicates with whom, they do not address what information should be transmitted under hard bandwidth constraints. We study this bandwidth-limited regime and show that naive dimensionality reduction consistently degrades coordination performance. Hard bandwidth constraints force selective encoding, but deterministic projections lack mechanisms to control how compression occurs. We introduce Bandwidth-constrained Variational Message Encoding (BVME), a lightweight module that treats messages as samples from learned Gaussian posteriors regularized via KL divergence to an uninformative prior. BVME's variational framework provides principled, tunable control over compression strength through interpretable hyperparameters, directly constraining the representations used for decision-making. Across SMACv1, SMACv2, and MPE benchmarks, BVME achieves comparable or superior performance while using 67--83% fewer message dimensions, with gains most pronounced on sparse graphs where message quality critically impacts coordination. Ablations reveal U-shaped sensitivity to bandwidth, with BVME excelling at extreme ratios while adding minimal overhead.
comment: Submitted to AAMAS 2026
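A minimal sketch of the kind of variational message bottleneck described above, assuming placeholder dimensions and a placeholder KL weight (this is not the authors' code):

```python
import torch
import torch.nn as nn

class VariationalMessageEncoder(nn.Module):
    """Encode an agent's hidden state into a low-dimensional stochastic message.
    Dimensions and the KL weight below are illustrative placeholders."""
    def __init__(self, hidden_dim=64, msg_dim=8):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, msg_dim)
        self.log_var = nn.Linear(hidden_dim, msg_dim)

    def forward(self, h):
        mu, log_var = self.mu(h), self.log_var(h)
        std = torch.exp(0.5 * log_var)
        msg = mu + std * torch.randn_like(std)                  # reparameterization
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch
        kl = 0.5 * (mu.pow(2) + log_var.exp() - 1.0 - log_var).sum(-1).mean()
        return msg, kl

encoder = VariationalMessageEncoder()
hidden = torch.randn(32, 64)          # hidden states of 32 agents
message, kl = encoder(hidden)
rl_loss = torch.zeros(())             # stand-in for the usual MARL objective
total_loss = rl_loss + 1e-3 * kl      # beta-weighted KL regularizer (beta is a placeholder)
```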
GLOW: Graph-Language Co-Reasoning for Agentic Workflow Performance Prediction
Agentic Workflows (AWs) have emerged as a promising paradigm for solving complex tasks. However, the scalability of automating their generation is severely constrained by the high cost and latency of execution-based evaluation. Existing AW performance prediction methods act as surrogates but fail to simultaneously capture the intricate topological dependencies and the deep semantic logic embedded in AWs. To address this limitation, we propose GLOW, a unified framework for AW performance prediction that combines the graph-structure modeling capabilities of GNNs with the reasoning power of LLMs. Specifically, we introduce a graph-oriented LLM, instruction-tuned on graph tasks, to extract topologically aware semantic features, which are fused with GNN-encoded structural representations. A contrastive alignment strategy further refines the latent space to distinguish high-quality AWs. Extensive experiments on FLORA-Bench show that GLOW outperforms state-of-the-art baselines in prediction accuracy and ranking utility.
TranSimHub:A Unified Air-Ground Simulation Platform for Multi-Modal Perception and Decision-Making
Air-ground collaborative intelligence is becoming a key approach for next-generation urban intelligent transportation management, where aerial and ground systems work together on perception, communication, and decision-making. However, the lack of a unified multi-modal simulation environment has limited progress in studying cross-domain perception, coordination under communication constraints, and joint decision optimization. To address this gap, we present TranSimHub, a unified simulation platform for air-ground collaborative intelligence. TranSimHub offers synchronized multi-view rendering across RGB, depth, and semantic segmentation modalities, ensuring consistent perception between aerial and ground viewpoints. It also supports information exchange between the two domains and includes a causal scene editor that enables controllable scenario creation and counterfactual analysis under diverse conditions such as different weather, emergency events, and dynamic obstacles. We release TranSimHub as an open-source platform that supports end-to-end research on perception, fusion, and control across realistic air and ground traffic scenes. Our code is available at https://github.com/Traffic-Alpha/TransSimHub.
comment: 9 pages, 4 figures
Towards Foundation Models with Native Multi-Agent Intelligence
Foundation models (FMs) are increasingly assuming the role of the "brain" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or integrated tool use -- we argue that the next frontier is endowing FMs with native multi-agent intelligence. We identify four core capabilities of FMs in multi-agent contexts: understanding, planning, efficient communication, and adaptation. Contrary to assumptions about the spontaneous emergence of such abilities, we provide extensive empirical evidence across 41 large language models showing that strong single-agent performance alone does not automatically yield robust multi-agent intelligence. To address this gap, we outline key research directions -- spanning dataset construction, evaluation, training paradigms, and safety considerations -- for building FMs with native multi-agent intelligence.
Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics
As Large Language Models (LLMs) increasingly operate as autonomous decision-makers in interactive and multi-agent systems and human societies, understanding their strategic behaviour has profound implications for safety, coordination, and the design of AI-driven social and economic infrastructures. Assessing such behaviour requires methods that capture not only what LLMs output, but the underlying intentions that guide their decisions. In this work, we extend the FAIRGAME framework to systematically evaluate LLM behaviour in repeated social dilemmas through two complementary advances: a payoff-scaled Prisoner's Dilemma isolating sensitivity to incentive magnitude, and an integrated multi-agent Public Goods Game with dynamic payoffs and multi-agent histories. These environments reveal consistent behavioural signatures across models and languages, including incentive-sensitive cooperation, cross-linguistic divergence and end-game alignment toward defection. To interpret these patterns, we train traditional supervised classification models on canonical repeated-game strategies and apply them to FAIRGAME trajectories, showing that LLMs exhibit systematic, model- and language-dependent behavioural intentions, with linguistic framing at times exerting effects as strong as architectural differences. Together, these findings provide a unified methodological foundation for auditing LLMs as strategic agents and reveal systematic cooperation biases with direct implications for AI governance, collective decision-making, and the design of safe multi-agent systems.
Systems and Control (EESS)
Distributionally Robust Regret Optimal Control Under Moment-Based Ambiguity Sets
In this paper, we consider a class of finite-horizon, linear-quadratic stochastic control problems, where the probability distribution governing the noise process is unknown but assumed to belong to an ambiguity set consisting of all distributions whose mean and covariance lie within norm balls centered at given nominal values. To address the distributional ambiguity, we explore the design of causal affine control policies to minimize the worst-case expected regret over all distributions in the given ambiguity set. The resulting minimax optimal control problem is shown to admit an equivalent reformulation as a tractable convex program that corresponds to a regularized version of the nominal linear-quadratic stochastic control problem. While this convex program can be recast as a semidefinite program, semidefinite programs are typically solved using primal-dual interior point methods that scale poorly with the problem size in practice. To address this limitation, we propose a scalable dual projected subgradient method to compute optimal controllers to an arbitrary accuracy. Numerical experiments are presented to benchmark the proposed method against state-of-the-art data-driven and distributionally robust control design approaches.
comment: 21 pages, 2 figures
A Differentiable Digital Twin of Distributed Link Scheduling for Contention-Aware Networking
Many routing and flow optimization problems in wired networks can be solved efficiently using minimum cost flow formulations. However, this approach does not extend to wireless multi-hop networks, where the assumptions of fixed link capacity and linear cost structure collapse due to contention for shared spectrum resources. The key challenge is that the long-term capacity of a wireless link becomes a non-linear function of its network context, including network topology, link quality, and the traffic assigned to neighboring links. In this work, we pursue a new direction of modeling wireless networks under randomized medium access control by developing an analytical network digital twin (NDT) that predicts link duty cycles from network context. We generalize randomized contention as finding a Maximal Independent Set (MIS) on the conflict graph using weighted Luby's algorithm, derive an analytical model of link duty cycles, and introduce an iterative procedure that resolves the circular dependency among duty cycle, link capacity, and contention probability. Our numerical experiments show that the proposed NDT accurately predicts link duty cycles and congestion patterns with up to a 5000x speedup over packet-level simulation, and enables us to optimize link scheduling using gradient descent for reduced congestion and radio footprint.
comment: 5 pages, 8 figures, presented in Asilomar Conference on Signals, Systems, and Computers 2025
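The randomized-contention ingredient can be sketched as follows. This is an illustrative Monte-Carlo reading of weighted Luby-style contention on a conflict graph with made-up weights; it is not the paper's analytical duty-cycle model or its fixed-point iteration.

```python
import numpy as np

def weighted_luby_mis(adj, weights, rng):
    """One randomized maximal independent set on a conflict graph (boolean adjacency
    matrix `adj`); higher-weight links are more likely to win contention."""
    n = adj.shape[0]
    active = np.ones(n, dtype=bool)
    in_mis = np.zeros(n, dtype=bool)
    while active.any():
        # Each active link draws a random "ticket" biased by its weight.
        tickets = np.where(active, rng.random(n) * weights, -np.inf)
        for v in np.where(active)[0]:
            nbrs = adj[v] & active
            if tickets[v] > np.max(tickets[nbrs], initial=-np.inf):
                in_mis[v] = True
        # Winners and their neighbors drop out of this round's contention.
        removed = in_mis | (adj.astype(int) @ in_mis.astype(int) > 0)
        active &= ~removed
    return in_mis

def estimate_duty_cycles(adj, weights, rounds=2000, seed=0):
    """Monte-Carlo estimate of per-link duty cycles (fraction of rounds scheduled)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(adj.shape[0])
    for _ in range(rounds):
        counts += weighted_luby_mis(adj, weights, rng)
    return counts / rounds
```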
Spectral Decompositions of Controllability Gramian and Its Inverse based on System Eigenvalues in Companion Form
Controllability and observability Gramians, along with their inverses, are widely used to solve various problems in control theory. This paper proposes spectral decompositions of the controllability Gramian and its inverse based on system eigenvalues for a continuous LTI dynamical system in the controllability canonical (companion) form. The Gramian and its inverse are represented as sums of Hermitian matrices, each corresponding to individual system eigenvalues or their pairwise combinations. These decompositions are obtained for the solutions of both algebraic and differential Lyapunov and Riccati equations with arbitrary initial conditions, allowing for the estimation of system spectral properties over an arbitrary time interval and their prediction at future moments. The derived decompositions are also generalized to the case of multiple eigenvalues in the dynamics matrix spectrum, enabling a closed-form estimation of the effects of resonant interactions with the system's eigenmodes. The spectral components are interpreted as measurable quantities in the minimum energy control problem. Therefore, they are unambiguously defined and can quantitatively characterize the influence of individual eigenmodes and associated system devices on controllability, observability, and the asymptotic dynamics of perturbation energy. The additional information obtained from these decompositions can improve the accuracy of algorithms in solving various practical problems, such as stability analysis, minimum energy control, structural design, tuning regulators, optimal placement of actuators and sensors, network analysis, and model order reduction.
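As a reminder of the underlying idea, the following sketch numerically checks the textbook spectral formula for the Gramian of a stable, diagonalizable system against the Lyapunov-equation solution; it does not reproduce the paper's companion-form, inverse-Gramian, or multiple-eigenvalue results.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(1)
n, m = 4, 2
A = rng.standard_normal((n, n))
A -= (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(n)   # shift to make A Hurwitz
B = rng.standard_normal((n, m))

# Reference: Lyapunov solution of A W + W A^T + B B^T = 0.
W_lyap = solve_continuous_lyapunov(A, -B @ B.T)

# Spectral form: with A = V diag(lam) V^{-1} and C = V^{-1} B,
# W = V M V^T where M_ij = (C C^T)_ij / (-(lam_i + lam_j)),
# i.e. one contribution per eigenvalue pair.
lam, V = np.linalg.eig(A)
C = np.linalg.solve(V, B)
M = (C @ C.T) / (-(lam[:, None] + lam[None, :]))
W_spec = (V @ M @ V.T).real

print(np.allclose(W_lyap, W_spec))  # True (up to numerical error)
```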
Low-Order $\mathcal{H}_2 / \mathcal{H}_\infty$ Controller Design for Aeroelastic Vibration Suppression
This paper presents an $\mathcal{H}_2 / \mathcal{H}_\infty$ minimization-based output-feedback controller for active aeroelastic vibration suppression in a cantilevered beam. First, a nonlinear structural model incorporating moderate deflection and aerodynamic loading is derived and discretized using the finite element method (FEM). Then, a low-order linear model is identified from random Gaussian input-response data of the FEM model to synthesize an output-feedback controller using the $\mathcal{H}_2 / \mathcal{H}_\infty$ framework. A frequency-weighted dynamic filter is introduced to emphasize disturbance frequencies of interest, enabling the controller to target dominant vibration modes. Simulation results demonstrate the effectiveness of the proposed technique for vibration suppression and study its robustness to system parameter variations, including actuator placement.
comment: To be presented in the 2026 Scitech Forum
Data-driven Pressure Recovery in Diffusers
This paper investigates the application of a data-driven technique based on retrospective cost optimization to optimize the frequency of mass injection into an S-shaped diffuser, with the objective of maximizing the pressure recovery. Experimental data indicated that there is an optimal injection frequency between 100 Hz and 300 Hz with a mass flow rate of 1 percent of the free stream. High-fidelity numerical simulations using compressible unsteady Reynolds-Averaged Navier-Stokes (URANS) are conducted to investigate the mean and temporal features resulting from mass injection into an S-shaped diffuser with differing injection speeds and pulse frequencies. The results are compared with experiments to confirm the accuracy of the numerical solution. Overall, the 2-D simulations are in relatively good agreement with the experiment, with 3-D simulations currently under investigation to benchmark the effect of spanwise instabilities. Simulation results with the proposed data-driven technique show improvements upon a baseline case by increasing pressure recovery and reducing the region of flow recirculation within the diffuser.
comment: To be presented at the 2026 Scitech Forum
Distribution-Free Stochastic MPC for Joint-in-Time Chance-Constrained Linear Systems
This work presents a stochastic model predictive control (MPC) framework for linear systems subject to joint-in-time chance constraints under unknown disturbance distributions. Unlike existing stochastic MPC formulations that rely on parametric or Gaussian assumptions or require expensive offline computations, the proposed method leverages conformal prediction (CP) as a streamlined tool to construct finite-sample confidence regions for the system's stochastic error trajectories with minimal computational effort. These regions enable the relaxation of probabilistic constraints while providing formal guarantees. By employing an indirect feedback mechanism and a probabilistic set-based formulation, we prove recursive feasibility of the relaxed optimization problem and establish chance constraint satisfaction in closed-loop. Furthermore, we extend the approach to the more general output feedback setting with unknown measurement noise distributions. Given available noise samples, we establish satisfaction of the joint chance constraints and recursive feasibility via output measurements alone. Numerical examples demonstrate the effectiveness and advantages of the proposed method compared to existing approaches.
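The conformal-prediction step the abstract relies on can be sketched as follows, with synthetic placeholder data; the actual constraint relaxation and MPC design are not shown.

```python
import numpy as np

def conformal_radius(calibration_scores, alpha=0.1):
    """Split-conformal quantile: with n i.i.d. calibration scores, a fresh score is
    below this radius with probability >= 1 - alpha (finite-sample, distribution-free)."""
    n = len(calibration_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))     # rank of the conformal quantile
    return np.sort(calibration_scores)[k - 1]

# Synthetic stand-in for sampled error trajectories (200 samples, horizon 10, 2 states).
rng = np.random.default_rng(0)
errors = rng.standard_normal((200, 10, 2))
scores = np.max(np.linalg.norm(errors, axis=-1), axis=-1)   # worst norm along each trajectory
r = conformal_radius(scores, alpha=0.1)
# Chance constraints can then be handled using r, e.g. by requiring the nominal
# predicted trajectory to keep a margin of r from the constraint boundary.
```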
Active prognosis and diagnosis of modular discrete-event systems
This paper addresses the verification and enforcement of prognosability and diagnosability for discrete-event systems (DESs) modeled by deterministic finite automata. We establish the equivalence between prognosability (respectively, diagnosability) and pre-normality over a subset of the non-faulty language (respectively, a suffix of the faulty language). We then demonstrate the existence of supremal prognosable (respectively, diagnosable) and normal sublanguages. An algorithm is then designed to compute the supremal controllable, normal, and prognosable (respectively, diagnosable) sublanguages. Since DESs are typically composed of multiple components operating in parallel, purely local supervisors are generally insufficient, as prognosability and diagnosability are global properties of a system. Given the limited work on enforcing prognosability or diagnosability in modular DESs, where these properties are enforced through local supervisors, this paper leverages a refined version of pre-normality to compute modular supervisors for local subsystems. The resulting closed-loop system is shown to be globally controllable, normal, and prognosable/diagnosable. Examples are provided to illustrate the proposed method.
Estimating Hormone Concentrations in the Pituitary-Thyroid Feedback Loop from Irregularly Sampled Measurements
Model-based control techniques have recently been investigated for the recommendation of medication dosages to address thyroid diseases. These techniques often rely on knowledge of internal hormone concentrations that cannot be measured from blood samples. Moreover, the measurable concentrations are typically only obtainable at irregular sampling times. In this work, we empirically verify a notion of sample-based detectability that accounts for irregular sampling of the measurable concentrations on two pituitary-thyroid loop models representing patients with hypo- and hyperthyroidism, respectively, and include the internal concentrations as states. We then implement sample-based moving horizon estimation for the models, and test its performance on virtual patients across a range of sampling schemes. Our study shows robust stability of the estimator across all scenarios, and that more frequent sampling leads to less estimation error in the presence of model uncertainty and misreported dosages.
comment: 8 pages; This work has been submitted to IFAC for possible publication
Linear Quadratic Regulators: A New Look
Linear time-invariant control systems can be considered as finitely generated modules over the commutative principal ideal ring $\mathbb{R}[\frac{d}{dt}]$ of linear differential operators with respect to the time derivative. Kalman controllability in this algebraic language translates into the freeness of the system module. Linear quadratic regulators rely on quadratic Lagrangians, or cost functions. Any flat output, i.e., any basis of the corresponding free module, leads to an open-loop control strategy via an Euler-Lagrange equation, which becomes here a linear ordinary differential equation with constant coefficients. In this approach, the two-point boundary value problem, including the control variables, becomes tractable. It yields notions of optimal time horizon, optimal parameter design and optimal rest-to-rest trajectories. The loop is closed via an intelligent controller derived from model-free control, which is known to exhibit excellent performance concerning model mismatches and disturbances.
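To make the mechanism concrete, consider the double integrator with flat output $y = x_1$; this worked example is illustrative and not taken from the paper.

```latex
% Double integrator \dot{x}_1 = x_2, \dot{x}_2 = u with flat output y = x_1,
% so that x_1 = y, x_2 = \dot{y}, u = \ddot{y}, and the quadratic cost becomes
J \;=\; \int_0^T \bigl( x_1^2 + u^2 \bigr)\, dt
  \;=\; \int_0^T \bigl( y^2 + \ddot{y}^{\,2} \bigr)\, dt .
% The Euler--Lagrange equation for a Lagrangian depending on \ddot{y},
\frac{\partial L}{\partial y}
  - \frac{d}{dt}\frac{\partial L}{\partial \dot{y}}
  + \frac{d^2}{dt^2}\frac{\partial L}{\partial \ddot{y}} \;=\; 0,
% gives the constant-coefficient linear ODE
y^{(4)} + y \;=\; 0 ,
% whose solution, fitted to the boundary data (y,\dot{y}) at t = 0 and t = T,
% yields the open-loop optimum u^*(t) = \ddot{y}^*(t).
```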
Codeshare agreements between airlines: literature review with the aid of artificial intelligence
Codeshare agreements are contracts that allow two or more airlines to share seats on the same flight. These agreements, which are widespread in commercial aviation as a response to highly competitive environments, have enabled the expansion of airline networks without additional costs or risks for the companies involved. The literature presents ambiguous effects associated with the practice, with evidence of increased supply and reduced prices in situations of route complementarity, while also pointing to anti-competitive impacts in markets where companies act as competitors. A review of scientific production over time, including theoretical contributions and case studies, is essential to understand the evolution of these agreements and their implications, especially in the Brazilian context, marked by its own characteristics and particular regulatory history. Thus, this article reviews the literature on codesharing, with an emphasis on the Brazilian market, and uses the Litmaps computational tool, based on artificial intelligence techniques, to support the contextual analysis of publications through their citation relationships. The ultimate goal is to identify and evaluate the main evidence accumulated over decades on the effects of these agreements in Brazil. The joint analysis of the contributions allows us to outline the current state of knowledge, characterize specificities observed in the Brazilian market, and identify gaps that may guide future studies.
NWP-based Atmospheric Refractivity Modeling and Fast & Stable Non-uniform Plane Wave Ray-Tracing Simulations for LEO Link Analysis
Existing low-Earth-orbit (LEO) communication link analyses face two main challenges: (1) limited accuracy of 3D atmospheric refractivity reconstructed from sparsely sampled radiosonde data, and (2) numerical instability (i.e., underflow under standard double precision) in previous non-uniform plane-wave ray-tracing algorithms, caused by the extremely small atmospheric loss terms at the complex-valued dielectric interfaces where non-uniform plane waves inevitably arise. To address these issues, we reconstruct a high-resolution 3D complex-valued refractivity model using numerical weather prediction data, and develop a fast and numerically stable non-uniform plane-wave ray tracer. The method remains stable in double precision and delivers a 24-fold speedup over high-precision benchmarks. Comparisons show that boresight-error deviations and path-loss differences between the rigorous method and the uniform-plane-wave approximation remain negligible, even under heavy precipitation. Although rays in a lossy atmosphere experience different phase- and attenuation-direction vectors, forming non-uniform plane waves, the resulting effective attenuation along the path is nearly identical to that predicted by the uniform-plane-wave model. These findings justify the continued use of uniform-plane-wave ray tracing in practical LEO link analyses.
Structural Methods for handling mode changes in multimode DAE systems
Hybrid systems are an important concept in Cyber-Physical Systems modeling, for which multiphysics modeling from first principles and the reuse of models from libraries are key. To achieve this, DAEs must be used to specify the dynamics in each discrete state (or mode in our context). This led to the development of DAE-based equational languages supporting multiple modes, of which Modelica is a popular standard. Mode switching can be time- or state-based. Impulsive behaviors can occur at mode changes. While mode changes are well understood in particular physics (e.g., contact mechanics), this is not the case in physics-agnostic paradigms such as Modelica. This situation causes difficulties for the compilation of programs, often requiring users to manually smooth out mode changes. In this paper, we propose a novel approach for the hot restart at mode changes in such paradigms. We propose a mathematical meaning for hot restarts (such a mathematical meaning does not exist in general), as well as a combined structural and impulse analysis for mode changes, generating the hot restart even in the presence of impulses. Our algorithm detects at compile time if the mode change is insufficiently specified, in which case it returns diagnostics information to the user.
comment: 53 pages, 3 figures
Collision-Aware Density-Driven Control of Multi-Agent Systems via Control Barrier Functions
This paper tackles the problem of safe and efficient area coverage using a multi-agent system operating in environments with obstacles. Applications such as environmental monitoring and search and rescue require robot swarms to cover large domains under resource constraints, making both coverage efficiency and safety essential. To address the efficiency aspect, we adopt the Density-Driven Control (D$^2$C) framework, which uses optimal transport theory to steer agents according to a reference distribution that encodes spatial coverage priorities. To ensure safety, we incorporate Control Barrier Functions (CBFs) into the framework. While CBFs are commonly used for collision avoidance, we extend their applicability by introducing obstacle-specific formulations for both circular and rectangular shapes. In particular, we analytically derive a unit normal vector based on the agent's position relative to the nearest face of a rectangular obstacle, improving safety enforcement in environments with non-smooth boundaries. Additionally, a velocity-dependent term is incorporated into the CBF to enhance collision avoidance. Simulation results validate the proposed method by demonstrating smoother navigation near obstacles and more efficient area coverage than the existing method, while still ensuring collision-free operation.
comment: 6 pages, 7 figures, Accepted for presentation at the 5th Modeling, Estimation and Control Conference (MECC 2025), Pittsburgh, USA
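A minimal sketch of a rectangle-obstacle barrier of the kind described above, for a single-integrator agent and an axis-aligned rectangle; the gains, margin, and velocity-dependent term are illustrative placeholders rather than the paper's exact formulation.

```python
import numpy as np

def rect_barrier(p, v, center, half_size, margin=0.2, alpha=1.0, kv=0.5):
    """Barrier data for an axis-aligned rectangular obstacle and a single-integrator
    agent at position p with velocity v.  h = distance to the nearest face minus a
    margin; the unit normal points from the nearest boundary point toward the agent.
    Returns (h, normal, rhs) for the half-space constraint  normal . u >= rhs."""
    d = p - center
    nearest = center + np.clip(d, -half_size, half_size)   # closest point on the rectangle
    diff = p - nearest
    dist = np.linalg.norm(diff)
    if dist > 1e-9:                                         # agent outside the rectangle
        normal = diff / dist
        h = dist - margin
    else:                                                   # inside: exit through closest face
        face_gap = half_size - np.abs(d)
        i = int(np.argmin(face_gap))
        normal = np.zeros_like(p)
        normal[i] = np.sign(d[i]) if d[i] != 0 else 1.0
        h = -face_gap[i] - margin
    rhs = -alpha * h - kv * float(normal @ v)               # velocity-dependent tightening
    return h, normal, rhs

# Closed-form projection of a nominal coverage velocity onto the safe half-space.
p, v = np.array([1.0, 0.5]), np.array([-0.4, 0.0])
h, n_vec, rhs = rect_barrier(p, v, center=np.zeros(2), half_size=np.array([0.5, 0.5]))
u_nom = np.array([-1.0, 0.0])
u_safe = u_nom + max(0.0, rhs - float(n_vec @ u_nom)) * n_vec
```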
Structural Sign Herdability in Temporally Switching Networks with Fixed Topology
This paper investigates structural herdability in a special class of temporally switching networks with fixed topology. We show that when the underlying digraph remains unchanged across all snapshots, the network attains complete $\mathcal{SS}$ herdability even in the presence of signed or layer dilations, a condition not applicable to static networks. This reveals a fundamental structural advantage of temporal dynamics and highlights a novel mechanism through which switching can overcome classical obstructions to herdability. To validate these conclusions, we utilize a more relaxed form of sign matching within each snapshot of the temporal network. Furthermore, we show that when all snapshots share the same underlying topology, the temporally switching network achieves $\mathcal{SS}$ herdability within just two snapshots, which is fewer than the number required for structural controllability. Several examples are included to demonstrate these results.
A Novel Phase-Noise Module for the QUCS Circuit Simulator. Part I : the Periodic Steady-State
The paper discusses work done to expand and extend the capabilities of the open-source QUCS circuit simulator through the implementation of a computationally efficient time-domain steady-state analysis module, supporting simulation of autonomous circuits. To our knowledge, this represents the first time such an analysis module has been implemented in the QUCS environment. Hitherto, the only available option was a harmonic-balance module which was strictly limited to non-autonomous (driven) circuits. The research has several important scientific and industrial applications in the area of large-signal steady-state analysis of autonomous circuits e.g. free-running and coupled oscillator circuit networks. The reported results will have great impact w.r.t. analyzing, synthesizing and optimizing oscillatory behavior of various important industrial circuits and systems. The developed tool, furthermore, introduces support for simulating noise performance of circuits operating under large-signal conditions. This paper is the first part of a two-part series documenting the implementation of a novel (coupled)-oscillator phase-noise simulator engine in the QUCS environment. The goal of this undertaking is the advancement of the open-source QUCS project towards becoming a viable competitor to the commercial simulators currently on the market.
comment: 13 pages, 5 figures
Design of a six wheel suspension and a three-axis linear actuation mechanism for a laser weeding robot
Mobile robots are increasingly utilized in agriculture to automate labor-intensive tasks such as weeding, sowing, harvesting and soil analysis. Recently, agricultural robots have been developed to detect and remove weeds using mechanical tools or precise herbicide sprays. Mechanical weeding is inefficient over large fields, and herbicides harm the soil ecosystem. Laser weeding with mobile robots has emerged as a sustainable alternative in precision farming. In this paper, we present an autonomous weeding robot that uses controlled exposure to a low energy laser beam for weed removal. The proposed robot is six-wheeled with a novel double four-bar suspension for higher stability. The laser is guided towards the detected weeds by a three-dimensional linear actuation mechanism. Field tests have demonstrated the robot's capability to navigate agricultural terrains effectively by overcoming obstacles up to 15 cm in height. At an optimal speed of 42.5 cm/s, the robot achieves a weed detection rate of 86.2% and an operating time of 87 seconds per meter. The laser actuation mechanism maintains a minimal mean positional error of 1.54 mm, combined with a high hit rate of 97%, ensuring effective and accurate weed removal. This combination of speed, accuracy, and efficiency highlights the robot's potential for significantly enhancing precision farming practices.
comment: 15 Pages, 10 figures
L2 Ethernet Switch VLSI Implementation
Ethernet switches are foundational to the global internet infrastructure. These devices route data packets on a local area network from source to destination media access control (MAC) addresses. On the L2 layer of the Open Systems Interconnection model, Ethernet switches take in digitized data from a Media Independent Interface and send it to the output port corresponding to the destination address. Switches need to handle parallel input and output streams from each port, prioritizing throughput, efficiency, and packet integrity. Due to the confidential nature of the networking device industry, few open-source implementations of switching fabrics exist. We propose an open-source design for an L2 Ethernet switch along with the power, performance, and area tradeoffs of its architectural decisions.
comment: 9 pages, 21 figures
Lies We Can Trust: Quantifying Action Uncertainty with Inaccurate Stochastic Dynamics through Conformalized Nonholonomic Lie Groups
We propose Conformal Lie-group Action Prediction Sets (CLAPS), a symmetry-aware conformal prediction-based algorithm that constructs, for a given action, a set guaranteed to contain the resulting system configuration at a user-defined probability. Our assurance holds under both aleatoric and epistemic uncertainty, non-asymptotically, and does not require strong assumptions about the true system dynamics, the uncertainty sources, or the quality of the approximate dynamics model. Typically, uncertainty quantification is tackled by making strong assumptions about the error distribution or magnitude, or by relying on uncalibrated uncertainty estimates, i.e., with no link to frequentist probabilities, which are insufficient for safe control. Recently, conformal prediction has emerged as a statistical framework capable of providing distribution-free probabilistic guarantees on test-time prediction accuracy. While current conformal methods treat robots as Euclidean points, many systems have non-Euclidean configuration spaces, e.g., some mobile robots evolve on SE(2). In this work, we rigorously analyze configuration errors using Lie groups, extending previous Euclidean-space theoretical guarantees to SE(2). Our experiments on a simulated JetBot, and on a real MBot, suggest that by considering the configuration space's structure, our symmetry-informed nonconformity score leads to more volume-efficient prediction regions that represent the underlying uncertainty better than existing approaches.
comment: 13 pages, 7 figures. Under review
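A sketch of a symmetry-aware nonconformity score on SE(2) followed by the usual split-conformal quantile; the rotation weighting and calibration details are illustrative placeholders, not the authors' exact score.

```python
import numpy as np

def se2_log(T):
    """Log map of an SE(2) pose (3x3 homogeneous matrix) -> twist (vx, vy, theta)."""
    theta = np.arctan2(T[1, 0], T[0, 0])
    t = T[:2, 2]
    if abs(theta) < 1e-8:
        v = t
    else:
        a, b = np.sin(theta) / theta, (1.0 - np.cos(theta)) / theta
        v = np.linalg.solve(np.array([[a, -b], [b, a]]), t)
    return np.array([v[0], v[1], theta])

def nonconformity(T_pred, T_true, w_rot=1.0):
    """Left-invariant score: size of the relative transform between prediction and truth."""
    xi = se2_log(np.linalg.inv(T_pred) @ T_true)
    return np.sqrt(xi[0] ** 2 + xi[1] ** 2 + w_rot * xi[2] ** 2)

def conformal_quantile(scores, alpha=0.1):
    k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
    return np.sort(scores)[k - 1]

# The prediction set for a new action is then every configuration T whose score
# nonconformity(T_model(action), T) is at most the calibrated quantile.
```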
Optimality Deviation using the Koopman Operator
This paper investigates the impact of approximation error in the data-driven optimal control of nonlinear systems using the Koopman operator. While the Koopman operator enables a simplified representation of nonlinear dynamics through a lifted state space, the presence of approximation error inevitably leads to deviations in the computed optimal controller and the resulting value function. We derive explicit upper bounds for these optimality deviations, which characterize the worst-case effect of approximation error. Supported by numerical examples, these theoretical findings provide a quantitative foundation for improving the robustness of data-driven optimal controller design.
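A minimal EDMD-style sketch of the kind of lifted linear model the abstract refers to; the dictionary and the toy data are placeholders, and the optimality-deviation bounds themselves are analytical results rather than code.

```python
import numpy as np

def lift(x):
    """Example dictionary of observables for a 2-D state (placeholder choice)."""
    x1, x2 = x[..., 0], x[..., 1]
    return np.stack([x1, x2, x1 * x2, x1 ** 2, x2 ** 2, np.ones_like(x1)], axis=-1)

def edmd(X, Y):
    """Least-squares Koopman approximation K with lift(Y) ~= lift(X) @ K."""
    K, *_ = np.linalg.lstsq(lift(X), lift(Y), rcond=None)
    return K

# Snapshot pairs (x_k, x_{k+1}) from a toy nonlinear map, standing in for data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
Y = np.stack([0.9 * X[:, 0], 0.8 * X[:, 1] + 0.2 * X[:, 0] ** 2], axis=-1)
K = edmd(X, Y)
# The residual below is the approximation error whose effect on the optimal
# controller and value function the paper bounds.
residual = np.linalg.norm(lift(Y) - lift(X) @ K)
```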
Permutation-Equivariant Learning for Dynamic Security Assessment of Power System Frequency Response
This paper presents a hybrid model-AI framework for real-time dynamic security assessment of frequency stability in power systems. The proposed method rapidly estimates key frequency parameters under a dynamic set of disturbances, which are continuously updated based on operating conditions and unit commitment. To achieve this, the framework builds on a modal-based formulation of the system frequency response (SFR), which leverages the system's eigenstructure to predict key frequency stability metrics. A Deep Sets-inspired neural network is employed to estimate the complex modal coefficients required by the modal-based SFR approach, formulated as a permutation-equivariant learning problem. This enables fast and accurate prediction of the frequency nadir and its timing across different operating conditions and disturbances. The framework achieves scalability by reusing precomputed modal structures and updating only the disturbance-specific coefficients. It demonstrates strong generalization capabilities without requiring an extensive set of operating scenarios during training or the widespread deployment of phasor measurement units (PMUs). The method is validated on the IEEE 39-bus and 118-bus systems, showing superior accuracy, robustness, and computational efficiency compared to purely data-driven approaches.
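The permutation-equivariant building block can be sketched as a Deep Sets-style layer with placeholder dimensions; the modal SFR coupling and the frequency-metric heads are not reproduced here.

```python
import torch
import torch.nn as nn

class EquivariantBlock(nn.Module):
    """Deep Sets-style layer: each element is combined with a permutation-invariant
    pooled summary, so permuting the input disturbances permutes the outputs identically."""
    def __init__(self, d_in=4, d_out=8):
        super().__init__()
        self.phi = nn.Linear(d_in, d_out)    # per-element transform
        self.rho = nn.Linear(d_in, d_out)    # transform of the pooled summary

    def forward(self, x):                    # x: (batch, n_disturbances, d_in)
        pooled = x.mean(dim=1, keepdim=True) # permutation-invariant pooling
        return torch.relu(self.phi(x) + self.rho(pooled))

block = EquivariantBlock()
x = torch.randn(2, 5, 4)
perm = torch.randperm(5)
assert torch.allclose(block(x)[:, perm], block(x[:, perm]))  # equivariance check
```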
Traffic Equilibrium in Mixed-Autonomy Network with Capped Customer Waiting
This paper develops a unified modeling framework to capture the equilibrium-state interactions among ride-hailing companies, travelers, and traffic of mixed-autonomy transportation networks. Our framework integrates four interrelated sub-modules: (i) the operational behavior of representative ride-hailing Mixed-Fleet Traffic Network Companies (MiFleet TNCs) managing autonomous vehicle (AV) and human-driven vehicle (HV) fleets, (ii) traveler mode-choice decisions taking into account travel costs and waiting time, (iii) capped customer waiting times to reflect the option available to travelers not to wait for TNCs' service beyond their patience and to resort to existing travel modes, and (iv) a flow-dependent traffic congestion model for travel times. A key modeling feature distinguishes AVs and HVs across the pickup and service (customer-on-board) stages: AVs follow Wardrop pickup routes but may deviate during service under company coordination, whereas HVs operate in the reverse manner. The overall framework is formulated as a Nonlinear Complementarity Problem (NCP), which is equivalent to a Variational Inequality (VI) formulation, based on which the existence of a variational equilibrium solution to the traffic model is established. Numerical experiments examine how AV penetration and Wardrop relaxation factors, which bound route deviation, affect company, traveler, and system performance to various degrees. The results provide actionable insights for policymakers on regulating AV adoption and company vehicle deviation behavior in modern-day traffic systems that are fast changing due to the advances in technology and information accessibility.
comment: under review for journal
CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound
Combinatorial sequential decision making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch and bound (B&B) algorithms. The inherent difficulty of modeling MILPs that accurately represent stochastic real-world problems leads to suboptimal performance in the real world. Recently, machine learning methods have been applied to build MILP models for decision quality rather than for how accurately they model the real-world problem. However, these approaches typically rely on supervised learning, assume access to true optimal decisions, and use surrogates for the MILP gradients. In this work, we introduce a proof-of-concept CORL framework that fine-tunes an MILP scheme end-to-end using reinforcement learning (RL) on real-world data to maximize its operational performance. We enable this by casting an MILP solved by B&B as a differentiable stochastic policy compatible with RL. We validate the CORL method on a simple illustrative combinatorial sequential decision making example.
Design and Experimental Validation of Closed-Form CBF-Based Safe Control for Stewart Platform Under Multiple Constraints
This letter presents a closed-form solution of the Control Barrier Function (CBF) framework for enforcing safety constraints on a Stewart robotic platform. The proposed method simultaneously handles multiple position and velocity constraints through an explicit closed-form control law, eliminating the need to solve a Quadratic Program (QP) at every control step and enabling efficient real-time implementation. This letter derives necessary and sufficient conditions under which the closed-form expression remains non-singular, thereby ensuring well-posedness of the CBF solution to the multi-constraint problem. The controller is validated in both simulation and hardware experiments on a custom-built Stewart platform prototype, demonstrating safety-guaranteed performance that is comparable to the QP-based formulation, while reducing computation time by more than an order of magnitude. The results confirm that the proposed approach provides a reliable and computationally lightweight framework for real-time safe control of parallel robotic systems. The experimental videos are available on the project website. (https://nail-uh.github.io/StewartPlatformSafeControl.github.io/)
comment: 9 pages
Linear quadratic control for discrete-time systems with stochastic and bounded noises
This paper focuses on the linear quadratic control (LQC) design of systems corrupted by both stochastic noise and bounded noise simultaneously. When only one of these noise types is considered, the LQC strategy leads to stochastic or robust controllers, respectively. However, there is no LQC strategy that can simultaneously handle stochastic and bounded noises efficiently, which limits the scope in which existing LQC strategies can be applied. In this work, we look into the LQC problem for discrete-time systems that have both stochastic and bounded noises in their dynamics. We develop a state estimator for such systems by efficiently combining a Kalman filter and an ellipsoidal set-membership filter. The developed state estimator recovers estimation optimality when the system is subject to both kinds of noise, the stochastic and the bounded. Based on the estimated state, we derive a robust state-feedback optimal control law for the LQC problem. The control law derivation takes into account both stochastic and bounded state-estimation errors, avoiding over-conservativeness while maintaining closed-loop stability. In this way, the developed LQC strategy extends the range of scenarios in which LQC can be applied, especially real-world control systems with diverse sensing that are subject to different kinds of noise. Numerical simulations demonstrate the enhanced control performance of the proposed strategy.
Physics Informed Dynamical Modeling of Extrusion Based 3D Printing Processes
The trade-off between model fidelity and computational cost remains a central challenge in the computational modeling of extrusion-based 3D printing, particularly for real-time optimization and control. Although high-fidelity simulations have advanced considerably for offline analysis, dynamical modeling tailored for online, control-oriented applications is still significantly underdeveloped. In this study, we propose a reduced-order dynamical flow model that captures the transient behavior of extrusion-based 3D printing. The model is grounded in physics-based principles derived from the Navier-Stokes equations and further simplified through spatial averaging and input-dependent parameterization. To assess its performance, the model is identified via a nonlinear least-squares approach using Computational Fluid Dynamics (CFD) simulation data spanning a range of printing conditions and subsequently validated across multiple combinations of training and testing scenarios. The results demonstrate strong agreement with the CFD data within the nozzle, the nozzle-substrate gap, and the deposited-layer regions. Overall, the proposed reduced-order model successfully captures the dominant flow dynamics of the process while maintaining a level of simplicity compatible with real-time control and optimization.
comment: 21 pages, 12 figures
LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding NeurIPS 2025
Designing state encoders for reinforcement learning (RL) with multiple information sources -- such as sensor measurements, time-series signals, image observations, and textual instructions -- remains underexplored and often requires manual design. We formalize this challenge as a problem of composite neural architecture search (NAS), where multiple source-specific modules and a fusion module are jointly optimized. Existing NAS methods overlook useful side information from the intermediate outputs of these modules -- such as their representation quality -- limiting sample efficiency in multi-source RL settings. To address this, we propose an LLM-driven NAS pipeline in which the LLM serves as a neural architecture design agent, leveraging language-model priors and intermediate-output signals to guide sample-efficient search for high-performing composite state encoders. On a mixed-autonomy traffic control task, our approach discovers higher-performing architectures with fewer candidate evaluations than traditional NAS baselines and the LLM-based GENIUS framework.
comment: NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning
Adapting to Change: A Comparison of Continual and Transfer Learning for Modeling Building Thermal Dynamics under Concept Drifts
Transfer Learning (TL) is currently the most effective approach for modeling building thermal dynamics when only limited data are available. TL uses a pretrained model that is fine-tuned to a specific target building. However, it remains unclear how to proceed after initial fine-tuning, as more operational measurement data are collected over time. This challenge becomes even more complex when the dynamics of the building change, for example, after a retrofit or a change in occupancy. In the Machine Learning literature, Continual Learning (CL) methods are used to update models of changing systems. TL approaches can also address this challenge by reusing the pretrained model at each update step and fine-tuning it with new measurement data. A comprehensive study on how to incorporate new measurement data over time to improve prediction accuracy and address the challenges of concept drifts (changes in dynamics) for building thermal dynamics is still missing. Therefore, this study compares several CL and TL strategies, as well as a model trained from scratch, for thermal dynamics modeling during building operation. The methods are evaluated using 5--7 years of simulated data representative of single-family houses in Central Europe, including scenarios with concept drifts from retrofits and changes in occupancy. We propose a CL strategy, Seasonal Memory Learning (SML), that provides greater accuracy improvements than existing CL and TL methods, while maintaining low computational effort. SML outperformed the benchmark of initial fine-tuning by 28.1% without concept drifts and 34.9% with concept drifts.
comment: Currently under review
Impact and Mitigation of Current Saturation Algorithms in Grid-Forming Inverters on Power Swing Detection
Grid-forming (GFM) inverter-based resources (IBRs) are capable of emulating the external characteristics of synchronous generators (SGs) through the careful design of the control loops. However, the current limiter in the control loops of the GFM IBR poses challenges to the effectiveness of power swing detection functions designed for SG-based systems. Among various current limiting strategies, current saturation algorithms (CSAs) are widely employed for their strict current limiting capability, and are the focus of this paper. The paper presents a theoretical analysis of the conditions for entering and exiting the current saturation mode of the GFM IBR under three CSAs. The corresponding impedance trajectories observed by the relay on the GFM IBR side are investigated. The analysis results reveal that the unique impedance trajectories under these CSAs markedly differ from those associated with SGs. Moreover, it is demonstrated that the conventional power swing detection scheme may lose functionality due to the rapid movement of the trajectory. To mitigate this issue, an optimal current saturation strategy is proposed. Conclusions are validated through simulations in MATLAB/Simulink.
Never too Cocky to Cooperate: An FIM and RL-based USV-AUV Collaborative System for Underwater Tasks in Extreme Sea Conditions
This paper develops a novel unmanned surface vehicle (USV)-autonomous underwater vehicle (AUV) collaborative system designed to enhance underwater task performance in extreme sea conditions. The system integrates a dual strategy: (1) high-precision multi-AUV localization enabled by Fisher information matrix-optimized USV path planning, and (2) reinforcement learning-based cooperative planning and control method for multi-AUV task execution. Extensive experimental evaluations in the underwater data collection task demonstrate the system's operational feasibility, with quantitative results showing significant performance improvements over baseline methods. The proposed system exhibits robust coordination capabilities between USV and AUVs while maintaining stability in extreme sea conditions. To facilitate reproducibility and community advancement, we provide an open-source simulation toolkit available at: https://github.com/360ZMEM/USV-AUV-colab .
comment: This paper has been submitted to IEEE Transactions on Mobile Computing, and is currently under minor revision
JITServe: SLO-aware LLM Serving with Imprecise Request Information
The integration of Large Language Models (LLMs) into applications ranging from interactive chatbots to multi-agent systems has introduced a wide spectrum of service-level objectives (SLOs) for responsiveness. These include latency-sensitive requests emphasizing per-token latency in streaming chat, deadline-sensitive requests requiring rapid full responses to trigger external tools, and compound requests with evolving dependencies across multiple LLM calls. Despite, or perhaps because of, this workload diversity and unpredictable request information (e.g., response lengths and dependencies), existing request schedulers have focused on aggregate performance and are unable to ensure application-level SLO needs. This paper presents JITServe, the first SLO-aware LLM serving system designed to maximize service goodput (e.g., the number of tokens meeting request SLOs) across diverse workloads. JITServe schedules requests using imprecise request information and gradually relaxes this conservatism by refining request information estimates as generation progresses. It applies a grouped margin goodput maximization algorithm to allocate just enough serving bandwidth to satisfy each request's SLO just-in-time (JIT), maximizing residual capacity for others, while deciding the composition of requests in a batch to maximize efficiency and goodput with provable guarantees. Our evaluation across diverse realistic workloads, including chat, deep research, and agentic pipelines, shows that JITServe improves service goodput by 1.4x-6.3x, alternatively achieving 28.5%-83.2% resource savings, compared to state-of-the-art designs.
Quantum Encrypted Control of Networked Systems
Encrypted control has been extensively studied to ensure the confidentiality of system states and control inputs for networked control systems. This paper presents a computationally efficient encrypted control framework for networked systems enabled by quantum communication. A quantum channel between sensors and actuators is used to generate identical secret keys, whose security is further enhanced through quantum key distribution. These keys enable lightweight encryption and decryption while preserving confidentiality and control accuracy. We develop a novel encryption-decryption architecture for state-feedback control of linear systems based on quantum keys, and characterize the impact of quantum state errors on closed-loop stability. In particular, we establish the existence of a critical threshold on intrinsic quantum noise below which stability is guaranteed. In contrast to classical encrypted control schemes, which may collapse under a single key-bit error, the proposed quantum encrypted control exhibits strong robustness to key imperfections. We further adopt quantization techniques to address the scenarios with limited communication bits in practical situations, and implement privacy protection for quantum keys based on a stochastic quantizer. These results demonstrate that integrating quantum technologies into control systems in a nontrivial and principled manner, even at their current level of maturity, can yield substantial performance gains in reducing computational complexity and improving resilience to key errors while ensuring security against multiple eavesdropping sources.
comment: 17 pages
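As a rough illustration of the kind of lightweight symmetric scheme such shared quantum keys enable (not the paper's construction), the sketch below masks quantized state measurements with an additive one-time-pad-style key before they cross the network; the quantization range, scaling factor, and feedback gain are illustrative assumptions.

```python
# A minimal sketch, assuming the sensor and actuator already hold identical
# key material (e.g., from QKD); Q, SCALE, and the gain are placeholders.
import numpy as np

Q = 2**16        # quantization range (assumed)
SCALE = 1000.0   # fixed-point scaling (assumed)

def quantize(x):
    return np.mod(np.round(x * SCALE).astype(np.int64), Q)

def dequantize(z):
    z = np.where(z >= Q // 2, z - Q, z)   # map back to the signed range
    return z / SCALE

def encrypt(x, key):
    return np.mod(quantize(x) + key, Q)   # additive masking with the shared key

def decrypt(c, key):
    return dequantize(np.mod(c - key, Q))

rng = np.random.default_rng(0)
x = np.array([0.8, -1.2])                  # state measured at the sensor
key = rng.integers(0, Q, size=x.shape)     # identical key held by both ends
ciphertext = encrypt(x, key)               # what actually crosses the network
u = -np.array([[1.5, 0.7]]) @ decrypt(ciphertext, key)   # state feedback at the actuator
print(u)
```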
Robotics
Closing the Train-Test Gap in World Models for Gradient-Based Planning
World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test-time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.
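The train-test gap described above is easiest to see in the test-time procedure itself: actions, not next states, are the optimization variables. The sketch below shows generic gradient-based planning through a differentiable world model; the toy linear dynamics, horizon, and optimizer settings are illustrative assumptions, not the authors' setup.

```python
# A minimal sketch of gradient-based planning through a differentiable world
# model; the linear toy dynamics and all hyperparameters are assumed.
import torch

def world_model(x, a):
    # stand-in for a learned next-state predictor
    A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
    B = torch.tensor([[0.0], [0.1]])
    return x @ A.T + a @ B.T

def plan(x0, goal, horizon=20, iters=100, lr=0.1):
    actions = torch.zeros(horizon, 1, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)
    for _ in range(iters):
        x, cost = x0, 0.0
        for t in range(horizon):
            x = world_model(x, actions[t:t + 1])   # roll the model forward
            cost = cost + ((x - goal) ** 2).sum()  # distance-to-goal objective
        opt.zero_grad()
        cost.backward()                            # gradients w.r.t. the actions
        opt.step()
    return actions.detach()

x0 = torch.tensor([[0.0, 0.0]])
goal = torch.tensor([[1.0, 0.0]])
print(plan(x0, goal)[:5])
```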
HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models
Vision-Language-Action (VLA) models have recently enabled robotic manipulation by grounding visual and linguistic cues into actions. However, most VLAs assume the Markov property, relying only on the current observation and thus suffering from temporal myopia that degrades long-horizon coherence. In this work, we view motion as a more compact and informative representation of temporal context and world dynamics, capturing inter-state changes while filtering static pixel-level noise. Building on this idea, we propose HiF-VLA (Hindsight, Insight, and Foresight for VLAs), a unified framework that leverages motion for bidirectional temporal reasoning. HiF-VLA encodes past dynamics through hindsight priors, anticipates future motion via foresight reasoning, and integrates both through a hindsight-modulated joint expert to enable a ''think-while-acting'' paradigm for long-horizon manipulation. As a result, HiF-VLA surpasses strong baselines on LIBERO-Long and CALVIN ABC-D benchmarks, while incurring negligible additional inference latency. Furthermore, HiF-VLA achieves substantial improvements in real-world long-horizon manipulation tasks, demonstrating its broad effectiveness in practical robotic settings.
comment: Project page: https://hifvla.github.io Github: https://github.com/OpenHelix-Team/HiF-VLA
Token Expand-Merge: Training-Free Token Compression for Vision-Language-Action Models
Vision-Language-Action (VLA) models pretrained on large-scale multimodal datasets have emerged as powerful foundations for robotic perception and control. However, their massive scale, often billions of parameters, poses significant challenges for real-time deployment, as inference becomes computationally expensive and latency-sensitive in dynamic environments. To address this, we propose Token Expand-and-Merge-VLA (TEAM-VLA), a training-free token compression framework that accelerates VLA inference while preserving task performance. TEAM-VLA introduces a dynamic token expansion mechanism that identifies and samples additional informative tokens in the spatial vicinity of attention-highlighted regions, enhancing contextual completeness. These expanded tokens are then selectively merged in deeper layers under action-aware guidance, effectively reducing redundancy while maintaining semantic coherence. By coupling expansion and merging within a single feed-forward pass, TEAM-VLA achieves a balanced trade-off between efficiency and effectiveness, without any retraining or parameter updates. Extensive experiments on the LIBERO benchmark demonstrate that TEAM-VLA consistently improves inference speed while maintaining or even surpassing the task success rate of full VLA models. The code is publicly available at \href{https://github.com/Jasper-aaa/TEAM-VLA}{https://github.com/Jasper-aaa/TEAM-VLA}
comment: 8 pages, 5 figures
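For intuition, a toy version of training-free token reduction in this spirit might score tokens by attention, expand to spatial neighbours on the token grid, and greedily merge near-duplicates; the thresholds and neighbourhood rule below are assumptions rather than TEAM-VLA's actual mechanism.

```python
# A minimal sketch (not TEAM-VLA itself): keep attention-salient tokens,
# expand to their 4-neighbourhood, then merge near-duplicates by cosine
# similarity. keep and sim_thresh are assumed values.
import numpy as np

def compress_tokens(tokens, attn, grid_w, keep=16, sim_thresh=0.9):
    # tokens: (N, D) visual tokens laid out on a grid of width grid_w
    top = np.argsort(attn)[-keep:]
    expanded = set(int(i) for i in top)
    for i in top:                                   # expansion step
        for j in (i - 1, i + 1, i - grid_w, i + grid_w):
            if 0 <= j < len(tokens):
                expanded.add(int(j))
    idx = sorted(expanded)
    sel = tokens[idx]
    unit = sel / np.linalg.norm(sel, axis=1, keepdims=True)
    merged, used = [], np.zeros(len(sel), dtype=bool)
    for i in range(len(sel)):                       # greedy merging step
        if used[i]:
            continue
        group = [i] + [j for j in range(i + 1, len(sel))
                       if not used[j] and unit[i] @ unit[j] > sim_thresh]
        used[group] = True
        merged.append(tokens[np.array(idx)[group]].mean(axis=0))
    return np.stack(merged)

toks = np.random.randn(64, 8)
print(compress_tokens(toks, np.random.rand(64), grid_w=8).shape)
```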
LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating
Toward human-robot coexistence, socially aware navigation is a key capability for mobile robots. Yet existing studies in this area focus mainly on path efficiency and pedestrian collision avoidance, which are essential but represent only a fraction of social navigation. Beyond these basics, robots must also comply with user instructions, aligning their actions to task goals and social norms expressed by humans. In this work, we present LISN-Bench, the first simulation-based benchmark for language-instructed social navigation. Built on Rosnav-Arena 3.0, it is the first standardized social navigation benchmark to incorporate instruction following and scene understanding across diverse contexts. To address this task, we further propose Social-Nav-Modulator, a fast-slow hierarchical system where a VLM agent modulates costmaps and controller parameters. Decoupling low-level action generation from the slower VLM loop reduces reliance on high-frequency VLM inference while improving dynamic avoidance and perception adaptability. Our method achieves an average success rate of 91.3%, more than 63% higher than that of the most competitive baseline, with most of the improvements observed in challenging tasks such as following a person in a crowd and navigating while strictly avoiding instruction-forbidden regions. The project website is at: https://social-nav.github.io/LISN-project/
comment: 8 pages
Py-DiSMech: A Scalable and Efficient Framework for Discrete Differential Geometry-Based Modeling and Control of Soft Robots
High-fidelity simulation has become essential to the design and control of soft robots, where large geometric deformations and complex contact interactions challenge conventional modeling tools. Recent advances in the field demand simulation frameworks that combine physical accuracy, computational scalability, and seamless integration with modern control and optimization pipelines. In this work, we present Py-DiSMech, a Python-based, open-source simulation framework for modeling and control of soft robotic structures grounded in the principles of Discrete Differential Geometry (DDG). By discretizing geometric quantities such as curvature and strain directly on meshes, Py-DiSMech captures the nonlinear deformation of rods, shells, and hybrid structures with high fidelity and reduced computational cost. The framework introduces (i) a fully vectorized NumPy implementation achieving order-of-magnitude speed-ups over existing geometry-based simulators; (ii) a penalty-energy-based fully implicit contact model that supports rod-rod, rod-shell, and shell-shell interactions; (iii) a natural-strain-based feedback-control module featuring a proportional-integral (PI) controller for shape regulation and trajectory tracking; and (iv) a modular, object-oriented software design enabling user-defined elastic energies, actuation schemes, and integration with machine-learning libraries. Benchmark comparisons demonstrate that Py-DiSMech substantially outperforms the state-of-the-art simulator Elastica in computational efficiency while maintaining physical accuracy. Together, these features establish Py-DiSMech as a scalable, extensible platform for simulation-driven design, control validation, and sim-to-real research in soft robotics.
comment: https://github.com/structuresComp/dismech-python
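As a minimal illustration of strain-based feedback of the kind the control module provides, the sketch below implements a discrete PI law on a natural-strain error; the gains, time step, and scalar signals are assumptions, not Py-DiSMech's interface.

```python
# A minimal sketch, assuming a scalar natural-strain measurement and a scalar
# actuation command; gains and dt are placeholders.
class StrainPI:
    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def update(self, strain_ref, strain_meas):
        err = strain_ref - strain_meas
        self.integral += err * self.dt
        return self.kp * err + self.ki * self.integral   # actuation command

ctrl = StrainPI(kp=2.0, ki=0.5, dt=0.01)
for _ in range(3):
    cmd = ctrl.update(strain_ref=0.05, strain_meas=0.02)
print(cmd)
```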
YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos
Visual navigation has emerged as a practical alternative to traditional robotic navigation pipelines that rely on detailed mapping and path planning. However, constructing and maintaining 3D maps is often computationally expensive and memory-intensive. We address the problem of visual navigation when exploration videos of a large environment are available. The videos serve as a visual reference, allowing a robot to retrace the explored trajectories without relying on metric maps. Our proposed method, YOPO-Nav (You Only Pass Once), encodes an environment into a compact spatial representation composed of interconnected local 3D Gaussian Splatting (3DGS) models. During navigation, the framework aligns the robot's current visual observation with this representation and predicts actions that guide it back toward the demonstrated trajectory. YOPO-Nav employs a hierarchical design: a visual place recognition (VPR) module provides coarse localization, while the local 3DGS models refine the goal and intermediate poses to generate control actions. To evaluate our approach, we introduce the YOPO-Campus dataset, comprising 4 hours of egocentric video and robot controller inputs from over 6 km of human-teleoperated robot trajectories. We benchmark recent visual navigation methods on trajectories from YOPO-Campus using a Clearpath Jackal robot. Experimental results show YOPO-Nav provides excellent performance in image-goal navigation for real-world scenes on a physical robot. The dataset and code will be made publicly available for visual navigation and scene representation research.
Visual Heading Prediction for Autonomous Aerial Vehicles
The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous systems for applications such as search and rescue, environmental monitoring, and logistics. However, precise coordination between these platforms in real-time scenarios presents major challenges, particularly when external localization infrastructure such as GPS or GNSS is unavailable or degraded [1]. This paper proposes a vision-based, data-driven framework for real-time UAV-UGV integration, with a focus on robust UGV detection and heading angle prediction for navigation and coordination. The system employs a fine-tuned YOLOv5 model to detect UGVs and extract bounding box features, which are then used by a lightweight artificial neural network (ANN) to estimate the UAV's required heading angle. A VICON motion capture system was used to generate ground-truth data during training, resulting in a dataset of over 13,000 annotated images collected in a controlled lab environment. The trained ANN achieves a mean absolute error of 0.1506° and a root mean squared error of 0.1957°, offering accurate heading angle predictions using only monocular camera inputs. Experimental evaluations achieve 95% accuracy in UGV detection. This work contributes a vision-based, infrastructure-independent solution that demonstrates strong potential for deployment in GPS/GNSS-denied environments, supporting reliable multi-agent coordination under realistic dynamic conditions. A demonstration video showcasing the system's real-time performance, including UGV detection, heading angle prediction, and UAV alignment under dynamic conditions, is available at: https://github.com/Kooroshraf/UAV-UGV-Integration
Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
Robotic manipulation requires both rich multimodal perception and effective learning frameworks to handle complex real-world tasks. See-through-skin (STS) sensors, which combine tactile and visual perception, offer promising sensing capabilities, while modern imitation learning provides powerful tools for policy acquisition. However, existing STS designs lack simultaneous multimodal perception and suffer from unreliable tactile tracking. Furthermore, integrating these rich multimodal signals into learning-based manipulation pipelines remains an open challenge. We introduce TacThru, an STS sensor enabling simultaneous visual perception and robust tactile signal extraction, and TacThru-UMI, an imitation learning framework that leverages these multimodal signals for manipulation. Our sensor features a fully transparent elastomer, persistent illumination, novel keyline markers, and efficient tracking, while our learning system integrates these signals through a Transformer-based Diffusion Policy. Experiments on five challenging real-world tasks show that TacThru-UMI achieves an average success rate of 85.5%, significantly outperforming the baselines of alternating tactile-visual (66.3%) and vision-only (55.4%). The system excels in critical scenarios, including contact detection with thin and soft objects and precision manipulation requiring multimodal coordination. This work demonstrates that combining simultaneous multimodal perception with modern learning frameworks enables more precise, adaptable robotic manipulation.
Bridging the Basilisk Astrodynamics Framework with ROS 2 for Modular Spacecraft Simulation and Hardware Integration
Integrating high-fidelity spacecraft simulators with modular robotics frameworks remains a challenge for autonomy development. This paper presents a lightweight, open-source communication bridge between the Basilisk astrodynamics simulator and the Robot Operating System 2 (ROS 2), enabling real-time, bidirectional data exchange for spacecraft control. The bridge requires no changes to Basilisk's core and integrates seamlessly with ROS 2 nodes. We demonstrate its use in a leader-follower formation flying scenario using nonlinear model predictive control, deployed identically in both simulation and on the ATMOS planar microgravity testbed. This setup supports rapid development, hardware-in-the-loop testing, and seamless transition from simulation to hardware. The bridge offers a flexible and scalable platform for modular spacecraft autonomy and reproducible research workflows.
comment: Presented at the International Conference on Space Robotics (iSpaRo) 2025. To appear in IEEE Xplore
High-Resolution Water Sampling via a Solar-Powered Autonomous Surface Vehicle
Accurate water quality assessment requires spatially resolved sampling, yet most unmanned surface vehicles (USVs) can collect only a limited number of samples or rely on single-point sensors with poor representativeness. This work presents a solar-powered, fully autonomous USV featuring a novel syringe-based sampling architecture capable of acquiring 72 discrete, contamination-minimized water samples per mission. The vehicle incorporates a ROS 2 autonomy stack with GPS-RTK navigation, LiDAR and stereo-vision obstacle detection, Nav2-based mission planning, and long-range LoRa supervision, enabling dependable execution of sampling routes in unstructured environments. The platform integrates a behavior-tree autonomy architecture adapted from Nav2, enabling mission-level reasoning and perception-aware navigation. A modular 6x12 sampling system, controlled by distributed micro-ROS nodes, provides deterministic actuation, fault isolation, and rapid module replacement, achieving spatial coverage beyond previously reported USV-based samplers. Field trials in Achocalla Lagoon (La Paz, Bolivia) demonstrated 87% waypoint accuracy, stable autonomous navigation, and accurate physicochemical measurements (temperature, pH, conductivity, total dissolved solids) comparable to manually collected references. These results demonstrate that the platform enables reliable high-resolution sampling and autonomous mission execution, providing a scalable solution for aquatic monitoring in remote environments.
ReMoSPLAT: Reactive Mobile Manipulation Control on a Gaussian Splat
Reactive control can gracefully coordinate the motion of the base and the arm of a mobile manipulator. However, incorporating an accurate representation of the environment to avoid obstacles without involving costly planning remains a challenge. In this work, we present ReMoSPLAT, a reactive controller based on a quadratic program formulation for mobile manipulation that leverages a Gaussian Splat representation for collision avoidance. By integrating additional constraints and costs into the optimisation formulation, a mobile manipulator platform can reach its intended end effector pose while avoiding obstacles, even in cluttered scenes. We investigate the trade-offs of two methods for efficiently calculating robot-obstacle distances, comparing a purely geometric approach with a rasterisation-based approach. Our experiments in simulation on both synthetic and real-world scans demonstrate the feasibility of our method, showing that the proposed approach achieves performance comparable to controllers that rely on perfect ground-truth information.
comment: 9 pages, 5 figures
GLaD: Geometric Latent Distillation for Vision-Language-Action Models
Most existing Vision-Language-Action (VLA) models rely primarily on RGB information, while ignoring geometric cues crucial for spatial reasoning and manipulation. In this work, we introduce GLaD, a geometry-aware VLA framework that incorporates 3D geometric priors during pretraining through knowledge distillation. Rather than distilling geometric features solely into the vision encoder, we align the LLM's hidden states corresponding to visual tokens with features from a frozen geometry-aware vision transformer (VGGT), ensuring that geometric understanding is deeply integrated into the multimodal representations that drive action prediction. Pretrained on the Bridge dataset with this geometry distillation mechanism, GLaD achieves 94.1% average success rate across four LIBERO task suites, outperforming UniVLA (92.5%) which uses identical pretraining data. These results validate that geometry-aware pretraining enhances spatial reasoning and policy generalization without requiring explicit depth sensors or 3D annotations.
Super4DR: 4D Radar-centric Self-supervised Odometry and Gaussian-based Map Optimization
Conventional SLAM systems using visual or LiDAR data often struggle in poor lighting and severe weather. Although 4D radar is suited for such environments, its sparse and noisy point clouds hinder accurate odometry estimation, while the radar maps suffer from obscure and incomplete structures. Thus, we propose Super4DR, a 4D radar-centric framework for learning-based odometry estimation and Gaussian-based map optimization. First, we design a cluster-aware odometry network that incorporates object-level cues from the clustered radar points for inter-frame matching, alongside a hierarchical self-supervision mechanism to overcome outliers through spatio-temporal consistency, knowledge transfer, and feature contrast. Second, we propose using 3D Gaussians as an intermediate representation, coupled with a radar-specific growth strategy, selective separation, and multi-view regularization, to recover blurry map areas and those undetected based on image texture. Experiments show that Super4DR achieves a 67% performance gain over prior self-supervised methods, nearly matches supervised odometry, and narrows the map quality disparity with LiDAR while enabling multi-modal image rendering.
comment: 17 pages, 20 figures
UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories AAAI 2026
Navigating complex urban environments using natural language instructions poses significant challenges for embodied agents, including noisy language instructions, ambiguous spatial references, diverse landmarks, and dynamic street scenes. Current visual navigation methods are typically limited to simulated or off-street environments, and often rely on precise goal formats, such as specific coordinates or images. This limits their effectiveness for autonomous agents like last-mile delivery robots navigating unfamiliar cities. To address these limitations, we introduce UrbanNav, a scalable framework that trains embodied agents to follow free-form language instructions in diverse urban settings. Leveraging web-scale city walking videos, we develop a scalable annotation pipeline that aligns human navigation trajectories with language instructions grounded in real-world landmarks. UrbanNav encompasses over 1,500 hours of navigation data and 3 million instruction-trajectory-landmark triplets, capturing a wide range of urban scenarios. Our model learns robust navigation policies to tackle complex urban scenarios, demonstrating superior spatial reasoning, robustness to noisy instructions, and generalization to unseen urban settings. Experimental results show that UrbanNav significantly outperforms existing methods, highlighting the potential of large-scale web video data to enable language-guided, real-world urban navigation for embodied agents.
comment: 9 pages, 5 figures, accepted to AAAI 2026
CS3D: An Efficient Facial Expression Recognition via Event Vision
Responsive and accurate facial expression recognition is crucial to human-robot interaction for daily service robots. Nowadays, event cameras are becoming more widely adopted as they surpass RGB cameras in capturing facial expression changes due to their high temporal resolution, low latency, computational efficiency, and robustness in low-light conditions. Despite these advantages, event-based approaches still encounter practical challenges, particularly in adopting mainstream deep learning models. Traditional deep learning methods for facial expression analysis are energy-intensive, making them difficult to deploy on edge computing devices and thereby increasing costs, especially for high-frequency, dynamic, event vision-based approaches. To address this challenging issue, we propose the CS3D framework, which decomposes the Convolutional 3D method to reduce computational complexity and energy consumption. Additionally, by utilizing soft spiking neurons and a spatial-temporal attention mechanism, the ability to retain information is enhanced, thus improving the accuracy of facial expression detection. Experimental results indicate that our proposed CS3D method attains higher accuracy on multiple datasets compared to architectures such as RNNs, Transformers, and C3D, while the energy consumption of the CS3D method is just 21.97\% of that required by the original C3D on the same device.
Mastering Diverse, Unknown, and Cluttered Tracks for Robust Vision-Based Drone Racing
Most reinforcement learning (RL)-based methods for drone racing target fixed, obstacle-free tracks, leaving the generalization to unknown, cluttered environments largely unaddressed. This challenge stems from the need to balance racing speed and collision avoidance, limited feasible space that traps policy exploration in local optima during training, and perceptual ambiguity between gates and obstacles in depth maps, especially when gate positions are only coarsely specified. To overcome these issues, we propose a two-phase learning framework: an initial soft-collision training phase that preserves policy exploration for high-speed flight, followed by a hard-collision refinement phase that enforces robust obstacle avoidance. An adaptive, noise-augmented curriculum with an asymmetric actor-critic architecture gradually shifts the policy's reliance from privileged gate-state information to depth-based visual input. We further impose Lipschitz constraints and integrate a track-primitive generator to enhance motion stability and cross-environment generalization. We evaluate our framework through extensive simulation and ablation studies, and validate it in real-world experiments on a computationally constrained quadrotor. The system achieves agile flight while remaining robust to gate-position errors, developing a generalizable drone racing framework with the capability to operate in diverse, partially unknown and cluttered environments. https://yufengsjtu.github.io/MasterRacing.github.io/
comment: 8 pages, 9 figures, Robotics and Automation Letters accept
REASAN: Learning Reactive Safe Navigation for Legged Robots
We present a novel modularized end-to-end framework for legged reactive navigation in complex dynamic environments using a single light detection and ranging (LiDAR) sensor. The system comprises four simulation-trained modules: three reinforcement-learning (RL) policies for locomotion, safety shielding, and navigation, and a transformer-based exteroceptive estimator that processes raw point-cloud inputs. This modular decomposition of complex legged motor-control tasks enables lightweight neural networks with simple architectures, trained using standard RL practices with targeted reward shaping and curriculum design, without reliance on heuristics or sophisticated policy-switching mechanisms. We conduct comprehensive ablations to validate our design choices and demonstrate improved robustness compared to existing approaches in challenging navigation tasks. The resulting reactive safe navigation (REASAN) system achieves fully onboard and real-time reactive navigation across both single- and multi-robot settings in complex environments. We release our training and deployment code at https://github.com/ASIG-X/REASAN.
comment: 8 pages
ViTA-Seg: Vision Transformer for Amodal Segmentation in Robotics
Occlusions in robotic bin picking compromise accurate and reliable grasp planning. We present ViTA-Seg, a class-agnostic Vision Transformer framework for real-time amodal segmentation that leverages global attention to recover complete object masks, including hidden regions. We propose two architectures: a) Single-Head for amodal mask prediction; b) Dual-Head for amodal and occluded mask prediction. We also introduce ViTA-SimData, a photo-realistic synthetic dataset tailored to industrial bin-picking scenarios. Extensive experiments on two amodal benchmarks, COOCA and KINS, demonstrate that ViTA-Seg Dual-Head achieves strong amodal and occlusion segmentation accuracy with computational efficiency, enabling robust, real-time robotic manipulation.
On Mobile Ad Hoc Networks for Coverage of Partially Observable Worlds
This paper addresses the movement and placement of mobile agents to establish a communication network in initially unknown environments. We cast the problem in a computational-geometric framework by relating the coverage problem and line-of-sight constraints to the Cooperative Guard Art Gallery Problem, and introduce its partially observable variant, the Partially Observable Cooperative Guard Art Gallery Problem (POCGAGP). We then present two algorithms that solve POCGAGP: CADENCE, a centralized planner that incrementally selects 270 degree corners at which to deploy agents, and DADENCE, a decentralized scheme that coordinates agents using local information and lightweight messaging. Both approaches operate under partial observability and target simultaneous coverage and connectivity. We evaluate the methods in simulation across 1,500 test cases of varied size and structure, demonstrating consistent success in forming connected networks while covering and exploring unknown space. These results highlight the value of geometric abstractions for communication-driven exploration and show that decentralized policies are competitive with centralized performance while retaining scalability.
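Since both planners deploy agents at 270-degree (reflex) corners, a small geometric helper clarifies the candidate-site step; the sketch below assumes a simple polygon with counter-clockwise vertices and is not the paper's implementation.

```python
# A minimal sketch, assuming a simple polygon given as counter-clockwise
# vertices: reflex (270-degree) corners are identified from the turn direction.
import numpy as np

def reflex_corners(vertices):
    v = np.asarray(vertices, dtype=float)
    n = len(v)
    corners = []
    for i in range(n):
        a, b, c = v[i - 1], v[i], v[(i + 1) % n]
        cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
        if cross < 0:          # right turn in a CCW polygon => reflex corner
            corners.append(i)
    return corners

poly = [(0, 0), (4, 0), (4, 4), (2, 4), (2, 2), (0, 2)]   # L-shaped room
print(reflex_corners(poly))    # -> [4], the inner corner at (2, 2)
```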
Development of a Compliant Gripper for Safe Robot-Assisted Trouser Dressing-Undressing
In recent years, many countries, including Japan, have rapidly aging populations, making the preservation of seniors' quality of life a significant concern. For elderly people with impaired physical abilities, support for toileting is one of the most important issues. This paper details the design, development, experimental assessment, and potential application of the gripper system, with a focus on the unique requirements and obstacles involved in aiding elderly or hemiplegic individuals in dressing and undressing trousers. The gripper we propose seeks to find the right balance between compliance and grasping forces, ensuring precise manipulation while maintaining a safe and compliant interaction with the users. The gripper's integration into a custom-built robotic manipulator system provides a comprehensive solution for assisting hemiplegic individuals in their dressing and undressing tasks. Experimental evaluations and comparisons with existing studies demonstrate the gripper's ability to successfully assist in both dressing and undressing of trousers in confined spaces with a high success rate. This research contributes to the advancement of assistive robotics, empowering elderly and physically impaired individuals to maintain their independence and improve their quality of life.
Sequential Testing for Descriptor-Agnostic LiDAR Loop Closure in Repetitive Environments
We propose a descriptor-agnostic, multi-frame loop closure verification method that formulates LiDAR loop closure as a truncated Sequential Probability Ratio Test (SPRT). Instead of deciding from a single descriptor comparison or using fixed thresholds with late-stage Iterative Closest Point (ICP) vetting, the verifier accumulates a short temporal stream of descriptor similarities between a query and each candidate. It then issues an accept/reject decision adaptively once sufficient multi-frame evidence has been observed, according to user-specified Type-I/II error design targets. This precision-first policy is designed to suppress false positives in structurally repetitive indoor environments. We evaluate the verifier on a five-sequence library dataset, using a fixed retrieval front-end with several representative LiDAR global descriptors. Performance is assessed via segment-level K-hit precision-recall and absolute trajectory error (ATE) and relative pose error (RPE) after pose graph optimization. Across descriptors, the sequential verifier consistently improves precision and reduces the impact of aliased loops compared with single-frame and heuristic multi-frame baselines. Our implementation and dataset will be released at: https://github.com/wanderingcar/snu_library_dataset.
comment: 8 pages, 4 figures
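To make the verification rule concrete, the sketch below runs a truncated SPRT over a stream of descriptor similarities, with accept/reject thresholds derived from the design Type-I/II error targets; the Gaussian similarity likelihoods and all numeric parameters are illustrative assumptions, not the paper's models.

```python
# A minimal sketch of a truncated sequential probability ratio test over
# per-frame descriptor similarities; likelihoods and parameters are assumed.
import numpy as np
from scipy.stats import norm

def sprt_verify(similarities, alpha=0.01, beta=0.05, max_frames=10,
                mu_true=0.8, mu_false=0.4, sigma=0.15):
    upper = np.log((1 - beta) / alpha)    # accept threshold from error targets
    lower = np.log(beta / (1 - alpha))    # reject threshold
    llr, k = 0.0, 0
    for k, s in enumerate(similarities[:max_frames], start=1):
        llr += norm.logpdf(s, mu_true, sigma) - norm.logpdf(s, mu_false, sigma)
        if llr >= upper:
            return "accept", k            # enough evidence for a true loop
        if llr <= lower:
            return "reject", k
    return "reject", k                    # truncation: precision-first fallback

print(sprt_verify([0.75, 0.82, 0.78, 0.80]))
```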
A Hierarchical, Model-Based System for High-Performance Humanoid Soccer
The development of athletic humanoid robots has gained significant attention as advances in actuation, sensing, and control enable increasingly dynamic, real-world capabilities. RoboCup, an international competition of fully autonomous humanoid robots, provides a uniquely challenging benchmark for such systems, culminating in the long-term goal of competing against human soccer players by 2050. This paper presents the hardware and software innovations underlying our team's victory in the RoboCup 2024 Adult-Sized Humanoid Soccer Competition. On the hardware side, we introduce an adult-sized humanoid platform built with lightweight structural components, high-torque quasi-direct-drive actuators, and a specialized foot design that enables powerful in-gait kicks while preserving locomotion robustness. On the software side, we develop an integrated perception and localization framework that combines stereo vision, object detection, and landmark-based fusion to provide reliable estimates of the ball, goals, teammates, and opponents. A mid-level navigation stack then generates collision-aware, dynamically feasible trajectories, while a centralized behavior manager coordinates high-level decision making, role selection, and kick execution based on the evolving game state. The seamless integration of these subsystems results in fast, precise, and tactically effective gameplay, enabling robust performance under the dynamic and adversarial conditions of real matches. This paper presents the design principles, system architecture, and experimental results that contributed to ARTEMIS's success as the 2024 Adult-Sized Humanoid Soccer champion.
D$^2$GSLAM: 4D Dynamic Gaussian Splatting SLAM
Recent advances in Dense Simultaneous Localization and Mapping (SLAM) have demonstrated remarkable performance in static environments. However, dense SLAM in dynamic environments remains challenging. Most methods directly remove dynamic objects and focus solely on static scene reconstruction, which ignores the motion information contained in these dynamic objects. In this paper, we present D$^2$GSLAM, a novel dynamic SLAM system utilizing Gaussian representation, which simultaneously performs accurate dynamic reconstruction and robust tracking within dynamic environments. Our system is composed of four key components: (i) We propose a geometric-prompt dynamic separation method to distinguish between static and dynamic elements of the scene. This approach leverages the geometric consistency of Gaussian representation and scene geometry to obtain coarse dynamic regions. The regions then serve as prompts to guide the refinement of the coarse mask for achieving accurate motion mask. (ii) To facilitate accurate and efficient mapping of the dynamic scene, we introduce dynamic-static composite representation that integrates static 3D Gaussians with dynamic 4D Gaussians. This representation allows for modeling the transitions between static and dynamic states of objects in the scene for composite mapping and optimization. (iii) We employ a progressive pose refinement strategy that leverages both the multi-view consistency of static scene geometry and motion information from dynamic objects to achieve accurate camera tracking. (iv) We introduce a motion consistency loss, which leverages the temporal continuity in object motions for accurate dynamic modeling. Our D$^2$GSLAM demonstrates superior performance on dynamic scenes in terms of mapping and tracking accuracy, while also showing capability in accurate dynamic modeling.
Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation
Collaborative pursuit-evasion in cluttered environments presents significant challenges due to sparse rewards and constrained Fields of View (FOV). Standard Multi-Agent Reinforcement Learning (MARL) often suffers from inefficient exploration and fails to scale to large scenarios. We propose PGF-MAPPO (Path-Guided Frontier MAPPO), a hierarchical framework bridging topological planning with reactive control. To resolve local minima and sparse rewards, we integrate an A*-based potential field for dense reward shaping. Furthermore, we introduce Directional Frontier Allocation, combining Farthest Point Sampling (FPS) with geometric angle suppression to enforce spatial dispersion and accelerate coverage. The architecture employs a parameter-shared decentralized critic, maintaining O(1) model complexity suitable for robotic swarms. Experiments demonstrate that PGF-MAPPO achieves superior capture efficiency against faster evaders. Policies trained on 10x10 maps exhibit robust zero-shot generalization to unseen 20x20 environments, significantly outperforming rule-based and learning-based baselines.
comment: 7 pages, 7 figures
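A rough sketch of the allocation idea, combining farthest point sampling with angular suppression around the pursuers' centroid, is given below; the oversampling factor and 30-degree threshold are assumptions, not the paper's settings.

```python
# A minimal sketch of frontier allocation: farthest point sampling for spatial
# dispersion, then angular suppression relative to a reference centroid.
import numpy as np

def farthest_point_sampling(points, k):
    chosen = [0]
    d = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        i = int(np.argmax(d))
        chosen.append(i)
        d = np.minimum(d, np.linalg.norm(points - points[i], axis=1))
    return chosen

def allocate_frontiers(frontiers, centroid, k, min_angle_deg=30.0):
    idx = farthest_point_sampling(frontiers, min(3 * k, len(frontiers)))
    kept, angles = [], []
    for i in idx:
        d = frontiers[i] - centroid
        ang = np.degrees(np.arctan2(d[1], d[0]))
        # keep a frontier only if it is angularly separated from those kept
        if all(abs((ang - a + 180) % 360 - 180) >= min_angle_deg for a in angles):
            kept.append(i)
            angles.append(ang)
        if len(kept) == k:
            break
    return frontiers[kept]

f = np.random.rand(50, 2) * 20.0
print(allocate_frontiers(f, centroid=np.array([10.0, 10.0]), k=4))
```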
H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos
Robots that learn manipulation skills from everyday human videos could acquire broad capabilities without tedious robot data collection. We propose a video-to-video translation framework that converts ordinary human-object interaction videos into motion-consistent robot manipulation videos with realistic, physically grounded interactions. Our approach does not require any paired human-robot videos for training, only a set of unpaired robot videos, making the system easy to scale. We introduce a transferable representation that bridges the embodiment gap: by inpainting the robot arm in training videos to obtain a clean background and overlaying a simple visual cue (a marker and arrow indicating the gripper's position and orientation), we can condition a generative model to insert the robot arm back into the scene. At test time, we apply the same process to human videos (inpainting the person and overlaying human pose cues) and generate high-quality robot videos that mimic the human's actions. We fine-tune a SOTA video diffusion model (Wan 2.2) in an in-context learning manner to ensure temporal coherence and leverage its rich prior knowledge. Empirical results demonstrate that our approach achieves significantly more realistic and grounded robot motions compared to baselines, pointing to a promising direction for scaling up robot learning from unlabeled human videos. Project page: https://showlab.github.io/H2R-Grounder/
comment: 13 pages, 6 figures
Observability Analysis and Composite Disturbance Filtering for a Bar Tethered to Dual UAVs Subject to Multi-source Disturbances
Cooperative suspended aerial transportation is highly susceptible to multi-source disturbances such as aerodynamic effects and thrust uncertainties. To achieve precise load manipulation, existing methods often rely on extra sensors to measure cable directions or the payload's pose, which increases the system cost and complexity. A fundamental question remains: is the payload's pose observable under multi-source disturbances using only the drones' odometry information? To answer this question, this work focuses on the two-drone-bar system and proves that the whole system is observable when only two or fewer types of lumped disturbances exist by using the observability rank criterion. To the best of our knowledge, we are the first to present such a conclusion and this result paves the way for more cost-effective and robust systems by minimizing their sensor suites. Next, to validate this analysis, we consider the situation where the disturbances are only exerted on the drones, and develop a composite disturbance filtering scheme. A disturbance observer-based error-state extended Kalman filter is designed for both state and disturbance estimation, which renders improved estimation performance for the whole system evolving on the manifold $(\mathbb{R}^3)^2\times(TS^2)^3$. Our simulation and experimental tests have validated that it is possible to fully estimate the state and disturbance of the system with only odometry information of the drones.
COVLM-RL: Critical Object-Oriented Reasoning for Autonomous Driving Using VLM-Guided Reinforcement Learning
End-to-end autonomous driving frameworks face persistent challenges in generalization, training efficiency, and interpretability. While recent methods leverage Vision-Language Models (VLMs) through supervised learning on large-scale datasets to improve reasoning, they often lack robustness in novel scenarios. Conversely, reinforcement learning (RL)-based approaches enhance adaptability but remain data-inefficient and lack transparent decision-making. To address these limitations, we propose COVLM-RL, a novel end-to-end driving framework that integrates Critical Object-oriented (CO) reasoning with VLM-guided RL. Specifically, we design a Chain-of-Thought (CoT) prompting strategy that enables the VLM to reason over critical traffic elements and generate high-level semantic decisions, effectively transforming multi-view visual inputs into structured semantic decision priors. These priors reduce the input dimensionality and inject task-relevant knowledge into the RL loop, accelerating training and improving policy interpretability. However, bridging high-level semantic guidance with continuous low-level control remains non-trivial. To this end, we introduce a consistency loss that encourages alignment between the VLM's semantic plans and the RL agent's control outputs, enhancing interpretability and training stability. Experiments conducted in the CARLA simulator demonstrate that COVLM-RL significantly improves the success rate by 30\% in trained driving environments and by 50\% in previously unseen environments, highlighting its strong generalization capability.
comment: 8 pages, 5 figures
Development and Testing for Perception Based Autonomous Landing of a Long-Range QuadPlane
QuadPlanes combine the range efficiency of fixed-wing aircraft with the maneuverability of multi-rotor platforms for long-range autonomous missions. In GPS-denied or cluttered urban environments, perception-based landing is vital for reliable operation. Unlike structured landing zones, real-world sites are unstructured and highly variable, requiring strong generalization capabilities from the perception system. Deep neural networks (DNNs) provide a scalable solution for learning landing site features across diverse visual and environmental conditions. While perception-driven landing has been shown in simulation, real-world deployment introduces significant challenges. Payload and volume constraints limit high-performance edge AI devices like the NVIDIA Jetson Orin Nano, which are crucial for real-time detection and control. Accurate pose estimation during descent is necessary, especially in the absence of GPS, and relies on dependable visual-inertial odometry. Achieving this with limited edge AI resources requires careful optimization of the entire deployment framework. The flight characteristics of large QuadPlanes further complicate the problem: high inertia, reduced thrust vectoring, and slow response times hinder stable landing maneuvers. This work presents a lightweight QuadPlane system for efficient vision-based autonomous landing and visual-inertial odometry, specifically developed for long-range QuadPlane operations such as aerial monitoring. It describes the hardware platform, sensor configuration, and embedded computing architecture designed to meet demanding real-time, physical constraints. This establishes a foundation for deploying autonomous landing in dynamic, unstructured, GPS-denied environments.
Scene-agnostic Hierarchical Bimanual Task Planning via Visual Affordance Reasoning
Embodied agents operating in open environments must translate high-level instructions into grounded, executable behaviors, often requiring coordinated use of both hands. While recent foundation models offer strong semantic reasoning, existing robotic task planners remain predominantly unimanual and fail to address the spatial, geometric, and coordination challenges inherent to bimanual manipulation in scene-agnostic settings. We present a unified framework for scene-agnostic bimanual task planning that bridges high-level reasoning with 3D-grounded two-handed execution. Our approach integrates three key modules. Visual Point Grounding (VPG) analyzes a single scene image to detect relevant objects and generate world-aligned interaction points. Bimanual Subgoal Planner (BSP) reasons over spatial adjacency and cross-object accessibility to produce compact, motion-neutralized subgoals that exploit opportunities for coordinated two-handed actions. Interaction-Point-Driven Bimanual Prompting (IPBP) binds these subgoals to a structured skill library, instantiating synchronized unimanual or bimanual action sequences that satisfy hand-state and affordance constraints. Together, these modules enable agents to plan semantically meaningful, physically feasible, and parallelizable two-handed behaviors in cluttered, previously unseen scenes. Experiments show that it produces coherent, feasible, and compact two-handed plans, and generalizes to cluttered scenes without retraining, demonstrating robust scene-agnostic affordance reasoning for bimanual tasks.
comment: 8 pages, 4 figures
One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation
Learning dexterous bimanual manipulation policies critically depends on large-scale, high-quality demonstrations, yet current paradigms face inherent trade-offs: teleoperation provides physically grounded data but is prohibitively labor-intensive, while simulation-based synthesis scales efficiently but suffers from sim-to-real gaps. We present BiDemoSyn, a framework that synthesizes contact-rich, physically feasible bimanual demonstrations from a single real-world example. The key idea is to decompose tasks into invariant coordination blocks and variable, object-dependent adjustments, then adapt them through vision-guided alignment and lightweight trajectory optimization. This enables the generation of thousands of diverse and feasible demonstrations within several hours, without repeated teleoperation or reliance on imperfect simulation. Across six dual-arm tasks, we show that policies trained on BiDemoSyn data generalize robustly to novel object poses and shapes, significantly outperforming recent baselines. By bridging the gap between efficiency and real-world fidelity, BiDemoSyn provides a scalable path toward practical imitation learning for complex bimanual manipulation without compromising physical grounding.
comment: under review
UPETrack: Unidirectional Position Estimation for Tracking Occluded Deformable Linear Objects
Real-time state tracking of Deformable Linear Objects (DLOs) is critical for enabling robotic manipulation of DLOs in industrial assembly, medical procedures, and daily-life applications. However, the high-dimensional configuration space, nonlinear dynamics, and frequent partial occlusions present fundamental barriers to robust real-time DLO tracking. To address these limitations, this study introduces UPETrack, a geometry-driven framework based on Unidirectional Position Estimation (UPE), which facilitates tracking without the requirement for physical modeling, virtual simulation, or visual markers. The framework operates in two phases: (1) visible segment tracking based on a Gaussian Mixture Model (GMM) fitted via the Expectation Maximization (EM) algorithm, and (2) occlusion region prediction employing the proposed UPE algorithm. UPE leverages the geometric continuity inherent in DLO shapes and their temporal evolution patterns to derive a closed-form positional estimator through three principal mechanisms: (i) a local linear combination displacement term, (ii) a proximal linear constraint term, and (iii) a historical curvature term. This analytical formulation allows efficient and stable estimation of occluded nodes through explicit linear combinations of geometric components, eliminating the need for additional iterative optimization. Experimental results demonstrate that UPETrack surpasses two state-of-the-art tracking algorithms, namely TrackDLO and CDCPD2, in both positioning accuracy and computational efficiency.
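For the visible-segment phase, a minimal GMM-EM step can be sketched with an off-the-shelf mixture model, warm-started from the previous node positions; the component count, covariance type, and synthetic point cloud below are assumptions, not the paper's exact formulation.

```python
# A minimal sketch: fit a Gaussian mixture to the visible DLO point cloud,
# initialised from the previous node positions, and take the component means
# as updated node estimates.
import numpy as np
from sklearn.mixture import GaussianMixture

def track_visible(points, prev_nodes):
    gmm = GaussianMixture(n_components=len(prev_nodes),
                          covariance_type="spherical",
                          means_init=prev_nodes, max_iter=50)
    gmm.fit(points)
    return gmm.means_                     # updated node position estimates

prev = np.stack([np.linspace(0.0, 1.0, 10), np.zeros(10), np.zeros(10)], axis=1)
cloud = (prev[:, None, :] + 0.01 * np.random.randn(10, 30, 3)).reshape(-1, 3)
print(track_visible(cloud, prev).shape)   # -> (10, 3)
```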
MPC for momentum counter-balanced and zero-impulse contact with a free-spinning satellite
In on-orbit robotics, a servicer satellite's ability to make contact with a free-spinning target satellite is essential to completing most on-orbit servicing (OOS) tasks. This manuscript develops a nonlinear model predictive control (MPC) framework that generates feasible controls for a servicer satellite to achieve zero-impulse contact with a free-spinning target satellite. The overall maneuver requires coordination between two separately actuated modules of the servicer satellite: (1) a moment generation module and (2) a manipulation module. We apply MPC to control both modules by explicitly modeling the cross-coupling dynamics between them. We demonstrate that the MPC controller can enforce actuation and state constraints that prior control approaches could not account for. We evaluate the performance of the MPC controller by simulating zero-impulse contact scenarios with a free-spinning target satellite via numerical Monte Carlo (MC) trials and comparing the simulation results with prior control approaches. Our simulation results validate the effectiveness of the MPC controller in maintaining spin synchronization and zero-impulse contact under operation constraints, moving contact location, and observation and actuation noise.
comment: 21 pages, 4 figures, 5 tables, submission for AIAA SciTech 2026 conference
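As a generic illustration of the receding-horizon structure only (not the coupled servicer dynamics, cross-coupling model, or zero-impulse contact constraints of the paper), the sketch below solves a small MPC problem on a double integrator and applies only the first control of each solution.

```python
# A minimal sketch of a receding-horizon loop on a toy double integrator; all
# dynamics, weights, and bounds are assumed placeholders.
import numpy as np
from scipy.optimize import minimize

dt, H = 0.1, 15
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.5 * dt**2, dt])

def rollout_cost(u, x0, x_ref):
    x, cost = x0.copy(), 0.0
    for uk in u:
        x = A @ x + B * uk
        cost += np.sum((x - x_ref) ** 2) + 0.01 * uk**2   # tracking + effort
    return cost

def mpc_step(x0, x_ref, u_max=1.0):
    res = minimize(rollout_cost, np.zeros(H), args=(x0, x_ref),
                   bounds=[(-u_max, u_max)] * H)           # actuation limits
    return res.x[0]                                        # apply first control only

x = np.array([0.0, 0.0])
for _ in range(5):
    u = mpc_step(x, x_ref=np.array([1.0, 0.0]))
    x = A @ x + B * u
print(x)
```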
Inertial Magnetic SLAM Systems Using Low-Cost Sensors
Spatially inhomogeneous magnetic fields offer a valuable, non-visual information source for positioning. Among systems leveraging this, magnetic field-based simultaneous localization and mapping (SLAM) systems are particularly attractive because they can provide positioning information and build a magnetic field map on the fly. Moreover, they have bounded error within mapped regions. However, state-of-the-art methods typically require low-drift odometry data provided by sources such as visual odometry or wheel encoders. This is because these systems need to minimize positioning errors while exploring, which happens when they are in unmapped regions. To address these limitations, this work proposes a loosely coupled and a tightly coupled inertial magnetic SLAM (IM-SLAM) system. The proposed systems use commonly available low-cost sensors: an inertial measurement unit (IMU), a magnetometer array, and a barometer. The use of non-visual data provides a significant advantage over visual-based systems, making it robust to low-visibility conditions. Both systems employ state-space representations and magnetic field models on different scales. The difference lies in how they use a local and global magnetic field model. The loosely coupled system uses these models separately in two state-space models, while the tightly coupled system integrates them into one state-space model. Experiment results show that the tightly coupled IM-SLAM system achieves lower positioning errors than the loosely coupled system in most scenarios, with typical errors on the order of meters per 100 meters traveled. These results demonstrate the feasibility of developing full 3D IM-SLAM systems using low-cost sensors and the potential of applying these systems in emergency response scenarios such as mine/fire rescue.
CHyLL: Learning Continuous Neural Representations of Hybrid Systems
Learning the flows of hybrid systems that have both continuous and discrete time dynamics is challenging. Existing methods learn the dynamics of each discrete mode separately, which suffers from the combination of mode switching and discontinuities in the flows. In this work, we propose CHyLL (Continuous Hybrid System Learning in Latent Space), which learns a continuous neural representation of a hybrid system without trajectory segmentation, event functions, or mode switching. The key insight of CHyLL is that the reset map glues the state space at the guard surface, reformulating the state space as a piecewise smooth quotient manifold where the flow becomes spatially continuous. Building upon these insights and the embedding theorems grounded in differential topology, CHyLL concurrently learns a singularity-free neural embedding in a higher-dimensional space and the continuous flow in it. We showcase that CHyLL can predict the flow of hybrid systems with superior accuracy and identify the topological invariants of the hybrid systems. Finally, we apply CHyLL to the stochastic optimal control problem.
Fast Functionally Redundant Inverse Kinematics for Robotic Toolpath Optimisation in Manufacturing Tasks
Industrial automation with six-axis robotic arms is critical for many manufacturing tasks, including welding and additive manufacturing applications; however, many of these operations are functionally redundant due to the symmetrical tool axis, which effectively makes the operation a five-axis task. Exploiting this redundancy is crucial for achieving the desired workspace and dexterity required for the feasibility and optimisation of toolpath planning. Inverse kinematics algorithms can solve this in a fast, reactive framework, but these techniques are underutilised over the more computationally expensive offline planning methods. We propose a novel algorithm to solve functionally redundant inverse kinematics for robotic manipulation utilising a task space decomposition approach, the damped least-squares method and Halley's method to achieve fast and robust solutions with reduced joint motion. We evaluate our methodology in the case of toolpath optimisation in a cold spray coating application on a non-planar surface. The functionally redundant inverse kinematics algorithm can quickly solve motion plans that minimise joint motion, expanding the feasible operating space of the complex toolpath. We validate our approach on an industrial ABB manipulator and cold-spray gun executing the computed toolpath.
comment: Published at the Australasian Conference on Robotics and Automation (ACRA 2025) https://ssl.linklings.net/conferences/acra/acra2025_proceedings/views/includes/files/pap149s2.pdf
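The core of such a functionally redundant solver can be sketched as a damped least-squares step in which the rotational error about the symmetric tool axis is projected out of the task, leaving a five-axis problem; the Jacobian, damping factor, and error vector below are placeholders, not the paper's implementation (which additionally uses task space decomposition and Halley's method).

```python
# A minimal sketch of one damped least-squares step with the tool-axis
# rotation removed from the task; all inputs are stand-ins.
import numpy as np

def dls_step(J, err, tool_axis, damping=0.05):
    S = np.eye(6)
    S[3:, 3:] -= np.outer(tool_axis, tool_axis)   # drop rotation about tool axis
    J5, e5 = S @ J, S @ err
    JJt = J5 @ J5.T + (damping ** 2) * np.eye(6)  # damped normal equations
    return J5.T @ np.linalg.solve(JJt, e5)        # joint increment

J = np.random.randn(6, 6)                         # stand-in geometric Jacobian
err = np.array([0.01, 0.0, 0.02, 0.0, 0.05, 0.0]) # [position; orientation] error
print(dls_step(J, err, tool_axis=np.array([0.0, 0.0, 1.0])))
```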
Push Smarter, Not Harder: Hierarchical RL-Diffusion Policy for Efficient Nonprehensile Manipulation
Nonprehensile manipulation, such as pushing objects across cluttered environments, presents a challenging control problem due to complex contact dynamics and long-horizon planning requirements. In this work, we propose HeRD, a hierarchical reinforcement learning-diffusion policy that decomposes pushing tasks into two levels: high-level goal selection and low-level trajectory generation. We employ a high-level reinforcement learning (RL) agent to select intermediate spatial goals, and a low-level goal-conditioned diffusion model to generate feasible, efficient trajectories to reach them. This architecture combines the long-term reward maximizing behaviour of RL with the generative capabilities of diffusion models. We evaluate our method in a 2D simulation environment and show that it outperforms the state-of-the-art baseline in success rate, path efficiency, and generalization across multiple environment configurations. Our results suggest that hierarchical control with generative low-level planning is a promising direction for scalable, goal-directed nonprehensile manipulation. Code, documentation, and trained models are available: https://github.com/carosteven/HeRD.
comment: 8 pages, 8 figures
Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge
The 2025 BEHAVIOR Challenge is designed to rigorously track progress toward solving long-horizon tasks by physical agents in simulated environments. BEHAVIOR-1K focuses on everyday household tasks that people most want robots to assist with, and these tasks introduce long-horizon mobile manipulation challenges in realistic settings, bridging the gap between current research and real-world, human-centric applications. This report presents our solution, which finished in a very close 2nd place in the 2025 BEHAVIOR Challenge and substantially outperformed the rest of the submissions. Building on $π_{0.5}$, we focus on systematically building our solution by studying the effects of training techniques and data. Through careful ablations, we show the scaling power of the pre-training and post-training phases for competitive performance. We summarize our practical lessons and design recommendations that we hope will provide actionable insights for the broader embodied AI community when adapting powerful foundation models to complex embodied scenarios.
comment: preprint
C*: A Coverage Path Planning Algorithm for Unknown Environments using Rapidly Covering Graphs
The paper presents a novel sample-based algorithm, called C*, for real-time coverage path planning (CPP) of unknown environments. C* is built upon the concept of a Rapidly Covering Graph (RCG), which is incrementally constructed during robot navigation via progressive sampling of the search space. By using efficient sampling and pruning techniques, the RCG is constructed to be a minimum-sufficient graph, where its nodes and edges form the potential waypoints and segments of the coverage trajectory, respectively. The RCG tracks the coverage progress, generates the coverage trajectory and helps the robot to escape from the dead-end situations. To minimize coverage time, C* produces the desired back-and-forth coverage pattern, while adapting to the TSP-based optimal coverage of local isolated regions, called coverage holes, which are surrounded by obstacles and covered regions. It is analytically proven that C* provides complete coverage of unknown environments. The algorithmic simplicity and low computational complexity of C* make it easy to implement and suitable for real-time on-board applications. The performance of C* is validated by 1) extensive high-fidelity simulations and 2) laboratory experiments using an autonomous robot. C* yields near optimal trajectories, and a comparative evaluation with seven existing CPP methods demonstrates significant improvements in performance in terms of coverage time, number of turns, trajectory length, and overlap ratio, while preventing the formation of coverage holes. Finally, C* is comparatively evaluated on two different CPP applications using 1) energy-constrained robots and 2) multi-robot teams.
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
comment: This work has been submitted to the IEEE for possible publication
Model-free Adaptive Output Feedback Vibration Suppression in a Cantilever Beam
This paper presents a model-free adaptive control approach to suppress vibrations in a cantilevered beam excited by an unknown disturbance. The cantilevered beam under harmonic excitation is modeled using a lumped parameter approach. Based on retrospective cost optimization, a sampled-data adaptive controller is developed to suppress vibrations caused by external disturbances. Both displacement and acceleration measurements are considered for feedback. Since acceleration measurements are more sensitive to spillover, which excites higher frequency modes, a filter is developed to extract key displacement information from the acceleration data and enhance suppression performance. The vibration suppression performance is compared using both displacement and acceleration measurements.
comment: 16 pages, 14 figures, to be presented at Scitech 2026
PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration
Existing Active SLAM methodologies face issues such as slow exploration speed and suboptimal paths. To address these limitations, we propose a hybrid framework combining a Path-Uncertainty Co-Optimization Deep Reinforcement Learning framework and a Lightweight Stagnation Detection mechanism. The Path-Uncertainty Co-Optimization framework jointly optimizes travel distance and map uncertainty through a dual-objective reward function, balancing exploration and exploitation. The Lightweight Stagnation Detection reduces redundant exploration through Lidar Static Anomaly Detection and Map Update Stagnation Detection, terminating episodes on low expansion rates. Experimental results show that compared with the frontier-based method and RRT method, our approach shortens exploration time by up to 65% and reduces path distance by up to 42%, significantly improving exploration efficiency in complex environments while maintaining reliable map completeness. Ablation studies confirm that the collaborative mechanism accelerates training convergence. Empirical validation on a physical robotic platform demonstrates the algorithm's practical applicability and its successful transferability from simulation to real-world environments.
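As an illustrative sketch (not the authors' implementation), the snippet below shows one way a dual-objective reward and a stagnation-based episode termination could be combined; the weights, window length, and growth threshold are invented for the example.

```python
def dual_objective_reward(dist_step, entropy_before, entropy_after,
                          w_dist=0.05, w_unc=1.0):
    """Reward trading off travel distance against map-uncertainty reduction.

    dist_step: distance travelled this step (penalized)
    entropy_*: map entropy before/after the step (reduction is rewarded)
    The weights are illustrative, not tuned values from the paper.
    """
    return w_unc * (entropy_before - entropy_after) - w_dist * dist_step


class StagnationDetector:
    """Terminate an episode when the known-map area stops expanding."""

    def __init__(self, window=20, min_growth=1e-3):
        self.window = window          # number of recent steps considered
        self.min_growth = min_growth  # minimum fractional map growth
        self.history = []

    def update(self, known_cells):
        self.history.append(known_cells)
        if len(self.history) <= self.window:
            return False  # not enough data yet
        old = self.history[-self.window - 1]
        growth = (known_cells - old) / max(old, 1)
        return growth < self.min_growth  # True -> stop the episode


if __name__ == "__main__":
    det = StagnationDetector()
    known = 100
    for step in range(60):
        known += 5 if step < 30 else 0  # exploration stalls after step 30
        r = dual_objective_reward(dist_step=0.3,
                                  entropy_before=1.0, entropy_after=0.98)
        if det.update(known):
            print(f"episode terminated at step {step}, reward sample {r:.3f}")
            break
```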
Breaking the Circle: An Autonomous Control-Switching Strategy for Stable Orographic Soaring in MAVs
Orographic soaring can significantly extend the endurance of micro aerial vehicles (MAVs), but circling behavior, arising from control conflicts between the longitudinal and vertical axes, increases energy consumption and the risk of divergence. We propose a control switching method, named SAOS: Switched Control for Autonomous Orographic Soaring, which mitigates circling behavior by selectively controlling either the horizontal or vertical axis, effectively transforming the system from underactuated to fully actuated during soaring. Additionally, the angle of attack is incorporated into the INDI controller to improve force estimation. Simulations with randomized initial positions and wind tunnel experiments on two MAVs demonstrate that the SAOS improves position convergence, reduces throttle usage, and mitigates roll oscillations caused by pitch-roll coupling. These improvements enhance energy efficiency and flight stability in constrained soaring environments.
comment: 13 pages, 15 figures
Hybrid A* Path Planning with Multi-Modal Motion Extension for Four-Wheel Steering Mobile Robots
Four-wheel independent steering (4WIS) systems provide mobile robots with a rich set of motion modes, such as Ackermann steering, lateral steering, and parallel movement, offering superior maneuverability in constrained environments. However, existing path planning methods generally assume a single kinematic model and thus fail to fully exploit the multi-modal capabilities of 4WIS platforms. To address this limitation, we propose an extended Hybrid A* framework that operates in a four-dimensional state space incorporating both spatial states and motion modes. Within this framework, we design multi-modal Reeds-Shepp curves tailored to the distinct kinematic constraints of each motion mode, develop an enhanced heuristic function that accounts for mode-switching costs, and introduce a terminal connection strategy with intelligent mode selection to ensure smooth transitions between different steering patterns. The proposed planner enables seamless integration of multiple motion modalities within a single path, significantly improving flexibility and adaptability in complex environments. Results demonstrate significantly improved planning performance for 4WIS robots in complex environments.
comment: Corrected some formatting errors
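A minimal sketch of how a mode-switching cost can enter the heuristic of a multi-modal Hybrid A* search is given below; the mode names, switching penalty, and plain Euclidean distance term are assumptions, and a real planner would use mode-specific Reeds-Shepp distances.

```python
import math
from dataclasses import dataclass

# Hypothetical motion modes for a 4WIS platform (names are assumptions).
MODES = ("ackermann", "lateral", "parallel")
SWITCH_COST = 1.5  # illustrative penalty per mode change

@dataclass(frozen=True)
class State:
    x: float
    y: float
    theta: float
    mode: str

def heuristic(s: State, goal: State) -> float:
    """Euclidean distance plus a penalty if the motion mode must change.

    Only shows how a mode-switching cost can be folded into the heuristic
    of a 4D (x, y, theta, mode) search; not the paper's heuristic.
    """
    d = math.hypot(goal.x - s.x, goal.y - s.y)
    return d + (SWITCH_COST if s.mode != goal.mode else 0.0)

if __name__ == "__main__":
    start = State(0.0, 0.0, 0.0, "ackermann")
    goal = State(3.0, 4.0, 0.0, "lateral")
    print(heuristic(start, goal))  # 5.0 plus the 1.5 switching penalty
```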
Mind to Hand: Purposeful Robotic Control via Embodied Reasoning
Humans act with context and intention, with reasoning playing a central role. While internet-scale data has enabled broad reasoning capabilities in AI systems, grounding these abilities in physical action remains a major challenge. We introduce Lumo-1, a generalist vision-language-action (VLA) model that unifies robot reasoning ("mind") with robot action ("hand"). Our approach builds upon the general multi-modal reasoning capabilities of pre-trained vision-language models (VLMs), progressively extending them to embodied reasoning and action prediction, and ultimately towards structured reasoning and reasoning-action alignment. This results in a three-stage pre-training pipeline: (1) Continued VLM pre-training on curated vision-language data to enhance embodied reasoning skills such as planning, spatial understanding, and trajectory prediction; (2) Co-training on cross-embodiment robot data alongside vision-language data; and (3) Action training with reasoning process on trajectories collected on Astribot S1, a bimanual mobile manipulator with human-like dexterity and agility. Finally, we integrate reinforcement learning to further refine reasoning-action consistency and close the loop between semantic inference and motor control. Extensive experiments demonstrate that Lumo-1 achieves significant performance improvements in embodied vision-language reasoning, a critical component for generalist robotic control. Real-world evaluations further show that Lumo-1 surpasses strong baselines across a wide range of challenging robotic tasks, with strong generalization to novel objects and environments, excelling particularly in long-horizon tasks and in responding to natural human instructions that require reasoning over strategy, concepts, and space.
comment: 49 pages, 25 figures
SensHRPS: Sensing Comfortable Human-Robot Proxemics and Personal Space With Eye-Tracking
Social robots must adjust to human proxemic norms to ensure user comfort and engagement. While prior research demonstrates that eye-tracking features reliably estimate comfort in human-human interactions, their applicability to interactions with humanoid robots remains unexplored. In this study, we investigate user comfort with the robot "Ameca" across four experimentally controlled distances (0.5 m to 2.0 m) using mobile eye-tracking and subjective reporting (N=19). We evaluate multiple machine learning and deep learning models to estimate comfort based on gaze features. Contrary to previous human-human studies where Transformer models excelled, a Decision Tree classifier achieved the highest performance (F1-score = 0.73), with minimum pupil diameter identified as the most critical predictor. These findings suggest that physiological comfort thresholds in human-robot interaction differ from human-human dynamics and can be effectively modeled using interpretable logic.
Flow-Aided Flight Through Dynamic Clutters From Point To Motion
Challenges in traversing dynamic clutters lie mainly in the efficient perception of the environmental dynamics and the generation of evasive behaviors considering obstacle movement. Previous solutions have made progress in explicitly modeling the dynamic obstacle motion for avoidance, but this key dependency of decision-making is time-consuming and unreliable in highly dynamic scenarios with occlusions. On the contrary, without introducing object detection, tracking, and prediction, we empower reinforcement learning (RL) with single-LiDAR sensing to realize an autonomous flight system directly from point to motion. For exteroception, a depth-sensing distance map that is fixed in shape, low in resolution, and preserves safety-critical detail is encoded from raw point clouds, and a point flow that senses environmental change is adopted as the motion feature extracted from multi-frame observations. These two are integrated into a lightweight and easy-to-learn representation of complex dynamic environments. For action generation, the behavior of avoiding dynamic threats in advance is implicitly driven by the proposed change-aware sensing representation, where the policy optimization is guided by a relative-motion-modulated distance field. With the deployment-friendly sensing simulation and dynamics-model-free acceleration control, the proposed system shows a superior success rate and adaptability compared with alternatives, and the policy derived from the simulator can drive a real-world quadrotor with safe maneuvers.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L), November, 2025
More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery
The recent SAM 3 and SAM 3D have introduced significant advancements over the predecessor, SAM 2, particularly with the integration of language-based segmentation and enhanced 3D perception capabilities. SAM 3 supports zero-shot segmentation across a wide range of prompts, including point, bounding box, and language-based prompts, allowing for more flexible and intuitive interactions with the model. In this empirical evaluation, we assess the performance of SAM 3 in robot-assisted surgery, benchmarking its zero-shot segmentation with point and bounding box prompts and exploring its effectiveness in dynamic video tracking, alongside its newly introduced language prompt segmentation. While language prompts show potential, their performance in the surgical domain is currently suboptimal, highlighting the need for further domain-specific training. Additionally, we investigate SAM 3D's depth reconstruction abilities, demonstrating its capacity to process surgical scene data and reconstruct 3D anatomical structures from 2D images. Through comprehensive testing on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 3 shows clear improvements over SAM and SAM 2 in both image and video segmentation under spatial prompts, while the zero-shot evaluations of SAM 3D on SCARED, StereoMIS, and EndoNeRF indicate strong monocular depth estimation and realistic 3D instrument reconstruction, yet also reveal remaining limitations in complex, highly dynamic surgical scenes.
comment: Technical Report
Proportional integral derivative booster for neural networks-based time-series prediction: Case of water demand prediction
Multi-step time-series prediction is an essential supportive step for decision-makers in several industrial areas. Artificial intelligence techniques, which use a neural network component in various forms, have recently been used frequently to accomplish this step. However, the complexity of the neural network structure remains a critical obstacle to prediction accuracy. In this paper, a method inspired by the proportional-integral-derivative (PID) control approach is investigated to enhance the performance of neural network models used for multi-step-ahead prediction of periodic time-series information while maintaining a negligible impact on the complexity of the system. The PID-based method is applied to the predicted value at each time step to bring that value closer to the real value. The water demand forecasting problem is considered as a case study, where two deep neural network models from the literature are used to prove the effectiveness of the proposed boosting method. Furthermore, to prove the applicability of this PID-based booster to other types of periodic time-series prediction problems, it is applied to enhance the accuracy of a neural network model used for multi-step forecasting of hourly energy consumption. The comparison between the results of the original prediction models and the results after using the proposed technique demonstrates the superiority of the proposed method in terms of prediction accuracy and system complexity.
comment: Engineering Applications of Artificial Intelligence 2022
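The general idea of a PID-style correction applied to a predictor's output can be sketched as follows; the gains and the toy biased predictor are placeholders rather than the configuration used in the paper.

```python
import math

class PIDBooster:
    """Correct each new prediction using the PID of past prediction errors."""

    def __init__(self, kp=0.7, ki=0.02, kd=0.1):
        self.kp, self.ki, self.kd = kp, ki, kd  # illustrative gains
        self.integral = 0.0
        self.prev_error = 0.0

    def correct(self, prediction, last_true, last_prediction):
        error = last_true - last_prediction      # error observed one step ago
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return prediction + (self.kp * error
                             + self.ki * self.integral
                             + self.kd * derivative)


if __name__ == "__main__":
    booster = PIDBooster()
    truth = [10 + 3 * math.sin(0.3 * t) for t in range(50)]
    raw = [v + 0.8 for v in truth]      # toy predictor with a constant bias
    corrected, prev_raw = [], raw[0]
    for t in range(1, len(truth)):
        corrected.append(booster.correct(raw[t], truth[t - 1], prev_raw))
        prev_raw = raw[t]
    err_raw = sum(abs(r - v) for r, v in zip(raw[1:], truth[1:])) / (len(truth) - 1)
    err_cor = sum(abs(c - v) for c, v in zip(corrected, truth[1:])) / len(corrected)
    print(f"mean abs error: raw={err_raw:.3f} corrected={err_cor:.3f}")
```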
From the Laboratory to Real-World Application: Evaluating Zero-Shot Scene Interpretation on Edge Devices for Mobile Robotics
Video Understanding, Scene Interpretation and Commonsense Reasoning are highly challenging tasks enabling the interpretation of visual information, allowing agents to perceive, interact with and make rational decisions in its environment. Large Language Models (LLMs) and Visual Language Models (VLMs) have shown remarkable advancements in these areas in recent years, enabling domain-specific applications as well as zero-shot open vocabulary tasks, combining multiple domains. However, the required computational complexity poses challenges for their application on edge devices and in the context of Mobile Robotics, especially considering the trade-off between accuracy and inference time. In this paper, we investigate the capabilities of state-of-the-art VLMs for the task of Scene Interpretation and Action Recognition, with special regard to small VLMs capable of being deployed to edge devices in the context of Mobile Robotics. The proposed pipeline is evaluated on a diverse dataset consisting of various real-world cityscape, on-campus and indoor scenarios. The experimental evaluation discusses the potential of these small models on edge devices, with particular emphasis on challenges, weaknesses, inherent model biases and the application of the gained information. Supplementary material is provided via the following repository: https://datahub.rz.rptu.de/hstr-csrl-public/publications/scene-interpretation-on-edge-devices/
comment: 15 pages, 6 figures, 1 table; accepted for AI-2025 Forty-fifth SGAI International Conference on Artificial Intelligence CAMBRIDGE, ENGLAND 16-18 DECEMBER 2025
FICO: Finite-Horizon Closed-Loop Factorization for Unified Multi-Agent Path Finding
Multi-Agent Path Finding is a fundamental problem in robotics and AI, yet most existing formulations treat planning and execution separately and address variants of the problem in an ad hoc manner. This paper presents a system-level framework for MAPF that integrates planning and execution, generalizes across variants, and explicitly models uncertainties. At its core is the MAPF system, a formal model that casts MAPF as a control design problem encompassing classical and uncertainty-aware formulations. To solve it, we introduce Finite-Horizon Closed-Loop Factorization (FICO), a factorization-based algorithm inspired by receding-horizon control that exploits compositional structure for efficient closed-loop operation. FICO enables real-time responses -- commencing execution within milliseconds -- while scaling to thousands of agents and adapting seamlessly to execution-time uncertainties. Extensive case studies demonstrate that it reduces computation time by up to two orders of magnitude compared with open-loop baselines, while delivering significantly higher throughput under stochastic delays and agent arrivals. These results establish a principled foundation for analyzing and advancing MAPF through system-level modeling, factorization, and closed-loop design.
Risk-Bounded Multi-Agent Visual Navigation via Iterative Risk Allocation
Safe navigation is essential for autonomous systems operating in hazardous environments, especially when multiple agents must coordinate using only high-dimensional visual observations. While recent approaches successfully combine Goal-Conditioned RL (GCRL) for graph construction with Conflict-Based Search (CBS) for planning, they typically rely on static edge pruning to enforce safety. This binary strategy is overly conservative, precluding feasible missions that require traversing high-risk regions, even when the aggregate risk is acceptable. To address this, we introduce a framework for Risk-Bounded Multi-Agent Path Finding, where agents share a user-specified global risk budget ($Δ$). Rather than permanently discarding edges, our framework dynamically distributes per-agent risk budgets ($δ_i$) during search via an Iterative Risk Allocation (IRA) layer that integrates with a standard CBS planner. We investigate two distribution strategies: a greedy surplus-deficit scheme for rapid feasibility repair, and a market-inspired mechanism that treats risk as a priced resource to guide improved allocation. This yields a tunable trade-off wherein agents exploit available risk to secure shorter, more efficient paths, but revert to longer, safer detours under tighter budgets. Experiments in complex visual environments show that our dynamic allocation framework achieves higher success rates than baselines and effectively leverages the available safety budget to reduce travel time.
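One plausible (invented) form of a greedy surplus-deficit reallocation step is sketched below; the budgets, per-agent risk requirements, and feasibility model are illustrative only.

```python
def reallocate_risk(budgets, required, total_budget):
    """Greedy surplus-deficit reallocation of per-agent risk budgets.

    budgets:  current per-agent budgets delta_i (summing to total_budget)
    required: minimum risk each agent needs for its current best path
    Agents with surplus donate to agents in deficit until the budgets are
    feasible or the surplus is exhausted.  Purely illustrative.
    """
    budgets = list(budgets)
    surplus = [(i, budgets[i] - required[i]) for i in range(len(budgets))
               if budgets[i] > required[i]]
    deficit = [(i, required[i] - budgets[i]) for i in range(len(budgets))
               if budgets[i] < required[i]]
    # Serve the largest deficits first from the largest surpluses.
    surplus.sort(key=lambda p: -p[1])
    deficit.sort(key=lambda p: -p[1])
    for j, need in deficit:
        for k, (i, spare) in enumerate(surplus):
            if need <= 0:
                break
            give = min(spare, need)
            budgets[i] -= give
            budgets[j] += give
            surplus[k] = (i, spare - give)
            need -= give
    assert abs(sum(budgets) - total_budget) < 1e-9  # global budget preserved
    return budgets


if __name__ == "__main__":
    delta = 0.30                       # global risk budget (illustrative)
    budgets = [0.10, 0.10, 0.10]       # initial equal split
    required = [0.02, 0.05, 0.18]      # agent 2 needs a riskier corridor
    print(reallocate_risk(budgets, required, delta))
```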
Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input
Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)-based system that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball-goal configurations. The system extends a typical teacher-student training framework -- in which a "teacher" policy is trained with ground truth state information and the "student" learns to mimic it with noisy, imperfect sensing -- by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements -- including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement -- are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball-goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a system for learning robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.
Multiagent Systems
Visual Heading Prediction for Autonomous Aerial Vehicles
The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous systems for applications such as search and rescue, environmental monitoring, and logistics. However, precise coordination between these platforms in real-time scenarios presents major challenges, particularly when external localization infrastructure such as GPS or GNSS is unavailable or degraded [1]. This paper proposes a vision-based, data-driven framework for real-time UAV-UGV integration, with a focus on robust UGV detection and heading angle prediction for navigation and coordination. The system employs a fine-tuned YOLOv5 model to detect UGVs and extract bounding box features, which are then used by a lightweight artificial neural network (ANN) to estimate the UAV's required heading angle. A VICON motion capture system was used to generate ground-truth data during training, resulting in a dataset of over 13,000 annotated images collected in a controlled lab environment. The trained ANN achieves a mean absolute error of 0.1506° and a root mean squared error of 0.1957°, offering accurate heading angle predictions using only monocular camera inputs. Experimental evaluations achieve 95% accuracy in UGV detection. This work contributes a vision-based, infrastructure-independent solution that demonstrates strong potential for deployment in GPS/GNSS-denied environments, supporting reliable multi-agent coordination under realistic dynamic conditions. A demonstration video showcasing the system's real-time performance, including UGV detection, heading angle prediction, and UAV alignment under dynamic conditions, is available at: https://github.com/Kooroshraf/UAV-UGV-Integration
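As a stand-in for the paper's lightweight ANN, the sketch below fits a ridge regressor from bounding-box features to a heading angle on synthetic data; the feature layout and the synthetic relationship are assumptions, not the dataset or model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset: bounding-box features
# [cx, cy, w, h] in normalized image coordinates, and a heading angle
# that (for this toy) depends mostly on the horizontal offset of the box.
N = 2000
X = rng.uniform(0.0, 1.0, size=(N, 4))
heading_deg = 60.0 * (X[:, 0] - 0.5) + rng.normal(0.0, 0.2, size=N)

# Closed-form ridge regression as a minimal stand-in for the lightweight ANN.
Xb = np.hstack([X, np.ones((N, 1))])          # add a bias column
lam = 1e-3
W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(5), Xb.T @ heading_deg)

pred = Xb @ W
mae = np.mean(np.abs(pred - heading_deg))
print(f"toy MAE: {mae:.3f} degrees")          # on synthetic data only
```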
Interpretation as Linear Transformation: A Cognitive-Geometric Model of Belief and Meaning
This paper develops a geometric framework for modeling belief, motivation, and influence across cognitively heterogeneous agents. Each agent is represented by a personalized value space, a vector space encoding the internal dimensions through which the agent interprets and evaluates meaning. Beliefs are formalized as structured vectors (abstract beings) whose transmission is mediated by linear interpretation maps. A belief survives communication only if it avoids the null spaces of these maps, yielding a structural criterion for intelligibility, miscommunication, and belief death. Within this framework, I show how belief distortion, motivational drift, counterfactual evaluation, and the limits of mutual understanding arise from purely algebraic constraints. A central result, the "No-Null-Space Leadership Condition", characterizes leadership as a property of representational reachability rather than persuasion or authority. More broadly, the model explains how abstract beings can propagate, mutate, or disappear as they traverse diverse cognitive geometries. The account unifies insights from conceptual spaces, social epistemology, and AI value alignment by grounding meaning preservation in structural compatibility rather than shared information or rationality. I argue that this cognitive-geometric perspective clarifies the epistemic boundaries of influence in both human and artificial systems, and offers a general foundation for analyzing belief dynamics across heterogeneous agents.
comment: The first draft of cognitive geometry model
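The null-space survival criterion can be illustrated with a toy linear interpretation map; the matrix and belief vectors below are invented for the example.

```python
import numpy as np

# Toy interpretation map from a 3-dimensional "sender" value space to a
# 2-dimensional "receiver" value space (numbers are invented).
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

def survives(belief, interp, tol=1e-9):
    """A belief 'survives' transmission if the map does not send it to zero."""
    return np.linalg.norm(interp @ belief) > tol

belief_a = np.array([1.0, 1.0, 0.0])    # representable by the receiver
belief_b = np.array([1.0, 1.0, -1.0])   # lies in the null space of M

print(survives(belief_a, M))  # True  -> intelligible
print(survives(belief_b, M))  # False -> "belief death" in transit
```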
Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies
This work presents a conceptual study on the application of Multi-Agent Reinforcement Learning (MARL) for decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for scaling studies for MARL. A robust baseline policy is proposed, which is based on restricting agent motion envelopes and applying Dijkstra's algorithm. Experimental results show that two off-the-shelf MARL algorithms perform competitively with the baseline for a small number of agents, but scalability issues arise as the number of agents increases.
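A minimal grid-world Dijkstra routine, of the kind such a baseline policy might build on, is sketched below; the map and 4-connectivity are invented, and the motion-envelope restriction is not modeled.

```python
import heapq

def dijkstra_grid(grid, start, goal):
    """Shortest path on a 4-connected grid; cell value 1 means blocked.

    A minimal stand-in for the shortest-path reasoning a Dijkstra-based
    relay baseline might use; the map below is invented.
    """
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    prev = {}
    pq = [(0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(pq, (nd, (nr, nc)))
    path, node = [], goal
    while node != start:          # assumes the goal is reachable
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]

if __name__ == "__main__":
    grid = [[0, 0, 0, 0],
            [1, 1, 0, 1],
            [0, 0, 0, 0]]
    print(dijkstra_grid(grid, (0, 0), (2, 0)))
```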
Supporting Dynamic Agentic Workloads: How Data and Agents Interact
The rise of multi-agent systems powered by large language models (LLMs) and specialized reasoning agents exposes fundamental limitations in today's data management architectures. Traditional databases and data fabrics were designed for static, well-defined workloads, whereas agentic systems exhibit dynamic, context-driven, and collaborative behaviors. Agents continuously decompose tasks, shift attention across modalities, and share intermediate results with peers - producing non-deterministic, multi-modal workloads that strain conventional query optimizers and caching mechanisms. We propose an Agent-Centric Data Fabric, a unified architecture that rethinks how data systems serve, optimize, coordinate, and learn from agentic workloads. To achieve this we exploit the concepts of attention-guided data retrieval, semantic micro-caching for context-driven agent federations, predictive data prefetching and quorum-based data serving. Together, these mechanisms enable agents to access representative data faster and more efficiently, while reducing redundant queries, data movement, and inference load across systems. By framing data systems as adaptive collaborators, instead of static executors, we outline new research directions toward behaviorally responsive data infrastructures, where caching, probing, and orchestration jointly enable efficient, context-rich data exchange among dynamic, reasoning-driven agents.
On Mobile Ad Hoc Networks for Coverage of Partially Observable Worlds
This paper addresses the movement and placement of mobile agents to establish a communication network in initially unknown environments. We cast the problem in a computational-geometric framework by relating the coverage problem and line-of-sight constraints to the Cooperative Guard Art Gallery Problem, and introduce its partially observable variant, the Partially Observable Cooperative Guard Art Gallery Problem (POCGAGP). We then present two algorithms that solve POCGAGP: CADENCE, a centralized planner that incrementally selects 270 degree corners at which to deploy agents, and DADENCE, a decentralized scheme that coordinates agents using local information and lightweight messaging. Both approaches operate under partial observability and target simultaneous coverage and connectivity. We evaluate the methods in simulation across 1,500 test cases of varied size and structure, demonstrating consistent success in forming connected networks while covering and exploring unknown space. These results highlight the value of geometric abstractions for communication-driven exploration and show that decentralized policies are competitive with centralized performance while retaining scalability.
Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation
Collaborative pursuit-evasion in cluttered environments presents significant challenges due to sparse rewards and constrained Fields of View (FOV). Standard Multi-Agent Reinforcement Learning (MARL) often suffers from inefficient exploration and fails to scale to large scenarios. We propose PGF-MAPPO (Path-Guided Frontier MAPPO), a hierarchical framework bridging topological planning with reactive control. To resolve local minima and sparse rewards, we integrate an A*-based potential field for dense reward shaping. Furthermore, we introduce Directional Frontier Allocation, combining Farthest Point Sampling (FPS) with geometric angle suppression to enforce spatial dispersion and accelerate coverage. The architecture employs a parameter-shared decentralized critic, maintaining O(1) model complexity suitable for robotic swarms. Experiments demonstrate that PGF-MAPPO achieves superior capture efficiency against faster evaders. Policies trained on 10x10 maps exhibit robust zero-shot generalization to unseen 20x20 environments, significantly outperforming rule-based and learning-based baselines.
comment: 7 pages, 7 figures
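A rough sketch of farthest point sampling combined with an angular-suppression filter is given below; the sector width, the random frontier set, and the selection order are assumptions rather than the paper's exact procedure.

```python
import math
import random

def farthest_point_sampling(points, k):
    """Pick k points that are mutually far apart (greedy FPS)."""
    chosen = [points[0]]
    while len(chosen) < k:
        best, best_d = None, -1.0
        for p in points:
            d = min(math.dist(p, c) for c in chosen)
            if d > best_d:
                best, best_d = p, d
        chosen.append(best)
    return chosen

def angular_suppression(frontiers, center, min_angle_deg=45.0):
    """Keep at most one frontier per angular sector around `center`.

    Only illustrates enforcing directional dispersion; the sector width
    is an invented parameter.
    """
    kept, used = [], []
    for f in sorted(frontiers, key=lambda p: math.dist(p, center)):
        ang = math.degrees(math.atan2(f[1] - center[1], f[0] - center[0]))
        if all(min(abs(ang - u), 360 - abs(ang - u)) >= min_angle_deg for u in used):
            kept.append(f)
            used.append(ang)
    return kept

if __name__ == "__main__":
    random.seed(1)
    frontiers = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(40)]
    coarse = farthest_point_sampling(frontiers, 8)
    targets = angular_suppression(coarse, center=(0.0, 0.0))
    print(len(targets), targets[:3])
```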
GAIR: GUI Automation via Information-Joint Reasoning and Group Reflection
Building AI systems for GUI automation tasks has attracted remarkable research effort, where MLLMs are leveraged to process user requirements and issue operations. However, GUI automation spans a wide range of tasks, from document processing to online shopping and from CAD to video editing. The diversity among these tasks requires MLLMs for GUI automation to have heterogeneous capabilities and master multidimensional expertise, raising challenges in constructing such a model. To address this challenge, we propose GAIR: GUI Automation via Information-Joint Reasoning and Group Reflection, a novel MLLM-based GUI automation agent framework designed to integrate knowledge and combine capabilities from heterogeneous models to build GUI automation agent systems with higher performance. Since different GUI-specific MLLMs are trained on different datasets and thus have different strengths, GAIR introduces a general-purpose MLLM to jointly process the information from multiple GUI-specific models, further enhancing the performance of the agent framework. The general-purpose MLLM also serves as the decision maker, attempting to execute a reasonable operation based on previously gathered information. When the general-purpose model determines that there is insufficient information for a reasonable decision, GAIR transitions into a group reflection state, in which the general-purpose model provides the GUI-specific models with different instructions and hints based on their strengths and weaknesses, driving them to gather more significant and accurate information that can support deeper reasoning and decision making. We evaluate the effectiveness and reliability of GAIR through extensive experiments on GUI benchmarks.
The Illusion of Rationality: Tacit Bias and Strategic Dominance in Frontier LLM Negotiation Games
Large language models (LLMs) are increasingly being deployed as autonomous agents on behalf of institutions and individuals in economic, political, and social settings that involve negotiation. Yet this trend carries significant risks if their strategic behavior is not well understood. In this work, we revisit the NegotiationArena framework and run controlled simulation experiments on a diverse set of frontier LLMs across three multi-turn bargaining games: Buyer-Seller, Multi-Turn Ultimatum, and Resource Exchange. We ask whether improved general reasoning capabilities lead to rational, unbiased, and convergent negotiation strategies. Our results challenge this assumption. We find that models diverge into distinct, model-specific strategic equilibria rather than converging to a unified optimal behavior. Moreover, strong numerical and semantic anchoring effects persist: initial offers are highly predictive of final agreements, and models consistently generate biased proposals by collapsing diverse internal valuations into rigid, generic price points. More concerningly, we observe dominance patterns in which some models systematically achieve higher payoffs than their counterparts. These findings underscore an urgent need to develop mechanisms to mitigate these issues before deploying such systems in real-world scenarios.
comment: 9 pages, 5 figures
Emergent Collective Memory in Decentralized Multi-Agent AI Systems
We demonstrate how collective memory emerges in decentralized multi-agent systems through the interplay between individual agent memory and environmental trace communication. Our agents maintain internal memory states while depositing persistent environmental traces, creating a spatially distributed collective memory without centralized control. Comprehensive validation across five environmental conditions (20x20 to 50x50 grids, 5-20 agents, 50 runs per configuration) reveals a critical asymmetry: individual memory alone provides 68.7% performance improvement over no-memory baselines (1563.87 vs 927.23, p < 0.001), while environmental traces without memory fail completely. This demonstrates that memory functions independently but traces require cognitive infrastructure for interpretation. Systematic density-sweep experiments ($ρ \in [0.049, 0.300]$, up to 625 agents) validate our theoretical phase transition prediction. On realistic large grids (30x30, 50x50), stigmergic coordination dominates above $ρ \approx 0.20$, with traces outperforming memory by 36-41% on composite metrics despite lower food efficiency. The experimental crossover confirms the predicted critical density $ρ_c = 0.230$ within 13% error.
comment: 23 pages, 4 figures
A Simulation Framework for Studying Recommendation-Network Co-evolution in Social Platforms
Studying how recommendation systems reshape social networks is difficult on live platforms: confounds abound, and controlled experiments risk user harm. We present an agent-based simulator where content production, tie formation, and a graph attention network (GAT) recommender co-evolve in a closed loop. We calibrate parameters using Mastodon data and validate out-of-sample against Bluesky (4--6\% error on structural metrics; 10--15\% on held-out temporal splits). Across 18 configurations at 100 agents, we find that \emph{activation timing} affects outcomes: introducing recommendations at $t=10$ vs.\ $t=40$ decreases transitivity by 10\% while engagement differs by $<$8\%. Delaying activation increases content diversity by 9\% while reducing modularity by 4\%. Scaling experiments ($n$ up to 5,000) show the effect persists but attenuates. Jacobian analysis confirms local stability under bounded reactance parameters. We release configuration schemas and reproduction scripts.
Empirical Hardness in Multi-Agent Pathfinding: Research Challenges and Opportunities AAMAS-25
Multi-agent pathfinding (MAPF) is the problem of finding collision-free paths for a team of agents on a map. Although MAPF is NP-hard, the hardness of solving individual instances varies significantly, revealing a gap between theoretical complexity and actual hardness. This paper outlines three key research challenges in MAPF empirical hardness to understand such phenomena. The first challenge, known as algorithm selection, is determining the best-performing algorithms for a given instance. The second challenge is understanding the key instance features that affect MAPF empirical hardness, such as structural properties like phase transition and backbone/backdoor. The third challenge is how to leverage our knowledge of MAPF empirical hardness to effectively generate hard MAPF instances or diverse benchmark datasets. This work establishes a foundation for future empirical hardness research and encourages deeper investigation into these promising and underexplored areas.
comment: Published in AAMAS-25
Query Optimization Beyond Data Systems: The Case for Multi-Agent Systems
The proliferation of large language models (LLMs) has accelerated the adoption of agent-based workflows, where multiple autonomous agents reason, invoke functions, and collaborate to compose complex data pipelines. However, current approaches to building such agentic architectures remain largely ad hoc, lacking generality, scalability, and systematic optimization. Existing systems often rely on fixed models and single execution engines and are unable to efficiently optimize multiple agents operating over heterogeneous data sources and query engines. This paper presents a vision for a next-generation query optimization framework tailored to multi-agent workflows. We argue that optimizing these workflows can benefit from redesigning query optimization principles to account for new challenges: orchestration of diverse agents, cost efficiency under expensive LLM calls and across heterogeneous engines, and redundancy across tasks. Led by a real-world example and building on an analysis of multi-agent workflows, we outline our envisioned architecture and the main research challenges of building a multi-agent query optimization framework, which aims at enabling automated model selection, workflow composition, and execution across heterogeneous engines. This vision establishes the groundwork for query optimization in emerging multi-agent architectures and opens up a set of future research directions.
Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning
This paper studies networked multi-agent reinforcement learning (NMARL) with interdependent rewards and coupled policies. In this setting, each agent's reward depends on its own state-action pair as well as those of its direct neighbors, and each agent's policy is parameterized by its local parameters together with those of its $κ_{p}$-hop neighbors, with $κ_{p}\geq 1$ denoting the coupled radius. The objective of the agents is to collaboratively optimize their policies to maximize the discounted average cumulative reward. To address the challenge of interdependent policies in collaborative optimization, we introduce a novel concept termed the neighbors' averaged $Q$-function and derive a new expression for the coupled policy gradient. Based on these theoretical foundations, we develop a distributed scalable coupled policy (DSCP) algorithm, where each agent relies only on the state-action pairs of its $κ_{p}$-hop neighbors and the rewards of its $(κ_{p}+1)$-hop neighbors. Specifically, in the DSCP algorithm, we employ a geometric 2-horizon sampling method that does not require storing a full $Q$-table to obtain an unbiased estimate of the coupled policy gradient. Moreover, each agent interacts exclusively with its direct neighbors to obtain accurate policy parameters, while maintaining local estimates of other agents' parameters to execute its local policy and collect samples for optimization. These estimates and policy parameters are updated via a push-sum protocol, enabling distributed coordination of policy updates across the network. We prove that the joint policy produced by the proposed algorithm converges to a first-order stationary point of the objective function. Finally, the effectiveness of DSCP algorithm is demonstrated through simulations in a robot path planning environment, showing clear improvement over state-of-the-art methods.
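The push-sum averaging step that underlies this kind of parameter exchange can be sketched in its standard scalar form; the graph and local values below are invented, and the full DSCP update (policy gradients, $κ_{p}$-hop estimates) is not reproduced.

```python
import numpy as np

def push_sum(values, neighbors, iters=50):
    """Standard push-sum averaging over a directed graph.

    values:    each agent's local scalar (e.g. a parameter estimate)
    neighbors: out-neighbor lists defining who each agent pushes to
    Each agent keeps a (value, weight) pair and splits it equally among
    itself and its out-neighbors; the ratio converges to the global mean
    on a strongly connected graph.
    """
    n = len(values)
    x = np.array(values, dtype=float)
    w = np.ones(n)
    for _ in range(iters):
        new_x, new_w = np.zeros(n), np.zeros(n)
        for i in range(n):
            targets = [i] + list(neighbors[i])
            share_x = x[i] / len(targets)
            share_w = w[i] / len(targets)
            for j in targets:
                new_x[j] += share_x
                new_w[j] += share_w
        x, w = new_x, new_w
    return x / w

if __name__ == "__main__":
    # Ring of 4 agents; the true mean of the local values is 2.5.
    print(push_sum([1.0, 2.0, 3.0, 4.0], neighbors=[[1], [2], [3], [0]]))
```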
Risk-Bounded Multi-Agent Visual Navigation via Iterative Risk Allocation
Safe navigation is essential for autonomous systems operating in hazardous environments, especially when multiple agents must coordinate using only high-dimensional visual observations. While recent approaches successfully combine Goal-Conditioned RL (GCRL) for graph construction with Conflict-Based Search (CBS) for planning, they typically rely on static edge pruning to enforce safety. This binary strategy is overly conservative, precluding feasible missions that require traversing high-risk regions, even when the aggregate risk is acceptable. To address this, we introduce a framework for Risk-Bounded Multi-Agent Path Finding, where agents share a user-specified global risk budget ($Δ$). Rather than permanently discarding edges, our framework dynamically distributes per-agent risk budgets ($δ_i$) during search via an Iterative Risk Allocation (IRA) layer that integrates with a standard CBS planner. We investigate two distribution strategies: a greedy surplus-deficit scheme for rapid feasibility repair, and a market-inspired mechanism that treats risk as a priced resource to guide improved allocation. This yields a tunable trade-off wherein agents exploit available risk to secure shorter, more efficient paths, but revert to longer, safer detours under tighter budgets. Experiments in complex visual environments show that our dynamic allocation framework achieves higher success rates than baselines and effectively leverages the available safety budget to reduce travel time.
Systems and Control (EESS)
Link-Sharing Backpressure Routing In Wireless Multi-Hop Networks ICASSP 2026
Backpressure (BP) routing and scheduling is an established resource allocation method for wireless multi-hop networks, noted for its fully distributed operation and maximum queue stability. Recent advances in shortest path-biased BP routing (SP-BP) mitigate shortcomings such as slow startup and random walks, yet exclusive link-level commodity selection still causes the last-packet problem and bandwidth underutilization. By revisiting the Lyapunov drift theory underlying BP, we show that the legacy exclusive commodity selection is unnecessary, and propose a Maximum Utility (MaxU) link-sharing method to expand its performance envelope without increasing control message overhead. Numerical results show that MaxU SP-BP substantially mitigates the last-packet problem and slightly expands the network capacity region.
comment: 5 pages, 5 figures, submitted to IEEE ICASSP 2026
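For reference, the classic exclusive per-link max-weight commodity selection that MaxU relaxes can be sketched as follows; the queue values are invented, and the link-sharing rule itself is not reproduced.

```python
def backpressure_schedule(queues, links):
    """Classic per-link max-weight commodity selection for BP routing.

    queues: dict node -> dict commodity -> backlog
    links:  list of (u, v) directed links
    For each link, pick the single commodity with the largest backlog
    differential Q_u[c] - Q_v[c]; serve it only if the differential is
    positive.  (The MaxU idea in the paper relaxes this exclusive
    selection; only the legacy rule is sketched here.)
    """
    schedule = {}
    for u, v in links:
        best_c, best_w = None, 0.0
        for c in queues[u]:
            w = queues[u][c] - queues[v].get(c, 0.0)
            if w > best_w:
                best_c, best_w = c, w
        if best_c is not None:
            schedule[(u, v)] = (best_c, best_w)
    return schedule

if __name__ == "__main__":
    queues = {"A": {"c1": 8.0, "c2": 3.0},
              "B": {"c1": 2.0, "c2": 5.0}}
    print(backpressure_schedule(queues, links=[("A", "B"), ("B", "A")]))
```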
Visual Heading Prediction for Autonomous Aerial Vehicles
The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous systems for applications such as search and rescue, environmental monitoring, and logistics. However, precise coordination between these platforms in real-time scenarios presents major challenges, particularly when external localization infrastructure such as GPS or GNSS is unavailable or degraded [1]. This paper proposes a vision-based, data-driven framework for real-time UAV-UGV integration, with a focus on robust UGV detection and heading angle prediction for navigation and coordination. The system employs a fine-tuned YOLOv5 model to detect UGVs and extract bounding box features, which are then used by a lightweight artificial neural network (ANN) to estimate the UAV's required heading angle. A VICON motion capture system was used to generate ground-truth data during training, resulting in a dataset of over 13,000 annotated images collected in a controlled lab environment. The trained ANN achieves a mean absolute error of 0.1506° and a root mean squared error of 0.1957°, offering accurate heading angle predictions using only monocular camera inputs. Experimental evaluations achieve 95% accuracy in UGV detection. This work contributes a vision-based, infrastructure-independent solution that demonstrates strong potential for deployment in GPS/GNSS-denied environments, supporting reliable multi-agent coordination under realistic dynamic conditions. A demonstration video showcasing the system's real-time performance, including UGV detection, heading angle prediction, and UAV alignment under dynamic conditions, is available at: https://github.com/Kooroshraf/UAV-UGV-Integration
Resilient Neural-Variable-Structure Consensus Control for Nonlinear MASs with Singular Input Gain Under DoS Attacks
This paper proposes a reliable learning-based adaptive control framework for nonlinear multi-agent systems (MASs) subject to Denial-of-Service (DoS) attacks and singular control gains, two critical challenges in cyber-physical systems. A neural-variable-structure adaptive controller is developed to achieve leader-follower consensus while ensuring robustness to external disturbances and adaptability to unknown nonlinear dynamics. A reliability-assessment rule is introduced to detect communication loss during DoS attacks, upon which a switched control mechanism is activated to preserve closed-loop stability and performance. Unlike existing resilient MAS control methods, the proposed strategy explicitly accommodates singular control gains and does not rely on restrictive assumptions such as Lipschitz continuity or prior bounds on nonlinearities. To the authors' knowledge, this is the first work to integrate neural learning, variable-structure robustness, and reliability-based switching into a unified consensus-tracking control architecture for heterogeneous nonlinear MASs with singular input gains under DoS attacks. Lyapunov-based analysis establishes uniform ultimate boundedness of all closed-loop signals, and Matlab/Simulink simulations on a connected automated vehicle platoon demonstrate the method's effectiveness and resilience.
comment: 15 pages, 9 figures. Submitted to Nonlinear Dynamics
Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies
This work presents a conceptual study on the application of Multi-Agent Reinforcement Learning (MARL) for decentralized control of unmanned aerial vehicles to relay a critical data package to a known position. For this purpose, a family of deterministic games is introduced, designed for scaling studies for MARL. A robust baseline policy is proposed, which is based on restricting agent motion envelopes and applying Dijkstra's algorithm. Experimental results show that two off-the-shelf MARL algorithms perform competitively with the baseline for a small number of agents, but scalability issues arise as the number of agents increases.
An Automated Tip-and-Cue Framework for Optimized Satellite Tasking and Visual Intelligence
The proliferation of satellite constellations, coupled with reduced tasking latency and diverse sensor capabilities, has expanded the opportunities for automated Earth observation. This paper introduces a fully automated Tip-and-Cue framework designed for satellite imaging tasking and scheduling. In this context, tips are generated from external data sources or analyses of prior satellite imagery, identifying spatiotemporal targets and prioritizing them for downstream planning. Corresponding cues are the imaging tasks formulated in response, which incorporate sensor constraints, timing requirements, and utility functions. The system autonomously generates candidate tasks, optimizes their scheduling across multiple satellites using continuous utility functions that reflect the expected value of each observation, and processes the resulting imagery using artificial-intelligence-based models, including object detectors and vision-language models. Structured visual reports are generated to support both interpretability and the identification of new insights for downstream tasking. The efficacy of the framework is demonstrated through a maritime vessel tracking scenario, utilizing Automatic Identification System (AIS) data for trajectory prediction, targeted observations, and the generation of actionable outputs. Maritime vessel tracking is a widely researched application, often used to benchmark novel approaches to satellite tasking, forecasting, and analysis. The system is extensible to broader applications such as smart-city monitoring and disaster response, where timely tasking and automated analysis are critical.
comment: Under review at IEEE Transactions on Geoscience and Remote Sensing (TGRS). 13 pages, 8 figures
Adaptive Optimal Control for Avatar-Guided Motor Rehabilitation in Virtual Reality
A control-theoretic framework for autonomous avatar-guided rehabilitation in virtual reality, based on interpretable, adaptive motor guidance through optimal control, is presented. The framework addresses critical challenges in motor rehabilitation related to accessibility, cost, and continuity of care, with over 50% of patients unable to attend regular clinic sessions. The system enables post-stroke patients to undergo personalized therapy in immersive virtual reality at home, while being monitored by clinicians. The core is a nonlinear, human-in-the-loop control strategy, where the avatar adapts in real time to the patient's performance. A balance between following the patient's movements and guiding them toward ideal kinematic profiles based on the Hogan minimum-jerk model is achieved through multi-objective optimal control. A data-driven "ability index" uses smoothness metrics to dynamically adjust control gains according to the patient's progress. The system was validated through simulations and preliminary trials, and shows potential for delivering adaptive, engaging, and scalable remote physiotherapy guided by interpretable control-theoretic principles.
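The standard minimum-jerk position profile associated with the Hogan model can be computed as below; how the paper blends this profile with patient-following control is not reproduced, and the motion amplitude and duration are illustrative.

```python
import numpy as np

def minimum_jerk(x0, xf, T, n=100):
    """Minimum-jerk position profile between x0 and xf over duration T.

    Uses the standard polynomial 10*tau^3 - 15*tau^4 + 6*tau^5 associated
    with the Hogan minimum-jerk model (zero velocity and acceleration at
    both endpoints).
    """
    t = np.linspace(0.0, T, n)
    tau = t / T
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    return t, x0 + (xf - x0) * s

if __name__ == "__main__":
    t, x = minimum_jerk(0.0, 0.3, T=1.5)   # e.g. a 30 cm reaching motion
    print(x[0], x[-1])                      # starts at 0.0, ends at 0.3
```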
RIS-Assisted Coordinated Multi-Point ISAC for Low-Altitude Sensing Coverage
The low-altitude economy (LAE) has emerged and developed in various fields, which has gained considerable interest. To ensure the security of LAE, it is essential to establish a proper sensing coverage scheme for monitoring the unauthorized targets. Introducing integrated sensing and communication (ISAC) into cellular networks is a promising solution that enables coordinated multiple base stations (BSs) to significantly enhance sensing performance and extend coverage. Meanwhile, deploying a reconfigurable intelligent surface (RIS) can mitigate signal blockages between BSs and low-altitude targets in urban areas. Therefore, this paper focuses on the low-altitude sensing coverage problem in RIS-assisted coordinated multi-point ISAC networks, where a RIS is employed to enable multiple BSs to sense a prescribed region while serving multiple communication users. A joint beamforming and phase shifts design is proposed to minimize the total transmit power while guaranteeing sensing signal-to-noise ratio and communication spectral efficiency. To tackle this non-convex optimization problem, an efficient algorithm is proposed by using the alternating optimization and semi-definite relaxation techniques. Numerical results demonstrate the superiority of our proposed scheme over the baseline schemes.
Real-Time Non-Smooth MPC for Switching Systems: Application to a Three-Tank Process
Real-time model predictive control of non-smooth switching systems remains challenging due to discontinuities and the presence of discrete modes, which complicate numerical integration and optimization. This paper presents a real-time feasible non-smooth model predictive control scheme for a physical three-tank process, implemented without mixed-integer formulations. The approach combines Filippov system modeling with finite elements and switch detection for time discretization, leading to a finite-dimensional optimal control problem formulated as a mathematical program with complementarity constraints. The mathematical program is solved via a homotopy of smooth nonlinear programs. We introduce modeling adjustments that make the three-tank dynamics numerically tractable, including additional modes to avoid non-Lipschitz points and undefined function values. Hardware experiments demonstrate efficient handling of switching events, mode-consistent tracking across reference changes, correct boundary handling, and constraint satisfaction. Furthermore, we investigate the impact of model mismatch and show that the tracking performance and computation times remain within real-time limits for the chosen sampling time. The complete controller is implemented using the non-smooth optimal control framework NOSNOC.
comment: Submitted to European Control Conference 2026 (ECC26)
Instantaneous Complex Phase and Frequency: Conceptual Clarification and Equivalence between Formulations
This letter seeks to clarify the different existing definitions of both instantaneous complex phase and frequency, as well as their equivalence when specific hypotheses hold. To achieve this, the two fundamental definitions, i.e., those based on the use of either (i) analytic signals or (ii) space vectors, together with the premises used for their formulation, are presented and their relationship shown. Lastly, a unified notation and terminology to avoid confusion is proposed.
A tensor phase theory with applications in multilinear control
The purpose of this paper is to initiate a phase theory for tensors under the Einstein product, and explore its applications in multilinear control systems. Firstly, the sectorial tensor decomposition for sectorial tensors is derived, which allows us to define phases for sectorial tensors. A numerical procedure for computing phases of a sectorial tensor is also proposed. Secondly, the maximin and minimax expressions for tensor phases are given, which are used to quantify how close the phases of a sectorial tensor are to those of its compressions. Thirdly, the compound spectrum, compound numerical ranges and compound angular numerical ranges of two sectorial tensors $A,B$ are defined and characterized in terms of the compound numerical ranges and compound angular numerical ranges of the sectorial tensors $A,B$. Fourthly, it is shown that the angles of eigenvalues of the product of two sectorial tensors are upper bounded by the sum of their individual phases. Finally, based on the tensor phase theory developed above, a tensor version of the small phase theorem is presented, which can be regarded as a natural generalization of the matrix case, recently proposed in Ref. [10]. The results offer powerful new tools for the stability and robustness analysis of multilinear feedback control systems.
comment: 16 pages, 1 figure. Comments are welcome
Personalized Building Climate Control with Contextual Preferential Bayesian Optimization
Efficient tuning of building climate controllers to optimize occupant utility is essential for ensuring overall comfort and satisfaction. However, this is a challenging task since the latent utility is difficult to measure directly. Time-varying contextual factors, such as outdoor temperature, further complicate the problem. To address these challenges, we propose a contextual preferential Bayesian optimization algorithm that leverages binary preference feedback together with contextual information to enable efficient real-time controller tuning. We validate the approach by tuning an economic MPC controller on BOPTEST, a high-fidelity building simulation platform. Over a two-month simulation period, our method outperforms the baseline controller and achieves an improvement of up to 23% in utility. Moreover, for different occupant types, we demonstrate that the algorithm automatically adapts to individual preferences, enabling personalized controller tuning.
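As a simplified illustration of learning from binary preference feedback with context (replacing the Gaussian-process surrogate and acquisition step of preferential Bayesian optimization with a plain linear Bradley-Terry model), one could fit pairwise comparisons as follows; the features, the simulated occupant, and the candidate grid are all invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def features(setpoint, context_temp):
    """Feature vector for one controller setting under one context."""
    return np.array([setpoint, setpoint**2, setpoint * context_temp])

def occupant_prefers(a, b, context_temp):
    """Synthetic occupant: prefers setpoints near a context-dependent ideal."""
    ideal = 21.0 + 0.1 * context_temp
    return abs(a - ideal) < abs(b - ideal)

# Collect binary preference feedback on random pairs of settings.
X, y = [], []
for _ in range(200):
    ctx = rng.uniform(-5, 30)                 # outdoor temperature
    a, b = rng.uniform(18, 26, size=2)        # candidate setpoints
    diff = features(a, ctx) - features(b, ctx)
    label = int(occupant_prefers(a, b, ctx))
    X.append(diff)
    y.append(label)
    X.append(-diff)                           # symmetric sample
    y.append(1 - label)

model = LogisticRegression().fit(np.array(X), np.array(y))

# Pick the best candidate setpoint for a new context.
ctx = 5.0
candidates = np.linspace(18, 26, 81)
scores = [model.decision_function(features(c, ctx).reshape(1, -1))[0]
          for c in candidates]
print("chosen setpoint:", candidates[int(np.argmax(scores))])
```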
Power Control of Multi-Layer Repeater Networks (POLARNet)
In this letter we introduce POLARNet -- power control of multi-layer repeater networks -- for local optimization of SNR given different repeater power constraints. We assume relays or repeaters in groups or layers that are spatially separated. Under idealized circumstances, namely SISO narrow-band communication and TDD, the system may be viewed as a dual to a deep neural network, where activations, corresponding to repeater amplifications, are optimized and weight matrices, corresponding to channel matrices, are static. Repeater amplifications are locally optimized layer-by-layer in a forward-backward manner over compact sets. The method is applicable for a wide range of constraints on within-layer power/energy utilization, is furthermore gradient-free and step-size-free, and has proven monotonicity in the objective. Numerical simulations show significant improvement compared to upper bounds on the expected SNR. In addition, power distribution over multiple repeaters is shown to be superior to optimal selection of single repeaters in the layers.
comment: This work has been submitted to the IEEE for possible publication
Time-Discretized Simulation of Vehicle Platoons for Safety Analysis with Guaranteed Error Bounds SC
Wireless communication is essential to achieve coordinated control in vehicle platoons. However, packet losses in wireless communication can cause critical safety issues when they occur in conjunction with sudden brakes. In this paper, we propose simulation-based methods that allow the study of such safety issues by determining the absolute minimum distance between vehicles over time for various control parameters that guarantee string stability. For our proposed time-discretized simulations, we provide two methods for selecting different time-step intervals to ensure that the error in distance approximation remains within specified bounds at all times. Through numerical examples we demonstrate that among control parameters that guarantee string stability some perform better than others under simultaneously occurring packet losses and sudden brakes.
comment: 9 pages, 5 figures. Journal version of Y. Chen, A. Cetinkaya, "Time-discretized simulation approach for safety analysis of vehicle platoons under sudden braking and communication losses", SICE International Symposium on Control Systems (ISCS), 2025
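A crude time-discretized sudden-braking simulation that reports the minimum inter-vehicle gap is sketched below; the constant-deceleration model, reaction delay, and all numerical values are illustrative and carry none of the paper's error-bound guarantees.

```python
import numpy as np

def min_gap_under_braking(n_vehicles=4, gap0=10.0, v0=20.0,
                          lead_decel=6.0, follower_decel=6.0,
                          reaction_delay=0.4, dt=0.01, horizon=10.0):
    """Minimum inter-vehicle distance when the leader brakes suddenly.

    Followers apply the same deceleration after a fixed reaction delay
    (a crude stand-in for communication/packet delay).  The model and
    all values are illustrative only.
    """
    steps = int(horizon / dt)
    pos = np.array([-i * gap0 for i in range(n_vehicles)], dtype=float)
    vel = np.full(n_vehicles, v0)
    min_gap = np.inf
    for k in range(steps):
        t = k * dt
        acc = np.zeros(n_vehicles)
        acc[0] = -lead_decel                 # leader brakes immediately
        if t >= reaction_delay:
            acc[1:] = -follower_decel        # followers brake after the delay
        vel = np.maximum(vel + acc * dt, 0.0)
        pos = pos + vel * dt
        min_gap = min(min_gap, np.min(pos[:-1] - pos[1:]))
    return min_gap

if __name__ == "__main__":
    print(f"minimum gap: {min_gap_under_braking():.2f} m")
```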
Intelligent Resilience Testing for Decision-Making Agents with Dual-Mode Surrogate Adaptation
Testing and evaluating decision-making agents remains challenging due to unknown system architectures, limited access to internal states, and the vastness of high-dimensional scenario spaces. Existing testing approaches often rely on surrogate models of decision-making agents to generate large-scale scenario libraries; however, discrepancies between surrogate models and real decision-making agents significantly limit their generalizability and practical applicability. To address this challenge, this paper proposes intelligent resilience testing (IRTest), a unified online adaptive testing framework designed to rapidly adjust to diverse decision-making agents. IRTest initializes with an offline-trained surrogate prediction model and progressively reduces surrogate-to-real gap during testing through two complementary adaptation mechanisms: (i) online neural fine-tuning in data-rich regimes, and (ii) lightweight importance-sampling-based weighting correction in data-limited regimes. A Bayesian optimization strategy, equipped with bias-corrected acquisition functions, guides scenario generation to balance exploration and exploitation in complex testing spaces. Extensive experiments across varying levels of task complexity and system heterogeneity demonstrate that IRTest consistently improves failure-discovery efficiency, testing robustness, and cross-system generalizability. These results highlight the potential of IRTest as a practical solution for scalable, adaptive, and resilient testing of decision-making agents.
comment: 14 pages, 7 figures
Analysis of Frequency and Voltage Strength in Power Electronics-Dominated Power Systems Based on Eigen-subsystems
The large-scale integration of inverter-based resources (IBRs) has deteriorated the frequency/voltage (F/V) responses of power systems, leading to a higher risk of instability. Consequently, evaluating the F/V strength has become an important task in power electronics (PE)-dominated power systems. Existing methods typically examine F/V strength separately, employing fundamentally different metrics, such as inertia (focusing on device dynamics) and short-circuit ratio (SCR, addressing network characteristics). These fragmented approaches have resulted in a lack of comprehensive understanding of the overall system strength, potentially overlooking critical aspects. To address this problem, this paper proposes a unified framework for analyzing F/V strength. First, a unified modeling of F/V regulations is introduced. Then, based on modal decoupling, the power systems are decomposed into several eigen-subsystems, where the F/V responses are both decomposed into common-mode (CM) and differential-mode (DM) components, namely, CM-F, DM-F, CM-V, and DM-V. The CM-F and CM-V represent the collective response of all devices to external active or reactive power disturbances, independent of the power network characteristics. In contrast, the DM-F and DM-V capture the redistribution of disturbance power within the system, which is strongly influenced by the network topology and the locations of devices. Notably, traditional strength analysis generally ignores the CM-V (global voltage response), which, as discovered in this paper, may also become unstable in PE-dominated power systems. Based on the proposed framework, new metrics are proposed to evaluate the strength of each modal component. Finally, the effectiveness of the proposed approach is validated through simulations.
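As a toy illustration of the modal-decoupling idea (not the paper's unified F/V model), the sketch below eigendecomposes a graph Laplacian standing in for the network coupling: the zero-eigenvalue mode is uniform across buses and plays the role of a common-mode (collective) response, while the remaining modes capture differential-mode redistribution between buses. The network and disturbance are made-up examples.

```python
# Toy sketch of common-mode vs. differential-mode decomposition via a graph
# Laplacian standing in for the network coupling; all numbers are illustrative.
import numpy as np

# 4-bus ring network: Laplacian L = D - A
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)
print("eigenvalues:", np.round(eigvals, 3))

# The zero eigenvalue has a uniform eigenvector: all buses move together
# (common mode); the other modes describe redistribution between buses
# (differential modes).
common_mode = eigvecs[:, 0]
print("common-mode shape:", np.round(common_mode, 3))

# Project a disturbance (power step at bus 0) onto the modes.
disturbance = np.array([1.0, 0.0, 0.0, 0.0])
modal_participation = eigvecs.T @ disturbance
print("modal participation:", np.round(modal_participation, 3))
```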
Electric Arc Furnaces Scheduling under Electricity Price Volatility with Reinforcement Learning
This paper proposes a reinforcement learning-based framework for optimizing the operation of electric arc furnaces (EAFs) under volatile electricity prices. We formulate the deterministic version of the EAF scheduling problem as a mixed-integer linear program (MILP), and then develop a Q-learning algorithm to perform real-time control of multiple EAF units under real-time price volatility and shared feeding capacity constraints. We design a custom reward function for the Q-learning algorithm to smooth the start-up penalties of the EAFs. Using real data from EAF designs and electricity prices in New York State, we benchmark our algorithm against a baseline rule-based controller and a MILP benchmark, assuming perfect price forecasts. The results show that our reinforcement learning algorithm achieves around 90% of the profit of the perfect MILP benchmark in various single-unit and multi-unit cases under a non-anticipatory control setting.
comment: 10 pages, 5 figures, 7 tables
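The multi-unit coupling and reward shaping are specific to the paper; as a hedged single-unit sketch, the tabular Q-learning loop below schedules one furnace against an exogenous discretized price signal, with a start-up penalty folded into the reward. All states, prices, and cost parameters are illustrative assumptions.

```python
# Minimal tabular Q-learning sketch for single-unit on/off scheduling under
# volatile prices; all parameters are illustrative, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
H = 24                      # hours per episode
n_price_bins = 5            # discretized price levels
power, value, startup_cost = 50.0, 40.0, 500.0   # MW, $/MWh-equivalent, $

# State: (hour, price bin, previous action) -> Q-values over actions {idle, run}
Q = np.zeros((H, n_price_bins, 2, 2))
alpha, gamma, eps = 0.1, 0.99, 0.1

def reward(price_bin, prev_a, a):
    price = 20.0 + 15.0 * price_bin          # $/MWh for this bin
    r = a * (value * power - price * power)  # production value minus energy cost
    if a == 1 and prev_a == 0:
        r -= startup_cost                    # penalize (smooth) start-ups
    return r

for episode in range(5000):
    prices = rng.integers(n_price_bins, size=H)   # exogenous hourly price bins
    prev_a = 0
    for h in range(H):
        p = prices[h]
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[h, p, prev_a]))
        r = reward(p, prev_a, a)
        if h < H - 1:
            target = r + gamma * Q[h + 1, prices[h + 1], a].max()
        else:
            target = r
        Q[h, p, prev_a, a] += alpha * (target - Q[h, p, prev_a, a])
        prev_a = a
```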
MPC for momentum counter-balanced and zero-impulse contact with a free-spinning satellite
In on-orbit robotics, a servicer satellite's ability to make contact with a free-spinning target satellite is essential to completing most on-orbit servicing (OOS) tasks. This manuscript develops a nonlinear model predictive control (MPC) framework that generates feasible controls for a servicer satellite to achieve zero-impulse contact with a free-spinning target satellite. The overall maneuver requires coordination between two separately actuated modules of the servicer satellite: (1) a moment generation module and (2) a manipulation module. We apply MPC to control both modules by explicitly modeling the cross-coupling dynamics between them. We demonstrate that the MPC controller can enforce actuation and state constraints that prior control approaches could not account for. We evaluate the performance of the MPC controller by simulating zero-impulse contact scenarios with a free-spinning target satellite via numerical Monte Carlo (MC) trials and comparing the simulation results with prior control approaches. Our simulation results validate the effectiveness of the MPC controller in maintaining spin synchronization and zero-impulse contact under operation constraints, moving contact location, and observation and actuation noise.
comment: 21 pages, 4 figures, 5 tables, submission for AIAA SciTech 2026 conference
Explicit Control Barrier Function-based Safety Filters and their Resource-Aware Computation
This paper studies the efficient implementation of safety filters that are designed using control barrier functions (CBFs), which minimally modify a nominal controller to render it safe with respect to a prescribed set of states. Although CBF-based safety filters are often implemented by solving a quadratic program (QP) in real time, the use of off-the-shelf solvers for such optimization problems poses a challenge in applications where control actions need to be computed efficiently at very high frequencies. In this paper, we introduce a closed-form expression for controllers obtained through CBF-based safety filters. This expression is obtained by partitioning the state-space into different regions, with a different closed-form solution in each region. We leverage this formula to introduce a resource-aware implementation of CBF-based safety filters that detects changes in the partition region and uses the closed-form expression between changes. We showcase the applicability of our approach in examples ranging from aerospace control to safe reinforcement learning.
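For the common single-constraint case with cost $\|u - u_{\mathrm{nom}}\|^2$, the safety-filter QP has a well-known projection-style closed form; the sketch below shows only that textbook structure, not the paper's region-partitioned expression or its resource-aware triggering.

```python
# Sketch of the closed-form CBF safety filter for a single affine constraint
#   min_u ||u - u_nom||^2  s.t.  Lf_h + Lg_h @ u + alpha(h) >= 0.
# Standard one-constraint case only, not the paper's region-partitioned expression.
import numpy as np

def cbf_filter(u_nom, Lf_h, Lg_h, h, alpha=lambda h: 1.0 * h):
    """Return the minimally modified safe input."""
    a = Lf_h + alpha(h)                       # constraint offset
    b = np.asarray(Lg_h, float)               # constraint gradient w.r.t. u
    viol = a + b @ u_nom                      # constraint value at the nominal input
    if viol >= 0.0 or not np.any(b):
        return np.asarray(u_nom, float)       # nominal input already safe (or b = 0)
    # Project u_nom onto the half-space {u : a + b.u >= 0}
    return np.asarray(u_nom, float) - (viol / (b @ b)) * b

# Example: 2D single integrator staying in the half-plane x1 <= 2, h(x) = 2 - x1
x = np.array([1.8, 0.0])
h = 2.0 - x[0]
u_safe = cbf_filter(u_nom=np.array([1.0, 0.0]), Lf_h=0.0,
                    Lg_h=np.array([-1.0, 0.0]), h=h)
print(u_safe)   # the x1-velocity is capped at 0.2 so that -u1 + h >= 0
```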
CHyLL: Learning Continuous Neural Representations of Hybrid Systems
Learning the flows of hybrid systems that have both continuous and discrete time dynamics is challenging. Existing methods learn the dynamics in each discrete mode separately, an approach that suffers from the combination of mode switching and discontinuities in the flows. In this work, we propose CHyLL (Continuous Hybrid System Learning in Latent Space), which learns a continuous neural representation of a hybrid system without trajectory segmentation, event functions, or mode switching. The key insight of CHyLL is that the reset map glues the state space at the guard surface, reformulating the state space as a piecewise smooth quotient manifold where the flow becomes spatially continuous. Building upon these insights and the embedding theorems grounded in differential topology, CHyLL concurrently learns a singularity-free neural embedding in a higher-dimensional space and the continuous flow in it. We showcase that CHyLL can predict the flow of hybrid systems with superior accuracy and identify the topological invariants of the hybrid systems. Finally, we apply CHyLL to the stochastic optimal control problem.
Fuzzy Hierarchical Multiplex
A new fuzzy optimization framework that extends FCM causality is proposed. This model utilizes the dynamics to map data into metrics and creates a framework that examines logical implication and hierarchy of concepts using a multiplex. Moreover, this is a theoretical white paper introducing the framework and analyzing the logic and mathematics behind it. Building on this extension, the main objectives and the orientation of the framework are expounded and exemplified; the framework is intended for service optimization of information transmission in service process design. Lastly, a thorough analysis of the FHM is included, carried out by following the logical steps in a simple and elegant manner.
comment: 11 pages, 2 figures, 1 double figure, 1 table, 12 references. This will be part of my PhD dissertation and it is a white paper presenting a theoretical framework. As is, it is meant as a basis that will later be used to further develop the FHM. It might not be math-logic related and I am willing to change it; I just felt that it belonged to mathematical modeling. Yours truly, AK
C*: A Coverage Path Planning Algorithm for Unknown Environments using Rapidly Covering Graphs
The paper presents a novel sample-based algorithm, called C*, for real-time coverage path planning (CPP) of unknown environments. C* is built upon the concept of a Rapidly Covering Graph (RCG), which is incrementally constructed during robot navigation via progressive sampling of the search space. By using efficient sampling and pruning techniques, the RCG is constructed to be a minimum-sufficient graph, where its nodes and edges form the potential waypoints and segments of the coverage trajectory, respectively. The RCG tracks the coverage progress, generates the coverage trajectory, and helps the robot escape from dead-end situations. To minimize coverage time, C* produces the desired back-and-forth coverage pattern, while adapting to the TSP-based optimal coverage of local isolated regions, called coverage holes, which are surrounded by obstacles and covered regions. It is analytically proven that C* provides complete coverage of unknown environments. The algorithmic simplicity and low computational complexity of C* make it easy to implement and suitable for real-time on-board applications. The performance of C* is validated by 1) extensive high-fidelity simulations and 2) laboratory experiments using an autonomous robot. C* yields near-optimal trajectories, and a comparative evaluation with seven existing CPP methods demonstrates significant improvements in performance in terms of coverage time, number of turns, trajectory length, and overlap ratio, while preventing the formation of coverage holes. Finally, C* is comparatively evaluated on two different CPP applications using 1) energy-constrained robots and 2) multi-robot teams.
Control Node Placement and Structural Controllability of Water Quality Dynamics in Drinking Networks
Chlorine, the most widely used disinfectant, needs to be adequately distributed in water distribution networks (WDNs) to maintain consistent residual levels and ensure safe water. This is performed through control node injections at the treatment plant via booster stations distributed across the WDNs. While previous studies have applied various optimization-based approaches for booster station placement, many have failed to consider the coverage of the station injections and the dynamic nature of WDNs. In particular, variations in hydraulics and demand significantly impact the reachability and efficacy of chlorine injections which then impact optimal placement of booster stations. This study introduces a novel formulation that combines control- and graph-theoretic approaches to solve the booster station placement problem. Unlike traditional methods, our approach emphasizes maximizing the system's ability to control disinfectant levels with minimal control energy, taking into account the time-varying hydraulic profiles that lead to different optimal station placements. We propose a simple weighting technique to determine the placements by assessing the structural controllability of each configuration, based on the network's topology, independent of specific parameters like decay rates or pipe roughness. This method ensures effective chlorine coverage across the network. Our approach is validated on different networks, demonstrating its operational effectiveness, scalability, and practicality.
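As a hedged sketch of the control-theoretic ingredient (not the paper's graph-weighting scheme or structural-controllability test), the snippet below scores candidate injection-node sets on a toy stable linear network model by the trace of the controllability Gramian, a standard proxy for how easily a node set can steer the state with limited control energy. The dynamics matrix is a made-up example, not a calibrated water-quality model.

```python
# Sketch: rank candidate injection-node sets by a controllability-energy proxy
# (trace of the controllability Gramian) on a toy stable linear network model.
import itertools
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

n = 6
rng = np.random.default_rng(1)
A = -np.eye(n) + 0.2 * rng.standard_normal((n, n))   # assumed stable decay/transport matrix

def gramian_trace(nodes):
    B = np.zeros((n, len(nodes)))
    for j, i in enumerate(nodes):
        B[i, j] = 1.0
    # Controllability Gramian W solves A W + W A^T + B B^T = 0
    W = solve_continuous_lyapunov(A, -B @ B.T)
    return np.trace(W)

best = max(itertools.combinations(range(n), 2), key=gramian_trace)
print("best 2-node placement (toy metric):", best)
```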
InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Power Grid Control
The transition toward power grids with high renewable penetration demands context-aware decision-making frameworks. Traditional operational paradigms, which rely on static optimization of history-based load forecasting, often fail to capture the complex nature of real-time operational conditions, such as operator-issued maintenance mandates, emergency topology changes, or event-driven load surges. To address this challenge, we introduce InstructMPC, a closed-loop framework that integrates Large Language Models (LLMs) to generate context-aware predictions, enabling the controller to optimize power system operation. Our method employs a Contextual Disturbances Predictor (CDP) module to translate contextual information into predictive disturbance trajectories, which are then incorporated into the Model Predictive Control (MPC) optimization. Unlike conventional open-loop forecasting frameworks, InstructMPC features an online tuning mechanism in which the predictor's parameters are continuously updated based on the realized control cost, with a theoretical guarantee: when optimized via a tailored loss function, it achieves a regret bound of $O(\sqrt{T \log T})$ for linear dynamics, ensuring task-aware learning and adaptation to non-stationary grid conditions.
Model-free Adaptive Output Feedback Vibration Suppression in a Cantilever Beam
This paper presents a model-free adaptive control approach to suppress vibrations in a cantilevered beam excited by an unknown disturbance. The cantilevered beam under harmonic excitation is modeled using a lumped parameter approach. Based on retrospective cost optimization, a sampled-data adaptive controller is developed to suppress vibrations caused by external disturbances. Both displacement and acceleration measurements are considered for feedback. Since acceleration measurements are more sensitive to spillover, which excites higher frequency modes, a filter is developed to extract key displacement information from the acceleration data and enhance suppression performance. The vibration suppression performance is compared using both displacement and acceleration measurements.
comment: 16 pages, 14 figures, to be presented at Scitech 2026
No-Regret Learning in Stackelberg Games with an Application to Electric Ride-Hailing
We consider the problem of efficiently learning to play single-leader multi-follower Stackelberg games when the leader lacks knowledge of the lower-level game. Such games arise in hierarchical decision-making problems involving self-interested agents. For example, in electric ride-hailing markets, a central authority aims to learn optimal charging prices to shape fleet distributions and charging patterns of ride-hailing companies. Existing works typically apply gradient-based methods to find the leader's optimal strategy. Such methods are impractical as they require that the followers share private utility information with the leader. Instead, we treat the lower-level game as a black box, assuming only that the followers' interactions approximate a Nash equilibrium while the leader observes the realized cost of the resulting approximation. Under kernel-based regularity assumptions on the leader's cost function, we develop a no-regret algorithm that converges to an $\epsilon$-Stackelberg equilibrium in $O(\sqrt{T})$ rounds. Finally, we validate our approach through a numerical case study on optimal pricing in electric ride-hailing markets.
comment: 8 pages, 2 figures, 1 table
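As a minimal sketch of the kernel-based bandit idea (not the paper's algorithm or its regret analysis), the snippet below runs a Gaussian-process lower-confidence-bound loop over a one-dimensional charging price, treating the followers' near-equilibrium response as a black box that only returns the realized leader cost. The cost function, kernel, and constants are illustrative assumptions.

```python
# Sketch of a kernelized bandit (GP-LCB) loop for a leader choosing a price
# while observing only the realized cost of the followers' black-box response.
import numpy as np

rng = np.random.default_rng(0)

def leader_cost(price):
    """Black-box stand-in for the followers' near-equilibrium response."""
    return (price - 0.6) ** 2 + 0.05 * rng.standard_normal()

def rbf(a, b, ls=0.15):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

grid = np.linspace(0.0, 1.0, 101)     # candidate prices
X, y = [], []
noise, beta = 1e-2, 2.0

for t in range(30):
    if not X:
        x_next = rng.choice(grid)
    else:
        Xa, ya = np.array(X), np.array(y)
        K = rbf(Xa, Xa) + noise * np.eye(len(Xa))
        Kinv = np.linalg.inv(K)
        ks = rbf(grid, Xa)                       # (101, t) cross-covariances
        mu = ks @ Kinv @ ya
        var = 1.0 - np.einsum("ij,jk,ik->i", ks, Kinv, ks)
        lcb = mu - beta * np.sqrt(np.maximum(var, 0.0))
        x_next = grid[np.argmin(lcb)]            # minimize cost => lower confidence bound
    X.append(x_next)
    y.append(leader_cost(x_next))

print("best observed price:", X[int(np.argmin(y))])
```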
Hybrid A* Path Planning with Multi-Modal Motion Extension for Four-Wheel Steering Mobile Robots
Four-wheel independent steering (4WIS) systems provide mobile robots with a rich set of motion modes, such as Ackermann steering, lateral steering, and parallel movement, offering superior maneuverability in constrained environments. However, existing path planning methods generally assume a single kinematic model and thus fail to fully exploit the multi-modal capabilities of 4WIS platforms. To address this limitation, we propose an extended Hybrid A* framework that operates in a four-dimensional state space incorporating both spatial states and motion modes. Within this framework, we design multi-modal Reeds-Shepp curves tailored to the distinct kinematic constraints of each motion mode, develop an enhanced heuristic function that accounts for mode-switching costs, and introduce a terminal connection strategy with intelligent mode selection to ensure smooth transitions between different steering patterns. The proposed planner enables seamless integration of multiple motion modalities within a single path, significantly improving flexibility and adaptability in complex environments. Results demonstrate significantly improved planning performance for 4WIS robots in complex environments.
comment: Corrected some formatting errors
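A full Hybrid A* with multi-modal Reeds-Shepp expansions is beyond a snippet; as a simplified stand-in, the sketch below runs plain A* over a grid state (x, y, mode), where an Ackermann-like mode moves longitudinally, a lateral mode moves sideways, and switching modes incurs an extra edge cost. The paper's mode-aware heuristic and terminal connection strategy are not reproduced; grid, costs, and modes are illustrative.

```python
# Simplified stand-in for a mode-aware planner: A* over (x, y, mode) where
# mode 0 ("ackermann") moves along x, mode 1 ("lateral") moves along y, and
# switching modes costs extra. Grid, costs, and modes are illustrative.
import heapq

MOVES = {0: [(1, 0), (-1, 0)],      # mode 0: longitudinal motion only
         1: [(0, 1), (0, -1)]}      # mode 1: lateral motion only
SWITCH_COST = 3.0

def heuristic(s, goal):
    # Manhattan distance; admissible since every cell move costs >= 1
    return abs(s[0] - goal[0]) + abs(s[1] - goal[1])

def plan(start, goal, size=10):
    open_set = [(heuristic(start, goal), 0.0, start, [start])]
    best = {}
    while open_set:
        f, g, s, path = heapq.heappop(open_set)
        if (s[0], s[1]) == goal[:2]:
            return path
        if best.get(s, float("inf")) <= g:
            continue
        best[s] = g
        x, y, mode = s
        # same-mode motion primitives
        for dx, dy in MOVES[mode]:
            nx, ny = x + dx, y + dy
            if 0 <= nx < size and 0 <= ny < size:
                ns = (nx, ny, mode)
                heapq.heappush(open_set, (g + 1 + heuristic(ns, goal), g + 1, ns, path + [ns]))
        # mode switch in place, penalized in the edge cost
        ns = (x, y, 1 - mode)
        heapq.heappush(open_set, (g + SWITCH_COST + heuristic(ns, goal), g + SWITCH_COST, ns, path + [ns]))
    return None

print(plan(start=(0, 0, 0), goal=(5, 4)))
```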
CARTOS: A Charging-Aware Real-Time Operating System for Intermittent Batteryless Devices
This paper presents CARTOS, a charging-aware real-time operating system designed to enhance the functionality of intermittently-powered batteryless devices (IPDs) for various Internet of Things (IoT) applications. While IPDs offer significant advantages such as extended lifespan and operability in extreme environments, they pose unique challenges, including the need to ensure forward progress of program execution amidst variable energy availability and maintaining reliable real-time behavior during power disruptions. To address these challenges, CARTOS introduces a mixed-preemption scheduling model that classifies tasks into computational and peripheral tasks, and ensures their efficient and timely execution by adopting just-in-time checkpointing for divisible computation tasks and uninterrupted execution for indivisible peripheral tasks. CARTOS also supports processing chains of tasks with precedence constraints and adapts its scheduling in response to environmental changes to offer continuous execution under diverse conditions. CARTOS is implemented with new APIs and components added to FreeRTOS but is designed for portability to other embedded RTOSs. Through real hardware experiments and simulations, CARTOS exhibits superior performance over state-of-the-art methods, demonstrating that it can serve as a practical platform for developing resilient, real-time sensing applications on IPDs.
comment: To be published in IEEE Transactions on Emerging Topics in Computing
System Identification Under Multi-rate Sensing Environment
This paper proposes a system identification algorithm for systems with multi-rate sensors in a discrete-time framework. It is challenging to obtain an accurate mathematical model when the sampling rates of the inputs and outputs differ. A cyclic reformulation-based model for multi-rate systems is formulated, and the multi-rate system can be reduced to a linear time-invariant system to derive the model under the multi-rate sensing environment. The proposed algorithm integrates a cyclic reformulation with a state coordinate transformation of the cycled system to enable precise identification of systems under the multi-rate sensing environment. The effectiveness of the proposed system identification method is demonstrated using numerical simulations.
No-Regret Model Predictive Control with Online Learning of Koopman Operators
We study a problem of simultaneous system identification and model predictive control of nonlinear systems. Particularly, we provide an algorithm for systems with unknown residual dynamics that can be expressed by Koopman operators. Such residual dynamics can model external disturbances and modeling errors, such as wind and wave disturbances to aerial and marine vehicles, or inaccurate model parameters. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal non-causal controller. Specifically, the algorithm enjoys sublinear \textit{dynamic regret}, defined herein as the suboptimality against an optimal clairvoyant controller that knows how the unknown dynamics will adapt to its states and actions. To this end, we assume the algorithm is given Koopman observable functions such that the unknown dynamics can be approximated by a linear dynamical system. Then, it employs model predictive control based on the current learned model of the unknown residual dynamics. This model is updated online using least squares in a self-supervised manner based on the data collected while controlling the system. We validate our algorithm in physics-based simulations of a cart-pole system aiming to maintain the pole upright despite inaccurate model parameters.
comment: ACC 2025
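As a hedged illustration of the learning step only (not the regret analysis or the MPC loop), the snippet below fits a linear model in a dictionary of Koopman observables by least squares from logged transitions, in the self-supervised spirit described, and uses it for one-step prediction. The observables and the toy pendulum-like data generator are assumptions.

```python
# Sketch: fit residual dynamics as a linear model in Koopman observables via
# least squares, then predict one step ahead. Observables and the toy data
# generator are illustrative assumptions, not the paper's setup.
import numpy as np

def lift(x):
    # Simple observable dictionary for a 2D state [theta, theta_dot]
    th, thd = x
    return np.array([th, thd, np.sin(th), np.cos(th), th * thd])

def true_step(x, u, dt=0.05):
    th, thd = x
    thdd = -9.81 * np.sin(th) + u          # toy pendulum-like dynamics
    return np.array([th + dt * thd, thd + dt * thdd])

rng = np.random.default_rng(0)
X, U, Xn = [], [], []
x = np.array([0.1, 0.0])
for _ in range(500):
    u = rng.uniform(-1, 1)
    xn = true_step(x, u)
    X.append(lift(x)); U.append([u]); Xn.append(lift(xn))
    x = xn if np.all(np.abs(xn) < 10) else np.array([0.1, 0.0])

Z = np.hstack([np.array(X), np.array(U)])          # rows [z_k, u_k]
# Self-supervised least squares: lifted next state ≈ [z_k, u_k] @ [A^T; B^T]
AB, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A, B = AB[:-1].T, AB[-1:].T

z = lift(np.array([0.2, 0.0]))
z_pred = A @ z + B @ np.array([0.0])
print("predicted lifted next state:", np.round(z_pred, 3))
```

An MPC layer would then optimize inputs against this learned linear model, refitting A and B online as new transitions are collected.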
Distributionally Robust Kalman Filter
In this work, we propose a noise-centric formulation of the distributionally robust Kalman filter (DRKF) for discrete-time linear stochastic systems with uncertain noise statistics. By placing Wasserstein ambiguity sets directly on the process and measurement noise distributions, the proposed DRKF preserves the analytical structure of the classical Kalman filter while providing a priori spectral bounds on all feasible covariances. In the time-invariant setting, we derive a steady-state DRKF from a single stationary semidefinite program, yielding a constant-gain estimator with the same per-step computational complexity as the standard Kalman filter. We establish conditions guaranteeing the existence, uniqueness, and convergence of this steady-state solution, and we prove its asymptotic minimax optimality with respect to the worst-case mean-square error. Numerical experiments validate the theory and demonstrate that the proposed DRKF improves estimation accuracy under unknown or uncertain noise models while offering computational advantages over existing robust and distributionally robust filters.
Long-Duration Station-Keeping Strategy for Cislunar Spacecraft Formations
This paper demonstrates a novel guidance and control strategy for cislunar near-rectilinear halo orbit formation-keeping applied to high-fidelity dynamics. Bounded relative motion is constructed about long-duration ephemeris trajectories with osculating invariant circles to form quasi-periodic relative orbits. State-of-the-art absolute control strategies are paired with a simple and effective relative control feedback law. Finally, a control barrier function is implemented to ensure recursively passively-safe bounded relative motion under feedback in the presence of possible missed maneuver events for the duration of the formation flight. The strategy is verified in high-fidelity simulation environments through Monte Carlo trials.
Consensus over Clustered Networks Using Intermittent and Asynchronous Output Feedback
Distributed consensus protocols provide a mechanism for spreading information within clustered networks, allowing agents and clusters to make decisions without requiring direct access to the state of the ensemble. In this work, we propose a strategy for achieving system-wide consensus in the states of identical linear time-invariant systems coupled by an undirected graph whose directed sub-graphs are available only at sporadic times. Within this work, the agents of the network are organized into pairwise disjoint clusters, which induce sub-graphs of the undirected parent graph. Some cluster sub-graph pairs are linked by an inter-cluster sub-graph, where the union of all cluster and inter-cluster sub-graphs yields the undirected parent graph. Each agent utilizes a distributed consensus protocol with components that are updated intermittently and asynchronously with respect to other agents and inter-clusters. The closed-loop ensemble dynamics is modeled as a hybrid system, and a Lyapunov-based stability analysis yields sufficient conditions for rendering the agreement subspace (consensus set) globally exponentially stable. Furthermore, an input-to-state stability argument demonstrates the consensus set is robust to a large class of perturbations. A numerical simulation considering both nominal and perturbed scenarios is provided for validation purposes.
Robotics
OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer
Human video demonstrations provide abundant training data for learning robot policies, but video alone cannot capture the rich contact signals critical for mastering manipulation. We introduce OSMO, an open-source wearable tactile glove designed for human-to-robot skill transfer. The glove features 12 three-axis tactile sensors across the fingertips and palm and is designed to be compatible with state-of-the-art hand-tracking methods for in-the-wild data collection. We demonstrate that a robot policy trained exclusively on human demonstrations collected with OSMO, without any real robot data, is capable of executing a challenging contact-rich manipulation task. By equipping both the human and the robot with the same glove, OSMO minimizes the visual and tactile embodiment gap, enabling the transfer of continuous shear and normal force feedback while avoiding the need for image inpainting or other vision-based force inference. On a real-world wiping task requiring sustained contact pressure, our tactile-aware policy achieves a 72% success rate, outperforming vision-only baselines by eliminating contact-related failure modes. We release complete hardware designs, firmware, and assembly instructions to support community adoption.
comment: Project website: https://jessicayin.github.io/osmo_tactile_glove/
LiDAS: Lighting-driven Dynamic Active Sensing for Nighttime Perception
Nighttime environments pose significant challenges for camera-based perception, as existing methods passively rely on the scene lighting. We introduce Lighting-driven Dynamic Active Sensing (LiDAS), a closed-loop active illumination system that combines off-the-shelf visual perception models with high-definition headlights. Rather than uniformly brightening the scene, LiDAS dynamically predicts an optimal illumination field that maximizes downstream perception performance, i.e., decreasing light on empty areas to reallocate it on object regions. LiDAS enables zero-shot nighttime generalization of daytime-trained models through adaptive illumination control. Trained on synthetic data and deployed zero-shot in real-world closed-loop driving scenarios, LiDAS enables +18.7% mAP50 and +5.0% mIoU over standard low-beam at equal power. It maintains performance while reducing energy use by 40%. LiDAS complements domain-generalization methods, further strengthening robustness without retraining. By turning readily available headlights into active vision actuators, LiDAS offers a cost-effective solution for robust nighttime perception.
comment: Preprint. 12 pages, 9 figures. Project page: https://simondemoreau.github.io/LiDAS/
Accelerated Rotation-Invariant Convolution for UAV Image Segmentation
Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-scale details. Conventional segmentation architectures like U-Net rely on convolution operators that are not rotation-invariant, leading to degraded segmentation accuracy across varying viewpoints. Rotation invariance can be achieved by expanding the filter bank across multiple orientations; however, this will significantly increase computational cost and memory traffic. In this paper, we introduce a GPU-optimized rotation-invariant convolution framework that eliminates the traditional data-lowering (im2col) step required for matrix-multiplication-based convolution. By exploiting structured data sharing among symmetrically rotated filters, our method achieves multi-orientation convolution with greatly reduced memory traffic and computational redundancy. We further generalize the approach to accelerate convolution with arbitrary (non-symmetric) rotation angles. Across extensive benchmarks, the proposed convolution achieves 20--55% faster training and 15--45% lower energy consumption than CUDNN, while maintaining accuracy comparable to state-of-the-art rotation-invariant methods. In the eight-orientation setting, our approach achieves up to 45% speedup and 41% energy savings on 256\(\times\)256 inputs, and 32% speedup and 23% lower energy usage on 1024\(\times\)1024 inputs. Integrated into a U-Net segmentation model, the framework yields up to 6% improvement in accuracy over the non-rotation-aware baseline. These results demonstrate that the proposed method provides an effective and highly efficient alternative to existing rotation-invariant CNN frameworks.
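As a reference point for the conventional scheme the paper accelerates (not its im2col-free GPU kernels), the sketch below obtains rotation invariance the standard way: convolve with a bank of 90-degree-rotated copies of each filter and max-pool over orientations. Tensor sizes are illustrative.

```python
# Baseline rotation-invariant convolution: convolve with 4 rotated copies of
# each filter and max-pool over the orientation axis. This is the conventional
# multi-orientation scheme, not the paper's optimized GPU implementation.
import torch
import torch.nn.functional as F

def rot_invariant_conv(x, weight, bias=None):
    """x: (N, Cin, H, W); weight: (Cout, Cin, k, k)."""
    outs = []
    for r in range(4):                                   # 0, 90, 180, 270 degrees
        w_r = torch.rot90(weight, r, dims=(2, 3))
        outs.append(F.conv2d(x, w_r, bias=bias, padding=weight.shape[-1] // 2))
    stacked = torch.stack(outs, dim=0)                   # (4, N, Cout, H, W)
    return stacked.max(dim=0).values                     # orientation max-pooling

x = torch.randn(2, 3, 64, 64)
w = torch.randn(16, 3, 3, 3)
y = rot_invariant_conv(x, w)
print(y.shape)   # torch.Size([2, 16, 64, 64])

# Rotating the input permutes which rotated filter responds most, so the
# max over orientations is equivariant to 90-degree rotations (up to float error).
y_rot = rot_invariant_conv(torch.rot90(x, 1, dims=(2, 3)), w)
print(torch.allclose(torch.rot90(y, 1, dims=(2, 3)), y_rot, atol=1e-5))
```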
IPPO Learns the Game, Not the Team: A Study on Generalization in Heterogeneous Agent Teams
Multi-Agent Reinforcement Learning (MARL) is commonly deployed in settings where agents are trained via self-play with homogeneous teammates, often using parameter sharing and a single policy architecture. This opens the question: to what extent do self-play PPO agents learn general coordination strategies grounded in the underlying game, compared to overfitting to their training partners' behaviors? This paper investigates the question using the Heterogeneous Multi-Agent Challenge (HeMAC) environment, which features distinct Observer and Drone agents with complementary capabilities. We introduce Rotating Policy Training (RPT), an approach that rotates heterogeneous teammate policies of different learning algorithms during training, to expose the agent to a broader range of partner strategies. When playing alongside a withheld teammate policy (DDQN), we find that RPT achieves similar performance to a standard self-play baseline, IPPO, where all agents were trained sharing a single PPO policy. This result indicates that in this heterogeneous multi-agent setting, the IPPO baseline generalizes to novel teammate algorithms despite not experiencing teammate diversity during training. This shows that a simple IPPO baseline may possess the level of generalization to novel teammates that a diverse training regimen was designed to achieve.
comment: 4 pages, 3 figures, appendix
Heterogeneity in Multi-Robot Environmental Monitoring for Resolving Time-Conflicting Tasks
Multi-robot systems performing continuous tasks face a performance trade-off when interrupted by urgent, time-critical sub-tasks. We investigate this trade-off in a scenario where a team must balance area patrolling with locating an anomalous radio signal. To address this trade-off, we evaluate both behavioral heterogeneity through agent role specialization ("patrollers" and "searchers") and sensing heterogeneity (i.e., only the searchers can sense the radio signal). Through simulation, we identify the Pareto-optimal trade-offs under varying team compositions, with behaviorally heterogeneous teams demonstrating the most balanced trade-offs in the majority of cases. When sensing capability is restricted, heterogeneous teams with half of the sensing-capable agents perform comparably to homogeneous teams, providing cost-saving rationale for restricting sensor payload deployment. Our findings demonstrate that pre-deployment role and sensing specialization are powerful design considerations for multi-robot systems facing time-conflicting tasks, where varying the degree of behavioral heterogeneity can tune system performance toward either task.
comment: Accepted to SAC '26. To appear, DOI: https://doi.org/10.1145/3748522.3779970
Data-Driven Dynamic Parameter Learning of Manipulator Robots
Bridging the sim-to-real gap remains a fundamental challenge in robotics, as accurate dynamic parameter estimation is essential for reliable model-based control, realistic simulation, and safe deployment of manipulators. Traditional analytical approaches often fall short when faced with complex robot structures and interactions. Data-driven methods offer a promising alternative, yet conventional neural networks such as recurrent models struggle to capture long-range dependencies critical for accurate estimation. In this study, we propose a Transformer-based approach for dynamic parameter estimation, supported by an automated pipeline that generates diverse robot models and enriched trajectory data using Jacobian-derived features. The dataset consists of 8,192 robots with varied inertial and frictional properties. Leveraging attention mechanisms, our model effectively captures both temporal and spatial dependencies. Experimental results highlight the influence of sequence length, sampling rate, and architecture, with the best configuration (sequence length 64, 64 Hz, four layers, 32 heads) achieving a validation $R^2$ of 0.8633. Mass and inertia are estimated with near-perfect accuracy, Coulomb friction with moderate-to-high accuracy, while viscous friction and distal link center-of-mass remain more challenging. These results demonstrate that combining Transformers with automated dataset generation and kinematic enrichment enables scalable, accurate dynamic parameter estimation, contributing to improved sim-to-real transfer in robotic systems.
comment: Accepted for publication at SII 2026. 6 pages, 7 figures. Code is available at: https://github.com/MohamedAlsiagy/dynamic_parameter_est
A Multi-Robot Platform for Robotic Triage Combining Onboard Sensing and Foundation Models
This report presents a heterogeneous robotic system designed for remote primary triage in mass-casualty incidents (MCIs). The system employs a coordinated air-ground team of unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) to locate victims, assess their injuries, and prioritize medical assistance without risking the lives of first responders. The UAVs identify and provide overhead views of casualties, while UGVs equipped with specialized sensors measure vital signs and detect and localize physical injuries. Unlike previous work that focused on exploration or limited medical evaluation, this system addresses the complete triage process: victim localization, vital sign measurement, injury severity classification, mental status assessment, and data consolidation for first responders. Developed as part of the DARPA Triage Challenge, this approach demonstrates how multi-robot systems can augment human capabilities in disaster response scenarios to maximize lives saved.
comment: Technical Report for the DARPA Triage Challenge PRONTO team
Non-Normalized Shared-Constraint Dynamic Games for Human-Robot Collaboration with Asymmetric Responsibility
This paper proposes a dynamic game formulation for cooperative human-robot navigation in shared workspaces with obstacles, where the human and robot jointly satisfy shared safety constraints while pursuing a common task. A key contribution is the introduction of a non-normalized equilibrium structure for the shared constraints. This structure allows the two agents to contribute different levels of effort towards enforcing safety requirements such as collision avoidance and inter-player spacing. We embed this non-normalized equilibrium into a receding-horizon optimal control scheme.
Ergodic Trajectory Planning with Dynamic Sensor Footprints
This paper addresses the problem of trajectory planning for information gathering with a dynamic and resolution-varying sensor footprint. Ergodic planning offers a principled framework that balances exploration (visiting all areas) and exploitation (focusing on high-information regions) by planning trajectories such that the time spent in a region is proportional to the amount of information in that region. Existing ergodic planning often oversimplifies the sensing model by assuming a point sensor or a footprint with constant shape and resolution. In practice, the sensor footprint can drastically change over time as the robot moves, such as aerial robots equipped with downward-facing cameras, whose field of view depends on the orientation and altitude. To overcome this limitation, we propose a new metric that accounts for dynamic sensor footprints, analyze the theoretic local optimality conditions, and propose numerical trajectory optimization algorithms. Experimental results show that the proposed approach can simultaneously optimize both the trajectories and sensor footprints, with up to an order of magnitude better ergodicity than conventional methods. We also deploy our approach in a multi-drone system to ergodically cover an object in 3D space.
comment: 12 figures
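For context on the baseline being generalized (the point-sensor case, not the paper's footprint-aware metric), the sketch below computes the standard spectral ergodic metric on the unit square: cosine-basis coefficients of a target information density are compared against time-averaged coefficients along a trajectory, with Sobolev-type weights that penalize low-frequency mismatch most. The density, trajectory, and mode cutoff are illustrative.

```python
# Sketch of the standard spectral ergodic metric on [0,1]^2 (point-sensor case,
# not the paper's footprint-aware metric). Target density, trajectory, and
# coefficient cutoff are illustrative; the cosine basis is left unnormalized.
import numpy as np

K = 5                                   # number of Fourier modes per axis
ks = np.array([(k1, k2) for k1 in range(K) for k2 in range(K)])
weights = (1.0 + (ks ** 2).sum(axis=1)) ** -1.5     # Sobolev-type weights

def basis(points):
    """Unnormalized cosine basis evaluated at points of shape (N, 2)."""
    return np.cos(np.pi * ks[None, :, 0] * points[:, None, 0]) * \
           np.cos(np.pi * ks[None, :, 1] * points[:, None, 1])

# Target information density: Gaussian bump, represented on a grid
g = np.linspace(0, 1, 50)
gx, gy = np.meshgrid(g, g, indexing="ij")
grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
phi = np.exp(-30 * ((grid[:, 0] - 0.7) ** 2 + (grid[:, 1] - 0.3) ** 2))
phi /= phi.sum()
phi_k = (basis(grid) * phi[:, None]).sum(axis=0)     # density coefficients

def ergodicity(traj):
    c_k = basis(traj).mean(axis=0)                   # time-averaged trajectory coefficients
    return float((weights * (c_k - phi_k) ** 2).sum())

t = np.linspace(0, 1, 200)
lawnmower = np.stack([t, 0.5 + 0.4 * np.sin(8 * np.pi * t)], axis=1)
print("ergodic metric:", ergodicity(lawnmower))
```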
Sim2Swim: Zero-Shot Velocity Control for Agile AUV Maneuvering in 3 Minutes
Holonomic autonomous underwater vehicles (AUVs) have the hardware ability for agile maneuvering in both translational and rotational degrees of freedom (DOFs). However, due to challenges inherent to underwater vehicles, such as complex hydrostatics and hydrodynamics, parametric uncertainties, and frequent changes in dynamics due to payload changes, control is challenging. Performance typically relies on carefully tuned controllers targeting unique platform configurations, and a need for re-tuning for deployment under varying payloads and hydrodynamic conditions. As a consequence, agile maneuvering with simultaneous tracking of time-varying references in both translational and rotational DOFs is rarely utilized in practice. To the best of our knowledge, this paper presents the first general zero-shot sim2real deep reinforcement learning-based (DRL) velocity controller enabling path following and agile 6DOF maneuvering with a training duration of just 3 minutes. Sim2Swim, the proposed approach, inspired by state-of-the-art DRL-based position control, leverages domain randomization and massively parallelized training to converge to field-deployable control policies for AUVs of variable characteristics without post-processing or tuning. Sim2Swim is extensively validated in pool trials for a variety of configurations, showcasing robust control for highly agile motions.
comment: 7 pages, 4 figures
A Sensor-Aware Phenomenological Framework for Lidar Degradation Simulation and SLAM Robustness Evaluation
Lidar-based SLAM systems are highly sensitive to adverse conditions such as occlusion, noise, and field-of-view (FoV) degradation, yet existing robustness evaluation methods either lack physical grounding or do not capture sensor-specific behavior. This paper presents a sensor-aware, phenomenological framework for simulating interpretable lidar degradations directly on real point clouds, enabling controlled and reproducible SLAM stress testing. Unlike image-derived corruption benchmarks (e.g., SemanticKITTI-C) or simulation-only approaches (e.g., lidarsim), the proposed system preserves per-point geometry, intensity, and temporal structure while applying structured dropout, FoV reduction, Gaussian noise, occlusion masking, sparsification, and motion distortion. The framework features autonomous topic and sensor detection, modular configuration with four severity tiers (light--extreme), and real-time performance (less than 20 ms per frame) compatible with ROS workflows. Experimental validation across three lidar architectures and five state-of-the-art SLAM systems reveals distinct robustness patterns shaped by sensor design and environmental context. The open-source implementation provides a practical foundation for benchmarking lidar-based SLAM under physically meaningful degradation scenarios.
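A minimal numpy sketch of the kinds of point-level degradations described (noise, FoV reduction, dropout, sparsification, occlusion masking), applied directly to a raw point cloud; the severity tiers, intensity handling, motion distortion, and ROS integration of the framework are not reproduced, and all thresholds are illustrative.

```python
# Minimal numpy sketch of phenomenological lidar degradations applied directly
# to a point cloud of shape (N, 4) = [x, y, z, intensity]; thresholds are
# illustrative and the framework's temporal/ROS machinery is not reproduced.
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(pc, sigma=0.02):
    out = pc.copy()
    out[:, :3] += rng.normal(0.0, sigma, size=(len(pc), 3))
    return out

def reduce_fov(pc, fov_deg=180.0):
    az = np.degrees(np.arctan2(pc[:, 1], pc[:, 0]))
    return pc[np.abs(az) <= fov_deg / 2.0]

def structured_dropout(pc, drop_ratio=0.3):
    keep = rng.random(len(pc)) > drop_ratio
    return pc[keep]

def sparsify(pc, keep_every=2):
    return pc[::keep_every]

def occlusion_mask(pc, center_az_deg=30.0, width_deg=20.0):
    az = np.degrees(np.arctan2(pc[:, 1], pc[:, 0]))
    return pc[np.abs(az - center_az_deg) > width_deg / 2.0]

pc = rng.uniform(-20, 20, size=(100_000, 4))        # synthetic random cloud
for fn in (add_gaussian_noise, reduce_fov, structured_dropout, sparsify, occlusion_mask):
    pc = fn(pc)
print("remaining points:", len(pc))
```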
Multi-Task Bayesian Optimization for Tuning Decentralized Trajectory Generation in Multi-UAV Systems
This paper investigates the use of Multi-Task Bayesian Optimization for tuning decentralized trajectory generation algorithms in multi-drone systems. We treat each task as a trajectory generation scenario defined by a specific number of drone-to-drone interactions. To model relationships across scenarios, we employ Multi-Task Gaussian Processes, which capture shared structure across tasks and enable efficient information transfer during optimization. We compare two strategies: optimizing the average mission time across all tasks and optimizing each task individually. Through a comprehensive simulation campaign, we show that single-task optimization leads to progressively shorter mission times as swarm size grows, but requires significantly more optimization time than the average-task approach.
Decoupled Design of Time-Varying Control Barrier Functions via Equivariances
This article presents a systematic method for designing time-varying Control Barrier Functions (CBF) composed of a time-invariant component and multiple time-dependent components, leveraging structural properties of the system dynamics. The method involves the construction of a specific class of time-invariant CBFs that encode the system's dynamic capabilities with respect to a given constraint, and augments them subsequently with appropriately designed time-dependent transformations. While transformations uniformly varying the time-invariant CBF can be applied to arbitrary systems, transformations exploiting structural properties in the dynamics - equivariances in particular - enable the handling of a broader and more expressive class of time-varying constraints. The article shows how to leverage such properties in the design of time-varying CBFs. The proposed method decouples the design of time variations from the computationally expensive construction of the underlying CBFs, thereby providing a computationally attractive approach to the design of time-varying CBFs. The method accounts for input constraints and under-actuation, and requires only qualitative knowledge of the time variation of the constraints, making it suitable for application in uncertain environments.
comment: 7 pages, 3 figures
Mind to Hand: Purposeful Robotic Control via Embodied Reasoning
Humans act with context and intention, with reasoning playing a central role. While internet-scale data has enabled broad reasoning capabilities in AI systems, grounding these abilities in physical action remains a major challenge. We introduce Lumo-1, a generalist vision-language-action (VLA) model that unifies robot reasoning ("mind") with robot action ("hand"). Our approach builds upon the general multi-modal reasoning capabilities of pre-trained vision-language models (VLMs), progressively extending them to embodied reasoning and action prediction, and ultimately towards structured reasoning and reasoning-action alignment. This results in a three-stage pre-training pipeline: (1) Continued VLM pre-training on curated vision-language data to enhance embodied reasoning skills such as planning, spatial understanding, and trajectory prediction; (2) Co-training on cross-embodiment robot data alongside vision-language data; and (3) Action training with reasoning process on trajectories collected on Astribot S1, a bimanual mobile manipulator with human-like dexterity and agility. Finally, we integrate reinforcement learning to further refine reasoning-action consistency and close the loop between semantic inference and motor control. Extensive experiments demonstrate that Lumo-1 achieves significant performance improvements in embodied vision-language reasoning, a critical component for generalist robotic control. Real-world evaluations further show that Lumo-1 surpasses strong baselines across a wide range of challenging robotic tasks, with strong generalization to novel objects and environments, excelling particularly in long-horizon tasks and responding to human-natural instructions that require reasoning over strategy, concepts and space.
comment: 49 pages, 25 figures
Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery
Video recordings of open surgeries are greatly required for education and research purposes. However, capturing unobstructed videos is challenging since surgeons frequently block the camera field of view. To avoid occlusion, the positions and angles of the camera must be frequently adjusted, which is highly labor-intensive. Prior work has addressed this issue by installing multiple cameras on a shadowless lamp and arranging them to fully surround the surgical area. This setup increases the chances of some cameras capturing an unobstructed view. However, manual image alignment is needed in post-processing since camera configurations change every time surgeons move the lamp for optimal lighting. This paper aims to fully automate this alignment task. The proposed method identifies frames in which the lighting system moves, realigns them, and selects the camera with the least occlusion to generate a video that consistently presents the surgical field from a fixed perspective. A user study involving surgeons demonstrated that videos generated by our method were superior to those produced by conventional methods in terms of the ease of confirming the surgical area and the comfort during video viewing. Additionally, our approach showed improvements in video quality over existing techniques. Furthermore, we implemented several synthesis options for the proposed view-synthesis method and conducted a user study to assess surgeons' preferences for each option.
RVC-NMPC: Nonlinear Model Predictive Control with Reciprocal Velocity Constraints for Mutual Collision Avoidance in Agile UAV Flight
This paper presents an approach to mutual collision avoidance based on Nonlinear Model Predictive Control (NMPC) with time-dependent Reciprocal Velocity Constraints (RVCs). Unlike most existing methods, the proposed approach relies solely on observable information about other robots, eliminating the need for excessive communication. The computationally efficient algorithm for computing RVCs, together with the direct integration of these constraints into the NMPC problem formulation at the controller level, allows the whole pipeline to run at 100 Hz. This high processing rate, combined with the modeled nonlinear dynamics of the controlled Uncrewed Aerial Vehicles (UAVs), is a key feature that facilitates the use of the proposed approach for agile UAV flight. The proposed approach was evaluated through extensive simulations emulating real-world conditions in scenarios involving up to 10 UAVs and velocities of up to 25 m/s, and in real-world experiments with accelerations up to 30 m/s$^2$. Comparison with the state of the art shows a 31% reduction in flight time in challenging scenarios, while maintaining collision-free navigation in all trials.
comment: 8 pages, 8 figures
Bridging Scale Discrepancies in Robotic Control via Language-Based Action Representations
Recent end-to-end robotic manipulation research increasingly adopts architectures inspired by large language models to enable robust manipulation. However, a critical challenge arises from severe distribution shifts between robotic action data, primarily due to substantial numerical variations in action commands across diverse robotic platforms and tasks, hindering the effective transfer of pretrained knowledge. To address this limitation, we propose a semantically grounded linguistic representation to normalize actions for efficient pretraining. Unlike conventional discretized action representations that are sensitive to numerical scales, the motion representation specifically disregards numeric scale effects, emphasizing directionality instead. This abstraction mitigates distribution shifts, yielding a more generalizable pretraining representation. Moreover, using the motion representation narrows the feature distance between action tokens and standard vocabulary tokens, mitigating modality gaps. Multi-task experiments on two benchmarks demonstrate that the proposed method significantly improves generalization performance and transferability in robotic manipulation tasks.
vEDGAR -- Can CARLA Do HiL?
Simulation offers advantages throughout the development process of automated driving functions, both in research and product development. Common open-source simulators like CARLA are extensively used in training, evaluation, and software-in-the-loop testing of new automated driving algorithms. However, the CARLA simulator lacks an evaluation where research and automated driving vehicles are simulated with their entire sensor and actuation stack in real time. The goal of this work is therefore to create a simulation framework for testing the automation software on its dedicated hardware and identifying its limits. Achieving this goal would greatly benefit the open-source development workflow of automated driving functions, designating CARLA as a consistent evaluation tool along the entire development process. To achieve this goal, in a first step, requirements are derived, and a simulation architecture is specified and implemented. Based on the formulated requirements, the proposed vEDGAR software is evaluated, resulting in a final conclusion on the applicability of CARLA for HiL testing of automated vehicles. The tool is available open source: Modified CARLA fork: https://github.com/TUMFTM/carla, vEDGAR Framework: https://github.com/TUMFTM/vEDGAR
SensHRPS: Sensing Comfortable Human-Robot Proxemics and Personal Space With Eye-Tracking
Social robots must adjust to human proxemic norms to ensure user comfort and engagement. While prior research demonstrates that eye-tracking features reliably estimate comfort in human-human interactions, their applicability to interactions with humanoid robots remains unexplored. In this study, we investigate user comfort with the robot "Ameca" across four experimentally controlled distances (0.5 m to 2.0 m) using mobile eye-tracking and subjective reporting (N=19). We evaluate multiple machine learning and deep learning models to estimate comfort based on gaze features. Contrary to previous human-human studies where Transformer models excelled, a Decision Tree classifier achieved the highest performance (F1-score = 0.73), with minimum pupil diameter identified as the most critical predictor. These findings suggest that physiological comfort thresholds in human-robot interaction differ from human-human dynamics and can be effectively modeled using interpretable logic.
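As a hedged sketch of the modeling step only (feature names and data are invented placeholders, not the study's dataset or pipeline), a shallow decision tree over per-trial gaze features such as minimum pupil diameter can be fit and inspected as follows.

```python
# Sketch: shallow decision tree over per-trial gaze features for binary comfort
# estimation. Feature names and data are invented placeholders, not the study's.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 300
# Columns: [min_pupil_diameter_mm, mean_fixation_ms, saccade_rate_hz, distance_m]
X = np.column_stack([
    rng.normal(3.0, 0.5, n),
    rng.normal(250, 60, n),
    rng.normal(2.5, 0.8, n),
    rng.choice([0.5, 1.0, 1.5, 2.0], n),
])
# Toy label: discomfort when the pupil constricts at close distances
y = ((X[:, 0] < 2.9) & (X[:, 3] < 1.0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("F1:", round(f1_score(y_te, clf.predict(X_te)), 2))
print("feature importances:", np.round(clf.feature_importances_, 2))
```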
Prospect Theory in Physical Human-Robot Interaction: A Pilot Study of Probability Perception
Understanding how humans respond to uncertainty is critical for designing safe and effective physical human-robot interaction (pHRI), as physically working with robots introduces multiple sources of uncertainty, including trust, comfort, and perceived safety. Conventional pHRI control frameworks typically build on optimal control theory, which assumes that human actions minimize a cost function; however, human behavior under uncertainty often departs from such optimal patterns. To address this gap, additional understanding of human behavior under uncertainty is needed. This pilot study implemented a physically coupled target-reaching task in which the robot delivered assistance or disturbances with systematically varied probabilities (10\% to 90\%). Analysis of participants' force inputs and decision-making strategies revealed two distinct behavioral clusters: a "trade-off" group that modulated their physical responses according to disturbance likelihood, and an "always-compensate" group characterized by strong risk aversion irrespective of probability. These findings provide empirical evidence that human decision-making in pHRI is highly individualized and that perceived probability can differ from the true value. Accordingly, the study highlights the need for more interpretable behavioral models, such as cumulative prospect theory (CPT), to more accurately capture these behaviors and inform the design of future adaptive robot controllers.
comment: 9 pages, 6 figures
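For readers unfamiliar with CPT, the probability-weighting function of Tversky and Kahneman (1992) shown below captures exactly the kind of distortion the clusters suggest: small probabilities are over-weighted and large ones under-weighted. The sample $\gamma$ values are illustrative.

```python
# Tversky-Kahneman (1992) probability weighting function used in cumulative
# prospect theory: small probabilities are over-weighted, large ones under-weighted.
import numpy as np

def cpt_weight(p, gamma=0.61):
    p = np.asarray(p, dtype=float)
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

probs = np.arange(0.1, 1.0, 0.1)           # the study's 10%-90% disturbance levels
for gamma in (0.5, 0.61, 1.0):             # gamma = 1 recovers objective probability
    print(gamma, np.round(cpt_weight(probs, gamma), 2))
```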
A Multi-Agent LLM Framework for Design Space Exploration in Autonomous Driving Systems
Designing autonomous driving systems requires efficient exploration of large hardware/software configuration spaces under diverse environmental conditions, e.g., with varying traffic, weather, and road layouts. Traditional design space exploration (DSE) approaches struggle with multi-modal execution outputs and complex performance trade-offs, and often require human involvement to assess correctness based on execution outputs. This paper presents a multi-agent, large language model (LLM)-based DSE framework, which integrates multi-modal reasoning with 3D simulation and profiling tools to automate the interpretation of execution outputs and guide the exploration of system designs. Specialized LLM agents are leveraged to handle user input interpretation, design point generation, execution orchestration, and analysis of both visual and textual execution outputs, which enables identification of potential bottlenecks without human intervention. A prototype implementation is developed and evaluated on a robotaxi case study (an SAE Level 4 autonomous driving application). Compared with a genetic algorithm baseline, the proposed framework identifies more Pareto-optimal, cost-efficient solutions with reduced navigation time under the same exploration budget. Experimental results also demonstrate the efficiency of the adoption of the LLM-based approach for DSE. We believe that this framework paves the way to the design automation of autonomous driving systems.
Using reinforcement learning to probe the role of feedback in skill acquisition
Many high-performance human activities are executed with little or no external feedback: think of a figure skater landing a triple jump, a pitcher throwing a curveball for a strike, or a barista pouring latte art. To study the process of skill acquisition under fully controlled conditions, we bypass human subjects. Instead, we directly interface a generalist reinforcement learning agent with a spinning cylinder in a tabletop circulating water channel to maximize or minimize drag. This setup has several desirable properties. First, it is a physical system, with the rich interactions and complex dynamics that only the physical world has: the flow is highly chaotic and extremely difficult, if not impossible, to model or simulate accurately. Second, the objective -- drag minimization or maximization -- is easy to state and can be captured directly in the reward, yet good strategies are not obvious beforehand. Third, decades-old experimental studies provide recipes for simple, high-performance open-loop policies. Finally, the setup is inexpensive and far easier to reproduce than human studies. In our experiments we find that high-dimensional flow feedback lets the agent discover high-performance drag-control strategies with only minutes of real-world interaction. When we later replay the same action sequences without any feedback, we obtain almost identical performance. This shows that feedback, and in particular flow feedback, is not needed to execute the learned policy. Surprisingly, without flow feedback during training the agent fails to discover any well-performing policy in drag maximization, but still succeeds in drag minimization, albeit more slowly and less reliably. Our studies show that learning a high-performance skill can require richer information than executing it, and learning conditions can be kind or wicked depending solely on the goal, not on dynamics or policy complexity.
comment: Website: https://antonioterpin.com/fluids-control
SDT-6D: Fully Sparse Depth-Transformer for Staged End-to-End 6D Pose Estimation in Industrial Multi-View Bin Picking WACV 2026
Accurately recovering 6D poses in densely packed industrial bin-picking environments remains a serious challenge, owing to occlusions, reflections, and textureless parts. We introduce a holistic depth-only 6D pose estimation approach that fuses multi-view depth maps into either a fine-grained 3D point cloud in its vanilla version, or a sparse Truncated Signed Distance Field (TSDF). At the core of our framework lies a staged heatmap mechanism that yields scene-adaptive attention priors across different resolutions, steering computation toward foreground regions, thus keeping memory requirements at high resolutions feasible. In addition, we propose a density-aware sparse transformer block that dynamically attends to (self-) occlusions and the non-uniform distribution of 3D data. While sparse 3D approaches have proven effective for long-range perception, their potential in close-range robotic applications remains underexplored. Our framework operates in a fully sparse manner, enabling high-resolution volumetric representations to capture fine geometric details crucial for accurate pose estimation in clutter. Our method processes the entire scene integrally, predicting the 6D pose via a novel per-voxel voting strategy, allowing simultaneous pose predictions for an arbitrary number of target objects. We validate our method on the recently published IPD and MV-YCB multi-view datasets, demonstrating competitive performance in heavily cluttered industrial and household bin-picking scenarios.
comment: Accepted to WACV 2026. Preprint version
Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems
Model-based planning in robotic domains is fundamentally challenged by the hybrid nature of physical dynamics, where continuous motion is punctuated by discrete events such as contacts and impacts. Conventional latent world models typically employ monolithic neural networks that enforce global continuity, inevitably over-smoothing the distinct dynamic modes (e.g., sticking vs. sliding, flight vs. stance). For a planner, this smoothing results in catastrophic compounding errors during long-horizon lookaheads, rendering the search process unreliable at physical boundaries. To address this, we introduce the Prismatic World Model (PRISM-WM), a structured architecture designed to decompose complex hybrid dynamics into composable primitives. PRISM-WM leverages a context-aware Mixture-of-Experts (MoE) framework where a gating mechanism implicitly identifies the current physical mode, and specialized experts predict the associated transition dynamics. We further introduce a latent orthogonalization objective to ensure expert diversity, effectively preventing mode collapse. By accurately modeling the sharp mode transitions in system dynamics, PRISM-WM significantly reduces rollout drift. Extensive experiments on challenging continuous control benchmarks, including high-dimensional humanoids and diverse multi-task settings, demonstrate that PRISM-WM provides a superior high-fidelity substrate for trajectory optimization algorithms (e.g., TD-MPC), proving its potential as a powerful foundational model for next-generation model-based agents.
Learning Robot Manipulation from Audio World Models
World models have demonstrated impressive performance on robotic learning tasks. Many such tasks inherently demand multimodal reasoning; when filling a bottle with water, for example, visual information alone is ambiguous or incomplete, so the system must reason over the temporal evolution of audio, accounting for its underlying physical properties and pitch patterns. In this paper, we propose a generative latent flow matching model to anticipate future audio observations, enabling the system to reason about long-term consequences when integrated into a robot policy. We demonstrate the capabilities of our system on two manipulation tasks that require perceiving in-the-wild audio or music signals, where it outperforms methods without future lookahead. We further emphasize that successful robot action learning for these tasks relies not merely on multi-modal input, but critically on the accurate prediction of future audio states that embody intrinsic rhythmic patterns.
Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging
Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied real-world environments. However, they still fall short on new tasks not covered in the training data. When finetuned on limited demonstrations of a new task, these policies often overfit to the specific demonstrations--not only losing their prior abilities to solve a wide variety of generalist tasks but also failing to generalize within the new task itself. In this work, we aim to develop a method that preserves the generalization capabilities of the generalist policy during finetuning, allowing a single policy to robustly incorporate a new skill into its repertoire. Our goal is a single policy that both learns to generalize to variations of the new task and retains the broad competencies gained from pretraining. We show that this can be achieved through a simple yet effective strategy: interpolating the weights of a finetuned model with those of the pretrained model. We show, across extensive simulated and real-world experiments, that such model merging produces a single model that inherits the generalist abilities of the base model and learns to solve the new task robustly, outperforming both the pretrained and the finetuned models on out-of-distribution variations of the new task. Moreover, we show that model merging enables continual acquisition of new skills in a lifelong learning setting, without sacrificing previously learned generalist abilities.
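A minimal sketch of the weight-interpolation idea described above, assuming two PyTorch checkpoints with identical parameter names; the mixing coefficient `alpha` and the function name are illustrative, not taken from the paper:

```python
# Hedged sketch: interpolate two PyTorch state dicts with identical keys.
# `alpha` is a hypothetical mixing coefficient, not a value from the paper.
import torch

def merge_state_dicts(pretrained: dict, finetuned: dict, alpha: float = 0.5) -> dict:
    """Return tensors (1 - alpha) * pretrained + alpha * finetuned, key by key."""
    merged = {}
    for name, w_pre in pretrained.items():
        w_ft = finetuned[name]
        if torch.is_floating_point(w_pre):
            merged[name] = (1.0 - alpha) * w_pre + alpha * w_ft
        else:
            # Integer buffers (e.g., step counters) are copied from the finetuned model.
            merged[name] = w_ft.clone()
    return merged

# Usage sketch:
# policy.load_state_dict(merge_state_dicts(base.state_dict(), tuned.state_dict(), alpha=0.5))
```

In practice the mixing coefficient would be chosen on a validation split that covers both pretraining-style tasks and the new task.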
Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
Offline decision-making requires synthesizing reliable behaviors from fixed datasets without further interaction, yet existing generative approaches often yield trajectories that are dynamically infeasible. We propose Model Predictive Diffuser (MPDiffuser), a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives. MPDiffuser employs an alternating diffusion sampling scheme, where planner and dynamics updates are interleaved to progressively refine trajectories for both task alignment and feasibility during the sampling process. We also provide a theoretical rationale for this procedure, showing how it balances fidelity to data priors with dynamics consistency. Empirically, the compositional design improves sample efficiency, as it leverages even low-quality data for dynamics learning and adapts seamlessly to novel dynamics. We evaluate MPDiffuser on both unconstrained (D4RL) and constrained (DSRL) offline decision-making benchmarks, demonstrating consistent gains over existing approaches. Furthermore, we present a preliminary study extending MPDiffuser to vision-based control tasks, showing its potential to scale to high-dimensional sensory inputs. Finally, we deploy our method on a real quadrupedal robot, showcasing its practicality for real-world control.
Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation ICRA 2025
We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared, 6-DoF world model for multilateral teleoperation. By integrating vision-language segmentation, monocular depth, weighted-PCA pose extraction, and 3D Gaussian Splatting (3DGS), TeleAssist provides every operator with real-time global positions and orientations of multiple robots without fiducials or depth sensors in an interaction-centric teleoperation setup.
comment: Published and Presented at 3rd Workshop on Human-Centric Multilateral Teleoperation in ICRA 2025
RLCNet: An end-to-end deep learning framework for simultaneous online calibration of LiDAR, RADAR, and Camera
Accurate extrinsic calibration of LiDAR, RADAR, and camera sensors is essential for reliable perception in autonomous vehicles, yet it remains challenging due to factors such as mechanical vibrations and cumulative sensor drift in dynamic environments. This paper presents RLCNet, a novel end-to-end trainable deep learning framework for the simultaneous online calibration of these multimodal sensors. Validated on real-world datasets, RLCNet is designed for practical deployment and demonstrates robust performance under diverse conditions. To support real-time operation, an online calibration framework is introduced that incorporates a weighted moving average and outlier rejection, enabling dynamic adjustment of calibration parameters with reduced prediction noise and improved resilience to drift. An ablation study highlights the significance of architectural choices, while comparisons with existing methods demonstrate the superior accuracy and robustness of the proposed approach.
Learning Spatiotemporal Tubes for Temporal Reach-Avoid-Stay Tasks using Physics-Informed Neural Networks
This paper presents a Spatiotemporal Tube (STT)-based control framework for general control-affine MIMO nonlinear pure-feedback systems with unknown dynamics to satisfy temporal reach-avoid-stay (T-RAS) tasks with prescribed time bounds under external disturbances. The STT is defined as a time-varying ball, whose center and radius are jointly approximated by a Physics-Informed Neural Network (PINN). The constraints governing the STT are first formulated as loss functions of the PINN, and a training algorithm is proposed to minimize the overall violation. Since the PINN is trained only on a finite set of collocation points, we propose a Lipschitz-based validity condition to formally verify that the learned PINN satisfies the conditions over the continuous time horizon. Building on the learned STT representation, an approximation-free closed-form controller is defined to guarantee satisfaction of the T-RAS specification. Finally, the effectiveness and scalability of the framework are validated through two case studies involving a mobile robot and an aerial vehicle navigating through cluttered environments.
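A rough, illustrative sketch of training a network to output a time-varying ball (center and radius) with hinge-style penalty losses at sampled collocation times; the constraint set, constants (x0, goal, obstacle, T), and network size below are assumptions for illustration, not the paper's exact STT conditions:

```python
# Illustrative only: a PINN mapping time t to a tube center gamma(t) in R^2 and a
# radius r(t) > 0, trained with hinge penalties at sampled collocation points.
import torch
import torch.nn as nn

class TubeNet(nn.Module):
    def __init__(self, dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim + 1))

    def forward(self, t):
        out = self.net(t)
        return out[:, :-1], nn.functional.softplus(out[:, -1:])  # center, positive radius

x0, goal = torch.tensor([0.0, 0.0]), torch.tensor([5.0, 5.0])
obstacle, obs_rad, T = torch.tensor([2.5, 2.5]), 1.0, 10.0

model = TubeNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    t = torch.rand(256, 1) * T                      # collocation points in [0, T]
    c, r = model(t)
    c0, r0 = model(torch.zeros(1, 1))               # tube at t = 0
    cT, rT = model(torch.full((1, 1), T))           # tube at t = T
    loss = (torch.relu((x0 - c0).norm() - r0).mean()            # start state inside the tube
            + torch.relu((cT - goal).norm() + rT - 0.5).mean()  # tube ends inside the goal set
            + torch.relu(obs_rad + r                            # tube never overlaps the obstacle
                         - (c - obstacle).norm(dim=1, keepdim=True)).mean())
    opt.zero_grad(); loss.backward(); opt.step()
```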
Semantic-Metric Bayesian Risk Fields: Learning Robot Safety from Human Videos with a VLM Prior
Humans interpret safety not as a binary signal but as a continuous, context- and spatially-dependent notion of risk. While risk is subjective, humans form rational mental models that guide action selection in dynamic environments. This work proposes a framework for extracting implicit human risk models by introducing a novel, semantically-conditioned and spatially-varying parametrization of risk, supervised directly from safe human demonstration videos and VLM common sense. Notably, we define risk through a Bayesian formulation. The prior is furnished by a pretrained vision-language model. In order to encourage the risk estimate to be more human-aligned, a likelihood function modulates the prior to produce a relative metric of risk. Specifically, the likelihood is a learned ViT that maps pretrained features to pixel-aligned risk values. Our pipeline ingests RGB images and a query object string, producing pixel-dense risk images. These images can then be used as value predictors in robot planning tasks or projected into 3D for use in conventional trajectory optimization to produce human-like motion. This learned mapping enables generalization to novel objects and contexts, and has the potential to scale to much larger training datasets. In particular, the Bayesian framework that is introduced enables fast adaptation of our model to additional observations or common sense rules. We demonstrate that our proposed framework produces contextual risk that aligns with human preferences. Additionally, we illustrate several downstream applications of the model: as a value learner for visuomotor planners or in conjunction with a classical trajectory optimization algorithm. Our results suggest that our framework is a significant step toward enabling autonomous systems to internalize human-like risk. Code and results can be found at https://riskbayesian.github.io/bayesian_risk/.
Geometry-Aware Sparse Depth Sampling for High-Fidelity RGB-D Depth Completion in Robotic Systems
Accurate three-dimensional perception is essential for modern industrial robotic systems that perform manipulation, inspection, and navigation tasks. RGB-D and stereo vision sensors are widely used for this purpose, but the depth maps they produce are often noisy, incomplete, or biased due to sensor limitations and environmental conditions. Depth completion methods aim to generate dense, reliable depth maps from RGB images and sparse depth input. However, a key limitation in current depth completion pipelines is the unrealistic generation of sparse depth: sparse pixels are typically selected uniformly at random from dense ground-truth depth, ignoring the fact that real sensors exhibit geometry-dependent and spatially nonuniform reliability. In this work, we propose a normal-guided sparse depth sampling strategy that leverages PCA-based surface normal estimation on the RGB-D point cloud to compute a per-pixel depth reliability measure. The sparse depth samples are then drawn according to this reliability distribution. We integrate this sampling method with the Marigold-DC diffusion-based depth completion model and evaluate it on NYU Depth v2 using the standard metrics. Experiments show that our geometry-aware sparse depth improves accuracy, reduces artifacts near edges and discontinuities, and produces more realistic training conditions that better reflect real sensor behavior.
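A hedged sketch of the kind of reliability-weighted sparse sampling the abstract describes; the finite-difference normal approximation and the reliability heuristic (alignment of the normal with the viewing axis) stand in for the paper's PCA-based estimate and are assumptions:

```python
# Hedged sketch of geometry-aware sparse depth sampling: back-project a dense depth
# map, approximate per-pixel surface normals from local tangent vectors, and draw
# sparse samples with probability proportional to a reliability score.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)            # (H, W, 3) point map

def sample_sparse_depth(depth, n_samples=500, fx=525., fy=525., cx=319.5, cy=239.5):
    pts = backproject(depth, fx, fy, cx, cy)
    du = np.gradient(pts, axis=1)                       # tangent along image columns
    dv = np.gradient(pts, axis=0)                       # tangent along image rows
    normals = np.cross(du, dv)
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8
    reliability = np.abs(normals[..., 2])               # front-facing surfaces score higher
    reliability *= (depth > 0)                          # never sample invalid pixels
    p = reliability.ravel() / reliability.sum()
    idx = np.random.choice(depth.size, size=n_samples, replace=False, p=p)
    sparse = np.zeros_like(depth)
    sparse.ravel()[idx] = depth.ravel()[idx]
    return sparse
```

The sampled map can then be fed to a depth-completion model in place of uniformly random sparse depth.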
High-Performance Dual-Arm Task and Motion Planning for Tabletop Rearrangement ICRA 2026
We propose Synchronous Dual-Arm Rearrangement Planner (SDAR), a task and motion planning (TAMP) framework for tabletop rearrangement, where two robot arms equipped with 2-finger grippers must work together in close proximity to rearrange objects whose start and goal configurations are strongly entangled. To tackle such challenges, SDAR tightly knits together its dependency-driven task planner (SDAR-T) and synchronous dual-arm motion planner (SDAR-M) to intelligently sift through a large number of possible task and motion plans. Specifically, SDAR-T applies a simple yet effective strategy to decompose the global object dependency graph induced by the rearrangement task, producing higher-quality dual-arm task plans than solutions derived from optimal single-arm task plans. Leveraging state-of-the-art GPU SIMD-based motion planning tools, SDAR-M employs a layered motion planning strategy to sift through many task plans for the best synchronous dual-arm motion plan while maintaining a high success rate. Comprehensive evaluation demonstrates that SDAR delivers a 100% success rate in solving complex, non-monotone, long-horizon tabletop rearrangement tasks with solution quality far exceeding the previous state-of-the-art. Experiments on two UR-5e arms further confirm that SDAR transfers directly and reliably to robot hardware.
comment: ICRA 2026 Submission
Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model
World models have emerged as a pivotal component in robot manipulation planning, enabling agents to predict future environmental states and reason about the consequences of actions before execution. While video-generation models are increasingly adopted, they often lack rigorous physical grounding, leading to hallucinations and a failure to maintain consistency in long-horizon physical constraints. To address these limitations, we propose Embodied Tree of Thoughts (EToT), a novel Real2Sim2Real planning framework that leverages a physics-based interactive digital twin as an embodied world model. EToT formulates manipulation planning as a tree search expanded through two synergistic mechanisms: (1) Priori Branching, which generates diverse candidate execution paths based on semantic and spatial analysis; and (2) Reflective Branching, which utilizes VLMs to diagnose execution failures within the simulator and iteratively refine the planning tree with corrective actions. By grounding high-level reasoning in a physics simulator, our framework ensures that generated plans adhere to rigid-body dynamics and collision constraints. We validate EToT on a suite of short- and long-horizon manipulation tasks, where it consistently outperforms baselines by effectively predicting physical dynamics and adapting to potential failures. Website at https://embodied-tree-of-thoughts.github.io .
comment: Website at https://embodied-tree-of-thoughts.github.io
Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation
While recent large vision-language models (VLMs) have improved generalization in vision-language navigation (VLN), existing methods typically rely on end-to-end pipelines that map vision-language inputs directly to short-horizon discrete actions. Such designs often produce fragmented motions, incur high latency, and struggle with real-world challenges like dynamic obstacle avoidance. We propose DualVLN, the first dual-system VLN foundation model that synergistically integrates high-level reasoning with low-level action execution. System 2, a VLM-based global planner, "grounds slowly" by predicting mid-term waypoint goals via image-grounded reasoning. System 1, a lightweight, multi-modal conditioning Diffusion Transformer policy, "moves fast" by leveraging both explicit pixel goals and latent features from System 2 to generate smooth and accurate trajectories. The dual-system design enables robust real-time control and adaptive local decision-making in complex, dynamic environments. By decoupling training, the VLM retains its generalization, while System 1 achieves interpretable and effective local navigation. DualVLN outperforms prior methods across all VLN benchmarks and real-world experiments demonstrate robust long-horizon planning and real-time adaptability in dynamic environments.
RAVES-Calib: Robust, Accurate and Versatile Extrinsic Self Calibration Using Optimal Geometric Features
In this paper, we present a user-friendly LiDAR-camera calibration toolkit that is compatible with various LiDAR and camera sensors and requires only a single pair of laser points and a camera image in targetless environments. Our approach eliminates the need for an initial transform and remains robust even with large positional and rotational LiDAR-camera extrinsic parameters. We employ the Gluestick pipeline to establish 2D-3D point and line feature correspondences for a robust and automatic initial guess. To enhance accuracy, we quantitatively analyze the impact of feature distribution on calibration results and adaptively weight the cost of each feature based on these metrics. As a result, extrinsic parameters are optimized by filtering out the adverse effects of inferior features. We validated our method through extensive experiments across various LiDAR-camera sensors in both indoor and outdoor settings. The results demonstrate that our method provides superior robustness and accuracy compared to SOTA techniques. Our code is open-sourced on GitHub to benefit the community.
Chat with UAV -- Human-UAV Interaction Based on Large Language Models
The future of UAV interaction systems is evolving from engineer-driven to user-driven, aiming to replace traditional predefined Human-UAV Interaction (HUI) designs. This shift focuses on enabling more personalized task planning and design, thereby achieving a higher-quality interaction experience and greater flexibility, with applications in many fields such as agriculture, aerial photography, logistics, and environmental monitoring. However, due to the lack of a common language between users and UAVs, such interactions are often difficult to achieve. Recent developments in Large Language Models (LLMs) make it possible to understand both natural language and robots' (UAVs') behaviors, opening the door to personalized Human-UAV Interaction. Recently, some HUI frameworks based on LLMs have been proposed, but they commonly suffer from difficulties in mixed task planning and execution, leading to low adaptability in complex scenarios. In this paper, we propose a novel dual-agent HUI framework. This framework constructs two independent LLM agents (a task planning agent and an execution agent) and applies different prompt engineering to separately handle the understanding, planning, and execution of tasks. To verify the effectiveness and performance of the framework, we built a task database covering four typical UAV application scenarios and quantified the performance of the HUI framework using three independent metrics. We also compared different LLMs as UAV controllers and report their relative performance. Our user study results demonstrate that the framework improves the smoothness of HUI and the flexibility of task execution in the task scenarios we set up, effectively meeting users' personalized needs.
Semantic Trajectory Generation for Goal-Oriented Spacecraft Rendezvous SC
Reliable real-time trajectory generation is essential for future autonomous spacecraft. While recent progress in nonconvex guidance and control is paving the way for onboard autonomous trajectory optimization, these methods still rely on extensive expert input (e.g., waypoints, constraints, and mission timelines), which limits operational scalability in real rendezvous missions. This paper introduces SAGES (Semantic Autonomous Guidance Engine for Space), a trajectory-generation framework that translates natural-language commands into spacecraft trajectories that reflect high-level intent while respecting nonconvex constraints. Experiments in two settings -- fault-tolerant proximity operations with continuous-time constraint enforcement and a free-flying robotic platform -- demonstrate that SAGES reliably produces trajectories aligned with human commands, achieving over 90% semantic-behavioral consistency across diverse behavior modes. Ultimately, this work marks an initial step toward language-conditioned, constraint-aware spacecraft trajectory generation, enabling operators to interactively guide both safety and behavior through intuitive natural-language commands with reduced expert burden.
comment: 28 pages, 12 figures. Submitted to AIAA SCITECH 2026
Cognitive Trust in HRI: "Pay Attention to Me and I'll Trust You Even if You are Wrong"
Cognitive trust, the belief that a robot is capable of accurately performing tasks, is recognized as a central factor in fostering high-quality human-robot interactions. It is well established that performance factors such as the robot's competence and its reliability shape cognitive trust. Recent studies suggest that affective factors, such as robotic attentiveness, also play a role in building cognitive trust. This work explores the interplay between these two factors that shape cognitive trust. Specifically, we evaluated whether different combinations of robotic competence and attentiveness introduce a compensatory mechanism, where one factor compensates for the lack of the other. In the experiment, participants performed a search task with a robotic dog in a 2x2 experimental design that included two factors: competence (high or low) and attentiveness (high or low). The results revealed that high attentiveness can compensate for low competence. Participants who collaborated with a highly attentive robot that performed poorly reported trust levels comparable to those working with a highly competent robot. When the robot did not demonstrate attentiveness, low competence resulted in a substantial decrease in cognitive trust. The findings indicate that building cognitive trust in human-robot interaction may be more complex than previously believed, involving emotional processes that are typically overlooked. We highlight an affective compensatory mechanism that adds a layer to consider alongside traditional competence-based models of cognitive trust.
comment: Conference paper
Masked Generative Policy for Robotic Control
We present Masked Generative Policy (MGP), a novel framework for visuomotor imitation learning. We represent actions as discrete tokens, and train a conditional masked transformer that generates tokens in parallel and then rapidly refines only low-confidence tokens. We further propose two new sampling paradigms: MGP-Short, which performs parallel masked generation with score-based refinement for Markovian tasks, and MGP-Long, which predicts full trajectories in a single pass and dynamically refines low-confidence action tokens based on new observations. With globally coherent prediction and robust adaptive execution capabilities, MGP-Long enables reliable control on complex and non-Markovian tasks that prior methods struggle with. Extensive evaluations on 150 robotic manipulation tasks spanning the Meta-World and LIBERO benchmarks show that MGP achieves both rapid inference and superior success rates compared to state-of-the-art diffusion and autoregressive policies. Specifically, MGP increases the average success rate by 9% across 150 tasks while cutting per-sequence inference time by up to 35x. It further improves the average success rate by 60% in dynamic and missing-observation environments, and solves two non-Markovian scenarios where other state-of-the-art methods fail.
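For intuition, a rough sketch of confidence-based parallel decoding over discrete action tokens (MaskGIT-style), which is the general flavor of sampling the abstract describes; the `model(obs, tokens)` interface, mask id, and remasking schedule are assumptions, not MGP's actual implementation:

```python
# Hedged sketch: fill all masked action tokens in parallel, then iteratively
# re-mask and refine the least confident ones.
import torch

MASK = 0  # hypothetical id reserved for masked action tokens

@torch.no_grad()
def decode_chunk(model, obs, seq_len, steps=4):
    tokens = torch.full((1, seq_len), MASK, dtype=torch.long)
    for s in range(steps):
        logits = model(obs, tokens)                        # (1, seq_len, vocab)
        probs = logits.softmax(-1)
        conf, pred = probs.max(-1)                         # per-token confidence and argmax
        still_masked = tokens.eq(MASK)
        tokens = torch.where(still_masked, pred, tokens)   # fill every masked slot in parallel
        if s < steps - 1:
            # Re-mask the least confident fraction and refine them on the next pass.
            n_remask = int(seq_len * (1 - (s + 1) / steps))
            low_conf = conf.argsort(dim=-1)[:, :n_remask]
            tokens.scatter_(1, low_conf, MASK)
    return tokens
```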
Characterizing Human Feedback-Based Control in Naturalistic Driving Interactions via Gaussian Process Regression with Linear Feedback
Understanding driver interactions is critical to designing autonomous vehicles to interoperate safely with human-driven cars. We consider the impact of these interactions on the policies drivers employ when navigating unsigned intersections in a driving simulator. The simulator allows the collection of naturalistic decision-making and behavior data in a controlled environment. Using these data, we model the human driver responses as state-based feedback controllers learned via Gaussian Process regression methods. We compute the feedback gain of the controller using a weighted combination of linear and nonlinear priors. We then analyze how the individual gains are reflected in driver behavior. We also assess differences in these controllers across populations of drivers. Our work in data-driven analyses of how drivers determine their policies can facilitate future work in the design of socially responsive autonomy for vehicles.
Inferring Operator Emotions from a Motion-Controlled Robotic Arm
A remote robot operator's affective state can significantly impact the resulting robot's motions, leading to unexpected consequences, even when the user follows protocol and performs permitted tasks. The recognition of an operator's affective states in remote robot control scenarios is, however, underexplored. Current emotion recognition methods rely on reading the user's vital signs or body language, but the devices and user participation these measures require would add limitations to remote robot control. We demonstrate that the functional movements of a remote-controlled robotic avatar, which was not designed for emotional expression, can be used to infer the emotional state of the human operator via a machine-learning system. Specifically, our system achieved 83.3% accuracy in recognizing the operator's emotional state from the robot movements produced by their hand motions. We discuss the implications of this system for prominent current and future remote robot operation and affective robotics contexts.
Adaptive Thresholding for Visual Place Recognition using Negative Gaussian Mixture Statistics
Visual place recognition (VPR) is an important component technology for camera-based mapping and navigation applications. This is a challenging problem because images of the same place may appear quite different for reasons including seasonal changes, weather, illumination, structural changes to the environment, and transient pedestrian or vehicle traffic. Papers focusing on generating image descriptors for VPR report their results using metrics such as recall@K and ROC curves. However, in a robot implementation, deciding which matches are sufficiently good is often reduced to a manually set threshold, and it is difficult to manually select a threshold that works across a variety of visual scenarios. This paper addresses the problem of automatically selecting a threshold for VPR by examining the 'negative' Gaussian mixture statistics for a place, i.e., image statistics indicating 'not this place'. We show that this approach can be used to select thresholds that work well for a variety of image databases and image descriptors.
comment: Accepted and presented at IEEE RoboticCC 2025. 4 pages short paper
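A hedged sketch of one way to turn 'negative' Gaussian mixture statistics into an acceptance threshold: fit a 1-D mixture to similarity scores of known non-matching pairs and pick the smallest score whose tail probability falls below a target false-positive rate. The two-component mixture and the 1% target are illustrative assumptions, not the paper's procedure:

```python
# Hedged sketch: threshold selection from a Gaussian mixture fit to "not this place"
# similarity scores.
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def negative_gmm_threshold(negative_scores, target_fpr=0.01, n_components=2):
    scores = np.asarray(negative_scores, dtype=float)
    gm = GaussianMixture(n_components=n_components, random_state=0)
    gm.fit(scores.reshape(-1, 1))
    weights = gm.weights_
    means = gm.means_.ravel()
    stds = np.sqrt(gm.covariances_.reshape(n_components, -1)[:, 0])
    grid = np.linspace(scores.min(), scores.max() + 3 * stds.max(), 1000)
    # Survival function of the mixture: P(negative score >= t).
    sf = sum(w * norm.sf(grid, loc=m, scale=s) for w, m, s in zip(weights, means, stds))
    above = grid[sf <= target_fpr]
    return above[0] if len(above) else grid[-1]
```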
ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments with Low-Cost Sensors
Many indoor workspaces are quasi-static: global layout is stable but local semantics change continually, producing repetitive geometry, dynamic clutter, and perceptual noise that defeat vision-based localization. We present ShelfAware, a semantic particle filter for robust global localization that treats scene semantics as statistical evidence over object categories rather than fixed landmarks. ShelfAware fuses a depth likelihood with a category-centric semantic similarity and uses a precomputed bank of semantic viewpoints to perform inverse semantic proposals inside MCL, yielding fast, targeted hypothesis generation on low-cost, vision-only hardware. Across 100 global-localization trials spanning four conditions (cart-mounted, wearable, dynamic obstacles, and sparse semantics) in a semantically dense, retail environment, ShelfAware achieves a 96% success rate (vs. 22% MCL and 10% AMCL) with a mean time-to-convergence of 1.91s, attains the lowest translational RMSE in all conditions, and maintains stable tracking in 80% of tested sequences, all while running in real time on a consumer laptop-class platform. By modeling semantics distributionally at the category level and leveraging inverse proposals, ShelfAware resolves geometric aliasing and semantic drift common to quasi-static domains. Because the method requires only vision sensors and VIO, it integrates as an infrastructure-free building block for mobile robots in warehouses, labs, and retail settings; as a representative application, it also supports the creation of assistive devices providing start-anytime, shared-control assistive navigation for people with visual impairments.
comment: 8 pages
Botany Meets Robotics in Alpine Scree Monitoring
According to the European Union's Habitat Directive, habitat monitoring plays a critical role in response to the escalating problems posed by biodiversity loss and environmental degradation. Scree habitats, hosting unique and often endangered species, face severe threats from climate change due to their high-altitude nature. Traditionally, their monitoring has required highly skilled scientists to conduct extensive fieldwork in remote, potentially hazardous locations, making the process resource-intensive and time-consuming. This paper presents a novel approach for scree habitat monitoring using a legged robot to assist botanists in data collection and species identification. Specifically, we deployed the ANYmal C robot in the Italian Alpine bio-region in two field campaigns spanning two years and leveraged deep learning to detect and classify key plant species of interest. Our results demonstrate that agile legged robots can navigate challenging terrains and increase the frequency and efficiency of scree monitoring. When paired with traditional phytosociological surveys performed by botanists, this robotics-assisted protocol not only streamlines field operations but also enhances data acquisition, storage, and usage. The outcomes of this research contribute to the evolving landscape of robotics in environmental science, paving the way for a more comprehensive and sustainable approach to habitat monitoring and preservation.
comment: 19 pages, 13 figures
Obstacle Avoidance of UAV in Dynamic Environments Using Direction and Velocity-Adaptive Artificial Potential Field
The conventional Artificial Potential Field (APF) is fundamentally limited by the local minima issue and its inability to account for the kinematics of moving obstacles. This paper addresses the critical challenge of autonomous collision avoidance for Unmanned Aerial Vehicles (UAVs) operating in dynamic and cluttered airspace by proposing a novel Direction and Relative Velocity Weighted Artificial Potential Field (APF). In this approach, a bounded weighting function, $\omega(\theta, v_{e})$, is introduced to dynamically scale the repulsive potential based on the direction and velocity of the obstacle relative to the UAV. This robust APF formulation is integrated within a Model Predictive Control (MPC) framework to generate collision-free trajectories while adhering to kinematic constraints. Simulation results demonstrate that the proposed method effectively resolves local minima and significantly enhances safety by enabling smooth, predictive avoidance maneuvers. The system ensures superior path integrity and reliable performance, confirming its viability for autonomous navigation in complex environments.
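For illustration, a sketch of a repulsive APF term scaled by a bounded direction- and relative-velocity-dependent weight; the particular weighting below (sigmoid in closing speed, cosine in bearing) is an assumption and not the paper's $\omega(\theta, v_{e})$:

```python
# Illustrative only: classical repulsive gradient scaled by a bounded weight that
# grows when the obstacle is approaching and heading toward the UAV.
import numpy as np

def weighted_repulsion(p_uav, p_obs, v_uav, v_obs, d0=5.0, k_rep=1.0):
    rel_p = p_uav - p_obs
    d = np.linalg.norm(rel_p)
    if d >= d0:
        return np.zeros_like(p_uav)                      # outside the influence region
    rel_v = v_obs - v_uav
    closing_speed = np.dot(rel_v, rel_p) / (d + 1e-9)    # > 0 when the gap is shrinking
    cos_theta = np.dot(v_obs, rel_p) / (np.linalg.norm(v_obs) * d + 1e-9)  # obstacle heading toward UAV
    w = 1.0 / (1.0 + np.exp(-closing_speed)) * 0.5 * (1.0 + cos_theta)     # bounded in [0, 1]
    # Classical repulsive gradient, scaled by the bounded direction/velocity weight.
    grad = k_rep * (1.0 / d - 1.0 / d0) / d**2 * (rel_p / d)
    return w * grad
```

Inside an MPC loop, this term would contribute to the stage cost or its gradient rather than being applied directly as a control input.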
Towards Task-Oriented Flying: Framework, Infrastructure, and Principles
Deploying robot learning methods to aerial robots in unstructured environments remains both challenging and promising. While recent advances in deep reinforcement learning (DRL) have enabled end-to-end flight control, the field still lacks systematic design guidelines and a unified infrastructure to support reproducible training and real-world deployment. We present a task-oriented framework for end-to-end DRL in quadrotors that integrates design principles for complex task specification and reveals the interdependencies among simulated task definition, training design principles, and physical deployment. Our framework combines software infrastructure, hardware platforms, and open-source firmware into a full-stack learning infrastructure and workflow. Extensive empirical results demonstrate robust flight and sim-to-real generalization under real-world disturbances. By reducing the entry barrier for deploying learning-based controllers on aerial robots, our work lays a practical foundation for advancing autonomous flight in dynamic and unstructured environments.
DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving
Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, often leading to conservative and homogeneous behaviors that limit generalization in complex real-world scenarios. In this work, we propose DIVER, an end-to-end driving framework that integrates reinforcement learning with diffusion-based generation to produce diverse and feasible trajectories. At the core of DIVER lies a reinforced diffusion-based generation mechanism. First, the model conditions on map elements and surrounding agents to generate multiple reference trajectories from a single ground-truth trajectory, alleviating the limitations of imitation learning that arise from relying solely on single expert demonstrations. Second, reinforcement learning is employed to guide the diffusion process, where reward-based supervision enforces safety and diversity constraints on the generated trajectories, thereby enhancing their practicality and generalization capability. Furthermore, to address the limitations of L2-based open-loop metrics in capturing trajectory diversity, we propose a novel Diversity metric to evaluate the diversity of multi-mode predictions. Extensive experiments on the closed-loop NAVSIM and Bench2Drive benchmarks, as well as the open-loop nuScenes dataset, demonstrate that DIVER significantly improves trajectory diversity, effectively addressing the mode collapse problem inherent in imitation learning.
comment: 18 pages, 8 figures
RoboBPP: Benchmarking Robotic Online Bin Packing with Physics-based Simulation
Physical feasibility in 3D bin packing is a key requirement in modern industrial logistics and robotic automation. With the growing adoption of industrial automation, online bin packing has gained increasing attention. However, inconsistencies in problem settings, test datasets, and evaluation metrics have hindered progress in the field, and there is a lack of a comprehensive benchmarking system. Direct testing on real hardware is costly, and building a realistic simulation environment is also challenging. To address these limitations, we introduce RoboBPP, a benchmarking system designed for robotic online bin packing. RoboBPP integrates a physics-based simulator to assess physical feasibility. In our simulation environment, we introduce a robotic arm and boxes at real-world scales to replicate real industrial packing workflows. By simulating conditions that arise in real industrial applications, we ensure that evaluated algorithms are practically deployable. In addition, prior studies often rely on synthetic datasets whose distributions differ from real-world industrial data. To address this issue, we collect three datasets from real industrial workflows, including assembly-line production, logistics packing, and furniture manufacturing. The benchmark comprises three carefully designed test settings and extends existing evaluation metrics with new metrics for structural stability and operational safety. We design a scoring system and derive a range of insights from the evaluation results. RoboBPP is fully open-source and is equipped with visualization tools and an online leaderboard, providing a reproducible and extensible foundation for future research and industrial applications (https://robot-bin-packing-benchmark.github.io).
comment: Under review at the International Journal of Robotics Research (IJRR)
Control Your Robot: A Unified System for Robot Control and Policy Deployment
Cross-platform robot control remains difficult because hardware interfaces, data formats, and control paradigms vary widely, which fragments toolchains and slows deployment. To address this, we present Control Your Robot, a modular, general-purpose framework that unifies data collection and policy deployment across diverse platforms. The system reduces fragmentation through a standardized workflow with modular design, unified APIs, and a closed-loop architecture. It supports flexible robot registration, dual-mode control with teleoperation and trajectory playback, and seamless integration from multimodal data acquisition to inference. Experiments on single-arm and dual-arm systems show efficient, low-latency data collection and effective support for policy learning with imitation learning and vision-language-action models. Policies trained on data gathered by Control Your Robot match expert demonstrations closely, indicating that the framework enables scalable and reproducible robot learning across platforms.
comment: Code: https://github.com/Tian-Nian/control_your_robot
Humanoid Whole-Body Badminton via Multi-Stage Reinforcement Learning
Humanoid robots have demonstrated strong capabilities for interacting with static scenes across locomotion, manipulation, and more challenging loco-manipulation tasks. Yet the real world is dynamic, and quasi-static interactions are insufficient to cope with diverse environmental conditions. As a step toward more dynamic interaction scenarios, we present a reinforcement-learning-based training pipeline that produces a unified whole-body controller for humanoid badminton, enabling coordinated lower-body footwork and upper-body striking without motion priors or expert demonstrations. Training follows a three-stage curriculum: first footwork acquisition, then precision-guided racket swing generation, and finally task-focused refinement, yielding motions in which both legs and arms serve the hitting objective. For deployment, we incorporate an Extended Kalman Filter (EKF) to estimate and predict shuttlecock trajectories for target striking. We also introduce a prediction-free variant that dispenses with EKF and explicit trajectory prediction. To validate the framework, we conduct five sets of experiments in both simulation and the real world. In simulation, two robots sustain a rally of 21 consecutive hits. Moreover, the prediction-free variant achieves successful hits with comparable performance relative to the target-known policy. In real-world tests, both prediction and controller modules exhibit high accuracy, and on-court hitting achieves an outgoing shuttle speed up to 19.1 m/s with a mean return landing distance of 4 m. These experimental results show that our proposed training scheme can deliver highly dynamic while precise goal striking in badminton, and can be adapted to more dynamics-critical domains.
comment: Project Page: https://humanoid-badminton.github.io/Humanoid-Whole-Body-Badminton-via-Multi-Stage-Reinforcement-Learning
db-LaCAM: Fast and Scalable Multi-Robot Kinodynamic Motion Planning with Discontinuity-Bounded Search and Lightweight MAPF
State-of-the-art multi-robot kinodynamic motion planners struggle to handle more than a few robots due to high computational burden, which limits their scalability and results in slow planning time. In this work, we combine the scalability and speed of modern multi-agent path finding (MAPF) algorithms with the dynamic-awareness of kinodynamic planners to address these limitations. To this end, we propose discontinuity-Bounded LaCAM (db-LaCAM), a planner that utilizes a precomputed set of motion primitives that respect robot dynamics to generate horizon-length motion sequences, while allowing a user-defined discontinuity between successive motions. The planner db-LaCAM is resolution-complete with respect to motion primitives and supports arbitrary robot dynamics. Extensive experiments demonstrate that db-LaCAM scales efficiently to scenarios with up to 50 robots, achieving up to ten times faster runtime compared to state-of-the-art planners, while maintaining comparable solution quality. The approach is validated in both 2D and 3D environments with dynamics such as the unicycle and 3D double integrator. We demonstrate the safe execution of trajectories planned with db-LaCAM in two distinct physical experiments involving teams of flying robots and car-with-trailer robots.
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure from which various control problems can be recovered. These include the classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. The latter we refer to as soft-policy SOC and RSOC, facilitating alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problem. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties such as a linear Bellman equation, a path integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
comment: Corrected "DRO" to "DRC" and fixed theorem numbering throughout paper
Fitts' List Revisited: An Empirical Study on Function Allocation in a Two-Agent Physical Human-Robot Collaborative Position/Force Task
In this letter, we investigate whether classical function allocation (the principle of assigning tasks to either a human or a machine) holds for physical Human-Robot Collaboration, which is important for providing insights for Industry 5.0 on how to best augment rather than replace workers. This study empirically tests the applicability of Fitts' List within physical Human-Robot Collaboration by conducting a user study (N=26, within-subject design) to evaluate four distinct allocations of position/force control between human and robot in an abstract blending task. We hypothesize that the allocation in which the human controls position achieves better performance and receives higher user ratings. When allocating position control to the human and force control to the robot, compared to the opposite case, we observed a significant improvement in preventing overblending. This allocation was also perceived better in terms of physical demand and overall system acceptance, while participants experienced greater autonomy, more engagement, and less frustration. An interesting insight was that the supervisory role (when the robot controls both position and force) was rated second best in terms of subjective acceptance. Another surprising insight was that when position control was delegated to the robot, participants perceived much lower autonomy than when force control was delegated to the robot. These findings empirically support applying Fitts' principles to static function allocation for physical collaboration, while also revealing important nuanced user experience trade-offs, particularly regarding perceived autonomy when delegating position control.
comment: 8 pages, 6 figures, published in IEEE Robotics and Automation Letters, vol. 11, no. 1, January 2026
Haptic-based Complementary Filter for Rigid Body Rotations
The non-commutative nature of 3D rotations poses well-known challenges in generalizing planar problems to three-dimensional ones, even more so in contact-rich tasks where haptic information (i.e., forces/torques) is involved. In this sense, not all learning-based algorithms that are currently available generalize to 3D orientation estimation. Non-linear filters defined on $\mathbb{SO}(3)$ are widely used with inertial measurement sensors; however, none of them have been used with haptic measurements. This paper presents a unique complementary filtering framework that interprets the geometric shape of objects in the form of superquadrics, exploits the symmetry of $\mathbb{SO}(3)$, and uses force and vision sensors as measurements to provide an estimate of orientation. The framework's robustness and almost global stability are substantiated by a set of experiments on a dual-arm robotic setup.
comment: 7 pages, 7 figures; Updated filter design; Submitted to IFAC for possible publication
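For readers unfamiliar with complementary filtering on $\mathbb{SO}(3)$, here is the generic discrete-time structure such filters share (a Mahony-style proportional correction); the haptic/vision measurement model and the superquadric geometry used in the paper are not reproduced, and the function below only illustrates the update form:

```python
# Generic SO(3) complementary filter step: propagate with a body angular velocity
# and pull the estimate toward a measured orientation.
import numpy as np
from scipy.spatial.transform import Rotation as R

def so3_complementary_step(R_est, omega, R_meas, dt, k_p=1.0):
    """One filter step: propagate with body angular velocity omega, correct toward R_meas."""
    R_err = R_est.inv() * R_meas                      # error rotation in the body frame
    err_vec = R_err.as_rotvec()                       # log map of the error
    correction = k_p * err_vec                        # proportional innovation term
    return R_est * R.from_rotvec((omega + correction) * dt)

# Usage sketch: R_meas would come from force/vision-derived orientation evidence,
# omega from a motion model or sensed twist; k_p trades smoothing vs. responsiveness.
```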
Capability-Driven Skill Generation with LLMs: A RAG-Based Approach for Reusing Existing Libraries and Interfaces
Modern automation systems increasingly rely on modular architectures, with capabilities and skills as one solution approach. Capabilities define the functions of resources in a machine-readable form and skills provide the concrete implementations that realize those capabilities. However, the development of a skill implementation conforming to a corresponding capability remains a time-consuming and challenging task. In this paper, we present a method that treats capabilities as contracts for skill implementations and leverages large language models to generate executable code based on natural language user input. A key feature of our approach is the integration of existing software libraries and interface technologies, enabling the generation of skill implementations across different target languages. We introduce a framework that allows users to incorporate their own libraries and resource interfaces into the code generation process through a retrieval-augmented generation architecture. The proposed method is evaluated using an autonomous mobile robot controlled via Python and ROS 2, demonstrating the feasibility and flexibility of the approach.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Omni-LIVO: Robust RGB-Colored Multi-Camera Visual-Inertial-LiDAR Odometry via Photometric Migration and ESIKF Fusion
Wide field-of-view (FoV) LiDAR sensors provide dense geometry across large environments, but existing LiDAR-inertial-visual odometry (LIVO) systems generally rely on a single camera, limiting their ability to fully exploit LiDAR-derived depth for photometric alignment and scene colorization. We present Omni-LIVO, a tightly coupled multi-camera LIVO system that leverages multi-view observations to comprehensively utilize LiDAR geometric information across extended spatial regions. Omni-LIVO introduces a Cross-View direct alignment strategy that maintains photometric consistency across non-overlapping views, and extends the Error-State Iterated Kalman Filter (ESIKF) with multi-view updates and adaptive covariance. The system is evaluated on public benchmarks and our custom dataset, showing improved accuracy and robustness over state-of-the-art LIVO, LIO, and visual-inertial SLAM baselines. Code and dataset will be released upon publication.
MARL Warehouse Robots
We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments reveal that QMIX's value decomposition significantly outperforms independent learning approaches (achieving 3.25 mean return vs. 0.38 for advanced IPPO), but requires extensive hyperparameter tuning -- particularly extended epsilon annealing (5M+ steps) for sparse reward discovery. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
comment: 5 pages. Project documentation: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
CDKFormer: Contextual Deviation Knowledge-Based Transformer for Long-Tail Trajectory Prediction
Predicting the future movements of surrounding vehicles is essential for ensuring the safe operation and efficient navigation of autonomous vehicles (AVs) in urban traffic environments. Existing vehicle trajectory prediction methods primarily focus on improving overall performance, yet they struggle to address long-tail scenarios effectively. This limitation often leads to poor predictions in rare cases, significantly increasing the risk of safety incidents. Taking the Argoverse 2 motion forecasting dataset as an example, we first investigate the long-tail characteristics of trajectory samples from two perspectives, individual motion and group interaction, and derive deviation features to distinguish abnormal from regular scenarios. On this basis, we propose CDKFormer, a Contextual Deviation Knowledge-based Transformer model for long-tail trajectory prediction. CDKFormer integrates an attention-based scene context fusion module to encode spatiotemporal interaction and road topology. An additional deviation feature fusion module is proposed to capture the dynamic deviations in the target vehicle status. We further introduce a dual query-based decoder, supported by a multi-stream decoder block, to sequentially decode heterogeneous scene deviation features and generate multimodal trajectory predictions. Extensive experiments demonstrate that CDKFormer achieves state-of-the-art performance, significantly enhancing prediction accuracy and robustness for long-tailed trajectories compared to existing methods, thus advancing the reliability of AVs in complex real-world environments.
Khalasi: Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields ICRA 2026
For centuries, khalasi (Gujarati for sailor) have skillfully harnessed ocean currents to navigate vast waters with minimal effort. Emulating this intuition in autonomous systems remains a significant challenge, particularly for Autonomous Surface Vehicles tasked with long-duration missions under strict energy budgets. In this work, we present a learning-based approach for energy-efficient surface vehicle navigation in vortical flow fields, where partial observability often undermines traditional path-planning methods. Our end-to-end reinforcement learning framework, based on Soft Actor-Critic, learns flow-aware navigation policies using only local velocity measurements. Through extensive evaluation across diverse and dynamically rich scenarios, our method demonstrates substantial energy savings and robust generalization to previously unseen flow conditions, offering a promising path toward long-term autonomy in ocean environments. The navigation paths generated by our approach improve energy conservation by 30 to 50 percent compared to existing state-of-the-art techniques.
comment: Under Review for International Conference on Robotics and Automation (ICRA 2026)
Incremental Generalized Hybrid A*
We address the problem of efficiently organizing search over very large trees, which arises in many applications ranging from autonomous driving to aerial vehicles. Here, we are motivated by off-road autonomy, where real-time planning is essential. Classical approaches use graphs of motion primitives and exploit dominance to mitigate the curse of dimensionality and prune expansions efficiently. However, for complex dynamics, repeatedly solving two-point boundary-value problems makes graph construction too slow for fast kinodynamic planning. Hybrid A* (HA*) addressed this challenge by searching over a tree of motion primitives and introducing approximate pruning using a grid-based dominance check. However, choosing the grid resolution is difficult: too coarse risks failure, while too fine leads to excessive expansions and slow planning. We propose Incremental Generalized Hybrid A* (IGHA*), an anytime tree-search framework that dynamically organizes vertex expansions without rigid pruning. IGHA* provably matches or outperforms HA*. For both on-road kinematic and off-road kinodynamic planning queries for a car-like robot, variants of IGHA* use 6x fewer expansions to the best solution compared to an optimized version of HA* (HA*M, an internal baseline). In simulated off-road experiments in a high-fidelity simulator, IGHA* outperforms HA*M when both are used in the loop with a model predictive controller. We demonstrate real-time performance both in simulation and on a small-scale off-road vehicle, enabling fast, robust planning under complex dynamics. Website: https://personalrobotics.github.io/IGHAStar/
comment: 8 pages, 7 figures, Accepted to IEEE RA-L, Nov 2025
Training-Time Action Conditioning for Efficient Real-Time Chunking
Real-time chunking (RTC) enables vision-language-action models (VLAs) to generate smooth, reactive robot trajectories by asynchronously predicting action chunks and conditioning on previously committed actions via inference-time inpainting. However, this inpainting method introduces computational overhead that increases inference latency. In this work, we propose a simple alternative: simulating inference delay at training time and conditioning on action prefixes directly, eliminating any inference-time overhead. Our method requires no modifications to the model architecture or robot runtime, and can be implemented with only a few additional lines of code. In simulated experiments, we find that training-time RTC outperforms inference-time RTC at higher inference delays. In real-world experiments on box building and espresso making tasks with the $\pi_{0.6}$ VLA, we demonstrate that training-time RTC maintains both task performance and speed parity with inference-time RTC while being computationally cheaper. Our results suggest that training-time action conditioning is a practical drop-in replacement for inference-time inpainting in real-time robot control.
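A minimal sketch of the training-time trick as described: sample a simulated inference delay and mark the first few actions of each chunk as a committed prefix that the model conditions on rather than predicts. The function and field names are hypothetical and not from the $\pi_{0.6}$ codebase:

```python
# Hedged sketch: build a training example with a simulated inference delay.
import numpy as np

def make_delay_conditioned_example(actions, chunk_len, max_delay):
    """actions: (T, action_dim) demonstration slice with T >= chunk_len."""
    d = np.random.randint(0, max_delay + 1)      # simulated inference delay in steps
    chunk = actions[:chunk_len]
    prefix_mask = np.zeros(chunk_len, dtype=bool)
    prefix_mask[:d] = True                       # these steps are given, not predicted
    return {
        "action_chunk": chunk,                   # full target chunk
        "prefix_mask": prefix_mask,              # model conditions on masked-in steps
    }
```

At deployment, the real committed prefix from the previous chunk simply takes the place of the simulated one, so no inpainting pass is needed.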
Enhancing the NAO: Extending Capabilities of Legacy Robots for Long-Term Research
Legacy (unsupported) robotic platforms often lose research utility when manufacturer support ends, preventing integration of modern sensing, speech, and interaction capabilities. We present the Enhanced NAO, a revitalized version of Aldebaran's NAO robot featuring upgraded beamforming microphones, RGB-D and thermal cameras, and additional compute resources in a fully self-contained package. This system combines cloud-based and local models for perception and dialogue, while preserving the NAO's expressive body and behaviors. In a pilot user study validating conversational performance, the Enhanced NAO delivered significantly higher conversational quality and elicited stronger user preference compared to the NAO AI Edition, without increasing response latency. The added visual and thermal sensing modalities established a foundation for future perception-driven interaction. Beyond this implementation, our framework provides a platform-agnostic strategy for extending the lifespan and research utility of legacy robots, ensuring they remain valuable tools for human-robot interaction.
Connectivity-Preserving Multi-Agent Area Coverage via Optimal-Transport-Based Density-Driven Optimal Control (D2OC)
Multi-agent systems play a central role in area coverage tasks across search-and-rescue, environmental monitoring, and precision agriculture. Achieving non-uniform coverage, where spatial priorities vary across the domain, requires coordinating agents while respecting dynamic and communication constraints. Density-driven approaches can distribute agents according to a prescribed reference density, but existing methods do not ensure connectivity. This limitation often leads to communication loss, reduced coordination, and degraded coverage performance. This letter introduces a connectivity-preserving extension of the Density-Driven Optimal Control (D2OC) framework. The coverage objective, defined using the Wasserstein distance between the agent distribution and the reference density, admits a convex quadratic program formulation. Communication constraints are incorporated through a smooth connectivity penalty, which maintains strict convexity, supports distributed implementation, and preserves inter-agent communication without imposing rigid formations. Simulation studies show that the proposed method consistently maintains connectivity, improves convergence speed, and enhances non-uniform coverage quality compared with density-driven schemes that do not incorporate explicit connectivity considerations.
comment: Under review in IEEE Control Systems Letters (LCSS). 6 pages
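A hedged, gradient-style illustration of a density-driven coverage step with a smooth connectivity penalty; the discrete optimal-transport assignment stands in for the Wasserstein objective, and this is not the paper's convex QP formulation:

```python
# Illustrative only: match agents to samples from the reference density and nudge
# them toward their targets while softly pulling over-stretched neighbor pairs back
# inside the communication radius.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def coverage_step(agents, density_samples, comm_radius=2.0, step=0.1, lam=0.5):
    cost = cdist(agents, density_samples) ** 2           # squared-distance transport cost
    _, cols = linear_sum_assignment(cost)                # one target sample per agent
    move = density_samples[cols] - agents                # coverage term: go to assigned mass

    # Smooth connectivity penalty: pull neighbor pairs back inside the comm radius.
    pairwise = agents[:, None, :] - agents[None, :, :]
    dists = np.linalg.norm(pairwise, axis=-1) + 1e-9
    excess = np.maximum(dists - comm_radius, 0.0)
    pull = -(excess / dists)[:, :, None] * pairwise      # direction toward each far neighbor
    move += lam * pull.sum(axis=1)
    return agents + step * move
```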
Multiagent Systems
Towards Foundation Models with Native Multi-Agent Intelligence
Foundation models (FMs) are increasingly assuming the role of the "brain" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interaction or integrated tool use -- we argue that the next frontier is endowing FMs with native multi-agent intelligence. We identify four core capabilities of FMs in multi-agent contexts: understanding, planning, efficient communication, and adaptation. Contrary to assumptions about the spontaneous emergence of such abilities, we provide extensive empirical evidence across 41 large language models showing that strong single-agent performance alone does not automatically yield robust multi-agent intelligence. To address this gap, we outline key research directions -- spanning dataset construction, evaluation, training paradigms, and safety considerations -- for building FMs with native multi-agent intelligence.
Insured Agents: A Decentralized Trust Insurance Mechanism for Agentic Economy AAMAS 2026
The emerging "agentic web" envisions large populations of autonomous agents coordinating, transacting, and delegating across open networks. Yet many agent communication and commerce protocols treat agents as low-cost identities, despite the empirical reality that LLM agents remain unreliable, prone to hallucination, manipulable, and vulnerable to prompt injection and tool abuse. A natural response is "agents-at-stake": binding economically meaningful, slashable collateral to persistent identities and adjudicating misbehavior with verifiable evidence. However, heterogeneous tasks make universal verification brittle and centralization-prone, while traditional reputation struggles under rapid model drift and opaque internal states. We propose a protocol-native alternative: insured agents. Specialized insurer agents post stake on behalf of operational agents in exchange for premiums, and receive privileged, privacy-preserving audit access via TEEs to assess claims. A hierarchical insurer market calibrates stake through pricing, decentralizes verification via competitive underwriting, and yields incentive-compatible dispute resolution.
comment: Submitted to AAMAS 2026
Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology
Multimodal clinical reasoning in the field of gastrointestinal (GI) oncology necessitates the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Despite the evident potential exhibited by Multimodal Large Language Models (MLLMs), they frequently encounter challenges such as context dilution and hallucination when confronted with intricate, heterogeneous medical histories. In order to address these limitations, a hierarchical Multi-Agent Framework is proposed, which emulates the collaborative workflow of a human Multidisciplinary Team (MDT). The system attained a composite expert evaluation score of 4.60/5.00, thereby demonstrating a substantial improvement over the monolithic baseline. It is noteworthy that the agent-based architecture yielded the most substantial enhancements in reasoning logic and medical accuracy. The findings indicate that mimetic, agent-based collaboration provides a scalable, interpretable, and clinically robust paradigm for automated decision support in oncology.
Multi-Task Bayesian Optimization for Tuning Decentralized Trajectory Generation in Multi-UAV Systems
This paper investigates the use of Multi-Task Bayesian Optimization for tuning decentralized trajectory generation algorithms in multi-drone systems. We treat each task as a trajectory generation scenario defined by a specific number of drone-to-drone interactions. To model relationships across scenarios, we employ Multi-Task Gaussian Processes, which capture shared structure across tasks and enable efficient information transfer during optimization. We compare two strategies: optimizing the average mission time across all tasks and optimizing each task individually. Through a comprehensive simulation campaign, we show that single-task optimization leads to progressively shorter mission times as swarm size grows, but requires significantly more optimization time than the average-task approach.
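Since the abstract centers on Multi-Task Gaussian Processes, the following minimal sketch shows the intrinsic-coregionalization structure typically used to share information across tasks; the RBF base kernel, the fixed task-covariance matrix B, and the toy data are assumptions for illustration, not the authors' model.

```python
# Minimal intrinsic-coregionalization multi-task GP kernel and posterior mean.
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    """Squared-exponential kernel over tuning-parameter inputs."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def multitask_kernel(X1, t1, X2, t2, B, lengthscale=1.0):
    """K((x,t),(x',t')) = B[t,t'] * k(x,x'): a shared input kernel scaled by a
    task-covariance matrix B (fixed here for illustration)."""
    return B[np.ix_(t1, t2)] * rbf(X1, X2, lengthscale)

# Toy data: 2-D controller parameters, two tasks (e.g., 2 vs. 4 interactions).
rng = np.random.default_rng(0)
X = rng.uniform(size=(8, 2))
t = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.sin(X.sum(1)) + 0.2 * t                      # mission-time surrogate
B = np.array([[1.0, 0.7], [0.7, 1.0]])              # assumed task correlation

K = multitask_kernel(X, t, X, t, B) + 1e-6 * np.eye(len(y))
alpha = np.linalg.solve(K, y)

# Posterior mean at a new parameter setting, predicted for task 1.
x_star, t_star = rng.uniform(size=(1, 2)), np.array([1])
k_star = multitask_kernel(x_star, t_star, X, t, B)
print("predicted mission time:", (k_star @ alpha).item())
```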
Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks
Large Language Models and multi-agent systems have shown promise in decomposing complex tasks, yet they struggle with long-horizon reasoning tasks and escalating computation cost. This work introduces a hierarchical multi-agent architecture that distributes reasoning across a 64×64 grid of lightweight agents, supported by a selective oracle. A spatial curriculum progressively expands the operational region of the grid, ensuring that agents master easier central tasks before tackling harder peripheral ones. To improve reliability, the system integrates Negative Log-Likelihood as a measure of confidence, allowing the curriculum to prioritize regions where agents are both accurate and well calibrated. A Thompson Sampling curriculum manager adaptively chooses training zones based on competence and NLL-driven reward signals. We evaluate the approach on a spatially grounded Tower of Hanoi benchmark, which mirrors the long-horizon structure of many robotic manipulation and planning tasks. Results demonstrate improved stability, reduced oracle usage, and stronger long-range reasoning from distributed agent cooperation.
comment: 22 pages, 2 tables, 9 figures
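A rough sketch of a Thompson-sampling curriculum manager of the kind described above, with zone competence tracked by Beta posteriors and rewards discounted by an NLL-based calibration term; the reward shaping, zone count, and stubbed training routine are assumptions rather than the paper's implementation.

```python
# Thompson-sampling curriculum over training zones, rewarding calibrated success.
import numpy as np

rng = np.random.default_rng(0)
n_zones = 16                      # e.g., concentric regions of the agent grid
alpha = np.ones(n_zones)          # Beta pseudo-counts: successes
beta = np.ones(n_zones)           # Beta pseudo-counts: failures

def train_on_zone(z):
    """Placeholder environment: returns (solved?, mean NLL of the agents)."""
    difficulty = z / n_zones
    solved = rng.random() > difficulty
    nll = rng.gamma(shape=1.0 + 3.0 * difficulty)
    return solved, nll

for step in range(200):
    sampled = rng.beta(alpha, beta)            # Thompson sample per zone
    z = int(np.argmax(sampled))                # pick the most promising zone
    solved, nll = train_on_zone(z)
    # Reward only well-calibrated successes: accurate AND confident (low NLL).
    reward = float(solved) * np.exp(-nll)
    alpha[z] += reward
    beta[z] += 1.0 - reward

print("zone visit preference:", np.round(alpha / (alpha + beta), 2))
```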
The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations
Workplace toxicity is widely recognized as detrimental to organizational culture, yet quantifying its direct impact on operational efficiency remains methodologically challenging due to the ethical and practical difficulties of reproducing conflict in human subjects. This study leverages Large Language Model (LLM)-based Multi-Agent Systems to simulate 1-on-1 adversarial debates, creating a controlled "sociological sandbox". We employ a Monte Carlo method to simulate hundreds of discussions, measuring the convergence time (defined as the number of arguments required to reach a conclusion) between a baseline control group and treatment groups involving agents with "toxic" system prompts. Our results demonstrate a statistically significant increase of approximately 25% in the duration of conversations involving toxic participants. We propose that this "latency of toxicity" serves as a proxy for financial damage in corporate and academic settings. Furthermore, we demonstrate that agent-based modeling provides a reproducible, ethical alternative to human-subject research for measuring the mechanics of social friction.
comment: 8 figures, 3 tables
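The Monte Carlo protocol is straightforward to reproduce in outline: simulate many debates per condition, record the number of arguments until convergence, and compare the distributions. The debate below is a stochastic stub (the concession probabilities are invented), standing in for the LLM-vs-LLM dialogue used in the study.

```python
# Monte Carlo comparison of convergence times: control vs. "toxic" condition.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_debate(toxic: bool, max_turns: int = 60) -> int:
    """Return the number of arguments until the agents converge."""
    p_concede = 0.09 if toxic else 0.12    # toxicity slows concession (assumed)
    for turn in range(1, max_turns + 1):
        if rng.random() < p_concede:
            return turn
    return max_turns

control = np.array([simulate_debate(False) for _ in range(500)])
toxic = np.array([simulate_debate(True) for _ in range(500)])

print(f"mean turns  control={control.mean():.1f}  toxic={toxic.mean():.1f}")
print("relative increase: %.0f%%" % (100 * (toxic.mean() / control.mean() - 1)))
print("Welch t-test:", stats.ttest_ind(control, toxic, equal_var=False))
```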
Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks
The deployment of Unmanned Aerial Vehicle (UAV) swarms as dynamic communication relays is critical for next-generation tactical networks. However, operating in contested environments requires solving a complex trade-off, including maximizing system throughput while ensuring collision avoidance and resilience against adversarial jamming. Existing heuristic-based approaches often struggle to find effective solutions due to the dynamic and multi-objective nature of this problem. This paper formulates this challenge as a cooperative Multi-Agent Reinforcement Learning (MARL) problem, solved using the Centralized Training with Decentralized Execution (CTDE) framework. Our approach employs a centralized critic that uses global state information to guide decentralized actors which operate using only local observations. Simulation results show that our proposed framework significantly outperforms heuristic baselines, increasing the total system throughput by approximately 50% while simultaneously achieving a near-zero collision rate. A key finding is that the agents develop an emergent anti-jamming strategy without explicit programming. They learn to intelligently position themselves to balance the trade-off between mitigating interference from jammers and maintaining effective communication links with ground users.
comment: IEEE ICC 2026
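A structural sketch of the CTDE split described above: actors that consume only local observations for decentralized execution, and a centralized critic that sees the global state and joint action during training. All dimensions and layer sizes are placeholders, not the paper's architecture.

```python
# Centralized Training with Decentralized Execution: actor/critic skeleton.
import torch
import torch.nn as nn

OBS_DIM, STATE_DIM, ACT_DIM, N_AGENTS = 12, 48, 3, 4

class Actor(nn.Module):                      # executed onboard, local obs only
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)                 # e.g., a 3-D velocity command

class CentralCritic(nn.Module):              # used only during training
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + N_AGENTS * ACT_DIM, 128), nn.ReLU(),
            nn.Linear(128, 1))
    def forward(self, global_state, joint_action):
        return self.net(torch.cat([global_state, joint_action], dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

obs = torch.randn(N_AGENTS, OBS_DIM)
actions = torch.stack([a(o) for a, o in zip(actors, obs)])      # decentralized
q_value = critic(torch.randn(1, STATE_DIM), actions.flatten().unsqueeze(0))
print(q_value.shape)    # torch.Size([1, 1])
```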
Probabilistic Multi-Agent Aircraft Landing Time Prediction
Accurate and reliable aircraft landing time prediction is essential for effective resource allocation in air traffic management. However, the inherent uncertainty of aircraft trajectories and traffic flows poses significant challenges to both prediction accuracy and trustworthiness. Therefore, prediction models should not only provide point estimates of aircraft landing times but also the uncertainties associated with these predictions. Furthermore, aircraft trajectories are frequently influenced by the presence of nearby aircraft through air traffic control interventions such as radar vectoring. Consequently, landing time prediction models must account for multi-agent interactions in the airspace. In this work, we propose a probabilistic multi-agent aircraft landing time prediction framework that provides the landing times of multiple aircraft as distributions. We evaluate the proposed framework using an air traffic surveillance dataset collected from the terminal airspace of the Incheon International Airport in South Korea. The results demonstrate that the proposed model achieves higher prediction accuracy than the baselines and quantifies the associated uncertainties of its outcomes. In addition, the model uncovered underlying patterns in air traffic control through its attention scores, thereby enhancing explainability.
comment: 13 pages, 8 figures, accepted at AIAA SciTech 2026
WOLF: Werewolf-based Observations for LLM Deception and Falsehoods NeurIPS 2025
Deception is a fundamental challenge for multi-agent reasoning: effective systems must strategically conceal information while detecting misleading behavior in others. Yet most evaluations reduce deception to static classification, ignoring the interactive, adversarial, and longitudinal nature of real deceptive dynamics. Large language models (LLMs) can deceive convincingly but remain weak at detecting deception in peers. We present WOLF, a multi-agent social deduction benchmark based on Werewolf that enables separable measurement of deception production and detection. WOLF embeds role-grounded agents (Villager, Werewolf, Seer, Doctor) in a programmable LangGraph state machine with strict night-day cycles, debate turns, and majority voting. Every statement is a distinct analysis unit, with self-assessed honesty from speakers and peer-rated deceptiveness from others. Deception is categorized via a standardized taxonomy (omission, distortion, fabrication, misdirection), while suspicion scores are longitudinally smoothed to capture both immediate judgments and evolving trust dynamics. Structured logs preserve prompts, outputs, and state transitions for full reproducibility. Across 7,320 statements and 100 runs, Werewolves produce deceptive statements in 31% of turns, while peer detection achieves 71-73% precision with ~52% overall accuracy. Precision is higher for identifying Werewolves, though false positives occur against Villagers. Suspicion toward Werewolves rises from ~52% to over 60% across rounds, while suspicion toward Villagers and the Doctor stabilizes near 44-46%. This divergence shows that extended interaction improves recall against liars without compounding errors against truthful roles. WOLF moves deception evaluation beyond static datasets, offering a dynamic, controlled testbed for measuring deceptive and detective capacity in adversarial multi-agent interaction.
comment: Spotlight Multi-Turn Interactions in Large Language Models (MTI-LLM) Workshop at NeurIPS 2025
ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Large Language Models (LLMs) have demonstrated impressive fluency and reasoning capabilities, but their potential for misuse has raised growing concern. In this paper, we present ScamAgent, an autonomous multi-turn agent built on top of LLMs, capable of generating highly realistic scam call scripts that simulate real-world fraud scenarios. Unlike prior work focused on single-shot prompt misuse, ScamAgent maintains dialogue memory, adapts dynamically to simulated user responses, and employs deceptive persuasion strategies across conversational turns. We show that current LLM safety guardrails, including refusal mechanisms and content filters, are ineffective against such agent-based threats. Even models with strong prompt-level safeguards can be bypassed when prompts are decomposed, disguised, or delivered incrementally within an agent framework. We further demonstrate the transformation of scam scripts into lifelike voice calls using modern text-to-speech systems, completing a fully automated scam pipeline. Our findings highlight an urgent need for multi-turn safety auditing, agent-level control frameworks, and new methods to detect and disrupt conversational deception powered by generative AI.
comment: Accepted at CAMLIS 25: Conference on Applied Machine Learning for Information Security. 19 pages, 3 figures
DiscoVerse: Multi-Agent Pharmaceutical Co-Scientist for Traceable Drug Discovery and Reverse Translation
Pharmaceutical research and development has accumulated vast and heterogeneous archives of data. Much of this knowledge stems from discontinued programs, and reusing these archives is invaluable for reverse translation. However, in practice, such reuse is often infeasible. In this work, we introduce DiscoVerse, a multi-agent co-scientist designed to support pharmaceutical research and development at Roche. Designed as a human-in-the-loop assistant, DiscoVerse enables domain-specific queries by delivering evidence-based answers: it retrieves relevant data, links across documents, summarises key findings and preserves institutional memory. We assess DiscoVerse through expert evaluation of source-linked outputs. Our evaluation spans a selected subset of 180 molecules from Roche's research and development repositories, encompassing over 0.87 billion BPE tokens and more than four decades of research. To our knowledge, this represents the first agentic framework to be systematically assessed on real pharmaceutical data for reverse translation, enabled by authorized access to confidential archives covering the full lifecycle of drug development. Our contributions include: role-specialized agent designs aligned with scientist workflows; human-in-the-loop support for reverse translation; expert evaluation; and a large-scale demonstration showing promising decision-making insights. In brief, across seven benchmark queries, DiscoVerse achieved near-perfect recall ($\geq 0.99$) with moderate precision ($0.71-0.91$). Qualitative assessments and three real-world pharmaceutical use cases further showed faithful, source-linked synthesis across preclinical and clinical evidence.
comment: 24 pages, 5 figures, 3 tables. Updated version: added three pharmaceutical industry use cases and revised text for clarity
CourtMotion: Learning Event-Driven Motion Representations from Skeletal Data for Basketball
This paper presents CourtMotion, a spatiotemporal modeling framework for analyzing and predicting game events and plays as they develop in professional basketball. Anticipating basketball events requires understanding both physical motion patterns and their semantic significance in the context of the game. Traditional approaches that use only player positions fail to capture crucial indicators such as body orientation, defensive stance, or shooting preparation motions. Our two-stage approach first processes skeletal tracking data through Graph Neural Networks to capture nuanced motion patterns, then employs a Transformer architecture with specialized attention mechanisms to model player interactions. We introduce event projection heads that explicitly connect player movements to basketball events like passes, shots, and steals, training the model to associate physical motion patterns with their tactical purposes. Experiments on NBA tracking data demonstrate significant improvements over position-only baselines: 35% reduction in trajectory prediction error compared to state-of-the-art position-based models and consistent performance gains across key basketball analytics tasks. The resulting pretrained model serves as a powerful foundation for multiple downstream tasks, with pick detection, shot taker identification, assist prediction, shot location classification, and shot type recognition demonstrating substantial improvements over existing methods.
Shift Bribery over Social Networks
In shift bribery, a briber seeks to promote his preferred candidate by paying voters to raise their ranking. Classical models of shift bribery assume voters act independently, overlooking the role of social influence. However, in reality, individuals are social beings and are often represented as part of a social network, where bribed voters may influence their neighbors, thereby amplifying the effect of persuasion. We study Shift Bribery over Networks, where voters are modeled as nodes in a directed weighted graph, and arcs represent social influence between them. In this setting, bribery is not confined to directly targeted voters; its effects can propagate through the network, influencing neighbors and amplifying persuasion. Given a budget and individual cost functions for shifting each voter's preference toward a designated candidate, the goal is to determine whether a shift strategy exists within budget that ensures the preferred candidate wins after both direct and network-propagated influence takes effect. We show that the problem is NP-complete even with two candidates and unit costs, and W[2]-hard when parameterized by budget or maximum degree. On the positive side, we design polynomial-time algorithms for complete graphs under plurality and majority rules and path graphs for uniform edge weights, linear-time algorithms for transitive tournaments for two candidates, linear cost functions and uniform arc weights, and pseudo-polynomial algorithms for cluster graphs. We further prove the existence of fixed-parameter tractable algorithms with treewidth as parameter for two candidates, linear cost functions and uniform arc weights and pseudo-FPT with cluster vertex deletion number for two candidates and uniform arc weights. Together, these results give a detailed complexity landscape for shift bribery in social networks.
DISPATCH -- Decentralized Informed Spatial Planning and Assignment of Tasks for Cooperative Heterogeneous Agents
Spatial task allocation in systems such as multi-robot delivery or ride-sharing requires balancing efficiency with fair service across tasks. Greedy assignment policies that match each agent to its highest-preference or lowest-cost task can maximize efficiency but often create inequities: some tasks receive disproportionately favorable service (e.g., shorter delays or better matches), while others face long waits or poor allocations. We study fairness in heterogeneous multi-agent systems where tasks vary in preference alignment and urgency. Most existing approaches either assume centralized coordination or largely ignore fairness under partial observability. Distinct from this prior work, we establish a connection between the Eisenberg-Gale (EG) equilibrium convex program and decentralized, partially observable multi-agent learning. Building on this connection, we develop two equilibrium-informed algorithms that integrate fairness and efficiency: (i) a multi-agent reinforcement learning (MARL) framework, EG-MARL, whose training is guided by a centralized EG equilibrium assignment algorithm; and (ii) a stochastic online optimization mechanism that performs guided exploration and subset-based fair assignment as tasks are discovered. We evaluate on Multi-Agent Particle Environment (MPE) simulations across varying team sizes against centralized EG, Hungarian, and Min-Max distance baselines, and also present a Webots-based warehouse proof-of-concept with heterogeneous robots. Both methods preserve the fairness-efficiency balance of the EG solution under partial observability, with EG-MARL achieving near-centralized coordination and reduced travel distances, and the online mechanism enabling real-time allocation with competitive fairness.
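For reference, the Eisenberg-Gale convex program that both EG-MARL and the online mechanism build on can be written in a few lines of cvxpy; the utility matrix, equal budgets, and fractional-assignment constraint below are illustrative stand-ins for the paper's task-allocation setting.

```python
# Eisenberg-Gale program: maximize budget-weighted log utilities subject to
# each task being (fractionally) allocated at most once.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_tasks = 3, 5
V = rng.uniform(0.1, 1.0, size=(n_agents, n_tasks))   # preference alignment
w = np.ones(n_agents)                                  # equal "budgets"

X = cp.Variable((n_agents, n_tasks), nonneg=True)      # fractional assignment
utilities = cp.sum(cp.multiply(V, X), axis=1)
objective = cp.Maximize(cp.sum(cp.multiply(w, cp.log(utilities))))
constraints = [cp.sum(X, axis=0) <= 1]                 # each task used at most once

cp.Problem(objective, constraints).solve()
print("fair fractional assignment:\n", np.round(X.value, 2))
```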
Systems and Control (EESS)
Saturation-based robustly optimal hierarchical operation control of microgrids
This paper studies the problem of robustly optimal operation control of microgrids with a high share of renewable energy sources. The main goal is to ensure optimal operation under a wide range of circumstances, given the highly intermittent and uncertain nature of renewable sources and load demand. We formally state this problem, and, in order to solve it, we make effective use of the hierarchical power system control approach. We consider an enhanced primary control layer including droop control and autonomous limitation of power and energy. We prove that this enables the use of constant power setpoints to achieve optimal operation under certain conditions. In order to relax these conditions, the approach is combined with an energy management system, which solves a robust unit commitment problem within a model predictive control framework. Finally, a case study demonstrates the viability of the control design.
IoT-based Cost-Effective Fruit Quality Monitoring System using Electronic Nose
Post-harvest losses due to subjective quality assessment cause significant damage to the economy and food safety, especially in countries like Bangladesh. To mitigate such damages, objective decision-making backed by scientific methods is necessary. An IoT-based, cost-effective quality monitoring system can provide a solution by going beyond subjective quality monitoring and decision-making practices. Here, we propose a low-power, cost-effective fruit quality monitoring system with an array of MQ gas sensors, which can be used as an electronic nose. We track the volatile gas emissions, specifically ethanol, methane, and ammonia, encompassing both ripening and decomposition for a set of bananas. Based on the gas concentration thresholds, we develop a mathematical model to accurately assess fruit quality. We also integrate this information into a dashboard for prompt decision-making and monitoring to make it useful to the farmers. This approach has the potential to reduce economic losses, enhance food safety, and provide scalable solutions for the supply chain.
comment: Paper Accepted in ICCIT 2025 Conference
LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design
The success of large language models (LLMs) amplifies the need for high-throughput, energy-efficient inference at scale. 3D-DRAM-based accelerators provide high memory bandwidth and therefore an opportunity to accelerate the bandwidth-bound decode phase. However, how to adequately balance compute density for prefill with bandwidth and capacity for decode remains open. Moreover, most prior designs do not target end-to-end serving, leaving the co-design of dataflow, parallel mapping, and scheduling underexplored. To bridge the gap, we present LaMoSys3.5D, to our knowledge the first scalable 3.5D-IC architecture for LLM serving. LaMoSys3.5D composes heterogeneous 3D-DRAM chiplets on a 2.5D interposer: compute-rich chiplets for prefill and bandwidth- and capacity-rich chiplets for decode. To realize efficient serving, we adopt a hardware/software co-design spanning dataflow and parallel mapping, and introduce a thermal-aware modeling and hierarchical design-space exploration framework. Across diverse LLMs and workloads, LaMoSys3.5D improves throughput per watt over DGX-A100 systems by 62 and achieves a 4.87 better end-to-end latency geomean versus prior 3D designs. We further distill intriguing design guidelines for 3.5D-IC architectures and end-to-end inference serving.
Gradient-Informed Monte Carlo Fine-Tuning of Diffusion Models for Low-Thrust Trajectory Design
Preliminary mission design of low-thrust spacecraft trajectories in the Circular Restricted Three-Body Problem is a global search characterized by a complex objective landscape and numerous local minima. Formulating the problem as sampling from an unnormalized distribution supported on neighborhoods of locally optimal solutions provides the opportunity to deploy Markov chain Monte Carlo methods and generative machine learning. In this work, we extend our previous self-supervised diffusion model fine-tuning framework to employ gradient-informed Markov chain Monte Carlo. We compare two algorithms - the Metropolis-Adjusted Langevin Algorithm and Hamiltonian Monte Carlo - both initialized from a distribution learned by a diffusion model. Derivatives of an objective function that balances fuel consumption, time of flight and constraint violations are computed analytically using state transition matrices. We show that incorporating the gradient drift term accelerates mixing and improves convergence of the Markov chain for a multi-revolution transfer in the Saturn-Titan system. Among the evaluated methods, MALA provides the best trade-off between performance and computational cost. Starting from samples generated by a baseline diffusion model trained on a related transfer, MALA explicitly targets Pareto-optimal solutions. Compared to a random walk Metropolis algorithm, it increases the feasibility rate from 17.34% to 63.01% and produces a denser, more diverse coverage of the Pareto front. By fine-tuning a diffusion model on the generated samples and associated reward values with reward-weighted likelihood maximization, we learn the global solution structure of the problem and eliminate the need for a tedious separate data generation phase.
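The gradient-informed step at the core of the comparison is a standard MALA update with a Metropolis-Hastings correction, sketched below on a toy Gaussian target; in the paper the log-density gradient would come from the STM-based derivatives of the fuel/time/constraint objective.

```python
# One-chain MALA sampler with the asymmetric-proposal MH correction.
import numpy as np

rng = np.random.default_rng(0)

def log_prob(x):                 # toy target; stands in for the reward-based density
    return -0.5 * np.sum(x**2)

def grad_log_prob(x):
    return -x

def mala_step(x, eps=0.1):
    noise = rng.standard_normal(x.shape)
    prop = x + eps * grad_log_prob(x) + np.sqrt(2 * eps) * noise
    def log_q(b, a):             # log q(b | a), Gaussian around the drifted point
        return -np.sum((b - a - eps * grad_log_prob(a))**2) / (4 * eps)
    log_alpha = (log_prob(prop) + log_q(x, prop)) - (log_prob(x) + log_q(prop, x))
    return prop if np.log(rng.random()) < log_alpha else x

x = rng.standard_normal(6)       # e.g., a low-thrust control parameterization
for _ in range(1000):
    x = mala_step(x)
print("final sample:", np.round(x, 2))
```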
Direct transfer of optimized controllers to similar systems using dimensionless MPC
Scaled model experiments are commonly used in various engineering fields to reduce experimentation costs and overcome constraints associated with full-scale systems. The relevance of such experiments relies on dimensional analysis and the principle of dynamic similarity. However, transferring controllers to full-scale systems often requires additional tuning. In this paper, we propose a method to enable a direct controller transfer using dimensionless model predictive control, tuned automatically for closed-loop performance. With this reformulation, the closed-loop behavior of an optimized controller transfers directly to a new, dynamically similar system. Additionally, the dimensionless formulation allows for the use of data from systems of different scales during parameter optimization. We demonstrate the method on a cartpole swing-up and a car racing problem, applying either reinforcement learning or Bayesian optimization for tuning the controller parameters. Software used to obtain the results in this paper is publicly available at https://github.com/josipkh/dimensionless-mpcrl.
comment: 7 pages, 4 figures
NLoS Localization with Single Base Station Based on Radio Map
Accurate outdoor localization in Non-Line-of-Sight (NLoS) environments remains a critical challenge for wireless communication and sensing systems. Existing methods, including positioning based on the Global Navigation Satellite System (GNSS) and triple Base Stations (BSs) techniques, cannot provide reliable performance under NLoS conditions, particularly in dense urban areas with strong multipath effects. To address this limitation, we propose a single BS localization framework that integrates sequential signal measurements with prior radio information embedded in the Radio Map (RM). Using temporal measurement features and matching them with radio maps, the proposed method effectively mitigates the adverse impact of multipath propagation and reduces the dependence on LoS paths. Simulation experiments further evaluate the impact of different radio map construction strategies and the varying lengths of the measurement sequence on localization accuracy. Results demonstrate that the proposed scheme achieves sub-meter positioning accuracy in typical NLoS environments, highlighting its potential as a practical and robust solution for single-base-station deployment.
Decoupled Design of Time-Varying Control Barrier Functions via Equivariances
This article presents a systematic method for designing time-varying Control Barrier Functions (CBF) composed of a time-invariant component and multiple time-dependent components, leveraging structural properties of the system dynamics. The method involves the construction of a specific class of time-invariant CBFs that encode the system's dynamic capabilities with respect to a given constraint, and augments them subsequently with appropriately designed time-dependent transformations. While transformations uniformly varying the time-invariant CBF can be applied to arbitrary systems, transformations exploiting structural properties in the dynamics - equivariances in particular - enable the handling of a broader and more expressive class of time-varying constraints. The article shows how to leverage such properties in the design of time-varying CBFs. The proposed method decouples the design of time variations from the computationally expensive construction of the underlying CBFs, thereby providing a computationally attractive approach to the design of time-varying CBFs. The method accounts for input constraints and under-actuation, and requires only qualitative knowledge on the time-variation of the constraints, making it suitable for application in uncertain environments.
comment: 7 pages, 3 figures
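The decoupling can be illustrated with a small CBF quadratic program in which the barrier is a time-invariant part plus a time-dependent transformation, so the safety constraint simply gains an extra dh/dt term. Single-integrator dynamics and the specific h0 and b(t) below are assumptions chosen to keep the sketch self-contained.

```python
# CBF-QP safety filter with h(x, t) = h0(x) + b(t) on a single integrator.
import cvxpy as cp
import numpy as np

def safe_input(x, t, u_nom, alpha=1.0):
    # Time-invariant part: stay inside a disc of radius 2 around the origin.
    h0 = 4.0 - x @ x
    grad_h0 = -2.0 * x
    # Time-dependent transformation: the allowed set shrinks slowly over time.
    b, db_dt = -0.1 * t, -0.1
    u = cp.Variable(2)
    constraint = [grad_h0 @ u + db_dt >= -alpha * (h0 + b)]   # dh/dt >= -alpha*h
    cp.Problem(cp.Minimize(cp.sum_squares(u - u_nom)), constraint).solve()
    return u.value

x, t = np.array([1.5, 0.5]), 2.0
print("filtered input:", np.round(safe_input(x, t, u_nom=np.array([1.0, 0.0])), 3))
```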
The SMART+ Framework for AI Systems
Artificial Intelligence (AI) systems are now an integral part of multiple industries. In clinical research, AI supports automated adverse event detection in clinical trials, patient eligibility screening for protocol enrollment, and data quality validation. Beyond healthcare, AI is transforming finance through real-time fraud detection, automated loan risk assessment, and algorithmic decision-making. Similarly, in manufacturing, AI enables predictive maintenance to reduce equipment downtime, enhances quality control through computer-vision inspection, and optimizes production workflows using real-time operational data. While these technologies enhance operational efficiency, they introduce new challenges regarding safety, accountability, and regulatory compliance. To address these concerns, we introduce the SMART+ Framework - a structured model built on the pillars of Safety, Monitoring, Accountability, Reliability, and Transparency, and further enhanced with Privacy & Security, Data Governance, Fairness & Bias, and Guardrails. SMART+ offers a practical, comprehensive approach to evaluating and governing AI systems across industries. This framework aligns with evolving mechanisms and regulatory guidance to integrate operational safeguards, oversight procedures, and strengthened privacy and governance controls. SMART+ demonstrates risk mitigation, trust-building, and compliance readiness. By enabling responsible AI adoption and ensuring auditability, SMART+ provides a robust foundation for effective AI governance in clinical research.
Optimal Control of Behavioral-Feedback SIR Epidemic Model
We consider a behavioral-feedback SIR epidemic model, in which the infection rate depends in feedback on the fractions of susceptible and infected agents, respectively. The considered model allows one to account for endogenous adaptation mechanisms of the agents in response to the epidemics, such as voluntary social distancing, or the adoption of face masks. For this model, we formulate an optimal control problem for a social planner that has the ability to reduce the infection rate to keep the infection curve below a certain threshold within an infinite time horizon, while minimizing the intervention cost. Based on the dynamic properties of the model, we prove that, under quite general conditions on the infection rate, the "filling the box" strategy is the optimal control. This strategy consists in letting the epidemics spread without intervention until the threshold is reached, then applying the minimum control that leaves the fraction of infected individuals constantly at the threshold until the reproduction number becomes less than one and the infection naturally fades out. Our result generalizes one available in the literature for the equivalent problem formulated for the classical SIR model, which can be recovered as a special case of our model when the infection rate is constant. Our contribution enhances the understanding of epidemic management with adaptive human behavior, offering insights for robust containment strategies.
comment: 14 pages, 3 figures
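The "filling the box" policy is easy to visualize numerically on the constant-rate SIR special case mentioned in the abstract: do nothing until the infected fraction reaches the threshold, then choose the infection rate that holds it there until the effective reproduction number drops below one. The parameter values below are illustrative.

```python
# Forward-Euler simulation of the threshold-holding ("filling the box") policy.
import numpy as np

beta0, gamma, I_max, dt = 0.4, 0.1, 0.1, 0.01
S, I = 0.99, 0.01
history = []
for k in range(int(400 / dt)):
    t = k * dt
    if I >= I_max and beta0 * S / gamma > 1.0:
        beta = gamma / S            # holds dI/dt = 0 exactly at the threshold
    else:
        beta = beta0                # no intervention
    dS = -beta * S * I
    dI = beta * S * I - gamma * I
    S, I = S + dt * dS, I + dt * dI
    history.append((t, S, I, beta))

peak_I = max(i for _, _, i, _ in history)
print(f"peak infected fraction: {peak_I:.3f} (threshold {I_max})")
```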
High-performance computing enabled contingency analysis for modern power networks
Modern power networks face increasing vulnerability to cascading failures due to high complexity and the growing penetration of intermittent resources, necessitating rigorous security assessment beyond the conventional $N-1$ criterion. Current approaches often struggle to achieve the computational tractability required for exhaustive $N-2$ contingency analysis integrated with complex stability evaluations like small-signal stability. Addressing this computational bottleneck and the limitations of deterministic screening, this paper presents a scalable methodology for the vulnerability assessment of modern power networks, integrating $N-2$ contingency analysis with small-signal stability evaluation. To prioritize critical components, we propose a probabilistic Risk Index ($R_i$) that weights the deterministic severity of a contingency (including optimal power flow divergence, islanding, and oscillatory instability) by the failure frequency of the involved elements based on reliability data. The proposed framework is implemented using High-Performance Computing (HPC) techniques through the PyCOMPSs parallel programming library, orchestrating optimal power flow simulations (VeraGrid) and small-signal analysis (STAMP) to enable the exhaustive exploration of massive contingency sets. The methodology is validated on the IEEE 118-bus test system, processing more than 57,000 scenarios to identify components prone to triggering cascading failures. Results demonstrate that the risk-based approach effectively isolates critical assets that deterministic $N-1$ criteria often overlook. This work establishes a replicable and efficient workflow for probabilistic security assessment, suitable for large-scale networks and capable of supporting operator decision-making in near real-time environments.
comment: 10 pages, 5 figures, pending submission to IJEPES
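A schematic version of the screening loop, with the risk index computed as deterministic severity weighted by element failure frequencies and the contingency evaluations farmed out to worker processes; the severity scores, failure rates, and stubbed power-flow call are placeholders, and the real pipeline uses PyCOMPSs, VeraGrid, and STAMP rather than this toy executor.

```python
# Risk-weighted N-2 screening: R_i = severity(outcome) * joint failure frequency.
from itertools import combinations
from concurrent.futures import ProcessPoolExecutor

SEVERITY = {"converged": 0.0, "divergence": 1.0, "islanding": 1.0, "oscillatory": 0.8}

def run_contingency(pair):                     # stub standing in for OPF + small-signal
    return "divergence" if sum(pair) % 17 == 0 else "converged"

def analyze(pair, failure_rate):
    outcome = run_contingency(pair)
    severity = SEVERITY[outcome]
    frequency = failure_rate[pair[0]] * failure_rate[pair[1]]   # assumed joint rate
    return pair, severity * frequency          # risk index R_i

if __name__ == "__main__":
    branches = list(range(20))
    failure_rate = {b: 0.01 for b in branches}          # outages per year (assumed)
    pairs = list(combinations(branches, 2))
    with ProcessPoolExecutor() as pool:
        risks = list(pool.map(analyze, pairs, [failure_rate] * len(pairs)))
    worst = sorted(risks, key=lambda r: r[1], reverse=True)[:5]
    print("highest-risk double outages:", worst)
```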
Quantization and Security Parameter Design for Overflow-Free Confidential FRIT
This study proposes a systematic design procedure for determining the quantization gain and the security parameter in the Confidential Fictitious Reference Iterative Tuning (CFRIT), enabling overflow-free and accuracy-guaranteed encrypted controller tuning. Within an encrypted data-driven gain tuning, the range of quantization errors induced during the encoding (encryption) process can be estimated from operational data. Based on this insight, explicit analytical conditions on the quantization gain and the security parameter are derived to prevent overflow in computing over encrypted data. Furthermore, the analysis reveals a quantitative relationship between quantization-induced errors and the deviation between the gains obtained by CFRIT and non-confidential Fictitious Reference Iterative Tuning (FRIT), clarifying how parameter choice affects tuning accuracy. A numerical example verifies the proposed procedure by demonstrating that the designed parameters achieve accurate encrypted tuning within a prescribed tolerance while preventing overflow. In addition, the admissible region of parameter combinations is visualized to examine the characteristics of feasible and infeasible regions, providing practical insights into parameter design for encrypted data-driven control.
Using reinforcement learning to probe the role of feedback in skill acquisition
Many high-performance human activities are executed with little or no external feedback: think of a figure skater landing a triple jump, a pitcher throwing a curveball for a strike, or a barista pouring latte art. To study the process of skill acquisition under fully controlled conditions, we bypass human subjects. Instead, we directly interface a generalist reinforcement learning agent with a spinning cylinder in a tabletop circulating water channel to maximize or minimize drag. This setup has several desirable properties. First, it is a physical system, with the rich interactions and complex dynamics that only the physical world has: the flow is highly chaotic and extremely difficult, if not impossible, to model or simulate accurately. Second, the objective -- drag minimization or maximization -- is easy to state and can be captured directly in the reward, yet good strategies are not obvious beforehand. Third, decades-old experimental studies provide recipes for simple, high-performance open-loop policies. Finally, the setup is inexpensive and far easier to reproduce than human studies. In our experiments we find that high-dimensional flow feedback lets the agent discover high-performance drag-control strategies with only minutes of real-world interaction. When we later replay the same action sequences without any feedback, we obtain almost identical performance. This shows that feedback, and in particular flow feedback, is not needed to execute the learned policy. Surprisingly, without flow feedback during training the agent fails to discover any well-performing policy in drag maximization, but still succeeds in drag minimization, albeit more slowly and less reliably. Our studies show that learning a high-performance skill can require richer information than executing it, and learning conditions can be kind or wicked depending solely on the goal, not on dynamics or policy complexity.
comment: Website: https://antonioterpin.com/fluids-control
MPC for tracking for anesthesia dynamics
In this paper, an MPC for tracking formulation is proposed for the control of anesthesia dynamics. It seamlessly enables optimization of the steady-state pair, which is not unique due to the MISO nature of the model. Anesthesia dynamics is a multi-time-scale system with two types of states characterized, respectively, by fast and slow dynamics. In anesthesia control, the output equation depends only on the fast dynamics. Therefore, the slow states can be treated as disturbances, and compensation terms can be introduced. Subsequently, the system can be reformulated as a nominal one, allowing the design of an MPC for tracking strategy. The presented framework ensures recursive feasibility and asymptotic stability through the design of appropriate terminal ingredients in the MPC for tracking framework. The controller performance is then assessed on a patient in a simulation environment.
Beyond Wave Variables: A Data-Driven Ensemble Approach for Enhanced Teleoperation Transparency and Stability
Time delays in communication channels present significant challenges for bilateral teleoperation systems, affecting both transparency and stability. Although traditional wave variable-based methods for a four-channel architecture ensure stability via passivity, they remain vulnerable to wave reflections and disturbances like variable delays and environmental noise. This article presents a data-driven hybrid framework that replaces the conventional wave-variable transform with an ensemble of three advanced sequence models, each optimized separately via the state-of-the-art Optuna optimizer, and combined through a stacking meta-learner. The base predictors include an LSTM augmented with Prophet for trend correction, an LSTM-based feature extractor paired with clustering and a random forest for improved regression, and a CNN-LSTM model for localized and long-term dynamics. Experimental validation was performed in Python using data generated from the baseline system implemented in MATLAB/Simulink. The results show that our optimized ensemble achieves a transparency comparable to the baseline wave-variable system under varying delays and noise, while ensuring stability through passivity constraints.
comment: 14 pages, 8 figures, 5 tables
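The stacking idea itself can be sketched with scikit-learn on a synthetic delayed signal: heterogeneous base regressors predict a sample a fixed delay ahead and a meta-learner combines them. The real system uses Optuna-tuned LSTM, LSTM-feature, and CNN-LSTM models; the simple regressors below are stand-ins so the example stays self-contained.

```python
# Stacked ensemble predicting a delayed signal from a window of past samples.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = np.linspace(0, 20, 2000)
signal = np.sin(t) + 0.05 * rng.standard_normal(t.size)

window, delay = 20, 5
X = np.stack([signal[i:i + window] for i in range(len(signal) - window - delay)])
y = signal[window + delay:]

ensemble = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("mlp", MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000,
                                     random_state=0)),
                ("ridge", Ridge())],
    final_estimator=Ridge())
ensemble.fit(X[:1500], y[:1500])
print("held-out R^2:", round(ensemble.score(X[1500:], y[1500:]), 3))
```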
Integration of AI-Driven CAD Systems in Designing Water and Power Transportation Infrastructure for Industrial and Remote Landscape Applications
The integration of AI into CAD systems transforms how engineers plan and develop infrastructure projects involving water and power transportation across industrial and remote landscapes. This paper discusses how AI-driven CAD systems improve the efficient, effective, and sustainable design of infrastructure by embedding automation, predictive modeling, and real-time data analytics. This study examines how AI-supported toolsets can enhance design workflows, minimize human error, and optimize resource allocation for projects in underdeveloped environments. It also addresses technical and organizational challenges to AI adoption, including data silos, interoperability issues, and workforce adaptation. The findings demonstrate that AI-powered CAD enables faster project delivery, enhanced design precision, and increased resilience to environmental and logistical constraints. AI helps connect CAD, GIS, and IoT technologies to develop self-learning, adaptive design systems that are needed to meet the increasing global demand for sustainable infrastructure.
comment: 22 pages total, 3 figures
Möbius Transformations and the Analytic-Geometric Reconstruction of the Induction-Machine Circle Diagram
The Heyland circle diagram is a classical graphical method for representing the steady-state behavior of induction machines using no-load and blocked-rotor test data. Despite its long pedagogical history, the traditional geometric construction has not been formalized within a closed analytic framework. This note develops a complete Euclidean reconstruction of the diagram using only the two measured phasors and elementary geometric operations, yielding a unique circle, a torque chord, a slip scale, and a maximum-torque point. We prove that this constructed circle coincides precisely with the analytic steady-state current locus obtained from the per-phase equivalent circuit. A Möbius transformation interpretation reveals the complex-analytic origin of the diagram's circularity and offers a compact explanation of its geometric structure.
comment: 10 pages, 1 figure
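The complex-analytic core of the argument can be stated compactly; the following is a hedged reconstruction using the simplified per-phase circuit with the magnetizing branch at the terminals, which may differ in detail from the note's exact construction.

```latex
% Simplified per-phase circuit: magnetizing branch at the stator terminals.
\begin{align*}
  \underline{I}(s) &= \underline{I}_0
    + \frac{\underline{V}}{(R_1 + R_2'/s) + j(X_1 + X_2')},
  \qquad s \in \mathbb{R}\setminus\{0\}.
\end{align*}
% As the slip s sweeps the real line, w = R_2'/s traces the real axis. The map
% w -> V / (w + R_1 + j(X_1 + X_2')) is a Moebius transformation, and Moebius
% transformations carry lines to circles; adding the no-load current I_0 only
% translates the image. Hence the stator-current locus is a circle through the
% no-load and blocked-rotor phasors, which is exactly the Heyland diagram.
```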
Formation and Investigation of Cooperative Platooning at the Early Stage of Connected and Automated Vehicles Deployment
Cooperative platooning, enabled by cooperative adaptive cruise control (CACC), is a cornerstone technology for connected automated vehicles (CAVs), offering significant improvements in safety, comfort, and traffic efficiency over traditional adaptive cruise control (ACC). This paper addresses a key challenge in the initial deployment phase of CAVs: the limited benefits of cooperative platooning due to the sparse distribution of CAVs on the road. To overcome this limitation, we propose an innovative control framework that enhances cooperative platooning in mixed traffic environments. Two techniques are utilized: (1) a mixed cooperative platooning strategy that integrates CACC with unconnected vehicles (CACCu), and (2) a strategic lane-change decision model designed to facilitate safe and efficient lane changes for platoon formation. Additionally, a surrounding vehicle identification system is embedded in the framework to enable CAVs to effectively identify and select potential platooning leaders. Simulation studies across various CV market penetration rates (MPRs) show that incorporating CACCu systems significantly improves safety, comfort, and traffic efficiency compared to existing systems with only CACC and ACC systems, even at CV penetration as low as 10%. The maximized platoon formation increases by up to 24%, accompanied by an 11% reduction in acceleration and a 7% decrease in fuel consumption. Furthermore, the strategic lane-change model enhances CAV performance, achieving notable improvements between 6% and 60% CV penetration, without adversely affecting overall traffic flow.
Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
Offline decision-making requires synthesizing reliable behaviors from fixed datasets without further interaction, yet existing generative approaches often yield trajectories that are dynamically infeasible. We propose Model Predictive Diffuser (MPDiffuser), a compositional model-based diffusion framework consisting of: (i) a planner that generates diverse, task-aligned trajectories; (ii) a dynamics model that enforces consistency with the underlying system dynamics; and (iii) a ranker module that selects behaviors aligned with the task objectives. MPDiffuser employs an alternating diffusion sampling scheme, where planner and dynamics updates are interleaved to progressively refine trajectories for both task alignment and feasibility during the sampling process. We also provide a theoretical rationale for this procedure, showing how it balances fidelity to data priors with dynamics consistency. Empirically, the compositional design improves sample efficiency, as it leverages even low-quality data for dynamics learning and adapts seamlessly to novel dynamics. We evaluate MPDiffuser on both unconstrained (D4RL) and constrained (DSRL) offline decision-making benchmarks, demonstrating consistent gains over existing approaches. Furthermore, we present a preliminary study extending MPDiffuser to vision-based control tasks, showing its potential to scale to high-dimensional sensory inputs. Finally, we deploy our method on a real quadrupedal robot, showcasing its practicality for real-world control.
Theoretical Studies of Sub-THz Active Split-Ring Resonators for Near-Field Imaging
This paper develops a theoretical framework for the design of Active Split-Ring Resonators (ASRRs). An ASRR is a Split-Ring Resonator (SRR) equipped with a tunable negative resistor, enabling both switchability and quality factor boosting and tuning. These properties make ASRRs well-suited for integration into dense arrays on silicon chips, where pixelated near-fields are generated and leveraged for high-resolution 2D imaging of samples. Such imagers pave the way for real-time, non-invasive, and low-cost imaging of human body tissue. The paper investigates ASRR coupling to host transmission lines, nonlinear effects, signal flow, and the influence of various noise sources on detection performance. Verified through simulations, these studies provide design guidelines for optimizing the Signal-to-Noise Ratio (SNR) and power consumption of a single pixel, while adhering to the constraints of a scalable array.
Bounding the Minimal Current Harmonic Distortion in Optimal Modulation of Single-Phase Power Converters
Optimal pulse patterns (OPPs) are a modulation technique in which a switching signal is computed offline through an optimization process that accounts for selected performance criteria, such as current harmonic distortion. The optimization determines both the switching angles (i.e., switching times) and the pattern structure (i.e., the sequence of voltage levels). This optimization task is a challenging mixed-integer nonconvex problem, involving integer-valued voltage levels and trigonometric nonlinearities in both the objective and the constraints. We address this challenge by reinterpreting OPP design as a periodic mode-selecting optimal control problem of a hybrid system, where selecting angles and levels corresponds to choosing jump times in a transition graph. This time-domain formulation enables the direct use of convex-relaxation techniques from optimal control, producing a hierarchy of semidefinite programs that lower-bound the minimal achievable harmonic distortion and scale subquadratically with the number of converter levels and switching angles. Numerical results demonstrate the effectiveness of the proposed approach.
comment: 18 pages, 6 tables, 18 figures
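For orientation, the classical three-level, quarter-wave-symmetric single-phase case shows where the trigonometric nonlinearity comes from; the paper's multilevel problem additionally selects the level sequence, which is what makes the task mixed-integer. The formulation below is the textbook version, not the paper's exact objective.

```latex
% Fundamental and harmonic amplitudes of a three-level, quarter-wave-symmetric
% pattern with N switching angles per quarter period, and the usual weighted
% distortion objective.
\begin{align*}
  \hat u_n &= \frac{4 V_\mathrm{dc}}{n\pi}
              \sum_{k=1}^{N} (-1)^{k+1}\cos(n\alpha_k),
  \qquad n = 1, 3, 5, \dots\\
  \min_{0<\alpha_1<\dots<\alpha_N<\pi/2}\;
      & \sum_{n=3,5,\dots}\Big(\frac{\hat u_n}{n}\Big)^{2}
  \quad \text{s.t.}\quad \hat u_1 = u_1^{\ast}.
\end{align*}
% The 1/n weighting models the attenuation of voltage harmonics by an inductive
% load, so the objective is proportional to the squared rms current ripple; the
% cosine terms are the source of the trigonometric nonconvexity mentioned above.
```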
Integrating Delay-Absorption Capability into Flight Departure Delay Prediction
Accurately forecasting flight departure delays is essential for improving operational efficiency and mitigating the cascading disruptions that propagate through tightly coupled aircraft rotations. Traditional machine learning approaches often treat upstream delays as static variables, overlooking the dynamic recovery processes that determine whether a delay is absorbed or transmitted to subsequent legs. This study introduces a two-stage machine learning framework that explicitly models delay-absorption behavior and incorporates it into downstream delay prediction. In Stage I, a CatBoost classifier estimates the probability that a flight successfully absorbs an upstream delay based on operational, temporal, and meteorological features. This probability, termed AbsorbScore, quantifies airport- and flight-specific resilience to delay propagation. In Stage II, an XGBoost classifier integrates AbsorbScore with schedule, weather, and congestion indicators to predict whether a flight will depart more than 15 minutes late. Using U.S. domestic flight and NOAA weather data from Summer 2023, the proposed framework achieves substantial improvements over baseline models, increasing ROC-AUC from 0.865 to 0.898 and enhancing precision to 89.2% in identifying delayed flights. The results demonstrate that modeling delay absorption as an intermediate mechanism significantly improves predictive performance and yields interpretable insights into airport recovery dynamics, offering a practical foundation for data-driven delay management and proactive operational planning.
comment: 12 pages, 9 figures
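The two-stage structure maps directly onto the scikit-learn-style APIs of the named libraries: a CatBoost classifier produces the AbsorbScore, which is then appended as a feature for the XGBoost delay classifier. The synthetic features and column layout below are placeholders, not the paper's feature set.

```python
# Stage I (CatBoost) produces AbsorbScore; Stage II (XGBoost) consumes it.
import numpy as np
from catboost import CatBoostClassifier
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 5000
X_ops = rng.normal(size=(n, 6))            # turnaround, weather, congestion, ...
absorbed = (X_ops[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
delayed = ((X_ops[:, 1] - absorbed) + 0.5 * rng.normal(size=n) > 0).astype(int)

# Stage I: probability that an upstream delay is absorbed.
stage1 = CatBoostClassifier(iterations=200, verbose=False)
stage1.fit(X_ops[:4000], absorbed[:4000])
absorb_score = stage1.predict_proba(X_ops)[:, 1]       # the AbsorbScore feature

# Stage II: >15-minute departure-delay classification with AbsorbScore appended.
X_stage2 = np.column_stack([X_ops, absorb_score])
stage2 = XGBClassifier(n_estimators=200, eval_metric="logloss")
stage2.fit(X_stage2[:4000], delayed[:4000])
print("held-out accuracy:", round(stage2.score(X_stage2[4000:], delayed[4000:]), 3))
```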
A Dynamic Coding Scheme to Prevent Covert Cyber-Attacks in Cyber-Physical Systems
In this paper, we address two main problems in the context of covert cyber-attacks in cyber-physical systems (CPS). First, we aim to investigate and develop necessary and sufficient conditions in terms of disruption resources of the CPS that enable adversaries to execute covert cyber-attacks. These conditions can be utilized to identify the input and output communication channels that are needed by adversaries to execute these attacks. Second, this paper introduces and develops a dynamic coding scheme as a countermeasure against covert cyber-attacks. Under certain conditions and assuming the existence of one secure input and two secure output communication channels, the proposed dynamic coding scheme prevents adversaries from executing covert cyber-attacks. A numerical case study of a flight control system is provided to demonstrate the capabilities of our proposed and developed dynamic coding scheme.
Characterizing Human Feedback-Based Control in Naturalistic Driving Interactions via Gaussian Process Regression with Linear Feedback
Understanding driver interactions is critical to designing autonomous vehicles to interoperate safely with human-driven cars. We consider the impact of these interactions on the policies drivers employ when navigating unsigned intersections in a driving simulator. The simulator allows the collection of naturalistic decision-making and behavior data in a controlled environment. Using these data, we model the human driver responses as state-based feedback controllers learned via Gaussian Process regression methods. We compute the feedback gain of the controller using a weighted combination of linear and nonlinear priors. We then analyze how the individual gains are reflected in driver behavior. We also assess differences in these controllers across populations of drivers. Our work in data-driven analyses of how drivers determine their policies can facilitate future work in the design of socially responsive autonomy for vehicles.
Cyqlone: A Parallel, High-Performance Linear Solver for Optimal Control
We present Cyqlone, a solver for linear systems with a stage-wise optimal control structure that fully exploits the various levels of parallelism available in modern hardware. Cyqlone unifies algorithms based on the sequential Riccati recursion, parallel Schur complement methods, and cyclic reduction methods, thereby minimizing the required number of floating-point operations, while allowing parallelization across a user-configurable number of processors. Given sufficient parallelism, the solver run time scales with the logarithm of the horizon length (in contrast to the linear scaling of sequential Riccati-based methods), enabling real-time solution of long-horizon problems. Beyond multithreading on multi-core processors, implementations of Cyqlone can also leverage vectorization using batched linear algebra routines. Such batched routines exploit data parallelism using single instruction, multiple data (SIMD) operations, and expose a higher degree of instruction-level parallelism than their non-batched counterparts. This enables them to significantly outperform BLAS and BLASFEO for the small matrices that arise in optimal control. Building on this high-performance linear solver, we develop CyQPALM, a parallel and optimal-control-specific variant of the QPALM quadratic programming solver. It combines the parallel and vectorized linear algebra operations from Cyqlone with a parallel line search and parallel factorization updates, resulting in order-of-magnitude speedups compared to the state-of-the-art HPIPM solver. Open-source C++ implementations of Cyqlone and CyQPALM are available at https://github.com/kul-optec/cyqlone
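For context, the sequential building block that Cyqlone generalizes is the backward Riccati recursion; the naive NumPy version below is strictly sequential and O(N) in the horizon, which is exactly the scaling the parallel solver targets. The double-integrator problem data are illustrative.

```python
# Backward Riccati sweep and forward rollout for an unconstrained LQR problem.
import numpy as np

def riccati_solve(A, B, Q, R, QN, x0, N):
    P, K = QN, [None] * N
    for k in reversed(range(N)):                      # backward sweep (sequential)
        S = R + B.T @ P @ B
        K[k] = np.linalg.solve(S, B.T @ P @ A)        # feedback gain
        P = Q + A.T @ P @ (A - B @ K[k])
    xs, us = [x0], []
    for k in range(N):                                # forward rollout
        us.append(-K[k] @ xs[-1])
        xs.append(A @ xs[-1] + B @ us[-1])
    return np.array(xs), np.array(us)

A = np.array([[1.0, 0.1], [0.0, 1.0]])                # discretized double integrator
B = np.array([[0.0], [0.1]])
Q, R, QN = np.eye(2), 0.1 * np.eye(1), 10 * np.eye(2)
xs, us = riccati_solve(A, B, Q, R, QN, np.array([1.0, 0.0]), N=50)
print("terminal state:", np.round(xs[-1], 3))
```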
Obstacle Avoidance of UAV in Dynamic Environments Using Direction and Velocity-Adaptive Artificial Potential Field
The conventional Artificial Potential Field (APF) is fundamentally limited by the local minima issue and its inability to account for the kinematics of moving obstacles. This paper addresses the critical challenge of autonomous collision avoidance for Unmanned Aerial Vehicles (UAVs) operating in dynamic and cluttered airspace by proposing a novel Direction and Relative Velocity Weighted Artificial Potential Field (APF). In this approach, a bounded weighting function, $\omega(\theta, v_e)$, is introduced to dynamically scale the repulsive potential based on the direction and velocity of the obstacle relative to the UAV. This robust APF formulation is integrated within a Model Predictive Control (MPC) framework to generate collision-free trajectories while adhering to kinematic constraints. Simulation results demonstrate that the proposed method effectively resolves local minima and significantly enhances safety by enabling smooth, predictive avoidance maneuvers. The system ensures superior path integrity and reliable performance, confirming its viability for autonomous navigation in complex environments.
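A possible reading of the weighted repulsion is sketched below: the classical APF repulsive gradient is scaled by a bounded weight that grows when the obstacle is approaching fast and head-on. The specific form of the weight (a sigmoid of the closing speed gated by the approach angle) is an assumption, since the paper's exact $\omega(\theta, v_e)$ is not given in the abstract.

```python
# Direction- and relative-velocity-weighted repulsive field (assumed weight form).
import numpy as np

def weighted_repulsion(p_uav, v_uav, p_obs, v_obs, d0=8.0, k_rep=2.0):
    r = p_uav - p_obs
    d = float(np.linalg.norm(r))
    if d >= d0:                                   # outside the influence radius
        return np.zeros(3)
    v_e = v_obs - v_uav                           # obstacle velocity w.r.t. UAV
    closing = float(r @ v_e) / d                  # >0 when the obstacle approaches
    speed = np.linalg.norm(v_e) + 1e-9
    theta = np.arccos(np.clip(float(r @ v_e) / (d * speed), -1.0, 1.0))
    omega = 0.5 * (1.0 + np.cos(theta)) / (1.0 + np.exp(-closing))   # bounded in (0, 1)
    # Classical APF repulsion magnitude, modulated by the weight.
    magnitude = k_rep * omega * (1.0 / d - 1.0 / d0) / d**2
    return magnitude * (r / d)

F = weighted_repulsion(p_uav=np.array([0.0, 0.0, 10.0]),
                       v_uav=np.array([5.0, 0.0, 0.0]),
                       p_obs=np.array([6.0, 0.0, 10.0]),
                       v_obs=np.array([-3.0, 0.0, 0.0]))
print("repulsive command:", np.round(F, 4))
```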
Asymptotic stability equals exponential stability -- while you twist your eyes
Suppose that two vector fields on a smooth manifold render some equilibrium point globally asymptotically stable (GAS). We show that there exists a homotopy between the corresponding semiflows such that this point remains GAS along this homotopy.
comment: Update: only minor changes
TenonOS: A Self-Generating LibOS-on-LibOS Framework for Time-Critical Embedded Operating Systems
The growing complexity of embedded systems creates tension between rich functionality and strict resource and real-time constraints. Traditional monolithic operating system and hypervisor designs suffer from resource bloat and unpredictable scheduling, making them unsuitable for time-critical workloads where low latency and low jitter are essential. We propose TenonOS, a demand-driven, self-generating, lightweight operating system framework for time-critical embedded systems that rethinks both hypervisor and operating system architectures. TenonOS introduces a LibOS-on-LibOS model that decomposes hypervisor and operating system functionality into fine-grained, reusable micro-libraries. A generative orchestration engine dynamically composes these libraries to synthesize a customized runtime tailored to each application's criticality, timing requirements, and resource profile. TenonOS consists of two core components: Mortise, a minimalist micro-hypervisor, and Tenon, a real-time library operating system. Mortise provides lightweight isolation and removes the usual double-scheduler overhead in virtualized setups, while Tenon provides precise and deterministic task management. By generating only the necessary software stack per workload, TenonOS removes redundant layers, minimizes the trusted computing base, and maximizes responsiveness. Experiments show a 40.28 percent reduction in scheduling latency, an ultra-compact 361 KiB memory footprint, and strong adaptability.
Beyond Formal Semantics for Capabilities and Skills: Model Context Protocol in Manufacturing
Explicit modeling of capabilities and skills -- whether based on ontologies, Asset Administration Shells, or other technologies -- requires considerable manual effort and often results in representations that are not easily accessible to Large Language Models (LLMs). In this work-in-progress paper, we present an alternative approach based on the recently introduced Model Context Protocol (MCP). MCP allows systems to expose functionality through a standardized interface that is directly consumable by LLM-based agents. We conduct a prototypical evaluation on a laboratory-scale manufacturing system, where resource functions are made available via MCP. A general-purpose LLM is then tasked with planning and executing a multi-step process, including constraint handling and the invocation of resource functions via MCP. The results indicate that such an approach can enable flexible industrial automation without relying on explicit semantic models. This work lays the basis for further exploration of external tool integration in LLM-driven production systems.
comment: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
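To make the tool-exposure idea concrete, a minimal sketch is shown below, assuming the FastMCP helper of the official `mcp` Python SDK; the station, tool name, parameters, and constraint check are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: exposing a manufacturing resource function to an LLM
# agent via MCP, assuming the FastMCP helper from the official `mcp` Python SDK.
# The station name, tool name, and parameters are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("drilling-station")  # hypothetical manufacturing resource

@mcp.tool()
def drill_hole(x_mm: float, y_mm: float, depth_mm: float) -> str:
    """Drill a hole at the given workpiece coordinates (hypothetical skill)."""
    if depth_mm > 20.0:  # illustrative constraint check
        return "rejected: depth exceeds tool limit"
    # ... here the real station would be triggered, e.g. via PLC / OPC UA ...
    return f"ok: drilled at ({x_mm}, {y_mm}) to {depth_mm} mm"

if __name__ == "__main__":
    mcp.run()  # exposes the tool so an LLM-based agent can discover and call it
```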
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure from which various control problems can be recovered. These include the classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. We refer to the latter as soft-policy SOC and RSOC; they provide alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problems. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties, such as a linear Bellman equation, a path integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
comment: Corrected "DRO" to "DRC" and fixed theorem numbering throughout paper
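For intuition, one plausible way to write such a two-weight central objective is sketched below; the notation ($\lambda_{\pi}$, $\lambda_{P}$, reference policy $\pi_0$, passive dynamics $P_0$) is illustrative and not necessarily the paper's.

```latex
% Illustrative two-weight KL-regularized objective (notation is an assumption):
\begin{equation*}
  \min_{\pi,\,P}\;
  \mathbb{E}_{\pi,P}\!\left[\sum_{t=0}^{T-1}
    c(x_t,u_t)
    + \lambda_{\pi}\, D_{\mathrm{KL}}\!\big(\pi(\cdot \mid x_t)\,\big\|\,\pi_0(\cdot \mid x_t)\big)
    + \lambda_{P}\, D_{\mathrm{KL}}\!\big(P(\cdot \mid x_t,u_t)\,\big\|\,P_0(\cdot \mid x_t,u_t)\big)
  \right].
\end{equation*}
```

Setting $\lambda_{\pi} = \lambda_{P}$ recovers the usual trajectory-level KL regularization, while driving one of the weights to zero or infinity isolates policy-only or transition-only regularized problems.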
Robust H-infinity control and worst-case search in constrained parametric space
Standard H-infinity/H2 robust control and analysis tools operate on uncertain parameters assumed to vary independently within prescribed bounds. This paper extends their capabilities in the presence of constraints coupling these parameters and restricting the parametric space. Focusing on the worst-case search, we demonstrate -- based on the theory of upper-C1 functions -- the validity of using standard, readily available smooth optimization algorithms to address this nonsmooth constrained optimization problem. In particular, we prove that the sequential quadratic programming algorithm converges to Karush-Kuhn-Tucker points, and that such conditions are satisfied by any subgradient at a local minimum. This worst-case search then enables robust controller synthesis: as in the state-of-the-art algorithm for standard robust control, identified worst-case configurations are iteratively added to an active set on which a non-smooth multi-model optimization of the controller is performed. The methodology is illustrated on a satellite benchmark with flexible appendages, of order 50 with 43 uncertain parameters. From a practical point of view, we combine the local exploitation proposed above with a global exploration using either Monte-Carlo sampling or particle swarm optimization. We show that the proposed constrained optimization effectively complements Monte-Carlo sampling by enabling fast detection of rare worst-case configurations, and that the robust controller optimization converges with fewer than 10 active configurations.
comment: Preprint
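As a minimal illustration of running a smooth SQP solver on a constrained worst-case search (not the paper's code; `hinf_norm` and the coupling constraint below are placeholders):

```python
# Minimal sketch: worst-case search over a constrained parametric space with a
# standard SQP solver. `hinf_norm(delta)` stands in for the closed-loop
# H-infinity norm evaluated at the uncertain-parameter vector `delta`.
import numpy as np
from scipy.optimize import minimize

def hinf_norm(delta):
    # placeholder performance index; in practice this evaluates the
    # closed-loop norm for the parameter configuration `delta`
    return 1.0 + 0.5 * np.sin(delta[0]) * np.cos(delta[1]) + 0.1 * delta[2] ** 2

bounds = [(-1.0, 1.0)] * 3                      # independent parameter bounds
coupling = {"type": "ineq",                     # example coupling: delta_0 + delta_1 <= 1
            "fun": lambda d: 1.0 - (d[0] + d[1])}

res = minimize(lambda d: -hinf_norm(d),         # maximize the norm = worst case
               x0=np.zeros(3),
               method="SLSQP",                  # sequential quadratic programming
               bounds=bounds,
               constraints=[coupling])
print("worst-case parameters:", res.x, "norm:", -res.fun)
```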
Optimal Transmission Switching and Busbar Splitting in Hybrid AC/DC Grids
Driven by global climate goals, an increasing amount of Renewable Energy Sources (RES) is currently being installed worldwide. Especially in the context of offshore wind integration, hybrid AC/DC grids are considered to be the most effective technology to transmit this RES power over long distances. As hybrid AC/DC systems develop, they are expected to become increasingly complex and as meshed as the current AC system. Nevertheless, there is still limited literature on how to optimize hybrid AC/DC topologies while minimizing the total power generation cost. For this reason, this paper proposes a methodology to optimize the steady-state switching states of transmission lines and busbar configurations in hybrid AC/DC grids. The proposed optimization model includes optimal transmission switching (OTS) and busbar splitting (BS), which can be applied to both AC and DC parts of hybrid AC/DC grids. To solve the problem, a scalable and exact nonlinear, non-convex model using a big-M approach is formulated. In addition, convex relaxations and linear approximations of the model are tested, and their accuracy, feasibility, and optimality are analyzed. The numerical experiments show that a solution to the combined OTS/BS problem can be found in acceptable computation time and that the investigated relaxations and linearisations provide AC feasible results.
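For reference, the big-M switching logic can be illustrated in its simplest DC form (the paper formulates the full nonlinear, non-convex AC/DC model); $z_{ij}$ is the binary line status, $f_{ij}$ the line flow, and $\theta$ the voltage angles:

```latex
% Illustrative DC-approximation of line switching with a big-M formulation:
\begin{align*}
  -M\,(1 - z_{ij}) \;\le\; f_{ij} - b_{ij}\,(\theta_i - \theta_j) \;\le\; M\,(1 - z_{ij}), \\
  -\bar{f}_{ij}\, z_{ij} \;\le\; f_{ij} \;\le\; \bar{f}_{ij}\, z_{ij},
  \qquad z_{ij} \in \{0, 1\}.
\end{align*}
```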
Haptic-based Complementary Filter for Rigid Body Rotations
The non-commutative nature of 3D rotations poses well-known challenges in generalizing planar problems to three-dimensional ones, even more so in contact-rich tasks where haptic information (i.e., forces/torques) is involved. In this sense, not all learning-based algorithms that are currently available generalize to 3D orientation estimation. Non-linear filters defined on $\mathbf{\mathbb{SO}(3)}$ are widely used with inertial measurement sensors; however, none of them have been used with haptic measurements. This paper presents a unique complementary filtering framework that interprets the geometric shape of objects in the form of superquadrics, exploits the symmetry of $\mathbf{\mathbb{SO}(3)}$, and uses force and vision sensors as measurements to provide an estimate of orientation. The framework's robustness and almost global stability are substantiated by a set of experiments on a dual-arm robotic setup.
comment: 7 pages, 7 figures; Updated filter design; Submitted to IFAC for possible publication
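A generic complementary-filter step on SO(3) is sketched below for intuition; it blends a gyro-propagated attitude with an absolute orientation measurement and is not the superquadric-based filter of the paper. The gain `k` and the measurement source are assumptions.

```python
# Minimal sketch of a complementary update on SO(3): propagate with the angular
# velocity, then pull a fraction of the rotation error toward the measurement
# (which here stands in for the haptic/vision-derived orientation estimate).
import numpy as np
from scipy.spatial.transform import Rotation as R

def complementary_step(R_est, omega, R_meas, dt, k=0.1):
    # 1) propagate with body angular velocity (high-frequency channel)
    R_pred = R_est * R.from_rotvec(np.asarray(omega) * dt)
    # 2) rotation error from prediction to measurement, expressed in the body frame
    err = (R_pred.inv() * R_meas).as_rotvec()
    # 3) pull a fraction of the error back in (low-frequency correction)
    return R_pred * R.from_rotvec(k * err)

R_est = R.identity()
R_true = R.from_euler("xyz", [0.3, -0.2, 0.5])
for _ in range(200):
    R_est = complementary_step(R_est, omega=[0.0, 0.0, 0.0], R_meas=R_true, dt=0.01)
print("residual angle [rad]:", (R_est.inv() * R_true).magnitude())
```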
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, as well as their temporal derivatives as a natural complement to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 28 pages, 11 figures
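The building block of such a time-causal pipeline can be illustrated with a cascade of first-order recursive filters, the discrete analogue of truncated exponential kernels; the band-pass channel below is only a crude stand-in for the paper's limit-kernel wavelets.

```python
# Illustrative sketch: time-causal, time-recursive smoothing by a cascade of
# first-order recursive filters with geometrically distributed time constants,
# plus a crude "wavelet-like" channel as the difference between adjacent scales.
import numpy as np

def cascade_smooth(x, mus):
    """Run x through first-order recursive filters with time constants mus."""
    levels = []
    y = np.asarray(x, dtype=float)
    for mu in mus:
        out = np.empty_like(y)
        prev = 0.0
        for n, xn in enumerate(y):
            prev = prev + (xn - prev) / (1.0 + mu)   # y[n] = y[n-1] + (x[n] - y[n-1]) / (1 + mu)
            out[n] = prev
        levels.append(out)
        y = out                                       # feed the next stage (cascade)
    return levels

rng = np.random.default_rng(0)
signal = np.sin(2 * np.pi * 0.02 * np.arange(1000)) + 0.2 * rng.standard_normal(1000)
c = 2.0                                               # geometric ratio between time constants
levels = cascade_smooth(signal, mus=[c**k for k in range(6)])
band = levels[2] - levels[3]                          # band-pass channel between adjacent scales
print("band energy:", float(np.sum(band**2)))
```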
Symmetry-Based Formation Control on Cycle Graphs Using Dihedral Point Groups
This work develops a symmetry-based framework for formation control on cycle graphs using Dihedral point-group constraints. We show that enforcing inter-agent reflection symmetries, together with anchoring a single designated agent to its prescribed mirror axis, is sufficient to realize every $\mathcal{C}_{nv}$-symmetric configuration using only $n-1$ communication links. The resulting control laws have a matrix-weighted Laplacian structure and guarantee exponential convergence to the desired symmetric configuration. Furthermore, we extend the method to enable coordinated maneuvers along a time-varying reference trajectory. Simulation results are provided to support the theoretical analysis.
CDKFormer: Contextual Deviation Knowledge-Based Transformer for Long-Tail Trajectory Prediction
Predicting the future movements of surrounding vehicles is essential for ensuring the safe operation and efficient navigation of autonomous vehicles (AVs) in urban traffic environments. Existing vehicle trajectory prediction methods primarily focus on improving overall performance, yet they struggle to address long-tail scenarios effectively. This limitation often leads to poor predictions in rare cases, significantly increasing the risk of safety incidents. Taking the Argoverse 2 motion forecasting dataset as an example, we first investigate the long-tail characteristics in trajectory samples from two perspectives, individual motion and group interaction, and derive deviation features to distinguish abnormal from regular scenarios. On this basis, we propose CDKFormer, a Contextual Deviation Knowledge-based Transformer model for long-tail trajectory prediction. CDKFormer integrates an attention-based scene context fusion module to encode spatiotemporal interaction and road topology. An additional deviation feature fusion module is proposed to capture the dynamic deviations in the target vehicle status. We further introduce a dual query-based decoder, supported by a multi-stream decoder block, to sequentially decode heterogeneous scene deviation features and generate multimodal trajectory predictions. Extensive experiments demonstrate that CDKFormer achieves state-of-the-art performance, significantly enhancing prediction accuracy and robustness for long-tailed trajectories compared to existing methods, thus advancing the reliability of AVs in complex real-world environments.
Control Synthesis with Reinforcement Learning: A Modeling Perspective
Controllers designed with reinforcement learning can be sensitive to model mismatch. We demonstrate that designing such controllers in a virtual simulation environment with an inaccurate model is not suitable for deployment in a physical setup. A controller designed using an accurate model is robust against disturbances and small mismatches between the physical setup and the mathematical model derived from first principles, while a poor model results in a controller that performs well in simulation but fails in physical experiments. Sensitivity analysis is used to explain these discrepancies, and an empirical region-of-attraction estimation helps us visualize their robustness.
Connectivity-Preserving Multi-Agent Area Coverage via Optimal-Transport-Based Density-Driven Optimal Control (D2OC)
Multi-agent systems play a central role in area coverage tasks across search-and-rescue, environmental monitoring, and precision agriculture. Achieving non-uniform coverage, where spatial priorities vary across the domain, requires coordinating agents while respecting dynamic and communication constraints. Density-driven approaches can distribute agents according to a prescribed reference density, but existing methods do not ensure connectivity. This limitation often leads to communication loss, reduced coordination, and degraded coverage performance. This letter introduces a connectivity-preserving extension of the Density-Driven Optimal Control (D2OC) framework. The coverage objective, defined using the Wasserstein distance between the agent distribution and the reference density, admits a convex quadratic program formulation. Communication constraints are incorporated through a smooth connectivity penalty, which maintains strict convexity, supports distributed implementation, and preserves inter-agent communication without imposing rigid formations. Simulation studies show that the proposed method consistently maintains connectivity, improves convergence speed, and enhances non-uniform coverage quality compared with density-driven schemes that do not incorporate explicit connectivity considerations.
comment: Under review in IEEE Control Systems Letters (LCSS). 6 pages
Robotics
Efficient and Compliant Control Framework for Versatile Human-Humanoid Collaborative Transportation
We present a control framework that enables humanoid robots to perform collaborative transportation tasks with a human partner. The framework supports both translational and rotational motions, which are fundamental to co-transport scenarios. It comprises three components: a high-level planner, a low-level controller, and a stiffness modulation mechanism. At the planning level, we introduce the Interaction Linear Inverted Pendulum (I-LIP), which, combined with an admittance model and an MPC formulation, generates dynamically feasible footstep plans. These are executed by a QP-based whole-body controller that accounts for the coupled humanoid-object dynamics. Stiffness modulation regulates robot-object interaction, ensuring convergence to the desired relative configuration defined by the distance between the object and the robot's center of mass. We validate the effectiveness of the framework through real-world experiments conducted on the Digit humanoid platform. To quantify collaboration quality, we propose an efficiency metric that captures both task performance and inter-agent coordination. We show that this metric highlights the role of compliance in collaborative tasks and offers insights into desirable trajectory characteristics across both high- and low-level control layers. Finally, we showcase experimental results on collaborative behaviors, including translation, turning, and combined motions such as semi circular trajectories, representative of naturally occurring co-transportation tasks.
Inchworm-Inspired Soft Robot with Groove-Guided Locomotion
Soft robots require directional control to navigate complex terrains. However, achieving such control often requires multiple actuators, which increases mechanical complexity, complicates control systems, and raises energy consumption. Here, we introduce an inchworm-inspired soft robot whose locomotion direction is controlled passively by patterned substrates. The robot employs a single rolled dielectric elastomer actuator, while groove patterns on a 3D-printed substrate guide its alignment and trajectory. Through systematic experiments, we demonstrate that varying groove angles enables precise control of locomotion direction without the need for complex actuation strategies. This groove-guided approach reduces energy consumption, simplifies robot design, and expands the applicability of bio-inspired soft robots in fields such as search and rescue, pipe inspection, and planetary exploration.
OptMap: Geometric Map Distillation via Submodular Maximization
Autonomous robots rely on geometric maps to inform a diverse set of perception and decision-making algorithms. As autonomy requires reasoning and planning on multiple scales of the environment, each algorithm may require a different map for optimal performance. Light Detection And Ranging (LiDAR) sensors generate an abundance of geometric data to satisfy these diverse requirements, but selecting informative, size-constrained maps is computationally challenging as it requires solving an NP-hard combinatorial optimization. In this work we present OptMap: a geometric map distillation algorithm which achieves real-time, application-specific map generation via multiple theoretical and algorithmic innovations. A central feature is the maximization of set functions that exhibit diminishing returns, i.e., submodularity, using polynomial-time algorithms with provably near-optimal solutions. We formulate a novel submodular reward function which quantifies informativeness, reduces input set sizes, and minimizes bias in sequentially collected datasets. Further, we propose a dynamically reordered streaming submodular algorithm which improves empirical solution quality and addresses input order bias via an online approximation of the value of all scans. Testing was conducted on open-source and custom datasets with an emphasis on long-duration mapping sessions, highlighting OptMap's minimal computation requirements. Open-source ROS1 and ROS2 packages are available and can be used alongside any LiDAR SLAM algorithm.
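The core selection step can be pictured with a plain greedy routine for a monotone submodular coverage reward under a cardinality budget (an illustration only; OptMap's reward function and streaming reordering are richer):

```python
# Minimal sketch: greedy maximization of a monotone submodular coverage reward
# under a cardinality budget. Each "scan" is abstracted as the set of voxel ids
# it observes; the data below are toy values.
def greedy_map_selection(scans, budget):
    """scans: dict scan_id -> set of observed voxel ids."""
    covered, selected = set(), []
    for _ in range(budget):
        best_id, best_gain = None, 0
        for sid, voxels in scans.items():
            if sid in selected:
                continue
            gain = len(voxels - covered)          # marginal coverage gain
            if gain > best_gain:
                best_id, best_gain = sid, gain
        if best_id is None:                        # no remaining gain
            break
        selected.append(best_id)
        covered |= scans[best_id]
    return selected, covered

scans = {0: {1, 2, 3}, 1: {3, 4}, 2: {5, 6, 7, 8}, 3: {1, 8}}
sel, cov = greedy_map_selection(scans, budget=2)
print(sel, len(cov))   # greedy enjoys the classical (1 - 1/e) near-optimality guarantee
```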
Toward Seamless Physical Human-Humanoid Interaction: Insights from Control, Intent, and Modeling with a Vision for What Comes Next
Physical Human-Humanoid Interaction (pHHI) is a rapidly advancing field with significant implications for deploying robots in unstructured, human-centric environments. In this review, we examine the current state of the art in pHHI through three core pillars: (i) humanoid modeling and control, (ii) human intent estimation, and (iii) computational human models. For each pillar, we survey representative approaches, identify open challenges, and analyze current limitations that hinder robust, scalable, and adaptive interaction. These include the need for whole-body control strategies capable of handling uncertain human dynamics, real-time intent inference under limited sensing, and modeling techniques that account for variability in human physical states. Although significant progress has been made within each domain, integration across pillars remains limited. We propose pathways for unifying methods across these areas to enable cohesive interaction frameworks. This structure enables us not only to map the current landscape but also to propose concrete directions for future research that aim to bridge these domains. Additionally, we introduce a unified taxonomy of interaction types based on modality, distinguishing between direct interactions (e.g., physical contact) and indirect interactions (e.g., object-mediated), and on the level of robot engagement, ranging from assistance to cooperation and collaboration. For each category in this taxonomy, we provide the three core pillars that highlight opportunities for cross-pillar unification. Our goal is to suggest avenues to advance robust, safe, and intuitive physical interaction, providing a roadmap for future research that will allow humanoid systems to effectively understand, anticipate, and collaborate with human partners in diverse real-world settings.
comment: 60 pages, 5 figures, 3 tables
UltrasODM: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction
Clinical ultrasound acquisition is highly operator-dependent, where rapid probe motion and brightness fluctuations often lead to reconstruction errors that reduce trust and clinical utility. We present UltrasODM, a dual-stream framework that assists sonographers during acquisition through calibrated per-frame uncertainty, saliency-based diagnostics, and actionable prompts. UltrasODM integrates (i) a contrastive ranking module that groups frames by motion similarity, (ii) an optical-flow stream fused with Dual-Mamba temporal modules for robust 6-DoF pose estimation, and (iii) a Human-in-the-Loop (HITL) layer combining Bayesian uncertainty, clinician-calibrated thresholds, and saliency maps highlighting regions of low confidence. When uncertainty exceeds the threshold, the system issues unobtrusive alerts suggesting corrective actions such as re-scanning highlighted regions or slowing the sweep. Evaluated on a clinical freehand ultrasound dataset, UltrasODM reduces drift by 15.2%, distance error by 12.1%, and Hausdorff distance by 10.1% relative to UltrasOM, while producing per-frame uncertainty and saliency outputs. By emphasizing transparency and clinician feedback, UltrasODM improves reconstruction reliability and supports safer, more trustworthy clinical workflows. Our code is publicly available at https://github.com/AnandMayank/UltrasODM.
sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data Only
Understanding articulated objects is a fundamental challenge in robotics and digital twin creation. To effectively model such objects, it is essential to recover both part segmentation and the underlying joint parameters. Despite the importance of this task, previous work has largely focused on setups like multi-view systems, object scanning, or static cameras. In this paper, we present the first data-driven approach that jointly predicts part segmentation and joint parameters from monocular video captured with a freely moving camera. Trained solely on synthetic data, our method demonstrates strong generalization to real-world objects, offering a scalable and practical solution for articulated object understanding. Our approach operates directly on casually recorded video, making it suitable for real-time applications in dynamic environments. Project webpage: https://aartykov.github.io/sim2art/
Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks
As a robot senses and selects actions, the world keeps changing. This inference delay creates a gap of tens to hundreds of milliseconds between the observed state and the state at execution. In this work, we take the natural generalization from zero delay to measured delay during training and inference. We introduce Delay-Aware Diffusion Policy (DA-DP), a framework for explicitly incorporating inference delays into policy learning. DA-DP corrects zero-delay trajectories to their delay-compensated counterparts, and augments the policy with delay conditioning. We empirically validate DA-DP on a variety of tasks, robots, and delays and find its success rate more robust to delay than delay-unaware methods. DA-DP is architecture agnostic and transfers beyond diffusion policies, offering a general pattern for delay-aware imitation learning. More broadly, DA-DP encourages evaluation protocols that report performance as a function of measured latency, not just task difficulty.
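A minimal sketch of the underlying bookkeeping is shown below, assuming a fixed control period and a predicted action chunk; this is not DA-DP's learned delay conditioning, only the alignment it compensates for.

```python
# Illustrative sketch: drop the portion of a predicted action chunk that is
# already "in the past" by the time inference finishes. Chunk layout and the
# control period are assumptions.
import numpy as np

def delay_compensated_chunk(action_chunk, control_dt, measured_delay):
    """action_chunk: (H, action_dim) actions planned for t0, t0 + dt, ...
    Returns the sub-chunk aligned with the actual execution time t0 + delay."""
    skip = int(np.ceil(measured_delay / control_dt))
    skip = min(skip, len(action_chunk) - 1)        # keep at least one action
    return action_chunk[skip:]

chunk = np.zeros((16, 7))                          # e.g. 16-step chunk for a 7-DoF arm
executable = delay_compensated_chunk(chunk, control_dt=0.05, measured_delay=0.12)
print(executable.shape)                            # (13, 7): the first 3 steps are stale
```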
AMBER: Aerial deployable gripping crawler with compliant microspine for canopy manipulation
This paper presents an aerially deployable crawler designed for adaptive locomotion and manipulation within tree canopies. The system combines compliant microspine-based tracks, a dual-track rotary gripper, and an elastic tail, enabling secure attachment and stable traversal across branches of varying curvature and inclination. Experiments demonstrate reliable gripping at up to 90 degrees of body roll and inclination, and effective climbing on branches inclined up to 67.5 degrees, with a maximum speed of 0.55 body lengths per second on horizontal branches. The compliant tracks allow yaw steering of up to 10 degrees, enhancing maneuverability on irregular surfaces. Power measurements show efficient operation with a dimensionless cost of transport over an order of magnitude lower than typical hovering power consumption in aerial robots. Integrated within a drone-tether deployment system, the crawler provides a robust, low-power platform for environmental sampling and in-canopy sensing, bridging the gap between aerial and surface-based ecological robotics.
Multi-Domain Motion Embedding: Expressive Real-Time Mimicry for Legged Robots
Effective motion representation is crucial for enabling robots to imitate expressive behaviors in real time, yet existing motion controllers often ignore inherent patterns in motion. Previous efforts in representation learning do not attempt to jointly capture structured periodic patterns and irregular variations in human and animal movement. To address this, we present Multi-Domain Motion Embedding (MDME), a motion representation that unifies the embedding of structured and unstructured features using a wavelet-based encoder and a probabilistic embedding in parallel. This produces a rich representation of reference motions from a minimal input set, enabling improved generalization across diverse motion styles and morphologies. We evaluate MDME on retargeting-free real-time motion imitation by conditioning robot control policies on the learned embeddings, demonstrating accurate reproduction of complex trajectories on both humanoid and quadruped platforms. Our comparative studies confirm that MDME outperforms prior approaches in reconstruction fidelity and generalizability to unseen motions. Furthermore, we demonstrate that MDME can reproduce novel motion styles in real-time through zero-shot deployment, eliminating the need for task-specific tuning or online retargeting. These results position MDME as a generalizable and structure-aware foundation for scalable real-time robot imitation.
comment: 15 pages
Obstacle Avoidance of UAV in Dynamic Environments Using Direction and Velocity-Adaptive Artificial Potential Field
The conventional Artificial Potential Field (APF) is fundamentally limited by the local minima issue and its inability to account for the kinematics of moving obstacles. This paper addresses the critical challenge of autonomous collision avoidance for Unmanned Aerial Vehicles (UAVs) operating in dynamic and cluttered airspace by proposing a novel Direction and Relative Velocity Weighted Artificial Potential Field (APF). In this approach, a bounded weighting function, $\omega(\theta, v_{e})$, is introduced to dynamically scale the repulsive potential based on the direction and velocity of the obstacle relative to the UAV. This robust APF formulation is integrated within a Model Predictive Control (MPC) framework to generate collision-free trajectories while adhering to kinematic constraints. Simulation results demonstrate that the proposed method effectively resolves local minima and significantly enhances safety by enabling smooth, predictive avoidance maneuvers. The system ensures superior path integrity and reliable performance, confirming its viability for autonomous navigation in complex environments.
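A minimal sketch of a direction- and velocity-weighted repulsive term is given below; the bounded weight used here is an illustrative choice and does not reproduce the paper's exact $\omega(\theta, v_{e})$.

```python
# Illustrative sketch only: a repulsive APF term scaled by a bounded weight that
# grows for fast, head-on approaches and vanishes for receding obstacles.
import numpy as np

def weighted_repulsion(p_uav, p_obs, v_uav, v_obs, d0=5.0, k_rep=1.0, v_max=10.0):
    r = p_uav - p_obs
    d = np.linalg.norm(r)
    if d >= d0 or d < 1e-6:
        return np.zeros(3)                       # outside the influence radius
    v_rel = v_obs - v_uav                        # obstacle velocity relative to UAV
    v_e = np.linalg.norm(v_rel)
    cos_theta = float(np.dot(v_rel, r) / (v_e * d)) if v_e > 1e-6 else -1.0
    # bounded weight in [0, 1]: 1 for a fast head-on approach, ~0 for receding
    omega = 0.5 * (1.0 + cos_theta) * min(1.0, v_e / v_max)
    grad = k_rep * (1.0 / d - 1.0 / d0) / d**2 * (r / d)   # classical repulsive gradient
    return omega * grad

f = weighted_repulsion(np.array([0., 0., 10.]), np.array([3., 0., 10.]),
                       np.array([2., 0., 0.]), np.array([-2., 0., 0.]))
print(f)   # pushes the UAV away from the approaching obstacle
```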
More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery
The recent Segment Anything Model (SAM) 3 has introduced significant advancements over its predecessor, SAM 2, particularly with the integration of language-based segmentation and enhanced 3D perception capabilities. SAM 3 supports zero-shot segmentation across a wide range of prompts, including point, bounding box, and language-based prompts, allowing for more flexible and intuitive interactions with the model. In this empirical evaluation, we assess the performance of SAM 3 in robot-assisted surgery, benchmarking its zero-shot segmentation with point and bounding box prompts and exploring its effectiveness in dynamic video tracking, alongside its newly introduced language prompt segmentation. While language prompts show potential, their performance in the surgical domain is currently suboptimal, highlighting the need for further domain-specific training. Additionally, we investigate SAM 3's 3D reconstruction abilities, demonstrating its capacity to process surgical scene data and reconstruct 3D anatomical structures from 2D images. Through comprehensive testing on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 3 shows clear improvements over SAM and SAM 2 in both image and video segmentation under spatial prompts, while zero-shot evaluations on SCARED, StereoMIS, and EndoNeRF indicate strong monocular depth estimation and realistic 3D instrument reconstruction, yet also reveal remaining limitations in complex, highly dynamic surgical scenes.
comment: Technical Report
See Once, Then Act: Vision-Language-Action Model with Task Learning from One-Shot Video Demonstrations
Developing robust and general-purpose manipulation policies represents a fundamental objective in robotics research. While Vision-Language-Action (VLA) models have demonstrated promising capabilities for end-to-end robot control, existing approaches still exhibit limited generalization to tasks beyond their training distributions. In contrast, humans possess remarkable proficiency in acquiring novel skills by simply observing others performing them once. Inspired by this capability, we propose ViVLA, a generalist robotic manipulation policy that achieves efficient task learning from a single expert demonstration video at test time. Our approach jointly processes an expert demonstration video alongside the robot's visual observations to predict both the demonstrated action sequences and subsequent robot actions, effectively distilling fine-grained manipulation knowledge from expert behavior and transferring it seamlessly to the agent. To enhance the performance of ViVLA, we develop a scalable expert-agent pair data generation pipeline capable of synthesizing paired trajectories from easily accessible human videos, further augmented by curated pairs from publicly available datasets. This pipeline produces a total of 892,911 expert-agent samples for training ViVLA. Experimental results demonstrate that our ViVLA is able to acquire novel manipulation skills from only a single expert demonstration video at test time. Our approach achieves over 30% improvement on unseen LIBERO tasks and maintains above 35% gains with cross-embodiment videos. Real-world experiments demonstrate effective learning from human videos, yielding more than 38% improvement on unseen tasks.
VP-AutoTest: A Virtual-Physical Fusion Autonomous Driving Testing Platform
The rapid development of autonomous vehicles has led to a surge in testing demand. Traditional testing methods, such as virtual simulation, closed-course, and public road testing, face several challenges, including unrealistic vehicle states, limited testing capabilities, and high costs. These issues have prompted increasing interest in virtual-physical fusion testing. However, despite its potential, virtual-physical fusion testing still faces challenges, such as limited element types, narrow testing scope, and fixed evaluation metrics. To address these challenges, we propose the Virtual-Physical Testing Platform for Autonomous Vehicles (VP-AutoTest), which integrates over ten types of virtual and physical elements, including vehicles, pedestrians, and roadside infrastructure, to replicate the diversity of real-world traffic participants. The platform also supports both single-vehicle interaction and multi-vehicle cooperation testing, employing adversarial testing and parallel deduction to accelerate fault detection and explore algorithmic limits, while OBU and Redis communication enable seamless vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) cooperation across all levels of cooperative automation. Furthermore, VP-AutoTest incorporates a multidimensional evaluation framework and AI-driven expert systems to conduct comprehensive performance assessment and defect diagnosis. Finally, by comparing virtual-physical fusion test results with real-world experiments, the platform performs credibility self-evaluation to ensure both the fidelity and efficiency of autonomous driving testing. Please refer to the autonomous driving public service platform OnSite for the full testing functionalities: https://www.onsite.com.cn.
From Real-World Traffic Data to Relevant Critical Scenarios
The reliable operation of autonomous vehicles, automated driving functions, and advanced driver assistance systems across a wide range of relevant scenarios is critical for their development and deployment. Identifying a near-complete set of relevant driving scenarios for such functionalities is challenging due to the numerous degrees of freedom involved, each affecting the outcomes of the driving scenario differently. Moreover, with increasing technical complexity of new functionalities, the number of potentially relevant, particularly "unknown unsafe" scenarios is increasing. To enhance validation efficiency, it is essential to identify relevant scenarios in advance, starting with simpler domains like highways before moving to more complex environments such as urban traffic. To address this, this paper focuses on analyzing lane change scenarios in highway traffic, which involve multiple degrees of freedom and present numerous safety-relevant scenarios. We describe the process of data acquisition and processing of real-world data from public highway traffic, followed by the application of criticality measures on trajectory data to evaluate scenarios, as conducted within the AVEAS project (www.aveas.org). By linking the calculated measures to specific lane change driving scenarios and the conditions under which the data was collected, we facilitate the identification of safety-relevant driving scenarios for various applications. Further, to tackle the extensive range of "unknown unsafe" scenarios, we propose a way to generate relevant scenarios by creating synthetic scenarios based on recorded ones. Consequently, we demonstrate and evaluate a processing chain that enables the identification of safety-relevant scenarios, the development of data-driven methods for extracting these scenarios, and the generation of synthetic critical scenarios via sampling on highways.
comment: 8 pages, 8 figures
Affordance Field Intervention: Enabling VLAs to Escape Memory Traps in Robotic Manipulation
Vision-Language-Action (VLA) models have shown great performance in robotic manipulation by mapping visual observations and language instructions directly to actions. However, they remain brittle under distribution shifts: when test scenarios change, VLAs often reproduce memorized trajectories instead of adapting to the updated scene, which is a failure mode we refer to as the "Memory Trap". This limitation stems from the end-to-end design, which lacks explicit 3D spatial reasoning and prevents reliable identification of actionable regions in unfamiliar environments. To compensate for this missing spatial understanding, 3D Spatial Affordance Fields (SAFs) can provide a geometric representation that highlights where interactions are physically feasible, offering explicit cues about regions the robot should approach or avoid. We therefore introduce Affordance Field Intervention (AFI), a lightweight hybrid framework that uses SAFs as an on-demand plug-in to guide VLA behavior. Our system detects memory traps through proprioception, repositions the robot to recent high-affordance regions, and proposes affordance-driven waypoints that anchor VLA-generated actions. A SAF-based scorer then selects trajectories with the highest cumulative affordance. Extensive experiments demonstrate that our method achieves an average improvement of 23.5% across different VLA backbones ($π_{0}$ and $π_{0.5}$) under out-of-distribution scenarios on real-world robotic platforms, and 20.2% on the LIBERO-Pro benchmark, validating its effectiveness in enhancing VLA robustness to distribution shifts.
Gait-Adaptive Perceptive Humanoid Locomotion with Real-Time Under-Base Terrain Reconstruction
For full-size humanoid robots, even with recent advances in reinforcement learning-based control, achieving reliable locomotion on complex terrains, such as long staircases, remains challenging. In such settings, limited perception, ambiguous terrain cues, and insufficient adaptation of gait timing can cause even a single misplaced or mistimed step to result in rapid loss of balance. We introduce a perceptive locomotion framework that merges terrain sensing, gait regulation, and whole-body control into a single reinforcement learning policy. A downward-facing depth camera mounted under the base observes the support region around the feet, and a compact U-Net reconstructs a dense egocentric height map from each frame in real time, operating at the same frequency as the control loop. The perceptual height map, together with proprioceptive observations, is processed by a unified policy that produces joint commands and a global stepping-phase signal, allowing gait timing and whole-body posture to be adapted jointly to the commanded motion and local terrain geometry. We further adopt a single-stage successive teacher-student training scheme for efficient policy learning and knowledge transfer. Experiments conducted on a 31-DoF, 1.65 m humanoid robot demonstrate robust locomotion in both simulation and real-world settings, including forward and backward stair ascent and descent, as well as crossing a 46 cm gap. Project Page: https://ga-phl.github.io/
KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models
DreamerV3 is a state-of-the-art online model-based reinforcement learning (MBRL) algorithm known for remarkable sample efficiency. Concurrently, Kolmogorov-Arnold Networks (KANs) have emerged as a promising alternative to Multi-Layer Perceptrons (MLPs), offering superior parameter efficiency and interpretability. To mitigate KANs' computational overhead, variants like FastKAN leverage Radial Basis Functions (RBFs) to accelerate inference. In this work, we investigate integrating KAN architectures into the DreamerV3 framework. We introduce KAN-Dreamer, replacing specific MLP and convolutional components of DreamerV3 with KAN and FastKAN layers. To ensure efficiency within the JAX-based World Model, we implement a tailored, fully vectorized version with simplified grid management. We structure our investigation into three subsystems: Visual Perception, Latent Prediction, and Behavior Learning. Empirical evaluations on the DeepMind Control Suite (walker_walk) analyze sample efficiency, training time, and asymptotic performance. Experimental results demonstrate that utilizing our adapted FastKAN as a drop-in replacement for the Reward and Continue predictors yields performance on par with the original MLP-based architecture, maintaining parity in both sample efficiency and training speed. This report serves as a preliminary study for future developments in KAN-based world models.
comment: 23 pages, 8 figures, 3 tables
ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning
Behavior-cloning based visuomotor policies enable precise manipulation but often inherit the slow, cautious tempo of human demonstrations, limiting practical deployment. However, prior studies on acceleration methods mainly rely on statistical or heuristic cues that ignore task semantics and can fail across diverse manipulation settings. We present ESPADA, a semantic and spatially aware framework that segments demonstrations using a VLM-LLM pipeline with 3D gripper-object relations, enabling aggressive downsampling only in non-critical segments while preserving precision-critical phases, without requiring extra data or architectural modifications, or any form of retraining. To scale from a single annotated episode to the full dataset, ESPADA propagates segment labels via Dynamic Time Warping (DTW) on dynamics-only features. Across both simulation and real-world experiments with ACT and DP baselines, ESPADA achieves approximately a 2x speed-up while maintaining success rates, narrowing the gap between human demonstrations and efficient robot control.
comment: project page: https://project-espada.github.io/espada/
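The label-propagation step can be sketched with a small DTW alignment on one-dimensional dynamics features (toy data; ESPADA's features and segmentation pipeline are richer):

```python
# Minimal sketch: propagate per-frame segment labels from one annotated episode
# to another via a Dynamic Time Warping alignment on dynamics-only features
# (e.g., gripper speed). Features and labels below are toy values.
import numpy as np

def dtw_path(a, b):
    """Return the DTW alignment path between feature sequences a (N,) and b (M,)."""
    N, M = len(a), len(b)
    D = np.full((N + 1, M + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    i, j, path = N, M, []
    while i > 0 and j > 0:                      # backtrack the optimal path
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if step == 0 else ((i - 1, j) if step == 1 else (i, j - 1))
    return path[::-1]

src_feat = np.array([0., 0.1, 0.9, 1.0, 0.2, 0.0])
src_labels = ["coarse", "coarse", "critical", "critical", "coarse", "coarse"]
tgt_feat = np.array([0., 0.05, 0.5, 0.95, 1.0, 0.6, 0.1, 0.0])

tgt_labels = [None] * len(tgt_feat)
for i, j in dtw_path(src_feat, tgt_feat):
    tgt_labels[j] = src_labels[i]               # last aligned source frame wins
print(tgt_labels)
```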
Multi-Rigid-Body Approximation of Human Hands with Application to Digital Twin
Human hand simulation plays a critical role in digital twin applications, requiring models that balance anatomical fidelity with computational efficiency. We present a complete pipeline for constructing multi-rigid-body approximations of human hands that preserve realistic appearance while enabling real-time physics simulation. Starting from optical motion capture of a specific human hand, we construct a personalized MANO (hand Model with Articulated and Non-rigid defOrmations) model and convert it to a URDF (Unified Robot Description Format) representation with anatomically consistent joint axes. The key technical challenge is projecting MANO's unconstrained SO(3) joint rotations onto the kinematically constrained joints of the rigid-body model. We derive closed-form solutions for single degree-of-freedom joints and introduce a Baker-Campbell-Hausdorff (BCH)-corrected iterative method for two degree-of-freedom joints that properly handles the non-commutativity of rotations. We validate our approach through digital twin experiments where reinforcement learning policies control the multi-rigid-body hand to replay captured human demonstrations. Quantitative evaluation shows sub-centimeter reconstruction error and successful grasp execution across diverse manipulation tasks.
comment: 10 pages, 4 figures. Accepted at ICBSR'25 (International Conference on Biomechanical Systems and Robotics)
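For intuition, a simple first-order projection of a free joint rotation onto a single-DoF hinge is sketched below; this is an assumption for illustration, not the paper's closed-form or BCH-corrected solution.

```python
# Illustrative sketch (not the paper's method): approximate the hinge angle of a
# single-DoF joint by taking the component of the rotation vector along the axis.
import numpy as np
from scipy.spatial.transform import Rotation as R

def project_to_hinge(R_free, axis):
    """First-order hinge angle that approximates a free rotation R_free."""
    axis = np.asarray(axis, dtype=float)
    axis /= np.linalg.norm(axis)
    theta = float(np.dot(R_free.as_rotvec(), axis))   # log-map component on the axis
    return theta, R.from_rotvec(theta * axis)

R_free = R.from_euler("xyz", [0.6, 0.05, -0.02])      # mostly a flexion about x
theta, R_hinge = project_to_hinge(R_free, axis=[1.0, 0.0, 0.0])
print("hinge angle [rad]:", theta,
      "residual [rad]:", (R_hinge.inv() * R_free).magnitude())
```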
Model Predictive Control for Cooperative Docking Between Autonomous Surface Vehicles with Disturbance Rejection
Uncrewed Surface Vehicles (USVs) are a popular and efficient type of marine craft that find application in a large number of water-based tasks. When multiple USVs operate in the same area, they may be required to dock to each other to perform a shared task. Existing approaches for the docking between autonomous USVs generally consider one USV as a stationary target, while the second one is tasked to reach the required docking pose. In this work, we propose a cooperative approach for USV-USV docking, where two USVs work together to dock at an agreed location. We use a centralized Model Predictive Control (MPC) approach to solve the control problem, obtaining feasible trajectories that also guarantee constraint satisfaction. Owing to its model-based nature, this approach allows the rejection of disturbances, inclusive of exogenous inputs, by anticipating their effect on the USVs through the MPC prediction model. This is particularly effective in case of almost-stationary disturbances such as water currents. In simulations, we demonstrate how the proposed approach allows for a faster and more efficient docking with respect to existing approaches.
comment: 7 pages, 4 figures, submitted to IFAC World Congress 2026
Efficient Computation of a Continuous Topological Model of the Configuration Space of Tethered Mobile Robots
Despite the attention that the problem of path planning for tethered robots has garnered in the past few decades, the approaches proposed to solve it typically rely on a discrete representation of the configuration space and do not exploit a model that can simultaneously capture the topological information of the tether and the continuous location of the robot. In this work, we explicitly build a topological model of the configuration space of a tethered robot starting from a polygonal representation of the workspace where the robot moves. To do so, we first establish a link between the configuration space of the tethered robot and the universal covering space of the workspace, and then we exploit this link to develop an algorithm to compute a simplicial complex model of the configuration space. We show how this approach improves on the performance of existing algorithms that build other types of representations of the configuration space. The proposed model can be computed in a fraction of the time required to build traditional homotopy-augmented graphs, and is continuous, allowing the path planning task for tethered robots to be solved using a broad set of path planning algorithms.
comment: 7 pages, 3 figures, submitted to IFAC World Congress 2026
SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks
Integrating autonomous mobile robots into human environments requires human-like decision-making and energy-efficient, event-based computation. Despite progress, neuromorphic methods are rarely applied to Deep Reinforcement Learning (DRL) navigation approaches due to unstable training. We address this gap with a hybrid socially integrated DRL actor-critic approach that combines Spiking Neural Networks (SNNs) in the actor with Artificial Neural Networks (ANNs) in the critic and a neuromorphic feature extractor to capture temporal crowd dynamics and human-robot interactions. Our approach enhances social navigation performance and reduces estimated energy consumption by approximately 1.69 orders of magnitude.
comment: 8 pages, 6 figures
Spatiotemporal Calibration and Ground Truth Estimation for High-Precision SLAM Benchmarking in Extended Reality
Simultaneous localization and mapping (SLAM) plays a fundamental role in extended reality (XR) applications. As the standards for immersion in XR continue to increase, the demands for SLAM benchmarking have become more stringent. Trajectory accuracy is the key metric, and marker-based optical motion capture (MoCap) systems are widely used to generate ground truth (GT) because of their drift-free and relatively accurate measurements. However, the precision of MoCap-based GT is limited by two factors: the spatiotemporal calibration with the device under test (DUT) and the inherent jitter in the MoCap measurements. These limitations hinder accurate SLAM benchmarking, particularly for key metrics like rotation error and inter-frame jitter, which are critical for immersive XR experiences. This paper presents a novel continuous-time maximum likelihood estimator to address these challenges. The proposed method integrates auxiliary inertial measurement unit (IMU) data to compensate for MoCap jitter. Additionally, a variable time synchronization method and a pose residual based on screw congruence constraints are proposed, enabling precise spatiotemporal calibration across multiple sensors and the DUT. Experimental results demonstrate that our approach outperforms existing methods, achieving the precision necessary for comprehensive benchmarking of state-of-the-art SLAM algorithms in XR applications. Furthermore, we thoroughly validate the practicality of our method by benchmarking several leading XR devices and open-source SLAM algorithms. The code is publicly available at https://github.com/ylab-xrpg/xr-hpgt.
Characterizing Lane-Changing Behavior in Mixed Traffic
Characterizing and understanding lane-changing behavior in the presence of automated vehicles (AVs) is crucial to ensuring safety and efficiency in mixed traffic. Accordingly, this study aims to characterize the interactions between the lane-changing vehicle (active vehicle) and the vehicle directly impacted by the maneuver in the target lane (passive vehicle). Utilizing real-world trajectory data from the Waymo Open Motion Dataset (WOMD), this study explores patterns in lane-changing behavior and provides insight into how these behaviors evolve under different AV market penetration rates (MPRs). In particular, we propose a game-theoretic framework to analyze cooperative and defective behaviors in mixed traffic, applied to the 7,636 observed lane-changing events in the WOMD. First, we utilize k-means clustering to classify vehicles as cooperative or defective, revealing that the proportions of cooperative AVs are higher than those of HDVs in both active and passive roles. Next, we jointly estimate the utilities of active and passive vehicles to model their behaviors using the quantal response equilibrium framework. Empirical payoff tables are then constructed based on these utilities. Using these payoffs, we analyze the presence of social dilemmas and examine the evolution of cooperative behaviors using evolutionary game theory. Our results reveal the presence of social dilemmas in approximately 4% and 11% of lane-changing events for active and passive vehicles, respectively, with most classified as Stag Hunt or Prisoner's Dilemma (Chicken Game rarely observed). Moreover, the Monte Carlo simulation results show that repeated lane-changing interactions consistently lead to increased cooperative behavior over time, regardless of the AV penetration rate.
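For reference, the logit quantal response choice rule that underlies such an equilibrium analysis takes the standard form below, with estimated utilities $u_i$ and rationality parameter $\lambda$:

```latex
% Standard logit quantal response choice rule (the usual QRE parameterization):
\begin{equation*}
  P(a_i \mid s) \;=\;
  \frac{\exp\!\big(\lambda\, u_i(s)\big)}{\sum_{j} \exp\!\big(\lambda\, u_j(s)\big)},
  \qquad \lambda \ge 0 .
\end{equation*}
```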
Using Vision-Language Models as Proxies for Social Intelligence in Human-Robot Interaction
Robots operating in everyday environments must often decide when and whether to engage with people, yet such decisions often hinge on subtle nonverbal cues that unfold over time and are difficult to model explicitly. Drawing on a five-day Wizard-of-Oz deployment of a mobile service robot in a university cafe, we analyze how people signal interaction readiness through nonverbal behaviors and how expert wizards use these cues to guide engagement. Motivated by these observations, we propose a two-stage pipeline in which lightweight perceptual detectors (gaze shifts and proxemics) are used to selectively trigger heavier video-based vision-language model (VLM) queries at socially meaningful moments. We evaluate this pipeline on replayed field interactions and compare two prompting strategies. Our findings suggest that selectively using VLMs as proxies for social reasoning enables socially responsive robot behavior, allowing robots to act appropriately by attending to the cues people naturally provide in real-world interactions.
Time-Varying Formation Tracking Control of Wheeled Mobile Robots With Region Constraint: A Generalized Udwadia-Kalaba Framework
In this paper, the time-varying formation tracking control of wheeled mobile robots with region constraint is investigated from a generalized Udwadia-Kalaba framework. The communication topology is directed, weighted, and has a spanning tree with the leader being the root. By reformulating the time-varying formation tracking control objective as a constrained equation and transforming the region constraint by a diffeomorphism, the time-varying formation tracking controller with the region constraint is designed under the generalized Udwadia-Kalaba framework. Compared with the existing works on time-varying formation tracking control, the region constraint is taken into account in this paper, which ensures the safety of the robots. Finally, some numerical simulations are presented to illustrate the effectiveness of the proposed control strategy.
comment: 10 pages, 9 figures
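For reference, the classical Udwadia-Kalaba equation for $M\ddot{q} = Q$ subject to a constraint $A(q,\dot{q},t)\,\ddot{q} = b(q,\dot{q},t)$ reads as follows, where $a = M^{-1}Q$ and $(\cdot)^{+}$ denotes the Moore-Penrose pseudoinverse; in this setting the formation and region constraints enter through $A$ and $b$:

```latex
% Classical Udwadia-Kalaba equation for the constrained acceleration:
\begin{equation*}
  \ddot{q} \;=\; a + M^{-1/2}\big(A\, M^{-1/2}\big)^{+}\big(b - A\,a\big),
  \qquad a = M^{-1} Q .
\end{equation*}
```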
Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous Driving
End-to-end autonomous driving has emerged as a pivotal direction in the field of autonomous systems. Recent works have demonstrated impressive performance by incorporating high-level guidance signals to steer low-level trajectory planners. However, their potential is often constrained by inaccurate high-level guidance and the computational overhead of complex guidance modules. To address these limitations, we propose Mimir, a novel hierarchical dual-system framework capable of generating robust trajectories relying on goal points with uncertainty estimation: (1) unlike previous approaches that model goal points deterministically, we estimate goal point uncertainty with a Laplace distribution to enhance robustness; (2) to overcome the slow inference speed of the guidance system, we introduce a multi-rate guidance mechanism that predicts extended goal points in advance. Validated on the challenging Navhard and Navtest benchmarks, Mimir surpasses previous state-of-the-art methods with a 20% improvement in the driving score EPDMS, while achieving a 1.6-times improvement in high-level module inference speed without compromising accuracy. The code and models will be released to promote reproducibility and further development; the code is available at https://github.com/ZebinX/Mimir-Uncertainty-Driving
Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots
Adaptive morphogenetic robots adapt their morphology and control policies to meet changing tasks and environmental conditions. Many such systems leverage soft components, which enable shape morphing but also introduce simulation and control challenges. Soft-body simulators remain limited in accuracy and computational tractability, while rigid-body simulators cannot capture soft-material dynamics. Here, we present a surrogate compliance modeling approach: rather than explicitly modeling soft-body physics, we introduce indirect variables representing soft-material deformation within a rigid-body simulator. We validate this approach using our amphibious robotic turtle, a quadruped with soft morphing limbs designed for multi-environment locomotion. By capturing deformation effects as changes in effective limb length and limb center of mass, and by applying reinforcement learning with extensive randomization of these indirect variables, we achieve reliable policy learning entirely in a rigid-body simulation. The resulting gaits transfer directly to hardware, demonstrating high-fidelity sim-to-real performance on hard, flat substrates and robust, though lower-fidelity, transfer on rheologically complex terrains. The learned closed-loop gaits exhibit unprecedented terrestrial maneuverability and achieve an order-of-magnitude reduction in cost of transport compared to open-loop baselines. Field experiments with the robot further demonstrate stable, multi-gait locomotion across diverse natural terrains, including gravel, grass, and mud.
A Flexible Funnel-Shaped Robotic Hand with an Integrated Single-Sheet Valve for Milligram-Scale Powder Handling
Laboratory Automation (LA) has the potential to accelerate solid-state materials discovery by enabling continuous robotic operation without human intervention. While robotic systems have been developed for tasks such as powder grinding and X-ray diffraction (XRD) analysis, fully automating powder handling at the milligram scale remains a significant challenge due to the complex flow dynamics of powders and the diversity of laboratory tasks. To address this challenge, this study proposes a novel, funnel-shaped, flexible robotic hand that preserves the softness and conical sheet designs in prior work while incorporating a controllable valve at the cone apex to enable precise, incremental dispensing of milligram-scale powder quantities. The hand is integrated with an external balance through a feedback control system based on a model of powder flow and online parameter identification. Experimental evaluations with glass beads, monosodium glutamate, and titanium dioxide demonstrated that 80% of the trials achieved an error within 2 mg, and the maximum error observed was approximately 20 mg across a target range of 20 mg to 3 g. In addition, by incorporating flow prediction models commonly used for hoppers and performing online parameter identification, the system is able to adapt to variations in powder dynamics. Compared to direct PID control, the proposed model-based control significantly improved both accuracy and convergence speed. These results highlight the potential of the proposed system to enable efficient and flexible powder weighing, with scalability toward larger quantities and applicability to a broad range of laboratory automation tasks.
comment: 9 pages, 8 figures
An Introduction to Deep Reinforcement and Imitation Learning
Embodied agents, such as robots and virtual characters, must continuously select actions to execute tasks effectively, solving complex sequential decision-making problems. Given the difficulty of designing such controllers manually, learning-based approaches have emerged as promising alternatives, most notably Deep Reinforcement Learning (DRL) and Deep Imitation Learning (DIL). DRL leverages reward signals to optimize behavior, while DIL uses expert demonstrations to guide learning. This document introduces DRL and DIL in the context of embodied agents, adopting a concise, depth-first approach to the literature. It is self-contained, presenting all necessary mathematical and machine learning concepts as they are needed. It is not intended as a survey of the field; rather, it focuses on a small set of foundational algorithms and techniques, prioritizing in-depth understanding over broad coverage. The material ranges from Markov Decision Processes to REINFORCE and Proximal Policy Optimization (PPO) for DRL, and from Behavioral Cloning to Dataset Aggregation (DAgger) and Generative Adversarial Imitation Learning (GAIL) for DIL.
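As a sample of the covered material, the clipped surrogate objective of PPO is reproduced below in its standard form, with probability ratio $r_t(\theta)$ and advantage estimate $\hat{A}_t$:

```latex
% Standard PPO clipped surrogate objective:
\begin{equation*}
  L^{\mathrm{CLIP}}(\theta) \;=\;
  \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;
  \operatorname{clip}\!\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
  \qquad
  r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)} .
\end{equation*}
```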
Optimized Area Coverage in Disaster Response Utilizing Autonomous UAV Swarm Formations
This paper presents a UAV swarm system designed to assist first responders in disaster scenarios like wildfires. By distributing sensors across multiple agents, the system extends flight duration and enhances data availability, reducing the risk of mission failure due to collisions. To mitigate this risk further, we introduce an autonomous navigation framework that utilizes a local Euclidean Signed Distance Field (ESDF) map for obstacle avoidance while maintaining swarm formation with minimal path deviation. Additionally, we incorporate a Traveling Salesman Problem (TSP) variant to optimize area coverage, prioritizing Points of Interest (POIs) based on preassigned values derived from environmental behavior and critical infrastructure. The proposed system is validated through simulations with varying swarm sizes, demonstrating its ability to maximize coverage while ensuring collision avoidance between UAVs and obstacles.
DIJIT: A Robotic Head for an Active Observer
We present DIJIT, a novel binocular robotic head expressly designed for mobile agents that behave as active observers. DIJIT's unique breadth of functionality enables active vision research and the study of human-like eye and head-neck motions, their interrelationships, and how each contributes to visual ability. DIJIT is also being used to explore the differences between how human vision employs eye/head movements to solve visual tasks and current computer vision methods. DIJIT's design features nine mechanical degrees of freedom, while the cameras and lenses provide an additional four optical degrees of freedom. The ranges and speeds of the mechanical design are comparable to human performance. Our design includes the ranges of motion required for convergent stereo, namely, vergence, version, and cyclotorsion. The exploration of the utility of these to both human and machine vision is ongoing. Here, we present the design of DIJIT and evaluate aspects of its performance. We present a new method for saccadic camera movements. In this method, a direct relationship between camera orientation and motor values is developed. The resulting saccadic camera movements are close to human movements in terms of their accuracy.
VLD: Visual Language Goal Distance for Reinforcement Learning Navigation
Training end-to-end policies from image data to directly predict navigation actions for robotic systems has proven inherently difficult. Existing approaches often suffer from either the sim-to-real gap during policy transfer or a limited amount of training data with action labels. To address this problem, we introduce Vision-Language Distance (VLD) learning, a scalable framework for goal-conditioned navigation that decouples perception learning from policy learning. Instead of relying on raw sensory inputs during policy training, we first train a self-supervised distance-to-goal predictor on internet-scale video data. This predictor generalizes across both image- and text-based goals, providing a distance signal that can be minimized by a reinforcement learning (RL) policy. The RL policy can be trained entirely in simulation using privileged geometric distance signals, with injected noise to mimic the uncertainty of the trained distance predictor. At deployment, the policy consumes VLD predictions, inheriting semantic goal information ("where to go") from large-scale visual training while retaining the robust low-level navigation behaviors learned in simulation. We propose using ordinal consistency to assess distance functions directly and demonstrate that VLD outperforms prior temporal distance approaches, such as ViNT and VIP. Experiments show that our decoupled design achieves competitive navigation performance in simulation while supporting flexible goal modalities, providing an alternative and, most importantly, scalable path toward reliable, multimodal navigation policies.
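One plausible way to instantiate the ordinal-consistency check mentioned above is to count the fraction of frame pairs whose predicted distance-to-goal ordering agrees with the true temporal ordering along a trajectory; the metric below is an assumption about that definition, not the paper's exact formula.

```python
# Ordinal consistency of a predicted distance-to-goal along a trajectory.
import itertools
import numpy as np

def ordinal_consistency(pred_dist, true_steps_to_goal):
    """Fraction of pairs (i, j) whose predicted ordering matches the true one."""
    pairs = list(itertools.combinations(range(len(pred_dist)), 2))
    agree = sum(
        (pred_dist[i] - pred_dist[j]) * (true_steps_to_goal[i] - true_steps_to_goal[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)

true_steps = np.array([10, 8, 6, 4, 2, 0])        # frames approach the goal
noisy_pred = true_steps + np.random.default_rng(1).normal(0, 1.5, size=6)
print(f"ordinal consistency: {ordinal_consistency(noisy_pred, true_steps):.2f}")
```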
Sparse Variable Projection in Robotic Perception: Exploiting Separable Structure for Efficient Nonlinear Optimization
Robotic perception often requires solving large nonlinear least-squares (NLS) problems. While sparsity has been well-exploited to scale solvers, a complementary and underexploited structure is \emph{separability} -- where some variables (e.g., visual landmarks) appear linearly in the residuals and, for any estimate of the remaining variables (e.g., poses), have a closed-form solution. Variable projection (VarPro) methods are a family of techniques that exploit this structure by analytically eliminating the linear variables and presenting a reduced problem in the remaining variables that has favorable properties. However, VarPro has seen limited use in robotic perception; a major challenge arises from gauge symmetries (e.g., cost invariance to global shifts and rotations), which are common in perception and induce specific computational challenges in standard VarPro approaches. We present a VarPro scheme designed for problems with gauge symmetries that jointly exploits separability and sparsity. Our method can be applied as a one-time preprocessing step to construct a \emph{matrix-free Schur complement operator}. This operator allows efficient evaluation of costs, gradients, and Hessian-vector products of the reduced problem and readily integrates with standard iterative NLS solvers. We provide precise conditions under which our method applies, and describe extensions when these conditions are only partially met. Across synthetic and real benchmarks in SLAM, SNL, and SfM, our approach achieves up to \textbf{2$\times$--35$\times$ faster runtimes} than state-of-the-art methods while maintaining accuracy. We release an open-source C++ implementation and all datasets from our experiments.
comment: 8 pages, submitted for review
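For readers unfamiliar with variable projection, the textbook version on a separable curve-fitting problem looks as follows: the linear amplitudes are eliminated in closed form by least squares inside the residual, and only the nonlinear rates are passed to the outer solver. This sketch shows the generic VarPro idea only, not the gauge-aware, matrix-free Schur-complement scheme of the paper.

```python
# Variable projection on a separable nonlinear least-squares problem:
# fit y ~ a1*exp(-b1*t) + a2*exp(-b2*t) by optimizing only the rates b.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 100)
y = 2.0 * np.exp(-0.7 * t) + 1.0 * np.exp(-2.5 * t) + 0.01 * rng.normal(size=t.size)

def reduced_residual(b):
    """Residual of the reduced problem: linear coefficients solved in closed form."""
    Phi = np.exp(-np.outer(t, b))            # basis matrix, columns exp(-b_k * t)
    a, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return Phi @ a - y

sol = least_squares(reduced_residual, x0=[1.0, 3.0])
b_hat = sol.x
Phi = np.exp(-np.outer(t, b_hat))
a_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("rates:", b_hat, "amplitudes:", a_hat)
```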
MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos
Large-scale egocentric video datasets capture diverse human activities across a wide range of scenarios, offering rich and detailed insights into how humans interact with objects, especially those that require fine-grained dexterous control. Such complex, dexterous skills with precise controls are crucial for many robotic manipulation tasks, yet are often insufficiently addressed by traditional data-driven approaches to robotic manipulation. To address this gap, we leverage manipulation priors learned from large-scale egocentric video datasets to improve policy learning for dexterous robotic manipulation tasks. We present MAPLE, a novel method for dexterous robotic manipulation that learns features to predict object contact points and detailed hand poses at the moment of contact from egocentric images. We then use the learned features to train policies for downstream manipulation tasks. Experimental results demonstrate the effectiveness of MAPLE across 4 existing simulation benchmarks, as well as a newly designed set of 4 challenging simulation tasks requiring fine-grained object control and complex dexterous skills. The benefits of MAPLE are further highlighted in real-world experiments using a 17 DoF dexterous robotic hand, whereas the simultaneous evaluation across both simulation and real-world experiments has remained underexplored in prior work. We additionally showcase the efficacy of our model on an egocentric contact point prediction task, validating its usefulness beyond dexterous manipulation policy learning.
Incremental Generalized Hybrid A*
We address the problem of efficiently organizing search over very large trees, which arises in many applications ranging from autonomous driving to aerial vehicles. Here, we are motivated by off-road autonomy, where real-time planning is essential. Classical approaches use graphs of motion primitives and exploit dominance to mitigate the curse of dimensionality and prune expansions efficiently. However, for complex dynamics, repeatedly solving two-point boundary-value problems makes graph construction too slow for fast kinodynamic planning. Hybrid A* (HA*) addressed this challenge by searching over a tree of motion primitives and introducing approximate pruning using a grid-based dominance check. However, choosing the grid resolution is difficult: too coarse risks failure, while too fine leads to excessive expansions and slow planning. We propose Incremental Generalized Hybrid A* (IGHA*), an anytime tree-search framework that dynamically organizes vertex expansions without rigid pruning. IGHA* provably matches or outperforms HA*. For both on-road kinematic and off-road kinodynamic planning queries for a car-like robot, variants of IGHA* use 6x fewer expansions to the best solution compared to an optimized version of HA* (HA*M, an internal baseline). In simulated off-road experiments in a high-fidelity simulator, IGHA* outperforms HA*M when both are used in the loop with a model predictive controller. We demonstrate real-time performance both in simulation and on a small-scale off-road vehicle, enabling fast, robust planning under complex dynamics. Website: https://personalrobotics.github.io/IGHAStar/
comment: 8 pages, 7 figures, Accepted to IEEE RA-L, Nov 2025
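The grid-based approximate dominance check that the abstract attributes to Hybrid A* can be sketched in a few lines: continuous states are bucketed into cells, and a new vertex is pruned when a cheaper vertex already occupies its cell. The resolutions and state layout below are illustrative.

```python
# Grid-based approximate dominance check in the style of Hybrid A* (illustrative).
import math

class DominanceGrid:
    """Prune a vertex if a cheaper one already occupies the same (x, y, theta) cell."""
    def __init__(self, xy_res=0.5, theta_res=math.radians(15)):
        self.xy_res, self.theta_res = xy_res, theta_res
        self.best_cost = {}

    def key(self, x, y, theta):
        return (int(x // self.xy_res),
                int(y // self.xy_res),
                int((theta % (2 * math.pi)) // self.theta_res))

    def try_add(self, x, y, theta, cost):
        """Return True if the vertex survives (is not dominated) and record it."""
        k = self.key(x, y, theta)
        if k in self.best_cost and self.best_cost[k] <= cost:
            return False          # dominated: a cheaper vertex shares this cell
        self.best_cost[k] = cost
        return True

grid = DominanceGrid()
print(grid.try_add(1.0, 2.0, 0.10, cost=3.0))   # True: first vertex in the cell
print(grid.try_add(1.1, 2.2, 0.15, cost=3.5))   # False: same cell, higher cost
```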
PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention
Vision-Language-Action (VLA) models have demonstrated remarkable performance on embodied tasks and shown promising potential for real-world applications. However, current VLAs still struggle to produce consistent and precise target-oriented actions, as they often generate redundant or unstable motions along trajectories, limiting their applicability in time-sensitive scenarios. In this work, we attribute these redundant actions to the spatially uniform perception field of existing VLAs, which causes them to be distracted by target-irrelevant objects, especially in complex environments. To address this issue, we propose an efficient PosA-VLA framework that anchors visual attention via pose-conditioned supervision, consistently guiding the model's perception toward task-relevant regions. The pose-conditioned anchor attention mechanism enables the model to better align instruction semantics with actionable visual cues, thereby improving action generation precision and efficiency. Moreover, our framework adopts a lightweight architecture and requires no auxiliary perception modules (e.g., segmentation or grounding networks), ensuring efficient inference. Extensive experiments verify that our method executes embodied tasks with precise and time-efficient behavior across diverse robotic manipulation benchmarks and shows robust generalization in a variety of challenging environments.
MM-ACT: Learn from Multimodal Parallel Generation to Act
A generalist robotic policy needs both semantic understanding for task planning and the ability to interact with the environment through predictive capabilities. To tackle this, we present MM-ACT, a unified Vision-Language-Action (VLA) model that integrates text, image, and action in a shared token space and performs generation across all three modalities. MM-ACT adopts a re-mask parallel decoding strategy for text and image generation, and employs a one-step parallel decoding strategy for action generation to improve efficiency. We introduce Context-Shared Multimodal Learning, a unified training paradigm that supervises generation in all three modalities from a shared context, enhancing action generation through cross-modal learning. Experiments were conducted on the LIBERO simulation and Franka real-robot setups as well as RoboTwin2.0 to assess in-domain and out-of-domain performance, respectively. Our approach achieves a success rate of 96.3% on LIBERO, 72.0% across three tasks on the real Franka, and 52.38% across eight bimanual tasks of RoboTwin2.0, with an additional gain of 9.25% from cross-modal learning. We release our code, models, and data at https://github.com/HHYHRHY/MM-ACT.
comment: 17 pages
Exploring Adversarial Obstacle Attacks in Search-based Path Planning for Autonomous Mobile Robots
Path planning algorithms, such as the search-based A*, are a critical component of autonomous mobile robotics, enabling robots to navigate from a starting point to a destination efficiently and safely. We investigated the resilience of the A* algorithm in the face of potential adversarial interventions known as obstacle attacks. The adversary's goal is to delay the robot's timely arrival at its destination by introducing obstacles along its original path. We developed malicious software to execute the attacks and conducted experiments to assess their impact, both in simulation using TurtleBot in Gazebo and in real-world deployment with the Unitree Go1 robot. In simulation, the attacks resulted in an average delay of 36\%, with the most significant delays occurring in scenarios where the robot was forced to take substantially longer alternative paths. In real-world experiments, the delays were even more pronounced, with all attacks successfully rerouting the robot and causing measurable disruptions. These results highlight that the algorithm's robustness is not solely an attribute of its design but is significantly influenced by the operational environment. For example, in constrained environments like tunnels, the delays were maximized due to the limited availability of alternative routes.
Quantization-Free Autoregressive Action Transformer
Current transformer-based imitation learning approaches introduce discrete action representations and train an autoregressive transformer decoder on the resulting latent code. However, the initial quantization breaks the continuous structure of the action space, thereby limiting the capabilities of the generative model. We instead propose a quantization-free method that leverages Generative Infinite-Vocabulary Transformers (GIVT) as a direct, continuous policy parametrization for autoregressive transformers. This simplifies the imitation learning pipeline while achieving state-of-the-art performance on a variety of popular simulated robotics tasks. We enhance our policy roll-outs by carefully studying sampling algorithms, further improving the results.
SwarmDiffusion: End-To-End Traversability-Guided Diffusion for Embodiment-Agnostic Navigation of Heterogeneous Robots
Visual traversability estimation is critical for autonomous navigation, but existing VLM-based methods rely on hand-crafted prompts, generalize poorly across embodiments, and output only traversability maps, leaving trajectory generation to slow external planners. We propose SwarmDiffusion, a lightweight end-to-end diffusion model that jointly predicts traversability and generates a feasible trajectory from a single RGB image. To remove the need for annotated or planner-produced paths, we introduce a planner-free trajectory construction pipeline based on randomized waypoint sampling, Bezier smoothing, and regularization enforcing connectivity, safety, directionality, and path thinness. This enables learning stable motion priors without demonstrations. SwarmDiffusion leverages VLM-derived supervision without prompt engineering and conditions the diffusion process on a compact embodiment state, producing physically consistent, traversable paths that transfer across different robot platforms. Across indoor environments and two embodiments (quadruped and aerial), the method achieves 80-100% navigation success and 0.09 s inference, and adapts to a new robot using only ~500 additional visual samples. It generalizes reliably to unseen environments in simulation and real-world trials, offering a scalable, prompt-free approach to unified traversability reasoning and trajectory generation.
comment: This work has been submitted for publication and is currently under review
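A rough sketch of the planner-free trajectory construction step described above: random waypoints between start and goal are smoothed into a Bezier curve via the Bernstein basis. The sampling ranges and curve order are made up, and the connectivity/safety/thinness regularizers are omitted.

```python
# Random waypoints smoothed into a Bezier curve (illustrative, numpy only).
from math import comb
import numpy as np

def bezier(control_points, n_samples=50):
    """Evaluate a Bezier curve defined by its control points."""
    P = np.asarray(control_points, dtype=float)     # shape (k+1, 2)
    k = len(P) - 1
    s = np.linspace(0.0, 1.0, n_samples)
    # Bernstein basis: B_{i,k}(s) = C(k, i) * s^i * (1-s)^(k-i)
    B = np.stack([comb(k, i) * s**i * (1 - s)**(k - i) for i in range(k + 1)], axis=1)
    return B @ P                                     # shape (n_samples, 2)

rng = np.random.default_rng(0)
start, goal = np.array([0.0, 0.0]), np.array([5.0, 3.0])
waypoints = np.vstack([start,
                       rng.uniform(low=[0, 0], high=[5, 3], size=(3, 2)),
                       goal])
trajectory = bezier(waypoints)
print(trajectory[[0, -1]])   # smoothed path starts at `start` and ends at `goal`
```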
RealD$^2$iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion
Robot manipulation in the real world is fundamentally constrained by the visual sim2real gap, where depth observations collected in simulation fail to reflect the complex noise patterns inherent to real sensors. In this work, inspired by the denoising capability of diffusion models, we invert the conventional perspective and propose a clean-to-noisy paradigm that learns to synthesize noisy depth, thereby bridging the visual sim2real gap through purely simulation-driven robotic learning. Building on this idea, we introduce RealD$^2$iff, a hierarchical coarse-to-fine diffusion framework that decomposes depth noise into global structural distortions and fine-grained local perturbations. To enable progressive learning of these components, we further develop two complementary strategies: Frequency-Guided Supervision (FGS) for global structure modeling and Discrepancy-Guided Optimization (DGO) for localized refinement. To integrate RealD$^2$iff seamlessly into imitation learning, we construct a pipeline that spans six stages. We provide comprehensive empirical and experimental validation demonstrating the effectiveness of this paradigm. RealD$^2$iff enables two key applications: (1) generating real-world-like depth to construct clean-noisy paired datasets without manual sensor data collection. (2) Achieving zero-shot sim2real robot manipulation, substantially improving real-world performance without additional fine-tuning.
comment: We are the author team of the paper "RealD$^2$iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion". After self-examination, our team discovered inappropriate wording in the citation of related work, the introduction, and the contribution statement, which may affect the contribution of other related works. Therefore, we have decided to revise the paper and request its withdrawal
LOG-Nav: Efficient Layout-Aware Object-Goal Navigation with Hierarchical Planning
We introduce LOG-Nav, an efficient layout-aware object-goal navigation approach designed for complex multi-room indoor environments. By planning hierarchically, leveraging a global topological map with layout information and a local imperative approach with a detailed scene-representation memory, LOG-Nav achieves both efficient and effective navigation. The process is managed by an LLM-powered agent, ensuring seamless and effective planning and navigation without the need for human interaction, complex rewards, or costly training. Our experimental results on the MP3D benchmark achieve an 85\% object navigation success rate (SR) and a 79\% success rate weighted by path length (SPL), an improvement of over 40 percentage points in SR and 60\% in SPL compared to existing methods. Furthermore, we validate the robustness of our approach through virtual-agent and real-world robotic deployment, showcasing its capability in practical scenarios.
Enhanced Spatiotemporal Consistency for Image-to-LiDAR Data Pretraining
LiDAR representation learning has emerged as a promising approach to reducing reliance on costly and labor-intensive human annotations. While existing methods primarily focus on spatial alignment between LiDAR and camera sensors, they often overlook the temporal dynamics critical for capturing motion and scene continuity in driving scenarios. To address this limitation, we propose SuperFlow++, a novel framework that integrates spatiotemporal cues in both pretraining and downstream tasks using consecutive LiDAR-camera pairs. SuperFlow++ introduces four key components: (1) a view consistency alignment module to unify semantic information across camera views, (2) a dense-to-sparse consistency regularization mechanism to enhance feature robustness across varying point cloud densities, (3) a flow-based contrastive learning approach that models temporal relationships for improved scene understanding, and (4) a temporal voting strategy that propagates semantic information across LiDAR scans to improve prediction consistency. Extensive evaluations on 11 heterogeneous LiDAR datasets demonstrate that SuperFlow++ outperforms state-of-the-art methods across diverse tasks and driving conditions. Furthermore, by scaling both 2D and 3D backbones during pretraining, we uncover emergent properties that provide deeper insights into developing scalable 3D foundation models. With strong generalizability and computational efficiency, SuperFlow++ establishes a new benchmark for data-efficient LiDAR-based perception in autonomous driving. The code is publicly available at https://github.com/Xiangxu-0103/SuperFlow
comment: IEEE Transactions on Pattern Analysis and Machine Intelligence
Building Gradient by Gradient: Decentralised Energy Functions for Bimanual Robot Assembly
There are many challenges in bimanual assembly, including high-level sequencing, multi-robot coordination, and low-level, contact-rich operations such as component mating. Task and motion planning (TAMP) methods, while effective in this domain, may be prohibitively slow to converge when adapting to disturbances that require new task sequencing and optimisation. These events are common during tight-tolerance assembly, where difficult-to-model dynamics such as friction or deformation require rapid replanning and reattempts. Moreover, defining explicit task sequences for assembly can be cumbersome, limiting flexibility when task replanning is required. To simplify this planning, we introduce a decentralised gradient-based framework that uses a piecewise continuous energy function through the automatic composition of adaptive potential functions. This approach generates sub-goals using only myopic optimisation, rather than long-horizon planning. It demonstrates effectiveness at solving long-horizon tasks due to the structure and adaptivity of the energy function. We show that our approach scales to physical bimanual assembly tasks for constructing tight-tolerance assemblies. In these experiments, we discover that our gradient-based rapid replanning framework generates automatic retries, coordinated motions and autonomous handovers in an emergent fashion.
comment: 8 pages, 7 figures, 1 table
Switch-JustDance: Benchmarking Whole Body Motion Tracking Policies Using a Commercial Console Game
Recent advances in whole-body robot control have enabled humanoid and legged robots to perform increasingly agile and coordinated motions. However, standardized benchmarks for evaluating these capabilities in real-world settings, and in direct comparison to humans, remain scarce. Existing evaluations often rely on pre-collected human motion datasets or simulation-based experiments, which limit reproducibility, overlook hardware factors, and hinder fair human-robot comparisons. We present Switch-JustDance, a low-cost and reproducible benchmarking pipeline that leverages motion-sensing console games to evaluate robot whole-body control. Using Just Dance on the Nintendo Switch as a representative platform, Switch-JustDance converts in-game choreography into robot-executable motions through streaming, motion reconstruction, and motion retargeting modules and enables users to evaluate controller performance through the game's built-in scoring system. We first validate the evaluation properties of Just Dance, analyzing its reliability, validity, sensitivity, and potential sources of bias. Our results show that the platform provides consistent and interpretable performance measures, making it a suitable tool for benchmarking embodied AI. Building on this foundation, we benchmark three state-of-the-art humanoid whole-body controllers on hardware and provide insights into their relative strengths and limitations.
FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action Tokenization
Autoregressive vision-language-action (VLA) models have recently demonstrated strong capabilities in robotic manipulation. However, their core process of action tokenization often involves a trade-off between reconstruction fidelity and inference efficiency. We introduce FASTer, a unified framework for efficient and generalizable robot learning that integrates a learnable tokenizer with an autoregressive policy built upon it. FASTerVQ encodes action chunks as single-channel images, capturing global spatio-temporal dependencies while maintaining a high compression ratio. FASTerVLA builds on this tokenizer with block-wise autoregressive decoding and a lightweight action expert, achieving both faster inference and higher task performance. Extensive experiments across simulated and real-world benchmarks show that FASTerVQ delivers superior reconstruction quality, high token utilization, and strong cross-task and cross-embodiment generalization, while FASTerVLA further improves overall capability, surpassing previous state-of-the-art VLA models in both inference speed and task performance.
Pretraining in Actor-Critic Reinforcement Learning for Robot Locomotion
The pretraining-finetuning paradigm has facilitated numerous transformative advancements in artificial intelligence research in recent years. However, in the domain of reinforcement learning (RL) for robot locomotion, individual skills are often learned from scratch despite the high likelihood that some generalizable knowledge is shared across all task-specific policies belonging to the same robot embodiment. This work aims to define a paradigm for pretraining neural network models that encapsulate such knowledge and can subsequently serve as a basis for warm-starting the RL process in classic actor-critic algorithms, such as Proximal Policy Optimization (PPO). We begin with a task-agnostic exploration-based data collection algorithm to gather diverse, dynamic transition data, which is then used to train a Proprioceptive Inverse Dynamics Model (PIDM) through supervised learning. The pretrained weights are then loaded into both the actor and critic networks to warm-start the policy optimization of actual tasks. We systematically validated our proposed method with 9 distinct robot locomotion RL environments comprising 3 different robot embodiments, showing significant benefits of this initialization strategy. Our proposed approach on average improves sample efficiency by 36.9% and task performance by 7.3% compared to random initialization. We further present key ablation studies and empirical analyses that shed light on the mechanisms behind the effectiveness of this method.
An Open-Source Soft Robotic Platform for Autonomous Aerial Manipulation in the Wild
Aerial manipulation combines the versatility and speed of flying platforms with the functional capabilities of mobile manipulation, which presents significant challenges due to the need for precise localization and control. Traditionally, researchers have relied on offboard perception systems, which are limited to expensive and impractical specially equipped indoor environments. In this work, we introduce a novel platform for autonomous aerial manipulation that exclusively utilizes onboard perception systems. Our platform can perform aerial manipulation in various indoor and outdoor environments without depending on external perception systems. Our experimental results demonstrate the platform's ability to autonomously grasp various objects in diverse settings. This advancement significantly improves the scalability and practicality of aerial manipulation applications by eliminating the need for costly tracking solutions. To accelerate future research, we open source our ROS 2 software stack and custom hardware design, making our contributions accessible to the broader research community.
comment: Project website: https://sites.google.com/view/open-source-soft-platform/open-source-soft-robotic-platform GitHub: https://github.com/raptor-ethz
Disturbance Compensation for Safe Kinematic Control of Robotic Systems with Closed Architecture
In commercial robotic systems, it is common to encounter a closed inner-loop torque controller that is not user-modifiable. However, the outer-loop controller, which sends kinematic commands such as position or velocity for the inner-loop controller to track, is typically exposed to users. In this work, we focus on the development of an easily integrated add-on at the outer-loop layer by combining disturbance rejection control and robust control barrier function for high-performance tracking and safe control of the whole dynamic system of an industrial manipulator. This is particularly beneficial when 1) the inner-loop controller is imperfect, unmodifiable, and uncertain; and 2) the dynamic model exhibits significant uncertainty. Stability analysis, formal safety guarantee proof, and hardware experiments with a PUMA robotic manipulator are presented. Our solution demonstrates superior performance in terms of simplicity of implementation, robustness, tracking precision, and safety compared to the state of the art. Video: https://youtu.be/zw1tanvrV8Q
comment: Extended version of the paper submitted for publication. This document contains detailed mathematical derivations and additional experimental results omitted from the submission due to page constraints
BEDI: A Comprehensive Benchmark for Evaluating Embodied Agents on UAVs
With the rapid advancement of low-altitude remote sensing and Vision-Language Models (VLMs), Embodied Agents based on Unmanned Aerial Vehicles (UAVs) have shown significant potential in autonomous tasks. However, current evaluation methods for UAV-Embodied Agents (UAV-EAs) remain constrained by the lack of standardized benchmarks, diverse testing scenarios and open system interfaces. To address these challenges, we propose BEDI (Benchmark for Embodied Drone Intelligence), a systematic and standardized benchmark designed for evaluating UAV-EAs. Specifically, we introduce a novel Dynamic Chain-of-Embodied-Task paradigm based on the perception-decision-action loop, which decomposes complex UAV tasks into standardized, measurable subtasks. Building on this paradigm, we design a unified evaluation framework encompassing six core sub-skills: semantic perception, spatial perception, motion control, tool utilization, task planning and action generation. Furthermore, we develop a hybrid testing platform that incorporates a wide range of both virtual and real-world scenarios, enabling a comprehensive evaluation of UAV-EAs across diverse contexts. The platform also offers open and standardized interfaces, allowing researchers to customize tasks and extend scenarios, thereby enhancing flexibility and scalability in the evaluation process. Finally, through empirical evaluations of several state-of-the-art (SOTA) VLMs, we reveal their limitations in embodied UAV tasks, underscoring the critical role of the BEDI benchmark in advancing embodied intelligence research and model optimization. By filling the gap in systematic and standardized evaluation within this field, BEDI facilitates objective model comparison and lays a robust foundation for future development in this field. Our benchmark is now publicly available at https://github.com/lostwolves/BEDI.
First Responders' Perceptions of Semantic Information for Situational Awareness in Robot-Assisted Emergency Response
This study investigates First Responders' (FRs) attitudes toward the use of semantic information and Situational Awareness (SA) in robotic systems during emergency operations. A structured questionnaire was administered to 22 FRs across eight countries, capturing their demographic profiles, general attitudes toward robots, and experiences with semantics-enhanced SA. Results show that most FRs expressed positive attitudes toward robots, and rated the usefulness of semantic information for building SA at an average of 3.6 out of 5. Semantic information was also valued for its role in predicting unforeseen emergencies (mean 3.9). Participants reported requiring an average of 74.6\% accuracy to trust semantic outputs and 67.8\% for them to be considered useful, revealing a willingness to use imperfect but informative AI support tools. To the best of our knowledge, this study offers novel insights by being one of the first to directly survey FRs on semantic-based SA in a cross-national context. It reveals the types of semantic information most valued in the field, such as object identity, spatial relationships, and risk context, and connects these preferences to the respondents' roles, experience, and education levels. The findings also expose a critical gap between lab-based robotics capabilities and the realities of field deployment, highlighting the need for more meaningful collaboration between FRs and robotics researchers. These insights contribute to the development of more user-aligned and situationally aware robotic systems for emergency response.
2Fast-2Lamaa: Large-Scale Lidar-Inertial Localization and Mapping with Continuous Distance Fields
This paper introduces 2Fast-2Lamaa, a lidar-inertial state estimation framework for odometry, mapping, and localization. Its first key component is the optimization-based undistortion of lidar scans, which uses continuous IMU preintegration to model the system's pose at every lidar point timestamp. The continuous trajectory over 100-200ms is parameterized only by the initial scan conditions (linear velocity and gravity orientation) and IMU biases, yielding eleven state variables. These are estimated by minimizing point-to-line and point-to-plane distances between lidar-extracted features without relying on previous estimates, resulting in a prior-less motion-distortion correction strategy. Because the method performs local state estimation, it directly provides scan-to-scan odometry. To maintain geometric consistency over longer periods, undistorted scans are used for scan-to-map registration. The map representation employs Gaussian Processes to form a continuous distance field, enabling point-to-surface distance queries anywhere in space. Poses of the undistorted scans are refined by minimizing these distances through non-linear least-squares optimization. For odometry and mapping, the map is built incrementally in real time; for pure localization, existing maps are reused. The incremental map construction also includes mechanisms for removing dynamic objects. We benchmark 2Fast-2Lamaa on 250km (over 10h) of public and self-collected datasets from both automotive and handheld systems. The framework achieves state-of-the-art performance across diverse and challenging scenarios, reaching odometry and localization errors as low as 0.27% and 0.06 m, respectively. The real-time implementation is publicly available at https://github.com/clegenti/2fast2lamaa.
Multiagent Systems
Reliable agent engineering should integrate machine-compatible organizational principles
As AI agents built on large language models (LLMs) become increasingly embedded in society, issues of coordination, control, delegation, and accountability are entangled with concerns over their reliability. To design and implement LLM agents around reliable operations, we should account for the task complexity of the application setting and mitigate the agents' limitations while striving to minimize agent failures and optimize resource efficiency. High-functioning human organizations have faced similar balancing issues, which led to evidence-based theories that seek to understand their functioning strategies. We examine the parallels between LLM agents and compatible frameworks in organization science, focusing on how the design, scaling, and management of organizations can inform agentic systems toward improving reliability. We offer three preliminary accounts of organizational principles for AI agent engineering to attain reliability and effectiveness, through balancing agency and capabilities in agent design, resource constraints and performance benefits in agent scaling, and internal and external mechanisms in agent management. Our work extends the growing exchanges between the operational and governance principles of AI systems and social systems to facilitate system integration.
comment: 20 pages incl. references, comments are welcome
Understanding Individual Decision-Making in Multi-Agent Reinforcement Learning: A Dynamical Systems Approach
Analysing learning behaviour in Multi-Agent Reinforcement Learning (MARL) environments is challenging, in particular with respect to \textit{individual} decision-making. Practitioners frequently tend to study or compare MARL algorithms from a qualitative perspective largely due to the inherent stochasticity in practical algorithms arising from random dithering exploration strategies, environment transition noise, and stochastic gradient updates to name a few. Traditional analytical approaches, such as replicator dynamics, often rely on mean-field approximations to remove stochastic effects, but this simplification, whilst able to provide general overall trends, might lead to dissonance between analytical predictions and actual realisations of individual trajectories. In this paper, we propose a novel perspective on MARL systems by modelling them as \textit{coupled stochastic dynamical systems}, capturing both agent interactions and environmental characteristics. Leveraging tools from dynamical systems theory, we analyse the stability and sensitivity of agent behaviour at individual level, which are key dimensions for their practical deployments, for example, in presence of strict safety requirements. This framework allows us, for the first time, to rigorously study MARL dynamics taking into consideration their inherent stochasticity, providing a deeper understanding of system behaviour and practical insights for the design and control of multi-agent learning processes.
Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics
As Large Language Models (LLMs) increasingly operate as autonomous decision-makers in interactive and multi-agent systems and human societies, understanding their strategic behaviour has profound implications for safety, coordination, and the design of AI-driven social and economic infrastructures. Assessing such behaviour requires methods that capture not only what LLMs output, but also the underlying intentions that guide their decisions. In this work, we extend the FAIRGAME framework to systematically evaluate LLM behaviour in repeated social dilemmas through two complementary advances: a payoff-scaled Prisoner's Dilemma isolating sensitivity to incentive magnitude, and an integrated multi-agent Public Goods Game with dynamic payoffs and multi-agent histories. These environments reveal consistent behavioural signatures across models and languages, including incentive-sensitive cooperation, cross-linguistic divergence and end-game alignment toward defection. To interpret these patterns, we train traditional supervised classification models on canonical repeated-game strategies and apply them to FAIRGAME trajectories, showing that LLMs exhibit systematic, model- and language-dependent behavioural intentions, with linguistic framing at times exerting effects as strong as architectural differences. Together, these findings provide a unified methodological foundation for auditing LLMs as strategic agents and reveal systematic cooperation biases with direct implications for AI governance, collective decision-making, and the design of safe multi-agent systems.
Social welfare optimisation in well-mixed and structured populations
Research on promoting cooperation among autonomous, self-regarding agents has often focused on the bi-objective optimisation problem: minimising the total incentive cost while maximising the frequency of cooperation. However, the optimal value of social welfare under such constraints remains largely unexplored. In this work, we hypothesise that achieving maximal social welfare is not guaranteed at the minimal incentive cost required to drive agents to a desired cooperative state. To address this gap, we adopt a single-objective approach focused on maximising social welfare, building upon foundational evolutionary game theory models that examined cost efficiency in finite populations, in both well-mixed and structured population settings. Our analytical model and agent-based simulations show how different interference strategies, including rewarding local versus global behavioural patterns, affect social welfare and the dynamics of cooperation. Our results reveal a significant gap in the per-individual incentive cost between optimising for pure cost efficiency or cooperation frequency and optimising for maximal social welfare. Overall, our findings indicate that incentive design, policy, and benchmarking in multi-agent systems and human societies should prioritise welfare-centric objectives over proxy targets of cost or cooperation frequency.
Characterizing Lane-Changing Behavior in Mixed Traffic
Characterizing and understanding lane-changing behavior in the presence of automated vehicles (AVs) is crucial to ensuring safety and efficiency in mixed traffic. Accordingly, this study aims to characterize the interactions between the lane-changing vehicle (active vehicle) and the vehicle directly impacted by the maneuver in the target lane (passive vehicle). Utilizing real-world trajectory data from the Waymo Open Motion Dataset (WOMD), this study explores patterns in lane-changing behavior and provides insight into how these behaviors evolve under different AV market penetration rates (MPRs). In particular, we propose a game-theoretic framework to analyze cooperative and defective behaviors in mixed traffic, applied to the 7,636 observed lane-changing events in the WOMD. First, we utilize k-means clustering to classify vehicles as cooperative or defective, revealing that the proportions of cooperative AVs are higher than those of HDVs in both active and passive roles. Next, we jointly estimate the utilities of active and passive vehicles to model their behaviors using the quantal response equilibrium framework. Empirical payoff tables are then constructed based on these utilities. Using these payoffs, we analyze the presence of social dilemmas and examine the evolution of cooperative behaviors using evolutionary game theory. Our results reveal the presence of social dilemmas in approximately 4% and 11% of lane-changing events for active and passive vehicles, respectively, with most classified as Stag Hunt or Prisoner's Dilemma (Chicken Game rarely observed). Moreover, the Monte Carlo simulation results show that repeated lane-changing interactions consistently lead to increased cooperative behavior over time, regardless of the AV penetration rate.
MASim: Multilingual Agent-Based Simulation for Social Science
Multi-agent role-playing has recently shown promise for studying social behavior with language agents, but existing simulations are mostly monolingual and fail to model cross-lingual interaction, an essential property of real societies. We introduce MASim, the first multilingual agent-based simulation framework that supports multi-turn interaction among generative agents with diverse sociolinguistic profiles. MASim offers two key analyses: (i) global public opinion modeling, by simulating how attitudes toward open-domain hypotheses evolve across languages and cultures, and (ii) media influence and information diffusion, via autonomous news agents that dynamically generate content and shape user behavior. To instantiate simulations, we construct the MAPS benchmark, which combines survey questions and demographic personas drawn from global population distributions. Experiments on calibration, sensitivity, consistency, and cultural case studies show that MASim reproduces sociocultural phenomena and highlights the importance of multilingual simulation for scalable, controlled computational social science.
Time-Varying Formation Tracking Control of Wheeled Mobile Robots With Region Constraint: A Generalized Udwadia-Kalaba Framework
In this paper, the time-varying formation tracking control of wheeled mobile robots with region constraint is investigated from a generalized Udwadia-Kalaba framework. The communication topology is directed, weighted and has a spanning tree with the leader being the root. By reformulating the time-varying formation tracking control objective as a constrained equation and transforming the region constraint by a diffeomorphism, the time-varying formation tracking controller with the region constraint is designed under the generalized Udwadia-Kalaba framework. Compared with the existing works on time-varying formation tracking control, the region constraint is taken into account in this paper, which ensures the safety of the robots. Finally, some numerical simulations are presented to illustrate the effectiveness of the proposed control strategy.
comment: 10 pages, 9 figures
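For context, the classical Udwadia-Kalaba equation gives the constrained acceleration of a system $M\ddot{q}=Q$ subject to $A(q,\dot{q},t)\ddot{q}=b$ as $\ddot{q} = a + M^{-1/2}(AM^{-1/2})^{+}(b - Aa)$ with $a = M^{-1}Q$. The numerical sketch below evaluates this closed form on a toy point-mass example; it illustrates the underlying framework only, not the paper's generalized formulation, region-constraint diffeomorphism, or wheeled-robot model.

```python
# Classical Udwadia-Kalaba constrained acceleration (illustrative numerics).
import numpy as np

def sqrtm_spd(M):
    """Matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(w)) @ V.T

def uk_acceleration(M, Q, A, b):
    """q_ddot = a + M^(-1/2) (A M^(-1/2))^+ (b - A a), with a = M^(-1) Q."""
    a = np.linalg.solve(M, Q)                      # unconstrained acceleration
    M_inv_sqrt = np.linalg.inv(sqrtm_spd(M))
    correction = M_inv_sqrt @ np.linalg.pinv(A @ M_inv_sqrt) @ (b - A @ a)
    return a + correction

# Unit-mass point constrained to accelerate along the line y = x (A q_ddot = 0).
M = np.eye(2)
Q = np.array([1.0, 0.0])                           # applied force along +x
A = np.array([[1.0, -1.0]])                        # constraint: x_ddot - y_ddot = 0
b = np.array([0.0])
print(uk_acceleration(M, Q, A, b))                 # approximately [0.5, 0.5]
```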
SGEMAS: A Self-Growing Ephemeral Multi-Agent System for Unsupervised Online Anomaly Detection via Entropic Homeostasis
Current deep learning approaches for physiological signal monitoring suffer from static topologies and constant energy consumption. We introduce SGEMAS (Self-Growing Ephemeral Multi-Agent System), a bio-inspired architecture that treats intelligence as a dynamic thermodynamic process. By coupling a structural plasticity mechanism (agent birth/death) to a variational free energy objective, the system naturally evolves to minimize prediction error with extreme sparsity. An ablation study on the MIT-BIH Arrhythmia Database reveals that adding a multi-scale instability index to the agent dynamics significantly improves performance. In a challenging inter-patient, zero-shot setting, the final SGEMAS v3.3 model achieves a mean AUC of 0.570 ± 0.070, outperforming both its simpler variants and a standard autoencoder baseline. This result validates that a physics-based, energy-constrained model can achieve robust unsupervised anomaly detection, offering a promising direction for efficient biomedical AI.
EvoMem: Improving Multi-Agent Planning with Dual-Evolving Memory
Planning has been a cornerstone of artificial intelligence for solving complex problems, and recent progress in LLM-based multi-agent frameworks has begun to extend this capability. However, the role of human-like memory within these frameworks remains largely unexplored. Understanding how agents coordinate through memory is critical for natural language planning, where iterative reasoning, constraint tracking, and error correction drive success. Inspired by the working memory model in cognitive psychology, we present EvoMem, a multi-agent framework built on a dual-evolving memory mechanism. The framework consists of three agents (Constraint Extractor, Verifier, and Actor) and two memory modules: Constraint Memory (CMem), which evolves across queries by storing task-specific rules and constraints while remaining fixed within a query, and Query-feedback Memory (QMem), which evolves within a query by accumulating feedback across iterations for solution refinement. Both memory modules are reset at the end of each query session. Evaluations on trip planning, meeting planning, and calendar scheduling show consistent performance improvements, highlighting the effectiveness of EvoMem. This success underscores the importance of memory in enhancing multi-agent planning.
Computing Evolutionarily Stable Strategies in Multiplayer Games
We present an algorithm for computing all evolutionarily stable strategies in nondegenerate normal-form games with three or more players.
Learning to Deliberate: Meta-policy Collaboration for Agentic LLMs with Multi-agent Reinforcement Learning
Multi-agent systems of large language models (LLMs) show promise for complex reasoning, but their effectiveness is often limited by fixed collaboration protocols. These frameworks typically focus on macro-level orchestration while overlooking agents' internal deliberative capabilities. This critical meta-cognitive blindspot treats agents as passive executors unable to adapt their strategy based on internal cognitive states like uncertainty or confidence. We introduce the Meta-Policy Deliberation Framework (MPDF), where agents learn a decentralized policy over a set of high-level meta-cognitive actions: Persist, Refine, and Concede. To overcome the instability of traditional policy gradients in this setting, we develop SoftRankPO, a novel reinforcement learning algorithm. SoftRankPO stabilizes training by shaping advantages based on the rank of rewards mapped through smooth normal quantiles, making the learning process robust to reward variance. Experiments show that MPDF with SoftRankPO achieves a 4-5% absolute gain in average accuracy across five mathematical and general reasoning benchmarks compared to six state-of-the-art heuristic and learning-based multi-agent reasoning algorithms. Our work presents a paradigm for learning adaptive, meta-cognitive policies for multi-agent LLM systems, shifting the focus from designing fixed protocols to learning dynamic, deliberative strategies.
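The abstract states that SoftRankPO shapes advantages from the rank of rewards mapped through smooth normal quantiles. The snippet below shows one straightforward reading of that construction (rank, then uniform position, then standard-normal quantile); it is an assumption for illustration, not the authors' exact algorithm.

```python
# Rank-based advantage shaping via normal quantiles (one plausible reading).
import numpy as np
from scipy.stats import norm, rankdata

def rank_shaped_advantages(rewards):
    """Map reward ranks to standard-normal quantiles, robust to reward variance."""
    r = np.asarray(rewards, dtype=float)
    ranks = rankdata(r, method="average")            # ranks 1..n, ties averaged
    u = (ranks - 0.5) / len(r)                       # uniform positions in (0, 1)
    return norm.ppf(u)                               # smooth normal quantiles

rewards = [0.1, 5.0, 0.2, 120.0, 0.15]               # heavy-tailed raw rewards
print(rank_shaped_advantages(rewards))               # bounded, roughly zero-mean
```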
Systems and Control (EESS)
Augmented Neural Ordinary Differential Equations for Power System Identification
Due to the complexity of modern power systems, modeling based on first-order principles becomes increasingly difficult. As an alternative, dynamical models for simulation and control design can be obtained by black-box identification techniques. One such technique for the identification of continuous-time systems is neural ordinary differential equations. For training and inference, they require initial values of system states, such as phase angles and frequencies. While frequencies can typically be measured, phase angle measurements are usually not available. To tackle this problem, we propose a novel structure based on augmented neural ordinary differential equations, learning latent phase-angle representations from historic observations with temporal convolutional networks. Our approach combines state-of-the-art deep learning techniques, avoiding the necessity of phase angle information for power system identification. Results show that our approach clearly outperforms simpler augmentation techniques.
The explicit game-theoretic linear quadratic regulator for constrained multi-agent systems
We present an efficient algorithm to compute the explicit open-loop solution to both finite and infinite-horizon dynamic games subject to state and input constraints. Our approach relies on a multiparametric affine variational inequality characterization of the open-loop Nash equilibria and extends the classical explicit constrained LQR and MPC frameworks to multi-agent non-cooperative settings. A key practical implication is that linear-quadratic game-theoretic MPC becomes viable even at very high sampling rates for multi-agent systems of moderate size. Extensive numerical experiments demonstrate order-of-magnitude improvements in online computation time and solution accuracy compared with state-of-the-art game-theoretic solvers.
Bimorph Lithium Niobate Piezoelectric Micromachined Ultrasonic Transducers
Piezoelectric micromachined ultrasonic transducers (PMUTs) are widely used in applications that demand mechanical resilience, thermal stability, and compact form factors. Lead zirconate titanate (PZT) and aluminum nitride (AlN) active layers are used in PMUTs to enable acoustic actuation, sensing, or bidirectional operation. These platforms rely on bimorph films to maximize electromechanical coupling ($k^2$) through thin-film deposition, which uses intermediate electrode layers to establish opposing electric fields. Consequently, incumbent PMUT platforms are limited in achievable film thickness and feature material interfaces that compromise mechanical integrity and thermal performance. Combined with the intrinsic limitations of PZT and AlN, these factors motivate exploration of alternative PMUT material platforms. Recent efforts have sought to demonstrate that single-crystal lithium niobate (LN) is a promising candidate, offering substantially higher $k^2$ and bidirectional performance. Advances in LN film transfer technology have enabled the formation of periodically poled piezoelectric (P3F) LN, facilitating a bimorph stack without intermediate electrodes. In this work, we showcase bimorph PMUTs incorporating a mechanically robust, 20 micron thick P3F LN active layer. We establish the motivation for LN PMUTs through a material comparison, followed by extensive membrane geometry optimization and subsequent enhancement of the PMUT's $k^2$. We demonstrate a 775 kHz flexural mode device with a quality factor (Q) of 200 and an extracted $k^2$ of 6.4%, yielding a high transmit efficiency of 65 nm/V with a mechanically robust active layer. We leverage the high performance to demonstrate extreme-temperature resilience, showcasing stable device operation up to 600 degrees C and survival up to 900 degrees C, highlighting LN's potential as a resilient PMUT platform.
comment: 12 pages, 21 figures
Research on a Monitoring System for High-Voltage Cables in a Coal Mine Based on Intelligent Sensing Technology
Given the importance of monitoring the operational status of high-voltage cables in coal mines, this study investigates the application of intelligent sensing technology to the online monitoring of such cables. Taking an actual coal mine as a case study, a high-voltage cable monitoring system with a three-layer architecture was designed. The system employs high-frequency current sensors and distributed optical fiber temperature sensors to achieve real-time acquisition of partial discharge signals and temperature distribution data. Data analysis and fault diagnosis are performed through a combined approach of edge computing and cloud computing. The research results demonstrate that the system can accurately identify cable insulation defects and potential overheating hazards, with a diagnostic accuracy exceeding 95%, thereby significantly enhancing the reliability of power supply in mines.
Linear Quadratic Control with Non-Markovian and Non-Semimartingale Noise Models
The standard linear quadratic Gaussian (LQG) framework assumes a Brownian noise process and relies on classical stochastic calculus tools, such as those based on Itô calculus. In this paper, we solve a generalized linear quadratic optimal control problem where the process and measurement noises can be non-Markovian and non-semimartingale stochastic processes with sample paths that have low Hölder regularity. Since these noise models do not, in general, permit the use of the standard Itô calculus, we employ rough path theory to formulate and solve the problem. By leveraging signature representations and controlled rough paths, we derive the optimal state estimation and control strategies.
Obstacle Avoidance of UAV in Dynamic Environments Using Direction and Velocity-Adaptive Artificial Potential Field
The conventional Artificial Potential Field (APF) is fundamentally limited by the local minima issue and its inability to account for the kinematics of moving obstacles. This paper addresses the critical challenge of autonomous collision avoidance for Unmanned Aerial Vehicles (UAVs) operating in dynamic and cluttered airspace by proposing a novel Direction and Relative Velocity Weighted Artificial Potential Field (APF). In this approach, a bounded weighting function, $\omega(\theta, v_{e})$, is introduced to dynamically scale the repulsive potential based on the direction and velocity of the obstacle relative to the UAV. This robust APF formulation is integrated within a Model Predictive Control (MPC) framework to generate collision-free trajectories while adhering to kinematic constraints. Simulation results demonstrate that the proposed method effectively resolves local minima and significantly enhances safety by enabling smooth, predictive avoidance maneuvers. The system ensures superior path integrity and reliable performance, confirming its viability for autonomous navigation in complex environments.
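As a concrete illustration of a bounded, direction- and relative-velocity-dependent weight on a standard repulsive potential gradient, the sketch below gates a sigmoid of the approach speed by the cosine of the bearing to the obstacle. This particular weighting is an assumption; the paper defines its own $\omega(\theta, v_{e})$ and embeds it in an MPC.

```python
# Direction- and velocity-weighted repulsive APF term (illustrative weighting).
import numpy as np

def weighted_repulsion(p_uav, v_uav, p_obs, v_obs, d0=5.0, k_rep=1.0):
    """Repulsive term scaled by a bounded weight in [0, 1]."""
    rel_p = p_uav - p_obs
    d = np.linalg.norm(rel_p)
    if d >= d0 or d < 1e-6:
        return np.zeros_like(p_uav)
    rel_v = v_obs - v_uav
    # Approach speed along the line of sight (positive if the obstacle closes in).
    v_e = float(rel_v @ (rel_p / d))
    # Bearing factor: 1 when the obstacle lies ahead of the UAV's velocity, 0 behind.
    if np.linalg.norm(v_uav) > 1e-6:
        cos_theta = float((-rel_p / d) @ (v_uav / np.linalg.norm(v_uav)))
    else:
        cos_theta = 1.0
    weight = max(0.0, cos_theta) / (1.0 + np.exp(-2.0 * v_e))   # bounded in [0, 1]
    magnitude = k_rep * (1.0 / d - 1.0 / d0) / d**2              # classic repulsive gradient
    return weight * magnitude * (rel_p / d)

print(weighted_repulsion(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                         np.array([3.0, 0.0]), np.array([-1.0, 0.0])))
```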
Optimal Pulse Patterns through a Hybrid Optimal Control Perspective
Optimal pulse patterns (OPPs) are a modulation method in which the switching angles and levels of a switching signal are computed via an offline optimization procedure to minimize a performance metric, typically the harmonic distortions of the load current. Additional constraints can be incorporated into the optimization problem to achieve secondary objectives, such as the limitation of specific harmonics or the reduction of power converter losses. The resulting optimization problem, however, is highly nonconvex, featuring a trigonometric objective function and constraints as well as both real- and integer-valued optimization variables. This work casts the task of OPP synthesis for a multilevel converter as an optimal control problem of a hybrid system. This problem is in turn lifted into a convex but infinite-dimensional conic program of occupation measures using established methods in convex relaxations of optimal control. Lower bounds on the minimum achievable harmonic distortion are acquired by solving a sequence of semidefinite programs via the moment-sum-of-squares hierarchy, where each semidefinite program scales in a jointly linear manner with the numbers of permitted switching transitions and converter voltage levels.
comment: 8 pages, 4 tables, 8 figures
Data-Driven Robust Safety Verification for Markov Decision Processes
In this paper, we propose a data-driven robust safety verification framework for stochastic dynamical systems modeled as Markov decision processes with time-varying and uncertain transition probabilities. Rather than assuming access to the exact nominal transition kernel, we consider the realistic setting where only samples from multiple system executions are available. These samples may correspond to different transition models inside an ambiguity set around the nominal transition kernel. Using these observations, we construct a unified ambiguity set that captures both inherent run-to-run variability in the transition dynamics and finite-sample statistical uncertainty. This ambiguity set is formalized through a Wasserstein-distance ball around a nominal empirical distribution and naturally induces an interval Markov decision process representation of the underlying system. Within this representation, we introduce a robust safety function that characterizes reach-avoid type probabilistic safety under all transition kernels consistent with the interval Markov decision process. We further derive high-confidence safety guarantees for the true, unknown time-varying system. A numerical example illustrates the applicability and effectiveness of the proposed approach.
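The robust safety function described above can be computed on an interval MDP by a worst-case Bellman backup: for each state-action pair, the transition probabilities are chosen within their intervals to minimize the next-step value, which reduces to an order-based mass assignment. The toy chain, intervals, and fixed iteration count below are invented for illustration.

```python
# Worst-case Bellman backup on an interval MDP (toy reach-probability sketch).
import numpy as np

def worst_case_expectation(lo, hi, values):
    """Choose p in [lo, hi] with sum(p) = 1 that minimizes p . values."""
    p = lo.copy()
    slack = 1.0 - p.sum()
    for i in np.argsort(values):          # pour remaining mass onto low-value states
        add = min(slack, hi[i] - lo[i])
        p[i] += add
        slack -= add
    return float(p @ values)

# Toy 3-state chain: state 2 is the goal (absorbing); intervals per (state, action).
LO = {(0, 'a'): np.array([0.6, 0.3, 0.0]), (0, 'b'): np.array([0.1, 0.5, 0.2]),
      (1, 'a'): np.array([0.1, 0.4, 0.3]), (1, 'b'): np.array([0.0, 0.2, 0.5])}
HI = {(0, 'a'): np.array([0.7, 0.4, 0.1]), (0, 'b'): np.array([0.3, 0.7, 0.4]),
      (1, 'a'): np.array([0.3, 0.6, 0.5]), (1, 'b'): np.array([0.2, 0.4, 0.8])}

V = np.array([0.0, 0.0, 1.0])             # probability of eventually reaching state 2
for _ in range(50):
    V_new = V.copy()
    for s in (0, 1):
        V_new[s] = max(worst_case_expectation(LO[(s, a)], HI[(s, a)], V)
                       for a in ('a', 'b'))
    V = V_new
print("robust reach probabilities from states 0 and 1:", V[:2])
```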
Control of Discrete-Time Linear Systems with Charge-Balanced Inputs
Electrical brain stimulation relies on externally applied currents to modulate neural activity, but safety constraints require each stimulation cycle to be charge-balanced, enforcing a zero net injected charge. However, how such charge-balanced stimulation works remains poorly understood. This paper investigates the ability of charge-balanced inputs to steer state trajectories in discrete-time linear systems. Motivated by both open-loop and adaptive neurostimulation protocols, we study two practically relevant input structures: periodic (repetitive) charge-balanced inputs and non-repetitive charge-balanced inputs. For each case, we derive novel reachability and controllability conditions. The theoretical results are further validated through numerical demonstrations of minimum-energy control input design.
comment: 6 pages, 4 figures, submitted to IFAC Congress 2026
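A minimal numerical companion to the abstract above: the sketch computes a minimum-energy input sequence for a discrete-time linear system subject to the charge-balance constraint that the inputs sum to zero over the horizon. The system matrices, target state, and horizon are illustrative placeholders, not values from the paper.

```python
# Minimum-energy, charge-balanced input design for x_{k+1} = A x_k + B u_k (scalar input).
import numpy as np

def charge_balanced_min_energy(A, B, x0, x_target, T):
    # Stack the input-to-state map: x_T = A^T x0 + [A^{T-1}B ... B] u
    Phi = np.hstack([np.linalg.matrix_power(A, T - 1 - k) @ B for k in range(T)])
    # Equality constraints: reach x_target AND zero net injected charge.
    C = np.vstack([Phi, np.ones((1, T))])
    d = np.concatenate([x_target - np.linalg.matrix_power(A, T) @ x0, [0.0]])
    # Minimum-norm solution of C u = d (least squares handles consistency).
    u, *_ = np.linalg.lstsq(C, d, rcond=None)
    return u

A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
u = charge_balanced_min_energy(A, B, np.zeros(2), np.array([0.5, 0.0]), T=10)
print("net charge:", round(u.sum(), 6), "inputs:", np.round(u, 3))
```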
Scalable Formal Verification of Incremental Stability in Large-Scale Systems Using Graph Neural Networks
This work proposes a novel distributed framework for verifying the incremental stability of large-scale systems with unknown dynamics and known interconnection structures using graph neural networks. Our proposed approach relies on the construction of local incremental Lyapunov functions for subsystems, which are then composed together to obtain a suitable Lyapunov function for the interconnected system. Graph neural networks are used to synthesize these functions in a data-driven fashion. The formal correctness guarantee is then obtained by leveraging Lipschitz bounds of the trained neural networks. Finally, the effectiveness of our approach is validated through two nonlinear case studies.
Safety-Critical Control on Lie Groups Using Energy-Augmented Zeroing Control Barrier Functions
We study safety-critical control on fully actuated mechanical systems by means of Zeroing Control Barrier Functions (ZCBFs) defined on Lie Groups. Specifically, we introduce and theoretically validate two classes of ZCBFs. The first enforces kinematic constraints, suitable for implementing obstacle avoidance algorithms. The second enforces kinetic energy limits along prescribed inertial-frame translational and rotational directions, relevant for ensuring safe physical interaction. Numerical simulations involving slit traversal and safe landing scenarios are presented to validate the effectiveness and versatility of the proposed methodology.
Off-grid solar energy storage system with hybrid lithium iron phosphate (LFP) and lead-acid batteries in high mountains: a case report of Jiujiu Cabins in Taiwan
Mountain huts are high-altitude buildings that offer hikers a resting place and shelter. Energy supply at mountain huts remains an open issue, and renewable energy can be an appropriate solution. Jiujiu Cabins, a well-known mountain hut in Shei-Pa National Park, Taiwan, has operated an off-grid solar energy storage system (ESS) with lead-acid batteries. In 2021, a serious system failure left the hut without electricity. After a detailed on-site survey and a reorganization and repair project, the energy system returned to normal operation. In addition, an eco-friendly lithium iron phosphate (LFP) battery ESS replaced part of the lead-acid battery ESS, forming a hybrid ESS and yielding a greener, more capable off-grid solar ESS. This case report documents the energy architecture, a detailed system description, and the historical status of the system, along with the on-site survey of the failed system, the improvement project, and future plans.
comment: 10 pages, 9 figures, 3 tables
Model Predictive Control for Cooperative Docking Between Autonomous Surface Vehicles with Disturbance Rejection
Uncrewed Surface Vehicles (USVs) are a popular and efficient type of marine craft that find application in a large number of water-based tasks. When multiple USVs operate in the same area, they may be required to dock to each other to perform a shared task. Existing approaches for docking between autonomous USVs generally consider one USV as a stationary target, while the second is tasked with reaching the required docking pose. In this work, we propose a cooperative approach for USV-USV docking, where two USVs work together to dock at an agreed location. We use a centralized Model Predictive Control (MPC) approach to solve the control problem, obtaining feasible trajectories that also guarantee constraint satisfaction. Owing to its model-based nature, this approach allows the rejection of disturbances, including exogenous inputs, by anticipating their effect on the USVs through the MPC prediction model. This is particularly effective in the case of almost-stationary disturbances such as water currents. In simulations, we demonstrate that the proposed approach enables faster and more efficient docking than existing approaches.
comment: 7 pages, 4 figures, submitted to IFAC World Congress 2026
SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks
Integrating autonomous mobile robots into human environments requires human-like decision-making and energy-efficient, event-based computation. Despite progress, neuromorphic methods are rarely applied to Deep Reinforcement Learning (DRL) navigation approaches due to unstable training. We address this gap with a hybrid socially integrated DRL actor-critic approach that combines Spiking Neural Networks (SNNs) in the actor with Artificial Neural Networks (ANNs) in the critic and a neuromorphic feature extractor to capture temporal crowd dynamics and human-robot interactions. Our approach enhances social navigation performance and reduces estimated energy consumption by approximately 1.69 orders of magnitude.
comment: 8 pages, 6 figures
Characterizing Lane-Changing Behavior in Mixed Traffic
Characterizing and understanding lane-changing behavior in the presence of automated vehicles (AVs) is crucial to ensuring safety and efficiency in mixed traffic. Accordingly, this study aims to characterize the interactions between the lane-changing vehicle (active vehicle) and the vehicle directly impacted by the maneuver in the target lane (passive vehicle). Utilizing real-world trajectory data from the Waymo Open Motion Dataset (WOMD), this study explores patterns in lane-changing behavior and provides insight into how these behaviors evolve under different AV market penetration rates (MPRs). In particular, we propose a game-theoretic framework to analyze cooperative and defective behaviors in mixed traffic, applied to the 7,636 observed lane-changing events in the WOMD. First, we utilize k-means clustering to classify vehicles as cooperative or defective, revealing that the proportions of cooperative AVs are higher than those of HDVs in both active and passive roles. Next, we jointly estimate the utilities of active and passive vehicles to model their behaviors using the quantal response equilibrium framework. Empirical payoff tables are then constructed based on these utilities. Using these payoffs, we analyze the presence of social dilemmas and examine the evolution of cooperative behaviors using evolutionary game theory. Our results reveal the presence of social dilemmas in approximately 4% and 11% of lane-changing events for active and passive vehicles, respectively, with most classified as Stag Hunt or Prisoner's Dilemma (Chicken Game rarely observed). Moreover, the Monte Carlo simulation results show that repeated lane-changing interactions consistently lead to increased cooperative behavior over time, regardless of the AV penetration rate.
A Structured Review of Fixed and Multimodal Sensing Techniques for Bat Monitoring
Effective monitoring of mobile animal populations is crucial for ecological research, wildlife management, and agricultural applications. Monitoring bats specifically can help track the spread of disease and shed light on bat migration patterns, population dynamics, and the impacts of environmental changes on bat colonies. Fixed sensing modalities, such as infrared sensors, cameras, radar, and acoustic detectors, play a pivotal role in tracking and understanding animal behavior. This survey first provides contextual background on bat biology and then reviews these fixed sensing modalities, discussing the unique challenges and contributions of each approach. We highlight the coverage, applications, accuracy, and limitations associated with each of these sensing modalities. By synthesizing recent advances, we provide a comprehensive overview to guide future research in this area.
Surrogate compliance modeling enables reinforcement learned locomotion gaits for soft robots
Adaptive morphogenetic robots adapt their morphology and control policies to meet changing tasks and environmental conditions. Many such systems leverage soft components, which enable shape morphing but also introduce simulation and control challenges. Soft-body simulators remain limited in accuracy and computational tractability, while rigid-body simulators cannot capture soft-material dynamics. Here, we present a surrogate compliance modeling approach: rather than explicitly modeling soft-body physics, we introduce indirect variables representing soft-material deformation within a rigid-body simulator. We validate this approach using our amphibious robotic turtle, a quadruped with soft morphing limbs designed for multi-environment locomotion. By capturing deformation effects as changes in effective limb length and limb center of mass, and by applying reinforcement learning with extensive randomization of these indirect variables, we achieve reliable policy learning entirely in a rigid-body simulation. The resulting gaits transfer directly to hardware, demonstrating high-fidelity sim-to-real performance on hard, flat substrates and robust, though lower-fidelity, transfer on rheologically complex terrains. The learned closed-loop gaits exhibit unprecedented terrestrial maneuverability and achieve an order-of-magnitude reduction in cost of transport compared to open-loop baselines. Field experiments with the robot further demonstrate stable, multi-gait locomotion across diverse natural terrains, including gravel, grass, and mud.
Momentum-Accelerated Online Feedback Optimization for Power System Flexibility
Flexibility is increasingly gaining importance in modern power system operation. This paper presents a controller framework based on Online Feedback Optimization for real-time coordination of power system flexibility. The proposed approach introduces a momentum-augmented projection-step to accelerate convergence and improve dynamic performance. We derive the controller formulation, and evaluate its performance and stability in two representative case studies. The first examines online congestion management in distribution feeders, and the second addresses multi-layer flexibility dispatch across system interfaces. Numerical results demonstrate that the momentum-based controller achieves faster convergence and maintains constraint satisfaction, highlighting its potential for real-time flexibility control in large-scale power systems.
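For intuition about the momentum-augmented projection step mentioned above, here is a toy Online Feedback Optimization loop on a static plant: the controller takes a gradient step computed from measured outputs, adds a heavy-ball momentum term, and projects onto box constraints. The plant sensitivity, cost, gains, and constraints are assumptions for illustration; the paper's controller formulation and power-system case studies are considerably richer.

```python
# Toy momentum-augmented Online Feedback Optimization loop on a static plant y = H u + d.
import numpy as np

def ofo_momentum_step(u, u_prev, y_meas, H, grad_cost_y, u_min, u_max,
                      alpha=0.05, beta=0.7):
    """One controller iteration: measured-output gradient + momentum + projection."""
    grad_u = H.T @ grad_cost_y(y_meas)                   # chain rule through the plant sensitivity
    u_next = u - alpha * grad_u + beta * (u - u_prev)    # heavy-ball momentum term
    return np.clip(u_next, u_min, u_max)                 # projection onto box constraints

H = np.array([[1.0, 0.5], [0.2, 1.0]])                   # assumed plant sensitivity
d = np.array([0.3, -0.1])                                # unmeasured disturbance
y_ref = np.array([1.0, 0.5])
grad_cost = lambda y: y - y_ref                          # cost 0.5 * ||y - y_ref||^2

u_prev = u = np.zeros(2)
for _ in range(200):
    y = H @ u + d                                        # "measurement" from the plant
    u_next = ofo_momentum_step(u, u_prev, y, H, grad_cost, -2.0, 2.0)
    u_prev, u = u, u_next
print("final setpoints:", np.round(u, 3),
      "output error:", np.round(H @ u + d - y_ref, 4))
```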
Random Access for LEO Satellite Communication Systems via Deep Learning
Integrating contention-based random access procedures into low Earth orbit (LEO) satellite communication (SatCom) systems poses new challenges, including long propagation delays, large Doppler shifts, and a large number of simultaneous access attempts. These factors degrade the efficiency and responsiveness of conventional random access schemes, particularly in scenarios such as satellite-based internet of things and direct-to-device services. In this paper, we propose a deep learning-based random access framework designed for LEO SatCom systems. The framework incorporates an early preamble collision classifier that uses multi-antenna correlation features and a lightweight 1D convolutional neural network to estimate the number of collided users at the earliest stage. Based on this estimate, we introduce an opportunistic transmission scheme that balances access probability and resource efficiency to improve success rates and reduce delay. Simulation results under 3GPP-compliant LEO settings confirm that the proposed framework achieves higher access success probability, lower delay, better physical uplink shared channel utilization, and reduced computational complexity compared to existing schemes.
comment: 12 pages, 8 figures, 4 tables
Mitigation of Datacenter Demand Ramping and Fluctuation using Hybrid ESS and Supercapacitor
This paper proposes a hybrid energy storage system (HESS)-based control framework that enables comprehensive power smoothing for hyperscale AI datacenters with large load variations. Datacenters impose severe ramping and fluctuation-induced stresses on the grid frequency and voltage stability. To mitigate such disturbances, the proposed HESS integrates a battery energy storage system (BESS) and a supercapacitor (SC) through coordinated multi-timescale control. A high-pass filter (HPF) separates the datacenter demand into slow and fast components, allocating them respectively to the ESS via a leaky-integral controller and to the SC via a phase-lead proportional-derivative controller enhanced with feedforward and ramp-tracking compensation. Adaptive weighting and repetitive control mechanisms further improve transient and periodic responses. Case studies verify that the proposed method effectively suppresses both ramping and fluctuations, stabilizes the system frequency, and maintains sustainable state-of-charge (SoC) trajectories for both ESS and SC under prolonged, stochastic training cycles.
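The core splitting idea above can be sketched with a first-order high-pass filter that divides a synthetic datacenter demand profile into a slow component for the battery ESS and a fast component for the supercapacitor. The filter time constant, sample time, units, and demand profile are made-up values; the paper's controller additionally layers leaky-integral, phase-lead, feedforward, and repetitive control on top of this split.

```python
# Demand splitting with a discrete first-order high-pass filter (illustrative values, MW assumed).
import numpy as np

dt, tau = 0.1, 30.0                      # sample time [s], HPF time constant [s]
alpha = tau / (tau + dt)                 # discrete HPF coefficient

t = np.arange(0, 600, dt)
# Synthetic demand: square-wave training cycles plus a fast periodic fluctuation.
demand = 50 + 30 * (np.sin(2 * np.pi * t / 300) > 0) + 5 * np.sin(2 * np.pi * t / 7)

fast = np.zeros_like(demand)             # high-frequency part -> supercapacitor
for k in range(1, len(demand)):
    fast[k] = alpha * (fast[k - 1] + demand[k] - demand[k - 1])
slow = demand - fast                     # low-frequency part -> battery ESS

print("peak fast-component magnitude (SC duty):", round(np.abs(fast).max(), 2), "MW")
print("peak slow-component ramp (BESS duty):   ",
      round(np.abs(np.diff(slow)).max() / dt, 3), "MW/s")
```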
Cabin Layout, Seat Density, and Passenger Segmentation in Air Transport: Implications for Prices, Ancillary Revenues, and Efficiency
This study investigates how the layout and density of seats in aircraft cabins influence the pricing of airline tickets on domestic flights. The analysis is based on microdata from boarding passes linked to face-to-face interviews with passengers, allowing us to relate the price paid to the location on the aircraft seat map, as well as market characteristics and flight operations. Econometric models were estimated using the Post-Double-Selection LASSO (PDS-LASSO) procedure, which selects numerous controls for unobservable factors linked to commercial and operational aspects, thus enabling better identification of the effect of variables such as advance purchase, reason for travel, fuel price, market structure, and load factor, among others. The results suggest that a higher density of seat rows is associated with lower prices, reflecting economies of scale with the increase in aircraft size and gains in operational efficiency. An unexpected result was also obtained: in situations where there was no seat selection fee, passengers with more expensive tickets were often allocated middle seats due to purchasing at short notice, when the side alternatives were no longer available. This behavior helps explain the economic logic behind one of the main ancillary revenues of airlines. In addition to quantitative analysis, the study incorporates an exploratory approach to innovative cabin concepts and their possible effects on density and comfort on board.
Joint Activity Design Heuristics for Enhancing Human-Machine Collaboration
Joint activity refers to situations in which more than one agent (human or machine) contributes to the completion of a task or activity. Designing for joint activity focuses on explicitly supporting the interdependencies necessary for effective coordination among the agents engaged in the activity. This builds on and expands designing for usability to further address how technologies can be designed to act as effective team players. Effective joint activity requires supporting, at minimum, five primary macrocognitive functions within teams: Event Detection, Sensemaking, Adaptability, Perspective-Shifting, and Coordination. Supporting these functions is just as important as making technologies usable. We synthesized fourteen heuristics from relevant literature, including display design, human factors, cognitive systems engineering, cognitive psychology, and computer science, to aid the design, development, and evaluation of technologies that support joint human-machine activity.
comment: 10 pages
Learning Dynamics from Infrequent Output Measurements for Uncertainty-Aware Optimal Control
Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of limited sensing by formulating a Bayesian prior over the continuous-time dynamics and latent state trajectory in state-space form and updating it through a targeted marginal Metropolis-Hastings sampler equipped with a numerical ODE integrator. The resulting posterior samples are used to formulate a scenario-based optimal control problem that accounts for both model and measurement uncertainty and is solved using standard nonlinear programming methods. The approach is validated in a numerical case study on glucose regulation using a Type 1 diabetes model.
comment: Submitted to the 2026 IFAC World Congress
Sensor Attack Detection Method for Encrypted State Observers
This paper proposes an encrypted state observer that is capable of detecting sensor attacks without decryption. We first design a state observer that operates over a finite field of integers with the modular arithmetic. The observer generates a residue signal that indicates the presence of attacks under sparse attack and sensing redundancy conditions. Then, we develop a homomorphic encryption scheme that enables the observer to operate over encrypted data while automatically disclosing the residue signal. Unlike our previous work restricted to single-input single-output systems, the proposed scheme is applicable to general multi-input multi-output systems. Given that the disclosed residue signal remains below a prescribed threshold, the full state can be recovered as an encrypted message.
comment: Submitted to IFAC World Congress 2026
ALS Storage Ring RF Control System Upgrade Plan and Status
The Advanced Light Source (ALS) at Lawrence Berkeley National Laboratory, a third-generation synchrotron light source operational since 1992, is undergoing a comprehensive upgrade of its storage ring RF control system. The legacy Horner PLC controllers and remote I/O modules, now at end-of-life, are being replaced with an Allen-Bradley PLC platform to improve maintainability, reliability, and long-term support. This paper presents the planning, design, and current status of the upgrade project.
Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Large Language Models (LLMs) have demonstrated remarkable capabilities in knowledge acquisition, reasoning, and tool use, making them promising candidates for autonomous agent applications. However, training LLM agents for complex multi-turn task planning faces significant challenges, including sparse episode-wise rewards, credit assignment across long horizons, and the computational overhead of reinforcement learning in multi-turn interaction settings. To address these challenges, this paper introduces a novel approach that transforms multi-turn task planning into single-turn task reasoning problems, enabling efficient policy optimization through Group Relative Policy Optimization (GRPO) with dense, verifiable rewards derived from expert trajectories. Our theoretical analysis shows that improving single-turn task reasoning with GRPO yields a lower bound on the multi-turn success probability under the minimal number of turns, and that the improvement generalizes to subtasks with shorter horizons. Experimental evaluation on a complex task-planning benchmark demonstrates that our 1.5B-parameter model trained with single-turn GRPO outperforms larger baseline models of up to 14B parameters, achieving success rates of 70% on long-horizon planning tasks.
comment: Accepted by IEEE Control Systems Letters (L-CSS)
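As a small aside on the optimizer named above, GRPO replaces a learned critic with group-relative advantages: rewards of several sampled completions of the same prompt are standardized within the group. The sketch below shows only that normalization step, with an invented group of verifiable 0/1 rewards; it is not the paper's training pipeline.

```python
# Group-relative advantage computation used in GRPO (illustrative rewards).
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Per-sample advantages: reward standardized within its prompt group."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, 6 sampled single-turn plans scored by a verifiable reward
# (e.g. 1.0 if the predicted plan matches the expert trajectory, else 0.0).
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(np.round(grpo_advantages(rewards), 3))   # positive for correct plans, negative otherwise
```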
Iterative VCG-based Mechanism Fosters Cooperation in Multi-Regional Network Design
Transportation network design often involves multiple stakeholders with diverse priorities. We consider a system with a hierarchical multi-agent structure, featuring self-optimized subnetwork operators at the lower level and a central organization at the upper level. Independent regional planning can lead to inefficiencies due to the lack of coordination, hindering interregional travel and cross-border infrastructure development, while centralized methods may struggle to align local interests and can be impractical to implement. To support decision-making for such a system, we introduce an iterative VCG-based mechanism for multi-regional network design that fosters cooperation among subnetwork operators. By leveraging the Vickrey-Clarke-Groves (VCG) mechanism, the framework determines collective investment decisions and the necessary payments from both operators and the central organization to achieve efficient outcomes. A case study on the European Railway System validates the effectiveness of the proposed method, demonstrating significant improvements in overall network performance through enhanced cross-region cooperation.
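For readers unfamiliar with the payment rule underlying the mechanism above, the toy sketch below computes Vickrey-Clarke-Groves payments for two subnetwork operators choosing among candidate cross-border links: each operator pays the externality its presence imposes on the others. The operators, link values, and costs are invented, and the iterative mechanism in the paper is more elaborate.

```python
# Toy Clarke-pivot (VCG) payments for a two-operator link-selection problem.
from itertools import combinations

values = {            # reported value of each candidate link to each operator
    "A": {"link1": 5.0, "link2": 1.0},
    "B": {"link1": 2.0, "link2": 4.0},
}
costs = {"link1": 4.0, "link2": 3.0}

def best_outcome(agents):
    """Subset of links maximizing total reported surplus (values minus costs) for `agents`."""
    links = list(costs)
    best, best_set = 0.0, ()
    for k in range(len(links) + 1):
        for subset in combinations(links, k):
            surplus = sum(values[a][l] for a in agents for l in subset) \
                      - sum(costs[l] for l in subset)
            if surplus > best:
                best, best_set = surplus, subset
    return best, best_set

all_agents = list(values)
total, chosen = best_outcome(all_agents)
for a in all_agents:
    others = [b for b in all_agents if b != a]
    without_a, _ = best_outcome(others)                        # others' optimum if a were absent
    others_with_a = sum(values[b][l] for b in others for l in chosen) \
                    - sum(costs[l] for l in chosen)             # others' surplus at the chosen outcome
    print(f"operator {a}: VCG payment = {without_a - others_with_a:.2f}")
print("selected links:", chosen, "| total surplus:", total)
```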
Linearizability of flows by embeddings
We consider the problem of determining the class of continuous-time dynamical systems that can be globally linearized in the sense of admitting an embedding into a linear system on a higher-dimensional Euclidean space. We solve this problem for dynamical systems on connected state spaces that are either compact or contain at least one nonempty compact attractor, obtaining necessary and sufficient conditions for the existence of linearizing $C^k$ embeddings for $k\in \mathbb{N}_{\geq 0}\cup \{\infty\}$. Corollaries include (i) several checkable necessary conditions for global linearizability and (ii) extensions of the Hartman-Grobman and Floquet normal form theorems beyond the classical settings. Our results open new perspectives on linearizability by establishing relationships to symmetry, topology, and invariant manifold theory.
comment: Accepted for publication in Selecta Mathematica
Physics-Informed Inductive Biases for Voltage Prediction in Distribution Grids
Voltage prediction in distribution grids is a critical yet difficult task for maintaining power system stability. Machine learning approaches, particularly Graph Neural Networks (GNNs), offer significant speedups but suffer from poor generalization when trained on limited or incomplete data. In this work, we systematically investigate the role of inductive biases in improving a model's ability to reliably learn power flow. Specifically, we evaluate three physics-informed strategies: (i) power-flow-constrained loss functions, (ii) complex-valued neural networks, and (iii) residual-based task reformulation. Using the ENGAGE dataset, which spans multiple low- and medium-voltage grid configurations, we conduct controlled experiments to isolate the effect of each inductive bias and assess both standard predictive performance and out-of-distribution generalization. Our study provides practical insights into which model assumptions most effectively guide learning for reliable and efficient voltage prediction in modern distribution networks.
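The first inductive bias listed above, a power-flow-constrained loss, can be sketched as a standard data term plus a penalty on the AC power-flow mismatch implied by the predicted voltages. The two-bus admittance matrix, injections, and weighting factor below are illustrative assumptions, not the ENGAGE setup used in the paper.

```python
# Physics-informed loss: regression error plus AC power-flow residual penalty (toy two-bus case).
import torch

def power_flow_residual(V, theta, P_inj, Q_inj, G, B):
    """Squared power-flow mismatch at every bus for predicted voltage magnitudes/angles."""
    dth = theta.unsqueeze(-1) - theta.unsqueeze(-2)          # theta_i - theta_j
    P_calc = V.unsqueeze(-1) * V.unsqueeze(-2) * (G * torch.cos(dth) + B * torch.sin(dth))
    Q_calc = V.unsqueeze(-1) * V.unsqueeze(-2) * (G * torch.sin(dth) - B * torch.cos(dth))
    return (P_calc.sum(-1) - P_inj) ** 2 + (Q_calc.sum(-1) - Q_inj) ** 2

def physics_informed_loss(V_pred, theta_pred, V_true, P_inj, Q_inj, G, B, lam=0.1):
    data_term = torch.mean((V_pred - V_true) ** 2)
    physics_term = torch.mean(power_flow_residual(V_pred, theta_pred, P_inj, Q_inj, G, B))
    return data_term + lam * physics_term

# Illustrative two-bus line: series admittance 1 - 5j per unit.
G = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])
B = torch.tensor([[-5.0, 5.0], [5.0, -5.0]])
V_pred = torch.tensor([1.0, 0.98], requires_grad=True)
theta_pred = torch.tensor([0.0, -0.02], requires_grad=True)
loss = physics_informed_loss(V_pred, theta_pred, torch.ones(2),
                             torch.tensor([0.1, -0.1]), torch.tensor([0.05, -0.05]), G, B)
loss.backward()   # gradients flow through both the data and the physics terms
print(float(loss))
```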
An Information Theory of Finite Abstractions and their Fundamental Scalability Limits
Finite abstractions are discrete approximations of dynamical systems, such that the set of abstraction trajectories contains all system trajectories. There is a consensus that abstractions suffer from the curse of dimensionality: for the same "accuracy" (how closely the abstraction represents the system), the abstraction size scales poorly with system dimensions. And yet, after decades of research on abstractions, there are no formal results on their accuracy-size tradeoff. In this work, we derive a statistical, quantitative theory of abstractions' accuracy-size tradeoff and uncover fundamental limits on their scalability, through rate-distortion theory -- the information theory of lossy compression. Abstractions are viewed as encoder-decoder pairs, encoding trajectories of dynamical systems. Rate measures abstraction size, while distortion describes accuracy, defined as the spatial average deviation between abstract trajectories and system ones. We obtain a fundamental lower bound on the minimum achievable abstraction distortion, given the system dynamics and the abstraction size; and vice-versa a lower bound on the minimum size, for given distortion. The bound depends on the complexity of the dynamics, through trajectory entropy. We demonstrate its tightness on some dynamical systems. Finally, we showcase how this new theory enables constructing minimal abstractions, optimizing the size-accuracy tradeoff, through an example on a chaotic system.
An Open-Source Soft Robotic Platform for Autonomous Aerial Manipulation in the Wild
Aerial manipulation combines the versatility and speed of flying platforms with the functional capabilities of mobile manipulation, which presents significant challenges due to the need for precise localization and control. Traditionally, researchers have relied on offboard perception systems, which are limited to expensive and impractical specially equipped indoor environments. In this work, we introduce a novel platform for autonomous aerial manipulation that exclusively utilizes onboard perception systems. Our platform can perform aerial manipulation in various indoor and outdoor environments without depending on external perception systems. Our experimental results demonstrate the platform's ability to autonomously grasp various objects in diverse settings. This advancement significantly improves the scalability and practicality of aerial manipulation applications by eliminating the need for costly tracking solutions. To accelerate future research, we open source our ROS 2 software stack and custom hardware design, making our contributions accessible to the broader research community.
comment: Project website: https://sites.google.com/view/open-source-soft-platform/open-source-soft-robotic-platform GitHub: https://github.com/raptor-ethz
InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Power Grid Control
The transition toward power grids with high renewable penetration demands context-aware decision-making frameworks. Traditional operational paradigms, which rely on static optimization of history-based load forecasting, often fail to capture the complex nature of real-time operational conditions, such as operator-issued maintenance mandates, emergency topology changes, or event-driven load surges. To address this challenge, we introduce InstructMPC, a closed-loop framework that integrates Large Language Models (LLMs) to generate context-aware predictions, enabling the controller to optimize power system operation. Our method employs a Contextual Disturbances Predictor (CDP) module to translate contextual information into predictive disturbance trajectories, which are then incorporated into the Model Predictive Control (MPC) optimization. Unlike conventional open-loop forecasting frameworks, InstructMPC features an online tuning mechanism in which the predictor's parameters are continuously updated based on the realized control cost. With a tailored loss function, this mechanism carries a theoretical guarantee, achieving a regret bound of $O(\sqrt{T \log T})$ for linear dynamics and ensuring task-aware learning and adaptation to non-stationary grid conditions.
Computation of Feasible Assume-Guarantee Contracts: A Resilience-based Approach
We propose a resilience-based framework for computing feasible assume-guarantee contracts that ensure the satisfaction of temporal specifications in interconnected discrete-time systems. Interconnection effects are modeled as structured disturbances. We use a resilience metric, the maximum disturbance under which local specifications hold, to refine assumptions and guarantees across subsystems iteratively. We first demonstrate correctness and monotone refinement of guarantees for two subsystems. Then, we extend our approach to general networks of L subsystems using weighted combinations of interconnection effects. We instantiate the framework on linear systems by meeting finite-horizon safety, exact-time reachability, and finite-horizon reachability specifications, and on nonlinear systems by fulfilling general finite-horizon specifications. Our approach is demonstrated through numerical linear examples and a nonlinear DC microgrid case study, showcasing the impact of our framework on verifying temporal logic specifications with compositional reasoning.
Disturbance Compensation for Safe Kinematic Control of Robotic Systems with Closed Architecture
In commercial robotic systems, it is common to encounter a closed inner-loop torque controller that is not user-modifiable. However, the outer-loop controller, which sends kinematic commands such as position or velocity for the inner-loop controller to track, is typically exposed to users. In this work, we focus on the development of an easily integrated add-on at the outer-loop layer that combines disturbance rejection control with a robust control barrier function for high-performance tracking and safe control of the full dynamics of an industrial manipulator. This is particularly beneficial when 1) the inner-loop controller is imperfect, unmodifiable, and uncertain; and 2) the dynamic model exhibits significant uncertainty. Stability analysis, a formal safety guarantee proof, and hardware experiments with a PUMA robotic manipulator are presented. Our solution demonstrates superior performance in terms of simplicity of implementation, robustness, tracking precision, and safety compared to the state of the art. Video: https://youtu.be/zw1tanvrV8Q
comment: Extended version of the paper submitted for publication. This document contains detailed mathematical derivations and additional experimental results omitted from the submission due to page constraints
Stability-Guaranteed Dual Kalman Filtering for Electrochemical Battery State Estimation
Accurate and stable state estimation is critical for battery management. Although dual Kalman filtering can jointly estimate states and parameters, the strong coupling between filters may cause divergence under large initialization errors or model mismatch. This paper proposes a Stability Guaranteed Dual Kalman Filtering (SG-DKF) method. A Lyapunov-based analysis yields a sufficient stability condition, leading to an adaptive dead-zone rule that suspends parameter updates when the innovation exceeds a stability bound. Applied to an electrochemical battery model, SG-DKF achieves accuracy comparable to a dual EKF and reduces state of charge RMSE by over 45% under large initial state errors.
comment: This work has been submitted to 23rd IFAC World Congress for possible publication
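A scalar caricature of the adaptive dead-zone rule described above: the parameter filter of a dual estimation scheme is corrected only when the innovation stays below a stability-derived bound, and is frozen otherwise. The toy measurement model, gains, and bound are illustrative placeholders, not the paper's electrochemical battery model.

```python
# Innovation-gated (dead-zone) parameter update in a scalar dual-estimation setting.
import numpy as np

def gated_parameter_update(theta_hat, P_theta, innovation, H_theta, R, bound):
    """Kalman-style parameter correction, suspended when |innovation| exceeds `bound`."""
    if abs(innovation) > bound:          # dead zone: keep the parameter estimate frozen
        return theta_hat, P_theta
    S = H_theta * P_theta * H_theta + R
    K = P_theta * H_theta / S
    theta_hat = theta_hat + K * innovation
    P_theta = (1.0 - K * H_theta) * P_theta
    return theta_hat, P_theta

# Toy example: measurement y = theta * x + noise, true theta = 2, poor initial guess 0.5.
rng = np.random.default_rng(1)
theta_hat, P = 0.5, 1.0
for k in range(300):
    x = np.sin(0.05 * k)
    y = 2.0 * x + 0.02 * rng.standard_normal()
    innov = y - theta_hat * x            # innovation of the (here trivial) state filter
    theta_hat, P = gated_parameter_update(theta_hat, P, innov,
                                          H_theta=x, R=0.02**2, bound=1.5)
print("estimated parameter:", round(theta_hat, 3))
```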
Optimizing Parameter Estimation for Electrochemical Battery Model: A Comparative Analysis of Operating Profiles on Computational Efficiency and Accuracy
Parameter estimation in electrochemical models remains a significant challenge in their application. This study investigates the impact of different operating profiles on electrochemical model parameter estimation to identify the optimal conditions. In particular, the present study is focused on Nickel Manganese Cobalt Oxide (NMC) lithium-ion batteries. Based on five fundamental current profiles (C/5, C/2, 1C, Pulse, DST), 31 combinations of conditions were generated and used for parameter estimation and validation, resulting in 961 evaluation outcomes. Particle Swarm Optimization is employed for parameter identification in electrochemical models, specifically using the Single Particle Model (SPM). The analysis considered three dimensions: model voltage output error, parameter estimation error, and time cost. Results show that using all five profiles (C/5, C/2, 1C, Pulse, DST) minimizes voltage output error, while {C/5, C/2, Pulse, DST} minimizes parameter estimation error. The shortest time cost is achieved with {1C}. When considering both model voltage output and parameter errors, {C/5, C/2, 1C, DST} is optimal. For minimizing model voltage output error and time cost, {C/2, 1C} is best, while {1C} is ideal for parameter error and time cost. The comprehensive optimal condition is {C/5, C/2, 1C, DST}. These findings provide guidance for selecting current conditions tailored to specific needs.
comment: This manuscript has been peer-reviewed and accepted for publication in the Journal of Power Sources. The final published version is available at the official journal site: https://doi.org/10.1016/j.jpowsour.2025.239044. Please cite the published version
Online Energy Storage Arbitrage under Imperfect Predictions: A Conformal Risk-Aware Approach
This work proposes a conformal approach for energy storage arbitrage to control the downside risk arising from imperfect price forecasts. Energy storage arbitrage relies solely on predictions of future market prices, while inaccurate price predictions may lead to significant profit losses. Based on conformal decision theory, we describe a controller that dynamically adjusts decision conservativeness through prediction sets without distributional assumptions. To enable online calibration when online profit loss feedback is unobservable, we establish that a temporal difference error serves as a measurable proxy. Building on this insight, we develop two online calibration strategies: prediction error-based adaptation targeting forecast accuracy, and value error-based calibration focusing on decision quality. Analysis of the conformal controller proves bounded long-term risk with convergence guarantees in temporal difference error, which further effectively manages risk exposure in potential profit losses. Case studies demonstrate superior performance in balancing risk and opportunity compared to benchmarks under varying forecast conditions.
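The calibration loop behind the abstract above can be caricatured as follows: a conservativeness parameter is nudged up after each step whose temporal-difference error proxy exceeds a tolerance and nudged down otherwise, so the long-run violation rate tracks a target level. The simulated error feedback, step size, and target rate are invented for illustration; the storage arbitrage model itself is omitted.

```python
# Online conformal-style calibration of a conservativeness parameter lambda.
import numpy as np

def conformal_update(lam, exceeded, epsilon=0.1, eta=0.05, lam_min=0.0, lam_max=1.0):
    """More conservative after a violation, slightly less conservative after a safe step."""
    lam = lam + eta * (float(exceeded) - epsilon)
    return float(np.clip(lam, lam_min, lam_max))

rng = np.random.default_rng(2)
lam, violations = 0.5, 0
for t in range(2000):
    # Proxy feedback: probability of a large TD-error proxy shrinks as lambda grows.
    exceeded = rng.random() < 0.4 * (1.0 - lam)
    violations += exceeded
    lam = conformal_update(lam, exceeded)
print("final conservativeness:", round(lam, 3),
      "empirical violation rate:", round(violations / 2000, 3))
```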
Fast Switching in Mixed-Integer Model Predictive Control
We deduce stability results for finite-control-set and mixed-integer model predictive control with a downstream oversampling phase. The presentation rests upon the inherent robustness of model predictive control with stabilizing terminal conditions and on techniques for solving mixed-integer optimal control problems by continuous optimization. Partial outer convexification and binary relaxation transform mixed-integer problems into common optimal control problems. We deduce nominal asymptotic stability for the resulting relaxed system formulation and apply sum-up rounding to efficiently restore integer feasibility on an oversampling time grid. If fast control switching is technically possible and inexpensive, the relaxed system behavior can be approximated arbitrarily closely in the state space. These results are combined within the framework of input-perturbed model predictive control, yielding practical asymptotic stability. Numerical experiments illustrate the practical relevance of fast control switching.
comment: This preprint was revised based on the feedback from the reviewers and resubmitted to the IEEE. The previous version has been conditionally accepted for publication
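Sum-up rounding, the integer-recovery step mentioned above, is simple enough to sketch for a single binary control: the relaxed control is rounded on the oversampling grid so that the accumulated integral of the difference stays within one grid step. The relaxed profile below is a placeholder, not an output of the convexified optimal control problem.

```python
# Sum-up rounding of a relaxed control alpha(t) in [0, 1] into a binary control beta(t).
import numpy as np

def sum_up_rounding(alpha, dt):
    """Binary beta whose accumulated integral difference from alpha stays within one grid step."""
    beta = np.zeros_like(alpha)
    accum = 0.0
    for k, a in enumerate(alpha):
        accum += a * dt
        if accum >= 0.5 * dt:            # switch on once enough "mass" has accumulated
            beta[k] = 1.0
            accum -= dt
    return beta

dt = 0.01
t = np.arange(0, 1, dt)
alpha = 0.5 * (1 + np.sin(2 * np.pi * t))          # placeholder relaxed control
beta = sum_up_rounding(alpha, dt)
print("integral gap:", round(abs(np.sum(alpha - beta) * dt), 4))   # bounded by dt
```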
Verifiable Mission Planning For Space Operations
Spacecraft must operate under environmental and actuator uncertainties while meeting strict safety requirements. Traditional approaches rely on scenario-based heuristics that fail to account for stochastic influences, leading to suboptimal or unsafe plans. We propose a finite-horizon, chance-constrained Markov decision process for mission planning, where states represent mission and vehicle parameters, actions correspond to operational adjustments, and temporal logic specifications encode operational reach-avoid constraints. We synthesize policies that optimize mission objectives while ensuring constraints are met with high probability. Applied to the GRACE-FO mission, the approach accounts for stochastic solar activity and uncertain thrust performance, yielding maneuver schedules that maximize scientific return and provably satisfy safety requirements. We demonstrate how Markov decision processes can be applied to space missions, enabling autonomous operation with formal guarantees.
comment: Submitted to the 2025 AAS/AIAA Astrodynamics Specialist Conference
The Battle of the Water Futures
The highly anticipated 'Battle of the Water Networks' is back with a new challenge for the water community. This competition will be hosted at the 4th International Joint Conference on Water Distribution Systems Analysis and Computing and Control in the Water Industry (WDSA/CCWI 2026), taking place in Paphos, Cyprus, from May 18-21, 2026. This competition embodies the core mission of Water-Futures and the theme for WDSA/CCWI 2026: "Designing the next generation of urban water (and wastewater) systems." The objective is to design and operate a water distribution system over a long-term horizon under deep uncertainty, with interventions applied in stages. For the first time, this challenge features a staged-design approach, unobservable and unknown uncertainties, and incorporates elements of policymaking and artificial intelligence. The solutions will be assessed using a transparent and inspectable open-source evaluation framework.
Robotics
CERNet: Class-Embedding Predictive-Coding RNN for Unified Robot Motion, Recognition, and Confidence Estimation
Robots interacting with humans must not only generate learned movements in real-time, but also infer the intent behind observed behaviors and estimate the confidence of their own inferences. This paper proposes CERNet, a unified model that achieves all three capabilities within a single hierarchical predictive-coding recurrent neural network (PC-RNN) equipped with a dynamically updated class embedding vector that unifies motor generation and recognition. The model operates in two modes: generation and inference. In the generation mode, the class embedding constrains the hidden state dynamics to a class-specific subspace; in the inference mode, it is optimized online to minimize prediction error, enabling real-time recognition. Validated on a humanoid robot across 26 kinesthetically taught alphabet letters, our hierarchical model achieves 76% lower trajectory reproduction error than a parameter-matched single-layer baseline, maintains motion fidelity under external perturbations, and infers the demonstrated trajectory class online with 68% Top-1 and 81% Top-2 accuracy. Furthermore, internal prediction errors naturally reflect the model's confidence in its recognition. This integration of robust generation, real-time recognition, and intrinsic uncertainty estimation within a single PC-RNN framework offers a compact and extensible approach to motor memory in physical robots, with potential applications in intent-sensitive human-robot collaboration.
A Hetero-Associative Sequential Memory Model Utilizing Neuromorphic Signals: Validated on a Mobile Manipulator
This paper presents a hetero-associative sequential memory system for mobile manipulators that learns compact, neuromorphic bindings between robot joint states and tactile observations to produce step-wise action decisions with low compute and memory cost. The method encodes joint angles via population place coding and converts skin-measured forces into spike-rate features using an Izhikevich neuron model; both signals are transformed into bipolar binary vectors and bound element-wise to create associations stored in a large-capacity sequential memory. To improve separability in binary space and inject geometry from touch, we introduce 3D rotary positional embeddings that rotate subspaces as a function of sensed force direction, enabling fuzzy retrieval through a softmax-weighted recall over temporally shifted action patterns. On a Toyota Human Support Robot covered by robot skin, the hetero-associative sequential memory system realizes a pseudocompliance controller that moves the touched link in the direction of the applied force, with a speed correlated with its amplitude, and it retrieves multi-joint grasp sequences under continued tactile input. The system sets up quickly, trains from synchronized streams of states and observations, and exhibits a degree of generalization while remaining economical. Results demonstrate single-joint and full-arm behaviors executed via associative recall, and suggest extensions to imitation learning, motion planning, and multi-modal integration.
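To illustrate the bipolar binding-and-recall mechanics described above, the toy below binds random joint-state and tactile codes element-wise, stores them with associated action labels, and retrieves an action via softmax-weighted similarity under a noisy query. Dimensions, codes, action names, and the noise level are invented; the paper's population place coding, Izhikevich spike features, and rotary embeddings are not reproduced here.

```python
# Toy bipolar binding and softmax-weighted associative recall.
import numpy as np

D = 2048
rng = np.random.default_rng(4)
rand_bipolar = lambda: rng.choice([-1, 1], size=D)

# Memory: each entry binds a (state, touch) key pair to an action label.
keys_state  = [rand_bipolar() for _ in range(3)]
keys_touch  = [rand_bipolar() for _ in range(3)]
actions     = ["lift_arm", "lower_arm", "close_gripper"]        # hypothetical action labels
memory_keys = [s * t for s, t in zip(keys_state, keys_touch)]   # element-wise binding

def recall(state_vec, touch_vec, temperature=8.0):
    query = state_vec * touch_vec
    sims = np.array([np.dot(query, k) / D for k in memory_keys])  # similarities in [-1, 1]
    w = np.exp(temperature * sims)
    w /= w.sum()                                                  # softmax-weighted recall
    return dict(zip(actions, np.round(w, 3)))

# Query with a corrupted version of the second stored pair (10% of bits flipped).
noise = rng.choice([1, -1], size=D, p=[0.9, 0.1])
print(recall(keys_state[1] * noise, keys_touch[1]))   # weight concentrates on "lower_arm"
```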
Parametric Design of a Cable-Driven Coaxial Spherical Parallel Mechanism for Ultrasound Scans
Haptic interfaces play a critical role in medical teleoperation by enabling surgeons to interact with remote environments through realistic force and motion feedback. Achieving high fidelity in such systems requires balancing performance trade-off among workspace, dexterity, stiffness, inertia, and bandwidth, particularly in applications demanding pure rotational motion. This paper presents the design methodology and kinematic analysis of a Cable-Driven Coaxial Spherical Parallel Mechanism (CDC-SPM) developed to address these challenges. The proposed cable-driven interface design allows for reducing the mass placed at the robot arm end-effector, thereby minimizing inertial loads, enhancing stiffness, and improving dynamic responsiveness. Through parallel and coaxial actuation, the mechanism achieves decoupled rotational degrees of freedom with isotropic force and torque transmission. Simulation and analysis demonstrate that the CDC-SPM provides accurate, responsive, and safe motion characteristics suitable for high-precision haptic applications. These results highlight the mechanism's potential for medical teleoperation tasks such as ultrasound imaging, where precise and intuitive manipulation is essential.
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Generalization in robot manipulation is essential for deploying robots in open-world environments and advancing toward artificial general intelligence. While recent Vision-Language-Action (VLA) models leverage large pre-trained understanding models for perception and instruction following, their ability to generalize to novel tasks, objects, and settings remains limited. In this work, we present VideoVLA, a simple approach that explores the potential of transforming large video generation models into robotic VLA manipulators. Given a language instruction and an image, VideoVLA predicts an action sequence as well as the future visual outcomes. Built on a multi-modal Diffusion Transformer, VideoVLA jointly models video, language, and action modalities, using pre-trained video generative models for joint visual and action forecasting. Our experiments show that high-quality imagined futures correlate with reliable action predictions and task success, highlighting the importance of visual imagination in manipulation. VideoVLA demonstrates strong generalization, including imitating other embodiments' skills and handling novel objects. This dual-prediction strategy - forecasting both actions and their visual consequences - explores a paradigm shift in robot learning and unlocks generalization capabilities in manipulation systems.
comment: Project page: https://videovla-nips2025.github.io
Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge NeurIPS
We present a vision-action policy that won 1st place in the 2025 BEHAVIOR Challenge - a large-scale benchmark featuring 50 diverse long-horizon household tasks in photo-realistic simulation, requiring bimanual manipulation, navigation, and context-aware decision making. Building on the Pi0.5 architecture, we introduce several innovations. Our primary contribution is correlated noise for flow matching, which improves training efficiency and enables correlation-aware inpainting for smooth action sequences. We also apply learnable mixed-layer attention and System 2 stage tracking for ambiguity resolution. Training employs multi-sample flow matching to reduce variance, while inference uses action compression and challenge-specific correction rules. Our approach achieves 26% q-score across all 50 tasks on both public and private leaderboards.
comment: 2025 NeurIPS Behavior Challenge 1st place solution
Interconnection and Damping Assignment Passivity-Based Control using Sparse Neural ODEs
Interconnection and Damping Assignment Passivity-Based Control (IDA-PBC) is a nonlinear control technique that assigns a port-Hamiltonian (pH) structure to a controlled system using a state-feedback law. While IDA-PBC has been extensively studied and applied to many systems, its practical implementation often remains confined to academic examples and, almost exclusively, to stabilization tasks. The main limitation of IDA-PBC stems from the complexity of analytically solving a set of partial differential equations (PDEs), referred to as the matching conditions, which enforce the pH structure of the closed-loop system. However, this is extremely challenging, especially for complex physical systems and tasks. In this work, we propose a novel numerical approach for designing IDA-PBC controllers without solving the matching PDEs exactly. We cast the IDA-PBC problem as the learning of a neural ordinary differential equation. In particular, we rely on sparse dictionary learning to parametrize the desired closed-loop system as a sparse linear combination of nonlinear state-dependent functions. Optimization of the controller parameters is achieved by solving a multi-objective optimization problem whose cost function is composed of a generic task-dependent cost and a matching condition-dependent cost. Our numerical results show that the proposed method enables (i) IDA-PBC to be applicable to complex tasks beyond stabilization, such as the discovery of periodic oscillatory behaviors, (ii) the derivation of closed-form expressions of the controlled system, including residual terms
Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields ICRA 2026
For centuries, khalasi have skillfully harnessed ocean currents to navigate vast waters with minimal effort. Emulating this intuition in autonomous systems remains a significant challenge, particularly for Autonomous Surface Vehicles tasked with long-duration missions under strict energy budgets. In this work, we present a learning-based approach for energy-efficient surface vehicle navigation in vortical flow fields, where partial observability often undermines traditional path-planning methods. Our end-to-end reinforcement learning framework, based on Soft Actor-Critic, learns flow-aware navigation policies using only local velocity measurements. Through extensive evaluation across diverse and dynamically rich scenarios, our method demonstrates substantial energy savings and robust generalization to previously unseen flow conditions, offering a promising path toward long-term autonomy in ocean environments. The navigation paths generated by our approach improve energy conservation by 30 to 50 percent compared to existing state-of-the-art techniques.
comment: Under Review for International Conference on Robotics and Automation (ICRA 2026)
Ground Compliance Improves Retention of Visual Feedback-Based Propulsion Training for Gait Rehabilitation
This study investigates whether adding ground compliance to visual feedback (VF) gait training is more effective at increasing push-off force (POF) compared to using VF alone, with implications for gait rehabilitation. Ten healthy participants walked on a custom split-belt treadmill. All participants received real-time visual feedback of their ground reaction forces. One group also experienced changes in ground compliance, while a control group received only visual feedback. Intentional increases in POF were successfully achieved and sustained post-intervention, especially in the group that experienced ground compliance. This group also demonstrated lasting after-effects in muscle activity and joint kinematics, indicating a more robust learning of natural strategies to increase propulsion. This work demonstrates how visual and proprioceptive systems coordinate during gait adaptation. It uniquely shows that combining ground compliance with visual feedback enhances the learning of propulsive forces, supporting the potential use of compliant terrain in long-term rehabilitation targeting propulsion deficits, such as those following a stroke.
Control of Powered Ankle-Foot Prostheses on Compliant Terrain: A Quantitative Approach to Stability Enhancement
Walking on compliant terrain presents a substantial challenge for individuals with lower-limb amputation, further elevating their already high risk of falling. While powered ankle-foot prostheses have demonstrated adaptability across speeds and rigid terrains, control strategies optimized for soft or compliant surfaces remain underexplored. This work experimentally validates an admittance-based control strategy that dynamically adjusts the quasi-stiffness of powered prostheses to enhance gait stability on compliant ground. Human subject experiments were conducted with three healthy individuals walking on two bilaterally compliant surfaces with ground stiffness values of 63 and 25 kN/m, representative of real-world soft environments. Controller performance was quantified using phase portraits and two walking stability metrics, offering a direct assessment of fall risk. Compared to a standard phase-variable controller developed for rigid terrain, the proposed admittance controller consistently improved gait stability across all compliant conditions. These results demonstrate the potential of adaptive, stability-aware prosthesis control to reduce fall risk in real-world environments and advance the robustness of human-prosthesis interaction in rehabilitation robotics.
From Zero to High-Speed Racing: An Autonomous Racing Stack
High-speed, head-to-head autonomous racing presents substantial technical and logistical challenges, including precise localization, rapid perception, dynamic planning, and real-time control-compounded by limited track access and costly hardware. This paper introduces the Autonomous Race Stack (ARS), developed by the IU Luddy Autonomous Racing team for the Indy Autonomous Challenge (IAC). We present three iterations of our ARS, each validated on different tracks and achieving speeds up to 260 km/h. Our contributions include: (i) the modular architecture and evolution of the ARS across ARS1, ARS2, and ARS3; (ii) a detailed performance evaluation that contrasts control, perception, and estimation across oval and road-course environments; and (iii) the release of a high-speed, multi-sensor dataset collected from oval and road-course tracks. Our findings highlight the unique challenges and insights from real-world high-speed full-scale autonomous racing.
AQUILA: A QUIC-Based Link Architecture for Resilient Long-Range UAV Communication
The proliferation of autonomous Unmanned Aerial Vehicles (UAVs) in Beyond Visual Line of Sight (BVLOS) applications is critically dependent on resilient, high-bandwidth, and low-latency communication links. Existing solutions face critical limitations: TCP's head-of-line blocking stalls time-sensitive data, UDP lacks reliability and congestion control, and cellular networks designed for terrestrial users degrade severely for aerial platforms. This paper introduces AQUILA, a cross-layer communication architecture built on QUIC to address these challenges. AQUILA contributes three key innovations: (1) a unified transport layer using QUIC's reliable streams for MAVLink Command and Control (C2) and unreliable datagrams for video, eliminating head-of-line blocking under unified congestion control; (2) a priority scheduling mechanism that structurally ensures C2 latency remains bounded and independent of video traffic intensity; (3) a UAV-adapted congestion control algorithm extending SCReAM with altitude-adaptive delay targeting and telemetry headroom reservation. AQUILA further implements 0-RTT connection resumption to minimize handover blackouts with application-layer replay protection, deployed over an IP-native architecture enabling global operation. Experimental validation demonstrates that AQUILA significantly outperforms TCP- and UDP-based approaches in C2 latency, video quality, and link resilience under realistic conditions, providing a robust foundation for autonomous BVLOS missions.
comment: 13 pages, 10 figures
Dynamic Visual SLAM using a General 3D Prior
Reliable incremental estimation of camera poses and 3D reconstruction is key to enable various applications including robotics, interactive visualization, and augmented reality. However, this task is particularly challenging in dynamic natural environments, where scene dynamics can severely deteriorate camera pose estimation accuracy. In this work, we propose a novel monocular visual SLAM system that can robustly estimate camera poses in dynamic scenes. To this end, we leverage the complementary strengths of geometric patch-based online bundle adjustment and recent feed-forward reconstruction models. Specifically, we propose a feed-forward reconstruction model to precisely filter out dynamic regions, while also utilizing its depth prediction to enhance the robustness of the patch-based visual SLAM. By aligning depth prediction with estimated patches from bundle adjustment, we robustly handle the inherent scale ambiguities of the batch-wise application of the feed-forward reconstruction model.
comment: 8 pages
MagicSkin: Balancing Marker and Markerless Modes in Vision-Based Tactile Sensors with a Translucent Skin ICRA2026
Vision-based tactile sensors (VBTS) face a fundamental trade-off between marker and markerless designs of the tactile skin: opaque ink markers enable measurement of force and tangential displacement but completely occlude geometric features necessary for object and texture classification, while markerless skin preserves surface details but struggles to measure tangential displacements effectively. Current workarounds, such as UV lighting or learning-based virtual marker transfer, introduce hardware complexity or computational burden. This paper introduces MagicSkin, a novel tactile skin with translucent, tinted markers that balances marker and markerless modes for VBTS. It enables simultaneous tangential displacement tracking, force prediction, and surface detail preservation, and it plugs into GelSight-family sensors without requiring additional hardware or software tools. We comprehensively evaluate MagicSkin in downstream tasks. The translucent markers enhance rather than degrade sensing performance compared with traditional markerless and inked-marker designs, achieving the best performance in object classification (99.17%), texture classification (93.51%), tangential displacement tracking (97% point retention), and force prediction (66% improvement in total force error). These experimental results demonstrate that translucent skin eliminates the traditional performance trade-off between marker and markerless modes, paving the way for the multimodal tactile sensing essential in tactile robotics. See videos at https://zhuochenn.github.io/MagicSkin_project/.
comment: Submitted to ICRA2026
db-LaCAM: Fast and Scalable Multi-Robot Kinodynamic Motion Planning with Discontinuity-Bounded Search and Lightweight MAPF
State-of-the-art multi-robot kinodynamic motion planners struggle to handle more than a few robots due to high computational burden, which limits their scalability and results in slow planning time. In this work, we combine the scalability and speed of modern multi-agent path finding (MAPF) algorithms with the dynamic-awareness of kinodynamic planners to address these limitations. To this end, we propose discontinuity-Bounded LaCAM (db-LaCAM), a planner that utilizes a precomputed set of motion primitives that respect robot dynamics to generate horizon-length motion sequences, while allowing a user-defined discontinuity between successive motions. The planner db-LaCAM is resolution-complete with respect to motion primitives and supports arbitrary robot dynamics. Extensive experiments demonstrate that db-LaCAM scales efficiently to scenarios with up to 50 robots, achieving up to ten times faster runtime compared to state-of-the-art planners, while maintaining comparable solution quality. The approach is validated in both 2D and 3D environments with dynamics such as the unicycle and 3D double integrator. We demonstrate the safe execution of trajectories planned with db-LaCAM in two distinct physical experiments involving teams of flying robots and car-with-trailer robots.
Model-Less Feedback Control of Space-based Continuum Manipulators using Backbone Tension Optimization
Continuum manipulators offer intrinsic dexterity and safe geometric compliance for navigation within confined and obstacle-rich environments. However, their infinite-dimensional backbone deformation, unmodeled internal friction, and configuration-dependent stiffness fundamentally limit the reliability of model-based kinematic formulations, resulting in inaccurate Jacobian predictions, artificial singularities, and unstable actuation behavior. Motivated by these limitations, this work presents a complete model-less control framework that bypasses kinematic modeling by using an empirically initialized Jacobian refined online through differential convex updates. Tip motion is generated via a real-time quadratic program that computes actuator increments while enforcing tendon slack avoidance and geometric limits. A backbone tension optimization term is introduced in this paper to regulate axial loading and suppress co-activation compression. The framework is validated across circular, pentagonal, and square trajectories, demonstrating smooth convergence, stable tension evolution, and sub-millimeter steady-state accuracy without any model calibration or parameter identification. These results establish the proposed controller as a scalable alternative to model-dependent continuum manipulation in a constrained environment.
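The paper describes refining an empirically initialized Jacobian online and computing actuator increments with a constrained quadratic program. The sketch below substitutes simpler, standard stand-ins: a Broyden-style rank-one Jacobian update and a damped least-squares step with plain limit clipping. The function names, gains, and the clipping in place of the QP are assumptions for illustration, not the paper's controller.

```python
import numpy as np

def broyden_update(J, dq, dx, eps=1e-9):
    """Rank-one Jacobian refinement from an observed actuation/tip increment pair."""
    dq = dq.reshape(-1, 1)
    dx = dx.reshape(-1, 1)
    return J + (dx - J @ dq) @ dq.T / (float(dq.T @ dq) + eps)

def control_step(J, tip_err, q, q_min, q_max, lam=1e-2, gain=0.5):
    """Damped least-squares actuator increment; clipping stands in for the
    paper's QP with tension and geometric-limit constraints."""
    JT = J.T
    dq = JT @ np.linalg.solve(J @ JT + lam * np.eye(J.shape[0]), gain * tip_err)
    return np.clip(q + dq, q_min, q_max) - q

J = np.eye(3)                                   # rough empirical initialization
q = np.zeros(3)
dq = control_step(J, tip_err=np.array([0.01, 0.0, -0.02]),
                  q=q, q_min=-np.ones(3), q_max=np.ones(3))
J = broyden_update(J, dq, dx=J @ dq)            # dx would come from tip sensing
print(dq.round(4))
```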
A Novel Deep Neural Network Architecture for Real-Time Water Demand Forecasting
Short-term water demand forecasting (StWDF) is the foundation stone in the derivation of an optimal plan for controlling water supply systems. Deep learning (DL) approaches provide the most accurate solutions for this purpose. However, they suffer from a complexity problem due to their massive number of parameters, in addition to high forecasting error at the extreme points. In this work, an effective method to alleviate the error at these points is proposed. It is based on extending the data by inserting virtual data within the actual data to relieve the nonlinearity around them. To our knowledge, this is the first work that considers the problem related to the extreme points. Moreover, the water demand forecasting model proposed in this work is a novel DL model with relatively low complexity. The basic model uses the gated recurrent unit (GRU) to handle the sequential relationships in the historical demand data, while an unsupervised classification method, K-means, is introduced to create new features that enhance the prediction accuracy with fewer parameters. Real data obtained from two different water plants in China are used to train and verify the proposed model. The prediction results and the comparison with the state of the art show that the proposed method reduces model complexity to one-sixth of that reported in the literature while preserving the same accuracy. Furthermore, extending the data set significantly reduces the error by about 30%, although it increases the training time.
FedDSR: Federated Deep Supervision and Regularization Towards Autonomous Driving
Federated Learning (FL) enables collaborative training of autonomous driving (AD) models across distributed vehicles while preserving data privacy. However, FL encounters critical challenges such as poor generalization and slow convergence due to non-independent and identically distributed (non-IID) data from diverse driving environments. To overcome these obstacles, we introduce Federated Deep Supervision and Regularization (FedDSR), a paradigm that incorporates multi-access intermediate-layer supervision and regularization within the federated AD system. Specifically, FedDSR comprises the following integral strategies: (I) selecting multiple intermediate layers based on predefined architecture-agnostic standards; (II) computing mutual information (MI) and negative entropy (NE) on the selected layers to serve as an intermediate loss and regularizer, which are integrated with the output-layer loss to form a unified optimization objective, enabling comprehensive optimization across the network hierarchy; and (III) aggregating the models of vehicles trained under the rules of (I) and (II) to generate the global model on the central server. By guiding and penalizing the learning of feature representations at intermediate stages, FedDSR enhances model generalization and accelerates model convergence for federated AD. We then take the semantic segmentation task as an example to assess FedDSR and apply it to multiple model architectures and FL algorithms. Extensive experiments demonstrate that FedDSR achieves up to an 8.93% improvement in mIoU and a 28.57% reduction in training rounds compared to other FL baselines, making it highly suitable for practical deployment in federated AD ecosystems.
comment: 9 pages
Statistic-Augmented, Decoupled MoE Routing and Aggregating in Autonomous Driving
Autonomous driving (AD) scenarios are inherently complex and diverse, posing significant challenges for a single deep learning model to effectively cover all possible conditions, such as varying weather, traffic densities, and road types. The Large Model (LM)-driven Mixture of Experts (MoE) paradigm offers a promising solution, where the LM serves as the backbone to extract latent features while the MoE serves as the downstream head to dynamically select and aggregate specialized experts to adapt to different scenarios. However, routing and aggregating in MoE face intrinsic challenges, including imprecise expert selection due to flawed routing strategies and inefficient expert aggregation leading to suboptimal prediction. To address these issues, we propose a statistic-augmented, decoupled MoE Routing and Aggregating Mechanism (MoE-RAM) driven by an LM. Specifically, on the one hand, MoE-RAM enhances expert routing by incorporating a statistical retrieval mechanism to match LM-extracted latent features with the cached prototypical features of the most relevant experts; on the other hand, MoE-RAM adaptively reweights experts' outputs in fusion by measuring the statistical distances of experts' instant features against the LM-extracted latent features. Benefiting from the synergy of the statistic-augmented routing and aggregating, MoE-RAM ultimately improves prediction performance. We take the AD semantic segmentation task as an example to assess the proposed MoE-RAM. Extensive experiments on AD datasets demonstrate the superiority of MoE-RAM compared to other MoE baselines and conventional single-model approaches.
comment: 9 pages
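A minimal sketch of the two ideas named above: route by matching a latent feature against cached expert prototypes, and reweight the selected experts' outputs by their feature distances. The nearest-prototype retrieval, the softmin weighting, and all shapes are illustrative assumptions, not the paper's mechanism.

```python
import numpy as np

def route_and_aggregate(z, prototypes, expert_outputs, k=2, temp=1.0):
    """Pick the k experts whose cached prototype is closest to the latent
    feature z, then weight their outputs by a softmin of the distances."""
    d = np.linalg.norm(prototypes - z, axis=1)      # distance to each cached prototype
    top = np.argsort(d)[:k]                         # statistical retrieval of experts
    w = np.exp(-d[top] / temp)
    w = w / w.sum()                                 # distance-based reweighting
    return np.tensordot(w, expert_outputs[top], axes=1), top, w

z = np.random.randn(16)                             # LM-extracted latent feature
prototypes = np.random.randn(4, 16)                 # one cached prototype per expert
expert_outputs = np.random.randn(4, 8)              # toy per-expert predictions
y, chosen, weights = route_and_aggregate(z, prototypes, expert_outputs)
print(chosen, weights.round(3), y.shape)
```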
MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
Embodied imitation learning is constrained by the scarcity of diverse, long-horizon robotic manipulation data. Existing video generation models for this domain are limited to synthesizing short clips of simple actions and often rely on manually defined trajectories. To this end, we introduce MIND-V, a hierarchical framework designed to synthesize physically plausible and logically coherent videos of long-horizon robotic manipulation. Inspired by cognitive science, MIND-V bridges high-level reasoning with pixel-level synthesis through three core components: a Semantic Reasoning Hub (SRH) that leverages a pre-trained vision-language model for task planning; a Behavioral Semantic Bridge (BSB) that translates abstract instructions into domain-invariant representations; and a Motor Video Generator (MVG) for conditional video rendering. MIND-V employs Staged Visual Future Rollouts, a test-time optimization strategy to enhance long-horizon robustness. To align the generated videos with physical laws, we introduce a GRPO reinforcement learning post-training phase guided by a novel Physical Foresight Coherence (PFC) reward. PFC leverages the V-JEPA world model to enforce physical plausibility by aligning the predicted and actual dynamic evolutions in the feature space. MIND-V demonstrates state-of-the-art performance in long-horizon robotic manipulation video generation, establishing a scalable and controllable paradigm for embodied data synthesis.
Robust Optimization-based Autonomous Dynamic Soaring with a Fixed-Wing UAV
Dynamic soaring is a flying technique that exploits the energy available in wind shear layers, enabling potentially unlimited flight without the need for internal energy sources. We propose a framework for autonomous dynamic soaring with a fixed-wing unmanned aerial vehicle (UAV). The framework makes use of an explicit representation of the wind field and a classical approach for guidance and control of the UAV. Robustness to wind field estimation error is achieved by constructing point-wise robust reference paths for dynamic soaring and by developing a robust path-following controller for the fixed-wing UAV. The framework is evaluated in dynamic soaring scenarios in simulation and in real flight tests. In simulation, we demonstrate robust dynamic soaring flight subject to varied wind conditions, estimation errors, and disturbances. Critical components of the framework, including energy predictions and path-following robustness, are further validated in real flights to ensure a small sim-to-real gap. Together, our results strongly indicate the ability of the proposed framework to achieve autonomous dynamic soaring flight in wind shear.
comment: This work has been submitted to the IEEE for possible publication
A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance
Crowd navigation has garnered considerable research interest in recent years, especially with the proliferating application of deep reinforcement learning (DRL) techniques. Many studies, however, do not sufficiently analyze the relative priorities among evaluation metrics, which compromises the fair assessment of methods with divergent objectives. Furthermore, trajectory-continuity metrics, specifically those requiring $C^2$ smoothness, are rarely incorporated. Current DRL approaches generally prioritize efficiency and proximal comfort, often neglecting trajectory optimization or addressing it only through simplistic, unvalidated smoothness rewards. Nevertheless, effective trajectory optimization is essential to ensure naturalness, enhance comfort, and maximize the energy efficiency of any navigation system. To address these gaps, this paper proposes a unified framework that enables the fair and transparent assessment of navigation methods by examining the prioritization and joint evaluation of multiple optimization objectives. We further propose a novel reward-shaping strategy that explicitly emphasizes trajectory-curvature optimization. The resulting trajectory quality and adaptability are significantly enhanced across multi-scale scenarios. Through extensive 2D and 3D experiments, we demonstrate that the proposed method achieves superior performance compared to state-of-the-art approaches.
comment: 8 pages, 6 figures
Safety-Aware Reinforcement Learning for Control via Risk-Sensitive Action-Value Iteration and Quantile Regression
Mainstream approximate action-value iteration reinforcement learning (RL) algorithms suffer from overestimation bias, leading to suboptimal policies in high-variance stochastic environments. Quantile-based action-value iteration methods reduce this bias by learning a distribution of the expected cost-to-go using quantile regression. However, ensuring that the learned policy satisfies safety constraints remains a challenge when these constraints are not explicitly integrated into the RL framework. Existing methods often require complex neural architectures or manual tradeoffs due to combined cost functions. To address this, we propose a risk-regularized quantile-based algorithm integrating Conditional Value-at-Risk (CVaR) to enforce safety without complex architectures. We also provide theoretical guarantees on the contraction properties of the risk-sensitive distributional Bellman operator in Wasserstein space, ensuring convergence to a unique cost distribution. Simulations of a mobile robot in a dynamic reach-avoid task show that our approach leads to more goal successes, fewer collisions, and better safety-performance trade-offs than risk-neutral methods.
comment: 13 pages, 4 figures. Expanded version of a paper to appear in the Proceedings of the 2025 IEEE Conference on Decision and Control (CDC)
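The abstract names two concrete ingredients: quantile regression of the cost distribution and CVaR for risk sensitivity. The sketch below shows both in isolation, a pinball loss over a fixed set of quantile fractions and a CVaR estimate taken over the worst tail of the learned cost quantiles. The function names, the quantile grid, and the toy numbers are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def pinball_loss(quantile_preds, target, taus):
    """Quantile-regression (pinball) loss for a distributional cost critic."""
    u = target - quantile_preds                      # per-quantile residuals
    return float(np.mean(np.where(u >= 0, taus * u, (taus - 1.0) * u)))

def cvar_from_quantiles(quantile_preds, taus, alpha=0.1):
    """CVaR of the cost-to-go: average of the quantiles in the worst
    alpha-fraction (upper tail) of the cost distribution."""
    return float(quantile_preds[taus >= 1.0 - alpha].mean())

taus = (np.arange(10) + 0.5) / 10.0                  # midpoints of 10 quantile bins
q = np.sort(np.random.randn(10) * 2 + 5)             # toy learned cost quantiles
print(pinball_loss(q, target=6.0, taus=taus), cvar_from_quantiles(q, taus))
```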
MeshA*: Efficient Path Planning With Motion Primitives AAAI-2026
We study a path planning problem where the possible move actions are represented as a finite set of motion primitives aligned with the grid representation of the environment. That is, each primitive corresponds to a short kinodynamically feasible motion of an agent and is represented as a sequence of the swept cells of a grid. Typically, heuristic search, i.e., A*, is conducted over the lattice induced by these primitives (lattice-based planning) to find a path. However, due to the large branching factor, such search may be inefficient in practice. To this end, we suggest a novel technique rooted in the idea of searching over the grid cells (as in vanilla A*) while simultaneously fitting the possible sequences of motion primitives into these cells. The resulting algorithm, MeshA*, provably preserves the guarantees on completeness and optimality, on the one hand, and is shown to notably outperform conventional lattice-based planning (a 1.5x-2x decrease in runtime), on the other hand.
comment: Accepted to AAAI-2026
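To make the primitive-as-swept-cells setting concrete, here is a toy lattice-style A* in which each primitive is a short sequence of relative grid cells with a fixed cost, and a move is valid only if every swept cell is free. This is the conventional lattice-based baseline discussed above, not the MeshA* algorithm; the primitive set, costs, and Chebyshev heuristic are illustrative assumptions.

```python
import heapq

# each primitive: (swept relative cells in order, cost); the last cell is the endpoint
PRIMITIVES = [
    ([(0, 1)], 1.0),                       # move right
    ([(1, 0)], 1.0),                       # move down
    ([(0, 1), (1, 1)], 1.5),               # right then diagonal sweep
    ([(1, 0), (1, 1)], 1.5),               # down then diagonal sweep
]

def plan(grid, start, goal):
    h = lambda c: max(abs(c[0] - goal[0]), abs(c[1] - goal[1]))   # admissible here
    frontier, g = [(h(start), 0.0, start)], {start: 0.0}
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        for sweep, c in PRIMITIVES:
            cells = [(cell[0] + dr, cell[1] + dc) for dr, dc in sweep]
            if any(not (0 <= r < len(grid) and 0 <= cc < len(grid[0]))
                   or grid[r][cc] for r, cc in cells):
                continue                   # a swept cell is blocked or out of bounds
            nxt, new = cells[-1], cost + c
            if new < g.get(nxt, float("inf")):
                g[nxt] = new
                heapq.heappush(frontier, (new + h(nxt), new, nxt))
    return None

grid = [[0] * 5 for _ in range(5)]
grid[2][1] = grid[2][2] = 1                # small obstacle wall
print(plan(grid, (0, 0), (4, 4)))          # prints the optimal primitive cost
```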
Unveiling the Impact of Data and Model Scaling on High-Level Control for Humanoid Robots
Data scaling has long remained a critical bottleneck in robot learning. For humanoid robots, human videos and motion data are abundant and widely available, offering a free and large-scale data source. Moreover, the semantics associated with these motions enable modality alignment and high-level robot control learning. However, how to effectively mine raw video, extract robot-learnable representations, and leverage them for scalable learning remains an open problem. To address this, we introduce Humanoid-Union, a large-scale dataset generated through an autonomous pipeline, comprising over 260 hours of diverse, high-quality humanoid robot motion data with semantic annotations derived from human motion videos. The dataset can be further expanded via the same pipeline. Building on this data resource, we propose SCHUR, a scalable learning framework designed to explore the impact of large-scale data on high-level control in humanoid robots. Experimental results demonstrate that SCHUR achieves high robot motion generation quality and strong text-motion alignment under data and model scaling, with a 37% reconstruction improvement in MPJPE and a 25% alignment improvement in FID compared with previous methods. Its effectiveness is further validated through deployment on a real-world humanoid robot.
Concerns and Values in Human-Robot Interactions: A Focus on Social Robotics
Robots, as AI with physical instantiation, inhabit our social and physical world, where their actions have both social and physical consequences, posing challenges for researchers when designing social robots. This study starts with a scoping review to identify discussions and potential concerns arising from interactions with robotic systems in the context of healthcare, education, and private homes. Two focus groups of technology ethics experts then validated a comprehensive list of key topics and values in human-robot interaction (HRI) literature in these contexts. These insights were integrated into the HRI Value Compass web tool, to help HRI researchers identify these values in robot design. The tool was evaluated in a pilot study. This work benefits the HRI community by highlighting key concerns in human-robot interactions and providing an instrument to help researchers design robots that align with human values, ensuring future robotic systems adhere to these values in social applications.
comment: 31 pages, 7 figures, 6 tables; 4 appendices
Dribble Master: Learning Agile Humanoid Dribbling Through Legged Locomotion
Humanoid soccer dribbling is a highly challenging task that demands dexterous ball manipulation while maintaining dynamic balance. Traditional rule-based methods often struggle to achieve accurate ball control due to their reliance on fixed walking patterns and limited adaptability to real-time ball dynamics. To address these challenges, we propose a two-stage curriculum learning framework that enables a humanoid robot to acquire dribbling skills without explicit dynamics or predefined trajectories. In the first stage, the robot learns basic locomotion skills; in the second stage, we fine-tune the policy for agile dribbling maneuvers. We further introduce a virtual camera model in simulation that simulates the field of view and perception constraints of the real robot, enabling realistic ball perception during training. We also design heuristic rewards to encourage active sensing, promoting a broader visual range for continuous ball perception. The policy is trained in simulation and successfully transferred to a physical humanoid robot. Experiment results demonstrate that our method enables effective ball manipulation, achieving flexible and visually appealing dribbling behaviors across multiple environments. This work highlights the potential of reinforcement learning in developing agile humanoid soccer robots. Additional details and videos are available at https://zhuoheng0910.github.io/dribble-master/.
Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark AAAI 2026
While cooperative perception can overcome the limitations of single-vehicle systems, the practical implementation of vehicle-to-vehicle and vehicle-to-infrastructure systems is often impeded by significant economic barriers. Aerial-ground cooperation (AGC), which pairs ground vehicles with drones, presents a more economically viable and rapidly deployable alternative. However, this emerging field has been held back by a critical lack of high-quality public datasets and benchmarks. To bridge this gap, we present Griffin, a comprehensive AGC 3D perception dataset featuring over 250 dynamic scenes (37k+ frames). It incorporates varied drone altitudes (20-60m), diverse weather conditions, realistic drone dynamics via CARLA-AirSim co-simulation, and critical occlusion-aware 3D annotations. Accompanying the dataset is a unified benchmarking framework for cooperative detection and tracking, with protocols to evaluate communication efficiency, altitude adaptability, and robustness to communication latency, data loss, and localization noise. Through experiments with different cooperative paradigms, we demonstrate the effectiveness and limitations of current methods and provide crucial insights for future research. The dataset and codes are available at https://github.com/wang-jh18-SVM/Griffin.
comment: Accepted by AAAI 2026
Kinodynamic Motion Planning for Collaborative Object Transportation by Multiple Mobile Manipulators
This work proposes a kinodynamic motion planning technique for collaborative object transportation by multiple mobile manipulators in dynamic environments. A global path planner computes a piecewise-linear path from start to goal. A novel algorithm detects the narrow regions between static obstacles and aids in defining the obstacle-free region to enhance the feasibility of the global path. We then formulate a local online motion planning technique for trajectory generation that minimizes the control efforts in a receding-horizon manner. It plans the trajectory over finite time horizons, considering the kinodynamic constraints and the static and dynamic obstacles. The planning technique jointly plans for the mobile bases and the arms to efficiently utilize the locomotion capability of the mobile base and the manipulation capability of the arm. We use a convex cone approach to avoid self-collision of the formation by modifying the mobile manipulators' admissible states without imposing additional constraints. Numerical simulations and hardware experiments showcase the efficiency of the proposed approach.
comment: Video: https://youtu.be/LhE_HcK4g-s
Dejavu: Towards Experience Feedback Learning for Embodied Intelligence
Embodied agents face a fundamental limitation: once deployed in real-world environments to perform specific tasks, they are unable to acquire additional knowledge to enhance task performance. In this paper, we propose a general post-deployment learning framework, Dejavu, which employs an Experience Feedback Network (EFN) and augments the frozen Vision-Language-Action (VLA) policy with retrieved execution memories. EFN identifies contextually relevant prior action experiences and conditions action prediction on this retrieved guidance. We adopt reinforcement learning with semantic similarity rewards to train EFN, ensuring that the predicted actions align with past behaviors under current observations. During deployment, EFN continually enriches its memory with new trajectories, enabling the agent to exhibit "learning from experience". Experiments across diverse embodied tasks show that EFN improves adaptability, robustness, and success rates over frozen baselines. We provide code and a demo in our supplementary material.
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/.
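The core corruption idea, keep the Fourier phase of the input while randomizing the magnitude, can be shown in a few lines of NumPy. The sketch below is only that idea on a toy image; the function name is hypothetical and nothing here reproduces FSS noise or the paper's training pipeline.

```python
import numpy as np

def phase_preserving_corrupt(x, rng):
    """Keep the Fourier phase of x and replace its magnitude spectrum with the
    magnitude spectrum of Gaussian noise, so spatial structure is retained."""
    X = np.fft.fft2(x)
    N = np.fft.fft2(rng.standard_normal(x.shape))
    corrupted = np.abs(N) * np.exp(1j * np.angle(X))   # noise magnitude, input phase
    return np.real(np.fft.ifft2(corrupted))

rng = np.random.default_rng(0)
img = rng.random((64, 64))                             # toy grayscale image
out = phase_preserving_corrupt(img, rng)
print(out.shape, float(out.mean()))
```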
LabUtopia: High-Fidelity Simulation and Hierarchical Benchmark for Scientific Embodied Agents NeurIPS 2025
Scientific embodied agents play a crucial role in modern laboratories by automating complex experimental workflows. Compared to typical household environments, laboratory settings impose significantly higher demands on perception of physical-chemical transformations and long-horizon planning, making them an ideal testbed for advancing embodied intelligence. However, this development has long been hampered by the lack of suitable simulators and benchmarks. In this paper, we address this gap by introducing LabUtopia, a comprehensive simulation and benchmarking suite designed to facilitate the development of generalizable, reasoning-capable embodied agents in laboratory settings. Specifically, it integrates i) LabSim, a high-fidelity simulator supporting multi-physics and chemically meaningful interactions; ii) LabScene, a scalable procedural generator for diverse scientific scenes; and iii) LabBench, a hierarchical benchmark spanning five levels of complexity from atomic actions to long-horizon mobile manipulation. LabUtopia supports 30 distinct tasks and includes more than 200 scene and instrument assets, enabling large-scale training and principled evaluation in high-complexity environments. We demonstrate that LabUtopia offers a powerful platform for advancing the integration of perception, planning, and control in scientific-purpose agents and provides a rigorous testbed for exploring the practical capabilities and generalization limits of embodied intelligence in future research.
comment: Accepted by NeurIPS 2025 Dataset and Benchmark Track
Safe MPC Alignment with Human Directional Feedback
In safety-critical robot planning or control, manually specifying safety constraints or learning them from demonstrations can be challenging. In this article, we propose a certifiable alignment method for a robot to learn a safety constraint in its model predictive control (MPC) policy from online human directional feedback. To our knowledge, it is the first method to learn safety constraints from human feedback. The proposed method is based on an empirical observation: human directional feedback, when available, tends to guide the robot toward safer regions. The method only requires the direction of human feedback to update the learning hypothesis space. It is certifiable, providing an upper bound on the total number of human feedback instances in the case of successful learning, or declaring hypothesis misspecification, i.e., that the true safety constraint cannot be found within the specified hypothesis space. We evaluated the proposed method in numerical examples and in user studies with two simulation games. Additionally, we tested the proposed method on a real-world Franka robot arm performing mobile water-pouring tasks. The results demonstrate the efficacy and efficiency of our method, showing that it enables a robot to successfully learn safety constraints with a small handful (tens) of human directional corrections.
comment: 20 pages, pre-print, submitted to TRO
Energy-Aware Lane Planning for Connected Electric Vehicles in Urban Traffic: Design and Vehicle-in-the-Loop Validation
Urban driving with connected and automated vehicles (CAVs) offers potential for energy savings, yet most eco-driving strategies focus solely on longitudinal speed control within a single lane. This neglects the significant impact of lateral decisions, such as lane changes, on overall energy efficiency, especially in environments with traffic signals and heterogeneous traffic flow. To address this gap, we propose a novel energy-aware motion planning framework that jointly optimizes longitudinal speed and lateral lane-change decisions using vehicle-to-infrastructure (V2I) communication. Our approach estimates long-term energy costs using a graph-based approximation and solves short-horizon optimal control problems under traffic constraints. Using a data-driven energy model calibrated to an actual battery electric vehicle, we demonstrate with vehicle-in-the-loop experiments that our method reduces motion energy consumption by up to 24 percent compared to a human driver, highlighting the potential of connectivity-enabled planning for sustainable urban autonomy.
comment: Accepted at 2025 IEEE Conference on Decision and Control (CDC25')
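The abstract mentions estimating long-term energy costs with a graph-based approximation. One simple way to picture that is a shortest-path search over (segment, lane) nodes where advancing a segment costs a per-lane energy and switching lanes adds a penalty. The graph layout, costs, and fixed lane-change penalty below are illustrative assumptions, not the paper's formulation.

```python
import heapq

def min_energy_plan(num_segments, num_lanes, seg_energy, change_energy):
    """Dijkstra over a (segment, lane) graph: advancing one segment costs the
    per-lane energy, and switching lanes adds a lane-change penalty."""
    start = (0, 0)                                    # segment 0, lane 0
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (s, l) = heapq.heappop(pq)
        if d > dist.get((s, l), float("inf")):
            continue
        if s == num_segments:                         # reached the end of the route
            return d
        for nl in range(num_lanes):
            nd = d + seg_energy[s][nl] + (change_energy if nl != l else 0.0)
            if nd < dist.get((s + 1, nl), float("inf")):
                dist[(s + 1, nl)] = nd
                heapq.heappush(pq, (nd, (s + 1, nl)))
    return None

seg_energy = [[1.0, 0.6], [1.0, 0.9], [0.4, 1.2]]     # toy per-segment, per-lane costs
print(min_energy_plan(3, 2, seg_energy, change_energy=0.3))   # minimum total energy
```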
Inversely Learning Transferable Rewards via Abstracted States
Inverse reinforcement learning (IRL) has progressed significantly toward accurately learning the underlying rewards in both discrete and continuous domains from behavior data. The next advance is to learn {\em intrinsic} preferences in ways that produce useful behavior in settings or tasks which are different but aligned with the observed ones. In the context of robotic applications, this helps integrate robots into processing lines involving new tasks (with shared intrinsic preferences) without programming from scratch. We introduce a method to inversely learn an abstract reward function from behavior trajectories in two or more differing instances of a domain. The abstract reward function is then used to learn task behavior in another separate instance of the domain. This step offers evidence of its transferability and validates its correctness. We evaluate the method on trajectories in tasks from multiple domains in OpenAI's Gym testbed and AssistiveGym and show that the learned abstract reward functions can successfully learn task behaviors in instances of the respective domains, which have not been seen previously.
Multiagent Systems
Analyzing Collision Rates in Large-Scale Mixed Traffic Control via Multi-Agent Reinforcement Learning
Vehicle collisions remain a major challenge in large-scale mixed traffic systems, especially when human-driven vehicles (HVs) and robotic vehicles (RVs) interact under dynamic and uncertain conditions. Although Multi-Agent Reinforcement Learning (MARL) offers promising capabilities for traffic signal control, ensuring safety in such environments remains difficult. As a direct indicator of traffic risk, the collision rate must be well understood and incorporated into traffic control design. This study investigates the primary factors influencing collision rates in a MARL-governed Mixed Traffic Control (MTC) network. We examine three dimensions: total vehicle count, signalized versus unsignalized intersection configurations, and turning-movement strategies. Through controlled simulation experiments, we evaluate how each factor affects collision likelihood. The results show that collision rates are sensitive to traffic density, the level of signal coordination, and turning-control design. These findings provide practical insights for improving the safety and robustness of MARL-based mixed traffic control systems, supporting the development of intelligent transportation systems in which both efficiency and safety are jointly optimized.
LoopBench: Discovering Emergent Symmetry Breaking Strategies with LLM Swarms
Large Language Models (LLMs) are increasingly being utilized as autonomous agents, yet their ability to coordinate in distributed systems remains poorly understood. We introduce LoopBench, a benchmark to evaluate LLM reasoning in distributed symmetry breaking and meta-cognitive thinking. The benchmark focuses on coloring odd cycle graphs ($C_3, C_5, C_{11}$) with limited colors, where deterministic, non-communicating agents fail in infinite loops. A strategy-passing mechanism is implemented as a form of consistent memory. We show that while standard LLMs and classical heuristics struggle, advanced reasoning models (e.g., O3) devise strategies to escape deadlocks. LoopBench allows the study of emergent distributed algorithms based on language-based reasoning, offering a testbed for collective intelligence.
comment: 11 pages, 3 figures, submitted to ANTS 2026
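To see why odd-cycle coloring demands symmetry breaking, consider the classical baseline below: identical agents on C_n that synchronously resample a color not used by their neighbors. With deterministic identical rules they can cycle forever; adding independent randomness lets them escape. This is a standard illustration under assumed parameters, not the benchmark's protocol or its strategy-passing mechanism.

```python
import random

def distributed_coloring(n=5, colors=3, max_rounds=100, seed=1):
    """Synchronous randomized symmetry breaking on the odd cycle C_n: every
    conflicted node resamples a color avoiding its neighbours' previous colors."""
    rng = random.Random(seed)
    col = [0] * n                                   # worst case: all agents identical
    for r in range(max_rounds):
        nbrs = [{col[(i - 1) % n], col[(i + 1) % n]} for i in range(n)]
        conflicted = [i for i in range(n) if col[i] in nbrs[i]]
        if not conflicted:
            return r, col                           # rounds used, proper coloring
        for i in conflicted:                        # updates use last round's colors
            col[i] = rng.choice([c for c in range(colors) if c not in nbrs[i]])
    return max_rounds, col                          # did not converge in time

print(distributed_coloring())
```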
Lark: Biologically Inspired Neuroevolution for Multi-Stakeholder LLM Agents NeurIPS 2025
We present Lark, a biologically inspired decision-making framework that couples LLM-driven reasoning with an evolutionary, stakeholder-aware Multi-Agent System (MAS). To address verbosity and stakeholder trade-offs, we integrate four mechanisms: (i) plasticity, which applies concise adjustments to candidate solutions; (ii) duplication and maturation, which copy high-performing candidates and specialize them into new modules; (iii) ranked-choice stakeholder aggregation using influence-weighted Borda scoring; and (iv) compute awareness via token-based penalties that reward brevity. The system iteratively proposes diverse strategies, applies plasticity tweaks, simulates stakeholder evaluations, aggregates preferences, selects top candidates, and performs duplication/maturation while factoring compute cost into final scores. In a controlled evaluation over 30 rounds comparing 14 systems, Lark Full achieves a mean rank of 2.55 (95% CI [2.17, 2.93]) and a mean composite score of 29.4/50 (95% CI [26.34, 32.46]), finishing Top-3 in 80% of rounds while remaining cost competitive with leading commercial models ($0.016 per task). Paired Wilcoxon tests confirm that all four mechanisms contribute significantly as ablating duplication/maturation yields the largest deficit (ΔScore = 3.5, Cohen's d_z = 2.53, p < 0.001), followed by plasticity (ΔScore = 3.4, d_z = 1.86), ranked-choice voting (ΔScore = 2.4, d_z = 1.20), and token penalties (ΔScore = 2.2, d_z = 1.63). Rather than a formal Markov Decision Process with constrained optimization, Lark is a practical, compute-aware neuroevolutionary loop that scales stakeholder-aligned strategy generation and makes trade-offs transparent through per-step metrics. Our work presents proof-of-concept findings and invites community feedback as we expand toward real-world validation studies.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Workshop: NeurIPS 2025 Workshop on Efficient Reasoning
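Influence-weighted Borda scoring, named in mechanism (iii) above, is simple enough to write out directly: each stakeholder's ranking awards positional points that are scaled by that stakeholder's influence before summing. The stakeholder names, influence values, and candidates below are made up for illustration.

```python
def weighted_borda(rankings, influence):
    """Aggregate stakeholder rankings with influence-weighted Borda scores.
    rankings: {stakeholder: [candidate ids, best first]}."""
    scores = {}
    for who, order in rankings.items():
        n = len(order)
        for pos, cand in enumerate(order):
            # Borda points (n-1 for first place, 0 for last), scaled by influence
            scores[cand] = scores.get(cand, 0.0) + influence[who] * (n - 1 - pos)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

rankings = {"ops": ["A", "B", "C"], "finance": ["B", "C", "A"], "users": ["A", "C", "B"]}
influence = {"ops": 0.5, "finance": 0.3, "users": 0.2}
print(weighted_borda(rankings, influence))   # candidates ordered by weighted score
```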
Optimizing Day-Ahead Energy Trading with Proximal Policy Optimization and Blockchain
The increasing penetration of renewable energy sources in day-ahead energy markets introduces challenges in balancing supply and demand, ensuring grid resilience, and maintaining trust in decentralized trading systems. This paper proposes a novel framework that integrates the Proximal Policy Optimization (PPO) algorithm, a state-of-the-art reinforcement learning method, with blockchain technology to optimize automated trading strategies for prosumers in day-ahead energy markets. We introduce a comprehensive framework that employs an RL agent for multi-objective energy optimization and blockchain for tamper-proof data and transaction management. Simulations using real-world data from the Electric Reliability Council of Texas (ERCOT) demonstrate the effectiveness of our approach. The RL agent achieves demand-supply balancing within 2% and maintains near-optimal supply costs for the majority of operating hours. Moreover, it generates robust battery storage policies capable of handling variability in solar and wind generation. All decisions are recorded on an Algorand-based blockchain, ensuring transparency, auditability, and security, which are key enablers for trustworthy multi-agent energy trading. Our contributions include a novel system architecture, curriculum learning for robust agent development, and actionable policy insights for practical deployment.
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Reinforcement learning (RL) is the dominant paradigm for sharpening the strategic tool-use capabilities of LLMs on long-horizon, sparsely rewarded agent tasks, yet it faces the fundamental challenge of the exploration-exploitation trade-off. Existing studies stimulate exploration through the lens of policy entropy, but such mechanical entropy maximization is prone to RL instability due to multi-turn distribution shifting. In this paper, we target a progressive exploration-exploitation balance under the guidance of the agent's own experiences, without succumbing to either entropy collapse or runaway divergence. We propose SPEAR, a self-imitation learning (SIL) recipe for training agentic LLMs. It extends vanilla SIL, in which a replay buffer stores good experience for off-policy updates, by gradually steering the policy entropy across stages. Specifically, the proposed curriculum scheduling harmonizes intrinsic reward shaping and self-imitation to 1) expedite exploration via frequent tool interactions at the beginning, and 2) strengthen exploitation of successful tactics upon convergence towards familiarity with the environment. We also combine a bag of tricks of industrial RL optimizations into a strong baseline, Dr.BoT, to demonstrate our effectiveness. In ALFWorld and WebShop, SPEAR increases the success rates of GRPO/GiGPO/Dr.BoT by up to 16.1%/5.1%/8.6% and 20.7%/11.8%/13.9%, respectively. In AIME24 and AIME25, SPEAR boosts Dr.BoT by up to 3.8% and 6.1%, respectively. Such gains incur only 10%-25% extra theoretical complexity and negligible runtime overhead in practice, demonstrating the plug-and-play scalability of SPEAR.
comment: 45 pages, 14 figures
Kinodynamic Motion Planning for Collaborative Object Transportation by Multiple Mobile Manipulators
This work proposes a kinodynamic motion planning technique for collaborative object transportation by multiple mobile manipulators in dynamic environments. A global path planner computes a piecewise-linear path from start to goal. A novel algorithm detects the narrow regions between static obstacles and aids in defining the obstacle-free region to enhance the feasibility of the global path. We then formulate a local online motion planning technique for trajectory generation that minimizes the control efforts in a receding-horizon manner. It plans the trajectory over finite time horizons, considering the kinodynamic constraints and the static and dynamic obstacles. The planning technique jointly plans for the mobile bases and the arms to efficiently utilize the locomotion capability of the mobile base and the manipulation capability of the arm. We use a convex cone approach to avoid self-collision of the formation by modifying the mobile manipulators' admissible states without imposing additional constraints. Numerical simulations and hardware experiments showcase the efficiency of the proposed approach.
comment: Video: https://youtu.be/LhE_HcK4g-s
Systems and Control (EESS)
Green O-RAN Operation: a Modern ML-Driven Network Energy Consumption Optimisation
The increasing energy demand of next-generation mobile networks, especially 6G, is becoming a major concern, particularly due to the high power usage of base station components such as the radio unit (RU), which often remains active even during low-traffic periods. To tackle this challenge, our study focuses on improving energy efficiency in O-RAN systems using intelligent control strategies. Twin Delayed Deep Deterministic Policy Gradient (TD3) leverages a continuous action space to overcome the limitations of traditional discrete-action methods such as DQN. By avoiding exponential growth in the action space, TD3 enables more precise control of RU sleep modes in dense and large radio environments. Simulation results show that our approach consistently achieves over 50% energy savings compared to the always-on baseline, with TD3 outperforming DQN-based methods by up to 6%, while also offering better stability and faster convergence.
comment: 6 pages. Accepted by the IEEE Globecom 2025
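The action-space argument above is easy to quantify: a discrete joint on/off decision over N radio units has 2^N actions, while a continuous actor can emit one value per RU and threshold it into a sleep command. The thresholding scheme and sizes below are illustrative assumptions, not the paper's policy.

```python
import numpy as np

def sleep_decisions(actor_output, threshold=0.0):
    """Map a continuous actor output in [-1, 1]^N to per-RU sleep commands."""
    return np.asarray(actor_output) < threshold      # True -> put that RU to sleep

N = 20
print("discrete joint on/off actions:", 2 ** N)      # 1,048,576 for only 20 RUs
rng = np.random.default_rng(1)
print(sleep_decisions(rng.uniform(-1, 1, N)).astype(int))   # one bit per RU
```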
Synergies between AI Computing and Power Systems: Metrics, Scheduling, and Resilience
In this paper, we first clarify the concepts of green AI versus frugal AI, positioning frugality as efficiency by design and green AI as transparency and accountability. We then argue that these approaches, while complementary, are insufficient without a shared quantitative foundation that links AI computing to power system contexts. This motivates the development of standardized carbon metrics as a bridge between algorithmic decisions and their physical consequences. We next embed these signals into scheduling and planning frameworks, presenting two architectures: (i) an iterative signal-response loop for real-time operations, and (ii) an integrated optimization that learns and encodes flexible-load behavior for long-term planning. Finally, we show how the same coordination stack supports resilience, enabling signals to shift from emissions-first to stability-first during stress events, providing targeted relief and faster restoration.
LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding NeurIPS 2025
Designing state encoders for reinforcement learning (RL) with multiple information sources -- such as sensor measurements, time-series signals, image observations, and textual instructions -- remains underexplored and often requires manual design. We formalize this challenge as a problem of composite neural architecture search (NAS), where multiple source-specific modules and a fusion module are jointly optimized. Existing NAS methods overlook useful side information from the intermediate outputs of these modules -- such as their representation quality -- limiting sample efficiency in multi-source RL settings. To address this, we propose an LLM-driven NAS pipeline that leverages language-model priors and intermediate-output signals to guide sample-efficient search for high-performing composite state encoders. On a mixed-autonomy traffic control task, our approach discovers higher-performing architectures with fewer candidate evaluations than traditional NAS baselines and the LLM-based GENIUS framework.
comment: NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning
Joint Learning of Feasibility-Aware Signal Temporal Logic and BarrierNet for Robust and Correct Control
Control Barrier Functions (CBFs) have emerged as a powerful tool for enforcing safety in optimization-based controllers, and their integration with Signal Temporal Logic (STL) has enabled the specification-driven synthesis of complex robotic behaviors. However, existing CBF-STL approaches typically rely on fixed hyperparameters and myopic, per-time step optimization, which can lead to overly conservative behavior, infeasibility near tight input limits, and difficulty satisfying long-horizon STL tasks. To address these limitations, we propose a feasibility-aware learning framework that embeds trainable, time-varying High Order Control Barrier Functions (HOCBFs) into a differentiable Quadratic Program (dQP). Our approach provides a systematic procedure for constructing time-varying HOCBF constraints for a broad fragment of STL and introduces a unified robustness measure that jointly captures STL satisfaction, QP feasibility, and control-bound compliance. Three neural networks (InitNet, RefNet, and an extended BarrierNet) collaborate to generate reference inputs and adapt constraint-related hyperparameters automatically over time and across initial conditions, reducing conservativeness while maximizing robustness. The resulting controller achieves STL satisfaction with strictly feasible dQPs and requires no manual tuning. Simulation results demonstrate that the proposed framework maintains high STL robustness under tight input bounds and significantly outperforms fixed-parameter and non-adaptive baselines in complex environments.
comment: 16 pages, 11 figures
Control of Powered Ankle-Foot Prostheses on Compliant Terrain: A Quantitative Approach to Stability Enhancement
Walking on compliant terrain presents a substantial challenge for individuals with lower-limb amputation, further elevating their already high risk of falling. While powered ankle-foot prostheses have demonstrated adaptability across speeds and rigid terrains, control strategies optimized for soft or compliant surfaces remain underexplored. This work experimentally validates an admittance-based control strategy that dynamically adjusts the quasi-stiffness of powered prostheses to enhance gait stability on compliant ground. Human subject experiments were conducted with three healthy individuals walking on two bilaterally compliant surfaces with ground stiffness values of 63 and 25 kN/m, representative of real-world soft environments. Controller performance was quantified using phase portraits and two walking stability metrics, offering a direct assessment of fall risk. Compared to a standard phase-variable controller developed for rigid terrain, the proposed admittance controller consistently improved gait stability across all compliant conditions. These results demonstrate the potential of adaptive, stability-aware prosthesis control to reduce fall risk in real-world environments and advance the robustness of human-prosthesis interaction in rehabilitation robotics.
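Admittance control of the kind described above can be sketched as a second-order virtual dynamics M*ddq + B*dq + K*(q - q_eq) = tau whose stiffness K sets the rendered quasi-stiffness. The discretization, parameter values, and constant load torque below are illustrative assumptions, not the paper's controller or gains.

```python
def admittance_step(theta, dtheta, torque, theta_eq, M, B, K, dt):
    """One semi-implicit Euler step of the joint-space admittance law
    M*ddq + B*dq + K*(q - q_eq) = tau."""
    ddtheta = (torque - B * dtheta - K * (theta - theta_eq)) / M
    dtheta = dtheta + ddtheta * dt
    theta = theta + dtheta * dt
    return theta, dtheta

theta, dtheta = 0.0, 0.0
for _ in range(500):                         # 0.5 s at 1 kHz under a constant 5 Nm load
    theta, dtheta = admittance_step(theta, dtheta, torque=5.0, theta_eq=0.0,
                                    M=0.05, B=4.0, K=80.0, dt=1e-3)
print(round(theta, 4))                       # settles toward tau/K = 0.0625 rad
```

Lowering K renders a softer virtual ankle, which is the knob the quasi-stiffness adjustment acts on.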
Bridging Abstraction-Based Hierarchical Control and Moment Matching: A Conceptual Unification
In this paper, we establish a relation between approximate-simulation-based hierarchical control (ASHC) and moment matching techniques, and build a conceptual bridge between these two frameworks. To this end, we study the two key requirements of the ASHC technique, namely the bounded output discrepancy and the $M$-relation, through the lens of moment matching. We show that, in the linear time-invariant case, both requirements can be interpreted in the moment matching perspective through certain system interconnection structures. Building this conceptual bridge provides a foundation for cross-pollination of ideas between these two frameworks.
comment: 8 pages, accepted by 64th IEEE Conference on Decision and Control (CDC 2025)
A Physics-Aware Attention LSTM Autoencoder for Early Fault Diagnosis of Battery Systems
Battery safety is paramount for electric vehicles. Early fault diagnosis remains a challenge due to the subtle nature of anomalies and the interference of dynamic operating noise. Existing data-driven methods often suffer from "physical blindness," leading to missed detections or false alarms. To address this, we propose a Physics-Aware Attention LSTM Autoencoder (PA-ALSTM-AE). This novel framework explicitly integrates battery aging laws (mileage) into the deep learning pipeline through a multi-stage fusion mechanism. Specifically, an adaptive physical feature construction module selects mileage-sensitive features, and a physics-guided latent fusion module dynamically calibrates the memory cells of the LSTM based on the aging state. Extensive experiments on the large-scale Vloong real-world dataset demonstrate that the proposed method significantly outperforms state-of-the-art baselines. Notably, it improves the recall rate of early faults by more than three times while maintaining high precision, offering a robust solution for industrial battery management systems.
comment: 5 pages, 7 figures
Distributed Traffic State Estimation in V2X-Enabled Connected Vehicle Networks
This paper presents a distributed traffic state estimation framework in which infrastructure sensors and connected vehicles act as autonomous, cooperative sensing nodes. These nodes share local traffic estimates with nearby nodes using Vehicle-to-Everything (V2X) communication. The proposed estimation algorithm uses a distributed Kalman filter tailored to a second-order macroscopic traffic flow model. To achieve global state awareness, the algorithm employs a consensus protocol to fuse heterogeneous spatiotemporal estimates from V2X neighbors and applies explicit projection steps to maintain physical consistency in density and flow estimates. The algorithm's performance is validated through microscopic simulations of a highway segment experiencing transient congestion. Results demonstrate that the proposed distributed estimator accurately reconstructs nonlinear shockwave dynamics, even with sparse infrastructure sensors and intermittent vehicular network connectivity. Statistical analysis explores how different connected vehicle penetration rates affect estimation accuracy, revealing notable phase transitions in network observability.
comment: 8 pages, 4 figures, submitted to IFAC World Congress 2026
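Two of the ingredients named above, the consensus fusion of neighbors' estimates and the projection to physically consistent values, can be shown in isolation. The sketch below is only those two steps on a toy density/flow state; the weights, bounds, and units are assumptions and the full distributed Kalman filter is omitted.

```python
import numpy as np

def consensus_fuse(own, neighbor_estimates, weight=0.5):
    """One consensus iteration: pull the local traffic-state estimate toward
    the average of the estimates received from V2X neighbors."""
    if not neighbor_estimates:
        return own
    return (1 - weight) * own + weight * np.mean(neighbor_estimates, axis=0)

def project_physical(state, rho_max=0.2, q_max=0.6):
    """Clip density (veh/m) and flow (veh/s) to physically consistent ranges."""
    return np.array([np.clip(state[0], 0.0, rho_max), np.clip(state[1], 0.0, q_max)])

own = np.array([0.05, 0.30])                               # local [density, flow]
nbrs = [np.array([0.07, 0.28]), np.array([0.25, -0.10])]   # second neighbor is noisy
print(project_physical(consensus_fuse(own, nbrs)))
```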
Model-Less Feedback Control of Space-based Continuum Manipulators using Backbone Tension Optimization
Continuum manipulators offer intrinsic dexterity and safe geometric compliance for navigation within confined and obstacle-rich environments. However, their infinite-dimensional backbone deformation, unmodeled internal friction, and configuration-dependent stiffness fundamentally limit the reliability of model-based kinematic formulations, resulting in inaccurate Jacobian predictions, artificial singularities, and unstable actuation behavior. Motivated by these limitations, this work presents a complete model-less control framework that bypasses kinematic modeling by using an empirically initialized Jacobian refined online through differential convex updates. Tip motion is generated via a real-time quadratic program that computes actuator increments while enforcing tendon slack avoidance and geometric limits. A backbone tension optimization term is introduced in this paper to regulate axial loading and suppress co-activation compression. The framework is validated across circular, pentagonal, and square trajectories, demonstrating smooth convergence, stable tension evolution, and sub-millimeter steady-state accuracy without any model calibration or parameter identification. These results establish the proposed controller as a scalable alternative to model-dependent continuum manipulation in a constrained environment.
Symmetry-Based Formation Control on Cycle Graphs Using Dihedral Point Groups
This work develops a symmetry-based framework for formation control on cycle graphs using Dihedral point-group constraints. We show that enforcing inter-agent reflection symmetries, together with anchoring a single designated agent to its prescribed mirror axis, is sufficient to realize every $\mathcal{C}_{nv}$-symmetric configuration using only $n-1$ communication links. The resulting control laws have a matrix-weighted Laplacian structure and guarantee exponential convergence to the desired symmetric configuration. Furthermore, we extend the method to enable coordinated maneuvers along a time-varying reference trajectory. Simulation results are provided to support the theoretical analysis.
From Forecast to Action: A Deep Learning Model for Predicting Power Outages During Tropical Cyclones
Power outages caused by tropical cyclones (TCs) pose serious risks to electric power systems and the communities they serve. Accurate, high-resolution outage forecasting is essential for enabling both proactive mitigation planning and real-time emergency response. This study introduces the SpatioTemporal Outage ForeCAST (STO-CAST) model, a deep learning framework developed for real-time, regional-scale outage prediction during TC events with high-resolution outputs in both space and time. STO-CAST integrates static environmental and infrastructure attributes with dynamic meteorological and outage sequences using gated recurrent units (GRUs) and fully connected layers, and is trained via a Leave-One-Storm-Out (LOSO) cross-validation strategy along with holdout grid experiments to demonstrate its preliminary generalization capability to unseen storms and grids. The model produces hourly outage forecasts at a 4 km * 4 km resolution and supports dual forecasting modes: short-term nowcasting with a 6-hour lead time via assimilation of real-time observations, and long-term forecasting with a 60-hour lead time based on evolving meteorological projections. A case study on Typhoon Muifa (2022) demonstrates STO-CAST's operational effectiveness, including error decomposition across model design, meteorological uncertainty, and observation gaps, while highlighting the value of real-time data assimilation and the model's capacity to identify evolving outage hotspots. STO-CAST offers a scalable, data-driven solution to support risk-informed emergency response and enhance power system resilience under intensifying TC threats.
comment: 24 pages, 15 figures, 4 tables. Under review
Data-driven functional state estimation of complex networks
The internal state of a dynamical system, a set of variables that defines its evolving configuration, is often hidden and cannot be fully measured, posing a central challenge for real-time monitoring and control. While observers are designed to estimate these latent states from sensor outputs, their classical designs rely on precise system models, which are often unattainable for complex network systems. Here, we introduce a data-driven framework for estimating a targeted set of state variables, known as functional observers, without identifying the model parameters. We establish a fundamental functional observability criterion based on historical trajectories that guarantees the existence of such observers. We then develop methods to construct observers using either input-output data or partial state data. These observers match or exceed the performance of model-based counterparts while remaining applicable even to unobservable systems. The framework incorporates noise mitigation and can be easily extended to nonlinear networks via Koopman embeddings. We demonstrate its broad utility through applications including sensor fault detection in water networks, load-frequency control in power grids, and target estimation in nonlinear neuronal systems. Our work provides a practical route for real-time target state inference in complex systems where models are unavailable.
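A stripped-down version of the data-driven idea is to regress a target (unmeasured) state variable directly on a window of past outputs collected from historical trajectories, with no model identification. The toy two-state system, window length, and least-squares fit below are illustrative assumptions; the paper's construction and guarantees are more general.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])          # unknown to the estimator
C = np.array([[1.0, 0.0]])                      # only the first state is measured
L_win = 4                                       # length of the output window

def simulate(T, x0):
    xs, ys = [x0], [C @ x0]
    for _ in range(T - 1):
        xs.append(A @ xs[-1] + 0.01 * rng.standard_normal(2))
        ys.append(C @ xs[-1])
    return np.array(xs), np.array(ys).ravel()

# fit a linear functional observer for z = x_2 from historical data only
xs, ys = simulate(400, rng.standard_normal(2))
Y = np.stack([ys[t - L_win:t] for t in range(L_win, len(ys))])
Z = xs[L_win:, 1]
F, *_ = np.linalg.lstsq(Y, Z, rcond=None)       # map: output window -> target state

# apply it online to a fresh trajectory
xs_te, ys_te = simulate(100, rng.standard_normal(2))
z_hat = np.stack([ys_te[t - L_win:t] for t in range(L_win, len(ys_te))]) @ F
print(float(np.abs(z_hat - xs_te[L_win:, 1]).mean()))   # small estimation error
```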
Sliding-Mode Control Strategies for PMSM: Benchmarking and Comparative Simulation Study
Permanent Magnet Synchronous Motors (PMSMs) are widely employed in high-performance drive systems owing to their high efficiency and power density. However, nonlinear dynamics, parameter uncertainties, and load disturbances complicate their control. Sliding-Mode Control (SMC) offers strong robustness but exists in numerous variants with unstandardized evaluation criteria. This paper presents a unified simulation benchmark and comparative analysis of six representative SMC techniques for PMSM speed regulation: conventional, integral, terminal, fractional-order, adaptive, and super-twisting. A standardized PMSM model, disturbance profile, and tuning protocol are adopted to ensure fair comparison across all methods. Performance is assessed through time-domain responses, integral error indices (ISE, IAE, ITSE, ITAE), and control-effort profiles, while also examining computational complexity and implementation feasibility. Results demonstrate that adaptive and higher-order SMCs, particularly the super-twisting and adaptive variants, achieve the most balanced trade-off between robustness, smoothness, and computational cost. The study provides a reproducible benchmarking framework, parameter-selection guidelines, and practical insights for designing efficient, low-chatter SMC-based PMSM drives suitable for real-time embedded implementation.
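Of the six variants compared, the super-twisting law has a particularly compact form: u = -k1*sqrt(|e|)*sign(e) + z with dz/dt = -k2*sign(e). The sketch below applies it to a toy first-order speed-error model with a slowly varying disturbance; the gains, time step, and plant are illustrative assumptions, not the benchmark's PMSM model.

```python
import numpy as np

def super_twisting(e, z, k1, k2, dt):
    """Super-twisting law: u = -k1*sqrt(|e|)*sign(e) + z, with dz/dt = -k2*sign(e)."""
    u = -k1 * np.sqrt(abs(e)) * np.sign(e) + z
    z = z + (-k2 * np.sign(e)) * dt
    return u, z

# toy speed-error dynamics e_dot = u + d(t) with a bounded, slowly varying disturbance
e, z, dt = 1.0, 0.0, 1e-3
for k in range(3000):
    u, z = super_twisting(e, z, k1=2.0, k2=1.5, dt=dt)
    e += (u + 0.5 * np.sin(k * dt)) * dt
print(round(abs(e), 3))            # the tracking error is driven close to zero
```

The integral term z continuously absorbs the disturbance, which is why the continuous super-twisting input exhibits far less chattering than a conventional switching SMC law.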
AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Soccer Substitutions
In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or on predictive models that merely mimic historical biases. This paper introduces a Fuzzy Logic-based Decision Support System (DSS) designed for real-time, prescriptive game management. Unlike traditional machine learning approaches that encounter a predictive ceiling by attempting to replicate human behavior, our system audits performance through an objective, rule-based inference engine. We propose a methodological advancement by reformulating the PlayeRank metric into a Cumulative Mean with Role-Aware Normalization, eliminating the play-time exposure bias inherent in cumulative-sum models to enable accurate intra-match comparison. The system integrates this refined metric with physiological proxies (fatigue) and contextual variables (disciplinary risk modulated by tactical role) to calculate a dynamic Substitution Priority (P final). Validation via a case study of the 2018 FIFA World Cup match between Brazil and Belgium demonstrates the system's ecological validity: it not only aligned with expert consensus on executed substitutions (for example, Gabriel Jesus) but, crucially, identified high-risk scenarios ignored by human decision makers. Specifically, the model flagged the "FAGNER Paradox", a maximum-priority defensive risk, minutes before a critical yellow card, and detected the "Lukaku Paradox", where an isolated assist masked a severe drop in participation. These results confirm that fuzzy logic offers a transparent, explainable, and superior alternative to black-box models for optimizing real-time tactical decisions.
comment: 33 pages, 7 figures
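The fuzzy inference pattern behind a substitution priority can be illustrated with shoulder membership functions and a tiny Mamdani-style rule base over normalized performance, fatigue, and disciplinary risk. The membership shapes, the two rules, and the singleton outputs below are invented for illustration and are not the paper's rule set or scaling.

```python
def low(x):
    """Membership in 'low' on [0, 1]: full at 0, zero above 0.5."""
    return max(0.0, min(1.0, (0.5 - x) / 0.5))

def high(x):
    """Membership in 'high' on [0, 1]: full at 1, zero below 0.5."""
    return max(0.0, min(1.0, (x - 0.5) / 0.5))

def substitution_priority(performance, fatigue, card_risk):
    """Two-rule Mamdani-style inference with singleton outputs:
    IF performance is low OR fatigue is high OR card risk is high THEN substitute (0.9),
    ELSE keep (0.1)."""
    fire_sub = max(low(performance), high(fatigue), high(card_risk))   # OR = max
    fire_keep = 1.0 - fire_sub
    total = fire_sub + fire_keep
    return (fire_sub * 0.9 + fire_keep * 0.1) / total if total else 0.0

# normalized inputs: below-average performance, tired, one yellow card already
print(round(substitution_priority(performance=0.35, fatigue=0.8, card_risk=0.9), 3))  # 0.74
```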
Wholesale Market Participation via Competitive DER Aggregation
We consider the aggregation of distributed energy resources (DERs), such as solar PV, energy storage, and flexible loads, by a profit-seeking aggregator participating directly in the wholesale market under distribution network access constraints. We propose a competitive DER aggregator (DERA) model that directly controls local DERs to maximize its profits, while ensuring each aggregated customer gains a surplus higher than their surplus under the regulated retail tariff. The DERA participates in the wholesale electricity market as virtual storage with optimized generation offers and consumption bids derived from the proposed competitive aggregation model. Also derived are the DERA's bid curves for distribution network access and the DERA's profitability when competing with the regulated retail tariff. We show that, with the same distribution network access, the proposed DERA's wholesale market participation achieves the same welfare-maximizing outcome as when its customers participate directly in the wholesale market. Extensive numerical studies compare the proposed DERA with existing methods in terms of customer surplus and DERA profit. We empirically evaluate how many DERAs can survive the competition at long-run equilibrium, and assess the impacts of DER adoption levels and distribution network access on short-run operations.
On the $H$-property for Step-graphons: The Residual Case
We investigate the $H$-property for step-graphons. Specifically, we sample graphs $G_n$ on $n$ nodes from a step-graphon and evaluate the probability that $G_n$ has a Hamiltonian decomposition in the asymptotic regime as $n\to\infty$. It has been shown that for almost all step-graphons, this probability converges to either zero or one. We focus in this paper on the residual case where the zero-one law does not apply. We show that the limit of the probability still exists and provide an explicit expression of it. We present a complete proof of the result and validate it through numerical studies.
Energy-Aware Lane Planning for Connected Electric Vehicles in Urban Traffic: Design and Vehicle-in-the-Loop Validation
Urban driving with connected and automated vehicles (CAVs) offers potential for energy savings, yet most eco-driving strategies focus solely on longitudinal speed control within a single lane. This neglects the significant impact of lateral decisions, such as lane changes, on overall energy efficiency, especially in environments with traffic signals and heterogeneous traffic flow. To address this gap, we propose a novel energy-aware motion planning framework that jointly optimizes longitudinal speed and lateral lane-change decisions using vehicle-to-infrastructure (V2I) communication. Our approach estimates long-term energy costs using a graph-based approximation and solves short-horizon optimal control problems under traffic constraints. Using a data-driven energy model calibrated to an actual battery electric vehicle, we demonstrate with vehicle-in-the-loop experiments that our method reduces motion energy consumption by up to 24 percent compared to a human driver, highlighting the potential of connectivity-enabled planning for sustainable urban autonomy.
comment: Accepted at 2025 IEEE Conference on Decision and Control (CDC25')
Evaluation of Large Language Models for Numeric Anomaly Detection in Power Systems
Large language models (LLMs) have gained increasing attention in power grids for their general-purpose capabilities. Meanwhile, anomaly detection (AD) remains critical for grid resilience, requiring accurate and interpretable decisions based on multivariate telemetry. Yet the performance of LLMs on large-scale numeric data for AD remains largely unexplored. This paper presents a comprehensive evaluation of LLMs for numeric AD in power systems. We use GPT-OSS-20B as a representative model and evaluate it on the IEEE 14-bus system. A standardized prompt framework is applied across zero-shot, few-shot, in-context learning, low rank adaptation (LoRA), fine-tuning, and a hybrid LLM-traditional approach. We adopt a rule-aware design based on the three-sigma criterion, and report detection performance and rationale quality. This study lays the groundwork for further investigation into the limitations and capabilities of LLM-based AD and its integration with classical detectors in cyber-physical power grid applications.
Robotics
Error-Centric PID Untrained Neural-Net (EC-PIDUNN) For Nonlinear Robotics Control
Classical Proportional-Integral-Derivative (PID) control has been widely successful across various industrial systems such as chemical processes, robotics, and power systems. However, as these systems have evolved, the increase in nonlinear dynamics and the complexity of interconnected variables have posed challenges that classical PID cannot effectively handle, often leading to instability, overshooting, or prolonged settling times. Researchers have proposed PIDNN models that combine the function-approximation capabilities of neural networks with PID control to tackle these nonlinear challenges. However, these models require extensive, highly refined training data and incur significant computational costs, making them less favorable for real-world applications. In this paper, we propose a novel EC-PIDUNN architecture, which integrates an untrained neural network with an improved PID controller, incorporating a stabilizing factor (\(\tau\)) to generate the control signal. Like classical PID, our architecture uses the steady-state error \(e_t\) as input, bypassing the need for explicit knowledge of the system's dynamics. By forming an input vector from \(e_t\) within the neural network, we increase the dimensionality of the input, allowing for richer data representation. Additionally, we introduce a vector of parameters \(\rho_t\) to shape the output trajectory and a \textit{dynamic compute} function to adjust the PID coefficients from predefined values. We validate the effectiveness of EC-PIDUNN on multiple nonlinear robotics applications: (1) nonlinear unmanned ground vehicle systems representing the Ackermann steering mechanism and kinematics control, and (2) a pan-tilt movement system. In both tests, it outperforms classical PID in convergence and stability, achieving a nearly critically damped response.
comment: Under review at SoftComputing
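The abstract keeps the architecture at a high level, so the following is only a loose sketch of the general pattern it describes: a fixed, untrained random layer lifts the scalar error into a higher-dimensional feature vector whose bounded readout corrects a classical PID law. The plant, gains, stabilizing-factor value, and readout scaling are all assumptions and do not reproduce EC-PIDUNN itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (untrained) random layer lifting the scalar error into a 16-D feature
# vector; the weights stay frozen and only shape a bounded correction term.
W = rng.normal(size=(16, 1))
b = rng.normal(size=16)
readout = rng.normal(size=16) / 16.0
Kp, Ki, Kd = 2.0, 2.0, 0.1    # toy PID gains (assumed)
tau = 0.5                     # stabilizing factor scaling the NN correction (assumed role)

def control(e, e_int, e_dot):
    pid = Kp * e + Ki * e_int + Kd * e_dot        # classical PID part
    feats = np.tanh(W[:, 0] * e + b)              # error lifted to higher dimension
    return pid + tau * float(readout @ feats)     # bounded NN-shaped correction

# First-order toy plant x' = -x + u tracking a unit setpoint (assumed).
x, e_int, e_prev, dt = 0.0, 0.0, 0.0, 0.01
for _ in range(500):
    e = 1.0 - x
    e_int += e * dt
    u = control(e, e_int, (e - e_prev) / dt)
    e_prev = e
    x += (-x + u) * dt

print(f"tracking error after 5 s: {1.0 - x:.4f}")
```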
Deep Neural Network-Based Aerial Transport in the Presence of Cooperative and Uncooperative UAS
We present a resilient deep neural network (DNN) framework for decentralized transport and coverage using uncrewed aerial systems (UAS) operating in $\mathbb{R}^n$. The proposed DNN-based mass-transport architecture constructs a layered inter-UAS communication graph from an initial formation, assigns time-varying communication weights through a forward scheduling mechanism that guides the team from the initial to the final configuration, and ensures stability and convergence of the resulting multi-agent transport dynamics. The framework is explicitly designed to remain robust in the presence of uncooperative agents that deviate from or refuse to follow the prescribed protocol. Our method preserves a fixed feed-forward topology but dynamically prunes edges to uncooperative agents, maintains convex, feedforward mentoring among cooperative agents, and computes global desired set points through a sparse linear relation consistent with leader references. The target set is abstracted by $N$ points that become final desired positions, enabling coverage-optimal transport while keeping computation low and guarantees intact. Extensive simulations demonstrate that, under full cooperation, all agents converge rapidly to the target zone with a 10\% boundary margin, and that, under partial cooperation with uncooperative agents, the system maintains high convergence among cooperative agents with performance degradation localized near the disruptions, evidencing graceful resilience and scalability. These results confirm that forward-weight scheduling, hierarchical mentor--mentee coordination, and on-the-fly DNN restructuring yield robust, provably stable UAS transport in realistic fault scenarios.
Learning Agile Striker Skills for Humanoid Soccer Robots from Noisy Sensory Input
Learning fast and robust ball-kicking skills is a critical capability for humanoid soccer robots, yet it remains a challenging problem due to the need for rapid leg swings, postural stability on a single support foot, and robustness under noisy sensory input and external perturbations (e.g., opponents). This paper presents a reinforcement learning (RL)-based system that enables humanoid robots to execute robust continual ball-kicking with adaptability to different ball-goal configurations. The system extends a typical teacher-student training framework -- in which a "teacher" policy is trained with ground truth state information and the "student" learns to mimic it with noisy, imperfect sensing -- by including four training stages: (1) long-distance ball chasing (teacher); (2) directional kicking (teacher); (3) teacher policy distillation (student); and (4) student adaptation and refinement (student). Key design elements -- including tailored reward functions, realistic noise modeling, and online constrained RL for adaptation and refinement -- are critical for closing the sim-to-real gap and sustaining performance under perceptual uncertainty. Extensive evaluations in both simulation and on a real robot demonstrate strong kicking accuracy and goal-scoring success across diverse ball-goal configurations. Ablation studies further highlight the necessity of the constrained RL, noise modeling, and the adaptation stage. This work presents a system for learning robust continual humanoid ball-kicking under imperfect perception, establishing a benchmark task for visuomotor skill learning in humanoid whole-body control.
Embodied Referring Expression Comprehension in Human-Robot Interaction
As robots enter human workspaces, there is a crucial need for them to comprehend embodied human instructions, enabling intuitive and fluent human-robot interaction (HRI). However, accurate comprehension is challenging due to a lack of large-scale datasets that capture natural embodied interactions in diverse HRI settings. Existing datasets suffer from perspective bias, single-view collection, inadequate coverage of nonverbal gestures, and a predominant focus on indoor environments. To address these issues, we present the Refer360 dataset, a large-scale dataset of embodied verbal and nonverbal interactions collected across diverse viewpoints in both indoor and outdoor settings. Additionally, we introduce MuRes, a multimodal guided residual module designed to improve embodied referring expression comprehension. MuRes acts as an information bottleneck, extracting salient modality-specific signals and reinforcing them into pre-trained representations to form complementary features for downstream tasks. We conduct extensive experiments on four HRI datasets, including the Refer360 dataset, and demonstrate that current multimodal models fail to capture embodied interactions comprehensively; however, augmenting them with MuRes consistently improves performance. These findings establish Refer360 as a valuable benchmark and exhibit the potential of guided residual learning to advance embodied referring expression comprehension in robots operating within human environments.
comment: 14 pages, 7 figures, accepted at the ACM/IEEE International Conference on Human-Robot Interaction (HRI) 2026
TacFinRay: Soft Tactile Fin-Ray Finger with Indirect Tactile Sensing for Robust Grasping
We present a tactile-sensorized Fin-Ray finger that enables simultaneous detection of contact location and indentation depth through an indirect sensing approach. A hinge mechanism is integrated between the soft Fin-Ray structure and a rigid sensing module, allowing deformation and translation information to be transferred to a bottom crossbeam, upon which sits an array of marker-tipped pins based on the biomimetic structure of the TacTip vision-based tactile sensor. Deformation patterns captured by an internal camera are processed using a convolutional neural network to infer contact conditions without directly sensing the finger surface. The finger design was optimized by varying pin configurations and hinge orientations, achieving 0.1\,mm depth and 2\,mm location-sensing accuracies. The perception demonstrated robust generalization to various indenter shapes and sizes and was applied to a pick-and-place task under uncertain picking positions, where the tactile feedback significantly improved placement accuracy. Overall, this work provides a lightweight, flexible, and scalable tactile sensing solution suitable for soft robotic structures where the sensing must be situated away from the contact interface.
comment: Accepted in IEEE Robotics Automation Letters. S. Nam, B. Deng co-first authors
Vision-Guided Grasp Planning for Prosthetic Hands in Unstructured Environments
Recent advancements in prosthetic technology have increasingly focused on enhancing dexterity and autonomy through intelligent control systems. Vision-based approaches offer promising results for enabling prosthetic hands to interact more naturally with diverse objects in dynamic environments. Building on this foundation, the paper presents a vision-guided grasping algorithm for a prosthetic hand, integrating perception, planning, and control for dexterous manipulation. A camera mounted on the setup captures the scene, and a Bounding Volume Hierarchy (BVH)-based vision algorithm is employed to segment an object for grasping and define its bounding box. Grasp contact points are then computed by generating candidate trajectories using the Rapidly-exploring Random Tree Star (RRT*) algorithm and selecting fingertip end poses based on the minimum Euclidean distance between these trajectories and the object's point cloud. Each finger grasp pose is determined independently, enabling adaptive, object-specific configurations. A Damped Least Squares (DLS)-based inverse kinematics solver is used to compute the corresponding joint angles, which are subsequently transmitted to the finger actuators for execution. This modular pipeline enables per-finger grasp planning and supports real-time adaptability in unstructured environments. The proposed method is validated in simulation and through experimental integration on a Linker Hand O7 platform.
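For readers unfamiliar with the damped least-squares step mentioned above, the sketch below shows the standard DLS update \(\Delta q = J^\top (J J^\top + \lambda^2 I)^{-1} \Delta x\) on a toy two-link finger; the link lengths, damping factor, and target are assumptions, not the Linker Hand O7 kinematics.

```python
import numpy as np

def dls_ik_step(J, pos_error, damping=0.01):
    """One DLS update: dq = J^T (J J^T + lambda^2 I)^{-1} dx."""
    JJt = J @ J.T + (damping ** 2) * np.eye(J.shape[0])
    return J.T @ np.linalg.solve(JJt, pos_error)

# Toy 2-link planar finger (link lengths in metres are assumed values).
l1, l2 = 0.04, 0.03
q = np.array([0.3, 0.5])                 # initial joint angles [rad]
target = np.array([0.05, 0.03])          # desired fingertip position

def fk(q):
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jac(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

for _ in range(100):
    q = q + dls_ik_step(jac(q), target - fk(q))

print("fingertip:", fk(q), "target:", target)
```

The damping term keeps the step bounded near singular finger configurations, which is the main reason DLS is preferred over a plain pseudoinverse in this setting.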
Method of UAV Inspection of Photovoltaic Modules Using Thermal and RGB Data Fusion
The subject of this research is the development of an intelligent, integrated framework for the automated inspection of photovoltaic (PV) infrastructure that addresses the critical shortcomings of conventional methods, including thermal palette bias, data redundancy, and high communication bandwidth requirements. The goal of this study is to design, develop, and validate a comprehensive, multi-modal system that fully automates the monitoring workflow, from data acquisition to the generation of actionable, geo-located maintenance alerts, thereby enhancing plant safety and operational efficiency. The methods employed involve a synergistic architecture that begins with a palette-invariant thermal embedding, learned by enforcing representational consistency, which is fused with a contrast-normalized RGB stream via a gated mechanism. This is supplemented by a closed-loop, adaptive re-acquisition controller that uses Rodrigues-based updates for targeted confirmation of ambiguous anomalies and a geospatial deduplication module that clusters redundant alerts using DBSCAN over the haversine distance. In conclusion, this study establishes a powerful new paradigm for proactive PV inspection, with the proposed system achieving a mean Average Precision (mAP@0.5) of 0.903 on the public PVF-10 benchmark, a significant 12-15% improvement over single-modality baselines. Field validation confirmed the system's readiness, achieving 96% recall, while the de-duplication process reduced duplicate-induced false positives by 15-20%, and relevance-only telemetry cut airborne data transmission by 60-70%.
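As an illustration of the geospatial deduplication step, the snippet below clusters nearby alert coordinates with scikit-learn's DBSCAN under the haversine metric; the 10 m radius, `min_samples` choice, and sample alerts are assumptions rather than the system's operational settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000.0

def deduplicate_alerts(latlon_deg, radius_m=10.0, min_samples=1):
    """Cluster alert locations within radius_m; return one representative per cluster."""
    pts = np.radians(np.asarray(latlon_deg))          # haversine metric expects radians
    labels = DBSCAN(eps=radius_m / EARTH_RADIUS_M,    # eps is an angular distance
                    min_samples=min_samples,
                    metric="haversine").fit_predict(pts)
    reps = [np.degrees(pts[labels == k].mean(axis=0)) for k in np.unique(labels)]
    return labels, np.array(reps)

# Three detections of the same hotspot plus one distinct module (coordinates assumed).
alerts = [(48.10001, 11.50001), (48.10002, 11.50003),
          (48.10003, 11.50002), (48.10100, 11.50500)]
labels, reps = deduplicate_alerts(alerts)
print(labels)   # e.g. [0 0 0 1]: four raw alerts collapse to two unique faults
print(reps)
```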
Entropy-Controlled Intrinsic Motivation Reinforcement Learning for Quadruped Robot Locomotion in Complex Terrains
Learning underpins how both biological and artificial systems acquire intelligent behaviors. Building on classical Proximal Policy Optimization (PPO), a family of deep reinforcement learning algorithms is widely used to train locomotion policies for quadrupedal robots because of their stability and sample efficiency. However, across these variants, experiments and simulations often converge prematurely, leading to suboptimal locomotion and reduced task performance. In this paper, we therefore introduce Entropy-Controlled Intrinsic Motivation (ECIM), an entropy-based reinforcement learning algorithm that reduces premature convergence by combining intrinsic motivation with adaptive exploration. To enable direct comparison with existing baselines, we evaluate ECIM in Isaac Gym across six widely used terrain categories: upward slopes, downward slopes, uneven rough terrain, ascending stairs, descending stairs, and flat ground. Across these terrains, ECIM consistently achieves better performance: task rewards increase by 4--12%, peak body pitch oscillation is reduced by 23--29%, joint acceleration decreases by 20--32%, and joint torque consumption declines by 11--20%. Overall, by combining entropy control with intrinsic motivation, ECIM improves locomotion stability across diverse terrains while reducing energetic cost, making it a practical choice for complex robotic control tasks.
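The abstract does not spell out the entropy-control rule, so the sketch below shows one standard way such control can be realized on top of PPO, by adapting the entropy-bonus coefficient toward a target policy entropy; the update rule, gain, and bounds are assumptions and are not claimed to be ECIM's actual mechanism.

```python
import numpy as np

def update_entropy_coef(coef, measured_entropy, target_entropy,
                        lr=0.1, bounds=(1e-4, 0.05)):
    """Raise the entropy bonus when the policy collapses below the target,
    lower it when exploration already exceeds the target (assumed rule)."""
    coef = coef * np.exp(lr * (target_entropy - measured_entropy))
    return float(np.clip(coef, *bounds))

# Toy rollout statistics: policy entropy decays as training progresses (assumed).
coef, target = 0.01, 1.0
for step in range(5):
    measured = 1.5 * np.exp(-0.5 * step)     # stand-in for the measured policy entropy
    coef = update_entropy_coef(coef, measured, target)
    print(f"step {step}: entropy={measured:.2f}, entropy_coef={coef:.5f}")
# Once the measured entropy falls below the target, the coefficient grows and
# discourages premature convergence; ECIM's intrinsic-motivation bonus would be
# added to the PPO reward separately and is not modeled here.
```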
Fault Tolerant Control of Mecanum Wheeled Mobile Robots
Mecanum wheeled mobile robots (MWMRs) are highly susceptible to actuator faults that degrade performance and risk mission failure. Current fault tolerant control (FTC) schemes for MWMRs target complete actuator failures such as motor stall, ignoring partial faults such as torque degradation. We propose an FTC strategy that handles both fault types, adopting posterior probability to learn fault parameters in real time. We derive the FTC law by aggregating probability-weighted control laws corresponding to predefined faults. This ensures the robustness and safety of MWMR control despite varying levels of fault occurrence. Simulation results demonstrate the effectiveness of our FTC under diverse scenarios.
Leveraging Port-Hamiltonian Theory for Impedance Control Benchmarking
This work proposes PH-based metrics for benchmarking impedance control. A causality-consistent PH model is introduced for mass-spring-damper impedance in Cartesian space. Based on this model, a differentiable, force-torque sensing-independent, n-DoF passivity condition is derived, valid for time-varying references. An impedance fidelity metric is also defined from step-response power in free motion, capturing dynamic decoupling. The proposed metrics are validated in Gazebo simulations with a six-DoF manipulator and a quadruped leg. Results demonstrate the suitability of the PH framework for standardized impedance control benchmarking.
comment: This is the author's version of the paper accepted for publication in the 2025 International Conference on Advanced Robotics (ICAR). The final version will be available at IEEE Xplore
Beyond Model Jailbreak: Systematic Dissection of the "Ten Deadly Sins" in Embodied Intelligence
Embodied AI systems integrate language models with real-world sensing, mobility, and cloud-connected mobile apps. Yet while model jailbreaks have drawn significant attention, the broader system stack of embodied intelligence remains largely unexplored. In this work, we conduct the first holistic security analysis of the Unitree Go2 platform and uncover ten cross-layer vulnerabilities, the "Ten Sins of Embodied AI Security." Using BLE sniffing, traffic interception, APK reverse engineering, cloud API testing, and hardware probing, we identify systemic weaknesses across three architectural layers: wireless provisioning, core modules, and external interfaces. These include hard-coded keys, predictable handshake tokens, WiFi credential leakage, missing TLS validation, a static SSH password, multilingual safety bypass behavior, insecure local relay channels, weak binding logic, and unrestricted firmware access. Together, they allow adversaries to hijack devices, inject arbitrary commands, extract sensitive information, or gain full physical control. Our findings show that securing embodied AI requires far more than aligning the model itself. We conclude with system-level lessons learned and recommendations for building embodied platforms that remain robust across their entire software-hardware ecosystem.
Proportional integral derivative booster for neural networks-based time-series prediction: Case of water demand prediction
Multi-step time-series prediction is an essential supportive step for decision-makers in several industrial areas. Artificial intelligence techniques, which use a neural network component in various forms, have recently been used frequently to accomplish this step. However, the complexity of the neural network structure still stands as a critical obstacle to prediction accuracy. In this paper, a method inspired by the proportional-integral-derivative (PID) control approach is investigated to enhance the performance of neural network models used for multi-step ahead prediction of periodic time-series information while maintaining a negligible impact on the complexity of the system. The PID-based method is applied to the predicted value at each time step to bring that value closer to the real value. The water demand forecasting problem is considered as a case study, where two deep neural network models from the literature are used to prove the effectiveness of the proposed boosting method. Furthermore, to prove the applicability of this PID-based booster to other types of periodic time-series prediction problems, it is applied to enhance the accuracy of a neural network model used for multi-step forecasting of hourly energy consumption. The comparison between the results of the original prediction models and the results after using the proposed technique demonstrates the superiority of the proposed method in terms of prediction accuracy and system complexity.
comment: Engineering Applications of Artificial Intelligence 2022
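A minimal sketch of the boosting idea, assuming a discrete PID correction driven by the errors of the already corrected forecasts, is given below; the gains, toy demand series, and deliberately biased stand-in model are illustrative, not the paper's tuned configuration.

```python
import numpy as np

class PIDBooster:
    """Add a PID correction, computed from past forecast errors, to each new prediction."""
    def __init__(self, kp=0.6, ki=0.2, kd=0.1):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.err, self.prev_err = 0.0, 0.0

    def observe(self, actual, corrected):
        # Called once the true value of the previous step becomes available.
        self.prev_err, self.err = self.err, actual - corrected
        self.integral += self.err

    def correct(self, prediction):
        return prediction + (self.kp * self.err
                             + self.ki * self.integral
                             + self.kd * (self.err - self.prev_err))

# Toy daily-periodic demand and a deliberately biased "model" (both assumed).
t = np.arange(200)
actual = 100 + 10 * np.sin(2 * np.pi * t / 24)
raw_pred = actual + 5 + np.random.default_rng(0).normal(0, 1, t.size)

booster, boosted = PIDBooster(), []
for a, p in zip(actual, raw_pred):
    b = booster.correct(p)       # boost the model output before it is used
    boosted.append(b)
    booster.observe(a, b)        # feed back the error of the corrected forecast

print("raw MAE:    ", round(float(np.mean(np.abs(actual - raw_pred))), 2))
print("boosted MAE:", round(float(np.mean(np.abs(actual - np.array(boosted)))), 2))
```

Because the correction reuses the prediction model as-is, the added complexity is a handful of scalar operations per step, which is the point of the approach.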
Safe Model Predictive Diffusion with Shielding
Generating safe, kinodynamically feasible, and optimal trajectories for complex robotic systems is a central challenge in robotics. This paper presents Safe Model Predictive Diffusion (Safe MPD), a training-free diffusion planner that unifies a model-based diffusion framework with a safety shield to generate trajectories that are both kinodynamically feasible and safe by construction. By enforcing feasibility and safety on all samples during the denoising process, our method avoids the common pitfalls of post-processing corrections, such as computational intractability and loss of feasibility. We validate our approach on challenging non-convex planning problems, including kinematic and acceleration-controlled tractor-trailer systems. The results show that it substantially outperforms existing safety strategies in success rate and safety, while achieving sub-second computation times.
comment: Project page: https://www.taekyung.me/safe-mpd
Variational Shape Inference for Grasp Diffusion on SE(3)
Grasp synthesis is a fundamental task in robotic manipulation which usually has multiple feasible solutions. Multimodal grasp synthesis seeks to generate diverse sets of stable grasps conditioned on object geometry, making the robust learning of geometric features crucial for success. To address this challenge, we propose a framework for learning multimodal grasp distributions that leverages variational shape inference to enhance robustness against shape noise and measurement sparsity. Our approach first trains a variational autoencoder for shape inference using implicit neural representations, and then uses these learned geometric features to guide a diffusion model for grasp synthesis on the SE(3) manifold. Additionally, we introduce a test-time grasp optimization technique that can be integrated as a plugin to further enhance grasping performance. Experimental results demonstrate that our shape inference for grasp synthesis formulation outperforms state-of-the-art multimodal grasp synthesis methods on the ACRONYM dataset by 6.3%, while demonstrating robustness to deterioration in point cloud density compared to other approaches. Furthermore, our trained model achieves zero-shot transfer to real-world manipulation of household objects, generating 34% more successful grasps than baselines despite measurement noise and point cloud calibration errors.
comment: This work has been submitted to the IEEE for possible publication
Grasping a Handful: Sequential Multi-Object Dexterous Grasp Generation
We introduce the sequential multi-object robotic grasp sampling algorithm SeqGrasp that can robustly synthesize stable grasps on diverse objects using the robotic hand's partial Degrees of Freedom (DoF). We use SeqGrasp to construct the large-scale Allegro Hand sequential grasping dataset SeqDataset and use it for training the diffusion-based sequential grasp generator SeqDiffuser. We experimentally evaluate SeqGrasp and SeqDiffuser against the state-of-the-art non-sequential multi-object grasp generation method MultiGrasp in simulation and on a real robot. The experimental results demonstrate that SeqGrasp and SeqDiffuser reach an 8.71%-43.33% higher grasp success rate than MultiGrasp. Furthermore, SeqDiffuser is approximately 1000 times faster at generating grasps than SeqGrasp and MultiGrasp. Project page: https://yulihn.github.io/SeqGrasp/.
comment: We replace the sets in Section II with ordered sequences
Collaborative Drill Alignment in Surgical Robotics
Robotic assistance allows surgeries to be reliably and accurately executed while still under direct supervision of the surgeon, combining the strengths of robotic technology with the surgeon's expertise. This paper describes a robotic system designed to assist in surgical procedures by implementing a virtual drill guide. The system integrates virtual-fixture functionality using a novel virtual-mechanism controller with additional visual feedback. The controller constrains the tool to the desired axis, while allowing axial motion to remain under the surgeon's control. Compared to prior virtual-fixture approaches -- which primarily perform pure energy-shaping and damping injection with linear springs and dampers -- our controller uses a virtual prismatic joint to which the robot is constrained by nonlinear springs, allowing us to easily shape the dynamics of the system. We detail the calibration procedures required to achieve sufficient precision, and describe the implementation of the controller. We apply this system to a veterinary procedure: drilling for transcondylar screw placement in dogs. The results of the trials on 3D-printed bone models demonstrate sufficient precision to perform the procedure and suggest improved angular accuracy and reduced exit translation errors compared to patient-specific guides (PSG). Discussion and future improvements follow.
comment: 14 pages, 12 figures, accepted to IEEE Transactions on Control Systems Technology
Optimal Virtual Model Control for Robotics: Design and Tuning of Passivity-Based Controllers
Passivity-based control is a cornerstone of control theory and an established design approach in robotics. Its strength is based on the passivity theorem, which provides a powerful interconnection framework for robotics. However, the design of passivity-based controllers and their optimal tuning remain challenging. We propose here an intuitive design approach for fully actuated robots, where the control action is determined by a `virtual-mechanism' as in classical virtual model control. The result is a robot whose controlled behavior can be understood in terms of physics. We achieve optimal tuning by applying algorithmic differentiation to ODE simulations of the rigid body dynamics. Overall, this leads to a flexible design and optimization approach: stability is proven by passivity of the virtual mechanism, while performance is obtained by optimization using algorithmic differentiation.
comment: 16 pages, 19 figures
Primal-Dual iLQR for GPU-Accelerated Learning and Control in Legged Robots
This paper introduces a novel Model Predictive Control (MPC) implementation for legged robot locomotion that leverages GPU parallelization. Our approach enables both temporal and state-space parallelization by incorporating a parallel associative scan to solve the primal-dual Karush-Kuhn-Tucker (KKT) system. In this way, the optimal control problem is solved in $\mathcal{O}(\log^2(n)\log{N} + \log^2(m))$ complexity, instead of $\mathcal{O}(N(n + m)^3)$, where $n$, $m$, and $N$ are the dimension of the system state, control vector, and the length of the prediction horizon. We demonstrate the advantages of this implementation over two state-of-the-art solvers (acados and crocoddyl), achieving up to a 60\% improvement in runtime for Whole Body Dynamics (WB)-MPC and a 700\% improvement for Single Rigid Body Dynamics (SRBD)-MPC when varying the prediction horizon length. The presented formulation scales efficiently with the problem state dimensions as well, enabling the definition of a centralized controller for up to 16 legged robots that can be computed in less than 25 ms. Furthermore, thanks to the JAX implementation, the solver supports large-scale parallelization across multiple environments, allowing the possibility of performing learning with the MPC in the loop directly in GPU. The code associated with this work can be found at https://github.com/iit-DLSLab/mpx.
SIGN: Safety-Aware Image-Goal Navigation for Autonomous Drones via Reinforcement Learning
Image-goal navigation (ImageNav) tasks a robot with autonomously exploring an unknown environment and reaching a location that visually matches a given target image. While prior works primarily study ImageNav for ground robots, enabling this capability for autonomous drones is substantially more challenging due to their need for high-frequency feedback control and global localization for stable flight. In this paper, we propose a novel sim-to-real framework that leverages reinforcement learning (RL) to achieve ImageNav for drones. To enhance visual representation ability, our approach trains the vision backbone with auxiliary tasks, including image perturbations and future transition prediction, which results in more effective policy training. The proposed algorithm enables end-to-end ImageNav with direct velocity control, eliminating the need for external localization. Furthermore, we integrate a depth-based safety module for real-time obstacle avoidance, allowing the drone to safely navigate in cluttered environments. Unlike most existing drone navigation methods that focus solely on reference tracking or obstacle avoidance, our framework supports comprehensive navigation behaviors, including autonomous exploration, obstacle avoidance, and image-goal seeking, without requiring explicit global mapping. Code and model checkpoints are available at https://github.com/Zichen-Yan/SIGN.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions NeurIPS 2025
Establishing stability certificates for closed-loop systems under reinforcement learning (RL) policies is essential to move beyond empirical performance and offer guarantees of system behavior. Classical Lyapunov methods require a strict stepwise decrease in the Lyapunov function but such certificates are difficult to construct for learned policies. The RL value function is a natural candidate but it is not well understood how it can be adapted for this purpose. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
comment: NeurIPS 2025
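For concreteness, the multi-step relaxation described above can be written in the following form; the weights \(\lambda_i\), the horizon \(k\), and the residual symbol \(r_\theta\) are illustrative notation, and the paper's exact condition may differ in form.

\[
\sum_{i=1}^{k} \lambda_i \,\bigl( V(x_{t+i}) - V(x_t) \bigr) < 0
\quad \text{for all } x_t \neq 0,
\qquad \lambda_i \ge 0,\; \sum_{i=1}^{k} \lambda_i = 1,
\]

which reduces to the classical stepwise condition \(V(x_{t+1}) - V(x_t) < 0\) when \(k = 1\). In the construction described in the abstract, the candidate takes the form \(V(x) = V_{\mathrm{RL}}(x) + r_\theta(x)\), where \(V_{\mathrm{RL}}\) is the RL value function and \(r_\theta\) is a learned neural residual.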
UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies
We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments, such as aerial manipulators, is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller's tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse, and even highly constrained, embodiments. All code, data, and checkpoints will be publicly released after acceptance. Result videos can be found at umi-on-air.github.io.
comment: Result videos can be found at umi-on-air.github.io
High Torque Density PCB Axial Flux Permanent Magnet Motor for Micro Robots
Quasi-direct-drive (QDD) actuation is transforming legged and manipulator robots by eliminating high-ratio gearboxes, yet it demands motors that deliver very high torque at low speed within a thin, disc-shaped joint envelope. Axial-flux permanent-magnet (AFPM) machines meet these geometric and torque requirements, but scaling them below a 20mm outer diameter is hampered by poor copper fill in conventional wound stators, inflating resistance and throttling continuous torque. This paper introduces a micro-scale AFPM motor that overcomes these limitations through printed-circuit-board (PCB) windings fabricated with advanced IC-substrate high-density interconnect (HDI) technology. The resulting 48-layer stator, formed by stacking four 12-layer HDI modules, achieves a record 45\% copper fill in a package only 5mm thick and 19mm in diameter. We perform comprehensive electromagnetic and thermal analyses to inform the motor design, then fabricate a prototype whose performance characteristics are experimentally verified.
Multiagent Systems
ChargingBoul: A Competitive Negotiating Agent with Novel Opponent Modeling
Automated negotiation has emerged as a critical area of research in multiagent systems, with applications spanning e-commerce, resource allocation, and autonomous decision-making. This paper presents ChargingBoul, a negotiating agent that competed in the 2022 Automated Negotiating Agents Competition (ANAC) and placed second in individual utility by an exceptionally narrow margin. ChargingBoul employs a lightweight yet effective strategy that balances concession and opponent modeling to achieve high negotiation outcomes. The agent classifies opponents based on bid patterns, dynamically adjusts its bidding strategy, and applies a concession policy in later negotiation stages to maximize utility while fostering agreements. We evaluate ChargingBoul's performance using competition results and subsequent studies that have utilized the agent in negotiation research. Our analysis highlights ChargingBoul's effectiveness across diverse opponent strategies and its contributions to advancing automated negotiation techniques. We also discuss potential enhancements, including more sophisticated opponent modeling and adaptive bidding heuristics, to improve its performance further.
comment: 10 pages, 3 figures. Describes the ChargingBoul negotiating agent submitted to ANAC 2022. Preprint
Towards Efficient Hypergraph and Multi-LLM Agent Recommender Systems
Recommender Systems (RSs) have become the cornerstone of various applications such as e-commerce and social media platforms. The evolution of RSs is paramount in the digital era, in which personalised user experience is tailored to the user's preferences. Large Language Models (LLMs) have sparked a new paradigm: generative retrieval and recommendation. Despite their potential, generative RS methods face issues such as hallucination, which degrades recommendation performance, and high computational cost in practical scenarios. To address these issues, we introduce HGLMRec, a novel Multi-LLM agent-based RS that incorporates a hypergraph encoder designed to capture complex, multi-behaviour relationships between users and items. The HGLMRec model retrieves only the relevant tokens during inference, reducing computational overhead while enriching the retrieval context. Experimental results show that HGLMRec improves performance over state-of-the-art baselines at lower computational cost.
comment: 8 Pages
The Effect of Belief Boxes and Open-mindedness on Persuasion
As multi-agent systems are increasingly utilized for reasoning and decision-making applications, there is a greater need for LLM-based agents to have something resembling propositional beliefs. One simple method for doing so is to include statements describing beliefs maintained in the prompt space (in what we'll call their belief boxes). But when agents have such statements in belief boxes, how does it actually affect their behaviors and dispositions towards those beliefs? And does it significantly affect agents' ability to be persuasive in multi-agent scenarios? Likewise, if the agents are given instructions to be open-minded, how does that affect their behaviors? We explore these and related questions in a series of experiments. Our findings confirm that instructing agents to be open-minded affects how amenable they are to belief change. We show that incorporating belief statements and their strengths influences an agent's resistance to (and persuasiveness against) opposing viewpoints. Furthermore, it affects the likelihood of belief change, particularly when the agent is outnumbered in a debate by opposing viewpoints, i.e., peer pressure scenarios. The results demonstrate the feasibility and validity of the belief box technique in reasoning and decision-making tasks.
comment: Accepted at the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026), Marbella, Spain
HiveMind: Contribution-Guided Online Prompt Optimization of LLM Multi-Agent Systems AAAI2026
Recent advances in LLM-based multi-agent systems have demonstrated remarkable capabilities in complex decision-making scenarios such as financial trading and software engineering. However, evaluating each individual agent's effectiveness and online optimization of underperforming agents remain open challenges. To address these issues, we present HiveMind, a self-adaptive framework designed to optimize LLM multi-agent collaboration through contribution analysis. At its core, HiveMind introduces Contribution-Guided Online Prompt Optimization (CG-OPO), which autonomously refines agent prompts based on their quantified contributions. We first propose the Shapley value as a grounded metric to quantify each agent's contribution, thereby identifying underperforming agents in a principled manner for automated prompt refinement. To overcome the computational complexity of the classical Shapley value, we present DAG-Shapley, a novel and efficient attribution algorithm that leverages the inherent Directed Acyclic Graph structure of the agent workflow to axiomatically prune non-viable coalitions. By hierarchically reusing intermediate outputs of agents in the DAG, our method further reduces redundant computations and achieves substantial cost savings without compromising the theoretical guarantees of Shapley values. Evaluated in a multi-agent stock-trading scenario, HiveMind achieves superior performance compared to static baselines. Notably, DAG-Shapley reduces LLM calls by over 80\% while maintaining attribution accuracy comparable to full Shapley values, establishing a new standard for efficient credit assignment and enabling scalable, real-world optimization of multi-agent collaboration.
comment: This paper was accepted to AAAI2026
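For reference, the exact Shapley attribution that DAG-Shapley approximates more cheaply can be computed by brute force over coalitions, as in the sketch below; the three-agent payoff table is an assumed stand-in for the measured collaboration value, not data from the paper.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley value: phi_i = sum_S |S|!(n-|S|-1)!/n! * (v(S+{i}) - v(S))."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[a] += w * (value(set(S) | {a}) - value(set(S)))
    return phi

# Toy payoff for a three-agent trading workflow (values are assumed, not measured).
payoff = {frozenset(): 0, frozenset("A"): 4, frozenset("R"): 1, frozenset("E"): 0,
          frozenset("AR"): 7, frozenset("AE"): 6, frozenset("RE"): 2,
          frozenset("ARE"): 10}
print(shapley_values(list("ARE"), lambda S: payoff[frozenset(S)]))
```

The brute-force loop visits every coalition, which is exactly the exponential cost that motivates pruning non-viable coalitions along the workflow DAG.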
A Physics-Informed Fixed Skyroad Model for Continuous UAS Traffic Management (C-UTM)
Unlike traditional multi-agent coordination frameworks, which assume a fixed number of agents, UAS traffic management (UTM) requires a platform that enables Uncrewed Aerial Systems (UAS) to freely enter or exit constrained low-altitude airspace. Consequently, the number of UAS operating in a given region is time-varying, with vehicles dynamically joining or leaving even in dense, obstacle-laden environments. The primary goal of this paper is to develop a computationally efficient management system that maximizes airspace usability while ensuring safety and efficiency. To achieve this, we first introduce physics-informed methods to structure fixed skyroads across multiple altitude layers of urban airspace, with the directionality of each skyroad designed to guarantee full reachability. We then present a novel Continuous UTM (C-UTM) framework that optimally allocates skyroads to UAS requests while accounting for the time-varying capacity of the airspace. Collectively, the proposed model addresses the key challenges of low-altitude UTM by providing a scalable, safe, and efficient solution for urban airspace usability.
Large Language Models Miss the Multi-Agent Mark NeurIPS 2025
Recent interest in Multi-Agent Systems of Large Language Models (MAS LLMs) has led to an increase in frameworks leveraging multiple LLMs to tackle complex tasks. However, much of this literature appropriates the terminology of MAS without engaging with its foundational principles. In this position paper, we highlight critical discrepancies between MAS theory and current MAS LLMs implementations, focusing on four key areas: the social aspect of agency, environment design, coordination and communication protocols, and measuring emergent behaviours. Our position is that many MAS LLMs lack multi-agent characteristics such as autonomy, social interaction, and structured environments, and often rely on oversimplified, LLM-centric architectures. The field may slow down and lose traction by revisiting problems the MAS literature has already addressed. Therefore, we systematically analyse this issue and outline associated research opportunities; we advocate for better integrating established MAS concepts and more precise terminology to avoid mischaracterisation and missed opportunities.
comment: NeurIPS 2025 - position track -
Systems and Control (EESS)
Learning Reachability of Energy Storage Arbitrage
Power systems face increasing weather-driven variability and, therefore, increasingly rely on flexible but energy-limited storage resources. Energy storage can buffer this variability, but its value depends on intertemporal decisions under uncertain prices. Without accounting for the future reliability value of stored energy, batteries may act myopically, discharging too early or failing to preserve reserves during critical hours. This paper introduces a stopping-time reward that, together with a state-of-charge (SoC) range target penalty, aligns arbitrage incentives with system reliability by rewarding storage that maintains sufficient SoC before critical hours. We formulate the problem as an online optimization with a chance-constrained terminal SoC and embed it in an end-to-end (E2E) learning framework, jointly training the price predictor and control policy. The proposed design enhances reachability of target SoC ranges, improves profit under volatile conditions, and reduces its standard deviation.
Deep Neural Network-Based Aerial Transport in the Presence of Cooperative and Uncooperative UAS
We present a resilient deep neural network (DNN) framework for decentralized transport and coverage using uncrewed aerial systems (UAS) operating in $\mathbb{R}^n$. The proposed DNN-based mass-transport architecture constructs a layered inter-UAS communication graph from an initial formation, assigns time-varying communication weights through a forward scheduling mechanism that guides the team from the initial to the final configuration, and ensures stability and convergence of the resulting multi-agent transport dynamics. The framework is explicitly designed to remain robust in the presence of uncooperative agents that deviate from or refuse to follow the prescribed protocol. Our method preserves a fixed feed-forward topology but dynamically prunes edges to uncooperative agents, maintains convex, feedforward mentoring among cooperative agents, and computes global desired set points through a sparse linear relation consistent with leader references. The target set is abstracted by $N$ points that become final desired positions, enabling coverage-optimal transport while keeping computation low and guarantees intact. Extensive simulations demonstrate that, under full cooperation, all agents converge rapidly to the target zone with a 10\% boundary margin, and that, under partial cooperation with uncooperative agents, the system maintains high convergence among cooperative agents with performance degradation localized near the disruptions, evidencing graceful resilience and scalability. These results confirm that forward-weight scheduling, hierarchical mentor--mentee coordination, and on-the-fly DNN restructuring yield robust, provably stable UAS transport in realistic fault scenarios.
Switched Linear Ensemble Systems and Structural Controllability
This paper introduces and solves a structural controllability problem for ensembles of switched linear systems. All individual subsystems in the ensemble are sparse, governed by the same sparsity pattern, and undergo switching at the same sequence of time instants. The controllability of an ensemble system describes the ability to use a common control input to simultaneously steer every individual system. A sparsity pattern is called structurally controllable for the pair \((k,q)\) if it admits a controllable ensemble of \(q\) individual systems with at most \(k\) switches. We derive a necessary and sufficient condition for a sparsity pattern to be structurally controllable for a given \((k,q)\), and characterize when a sparsity pattern admits a finite \(k\) that guarantees structural controllability for \((k,q)\) for arbitrary \(q\). Compared with the linear time-invariant ensemble case, this second condition is strictly weaker. We further show that these conditions have natural connections with maximum flow, and hence can be checked by polynomial algorithms. Specifically, the time complexity of deciding structural controllability is \(O(n^3)\) and the complexity of computing the smallest number of switches needed is \(O(n^3 \log n)\), with \(n\) the dimension of each individual subsystem.
The E-Rocket: Low-cost Testbed for TVC Rocket GNC Validation
This paper presents the E-Rocket, an electric-powered, low-cost rocket prototype for validation of Guidance, Navigation & Control (GNC) algorithms based on Thrust Vector Control (TVC). Relying on commercially available components and 3D printed parts, a pair of contra-rotating DC brushless motors is assembled on a servo-actuated gimbal mechanism that provides thrust vectoring capability. A custom avionics hardware and software stack is developed considering a dual computer setup which leverages the capabilities of the PX4 autopilot and the modularity of ROS 2 to accommodate for tailored GNC algorithms. The platform is validated in an indoor motion-capture arena using a baseline PID-based trajectory tracking controller. Results demonstrate accurate trajectory tracking and confirm the suitability of the E-Rocket as a versatile testbed for rocket GNC algorithms.
comment: This work has been submitted to IFAC for possible publication
Leveraging Port-Hamiltonian Theory for Impedance Control Benchmarking
This work proposes PH-based metrics for benchmarking impedance control. A causality-consistent PH model is introduced for mass-spring-damper impedance in Cartesian space. Based on this model, a differentiable, force-torque sensing-independent, n-DoF passivity condition is derived, valid for time-varying references. An impedance fidelity metric is also defined from step-response power in free motion, capturing dynamic decoupling. The proposed metrics are validated in Gazebo simulations with a six-DoF manipulator and a quadruped leg. Results demonstrate the suitability of the PH framework for standardized impedance control benchmarking.
comment: This is the author's version of the paper accepted for publication in the 2025 International Conference on Advanced Robotics (ICAR). The final version will be available at IEEE Xplore
Demographic Dependence of Vaccine Adoption under Opinion Persuasion
Inspired by contagion models of social belief formation, we develop an epistemically-informed modeling framework, SIS-Vo, in which vaccine-related information propagates on a signed opinion network. Our model allows for heterogeneous treatment effects of policy messages across subpopulations through demographic-specific responses. We derive fixed-point characterizations of the healthy (disease-free) and endemic equilibria of this model, and obtain conditions for local stability of the healthy state in terms of the contact network and opinion-dependent vaccination capacities. Using numerical simulations, we illustrate how suitably targeted policy interventions, acting through opinion dynamics, can stabilize the epidemic process by moving the system towards the healthy regime. The SIS-Vo framework thus provides a natural basis for control-theoretic analysis of vaccination policies that remain robust even when misinformation targets specific subgroups.
comment: 6 pages, 4 figures
Data Generation for Stability Studies of Power Systems with High Penetration of Inverter-Based Resources
The increasing penetration of inverter-based resources (IBRs) is fundamentally reshaping power system dynamics and creating new challenges for stability assessment. Data-driven approaches, and in particular machine learning models, require large and representative datasets that capture how system stability varies across a wide range of operating conditions and control settings. This paper presents an open-source, high-performance computing framework for the systematic generation of such datasets. The proposed tool defines a scalable operating space for large-scale power systems, explores it through an adaptive sampling strategy guided by sensitivity analysis, and performs small-signal stability assessments to populate a high-information-content dataset. The framework efficiently targets regions near the stability margin while maintaining broad coverage of feasible operating conditions. The workflow is fully implemented in Python and designed for parallel execution. The resulting tool enables the creation of high-quality datasets that support data-driven stability studies in modern power systems with high IBR penetration.
The Missing Variable: Socio-Technical Alignment in Risk Evaluation
This paper addresses a critical gap in the risk assessment of AI-enabled safety-critical systems. While these systems, where AI systems assist human operators, function as complex socio-technical systems, existing risk evaluation methods fail to account for the complex interactions among human, technical, and organizational elements. Through a comparative analysis of system attributes from both socio-technical and AI-enabled systems and a review of current risk evaluation methods, we confirm the absence of socio-technical considerations in standard risk expressions. To bridge this gap, we introduce a novel socio-technical alignment $STA$ variable designed to be integrated into the foundational risk equation. This variable estimates the degree of harmonious interaction among the AI systems, human operators, and organizational processes. A case study on an AI-enabled liquid hydrogen bunkering system demonstrates the variable's relevance. By comparing a naive and a safeguarded system design, we illustrate how the $STA$-augmented expression captures socio-technical safety implications that traditional risk evaluation overlooks, providing a more holistic basis for risk evaluation.
Stabilizing Rate of Stochastic Control Systems with Multiplicative Noise
This paper develops a quantitative framework for analyzing the mean-square exponential stabilization of stochastic linear systems with multiplicative noise, focusing specifically on the optimal stabilizing rate, which characterizes the fastest exponential stabilization achievable under admissible control policies. Our contributions are twofold. First, we extend norm-based techniques from deterministic switched systems to the stochastic setting, deriving a verifiable necessary and sufficient condition for the exact attainability of the optimal stabilizing rate, together with computable upper and lower bounds. Second, by restricting attention to state-feedback policies, we reformulate the optimal stabilizing rate problem as an optimal control problem with a nonlinear cost function and derive a Bellman-type equation. Since this Bellman-type equation is not directly tractable, we recast it as a nonlinear matrix eigenvalue problem whose valid solutions require strictly positive-definite matrices. To ensure the existence of such solutions, we introduce a regularization scheme and develop a Regularized Normalized Value Iteration (RNVI) algorithm, which in turn generates strictly positive-definite fixed points for a perturbed version of the original nonlinear matrix eigenvalue problem while producing feedback controllers. Evaluating these regularized solutions further yields certified lower and upper bounds for the optimal stabilizing rate, resulting in a constructive and verifiable framework for determining the fastest achievable mean-square stabilization under multiplicative noise.
comment: 30 pages
Defending Event-Triggered Systems against Out-of-Envelope Environments
The design of real-time systems is based on assumptions about environmental conditions in which they will operate. We call this their safe operational envelope. Violation of these assumptions, i.e., out-of-envelope environments, can jeopardize timeliness and safety of real-time systems, e.g., by overwhelming them with interrupt storms. A long-lasting debate has been going on over which design paradigm, the time- or event-triggered, is more robust against such behavior. In this work, we investigate the claim that time-triggered systems are immune against out-of-envelope behavior and how event-triggered systems can be constructed to defend against being overwhelmed by interrupt showers. We introduce importance (independently of priority and criticality) as a means to express which tasks should still be scheduled in case environmental design assumptions cease to hold, draw parallels to mixed-criticality scheduling, and demonstrate how event-triggered systems can defend against out-of-envelope behavior.
comment: Published at the RTAutoSec Workshop, which was co-located to ECRTS 2025
Control-Oriented System Identification: Classical, Learning, and Physics-Informed Approaches
We survey classical, machine learning, and data-driven system identification approaches to learn control-relevant and physics-informed models of dynamical systems. Recently, machine learning approaches have enabled system identification from noisy, high-dimensional, and complex data. However, their utility is limited by their ability to provide provable guarantees on control-relevant properties. Meanwhile, control theory has identified several properties that are useful in analysis and control synthesis, such as dissipativity, monotonicity, energy conservation, and symmetry-preserving structures. We posit that merging system identification with such control-relevant or physics-informed properties can provide useful inductive bias, enhance explainability, enable control synthesis with provable guarantees, and improve sample complexity. We formulate system identification as an optimization problem where control-relevant properties can be enforced through direct parameterization (constraining the model structure to satisfy a desired property by construction), soft constraints (encouraging control-relevant properties through regularization or penalty terms), and hard constraints (imposing control-relevant properties as constraints in the optimization problem). Through this lens, we survey methods to learn physics-informed and control-relevant models spanning classical linear and nonlinear system identification, machine learning approaches, and direct identification through data-driven and behavioral representations. We also provide several expository examples that are accompanied by code and brief tutorials on a public Github repository. We also describe challenging directions for future research, including identification in networked, switched, and time-varying systems, experiment design, and bridging the gaps between data-driven, learning-based, and control-oriented approaches.
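To ground the three enforcement styles in the simplest setting, the sketch below fits a discrete-time linear model by least squares and then encourages stability either through a soft ridge penalty or a hard spectral-radius projection; the system, penalty weight, and radius bound are illustrative assumptions, and the crude projection is a generic device rather than a method advocated by the survey.

```python
import numpy as np

rng = np.random.default_rng(1)

# Data from a stable two-state linear system with one input (matrices assumed).
A_true = np.array([[0.8, 0.1], [0.0, 0.9]])
B_true = np.array([[1.0], [0.5]])
X = np.zeros((2, 201))
U = rng.normal(size=(1, 200))
for t in range(200):
    X[:, t + 1] = A_true @ X[:, t] + B_true @ U[:, t] + 0.05 * rng.normal(size=2)

Z = np.vstack([X[:, :-1], U])      # regressors [x_t; u_t]
Y = X[:, 1:]                       # targets x_{t+1}

# (1) Unconstrained least squares: Theta = [A B].
Theta_ls = Y @ Z.T @ np.linalg.inv(Z @ Z.T)

# (2) Soft constraint: a ridge penalty shrinking [A B], which biases the
#     identified dynamics toward a more contractive (stability-friendly) model.
lam = 5.0
Theta_soft = Y @ Z.T @ np.linalg.inv(Z @ Z.T + lam * np.eye(3))

# (3) Hard constraint: project the identified A onto a spectral-radius bound
#     (a crude radial scaling; a no-op if the bound already holds, as is likely here).
def project_stable(Theta, rho_max=0.95):
    A, B = Theta[:, :2].copy(), Theta[:, 2:]
    rho = max(abs(np.linalg.eigvals(A)))
    if rho > rho_max:
        A *= rho_max / rho
    return np.hstack([A, B])

for name, Th in [("LS", Theta_ls), ("soft", Theta_soft), ("hard", project_stable(Theta_ls))]:
    print(name, "spectral radius of A_hat:",
          round(float(max(abs(np.linalg.eigvals(Th[:, :2])))), 3))
```

Direct parameterization, the third style discussed above, would instead restrict the model class so that every candidate satisfies the property by construction, with no penalty or projection needed.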
A Hybrid Physics-Based and Reinforcement Learning Framework for Electric Vehicle Charging Time Prediction
In this paper, we develop a hybrid prediction framework for accurate electric vehicle (EV) charging time estimation, a capability that is critical for trip planning, user satisfaction, and efficient operation of charging infrastructure. We combine a physics-based analytical model with a reinforcement learning (RL) approach. The analytical component captures the nonlinear constant-current/constant-voltage (CC--CV) charging dynamics and explicitly models state-of-health (SoH)--dependent capacity and power fade, providing a reliable baseline when historical data are limited. Building on this foundation, we introduce an RL component that progressively refines charging-time predictions as operational data accumulate, enabling improved long-term adaptation. Both models incorporate SoH degradation to maintain predictive accuracy over the battery lifetime. We evaluate the framework using $5{,}000$ simulated charging sessions calibrated to manufacturer specifications and publicly available EV charging datasets. Our results show that the analytical model achieves $R^{2}=98.5\%$ and $\mathrm{MAPE}=2.1\%$, while the RL model further improves performance to $R^{2}=99.2\%$ and $\mathrm{MAPE}=1.6\%$, corresponding to a $23\%$ accuracy gain and $35\%$ improved robustness to battery aging.
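A minimal sketch of the kind of analytical CC-CV baseline described above: constant current up to a transition state of charge, then exponentially decaying current in the CV phase, with capacity scaled by a state-of-health factor. The transition point, time constant, and SoH scaling here are illustrative assumptions, not the paper's calibrated model.

```python
import numpy as np

def cc_cv_charge_time(soc0, soc_target, capacity_ah, i_cc, soh=1.0,
                      soc_cv=0.8, tau_cv_h=0.5):
    """Estimate charging time (hours) for a simplified CC-CV profile.

    CC phase: constant current i_cc until soc_cv; CV phase: current tapers
    exponentially with time constant tau_cv_h. Capacity fades with soh.
    """
    cap = capacity_ah * soh
    t = 0.0
    soc = soc0
    # Constant-current phase (linear SOC growth).
    if soc < soc_cv and soc_target > soc:
        soc_end = min(soc_target, soc_cv)
        t += (soc_end - soc) * cap / i_cc
        soc = soc_end
    # Constant-voltage phase (exponential approach to full charge):
    # SOC(t) = 1 - (1 - soc) * exp(-t / tau) under an exponential taper.
    if soc_target > soc:
        t += tau_cv_h * np.log((1.0 - soc) / (1.0 - soc_target))
    return t

if __name__ == "__main__":
    print(cc_cv_charge_time(soc0=0.2, soc_target=0.95,
                            capacity_ah=60.0, i_cc=30.0, soh=0.9))
```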
Distributionally Robust Kalman Filter
In this work, we propose a noise-centric formulation of the distributionally robust Kalman filter (DRKF) for discrete-time linear stochastic systems with uncertain noise statistics. By placing Wasserstein ambiguity sets directly on the process and measurement noise distributions, the proposed DRKF preserves the analytical structure of the classical Kalman filter while providing a priori spectral bounds on all feasible covariances. In the time-invariant setting, we derive a steady-state DRKF from a single stationary semidefinite program, yielding a constant-gain estimator with the same per-step computational complexity as the standard Kalman filter. We establish conditions guaranteeing the existence, uniqueness, and convergence of this steady-state solution, and we prove its asymptotic minimax optimality with respect to the worst-case mean-square error. Numerical experiments validate the theory and demonstrate that the proposed DRKF improves estimation accuracy under unknown or uncertain noise models while offering computational advantages over existing robust and distributionally robust filters.
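The sketch below shows a plain Kalman predict/update step in which the nominal noise covariances are replaced by inflated surrogates, a crude stand-in for the worst-case covariance bounds described above. The inflation factors are arbitrary illustrative choices; the actual DRKF gain comes from the stationary semidefinite program over Wasserstein ambiguity sets, which is not reproduced here.

```python
import numpy as np

def robust_kf_step(x, P, y, A, C, Q_nom, R_nom, q_infl=1.5, r_infl=1.5):
    """One Kalman step with inflated noise covariances (an illustrative
    surrogate for a distributionally robust gain; the true DRKF solves an
    SDP over Wasserstein ambiguity sets on the noise distributions)."""
    Q, R = q_infl * Q_nom, r_infl * R_nom
    # Predict.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new

if __name__ == "__main__":
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    C = np.array([[1.0, 0.0]])
    x, P = np.zeros(2), np.eye(2)
    x, P = robust_kf_step(x, P, np.array([0.3]), A, C,
                          0.01 * np.eye(2), 0.1 * np.eye(1))
    print(x)
```

Note that the per-step cost is the same as the standard Kalman filter, which is the structural property the abstract highlights for the steady-state DRKF.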
Optimal Platoon Formation and Stable Benefit Allocation in Mixed-Energy Truck Fleets under Size Limitations
In this paper, we investigate cooperative platoon formation and benefit allocation in mixed-energy truck fleets composed of both electric and fuel-powered trucks. The central challenge arises from the platoon-size constraint, which limits the number of trucks permitted in each platoon and introduces combinatorial coupling into the search for optimal platoon formation structures. We formulate this problem as a coalitional game with bounded coalition sizes and derive a closed-form characterization of the optimal coalition structure that maximizes the fleet-wide platooning benefit. Building on this structure, we develop a type-based least-core payoff allocation scheme that guarantees stability within the coalition-structure core (CS-core). For cases in which the CS-core is empty, we compute the least-core radius to determine the minimal relaxation required to achieve approximate stability. Through numerical studies, we demonstrate that the proposed framework consistently achieves the highest total platooning benefit among all feasible formation configurations while providing stable benefit allocations that outperform existing baseline methods.
Scale-free weak output synchronization of multi-agent systems with adaptive protocols
In this paper, we study output synchronization for multi-agent systems. The objective is to design a protocol that depends only on the agent dynamics and does not require any knowledge of the network. If the network has a directed spanning tree, then the protocols designed in this paper achieve classical output synchronization. Otherwise, the protocol achieves weak synchronization, which is induced by network stability in the sense that the signals exchanged over the network converge to zero. Weak synchronization is explained in detail in this paper. Even though we consider linear agents, it is known that this in general requires nonlinear protocols. In this paper we use adaptive protocols. In the literature, two classes of protocols are commonly considered: collaborative protocols (with additional communication between the protocols) and non-collaborative protocols (sometimes referred to as fully decentralized, where the additional communication is not present). This paper considers both cases.
comment: This paper was submitted to the European Journal of Control on December 1st, 2025
A Physics-Informed Fixed Skyroad Model for Continuous UAS Traffic Management (C-UTM)
Unlike traditional multi-agent coordination frameworks, which assume a fixed number of agents, UAS traffic management (UTM) requires a platform that enables Uncrewed Aerial Systems (UAS) to freely enter or exit constrained low-altitude airspace. Consequently, the number of UAS operating in a given region is time-varying, with vehicles dynamically joining or leaving even in dense, obstacle-laden environments. The primary goal of this paper is to develop a computationally efficient management system that maximizes airspace usability while ensuring safety and efficiency. To achieve this, we first introduce physics-informed methods to structure fixed skyroads across multiple altitude layers of urban airspace, with the directionality of each skyroad designed to guarantee full reachability. We then present a novel Continuous UTM (C-UTM) framework that optimally allocates skyroads to UAS requests while accounting for the time-varying capacity of the airspace. Collectively, the proposed model addresses the key challenges of low-altitude UTM by providing a scalable, safe, and efficient solution for urban airspace usability.
AC/DC Frequency-Dependent Power Flow Jacobian: Quantifying Grid Support and Stability Implications
This letter proposes an AC/DC frequency-dependent power flow Jacobian analysis to identify system support capabilities. The analysis further reveals that system support capabilities do not necessarily enhance the system stability margin, suggesting that technical requirements based on narrow-frequency-band, AC-side-focused specifications may not lead to the expected performance of grid-forming (GFM) converters.
comment: 3 pages, 5 figures
A Methodology for Process Design Kit Re-Centering Using TCAD and Experimental Data for Cryogenic Temperatures
In this work, we describe and demonstrate a novel Technology Computer Aided Design (TCAD) driven methodology to re-center room-temperature Process Design Kits (PDKs) for cryogenic operation using a limited set of experimental measurements. Unlike previous approaches that relied on direct fitting of sparse measurements, our technique accounts for process-induced deviations by calibrating TCAD models to both room-temperature and cryogenic data. Compact models for all process corners are extracted from TCAD-generated target characteristics, enabling accurate cryogenic modeling without dedicated foundry support. This scalable, technology-independent method provides a practical path for cryogenic circuit design.
Collaborative Drill Alignment in Surgical Robotics
Robotic assistance allows surgeries to be reliably and accurately executed while still under direct supervision of the surgeon, combining the strengths of robotic technology with the surgeon's expertise. This paper describes a robotic system designed to assist in surgical procedures by implementing a virtual drill guide. The system integrates virtual-fixture functionality using a novel virtual-mechanism controller with additional visual feedback. The controller constrains the tool to the desired axis, while allowing axial motion to remain under the surgeon's control. Compared to prior virtual-fixture approaches -- which primarily perform pure energy-shaping and damping injection with linear springs and dampers -- our controller uses a virtual prismatic joint to which the robot is constrained by nonlinear springs, allowing us to easily shape the dynamics of the system. We detail the calibration procedures required to achieve sufficient precision, and describe the implementation of the controller. We apply this system to a veterinary procedure: drilling for transcondylar screw placement in dogs. The results of the trials on 3D-printed bone models demonstrate sufficient precision to perform the procedure and suggest improved angular accuracy and reduced exit translation errors compared to patient-specific guides (PSGs). Discussion and future improvements follow.
comment: 14 pages, 12 figures, accepted to IEEE Transactions on Control Systems Technology
Revealing POMDPs: Qualitative and Quantitative Analysis for Parity Objectives AAAI 2026
Partially observable Markov decision processes (POMDPs) are a central model for uncertainty in sequential decision making. The most basic objective is the reachability objective, where a target set must be eventually visited, and the more general parity objectives can model all omega-regular specifications. For such objectives, the computational analysis problems are the following: (a) qualitative analysis that asks whether the objective can be satisfied with probability 1 (almost-sure winning) or probability arbitrarily close to 1 (limit-sure winning); and (b) quantitative analysis that asks for the approximation of the optimal probability of satisfying the objective. For general POMDPs, almost-sure analysis for reachability objectives is EXPTIME-complete, but limit-sure and quantitative analyses for reachability objectives are undecidable; almost-sure, limit-sure, and quantitative analyses for parity objectives are all undecidable. A special class of POMDPs, called revealing POMDPs, has been studied recently in several works, and for this subclass the almost-sure analysis for parity objectives was shown to be EXPTIME-complete. In this work, we show that for revealing POMDPs the limit-sure analysis for parity objectives is EXPTIME-complete, and even the quantitative analysis for parity objectives can be achieved in EXPTIME.
comment: Conference AAAI 2026
A New Framework for Bounding Reachability Probabilities of Continuous-time Stochastic Systems
This manuscript presents an innovative framework for constructing barrier functions to bound reachability probabilities for continuous-time stochastic systems described by stochastic differential equations (SDEs). The reachability probabilities considered in this paper encompass two aspects: the probability of reaching a set of specified states within a predefined finite time horizon, and the probability of reaching a set of specified states at a particular time instant. The barrier functions presented in this manuscript are developed either by relaxing a parabolic partial differential equation that characterizes the exact reachability probability or by applying Grönwall's inequality. In comparison to the prevailing construction method, which relies on Doob's non-negative supermartingale inequality (or Ville's inequality), the proposed barrier functions provide stronger alternatives, complement existing methods, or fill gaps.
comment: To Appear in Nonlinear Analysis: Hybrid Systems
Jointly Computation- and Communication-Efficient Distributed Learning
We address distributed learning problems over undirected networks. Specifically, we focus on designing a novel ADMM-based algorithm that is jointly computation- and communication-efficient. Our design guarantees computational efficiency by allowing agents to use stochastic gradients during local training. Moreover, communication efficiency is achieved as follows: i) the agents perform multiple training epochs between communication rounds, and ii) compressed transmissions are used. We prove exact linear convergence of the algorithm in the strongly convex setting. We corroborate our theoretical results by numerical comparisons with state-of-the-art techniques on a classification task.
comment: To be presented at 2025 IEEE Conference on Decision and Control
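A much-simplified sketch of the two communication-saving ingredients named above, multiple local epochs between communication rounds and compression of the transmitted update, applied here to plain decentralized averaging. The top-k compressor, the mixing weights, and the toy quadratic losses are illustrative; the actual algorithm is ADMM-based and carries linear-convergence guarantees that this loop does not reproduce.

```python
import numpy as np

def top_k(v, k):
    """Keep only the k largest-magnitude entries (simple compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def decentralized_round(params, grads_fn, neighbors, lr=0.1,
                        local_epochs=5, k=10):
    """One communication round: local stochastic-gradient epochs, then
    compressed exchange and neighbor averaging (toy stand-in for ADMM)."""
    # Local training with stochastic gradients.
    new_params = []
    for i, w in enumerate(params):
        for _ in range(local_epochs):
            w = w - lr * grads_fn(i, w)
        new_params.append(w)
    # Compressed communication and mixing with neighbors.
    msgs = [top_k(w, k) for w in new_params]
    mixed = []
    for i, w in enumerate(new_params):
        nbr_avg = np.mean([msgs[j] for j in neighbors[i]], axis=0)
        mixed.append(0.5 * w + 0.5 * nbr_avg)
    return mixed

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = [rng.normal(size=20) for _ in range(3)]
    grads_fn = lambda i, w: (w - targets[i]) + 0.01 * rng.normal(size=20)
    params = [np.zeros(20) for _ in range(3)]
    neighbors = {0: [1], 1: [0, 2], 2: [1]}
    for _ in range(30):
        params = decentralized_round(params, grads_fn, neighbors)
    print(np.round(params[0][:5], 2))
```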
Optimizing Age of Information in Networks with Large and Small Updates
Modern sensing and monitoring applications typically consist of sources transmitting updates of different sizes, ranging from a few bytes (position, temperature, etc.) to multiple megabytes (images, video frames, LIDAR point scans, etc.). Existing approaches to wireless scheduling for information freshness typically ignore this mix of large and small updates, leading to suboptimal performance. In this paper, we consider a single-hop wireless broadcast network with sources transmitting updates of different sizes to a base station over unreliable links. Some sources send large updates spanning many time slots while others send small updates spanning only a few time slots. Due to medium access constraints, only one source can transmit to the base station at any given time, thus requiring careful design of scheduling policies that takes the sizes of updates into account. First, we derive a lower bound on the achievable Age of Information (AoI) by any transmission scheduling policy. Second, we develop optimal randomized policies that consider both switching and no-switching during the transmission of large updates. Third, we introduce a novel Lyapunov function and associated analysis to propose an AoI-based Max-Weight policy that has provable constant factor optimality guarantees. Finally, we evaluate and compare the performance of our proposed scheduling policies through simulations, which show that our Max-Weight policy achieves near-optimal AoI performance.
comment: To appear in WiOpt 2025
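A minimal simulation sketch of an AoI-based max-weight scheduler in the spirit described above: each slot, the base station serves the source with the largest age-based weight, discounted by its update size and link reliability, and a large update must be delivered over several successful slots before the age resets. The weight formula and the no-switching assumption are illustrative stand-ins, not the Lyapunov-derived policy from the paper.

```python
import numpy as np

def simulate_maxweight_aoi(sizes, p_success, horizon=10_000, seed=0):
    """Single-hop broadcast: one source transmits per slot over an
    unreliable link; sizes[i] slots are needed to deliver one update of
    source i. Returns the time-average AoI per source."""
    rng = np.random.default_rng(seed)
    n = len(sizes)
    age = np.ones(n)                      # AoI at the destination
    progress = np.zeros(n, dtype=int)     # delivered slots of in-flight update
    aoi_sum = np.zeros(n)
    current, start = None, 0              # source being served (no switching)
    for t in range(horizon):
        if current is None:
            # Max-weight index: age weighted by expected delivery effort.
            weights = age * np.asarray(p_success) / np.asarray(sizes)
            current = int(np.argmax(weights))
            start = t                     # update generated when service starts
        if rng.random() < p_success[current]:
            progress[current] += 1
            if progress[current] == sizes[current]:
                age[current] = t - start  # age of the just-delivered update
                progress[current] = 0
                current = None
        age += 1.0
        aoi_sum += age
    return aoi_sum / horizon

if __name__ == "__main__":
    print(simulate_maxweight_aoi(sizes=[1, 1, 8], p_success=[0.9, 0.7, 0.8]))
```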
Observer Design for Networked Linear Systems with Fast and Slow Dynamics under Measurement Noise
This paper addresses the emulation-based observer design for networked control systems (NCS) with linear plants that operate at two time scales in the presence of measurement noise. The system is formulated as a hybrid singularly perturbed dynamical system, enabling the systematic use of singular perturbation techniques to derive explicit bounds on the maximum allowable transmission intervals (MATI) for both fast and slow communication channels. Under the resulting conditions, the proposed observer guarantees that the estimation error satisfies a global exponential derivative-input-to-state stability (DISS)-like property, where the ultimate bound scales proportionally with the magnitudes of the measurement noise and the time derivative of the control input. The effectiveness of the approach is illustrated through a numerical example.
comment: 9 pages, 2 figures, full version of the paper submitted to the IFAC World Congress
Distributed Traffic Signal Control of Interconnected Intersections: A Two-Lane Traffic Network Model
In this paper, we investigate traffic signal control in a network of interconnected intersections, aiming to balance lane-level vehicle densities through optimal green-time allocation. We develop a two-lane traffic flow model that explicitly captures lane-specific propagation dynamics, addressing key limitations of conventional road-level formulations. The proposed model offers a more granular and flexible representation of urban traffic, enabling controllers to react more accurately to lane-specific congestion patterns. Building on this model, we design a distributed model predictive control (MPC) framework and integrate it with the efficient alternating direction method of multipliers (ADMM) to enhance scalability and real-time performance. To accommodate time-varying traffic conditions, we further introduce a data-driven method for forecasting dynamic split ratios. Comprehensive VISSIM simulations on a six-intersection network in Dalian, China, demonstrate that the proposed approach outperforms existing signal control strategies in both traffic efficiency and computational speed, showing its promise for real-time deployment.
comment: conference
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions NeurIPS 2025
Establishing stability certificates for closed-loop systems under reinforcement learning (RL) policies is essential to move beyond empirical performance and offer guarantees of system behavior. Classical Lyapunov methods require a strict stepwise decrease in the Lyapunov function but such certificates are difficult to construct for learned policies. The RL value function is a natural candidate but it is not well understood how it can be adapted for this purpose. To gain intuition, we first study the linear quadratic regulator (LQR) problem and make two key observations. First, a Lyapunov function can be obtained from the value function of an LQR policy by augmenting it with a residual term related to the system dynamics and stage cost. Second, the classical Lyapunov decrease requirement can be relaxed to a generalized Lyapunov condition requiring only decrease on average over multiple time steps. Using this intuition, we consider the nonlinear setting and formulate an approach to learn generalized Lyapunov functions by augmenting RL value functions with neural network residual terms. Our approach successfully certifies the stability of RL policies trained on Gymnasium and DeepMind Control benchmarks. We also extend our method to jointly train neural controllers and stability certificates using a multi-step Lyapunov loss, resulting in larger certified inner approximations of the region of attraction compared to the classical Lyapunov approach. Overall, our formulation enables stability certification for a broad class of systems with learned policies by making certificates easier to construct, thereby bridging classical control theory and modern learning-based methods.
comment: NeurIPS 2025
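A minimal sketch of the multi-step, decrease-on-average Lyapunov condition described above, written as a training loss over sampled trajectory segments. The candidate combines a value network with a neural residual term, as the abstract suggests; the margin, horizon, MLP sizes, and the positivity penalty are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn as nn

class LyapunovCandidate(nn.Module):
    """V(x) = value_net(x) + residual_net(x), both small MLPs."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.value_net = nn.Sequential(nn.Linear(state_dim, hidden),
                                       nn.Tanh(), nn.Linear(hidden, 1))
        self.residual_net = nn.Sequential(nn.Linear(state_dim, hidden),
                                          nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.value_net(x) + self.residual_net(x)

def multistep_lyapunov_loss(V, traj, horizon=5, margin=1e-3):
    """traj: (batch, T, state_dim) rollouts under the fixed RL policy.

    Penalizes segments where the average of V over the next `horizon`
    steps fails to drop below V at the segment start (relaxed decrease),
    plus a positivity penalty away from the origin."""
    b, T, d = traj.shape
    v = V(traj.reshape(-1, d)).reshape(b, T)
    v0 = v[:, 0]
    v_future = v[:, 1:horizon + 1].mean(dim=1)
    decrease = torch.relu(v_future - v0 + margin).mean()
    positivity = torch.relu(1e-2 * traj.norm(dim=-1) - v).mean()
    return decrease + positivity

if __name__ == "__main__":
    V = LyapunovCandidate(state_dim=4)
    traj = torch.randn(32, 10, 4)          # placeholder rollouts
    print(multistep_lyapunov_loss(V, traj).item())
```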
Constrained Parameter Update Law for Adaptive Control
In this paper, a constrained parameter update law is derived in the context of adaptive control. The parameter update law is based on a constrained optimization technique in which a Lagrangian is formulated to incorporate the constraints on the parameters using an inverse barrier function. The constrained parameter update law is used to develop an adaptive tracking controller, and the overall stability of the adaptive controller together with the constrained parameter update law is shown using Lyapunov analysis and developments in the stability of constrained primal-dual dynamics. The performance of the constrained parameter update law is tested in simulation, demonstrating that the parameters remain within their constraints and converge to the true parameters.
Learning Enhanced Ensemble Filters
The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting particles, employing a Gaussian ansatz for the joint distribution of the state and observation at each observation time. These methods are robust, but the Gaussian ansatz limits accuracy. Here this shortcoming is addressed by using machine learning to map the joint predicted state and observation to the updated state estimate. The derivation of methods from a mean-field formulation of the true filtering distribution suggests a single parametrization of the algorithm that can be deployed at different ensemble sizes, and we use a mean-field formulation of the ensemble Kalman filter as an inductive bias for our architecture. To develop this perspective, in which the mean-field limit of the algorithm and finite interacting-ensemble particle approximations share a common set of parameters, a novel form of neural operator is introduced that takes probability distributions as input: a measure neural mapping (MNM). An MNM is used to design a novel approach to filtering, the MNM-enhanced ensemble filter (MNMEF), which is defined both in the mean-field limit and for interacting ensemble particle approximations. The ensemble approach uses empirical measures as input to the MNM and is implemented using the set transformer, which is invariant to ensemble permutation and allows for different ensemble sizes. In practice, fine-tuning a small number of parameters for specific ensemble sizes further enhances the accuracy of the scheme. The promise of the approach is demonstrated by its superior root-mean-square-error performance relative to leading methods in filtering the Lorenz '96 and Kuramoto-Sivashinsky models.
comment: Accepted by the Journal of Computational Physics
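For context, the baseline being generalized above is the stochastic EnKF analysis step, which maps the predicted ensemble and perturbed observations to an updated ensemble through empirical covariances under the Gaussian ansatz. The sketch below implements that standard update only; the learned measure-neural-mapping replacement is not shown.

```python
import numpy as np

def enkf_analysis(X_pred, y, H, R, rng):
    """Stochastic EnKF analysis update.

    X_pred : (n, N) predicted state ensemble (N members)
    y      : (m,)  observation
    H      : (m, n) observation operator
    R      : (m, m) observation noise covariance
    """
    n, N = X_pred.shape
    # Empirical covariance from ensemble anomalies.
    anomalies = X_pred - X_pred.mean(axis=1, keepdims=True)
    C = anomalies @ anomalies.T / (N - 1)
    # Kalman gain built from the empirical covariance (Gaussian ansatz).
    K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)
    # Perturbed observations, one per ensemble member.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X_pred + K @ (Y - H @ X_pred)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_pred = rng.normal(size=(3, 50))
    H = np.array([[1.0, 0.0, 0.0]])
    X_new = enkf_analysis(X_pred, np.array([0.5]), H, 0.1 * np.eye(1), rng)
    print(X_new.mean(axis=1))
```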
Robotics
Training-Time Action Conditioning for Efficient Real-Time Chunking
Real-time chunking (RTC) enables vision-language-action models (VLAs) to generate smooth, reactive robot trajectories by asynchronously predicting action chunks and conditioning on previously committed actions via inference-time inpainting. However, this inpainting method introduces computational overhead that increases inference latency. In this work, we propose a simple alternative: simulating inference delay at training time and conditioning on action prefixes directly, eliminating any inference-time overhead. Our method requires no modifications to the model architecture or robot runtime, and can be implemented with only a few additional lines of code. In simulated experiments, we find that training-time RTC outperforms inference-time RTC at higher inference delays. In real-world experiments on box building and espresso making tasks with the $\pi_{0.6}$ VLA, we demonstrate that training-time RTC maintains both task performance and speed parity with inference-time RTC while being computationally cheaper. Our results suggest that training-time action conditioning is a practical drop-in replacement for inference-time inpainting in real-time robot control.
SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models
Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises from training VLMs on static internet-scale visual-language data that contain no causal interactions or action-conditioned changes. Consequently, it remains challenging to leverage VLMs for fine-grained robotic manipulation tasks that require physical understanding, reasoning, and corresponding action planning. To overcome this, we present SIMPACT, a test-time, SIMulation-enabled ACTion Planning framework that equips VLMs with physical reasoning through simulation-in-the-loop world modeling, without requiring any additional training. From a single RGB-D observation, SIMPACT efficiently constructs physics simulations, enabling the VLM to propose informed actions, observe simulated rollouts, and iteratively refine its reasoning. By integrating language reasoning with physics prediction, our simulation-enabled VLM can understand contact dynamics and action outcomes in a physically grounded way. Our method demonstrates state-of-the-art performance on five challenging, real-world rigid-body and deformable manipulation tasks that require fine-grained physical reasoning, outperforming existing general-purpose robotic manipulation models. Our results demonstrate that embedding physics understanding via efficient simulation into VLM reasoning at test time offers a promising path towards generalizable embodied intelligence. Project webpage can be found at https://simpact-bot.github.io
Correspondence-Oriented Imitation Learning: Flexible Visuomotor Control with 3D Conditioning
We introduce Correspondence-Oriented Imitation Learning (COIL), a conditional policy learning framework for visuomotor control with a flexible task representation in 3D. At the core of our approach, each task is defined by the intended motion of keypoints selected on objects in the scene. Instead of assuming a fixed number of keypoints or uniformly spaced time intervals, COIL supports task specifications with variable spatial and temporal granularity, adapting to different user intents and task requirements. To robustly ground this correspondence-oriented task representation into actions, we design a conditional policy with a spatio-temporal attention mechanism that effectively fuses information across multiple input modalities. The policy is trained via a scalable self-supervised pipeline using demonstrations collected in simulation, with correspondence labels automatically generated in hindsight. COIL generalizes across tasks, objects, and motion patterns, achieving superior performance compared to prior methods on real-world manipulation tasks under both sparse and dense specifications.
Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception
Common approaches to explainable AI (XAI) for deep learning focus on analyzing the importance of input features for the classification task in a given model: saliency methods like SHAP and GradCAM are used to measure the impact of spatial regions of the input image on the classification result. Combined with ground-truth information about the location of the object in the input image (e.g., a binary mask), it is determined whether object pixels had a high impact on the classification result, or whether the classification focused on background pixels. The former is considered a sign of a healthy classifier, whereas the latter is assumed to suggest overfitting on spurious correlations. A major challenge, however, is that these intuitive interpretations are difficult to test quantitatively, and hence the output of such explanations lacks an explanation itself. One particular reason is that correlations in real-world data are difficult to avoid, and whether they are spurious or legitimate is debatable. Synthetic data, in turn, make it possible to actively enable or disable correlations where desired, but often lack a sufficient quantification of realism and stochastic properties. [...] Therefore, we systematically generate six synthetic datasets for the task of traffic sign recognition, which differ only in their degree of camera variation and background correlation [...] to quantify the isolated influence of background correlation, different levels of camera variation, and considered traffic sign shapes on the classification performance, as well as background feature importance. [...] Results include a quantification of when and how much background features gain importance to support the classification task based on changes in the training domain [...]. Download: synset.de/datasets/synset-signset-ger/background-effect
comment: 8 pages, 2 figures, 7 tables
Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition
In this paper, we present a synthesis pipeline and dataset for training/testing data in the task of traffic sign recognition that combines the advantages of data-driven and analytical modeling: GAN-based texture generation enables data-driven dirt and wear artifacts, rendering unique and realistic traffic sign surfaces, while the analytical scene modulation achieves physically correct lighting and allows detailed parameterization. In particular, the latter opens up applications in the context of explainable AI (XAI) and robustness tests due to the possibility of evaluating the sensitivity to parameter changes, which we demonstrate with experiments. Our resulting synthetic traffic sign recognition dataset Synset Signset Germany contains a total of 105,500 images of 211 different German traffic sign classes, including newly published (2020) and thus comparatively rare traffic signs. In addition to a mask and a segmentation image, we also provide extensive metadata including the stochastically selected environment and imaging effect parameters for each image. We evaluate the degree of realism of Synset Signset Germany on the real-world German Traffic Sign Recognition Benchmark (GTSRB) and in comparison to CATERED, a state-of-the-art synthetic traffic sign recognition dataset.
comment: 8 pages, 8 figures, 3 tables
Physically-Based Simulation of Automotive LiDAR
We present an analytic model for simulating automotive time-of-flight (ToF) LiDAR that includes blooming, echo pulse width, and ambient light, along with steps to determine model parameters systematically through optical laboratory measurements. The model uses physically based rendering (PBR) in the near-infrared domain. It assumes single-bounce reflections and retroreflections over rasterized rendered images from shading or ray tracing, including light emitted from the sensor as well as stray light from other, non-correlated sources such as sunlight. Beams from the sensor and sensitivity of the receiving diodes are modeled with flexible beam steering patterns and with non-vanishing diameter. Different (all non-real time) computational approaches can be chosen based on system properties, computing capabilities, and desired output properties. Model parameters include system-specific properties, namely the physical spread of the LiDAR beam, combined with the sensitivity of the receiving diode; the intensity of the emitted light; the conversion between the intensity of reflected light and the echo pulse width; and scenario parameters such as environment lighting, positioning, and surface properties of the target(s) in the relevant infrared domain. System-specific properties of the model are determined from laboratory measurements of the photometric luminance on different target surfaces aligned with a goniometer at 0.01° resolution, which marks the best available resolution for measuring the beam pattern. The approach is calibrated for and tested on two automotive LiDAR systems, the Valeo Scala Gen. 2 and the Blickfeld Cube 1. Both systems differ notably in their properties and available interfaces, but the relevant model parameters could be extracted successfully.
World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty
Recent advances in generative video models have led to significant breakthroughs in high-fidelity video synthesis, specifically in controllable video generation where the generated video is conditioned on text and action inputs, e.g., in instruction-guided video editing and world modeling in robotics. Despite these exceptional capabilities, controllable video models often hallucinate - generating future video frames that are misaligned with physical reality - which raises serious concerns in many tasks such as robot policy evaluation and planning. However, state-of-the-art video models lack the ability to assess and express their confidence, impeding hallucination mitigation. To rigorously address this challenge, we propose C3, an uncertainty quantification (UQ) method for training continuous-scale calibrated controllable video models for dense confidence estimation at the subpatch level, precisely localizing the uncertainty in each generated video frame. Our UQ method introduces three core innovations to empower video models to estimate their uncertainty. First, our method develops a novel framework that trains video models for correctness and calibration via strictly proper scoring rules. Second, we estimate the video model's uncertainty in latent space, avoiding training instability and prohibitive training costs associated with pixel-space approaches. Third, we map the dense latent-space uncertainty to interpretable pixel-level uncertainty in the RGB space for intuitive visualization, providing high-resolution uncertainty heatmaps that identify untrustworthy regions. Through extensive experiments on large-scale robot learning datasets (Bridge and DROID) and real-world evaluations, we demonstrate that our method not only provides calibrated uncertainty estimates within the training distribution, but also enables effective out-of-distribution detection.
A Residual Variance Matching Recursive Least Squares Filter for Real-time UAV Terrain Following
Accurate real-time waypoint estimation for UAV-based online terrain following during wildfire patrol missions is critical to ensuring flight safety and enabling wildfire detection. However, existing real-time filtering algorithms struggle to maintain accurate waypoints under measurement noise in nonlinear and time-varying systems, posing risks of flight instability and missed wildfire detections during UAV-based terrain following. To address this issue, a Residual Variance Matching Recursive Least Squares (RVM-RLS) filter, guided by a Residual Variance Matching Estimation (RVME) criterion, is proposed to adaptively estimate the real-time waypoints of nonlinear, time-varying UAV-based terrain following systems. The proposed method is validated using a UAV-based online terrain following system within a simulated terrain environment. Experimental results show that the RVM-RLS filter improves waypoint estimation accuracy by approximately 88$\%$ compared with benchmark algorithms across multiple evaluation metrics. These findings demonstrate both the methodological advances in real-time filtering and the practical potential of the RVM-RLS filter for UAV-based online wildfire patrol.
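A hedged sketch of the residual-variance-matching idea in the spirit of the abstract above: a recursive least-squares estimator whose forgetting factor is nudged so that the running variance of the prediction residuals tracks a target noise variance. The specific adaptation rule, gains, and bounds here are illustrative, not the paper's RVME criterion.

```python
import numpy as np

class RVMRLS:
    """Recursive least squares with a residual-variance-matched
    forgetting factor (illustrative adaptation rule)."""
    def __init__(self, dim, target_var, lam=0.98, gain=0.05):
        self.theta = np.zeros(dim)
        self.P = 1e3 * np.eye(dim)
        self.lam = lam
        self.target_var = target_var
        self.gain = gain
        self.res_var = target_var      # running residual-variance estimate

    def update(self, phi, y):
        err = y - phi @ self.theta
        # Running estimate of the residual variance.
        self.res_var = 0.95 * self.res_var + 0.05 * err**2
        # Nudge the forgetting factor so res_var tracks target_var:
        # larger-than-expected residuals -> forget faster (smaller lambda).
        self.lam = np.clip(
            self.lam - self.gain * (self.res_var - self.target_var),
            0.90, 0.999)
        # Standard RLS update with the adapted forgetting factor.
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)
        self.theta = self.theta + k * err
        self.P = (self.P - np.outer(k, Pphi)) / self.lam
        return self.theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    est = RVMRLS(dim=2, target_var=0.01)
    for t in range(500):
        phi = np.array([1.0, np.sin(0.01 * t)])
        y = phi @ np.array([2.0, -1.0]) + 0.1 * rng.normal()
        est.update(phi, y)
    print(est.theta)
```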
Optimal Safety-Aware Scheduling for Multi-Agent Aerial 3D Printing with Utility Maximization under Dependency Constraints
This article presents a novel coordination and task-planning framework to enable the simultaneous, conflict-free collaboration of multiple unmanned aerial vehicles (UAVs) for aerial 3D printing. The proposed framework formulates an optimization problem that takes as input a construction mission divided into sub-tasks and a team of autonomous UAVs, each with limited material volume and battery capacity. It generates an optimal mission plan comprising task assignments and scheduling while accounting for task dependencies arising from the geometric and structural requirements of the 3D design, inter-UAV safety constraints, material usage, and the total flight time of each UAV. Potential conflicts occurring during the simultaneous operation of the UAVs are addressed at a segment level by dynamically selecting the starting time and location of each task to guarantee collision-free parallel execution. An importance-based prioritization is proposed to accelerate the computation by guiding the solution toward more important tasks. Additionally, a utility maximization formulation is proposed to dynamically determine the optimal number of UAVs required for a given mission, balancing the trade-off between minimizing the makespan and deploying excess agents. The proposed framework's effectiveness is evaluated through a Gazebo-based simulation setup, where agents are coordinated by a mission control module that allocates the printing tasks based on the generated optimal scheduling plan while remaining within the material and battery constraints of each UAV.
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation
Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
comment: This work has been submitted to the IEEE for possible publication
Real-time Remote Tracking and Autonomous Planning for Whale Rendezvous using Robots
We introduce a system for real-time sperm whale rendezvous at sea using an autonomous uncrewed aerial vehicle. Our system employs model-based reinforcement learning that combines in situ sensor data with an empirical whale dive model to guide navigation decisions. Key challenges include (i) real-time acoustic tracking in the presence of multiple whales, (ii) distributed communication and decision-making for robot deployments, and (iii) on-board signal processing and long-range detection from fish-trackers. We evaluate our system by conducting rendezvous with sperm whales at sea in Dominica, performing hardware experiments on land, and running simulations using whale trajectories interpolated from marine biologists' surface observations.
Global stability of vehicle-with-driver dynamics via Sum-of-Squares programming
This work estimates safe invariant subsets of the Region of Attraction (ROA) for a seven-state vehicle-with-driver system, capturing both asymptotic stability and the influence of state-safety bounds along the system trajectory. Safe sets are computed by optimizing Lyapunov functions through an original iterative Sum-of-Squares (SOS) procedure. The method is first demonstrated on a two-state benchmark, where it accurately recovers a prescribed safe region as the 1-level set of a polynomial Lyapunov function. We then describe the distinguishing characteristics of the studied vehicle-with-driver system: the control dynamics mimic human driver behavior through a delayed preview-tracking model that, with suitable parameter choices, can also emulate digital controllers. To enable SOS optimization, a polynomial approximation of the nonlinear vehicle model is derived, together with its operating-envelope constraints. The framework is then applied to understeering and oversteering scenarios, and the estimated safe sets are compared with reference boundaries obtained from exhaustive simulations. The results show that SOS techniques can efficiently deliver Lyapunov-defined safe regions, supporting their potential use for real-time safety assessment, for example as a supervisory layer for active vehicle control.
comment: 20 pages, 7 figures, 2 tables
3D Path Planning for Robot-assisted Vertebroplasty from Arbitrary Bi-plane X-ray via Differentiable Rendering
Robotic systems are transforming image-guided interventions by enhancing accuracy and minimizing radiation exposure. A significant challenge in robotic assistance lies in surgical path planning, which often relies on the registration of intraoperative 2D images with preoperative 3D CT scans. This requirement can be burdensome and costly, particularly in procedures like vertebroplasty, where preoperative CT scans are not routinely performed. To address this issue, we introduce a differentiable rendering-based framework for 3D transpedicular path planning utilizing bi-planar 2D X-rays. Our method integrates differentiable rendering with a vertebral atlas generated through a Statistical Shape Model (SSM) and employs a learned similarity loss to refine the SSM shape and pose dynamically, independent of fixed imaging geometries. We evaluated our framework in two stages: first, through vertebral reconstruction from orthogonal X-rays for benchmarking, and second, via clinician-in-the-loop path planning using arbitrary-view X-rays. Our results indicate that our method outperformed a normalized cross-correlation baseline in reconstruction metrics (DICE: 0.75 vs. 0.65) and achieved comparable performance to the state-of-the-art model ReVerteR (DICE: 0.77), while maintaining generalization to arbitrary views. Success rates for bipedicular planning reached 82% with synthetic data and 75% with cadaver data, exceeding the 66% and 31% rates of a 2D-to-3D baseline, respectively. In conclusion, our framework facilitates versatile, CT-free 3D path planning for robot-assisted vertebroplasty, effectively accommodating real-world imaging diversity without the need for preoperative CT scans.
Label-Efficient Point Cloud Segmentation with Active Learning
Semantic segmentation of 3D point cloud data often comes with high annotation costs. Active learning automates the process of selecting which data to annotate, reducing the total amount of annotation needed to achieve satisfactory performance. Recent approaches to active learning for 3D point clouds are often based on sophisticated heuristics both for splitting point clouds into annotatable regions and for selecting the most beneficial regions for further neural network training. In this work, we propose a novel and easy-to-implement strategy to separate the point cloud into annotatable regions. In our approach, we utilize a 2D grid to subdivide the point cloud into columns. To identify the next data to be annotated, we employ a network ensemble to estimate the uncertainty in the network output. We evaluate our method on the S3DIS dataset, the Toronto-3D dataset, and a large-scale urban 3D point cloud of the city of Freiburg, which we labeled in parts manually. The extensive evaluation shows that our method yields performance on par with, or even better than, complex state-of-the-art methods on all datasets. Furthermore, we provide results suggesting that in the context of point clouds the annotated area can be a more meaningful measure for active learning algorithms than the number of annotated points.
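A minimal sketch of the column-splitting and ensemble-uncertainty selection described above: points are binned into 2D grid cells by their x-y coordinates, per-point predictive entropy is computed from an ensemble of class-probability outputs, and the columns with the highest mean entropy are proposed for annotation. The grid and the ensemble are the abstract's ingredients; the cell size, entropy criterion, and budget are illustrative choices.

```python
import numpy as np

def select_columns(points, ensemble_probs, cell_size=2.0, budget=5):
    """points: (P, 3) xyz coordinates of the point cloud
    ensemble_probs: (M, P, C) softmax outputs of M ensemble members
    Returns the `budget` grid-column keys with the highest mean entropy."""
    # Predictive entropy of the ensemble-averaged class distribution.
    mean_probs = ensemble_probs.mean(axis=0)                       # (P, C)
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=1)
    # Assign every point to a 2D grid column (height is ignored).
    cols = np.floor(points[:, :2] / cell_size).astype(int)
    keys = [tuple(c) for c in cols]
    # Average uncertainty per column.
    scores = {}
    for key, h in zip(keys, entropy):
        s, n = scores.get(key, (0.0, 0))
        scores[key] = (s + h, n + 1)
    ranked = sorted(scores, key=lambda k: scores[k][0] / scores[k][1],
                    reverse=True)
    return ranked[:budget]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 20, size=(1000, 3))
    probs = rng.dirichlet(np.ones(5), size=(4, 1000))   # (M=4, P, C=5)
    print(select_columns(pts, probs))
```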
Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning
This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. Leveraging Bayesian Active Inference, the approach combines expert-generated demonstrations with probabilistic generative modeling to encode high-level symbolic planning, low-level motion policies, and wireless signal feedback. During deployment, the UAV performs online inference to anticipate interference, localize jammers, and adapt its trajectory accordingly, without prior knowledge of jammer locations. Simulation results demonstrate that the proposed method achieves near-expert performance, significantly reducing communication interference and mission cost compared to model-free reinforcement learning baselines, while maintaining robust generalization in dynamic environments.
comment: This paper has been accepted for the 2026 IEEE Consumer Communications & Networking Conference (IEEE CCNC 2026)
HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies
The development of foundation models for embodied intelligence critically depends on access to large-scale, high-quality robot demonstration data. Recent approaches have sought to address this challenge by training on large collections of heterogeneous robotic datasets. However, unlike vision or language data, robotic demonstrations exhibit substantial heterogeneity across embodiments and action spaces as well as other prominent variations such as sensor configurations and action control frequencies. The lack of explicit designs for handling such heterogeneity causes existing methods to struggle with integrating diverse factors, thereby limiting their generalization and leading to degraded performance when transferred to new settings. In this paper, we present HiMoE-VLA, a novel vision-language-action (VLA) framework tailored to effectively handle diverse robotic data with heterogeneity. Specifically, we introduce a Hierarchical Mixture-of-Experts (HiMoE) architecture for the action module which adaptively handles multiple sources of heterogeneity across layers and gradually abstracts them into shared knowledge representations. Through extensive experimentation with simulation benchmarks and real-world robotic platforms, HiMoE-VLA demonstrates a consistent performance boost over existing VLA baselines, achieving higher accuracy and robust generalization across diverse robots and action spaces. The code and models are publicly available at https://github.com/ZhiyingDu/HiMoE-VLA.
Scenario-aware Uncertainty Quantification for Trajectory Prediction with Statistical Guarantees
Reliable uncertainty quantification in trajectory prediction is crucial for safety-critical autonomous driving systems, yet existing deep learning predictors lack uncertainty-aware frameworks adaptable to heterogeneous real-world scenarios. To bridge this gap, we propose a novel scenario-aware uncertainty quantification framework to provide the predicted trajectories with prediction intervals and reliability assessment. To begin with, predicted trajectories from the trained predictor and their ground truth are projected onto the map-derived reference routes within the Frenet coordinate system. We then employ CopulaCPTS as the conformal calibration method to generate temporal prediction intervals for distinct scenarios as the uncertainty measure. Building upon this, within the proposed trajectory reliability discriminator (TRD), mean error and calibrated confidence intervals are synergistically analyzed to establish reliability models for different scenarios. Subsequently, the risk-aware discriminator leverages a joint risk model that integrates longitudinal and lateral prediction intervals within the Frenet coordinate to identify critical points. This enables segmentation of trajectories into reliable and unreliable segments, with the advantage of providing downstream planning modules with actionable reliability information. We evaluated our framework using the real-world nuPlan dataset, demonstrating its effectiveness in scenario-aware uncertainty quantification and reliability assessment across diverse driving contexts.
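A minimal sketch of the split-conformal calibration step underlying the framework above: per-horizon nonconformity scores (absolute Frenet-frame errors) from a calibration set yield quantiles that define prediction-interval half-widths at the desired coverage. CopulaCPTS handles joint temporal coverage more carefully; this sketch only shows basic per-step calibration with illustrative data.

```python
import numpy as np

def conformal_intervals(cal_errors, alpha=0.1):
    """cal_errors: (N, T) absolute prediction errors on a calibration set
    (e.g., longitudinal Frenet error per future time step).
    Returns per-step interval half-widths with ~(1 - alpha) coverage."""
    n = cal_errors.shape[0]
    # Finite-sample-corrected quantile level of split conformal prediction.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(cal_errors, q_level, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cal = np.abs(rng.normal(scale=np.linspace(0.2, 2.0, 30),
                            size=(500, 30)))    # errors grow with horizon
    half_widths = conformal_intervals(cal, alpha=0.1)
    print(half_widths[[0, 14, 29]])
```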
An Integrated System for WEEE Sorting Employing X-ray Imaging, AI-based Object Detection and Segmentation, and Delta Robot Manipulation
Battery recycling is becoming increasingly critical due to the rapid growth in battery usage and the limited availability of natural resources. Moreover, as battery energy densities continue to rise, improper handling during recycling poses significant safety hazards, including potential fires at recycling facilities. Numerous systems have been proposed for battery detection and removal from WEEE recycling lines, including X-ray and RGB-based visual inspection methods, typically driven by AI-powered object detection models (e.g., Mask R-CNN, YOLO, ResNets). Despite advances in optimizing detection techniques and model modifications, a fully autonomous solution capable of accurately identifying and sorting batteries across diverse WEEE types has yet to be realized. In response to these challenges, we present a novel approach that integrates a specialized X-ray transmission dual-energy imaging subsystem with advanced pre-processing algorithms, enabling high-contrast image reconstruction for effective differentiation of dense and thin materials in WEEE. Devices move along a conveyor belt through a high-resolution X-ray imaging system, where YOLO and U-Net models precisely detect and segment battery-containing items. An intelligent tracking and position estimation algorithm then guides a Delta robot equipped with a suction gripper to selectively extract and properly discard the targeted devices. The approach is validated in a photorealistic simulation environment developed in NVIDIA Isaac Sim and on the real setup.
A Comprehensive Framework for Automated Quality Control in the Automotive Industry
This paper presents a cutting-edge robotic inspection solution designed to automate quality control in automotive manufacturing. The system integrates a pair of collaborative robots, each equipped with a high-resolution camera-based vision system to accurately detect and localize surface and thread defects in aluminum high-pressure die casting (HPDC) automotive components. In addition, specialized lenses and optimized lighting configurations are employed to ensure consistent and high-quality image acquisition. The YOLO11n deep learning model is utilized, incorporating additional enhancements such as image slicing, ensemble learning, and bounding-box merging to significantly improve performance and minimize false detections. Furthermore, image processing techniques are applied to estimate the extent of the detected defects. Experimental results demonstrate real-time performance with high accuracy across a wide variety of defects, while minimizing false detections. The proposed solution is promising and highly scalable, providing the flexibility to adapt to various production environments and meet the evolving demands of the automotive industry.
A Hyperspectral Imaging Guided Robotic Grasping System
Hyperspectral imaging is an advanced technique for precisely identifying and analyzing materials or objects. However, its integration with robotic grasping systems has so far been little explored due to deployment complexity and prohibitive cost. Within this paper, we introduce a novel hyperspectral imaging-guided robotic grasping system. The system consists of PRISM (Polyhedral Reflective Imaging Scanning Mechanism) and the SpectralGrasp framework. PRISM is designed to enable high-precision, distortion-free hyperspectral imaging while simplifying system integration and reducing costs. SpectralGrasp generates robotic grasping strategies by effectively leveraging both the spatial and spectral information from hyperspectral images. The proposed system demonstrates substantial improvements both in textile recognition compared to human performance and in sorting success rate compared to RGB-based methods. Additionally, a series of comparative experiments further validates the effectiveness of our system. The study highlights the potential benefits of integrating hyperspectral imaging with robotic grasping systems, showcasing enhanced recognition and grasping capabilities in complex and dynamic environments. The project is available at: https://zainzh.github.io/PRISM.
comment: 8 pages, 7 figures, Accepted to IEEE Robotics and Automation Letters (RA-L) 2025
Spatiotemporal Tubes for Differential Drive Robots with Model Uncertainty
This paper presents a Spatiotemporal Tube (STT)-based control framework for differential-drive mobile robots with dynamic uncertainties and external disturbances, guaranteeing the satisfaction of Temporal Reach-Avoid-Stay (T-RAS) specifications. The approach employs circular STT, characterized by smoothly time-varying center and radius, to define dynamic safe corridors that guide the robot from the start region to the goal while avoiding obstacles. In particular, we first develop a sampling-based synthesis algorithm to construct a feasible STT that satisfies the prescribed timing and safety constraints with formal guarantees. To ensure that the robot remains confined within this tube, we then design analytically a closed-form, approximation-free control law. The resulting controller is computationally efficient, robust to disturbances and model uncertainties, and requires no model approximations or online optimization. The proposed framework is validated through simulation studies on a differential-drive robot and benchmarked against state-of-the-art methods, demonstrating superior robustness, accuracy, and computational efficiency.
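A minimal sketch of the tube mechanism described above: a circular tube with smoothly time-varying center and radius, and a simple closed-form feedback whose gain grows as the state approaches the tube boundary, keeping the state confined inside. The single-integrator dynamics, the linear center/radius schedule, and the specific gain shape are illustrative simplifications; the actual controller handles differential-drive kinematics and uncertainty.

```python
import numpy as np

def tube(t, start, goal, r0=1.0, r_min=0.2, T=10.0):
    """Circular spatiotemporal tube: the center moves from start to goal
    and the radius shrinks from r0 to r_min over the horizon T."""
    s = np.clip(t / T, 0.0, 1.0)
    center = (1 - s) * np.asarray(start) + s * np.asarray(goal)
    radius = (1 - s) * r0 + s * r_min
    return center, radius

def tube_control(x, t, start, goal, k=2.0):
    """Funnel-style feedback: the gain blows up near the tube boundary,
    which keeps the state inside the tube without a system model."""
    c, r = tube(t, start, goal)
    e = x - c
    margin = max(r**2 - e @ e, 1e-6)      # distance-to-boundary surrogate
    return -k * e / margin                # velocity command (single integrator)

if __name__ == "__main__":
    x = np.array([0.3, -0.2])
    dt, start, goal = 0.01, [0.0, 0.0], [5.0, 3.0]
    for i in range(1500):
        x = x + dt * tube_control(x, i * dt, start, goal)
    print(x)   # ends up near the goal, inside the shrunken tube
```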
State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning
We study visual domain transfer for end-to-end imitation learning in a realistic and challenging setting where target-domain data are strictly off-policy, expert-free, and scarce. We first provide a theoretical analysis showing that the target-domain imitation loss can be upper bounded by the source-domain loss plus a state-conditional latent KL divergence between source and target observation models. Guided by this result, we propose State-Conditional Adversarial Learning (SCAL), an off-policy adversarial framework that aligns latent distributions conditioned on system state using a discriminator-based estimator of the conditional KL term. Experiments on visually diverse autonomous driving environments built on the BARC-CARLA simulator demonstrate that SCAL achieves robust transfer and strong sample efficiency.
Where to Fly, What to Send: Communication-Aware Aerial Support for Ground Robots
In this work we consider a multi-robot team operating in an unknown environment where one aerial agent is tasked to map the environment and transmit (a portion of) the mapped environment to a group of ground agents that are trying to reach their goals. The entire operation takes place over a bandwidth-limited communication channel, which motivates the problem of determining what and how much information the assisting agent should transmit and when, while simultaneously performing exploration/mapping. The proposed framework enables the assisting aerial agent to decide what information to transmit based on the Value-of-Information (VoI), how much to transmit using a Mixed-Integer Linear Program (MILP), and how to acquire additional information through a utility-score-based environment exploration strategy. We perform a communication-motion trade-off analysis between the total amount of map data communicated by the aerial agent and the navigation cost incurred by the ground agents.
comment: Submitted to conference
Cascaded Tightly-Coupled Observer Design for Single-Range-Aided Inertial Navigation
This work introduces a single-range-aided navigation observer that reconstructs the full state of a rigid body using only an Inertial Measurement Unit (IMU), a body-frame vector measurement (e.g., magnetometer), and a distance measurement from a fixed anchor point. The design first formulates an extended linear time-varying (LTV) system to estimate body-frame position, body-frame velocity, and the gravity direction. The recovered gravity direction, combined with the body-frame vector measurement, is then used to reconstruct the full orientation on $\mathrm{SO}(3)$, resulting in a cascaded observer architecture. Almost Global Asymptotic Stability (AGAS) of the cascaded design is established under a uniform observability condition, ensuring robustness to sensor noise and trajectory variations. Simulation studies on three-dimensional trajectories demonstrate accurate estimation of position, velocity, and orientation, highlighting single-range aiding as a lightweight and effective modality for autonomous navigation.
comment: 8 pages
REWW-ARM -- Remote Wire-Driven Mobile Robot: Design, Control, and Experimental Validation
Electronic devices are essential for robots but limit their usable environments. To overcome this, methods excluding electronics from the operating environment while retaining advanced electronic control and actuation have been explored. These include the remote hydraulic drive of electronics-free mobile robots, which offer high reachability, and long wire-driven robot arms with motors consolidated at the base, which offer high environmental resistance. To combine the advantages of both, this study proposes a new system, "Remote Wire Drive." As a proof-of-concept, we designed and developed the Remote Wire-Driven robot "REWW-ARM", which consists of the following components: 1) a novel power transmission mechanism, the "Remote Wire Transmission Mechanism" (RWTM), the key technology of the Remote Wire Drive; 2) an electronics-free distal mobile robot driven by it; and 3) a motor-unit that generates power and provides electronic closed-loop control based on state estimation via the RWTM. In this study, we evaluated the mechanical and control performance of REWW-ARM through several experiments, demonstrating its capability for locomotion, posture control, and object manipulation both on land and underwater. This suggests the potential for applying the Remote Wire-Driven system to various types of robots, thereby expanding their operational range.
comment: Accepted in Advanced Intelligent Systems
Situation-Aware Interactive MPC Switching for Autonomous Driving
To enable autonomous driving in interactive traffic scenarios, various model predictive control (MPC) formulations have been proposed, each employing different interaction models. While higher-fidelity models enable more intelligent behavior, they incur increased computational cost. Since strong interactions are relatively infrequent in traffic, a practical strategy for balancing performance and computational overhead is to invoke an appropriate controller based on situational demands. To achieve this approach, we first conduct a comparative study to assess and hierarchize the interactive capabilities of different MPC formulations. Furthermore, we develop a neural network-based classifier to enable situation-aware switching among controllers with different levels of interactive capability. We demonstrate that this situation-aware switching can both substantially improve overall performance by activating the most advanced interactive MPC in rare but critical situations, and significantly reduce computational load by using a basic MPC in the majority of scenarios.
Real-Time Spatiotemporal Tubes for Dynamic Unsafe Sets
This paper presents a real-time control framework for nonlinear pure-feedback systems with unknown dynamics to satisfy reach-avoid-stay tasks within a prescribed time in dynamic environments. To achieve this, we introduce a real-time spatiotemporal tube (STT) framework. An STT is defined as a time-varying ball in the state space whose center and radius adapt online using only real-time sensory input. A closed-form, approximation-free control law is then derived to constrain the system output within the STT, ensuring safety and task satisfaction. We provide formal guarantees for obstacle avoidance and on-time task completion. The effectiveness and scalability of the framework are demonstrated through simulations and hardware experiments on a mobile robot and an aerial vehicle, navigating in cluttered dynamic environments.
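To make the tube idea concrete, the following 1-D toy example keeps an output inside a time-varying ball [c(t) - r(t), c(t) + r(t)] using a generic prescribed-performance-style barrier feedback. This is a sketch of the concept only; the tube functions, the plant, and the control law are assumptions, not the approximation-free law derived in the paper.

```python
# 1-D illustration of the spatiotemporal-tube idea: a time-varying ball and a
# feedback that keeps the output inside it without a model of the dynamics.
import numpy as np

c = lambda t: 2.0 * np.sin(0.5 * t)            # tube center (illustrative)
r = lambda t: 1.5 * np.exp(-0.3 * t) + 0.2     # shrinking tube radius (illustrative)

dt, k, T = 1e-3, 5.0, 20.0
x, t, worst = 0.5, 0.0, 0.0                    # start inside the tube
while t < T:
    e = (x - c(t)) / r(t)                      # normalized tube error, must stay in (-1, 1)
    u = -k * e / (1.0 - e**2)                  # grows unbounded as the output nears the wall
    x += dt * (np.sin(x) + u)                  # "unknown" plant: x' = sin(x) + u
    t += dt
    worst = max(worst, abs(x - c(t)) / r(t))
print(f"max normalized tube error: {worst:.3f} (< 1 means the output never left the tube)")
```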
GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers
While commendable progress has been made in user-centric research on mobile assistive systems for blind and low-vision (BLV) individuals, references that directly inform robot navigation design remain rare. To bridge this gap, we conducted a comprehensive human study involving interviews with 26 guide dog handlers, four white cane users, nine guide dog trainers, and one O\&M trainer, along with 15+ hours of observing guide dog-assisted walking. After de-identification, we open-sourced the dataset to promote human-centered development and informed decision-making for assistive systems for BLV people. Building on insights from this formative study, we developed GuideNav, a vision-only, teach-and-repeat navigation system. Inspired by how guide dogs are trained and assist their handlers, GuideNav autonomously repeats a path demonstrated by a sighted person using a robot. Specifically, the system constructs a topological representation of the taught route, integrates visual place recognition with temporal filtering, and employs a relative pose estimator to compute navigation actions - all without relying on costly, heavy, power-hungry sensors such as LiDAR. In field tests, GuideNav consistently achieved kilometer-scale route following across five outdoor environments, maintaining reliability despite noticeable scene variations between teach and repeat runs. A user study with 3 guide dog handlers and 1 guide dog trainer further confirmed the system's feasibility, marking (to our knowledge) the first demonstration of a quadruped mobile system retrieving a path in a manner comparable to guide dogs.
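The matching step of a teach-and-repeat pipeline of this kind can be sketched in a few lines: match the current image descriptor against descriptors stored along the taught route, restricting the search to a temporal window around the last matched node. The descriptors below are random stand-ins for real image embeddings, and GuideNav's actual place-recognition network and relative pose estimator are not reproduced.

```python
# Hedged sketch of teach-and-repeat localization with temporal filtering.
import numpy as np

rng = np.random.default_rng(0)
taught = rng.normal(size=(200, 64))                      # one descriptor per taught node
taught /= np.linalg.norm(taught, axis=1, keepdims=True)

def localize(query, last_idx, window=5):
    """Best-matching taught node near the previous match (temporal filter)."""
    lo, hi = max(0, last_idx - 1), min(len(taught), last_idx + window)
    sims = taught[lo:hi] @ (query / np.linalg.norm(query))   # cosine similarities
    return lo + int(np.argmax(sims))

# Repeat run: noisy re-observations of the taught descriptors, visited in order.
max_err, last = 0, 0
for true_idx in range(0, 200, 3):
    query = taught[true_idx] + 0.1 * rng.normal(size=64)
    last = localize(query, last)
    max_err = max(max_err, abs(last - true_idx))
print("worst localization error along the repeat run:", max_err, "taught nodes")
```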
Probabilistic Weapon Engagement Zones for a Turn Constrained Pursuer
Curve-straight probabilistic engagement zones (CSPEZ) quantify the spatial regions an evader should avoid to reduce capture risk from a turn-rate-limited pursuer following a curve-straight path with uncertain parameters including position, heading, velocity, range, and maximum turn rate. This paper presents methods for generating evader trajectories that minimize capture risk under such uncertainty. We first derive an analytic solution for the deterministic curve-straight basic engagement zone (CSBEZ), then extend this formulation to a probabilistic framework using four uncertainty-propagation approaches: Monte Carlo sampling, linearization, quadratic approximation, and neural-network regression. We evaluate the accuracy and computational cost of each approximation method and demonstrate how CSPEZ constraints can be integrated into a trajectory-optimization algorithm to produce safe paths that explicitly account for pursuer uncertainty.
comment: Accepted for presentation at AIAA SciTech 2026. 17 pages, 7 figures
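For intuition about the Monte Carlo branch of the uncertainty propagation, the sketch below samples uncertain pursuer parameters and estimates the probability that a candidate evader position lies inside the engagement zone. The deterministic zone test is a deliberately simplified circular stand-in (pursuer reach = speed times time budget plus capture radius), not the analytic curve-straight BEZ derived in the paper, and all distributions are illustrative assumptions.

```python
# Hedged Monte Carlo estimate of capture probability under parameter uncertainty.
import numpy as np

rng = np.random.default_rng(1)

def in_zone(evader_xy, pursuer_xy, speed, horizon, capture_radius):
    """Simplified stand-in: inside the zone if the pursuer can cover the distance."""
    return np.linalg.norm(evader_xy - pursuer_xy) <= speed * horizon + capture_radius

def capture_probability(evader_xy, n=5000):
    hits = 0
    for _ in range(n):
        pursuer_xy = rng.normal([0.0, 0.0], [0.5, 0.5])   # uncertain pursuer position
        speed = rng.normal(1.0, 0.1)                      # uncertain pursuer speed
        horizon = rng.normal(5.0, 0.5)                    # uncertain engagement time budget
        hits += in_zone(evader_xy, pursuer_xy, speed, horizon, 0.5)
    return hits / n

for x in (4.0, 6.0, 8.0):
    p = capture_probability(np.array([x, 0.0]))
    print(f"evader at ({x:.0f}, 0): estimated capture probability = {p:.3f}")
```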
WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving
We introduce WAM-Flow, a vision-language-action (VLA) model that casts ego-trajectory planning as discrete flow matching over a structured token space. In contrast to autoregressive decoders, WAM-Flow performs fully parallel, bidirectional denoising, enabling coarse-to-fine refinement with a tunable compute-accuracy trade-off. Specifically, the approach combines a metric-aligned numerical tokenizer that preserves scalar geometry via triplet-margin learning, a geometry-aware flow objective, and a simulator-guided GRPO alignment that integrates safety, ego progress, and comfort rewards while retaining parallel generation. A multi-stage adaptation converts a pre-trained auto-regressive backbone (Janus-1.5B) from causal decoding to a non-causal flow model and strengthens road-scene competence through continued multimodal pretraining. Thanks to the inherent nature of consistency-model training and parallel-decoding inference, WAM-Flow achieves superior closed-loop performance against autoregressive and diffusion-based VLA baselines, with 1-step inference attaining 89.1 PDMS and 5-step inference reaching 90.3 PDMS on the NAVSIM v1 benchmark. These results establish discrete flow matching as a promising new paradigm for end-to-end autonomous driving. The code will be publicly available soon.
comment: 18 pages, 11 figures
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure from which various control problems can be recovered. These include the classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. We refer to the latter as soft-policy SOC and RSOC; these provide alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problems. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties such as a linear Bellman equation, a path-integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
STATE-NAV: Stability-Aware Traversability Estimation for Bipedal Navigation on Rough Terrain
Bipedal robots have advantages in maneuvering human-centered environments, but face greater failure risk compared to other stable mobile platforms such as wheeled or quadrupedal robots. While learning-based traversability has been widely studied for these platforms, bipedal traversability has instead relied on manually designed rules with limited consideration of locomotion stability on rough terrain. In this work, we present the first learning-based traversability estimation and risk-sensitive navigation framework for bipedal robots operating in diverse, uneven environments. TravFormer, a transformer-based neural network, is trained to predict bipedal instability with uncertainty, enabling risk-aware and adaptive planning. Based on the network, we define traversability as a stability-aware command velocity: the fastest command velocity that keeps instability below a user-defined limit. This velocity-based traversability is integrated into a hierarchical planner that combines traversability-informed Rapidly-exploring Random Tree Star (TravRRT*) for time-efficient planning and Model Predictive Control (MPC) for safe execution. We validate our method in MuJoCo simulation and the real world, demonstrating improved navigation performance, with enhanced robustness and time efficiency across varying terrains compared to existing methods.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight
A key open challenge in agile quadrotor flight is how to combine the flexibility and task-level generality of model-free reinforcement learning (RL) with the structure and online replanning capabilities of model predictive control (MPC), aiming to leverage their complementary strengths in dynamic and uncertain environments. This paper provides an answer by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an actor-critic RL framework. This integration allows for short-term predictive optimization of control actions through MPC, while leveraging RL for end-to-end learning and exploration over longer horizons. Through various ablation studies, conducted in the context of agile quadrotor racing, we expose the benefits of the proposed approach: it achieves better out-of-distribution behaviour, better robustness to changes in the quadrotor's dynamics and improved sample efficiency. Additionally, we conduct an empirical analysis using a quadrotor platform that reveals a relationship between the critic's learned value function and the cost function of the differentiable MPC, providing a deeper understanding of the interplay between the critic's value and the MPC cost functions. Finally, we validate our method in a drone racing task on different tracks, in both simulation and the real world. Our method achieves the same superhuman performance as state-of-the-art model-free RL, showcasing speeds of up to 21 m/s. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior.
comment: 20 pages, 13 figures, evolved paper
A neural signed configuration distance function for path planning of picking manipulators
Picking manipulators are task-specific robots with fewer degrees of freedom than general-purpose manipulators, and are heavily used in industry. The efficiency of picking robots is highly dependent on the path planning solution, which is commonly done using sampling-based multi-query methods. The planner robustly solves the problem, but its heavy use of collision detection limits its planning capabilities for online use. We approach this problem by presenting a novel implicit obstacle representation for path planning, a neural signed configuration distance function (nSCDF), which allows us to form collision-free balls in the configuration space. We use the ball representation to re-formulate a state-of-the-art multi-query path planner, i.e., instead of points, we use balls in the graph. Our planner returns a collision-free corridor, which allows us to use convex programming to produce optimized paths. From our numerical experiments, we observe that our planner produces paths that are close to those from an asymptotically optimal path planner, in significantly less time.
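The "collision-free ball" idea behind a signed configuration distance is easy to sketch: if d(q) lower-bounds the distance from configuration q to the nearest configuration-space obstacle, the ball of radius d(q) around q is collision-free, and a candidate edge can be certified by covering it with such balls. The analytic d(q) below is a stand-in for the learned network; the obstacle layout and minimum clearance are assumptions.

```python
# Hedged sketch of certifying an edge with collision-free configuration-space balls.
import numpy as np

obstacles = [np.array([0.5, 0.5]), np.array([-0.4, 0.8])]   # C-space obstacle centers
obstacle_radius = 0.2

def signed_config_distance(q):
    """Stand-in for the learned nSCDF: distance to the nearest C-space obstacle."""
    return min(np.linalg.norm(q - c) - obstacle_radius for c in obstacles)

def certify_edge(q_start, q_goal, min_clearance=1e-3):
    """Cover the segment q_start -> q_goal with collision-free balls (the 'corridor')."""
    direction = q_goal - q_start
    length = np.linalg.norm(direction)
    balls, s = [], 0.0
    while s < length:
        q = q_start + (s / length) * direction
        d = signed_config_distance(q)
        if d <= min_clearance:
            return None                  # edge not certifiable: too close to an obstacle
        balls.append((q, d))
        s += d                           # next ball center sits just inside the current ball
    return balls

corridor = certify_edge(np.array([-1.0, 0.0]), np.array([1.0, 1.2]))
if corridor is None:
    print("edge rejected: it passes too close to an obstacle")
else:
    print(f"edge certified as collision-free with {len(corridor)} balls")
```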
Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving
Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. Addressing this, our study extends into semi-supervised learning for LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and multi-sensor complements to augment the efficacy of unlabeled datasets. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations from disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. Our framework is tailored to enhance 3D scene consistency regularization by incorporating multi-modality, including 1) multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that enhances LiDAR feature learning; and 3) language-driven knowledge guidance generating auxiliary supervisions using open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. Our framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving the supervised-only baselines. This substantial advancement underscores the potential of semi-supervised approaches in reducing the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Semantic Communication and Control Co-Design for Multi-Objective Distinct Dynamics
This letter introduces a machine-learning approach to learning the semantic dynamics of correlated systems with different control rules and dynamics. By leveraging the Koopman operator in an autoencoder (AE) framework, the system's state evolution is linearized in the latent space using a dynamic semantic Koopman (DSK) model, capturing the baseline semantic dynamics. Signal temporal logic (STL) is incorporated through a logical semantic Koopman (LSK) model to encode system-specific control rules. These models form the proposed logical Koopman AE framework that reduces communication costs while improving state prediction accuracy and control performance, showing a 91.65% reduction in communication samples and significant performance gains in simulation.
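The Koopman step underlying such frameworks can be illustrated in a few lines: lift states into a feature (latent) space, fit a linear one-step operator by least squares, and use it for multi-step prediction. The hand-picked lifting below replaces the learned autoencoder, the damped pendulum is an illustrative assumption, and the STL-based logical Koopman model is not reproduced.

```python
# Minimal sketch of fitting a linear latent (Koopman) operator from data.
import numpy as np

def lift(x):                                   # stand-in for the learned encoder
    th, om = x
    return np.array([th, om, np.sin(th), np.cos(th)])

def step(x, dt=0.05):                          # damped pendulum, used only to generate data
    th, om = x
    return np.array([th + dt * om, om + dt * (-np.sin(th) - 0.1 * om)])

rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(20):                            # collect one-step pairs from trajectories
    x = rng.uniform(-1.5, 1.5, size=2)
    for _ in range(200):
        X.append(lift(x)); x = step(x); Y.append(lift(x))
X, Y = np.array(X), np.array(Y)

K = np.linalg.lstsq(X, Y, rcond=None)[0].T     # least squares: z_{t+1} ~ K z_t

x_true, z = np.array([1.0, 0.0]), lift(np.array([1.0, 0.0]))
for _ in range(100):                           # multi-step rollout purely in latent space
    x_true = step(x_true); z = K @ z
print(f"true theta after 100 steps: {x_true[0]:.3f}; Koopman prediction: {z[0]:.3f}")
```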
Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video
Data-driven learning of soft continuum robot (SCR) dynamics from high-dimensional observations offers flexibility but often lacks physical interpretability, while model-based approaches require prior knowledge and can be computationally expensive. We bridge this gap by introducing (1) the Attention Broadcast Decoder (ABCD), a plug-and-play module for autoencoder-based latent dynamics learning that generates pixel-accurate attention maps localizing each latent dimension's contribution while filtering static backgrounds. (2) By coupling these attention maps to 2D oscillator networks, we enable direct on-image visualization of learned dynamics (masses, stiffness, and forces) without prior knowledge. We validate our approach on single- and double-segment SCRs, demonstrating that ABCD-based models significantly improve multi-step prediction accuracy: 5.7x error reduction for Koopman operators and 3.5x for oscillator networks on the two-segment robot. The learned oscillator network autonomously discovers a chain structure of oscillators. Unlike standard methods, ABCD models enable smooth latent space extrapolation beyond training data. This fully data-driven approach yields compact, physically interpretable models suitable for control applications.
comment: Dataset available at: https://zenodo.org/records/17812071
Perspective-Invariant 3D Object Detection ICCV 2025
With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on vehicle-mounted platforms, leaving other autonomous platforms underexplored. To bridge this gap, we introduce Pi3DET, the first benchmark featuring LiDAR data and 3D bounding box annotations collected from multiple platforms: vehicle, quadruped, and drone, thereby facilitating research in 3D object detection for non-vehicle platforms as well as cross-platform 3D detection. Based on Pi3DET, we propose a novel cross-platform adaptation framework that transfers knowledge from the well-studied vehicle platform to other platforms. This framework achieves perspective-invariant 3D detection through robust alignment at both geometric and feature levels. Additionally, we establish a benchmark to evaluate the resilience and robustness of current 3D detectors in cross-platform scenarios, providing valuable insights for developing adaptive 3D perception systems. Extensive experiments validate the effectiveness of our approach on challenging cross-platform tasks, demonstrating substantial gains over existing adaptation methods. We hope this work paves the way for generalizable and unified 3D perception systems across diverse and complex environments. Our Pi3DET dataset, cross-platform benchmark suite, and annotation toolkit have been made publicly available.
comment: ICCV 2025; 54 pages, 18 figures, 22 tables; Project Page at https://pi3det.github.io
Point-PNG: Conditional Pseudo-Negatives Generation for Point Cloud Pre-Training
We propose Point-PNG, a novel self-supervised learning framework that generates conditional pseudo-negatives in the latent space to learn point cloud representations that are both discriminative and transformation-sensitive. Conventional self-supervised learning methods focus on achieving invariance, discarding transformation-specific information. Recent approaches incorporate transformation sensitivity by explicitly modeling relationships between original and transformed inputs. However, they often suffer from an invariant-collapse phenomenon, where the predictor degenerates into identity mappings, resulting in latent representations with limited variation across transformations. To address this, we propose Point-PNG that explicitly penalizes invariant collapse through pseudo-negatives generation, enabling the network to capture richer transformation cues while preserving discriminative representations. To this end, we introduce a parametric network, COnditional Pseudo-Negatives Embedding (COPE), which learns localized displacements induced by transformations within the latent space. A key challenge arises when jointly training COPE with the MAE, as it tends to converge to trivial identity mappings. To overcome this, we design a loss function based on pseudo-negatives conditioned on the transformation, which penalizes such trivial invariant solutions and enforces meaningful representation learning. We validate Point-PNG on shape classification and relative pose estimation tasks, showing competitive performance on ModelNet40 and ScanObjectNN under challenging evaluation protocols, and achieving superior accuracy in relative pose estimation compared to supervised baselines.
comment: Accepted for publication in IEEE ACCESS
ReSem3D: Refinable 3D Spatial Constraints via Fine-Grained Semantic Grounding for Generalizable Robotic Manipulation
Semantics-driven 3D spatial constraints align high-level semantic representations with low-level action spaces, facilitating the unification of task understanding and execution in robotic manipulation. The synergistic reasoning of Multimodal Large Language Models (MLLMs) and Vision Foundation Models (VFMs) enables cross-modal 3D spatial constraint construction. Nevertheless, existing methods have three key limitations: (1) coarse semantic granularity in constraint modeling, (2) lack of real-time closed-loop planning, and (3) compromised robustness in semantically diverse environments. To address these challenges, we propose ReSem3D, a unified manipulation framework for semantically diverse environments, leveraging the synergy between VFMs and MLLMs to achieve fine-grained visual grounding and dynamically construct hierarchical 3D spatial constraints for real-time manipulation. Specifically, the framework is driven by hierarchical recursive reasoning in MLLMs, which interact with VFMs to automatically construct 3D spatial constraints from natural language instructions and RGB-D observations in two stages: part-level extraction and region-level refinement. Subsequently, these constraints are encoded as real-time optimization objectives in joint space, enabling reactive behavior to dynamic disturbances. Extensive simulation and real-world experiments are conducted in semantically rich household and sparse chemical lab environments. The results demonstrate that ReSem3D performs diverse manipulation tasks under zero-shot conditions, exhibiting strong adaptability and generalization. Code and videos are available at https://github.com/scy-v/ReSem3D and https://resem3d.github.io.
comment: 12 pages, 9 figures
Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios
This paper proposes a momentum-constrained hybrid heuristic trajectory optimization framework (MHHTOF) tailored for assistive navigation in visually impaired scenarios, integrating trajectory sample generation, optimization, and evaluation with residual-enhanced deep reinforcement learning (DRL). In the first stage, a heuristic trajectory sampling cluster (HTSC) is generated in the Frenet coordinate system using third-order interpolation with fifth-order polynomials and momentum-constrained trajectory optimization (MTO) constraints to ensure smoothness and feasibility. After first-stage cost evaluation, the second stage leverages a residual-enhanced actor-critic network with LSTM-based temporal feature modeling to adaptively refine trajectory selection in the Cartesian coordinate system. A dual-stage cost modeling mechanism (DCMM) with weight transfer aligns semantic priorities across stages, supporting human-centered optimization. Experimental results demonstrate that the proposed LSTM-ResB-PPO achieves significantly faster convergence, attaining stable policy performance in approximately half the training iterations required by the PPO baseline, while simultaneously enhancing both reward outcomes and training stability. Compared to the baseline method, the selected model reduces average cost and cost variance by 30.3% and 53.3%, and lowers ego and obstacle risks by over 77%. These findings validate the framework's effectiveness in enhancing robustness, safety, and real-time feasibility in complex assistive planning tasks.
comment: Upon further internal evaluation, we found that the current version does not adequately represent the clarity and completeness that we intend for this work. To avoid possible misunderstanding caused by this preliminary form, we request withdrawal. A refined version will be prepared privately before any further dissemination
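For context on the kind of lateral candidates a Frenet-frame sampler generates, the sketch below fits quintic polynomials that match position, velocity, and acceleration at both ends and scores the candidates with a simple cost. The boundary conditions and cost weights are illustrative assumptions, not the paper's MTO constraints or its DRL refinement stage.

```python
# Hedged sketch of quintic lateral-trajectory sampling in the Frenet frame.
import numpy as np

def quintic(d0, v0, a0, dT, vT, aT, T):
    """Quintic d(t) = sum_i c_i t^i matching pos/vel/acc at t = 0 and t = T."""
    M = np.array([[T**3,    T**4,    T**5],
                  [3*T**2,  4*T**3,  5*T**4],
                  [6*T,    12*T**2, 20*T**3]])
    rhs = np.array([dT - (d0 + v0*T + 0.5*a0*T**2),
                    vT - (v0 + a0*T),
                    aT - a0])
    c3, c4, c5 = np.linalg.solve(M, rhs)
    return np.array([d0, v0, 0.5*a0, c3, c4, c5])

def jerk_cost(coeffs, T, n=100):
    t = np.linspace(0.0, T, n)
    jerk = 6*coeffs[3] + 24*coeffs[4]*t + 60*coeffs[5]*t**2
    return float(np.sum(jerk**2) * (T / n))

T_end, candidates = 4.0, []
for d_end in np.linspace(-2.0, 2.0, 9):                  # sampled lateral end-offsets
    coeffs = quintic(d0=0.5, v0=0.0, a0=0.0, dT=float(d_end), vT=0.0, aT=0.0, T=T_end)
    cost = jerk_cost(coeffs, T_end) + 1.0 * d_end**2     # smoothness + deviation penalty
    candidates.append((cost, float(d_end), coeffs))
best_cost, best_offset, _ = min(candidates, key=lambda c: c[0])
print(f"selected lateral end-offset: {best_offset:+.2f} m (cost {best_cost:.3f})")
```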
GEX: Democratizing Dexterity with Fully-Actuated Dexterous Hand and Exoskeleton Glove
This paper introduces GEX, an innovative low-cost dexterous manipulation system that combines the GX11 tri-finger anthropomorphic hand (11 DoF) with the EX12 tri-finger exoskeleton glove (12 DoF), forming a closed-loop teleoperation framework through kinematic retargeting for high-fidelity control. Both components employ modular 3D-printed finger designs, achieving ultra-low manufacturing costs while maintaining full actuation capabilities. Departing from conventional tendon-driven or underactuated approaches, our electromechanical system integrates independent joint motors across all 23 DoF, ensuring complete state observability and accurate kinematic modeling. This full-actuation architecture enables precise bidirectional kinematic calculations, substantially enhancing kinematic retargeting fidelity between the exoskeleton and robotic hand. The proposed system bridges the cost-performance gap in dexterous manipulation research, providing an accessible platform for acquiring high-quality demonstration data to advance embodied AI and dexterous robotic skill transfer learning.
Real-Time Execution of Action Chunking Flow Policies NeurIPS 2025
Modern AI systems, especially those interacting with the physical world, increasingly require real-time performance. However, the high latency of state-of-the-art generalist models, including recent vision-language action models (VLAs), poses a significant challenge. While action chunking has enabled temporal consistency in high-frequency control tasks, it does not fully address the latency problem, leading to pauses or out-of-distribution jerky movements at chunk boundaries. This paper presents a novel inference-time algorithm that enables smooth asynchronous execution of action chunking policies. Our method, real-time chunking (RTC), is applicable to any diffusion- or flow-based VLA out of the box with no re-training. It generates the next action chunk while executing the current one, "freezing" actions guaranteed to execute and "inpainting" the rest. To test RTC, we introduce a new benchmark of 12 highly dynamic tasks in the Kinetix simulator, as well as evaluate 6 challenging real-world bimanual manipulation tasks. Results demonstrate that RTC is fast, performant, and uniquely robust to inference delay, significantly improving task throughput and enabling high success rates in precise tasks, such as lighting a match, even in the presence of significant latency. See https://pi.website/research/real_time_chunking for videos.
comment: published in NeurIPS 2025
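The scheduling skeleton of this overlapping execution can be sketched independently of the policy: while the current chunk executes, the next chunk is generated with its first few actions frozen to the actions that are guaranteed to run in the meantime. The chunk generator below is a dummy stand-in, not a flow or diffusion VLA, and the freezing and inpainting details differ from the actual RTC algorithm.

```python
# Hedged skeleton of asynchronous chunk generation with a frozen prefix.
import numpy as np

CHUNK, DELAY = 8, 3              # actions per chunk; actions executed during "inference"

def generate_chunk(obs, frozen=None):
    """Stand-in policy: returns CHUNK actions, keeping any frozen prefix unchanged."""
    chunk = np.full(CHUNK, float(obs))           # dummy 1-D actions
    if frozen is not None:
        chunk[:len(frozen)] = frozen             # "inpainting" reduced to copying here
    return chunk

executed, obs = [], 0.0
plan, ptr, pending = generate_chunk(obs), 0, None
for step in range(40):
    executed.append(plan[ptr]); ptr += 1         # execute one action per control step
    obs += 0.1                                   # the environment keeps moving
    if ptr == CHUNK - DELAY:                     # DELAY actions remain: start inference now,
        pending = generate_chunk(obs, frozen=plan[ptr:])   # freezing what will still execute
    if ptr == CHUNK:                             # inference finishes as the chunk runs out:
        plan, ptr = pending, DELAY               # switch chunks, skipping the frozen prefix
print(f"executed {len(executed)} actions with no pause at chunk boundaries")
```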
Evo-1: Lightweight Vision-Language-Action Model with Preserved Semantic Alignment
Vision-Language-Action (VLA) models have emerged as a powerful framework that unifies perception, language, and control, enabling robots to perform diverse tasks through multimodal understanding. However, current VLA models typically contain massive parameters and rely heavily on large-scale robot data pretraining, leading to high computational costs during training, as well as limited deployability for real-time inference. Moreover, most training paradigms often degrade the perceptual representations of the vision-language backbone, resulting in overfitting and poor generalization to downstream tasks. In this work, we present Evo-1, a lightweight VLA model that reduces computation and improves deployment efficiency, while maintaining strong performance without pretraining on robot data. Evo-1 builds on a native multimodal Vision-Language model (VLM), incorporating a novel cross-modulated diffusion transformer along with an optimized integration module, together forming an effective architecture. We further introduce a two-stage training paradigm that progressively aligns action with perception, preserving the representations of the VLM. Notably, with only 0.77 billion parameters, Evo-1 achieves state-of-the-art results on the Meta-World and RoboTwin suite, surpassing the previous best models by 12.4% and 6.9%, respectively, and also attains a competitive result of 94.8% on LIBERO. In real-world evaluations, Evo-1 attains a 78% success rate with high inference frequency and low memory overhead, outperforming all baseline methods. We release code, data, and model weights to facilitate future research on lightweight and efficient VLA models.
comment: Github: https://github.com/MINT-SJTU/Evo-1
IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks
Flawed planning from VLM-driven embodied agents poses significant safety hazards, hindering their deployment in real-world household tasks. However, existing static, non-interactive evaluation paradigms fail to adequately assess risks within these interactive environments, since they cannot simulate dynamic risks that emerge from an agent's actions and rely on unreliable post-hoc evaluations that ignore unsafe intermediate steps. To bridge this critical gap, we propose evaluating an agent's interactive safety: its ability to perceive emergent risks and execute mitigation steps in the correct procedural order. We thus present IS-Bench, the first multi-modal benchmark designed for interactive safety, featuring 161 challenging scenarios with 388 unique safety risks instantiated in a high-fidelity simulator. Crucially, it facilitates a novel process-oriented evaluation that verifies whether risk mitigation actions are performed before/after specific risk-prone steps. Extensive experiments on leading VLMs, including the GPT-4o and Gemini-2.5 series, reveal that current agents lack interactive safety awareness, and that while safety-aware Chain-of-Thought can improve performance, it often compromises task completion. By highlighting these critical limitations, IS-Bench provides a foundation for developing safer and more reliable embodied AI systems. Code and data are released under https://github.com/AI45Lab/IS-Bench.
H-GAR: A Hierarchical Interaction Framework via Goal-Driven Observation-Action Refinement for Robotic Manipulation AAAI 2026
Unified video and action prediction models hold great potential for robotic manipulation, as future observations offer contextual cues for planning, while actions reveal how interactions shape the environment. However, most existing approaches treat observation and action generation in a monolithic and goal-agnostic manner, often leading to semantically misaligned predictions and incoherent behaviors. To this end, we propose H-GAR, a Hierarchical interaction framework via Goal-driven observation-Action Refinement. To anchor prediction to the task objective, H-GAR first produces a goal observation and a coarse action sketch that outline a high-level route toward the goal. To enable explicit interaction between observation and action under the guidance of the goal observation for more coherent decision-making, we devise two synergistic modules. (1) Goal-Conditioned Observation Synthesizer (GOS) synthesizes intermediate observations based on the coarse-grained actions and the predicted goal observation. (2) Interaction-Aware Action Refiner (IAAR) refines coarse actions into fine-grained, goal-consistent actions by leveraging feedback from the intermediate observations and a Historical Action Memory Bank that encodes prior actions to ensure temporal consistency. By integrating goal grounding with explicit action-observation interaction in a coarse-to-fine manner, H-GAR enables more accurate manipulation. Extensive experiments on both simulation and real-world robotic manipulation tasks demonstrate that H-GAR achieves state-of-the-art performance.
comment: Accepted to AAAI 2026 (Oral), Project Page: https://github.com/JiuTian-VL/H-GAR
LLM-Driven Corrective Robot Operation Code Generation with Static Text-Based Simulation
Recent advances in large language models (LLMs) have demonstrated their promising capabilities in generating robot operation code to enable LLM-driven robots. To enhance the reliability of operation code generated by LLMs, corrective designs with feedback from the observation of executing code have been increasingly adopted in existing research. However, the code execution in these designs relies on either a physical experiment or a customized simulation environment, which limits their deployment due to the high configuration effort of the environment and the potential long execution time. In this paper, we explore the possibility of directly leveraging an LLM to enable static simulation of robot operation code, and then leverage it to design a new reliable LLM-driven corrective robot operation code generation framework. Our framework configures the LLM as a static simulator with enhanced capabilities that reliably simulate robot code execution by interpreting actions, reasoning over state transitions, analyzing execution outcomes, and generating semantic observations that accurately capture trajectory dynamics. To validate the performance of our framework, we performed experiments on various operation tasks for different robots, including UAVs and small ground vehicles. The experimental results not only demonstrate the high accuracy of our static text-based simulation but also the reliable code generation of our LLM-driven corrective framework, which achieves performance comparable to state-of-the-art research while not relying on dynamic code execution using physical experiments or simulators.
comment: 8 pages, 2 figures
Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views IROS'25
Forecasting how human hands move in egocentric views is critical for applications like augmented reality and human-robot policy transfer. Recently, several hand trajectory prediction (HTP) methods have been developed to generate future possible hand waypoints, which still suffer from insufficient prediction targets, inherent modality gaps, entangled hand-head motion, and limited validation in downstream tasks. To address these limitations, we present a universal hand motion forecasting framework considering multi-modal input, multi-dimensional and multi-target prediction patterns, and multi-task affordances for downstream applications. We harmonize multiple modalities by vision-language fusion, global context incorporation, and task-aware text embedding injection, to forecast hand waypoints in both 2D and 3D spaces. A novel dual-branch diffusion is proposed to concurrently predict human head and hand movements, capturing their motion synergy in egocentric vision. By introducing target indicators, the prediction model can forecast the specific joint waypoints of the wrist or the fingers, besides the widely studied hand center points. In addition, we enable Uni-Hand to additionally predict hand-object interaction states (contact/separation) to better facilitate downstream tasks. As the first work to incorporate downstream task evaluation in the literature, we build novel benchmarks to assess the real-world applicability of hand motion forecasting algorithms. The experimental results on multiple publicly available datasets and our newly proposed benchmarks demonstrate that Uni-Hand achieves state-of-the-art performance in multi-dimensional and multi-target hand motion forecasting. Extensive validation on multiple downstream tasks also demonstrates its effective human-robot policy transfer for robotic manipulation and feature enhancement for action anticipation/recognition.
comment: Extended journal version of MMTwin (IROS'25). Code and data: https://github.com/IRMVLab/UniHand
Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments
In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The proposed method integrates two complementary modules: an error-based selection module utilizing photometric and structural similarity (SSIM) errors, and a momentum-based update module that dynamically adjusts keyframe selection thresholds according to scene motion dynamics. By dynamically curating the most informative frames, our approach addresses a key data bottleneck in real-time perception. This allows for the creation of high-quality 3D world representations from a compressed data stream, a critical step towards scalable robot learning and deployment in complex, dynamic environments. Experimental results demonstrate significant improvements over traditional static keyframe selection strategies, such as fixed temporal intervals or uniform frame skipping. These findings highlight a meaningful advancement toward adaptive perception systems that can dynamically respond to complex and evolving visual scenes. We evaluate our proposed adaptive keyframe selection module on two recent state-of-the-art 3D reconstruction networks, Spann3r and CUT3R, and observe consistent improvements in reconstruction quality across both frameworks. Furthermore, an extensive ablation study confirms the effectiveness of each individual component in our method, underlining their contribution to the overall performance gains.
comment: Accepted at ROBOVIS 2026
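The two modules described above can be sketched with a few lines of NumPy: an error-based trigger that combines photometric error with a (global) SSIM term, and a momentum-based update of the selection threshold so that faster scene motion yields more keyframes. The frames are synthetic and all constants are assumptions, not the paper's tuned values.

```python
# Hedged sketch of error-based keyframe selection with a momentum-adapted threshold.
import numpy as np

def global_ssim(a, b, c1=1e-4, c2=9e-4):
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2*mu_a*mu_b + c1) * (2*cov + c2)) / ((mu_a**2 + mu_b**2 + c1) * (va + vb + c2))

def frame_error(frame, keyframe):
    photometric = np.abs(frame - keyframe).mean()
    structural = 1.0 - global_ssim(frame, keyframe)
    return 0.5 * photometric + 0.5 * structural

rng = np.random.default_rng(0)
frame = rng.random((64, 64))
keyframes, last_key = [frame], frame
threshold, momentum = 0.05, 0.9

for t in range(200):
    motion = 0.002 if t < 100 else 0.02            # the scene starts changing faster at t=100
    frame = np.clip(frame + motion * rng.standard_normal(frame.shape), 0.0, 1.0)
    err = frame_error(frame, last_key)
    if err > threshold:
        keyframes.append(frame); last_key = frame
    # momentum-based adaptation toward the recently observed error level
    threshold = momentum * threshold + (1.0 - momentum) * err

print(f"selected {len(keyframes)} keyframes out of 200 frames")
```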
DexFruit: Dexterous Manipulation and Gaussian Splatting Inspection of Fruit
DexFruit is a robotic manipulation framework that enables gentle, autonomous handling of fragile fruit and precise evaluation of damage. Many fruits are fragile and prone to bruising, thus requiring humans to manually harvest them with care. In this work, we demonstrate that, by using optical tactile sensing, autonomous manipulation of fruit with minimal damage can be achieved. We show that our tactile-informed diffusion policies outperform baselines in both reduced bruising and pick-and-place success rate across three fruits: strawberries, tomatoes, and blackberries. In addition, we introduce FruitSplat, a novel technique to represent and quantify visual damage in high-resolution 3D representation via 3D Gaussian Splatting (3DGS). Existing metrics for measuring damage lack quantitative rigor or require expensive equipment. With FruitSplat, we distill a 2D strawberry mask as well as a 2D bruise segmentation mask into the 3DGS representation. Furthermore, this representation is modular and general, compatible with any relevant 2D model. Overall, we demonstrate a 92% grasping policy success rate, up to a 20% reduction in visual bruising, and up to a 31% improvement in grasp success rate on challenging fruit compared to our baselines across our three tested fruits. We rigorously evaluate this result with over 630 trials. Please check out our website at https://dex-fruit.github.io.
comment: 8 pages, 5 figures
Much Ado About Noising: Dispelling the Myths of Generative Robotic Control
Generative models, like flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their successes, ranging from capturing multi-modal action distribution to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multi-modality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, as long as intermediate steps are supervised during training and this supervision is paired with a suitable level of stochasticity. As a validation of our findings, we show that a minimum iterative policy (MIP), a lightweight two-step regression-based policy, essentially matches the performance of flow GCPs, and often outperforms distilled shortcut models. Our results suggest that the distribution-fitting component of GCPs is less salient than commonly believed, and point toward new design spaces focusing solely on control performance. Project page: https://simchowitzlabpublic.github.io/much-ado-about-noising-project/
Multiagent Systems
Trusted AI Agents in the Cloud
AI agents powered by large language models are increasingly deployed as cloud services that autonomously access sensitive data, invoke external tools, and interact with other agents. However, these agents run within a complex multi-party ecosystem, where untrusted components can lead to data leakage, tampering, or unintended behavior. Existing Confidential Virtual Machines (CVMs) provide only per binary protection and offer no guarantees for cross-principal trust, accelerator-level isolation, or supervised agent behavior. We present Omega, a system that enables trusted AI agents by enforcing end-to-end isolation, establishing verifiable trust across all contributing principals, and supervising every external interaction with accountable provenance. Omega builds on Confidential VMs and Confidential GPUs to create a Trusted Agent Platform that hosts many agents within a single CVM using nested isolation. It also provides efficient multi-agent orchestration with cross-principal trust establishment via differential attestation, and a policy specification and enforcement framework that governs data access, tool usage, and inter-agent communication for data protection and regulatory compliance. Implemented on AMD SEV-SNP and NVIDIA H100, Omega fully secures agent state across CVM-GPU, and achieves high performance while enabling high-density, policy-compliant multi-agent deployments at cloud scale.
Invariant Price of Anarchy: a Metric for Welfarist Traffic Control
The Price of Anarchy (PoA) is a standard metric for quantifying inefficiency in socio-technical systems, widely used to guide policies like traffic tolling. Conventional PoA analysis relies on exact numerical costs. However, in many settings, costs represent agents' preferences and may be defined only up to possibly arbitrary scaling and shifting, representing informational and modeling ambiguities. We observe that while such transformations preserve equilibrium and optimal outcomes, they change the PoA value. To resolve this issue, we rely on results from Social Choice Theory and define the Invariant PoA. By connecting admissible transformations to degrees of comparability of agents' costs, we derive the specific social welfare functions which ensure that efficiency evaluations do not depend on arbitrary rescalings or translations of individual costs. Case studies on a toy example and the Zurich network demonstrate that identical tolling strategies can lead to substantially different efficiency estimates depending on the assumed comparability. Our framework thus demonstrates that explicit axiomatic foundations are necessary in order to define efficiency metrics and to appropriately guide policy in large-scale infrastructure design robustly and effectively.
$\alpha$-Potential Games for Decentralized Control of Connected and Automated Vehicles
Designing scalable and safe control strategies for large populations of connected and automated vehicles (CAVs) requires accounting for strategic interactions among heterogeneous agents under decentralized information. While dynamic games provide a natural modeling framework, computing Nash equilibria (NEs) in large-scale settings remains challenging, and existing mean-field game approximations rely on restrictive assumptions that fail to capture collision avoidance and heterogeneous behaviors. This paper proposes an $\alpha$-potential game framework for decentralized CAV control. We show that computing an $\alpha$-NE reduces to solving a decentralized control problem, and derive tight bounds on the parameter $\alpha$ based on interaction intensity and asymmetry. We further develop scalable policy gradient algorithms for computing $\alpha$-NEs using decentralized neural-network policies. Numerical experiments demonstrate that the proposed framework accommodates diverse traffic flow models and effectively captures collision avoidance, obstacle avoidance, and agent heterogeneity.
Distributed scalable coupled policy algorithm for networked multi-agent reinforcement learning
This paper studies networked multi-agent reinforcement learning (NMARL) with interdependent rewards and coupled policies. In this setting, each agent's reward depends on its own state-action pair as well as those of its direct neighbors, and each agent's policy is parameterized by its local parameters together with those of its $\kappa_{p}$-hop neighbors, with $\kappa_{p}\geq 1$ denoting the coupling radius. The objective of the agents is to collaboratively optimize their policies to maximize the discounted average cumulative reward. To address the challenge of interdependent policies in collaborative optimization, we introduce a novel concept termed the neighbors' averaged $Q$-function and derive a new expression for the coupled policy gradient. Based on these theoretical foundations, we develop a distributed scalable coupled policy (DSCP) algorithm, where each agent relies only on the state-action pairs of its $\kappa_{p}$-hop neighbors and the rewards of its $(\kappa_{p}+1)$-hop neighbors. Specifically, in the DSCP algorithm, we employ a geometric 2-horizon sampling method that does not require storing a full $Q$-table to obtain an unbiased estimate of the coupled policy gradient. Moreover, each agent interacts exclusively with its direct neighbors to obtain accurate policy parameters, while maintaining local estimates of other agents' parameters to execute its local policy and collect samples for optimization. These estimates and policy parameters are updated via a push-sum protocol, enabling distributed coordination of policy updates across the network. We prove that the joint policy produced by the proposed algorithm converges to a first-order stationary point of the objective function. Finally, the effectiveness of the DSCP algorithm is demonstrated through simulations in a robot path planning environment, showing clear improvement over state-of-the-art methods.
MARINE: Theoretical Optimization and Design for Multi-Agent Recursive IN-context Enhancement
Large Language Model (LLM)-based agents demonstrate advanced reasoning capabilities, yet practical constraints frequently limit outputs to single responses, leaving significant performance potential unrealized. This paper introduces MARINE (Multi-Agent Recursive IN-context Enhancement), a theoretically grounded framework that reconceptualizes test-time reasoning as iterative refinement of a persistent reference trajectory, fundamentally departing from conventional one-shot or multi-sample paradigms. The MARINE refinement operator systematically converts a base model's pass@N capabilities into near-optimal pass@1 performance. Rigorous theoretical analysis establishes that minimal feasible batches maximize expected performance gains under fixed invocation budgets, while logarithmically growing batch schedules ensure continuous improvement without computational constraints. Comprehensive evaluation on the BrowserComp-ZH benchmark demonstrates state-of-the-art results, with a 685B-parameter implementation achieving 46.0% pass@1 accuracy. Meanwhile, MARINE establishes a new paradigm for parameter-efficient reasoning: an 80B-parameter model augmented with MARINE matches the performance of standalone 1000B-parameter agents, reducing parameter requirements by over an order of magnitude. Notably, within a fixed computational budget, the proposed MARINE delivers higher-quality samples to alignment and optimization processes than traditional sampling-and-ranking strategies. Consequently, it has great potential to boost post-training efficiency.
Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from single-agent RL baselines of 14.0 to 47.0 percent to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
Distributionally Robust Markov Games with Average Reward
We study distributionally robust Markov games (DR-MGs) with the average-reward criterion, a framework for multi-agent decision-making under uncertainty over extended horizons. In average-reward DR-MGs, agents aim to maximize their worst-case infinite-horizon average reward, to ensure satisfactory performance under environment uncertainties and opponent actions. We first establish a connection between the best-response policies and the optimal policies for the induced single-agent problems. Under a standard irreducibility assumption, we derive a correspondence between the optimal policies and the solutions of the robust Bellman equation, and establish the existence of a stationary Nash Equilibrium (NE) based on these results. We further study DR-MGs under the weakly communicating setting, where we construct a set-valued map and show that its value is a subset of the best-response policies, convex, and upper hemi-continuous, and establish the existence of an NE. We then explore algorithmic solutions, first proposing a Robust Nash-Iteration algorithm and providing convergence guarantees under some additional assumptions and an NE-computing oracle. We further develop a temporal-difference based algorithm for DR-MGs, and provide convergence guarantees without any additional oracle or assumptions. Finally, we connect average-reward robust NE to discounted ones, showing that the average-reward robust NE can be approximated by the discounted ones under a large discount factor. Our studies provide a comprehensive theoretical and algorithmic foundation for decision-making in complex, uncertain, and long-running multi-player environments.
comment: Preprint, work in progress
Systems and Control (EESS)
LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction
Deep residual networks (ResNets) have demonstrated outstanding success in computer vision tasks, attributed to their ability to maintain gradient flow through deep architectures. Simultaneously, controlling the Lipschitz constant in neural networks has emerged as an essential area of research to enhance adversarial robustness and network certifiability. This paper presents a rigorous approach to the general design of $\mathcal{L}$-Lipschitz deep residual networks using a Linear Matrix Inequality (LMI) framework. Initially, the ResNet architecture was reformulated as a cyclic tridiagonal LMI, and closed-form constraints on network parameters were derived to ensure $\mathcal{L}$-Lipschitz continuity; however, using a new $LDL^\top$ decomposition approach for certifying LMI feasibility, we extend the construction of $\mathcal{L}$-Lipschitz networks to any other nonlinear architecture. Our contributions include a provable parameterization methodology for constructing Lipschitz-constrained residual networks and other hierarchical architectures. Cholesky decomposition is also used for efficient parameterization. These findings enable robust network designs applicable to adversarial robustness, certified training, and control systems. The $LDL^\top$ formulation is shown to be a tight relaxation of the SDP-based network, maintaining full expressiveness and achieving 3% to 13% accuracy gains over SLL layers on 121 UCI data sets.
comment: 39 pages, 3 figures, 12 tables
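For orientation only, the sketch below shows one common and much simpler way of bounding the Lipschitz constant of a single linear layer: estimate its spectral norm by power iteration and rescale. This is not the paper's $LDL^\top$ / LMI parameterization of full residual networks; it only illustrates, at layer level, what constraining a Lipschitz constant means.

```python
# Hedged sketch: a 1-Lipschitz layer via spectral-norm rescaling (power iteration).
import numpy as np

def spectral_norm(W, iters=50):
    v = np.random.default_rng(0).standard_normal(W.shape[1])
    for _ in range(iters):
        u = W @ v; u /= np.linalg.norm(u)
        v = W.T @ u; v /= np.linalg.norm(v)
    return float(u @ W @ v)

def lipschitz_rescale(W, target=1.0):
    """Rescale W so that x -> W x (and hence x -> relu(W x)) is target-Lipschitz."""
    s = spectral_norm(W)
    return W if s <= target else W * (target / s)

rng = np.random.default_rng(1)
W = lipschitz_rescale(rng.standard_normal((64, 128)))
x, y = rng.standard_normal(128), rng.standard_normal(128)
ratio = np.linalg.norm(np.maximum(W @ x, 0) - np.maximum(W @ y, 0)) / np.linalg.norm(x - y)
print(f"||f(x)-f(y)|| / ||x-y|| = {ratio:.3f}  (<= 1 by construction)")
```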
Numerically Reliable Brunovsky Transformations
The Brunovsky canonical form provides sparse structural representations that are beneficial for computational optimal control, yet existing methods fail to compute it reliably. We propose a technique that produces Brunovsky transformations with substantially lower construction errors and improved conditioning. A controllable linear system is first reduced to staircase form via an orthogonal similarity transformation. We then derive a simple linear parametrization of the transformations yielding the unique Brunovsky form. Numerical stability is further enhanced by applying a deadbeat gain before computing system matrix powers and by optimizing the linear parameters to minimize condition numbers.
comment: Submitted to IFAC World Congress 2026 as a regular paper
Log-linear Dynamic Inversion for Thrusting Spacecraft on SE2(3)
We show that the dynamics of a thrusting spacecraft can be embedded in the Lie group SE2(3) in a form that is group-affine with application of a feed-forward control law. This structure implies that the configuration-tracking error evolves exactly linearly in the associated Lie algebra coordinates (log-linear dynamics), rather than arising from a local linearization of the nonlinear system. As a result, a broad class of linear analysis and synthesis tools becomes directly applicable to powered spacecraft motion on SE2(3). A simple numerical example confirms that the error predicted by the linear Lie-algebra dynamics matches the error computed from the full nonlinear system, illustrating the exact log-linear behavior. This foundational property opens a path toward rigorous tools for satellite docking, autonomous rendezvous and proximity operations, robust controller design, and convex safety certification-capabilities that are difficult to achieve with classical local linearizations such as Tschauner-Hempel/Yamanaka-Ankersen (TH/YA).
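For reference, the standard 5x5 matrix embedding of an extended pose (attitude $R$, velocity $v$, position $p$) used throughout the $\mathrm{SE}_2(3)$ / invariant-filtering literature is

$$
\chi=\begin{bmatrix} R & v & p\\ 0_{1\times3} & 1 & 0\\ 0_{1\times3} & 0 & 1\end{bmatrix}\in \mathrm{SE}_2(3),\qquad
\eta=\bar{\chi}^{-1}\chi,\qquad \xi=\log(\eta)\in\mathbb{R}^{9},
$$

and for group-affine dynamics the error coordinate $\xi$ evolves exactly linearly, $\dot{\xi}=A(t)\,\xi$, which is the log-linear property the abstract exploits. This is common background notation, not a construction specific to this paper.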
InstructMPC: A Human-LLM-in-the-Loop Framework for Context-Aware Power Grid Control
The transition toward power grids with high renewable penetration demands context-aware decision-making frameworks. Traditional operational paradigms, which rely on static optimization of history-based load forecasting, often fail to capture the complex nature of real-time operational conditions, such as operator-issued maintenance mandates, emergency topology changes, or event-driven load surges. To address this challenge, we introduce InstructMPC, a closed-loop framework that integrates Large Language Models (LLMs) to generate context-aware predictions, enabling the controller to optimize power system operation. Our method employs a Contextual Disturbances Predictor (CDP) module to translate contextual information into predictive disturbance trajectories, which are then incorporated into the Model Predictive Control (MPC) optimization. Unlike conventional open-loop forecasting frameworks, InstructMPC features an online tuning mechanism where the predictor's parameters are continuously updated based on the realized control cost with a theoretical guarantee, achieving a regret bound of $O(\sqrt{T \log T})$ for linear dynamics when optimized via a tailored loss function, ensuring task-aware learning and adaptation to non-stationary grid conditions.
A Hybrid Dynamic Model for Predicting Human Cognition and Reliance during Automated Driving
We propose a simple (12 parameter) hybrid dynamic model that simultaneously captures the continuous-valued dynamics of three human cognitive states (trust, perceived risk, and mental workload) as well as discrete transitions in reliance on the automation. The discrete-time dynamic evolution of each cognitive state is modeled using a first-order affine difference equation. Reliance is defined as a single discrete-valued state, whose evolution at each time step depends on the cognitive states satisfying certain threshold conditions. Using data collected from 16 participants, we estimate participant-specific model parameters based on their reliance on the automation and intermittently self-reported cognitive states during a continuous drive in a vehicle simulator. The model can be estimated using a single user's trajectory data (e.g., 8 minutes of driving), making it suitable for online parameter adaptation methods. Our results show that the model fits the observed trajectories well for several participants, with their reliance behavior primarily influenced by trust, perceived risk, or both. Importantly, the model is interpretable, such that the variations in model parameters across participants provide insights into differences in the time scales over which cognitive states evolve, and how these states are influenced by task complexity. Implications for the design of human-centric vehicle automation are discussed.
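A minimal sketch of this model class follows: each cognitive state obeys a first-order affine difference equation driven by an input (here, task complexity), and reliance is a discrete state obtained from threshold conditions on the cognitive states. All parameter values and thresholds below are made up for illustration; they are not the participants' fitted parameters.

```python
# Hedged sketch of first-order affine cognitive-state dynamics with threshold reliance.
import numpy as np

a = {"trust": 0.95, "risk": 0.90, "workload": 0.92}       # state retention factors
b = {"trust": -0.036, "risk": 0.086, "workload": 0.057}   # input gains
c = {"trust": 0.047, "risk": 0.003, "workload": 0.013}    # affine offsets
x = {"trust": 0.5, "risk": 0.2, "workload": 0.3}          # initial cognitive states

def step(x, u):
    """x_{k+1} = a*x_k + b*u_k + c for each cognitive state."""
    return {s: a[s] * x[s] + b[s] * u + c[s] for s in x}

def reliance(x, trust_on=0.6, risk_off=0.7):
    """Discrete reliance: rely on automation when trust is high and perceived risk low."""
    return int(x["trust"] >= trust_on and x["risk"] <= risk_off)

log = []
for k in range(120):
    u = 0.2 if k < 60 else 0.9          # task complexity rises halfway through the drive
    x = step(x, u)
    log.append(reliance(x))
print("mean reliance, first vs. second half of the drive:",
      round(np.mean(log[:60]), 2), round(np.mean(log[60:]), 2))
```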
Advanced Hybrid Automated Insulin Delivery System based on Successive Linearization Model Predictive Control: The UniBE System
Background and objective: Hybrid automated insulin delivery (hAID) systems represent the most advanced therapy for type 1 diabetes (T1D). Current systems rely on linear or linearized models of glucose homeostasis, which may compromise prediction accuracy and, in turn, timely decision-making by the controller. Physiological variability further complicates insulin requirements, underscoring the need for controllers that adapt dynamically and reduce user burden. Methods: We introduce the University of Bern (UniBE) hAID system, a framework based on successive linearization model predictive control (MPC). The controller integrates basal insulin infusion with the insulin bolus delivery module for meal-related and corrective bolus dosing, adapting bounds in real time to glucose dynamics while accounting for both automated and user-initiated inputs. In-silico evaluation was conducted using the commercial version of the FDA-accepted UVa/Padova metabolic simulator across nine scenarios involving persistent and time-varying errors in meal timing, carbohydrate estimation, and basal insulin profiles. Results: In the baseline scenario, UniBE achieved a mean time in range of 92.0±13.2%, with time below range at 0.1±0.2% and time above range at 7.9±13.2%. Across perturbation scenarios, time in range remained between 75.1 and 92.8%, with low hypoglycemia incidence, demonstrating resilience to clinically relevant disturbances.
comment: 12 pages, 3 Figures, and 5 Tables
3D-ICE 4.0: Accurate and efficient thermal modeling for 2.5D/3D heterogeneous chiplet systems
The increasing power densities and intricate heat dissipation paths in advanced 2.5D/3D chiplet systems necessitate thermal modeling frameworks that deliver detailed thermal maps with high computational efficiency. Traditional compact thermal models (CTMs) often struggle to scale with the complexity and heterogeneity of modern architectures. This work introduces 3D-ICE 4.0, designed for heterogeneous chip-based systems. Key innovations include: (i) preservation of material heterogeneity and anisotropy directly from industrial layouts, integrated with OpenMP and SuperLU MT-based parallel solvers for scalable performance, (ii) adaptive vertical layer partitioning to accurately model vertical heat conduction, and (iii) temperature-aware non-uniform grid generation. The results with different benchmarks demonstrate that 3D-ICE 4.0 achieves speedups ranging from 3.61x-6.46x over state-of-the-art tools, while reducing grid complexity by more than 23.3% without compromising accuracy. Compared to the commercial software COMSOL, 3D-ICE 4.0 effectively captures both lateral and vertical heat flows, validating its precision and robustness. These advances demonstrate that 3D-ICE 4.0 is an efficient solution for thermal modeling in emerging heterogeneous 2.5D/3D integrated systems.
Safe Output Regulation of Coupled Hyperbolic PDE-ODE Systems
This paper presents a safe output regulation control strategy for a class of systems modeled by a coupled $2\times 2$ hyperbolic PDE-ODE structure, subject to fully distributed disturbances throughout the system. A state-feedback controller is developed by the nonovershooting backstepping method to simultaneously achieve exponential output regulation and enforce safety constraints on the system output, which is the state furthest from the control input. To handle unmeasured PDE states and external disturbances, a state observer and a disturbance estimator are designed. Explicit bounds on the estimation errors are derived and used to construct a robust safe regulator that accounts for the uncertainties. The proposed control scheme guarantees that: 1) If the system output is initially within the safe region, it remains there; otherwise, it is steered back to the safe region within a prescribed time; 2) The output tracking error converges to zero exponentially; 3) The observer accurately estimates both the distributed states and external disturbances, with estimation errors converging to zero exponentially; 4) All signals in the closed-loop system remain bounded. The effectiveness of the proposed method is demonstrated through a UAV delivery scenario with a cable-suspended payload, where the payload is regulated to track a desired reference while avoiding collisions with barriers.
Global stability of vehicle-with-driver dynamics via Sum-of-Squares programming
This work estimates safe invariant subsets of the Region of Attraction (ROA) for a seven-state vehicle-with-driver system, capturing both asymptotic stability and the influence of state-safety bounds along the system trajectory. Safe sets are computed by optimizing Lyapunov functions through an original iterative Sum-of-Squares (SOS) procedure. The method is first demonstrated on a two-state benchmark, where it accurately recovers a prescribed safe region as the 1-level set of a polynomial Lyapunov function. We then describe the distinguishing characteristics of the studied vehicle-with-driver system: the control dynamics mimic human driver behavior through a delayed preview-tracking model that, with suitable parameter choices, can also emulate digital controllers. To enable SOS optimization, a polynomial approximation of the nonlinear vehicle model is derived, together with its operating-envelope constraints. The framework is then applied to understeering and oversteering scenarios, and the estimated safe sets are compared with reference boundaries obtained from exhaustive simulations. The results show that SOS techniques can efficiently deliver Lyapunov-defined safe regions, supporting their potential use for real-time safety assessment, for example as a supervisory layer for active vehicle control.
comment: 20 pages, 7 figures, 2 tables
Task-Specific Trust Evaluation for Multi-Hop Collaborator Selection via GNN-Aided Distributed Agentic AI
The success of collaborative task completion among networked devices hinges on the effective selection of trustworthy collaborators. However, accurate task-specific trust evaluation of multi-hop collaborators can be extremely complex. The reason is that their trust evaluation is determined by a combination of diverse trust-related perspectives with different characteristics, including historical collaboration reliability, volatile and sensitive conditions of available resources for collaboration, as well as continuously evolving network topologies. To address this challenge, this paper presents a graph neural network (GNN)-aided distributed agentic AI (GADAI) framework, in which different aspects of devices' task-specific trustworthiness are separately evaluated and jointly integrated to facilitate multi-hop collaborator selection. GADAI first utilizes a GNN-assisted model to infer device trust from historical collaboration data. Specifically, it employs GNN to propagate and aggregate trust information among multi-hop neighbours, resulting in more accurate device reliability evaluation. Considering the dynamic and privacy-sensitive nature of device resources, a privacy-preserving resource evaluation mechanism is implemented using agentic AI. Each device hosts a large AI model-driven agent capable of autonomously determining whether its local resources meet the requirements of a given task, ensuring both task-specific and privacy-preserving trust evaluation. By combining the outcomes of these assessments, only the trusted devices can coordinate a task-oriented multi-hop cooperation path through their agents in a distributed manner. Experimental results show that our proposed GADAI outperforms the comparison algorithms in planning multi-hop paths that maximize the value of task completion.
Pauli Decomposition of Impedance Matrices for Understanding the Root Cause of Instabilities in Grid-Connected Power Electronic Converters
The impedance criterion has emerged as an alternative approach to the stability assessment of grid-connected power electronic converters. However, the lack of physical meaning of impedance and admittance matrices hinders the ability to understand the root cause of instabilities. To address this issue, this paper proposes the application of Pauli decomposition to the impedance matrices and the minor loop of grid-connected power electronic converters. The application of this methodology simplifies establishing the link between impedance matrix terms and closed-loop stability properties. Moreover, Pauli decomposition transforms impedance matrices into a quaternion-like form that is helpful for assessing the root cause of instabilities. The theoretical contributions are validated using a case study consisting of a power electronic converter connected to a weak grid that has been previously analysed in the literature using existing techniques.
comment: The source code used to obtain the results can be found here: https://doi.org/10.5281/zenodo.17829032
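For intuition only: a 2x2 impedance matrix Z can be expanded in the Pauli basis as Z = z0*I + z1*sx + z2*sy + z3*sz with z_i = Tr(sigma_i Z)/2. The numpy sketch below illustrates that expansion on a made-up dq-frame impedance matrix; it is not the released code (see the Zenodo link above for that).

    import numpy as np

    # Pauli basis: identity plus the three Pauli matrices.
    I2 = np.eye(2, dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    basis = [I2, sx, sy, sz]

    def pauli_decompose(Z):
        """Return coefficients z_i such that Z = sum_i z_i * basis[i]."""
        return [np.trace(s @ Z) / 2 for s in basis]   # uses Tr(sigma_i sigma_j) = 2*delta_ij

    # Illustrative (placeholder) dq-frame impedance matrix in ohms.
    Z = np.array([[1.0 + 2.0j, -0.5j],
                  [0.5j, 1.0 + 2.0j]])
    coeffs = pauli_decompose(Z)
    Z_rec = sum(c * s for c, s in zip(coeffs, basis))  # reconstruction check
    assert np.allclose(Z, Z_rec)
    print([np.round(c, 3) for c in coeffs])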
Modified global finite-time quasi-continuous second-order robust feedback control
A non-overshooting quasi-continuous sliding-mode control with sub-optimal damping was recently introduced in Ruderman and Efimov (2025) for perturbed second-order systems. The present work proposes an essential modification of the nonlinear control law which (i) allows for a parameterizable control amplitude limitation in a large subset of the initial values, (ii) admits the entire state space R2 (which was not the case in Ruderman and Efimov (2025)) for the finite-time control, and, finally, (iii) enables an analytic solution of the state trajectories in the unperturbed case. The latter also allows an exact estimation of the finite convergence time and opens an avenue for other potentially interesting analyses of the control properties in the future. For the perturbed case, solution-based and Lyapunov-function-based approaches are developed to show uniform global asymptotic stability. The proposed robustness and convergence analyses are accompanied by several illustrative numerical examples.
comment: 6 pages, 5 figures
Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning
This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. Leveraging Bayesian Active Inference, the approach combines expert-generated demonstrations with probabilistic generative modeling to encode high-level symbolic planning, low-level motion policies, and wireless signal feedback. During deployment, the UAV performs online inference to anticipate interference, localize jammers, and adapt its trajectory accordingly, without prior knowledge of jammer locations. Simulation results demonstrate that the proposed method achieves near-expert performance, significantly reducing communication interference and mission cost compared to model-free reinforcement learning baselines, while maintaining robust generalization in dynamic environments.
comment: This paper has been accepted for the 2026 IEEE Consumer Communications & Networking Conference (IEEE CCNC 2026)
IMMPC: An Internal Model Based MPC for Rejecting Unknown Disturbances
Model predictive control (MPC) is a powerful control method that allows state and input constraints to be included directly in the controller design. However, errors in the model, e.g., caused by unknown disturbances, can lead to constraint violation, loss of feasibility, and degraded closed-loop performance. In this paper, we propose a new MPC scheme based on the internal model principle. This enables the MPC to reject unknown disturbances provided that the dynamics of the linear signal generator are known. We reformulate the output regulation problem as a stability problem to ensure feasibility, constraint satisfaction, and convergence to the optimal reachable setpoint. The controller is validated on a real four-tank system.
LA-RL: Language Action-guided Reinforcement Learning with Safety Guarantees for Autonomous Highway Driving
Autonomous highway driving demands a critical balance between proactive, efficiency-seeking behavior and robust safety guarantees. This paper proposes Language Action-guided Reinforcement Learning (LA-RL) with Safety Guarantees, a novel framework that integrates the semantic reasoning of large language models (LLMs) into the actor-critic architecture with an improved safety layer. Within this framework, task-specific reward shaping harmonizes the dual objectives of maximizing driving efficiency and ensuring safety, guiding decision-making based on both environmental insights and clearly defined goals. To enhance safety, LA-RL incorporates a safety-critical planner that combines model predictive control (MPC) with discrete control barrier functions (DCBFs). This layer formally constrains the LLM-informed policy to a safe action set and employs a slack mechanism that enhances solution feasibility, prevents overly conservative behavior, and allows for greater policy exploration without compromising safety. Extensive experiments demonstrate that LA-RL significantly outperforms several current state-of-the-art methods, offering a more adaptive, reliable, and robust solution for autonomous highway driving. Compared to existing state-of-the-art methods, it achieves an approximately 20$\%$ higher success rate than the knowledge graph (KG)-based baseline and about 30$\%$ higher than the retrieval-augmented generation (RAG)-based baseline. In low-density environments, LA-RL achieves a 100$\%$ success rate. These results confirm its enhanced exploration of the state-action space and its ability to autonomously adopt more efficient, proactive strategies in complex, mixed-traffic highway environments.
Scenario-aware Uncertainty Quantification for Trajectory Prediction with Statistical Guarantees
Reliable uncertainty quantification in trajectory prediction is crucial for safety-critical autonomous driving systems, yet existing deep learning predictors lack uncertainty-aware frameworks adaptable to heterogeneous real-world scenarios. To bridge this gap, we propose a novel scenario-aware uncertainty quantification framework to provide the predicted trajectories with prediction intervals and reliability assessment. To begin with, predicted trajectories from the trained predictor and their ground truth are projected onto the map-derived reference routes within the Frenet coordinate system. We then employ CopulaCPTS as the conformal calibration method to generate temporal prediction intervals for distinct scenarios as the uncertainty measure. Building upon this, within the proposed trajectory reliability discriminator (TRD), mean error and calibrated confidence intervals are synergistically analyzed to establish reliability models for different scenarios. Subsequently, the risk-aware discriminator leverages a joint risk model that integrates longitudinal and lateral prediction intervals within the Frenet frame to identify critical points. This enables segmentation of trajectories into reliable and unreliable segments, which provides downstream planning modules with actionable reliability information. We evaluated our framework using the real-world nuPlan dataset, demonstrating its effectiveness in scenario-aware uncertainty quantification and reliability assessment across diverse driving contexts.
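As a simplified stand-in for the calibration step (CopulaCPTS additionally handles joint coverage across time steps, which is not reproduced here), the sketch below computes plain split-conformal intervals independently per horizon step from calibration residuals; the array shapes, horizon length, and 90% level are assumptions.

    import numpy as np

    def per_step_conformal_intervals(y_pred_cal, y_true_cal, y_pred_test, alpha=0.1):
        """Split-conformal prediction intervals, one residual quantile per horizon step.

        y_pred_cal, y_true_cal: (n_cal, horizon) calibration predictions and ground truth
        y_pred_test:            (n_test, horizon) new predictions
        """
        n_cal = y_pred_cal.shape[0]
        residuals = np.abs(y_true_cal - y_pred_cal)                      # (n_cal, horizon)
        q_level = min(1.0, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)   # finite-sample correction
        q = np.quantile(residuals, q_level, axis=0)                      # (horizon,)
        return y_pred_test - q, y_pred_test + q

    # Toy usage with random numbers standing in for Frenet-frame trajectory errors.
    rng = np.random.default_rng(0)
    cal_pred, cal_true = rng.normal(size=(200, 8)), rng.normal(size=(200, 8))
    lo, hi = per_step_conformal_intervals(cal_pred, cal_true, rng.normal(size=(5, 8)))
    print(lo.shape, hi.shape)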
A Note on Emergent Behavior in Multi-agent Systems Enabled by Neuro-spike Communication
In this note, we present a novel synchronization framework for heterogeneous multi-agent systems enabled by neuro-spike communication, which induces emergence. Unlike conventional synchronization strategies that require continuous transmission of full-state data packets, our approach utilizes a bio-inspired neuromorphic amplifier to achieve practical synchronization via intermittent, 1-bit Dirac delta pulses. The proposed method drastically improves communication efficiency in terms of bandwidth and energy by minimizing the information payload to a single bit, with intermittent and asynchronous communication. We provide a rigorous convergence analysis of the proposed method and validate the proposed scheme through numerical examples.
Executing Discrete/Continuous Declarative Process Specifications via Complex Event Processing
Traditional Business Process Management (BPM) focuses on discrete events and fails to incorporate critical continuous sensor data in cyber-physical environments. Hybrid declarative specifications, utilizing Signal Temporal Logic (STL), address this limitation by allowing constraints over both discrete events and real-valued signals. However, existing work has been limited to monitoring and post-hoc conformance checking. This paper introduces a novel Complex Event Processing (CEP)-based execution architecture that enables the real-time execution and enforcement of hybrid declarative models. Our three-layer approach integrates STL-inspired predicates into the execution flow, allowing the system to actively trigger activities and enforce process boundaries based on continuous sensor behavior. This approach bridges the gap between hybrid specification and operational control.
comment: Preprint
A Regularization and Active Learning Method for Identification of Quasi Linear Parameter Varying Systems
This paper proposes an active learning method for designing experiments to identify quasi-Linear Parameter-Varying (qLPV) models. Since informative experiments are costly, input signals must be selected to maximize information content based on the currently available model. To improve the extrapolation properties of the identified model, we introduce a manifold-regularization strategy that enforces smooth variations in the qLPV dynamics, promoting Linear Time-Varying (LTV) behavior. Using this regularized structure, we propose a new active learning criterion based on path integrals of an inverse-distance variance measure and derive an efficient approximation exploiting the LTV smoothness. Numerical examples show that the proposed regularization enhances qLPV extrapolation and that the resulting active learning scheme accelerates the identification process.
comment: 8 pages, 6 figures
Feedback stabilization of some fourth-order nonlinear parabolic equations with saturated controls
In this work, we analyze the internal and boundary stabilization of the Cahn-Hilliard and Kuramoto-Sivashinsky equations under saturated feedback control. We conduct our study through the spectral analysis of the associated linear operator. We identify a finite number of eigenvalues related to the unstable part of the system and then design a stabilization strategy based on modal decomposition, linear matrix inequalities (LMIs), and geometric conditions on the saturation function. Local exponential stabilization in $H^{2}$ is established.
Supervisory Measurement-Guided Noise Covariance Estimation: Discussing Forward and Reverse Differentiation
Reliable state estimation depends on accurately modeled noise covariances, which are difficult to determine in practice. This paper formulates the noise covariance estimation as a bilevel optimization problem that factorizes the joint likelihood of primary and supervisory measurements to reconcile information exploitation with computational tractability. The factorization converts the nested Bayesian dependency into a Markov-chain structure, allowing efficient computation. At the lower level, a Kalman filter with state augmentation performs such computation. Meanwhile, closed-form forward and reverse differentiation provide efficient gradients for the upper-level updates, and we compare the two models' space and time complexities to inform their practical selection. The upper level subsequently refines the noise covariances to guide the lower-level estimation. Taken together, the proposed algorithms offer a systematic and computationally efficient approach to noise covariance estimation in linear Gaussian systems.
A Non-Invasive Path to Animal Welfare: Contactless Vital Signs and Activity Monitoring of In-Vivo Rodents Using a mm-Wave FMCW Radar
Monitoring physiological and behavioral parameters of laboratory rodents is fundamental for biomedical research, yet conventional techniques often rely on invasive sensors or frequent handling that can induce stress and compromise data fidelity. To address these limitations, this paper presents a contactless and non-invasive in-vivo monitoring system based on a low-power 60 GHz frequency-modulated continuous wave (FMCW) radar. The proposed system enables simultaneous detection of rodent activity and vital signs directly within home-cage environments, eliminating the need for implants, electrodes, or human intervention. The hardware platform leverages a compact Infineon BGT60 series radar sensor, optimized for low power consumption and continuous operation. We investigate sensor placement strategies and design a complete signal processing pipeline, including range bin selection, phase extraction, and frequency-domain estimation tailored to rodent vital signs. The system achieves 3 cm and 0.1 m/s sensitivity for motion and activity detection, while allowing discrimination of micro-movements associated with cardiopulmonary activity with a 2 um distance resolution. Experimental validation with two rodents in realistic in-vivo cages demonstrates that the radar can track animal position and extract respiration rates with 2 bpm accuracy. By minimizing stress and disturbance, this work improves both animal welfare and the reliability of physiological measurements, offering a refined alternative to traditional monitoring methods. This work represents the first demonstration of continuous radar-based vital sign monitoring in freely moving rodents within group-housed cages. The proposed approach lays the foundation for scalable, automated, and ethical monitoring solutions in preclinical and translational research.
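As a toy illustration of the last stage of such a pipeline (not the authors' code), the sketch below estimates a respiration rate from a simulated, unwrapped phase signal at one range bin via an FFT peak in a plausible breathing band; the 20 Hz frame rate and 0.3 Hz breathing frequency are made up.

    import numpy as np

    fs = 20.0                       # radar frame rate in Hz (assumed)
    t = np.arange(0, 60, 1 / fs)    # 60 s observation window
    # Simulated unwrapped phase at the selected range bin: breathing at 0.3 Hz plus noise.
    phase = 0.5 * np.sin(2 * np.pi * 0.3 * t) + 0.05 * np.random.randn(t.size)

    # Frequency-domain rate estimation restricted to a plausible respiration band (0.1-2 Hz).
    spectrum = np.abs(np.fft.rfft(phase - phase.mean()))
    freqs = np.fft.rfftfreq(t.size, d=1 / fs)
    band = (freqs > 0.1) & (freqs < 2.0)
    resp_hz = freqs[band][np.argmax(spectrum[band])]
    print("estimated respiration rate:", 60 * resp_hz, "breaths per minute")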
Measurement-based Initial Point Smoothing and Control Approach to Quantum Memory Systems
This paper is concerned with a quantum memory system for storing quantum information in the form of its initial dynamic variables in the presence of environmental noise. In order to compensate for the deviation from the initial conditions, the classical parameters of the system Hamiltonian are affected by the actuator output of a measurement-based classical controller. The latter uses an observation process produced by a measuring apparatus from the quantum output field of the memory system. The underlying system is modelled as an open quantum harmonic oscillator whose Heisenberg evolution is governed by linear Hudson-Parthasarathy quantum stochastic differential equations. The controller is organised as a classical linear time-varying system, so that the resulting closed-loop system has quantum and classical dynamic variables. We apply linear-quadratic-Gaussian control and fixed-point smoothing at the level of the first two moments and consider controllers with a separation structure which involve a continuously updated estimate for the initial quantum variables. The initial-point smoother is used for actuator signal formation so as to minimise the sum of a mean-square deviation of the quantum memory system variables at a given time horizon from their initial values and an integral quadratic penalty on the control signal.
comment: 6 pages, 1 figure, submitted to IFAC World Congress 2026
Inverse Linear-Quadratic Gaussian Differential Games
This paper presents a method for solving the Inverse Stochastic Differential Game (ISDG) problem in finite-horizon linear-quadratic Gaussian (LQG) differential games. The objective is to recover cost function parameters of all players, as well as noise scaling parameters of the stochastic system, consistent with observed trajectories. The proposed framework combines (i) estimation of the feedback strategies, (ii) identification of the cost function parameters via a novel reformulation of the coupled Riccati differential equations, and (iii) maximum likelihood estimation of the noise scaling parameters. Simulation results demonstrate that the approach recovers the underlying parameters, yielding trajectories that closely match the observed ones.
PAC One-Step Safety Certification for Black-Box Discrete-Time Stochastic Systems
This paper investigates the problem of safety certification for black-box discrete-time stochastic systems, where both the system dynamics and disturbance distributions are unknown, and only sampled data are available. Under such limited information, ensuring robust or classical quantitative safety over finite or infinite horizons is generally infeasible. To address this challenge, we propose a data-driven framework that provides theoretical one-step safety guarantees in the Probably Approximately Correct (PAC) sense. This one-step guarantee can be applied recursively at each time step, thereby yielding step-by-step safety assurances over extended horizons. Our approach formulates barrier certificate conditions based solely on sampled data and establishes PAC safety guarantees by leveraging the VC dimension, scenario approaches, Markov's inequality, and Hoeffding's inequality. Two sampling procedures and three methods for deriving PAC safety guarantees are proposed. The properties and comparative advantages of these three methods are thoroughly discussed. Finally, the effectiveness of the proposed methods is demonstrated through several numerical examples.
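One ingredient named above, Hoeffding's inequality, gives a simple PAC-style bound on the one-step violation probability from i.i.d. samples: with confidence 1 - delta, the true probability is at most the empirical rate plus sqrt(ln(1/delta)/(2N)). The sketch below applies that bound to a made-up black-box one-step experiment; it is a greatly simplified version of only one of the paper's methods, and all numbers are placeholders.

    import numpy as np

    def hoeffding_upper_bound(violations, n_samples, delta=1e-3):
        """Upper bound on the violation probability holding with confidence 1 - delta."""
        p_hat = violations / n_samples
        eps = np.sqrt(np.log(1.0 / delta) / (2.0 * n_samples))
        return min(1.0, p_hat + eps)

    # Toy black-box one-step check: sample states, push them through unknown dynamics,
    # and count how many leave the (assumed) safe set [-1, 1] in one step.
    rng = np.random.default_rng(1)
    n = 50_000
    x = rng.uniform(-1.0, 1.0, size=n)
    x_next = 0.9 * x + 0.05 * rng.standard_normal(n)   # stand-in for the unknown system
    violations = int(np.sum(np.abs(x_next) > 1.0))
    print("PAC bound on one-step unsafety:", hoeffding_upper_bound(violations, n))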
Constructive boundary observer-based control of high-dimensional semilinear heat equations
This paper presents a constructive finite-dimensional output-feedback design for semilinear $M$-dimensional ($M\geq 2$) heat equations with boundary actuation and sensing. A key challenge in high dimensions is the slower growth rate of the Laplacian eigenvalues. The novel features of our modal-decomposition-based design, which allows the admissible Lipschitz constants to be enlarged, include a larger class of shape functions that may be distributed over a part of the boundary only, the corresponding lifting transformation, and the full-order controller gain found from the design LMIs. We further analyze the robustness of the closed-loop system with respect to either multiplicative noise (vanishing at the origin) or additive noise (persistent). Effective LMI conditions are provided for specifying the minimal observer dimension and maximal Lipschitz constants that preserve stability (mean-square exponential stability for multiplicative noise and noise-to-state stability for additive noise). Numerical examples for 2D and 3D cases demonstrate the efficacy and advantages of our method.
Solving Multiparametric Generalized Nash Equilibrium Problems and Explicit Game-Theoretic Model Predictive Control
We present a method to compute explicit solutions of parametric Generalized Nash Equilibrium (GNE) problems with convex quadratic cost functions and linear coupling and local constraints. Assuming the parameters only enter the linear terms of the cost functions and constraint right-hand sides, we provide the exact multiparametric solution of the GNE problem. Such a solution enables (i) minimal real-time computation, (ii) inherent interpretability, explainability, and exact enumeration of all multiple equilibria, (iii) determination of desired GNE solution types in the case of infinitely many equilibria, and (iv) zero-shot updates of the GNE solution due to changes of constraint right-hand sides and/or linear costs. In line with explicit Model Predictive Control (MPC) approaches, we apply our method to solve game-theoretic MPC (Receding Horizon Games) explicitly, comparing performance against centralized solvers in a battery charging game and in a toy two-mass spring system control problem. A Python implementation of the algorithms presented in this paper is available on https://github.com/bemporad/nash_mpqp.
Spatiotemporal Tubes for Differential Drive Robots with Model Uncertainty
This paper presents a Spatiotemporal Tube (STT)-based control framework for differential-drive mobile robots with dynamic uncertainties and external disturbances, guaranteeing the satisfaction of Temporal Reach-Avoid-Stay (T-RAS) specifications. The approach employs a circular STT, characterized by smoothly time-varying center and radius, to define dynamic safe corridors that guide the robot from the start region to the goal while avoiding obstacles. In particular, we first develop a sampling-based synthesis algorithm to construct a feasible STT that satisfies the prescribed timing and safety constraints with formal guarantees. To ensure that the robot remains confined within this tube, we then design analytically a closed-form, approximation-free control law. The resulting controller is computationally efficient, robust to disturbances and model uncertainties, and requires no model approximations or online optimization. The proposed framework is validated through simulation studies on a differential-drive robot and benchmarked against state-of-the-art methods, demonstrating superior robustness, accuracy, and computational efficiency.
Privacy-Preserving Fully Distributed Gaussian Process Regression
Although distributed Gaussian process regression (GPR) enables multiple agents with separate datasets to jointly learn a model of the target function, its collaborative nature poses risks of private data leakage. To address this, we propose a privacy-preserving fully distributed GPR protocol based on secure multi-party computation (SMPC) that preserves the confidentiality of each agent's local dataset. Building upon a secure distributed average consensus algorithm, the protocol guarantees that each agent's local model practically converges to the same global model that would be obtained by the standard distributed GPR. Further, we adopt the paradigm of simulation based security to provide formal privacy guarantees, and extend the proposed protocol to enable kernel hyperparameter optimization, which is critical yet often overlooked in the literature. Experimental results demonstrate the effectiveness and practical applicability of the proposed method.
comment: Submitted to IEEE Transactions on Neural Networks and Learning Systems
Hybrid modeling approach for better identification of building thermal network model and improved prediction
The gray-box modeling approach, which uses a semi-physical thermal network model, has been widely used in building prediction applications, such as model predictive control (MPC). However, unmeasured disturbances, such as occupants, lighting, and in/exfiltration loads, make it challenging to apply this approach to practical buildings. In this study, we propose a hybrid modeling approach that integrates the gray-box model with a model for unmeasured disturbances. After reviewing several system identification approaches, we systematically designed the unmeasured disturbance model with a model selection process based on statistical tests to make it robust. We generated data based on the building model calibrated by real operational data and then trained the hybrid model for two different weather conditions. The hybrid modeling approach reduces the RMSE of 1-day-ahead temperature prediction by approximately 0.2-0.9°C and 0.3-2°C compared to the conventional approach for mild (Berkeley, CA) and cold (Chicago, IL) climates, respectively. In addition, this approach was applied to experimental data obtained from a laboratory building to be used for the MPC application, showing superior prediction performance.
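For readers unfamiliar with thermal-network gray-box models, the sketch below simulates a single-zone 1R1C model with an additive unmeasured-disturbance term; all parameter values are placeholders, and the disturbance here is just a toy occupancy profile rather than the statistically selected model used in the study.

    import numpy as np

    dt = 300.0     # time step in seconds (5 min)
    R = 0.005      # thermal resistance to outdoors [K/W] (placeholder)
    C = 2.0e7      # zone thermal capacitance [J/K] (placeholder)

    def simulate(T0, T_out, Q_hvac, Q_dist):
        """Forward-Euler simulation of dT/dt = (T_out - T)/(R*C) + (Q_hvac + Q_dist)/C."""
        T = np.empty(len(T_out) + 1)
        T[0] = T0
        for k in range(len(T_out)):
            dT = (T_out[k] - T[k]) / (R * C) + (Q_hvac[k] + Q_dist[k]) / C
            T[k + 1] = T[k] + dt * dT
        return T

    n = 288                                                  # one day of 5-min steps
    hours = np.arange(n) * dt / 3600.0
    T_out = 10.0 + 5.0 * np.sin(2 * np.pi * hours / 24.0)    # outdoor temperature [C]
    Q_hvac = np.full(n, 2000.0)                              # constant heating power [W]
    Q_dist = 500.0 * (np.sin(2 * np.pi * hours / 24.0) > 0)  # toy occupancy disturbance [W]
    print("indoor temperature after one day:", round(simulate(21.0, T_out, Q_hvac, Q_dist)[-1], 2))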
Data-Driven Adaptive Output Regulation of Unknown Linear Systems
This paper investigates the linear output regulation problem with both the exosystem and the plant fully unknown. A data-driven regulator is proposed to achieve asymptotic regulation and closed-loop stability without performing model identification. The method constructs a nominal approximate internal model and filters of the input and outputs, thereby yielding a stabilizable cascaded nominal system whose states are available. For this nominal system, a stabilizing law is derived from an offline dataset that has been acquired from the plant during experiments, such that the system states exponentially converge to a subspace. A discrete-time identifier is then implemented to correct the internal model and update the stabilizing law; as a result, the regulation error can be steered to zero asymptotically under some persistent excitation conditions.
comment: 8 pages, 2 figures, conference
Comparative Analysis of Barrier-like Function Methods for Reach-Avoid Verification in Stochastic Discrete-Time Systems
In this paper, we compare several representative barrier-like conditions from the literature for infinite-horizon reach-avoid verification of stochastic discrete-time systems. Our comparison examines both their theoretical properties and computational tractability, highlighting each condition's strengths and limitations that affect applicability and conservativeness. Finally, we illustrate their practical performance through computational experiments using semidefinite programming (SDP) and counterexample-guided inductive synthesis (CEGIS).
comment: 23 pages, 5 tables
Symmetric Linear Dynamical Systems are Learnable from Few Observations
We consider the problem of learning the parameters of a $N$-dimensional stochastic linear dynamics under both full and partial observations from a single trajectory of time $T$. We introduce and analyze a new estimator that achieves a small maximum element-wise error on the recovery of symmetric dynamic matrices using only $T=\mathcal{O}(\log N)$ observations, irrespective of whether the matrix is sparse or dense. This estimator is based on the method of moments and does not rely on problem-specific regularization. This is especially important for applications such as structure discovery.
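A minimal (and far less sample-efficient) moment-based estimator in the same spirit: for x_{t+1} = A x_t + w_t with symmetric A, the empirical lag-1 and lag-0 covariances give A_hat = C1 C0^{-1}, which is then symmetrized. The dimensions, trajectory length, and noise level below are arbitrary and do not reproduce the paper's T = O(log N) regime.

    import numpy as np

    rng = np.random.default_rng(0)
    N, T = 10, 2000
    M = rng.standard_normal((N, N))
    A = 0.4 * (M + M.T) / np.linalg.norm(M + M.T, 2)   # random symmetric, stable dynamics

    x, traj = np.zeros(N), []
    for _ in range(T):
        x = A @ x + rng.standard_normal(N)
        traj.append(x.copy())
    X = np.array(traj)                                 # single trajectory, shape (T, N)

    C0 = X[:-1].T @ X[:-1] / (T - 1)                   # lag-0 covariance
    C1 = X[1:].T @ X[:-1] / (T - 1)                    # lag-1 cross-covariance
    A_hat = C1 @ np.linalg.inv(C0)
    A_hat = (A_hat + A_hat.T) / 2                      # enforce symmetry
    print("max element-wise error:", np.max(np.abs(A_hat - A)))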
Explainable LP-MPC: Shadow Price Contributions Reveal MV-CV Pairings
Large industrial LP-MPC (Linear Program-Model Predictive Control) systems contain tens to hundreds of manipulated variables (MVs) and controlled variables (CVs) for honoring constraints while operating at plant-wide economic optima. The LP is a white-box model, yet for a number of reasons, abnormal behaviors in industrial controllers are often difficult to rationalize. We introduce a novel, post-hoc LP explainability method by recasting the role of shadow prices in the LP solution as an attribution mechanism for MV-CV relationships. The core idea is that CV shadow prices are not just intrinsic properties of the optimal solution, but an aggregate of the cost sensitivities contributed by MVs in enforcing CV constraints, which is then resolved into one-to-one MV-CV pairings using a linear sum assignment solution. The proposed MV-CV pairing framework serves as a practical explainability tool for online LP-MPC systems, enabling practitioners to diagnose suboptimal constraints and verify alignment of the controller's behavior with its original design intent and historical constraints.
comment: Submitted to the 2026 IFAC World Congress
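The final pairing step described above can be read as a linear sum assignment over an MV-to-CV score matrix. The sketch below runs scipy's assignment solver on a made-up matrix of absolute shadow-price contributions; constructing that matrix from the LP solution is the paper's contribution and is not reproduced here.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # Made-up |shadow-price contribution| matrix: rows are MVs, columns are CVs.
    contributions = np.array([
        [0.9, 0.1, 0.0],
        [0.2, 0.7, 0.3],
        [0.0, 0.4, 0.8],
    ])

    # linear_sum_assignment minimizes total cost, so negate to maximize contributions.
    mv_idx, cv_idx = linear_sum_assignment(-contributions)
    for mv, cv in zip(mv_idx, cv_idx):
        print(f"MV{mv} <-> CV{cv} (contribution {contributions[mv, cv]:.1f})")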
Sparse Neural Approximations for Bilevel Adversarial Problems in Power Grids
The adversarial worst-case load shedding (AWLS) problem is pivotal for identifying critical contingencies under line outages. It is naturally cast as a bilevel program: the upper level simulates an attacker determining worst-case line failures, and the lower level corresponds to the defender's generator redispatch operations. Conventional techniques using optimality conditions render the bilevel, mixed-integer formulation computationally prohibitive due to the combinatorial number of topologies and the nonconvexity of AC power flow constraints. To address these challenges, we develop a novel single-level optimal value-function (OVF) reformulation and further leverage a data-driven neural network (NN) surrogate of the follower's optimal value. To ensure physical realizability, we embed the trained surrogate in a physics-constrained NN (PCNN) formulation that couples the OVF inequality with (relaxed) AC feasibility, yielding a mixed-integer convex model amenable to off-the-shelf solvers. To achieve scalability, we learn a sparse, area-partitioned NN via spectral clustering; the resulting block-sparse architecture scales essentially linearly with system size while preserving accuracy. Notably, our approach produces near-optimal worst-case failures and generalizes across loading conditions and unseen topologies, enabling rapid online recomputation. Numerical experiments on the IEEE 14- and 118-bus systems demonstrate the method's scalability and solution quality for large-scale contingency analysis, with an average optimality gap of 5.8% compared to conventional methods, while maintaining computation times under one minute.
Situation-Aware Interactive MPC Switching for Autonomous Driving
To enable autonomous driving in interactive traffic scenarios, various model predictive control (MPC) formulations have been proposed, each employing different interaction models. While higher-fidelity models enable more intelligent behavior, they incur increased computational cost. Since strong interactions are relatively infrequent in traffic, a practical strategy for balancing performance and computational overhead is to invoke an appropriate controller based on situational demands. To achieve this approach, we first conduct a comparative study to assess and hierarchize the interactive capabilities of different MPC formulations. Furthermore, we develop a neural network-based classifier to enable situation-aware switching among controllers with different levels of interactive capability. We demonstrate that this situation-aware switching can both substantially improve overall performance by activating the most advanced interactive MPC in rare but critical situations, and significantly reduce computational load by using a basic MPC in the majority of scenarios.
Real-Time Spatiotemporal Tubes for Dynamic Unsafe Sets
This paper presents a real-time control framework for nonlinear pure-feedback systems with unknown dynamics to satisfy reach-avoid-stay tasks within a prescribed time in dynamic environments. To achieve this, we introduce a real-time spatiotemporal tube (STT) framework. An STT is defined as a time-varying ball in the state space whose center and radius adapt online using only real-time sensory input. A closed-form, approximation-free control law is then derived to constrain the system output within the STT, ensuring safety and task satisfaction. We provide formal guarantees for obstacle avoidance and on-time task completion. The effectiveness and scalability of the framework are demonstrated through simulations and hardware experiments on a mobile robot and an aerial vehicle, navigating in cluttered dynamic environments.
A Nonlinear Observer for Air-Velocity and Attitude Estimation Using Pitot and Barometric Measurements
This paper addresses the problem of estimating air velocity and full attitude for unmanned aerial vehicles (UAVs) in GNSS-denied environments using minimal onboard sensing, an interesting and practically relevant challenge for UAV navigation. The contribution of the paper is twofold: (i) an observability analysis establishing the conditions for uniform observability, which are useful for trajectory planning and motion control of the UAV; and (ii) the design of a nonlinear observer on $\mathrm{SO}(3)\times\mathbb{R}^3\times\mathbb{R}$ that incorporates pitot-tube, barometric altitude, and magnetometer measurements as outputs, with IMU data used as inputs, within a unified framework. Simulation results are presented to confirm the convergence and robustness of the proposed design, including under minimally excited trajectories.
comment: 8 pages, 4 figures, submitted to IFAC2026
Probabilistic Weapon Engagement Zones for a Turn Constrained Pursuer
Curve-straight probabilistic engagement zones (CSPEZ) quantify the spatial regions an evader should avoid to reduce capture risk from a turn-rate-limited pursuer following a curve-straight path with uncertain parameters including position, heading, velocity, range, and maximum turn rate. This paper presents methods for generating evader trajectories that minimize capture risk under such uncertainty. We first derive an analytic solution for the deterministic curve-straight basic engagement zone (CSBEZ), then extend this formulation to a probabilistic framework using four uncertainty-propagation approaches: Monte Carlo sampling, linearization, quadratic approximation, and neural-network regression. We evaluate the accuracy and computational cost of each approximation method and demonstrate how CSPEZ constraints can be integrated into a trajectory-optimization algorithm to produce safe paths that explicitly account for pursuer uncertainty.
comment: Accepted for presentation at AIAA SciTech 2026. 17 pages, 7 figures
Variable L0 Guidance Strategy: Enlarged Operational Envelope and Path-Following
This paper presents a geometric and theoretical study of an exponentially varying look-ahead parameter for UAV path-following guidance. Conventional guidance laws with a fixed look-ahead distance often drive the vehicle into turn-rate saturation when the heading or cross-track error is large, leading to constrained maneuvers and higher control effort. The proposed variable L0 strategy reshapes the look-ahead profile so that the guidance command adapts to the evolving tracking error geometry. A detailed investigation shows that this adaptation significantly enlarges the region in which the commanded turn rate remains unsaturated, allowing the vehicle to operate smoothly over a broader range of error conditions. For representative settings, the unsaturated operational envelope increases by more than 70% relative to the constant L0 formulation. These geometric insights translate to smoother trajectories, earlier recovery from saturation, and reduced control demand. Simulation studies on straight-line and elliptical paths demonstrate the merits of the variable look-ahead strategy, highlighting its control-efficient and reliable path-following performance.
comment: Under review in IFAC
Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions
This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates the KL penalties on policies and transitions, assigning them independent weights, thereby generalizing the standard trajectory-level KL-regularization commonly used in probabilistic and KL-regularized control. This generalized formulation acts as a generative structure allowing various control problems to be recovered. These include classical Stochastic Optimal Control (SOC), Risk-Sensitive Optimal Control (RSOC), and their policy-based KL-regularized counterparts. The latter we refer to as soft-policy SOC and RSOC, facilitating alternative problems with tractable solutions. Beyond serving as regularized variants, we show that these soft-policy formulations majorize the original SOC and RSOC problems. This means that the regularized solution can be iterated to retrieve the original solution. Furthermore, we identify a structurally synchronized case of the risk-seeking soft-policy RSOC formulation, wherein the policy and transition KL-regularization weights coincide. Remarkably, this specific setting gives rise to several powerful properties such as a linear Bellman equation, a path-integral solution, and compositionality, thereby extending these computationally favourable properties to a broad class of control problems.
Cascaded Tightly-Coupled Observer Design for Single-Range-Aided Inertial Navigation
This work introduces a single-range-aided navigation observer that reconstructs the full state of a rigid body using only an Inertial Measurement Unit (IMU), a body-frame vector measurement (e.g., magnetometer), and a distance measurement from a fixed anchor point. The design first formulates an extended linear time-varying (LTV) system to estimate body-frame position, body-frame velocity, and the gravity direction. The recovered gravity direction, combined with the body-frame vector measurement, is then used to reconstruct the full orientation on $\mathrm{SO}(3)$, resulting in a cascaded observer architecture. Almost Global Asymptotic Stability (AGAS) of the cascaded design is established under a uniform observability condition, ensuring robustness to sensor noise and trajectory variations. Simulation studies on three-dimensional trajectories demonstrate accurate estimation of position, velocity, and orientation, highlighting single-range aiding as a lightweight and effective modality for autonomous navigation.
comment: 8 pages
Communication-Aware Dissipative Control for Networks of Heterogeneous Nonlinear Agents
Communication-aware control is essential to reduce costs and complexity in large-scale networks. However, it is challenging to simultaneously determine a sparse communication topology and achieve high performance and robustness. This work achieves all three objectives through dissipativity-based, sparsity-promoting controller synthesis. The approach identifies an optimal sparse structure using either weighted $\ell_1$ penalties or the alternating direction method of multipliers (ADMM) with a cardinality term, and iteratively solves a convexified version of the NP-hard structured optimal control problem. The proposed methods are demonstrated on heterogeneous networks with uncertain and unstable agents.
comment: Under review for IFAC 2026. 8 pages, 4 figures, 1 table
Stabilization of Age-Structured Competing Populations
Age-structured models capture the dynamic behavior of populations over time and result in nonlinear integro-partial differential equations (IPDEs). These processes arise in various fields such as biotechnology, economics, or demography. While coupled age-structured IPDEs modeling two or more interacting species occur naturally in epidemiology and ecology, they remain relatively underexplored. Prior work has primarily addressed stable and marginally stable dynamics. In contrast, this work considers an exponentially unstable model of two competing predator populations, formally referred to in the literature as "competition" dynamics. If one were to apply an input that simultaneously harvests both predator species, one would have control over only the product of the densities of the species, not over their ratio. Therefore, it is necessary to design a control input that directly harvests only one of the two predator species, while indirectly influencing the other via a backstepping approach. The model is transformed into a system of two coupled ordinary differential equations (ODEs), of which only one is actuated, and two autonomous, exponentially stable integral delay equations (IDEs) which enter the ODEs as nonlinear disturbances. The ODEs are globally stabilized with backstepping and an estimate of the region of attraction of the asymptotically stabilized equilibrium of the full IPDE system is provided, under a positivity restriction on control. Additionally, the full IPDE system is also shown to be locally exponentially stable. Such generalizations of competition dynamics open exciting possibilities for future research directions for systems with more than two species.
comment: revised version
Generation Expansion Planning with Upstream Supply Chain Constraints on Materials, Manufacturing, and Deployment
Rising electricity demand underscores the need for secure and reliable generation expansion planning that accounts for upstream supply chain constraints. Traditional models often overlook limitations in materials, manufacturing capacity, lead times for deployment, and field availability, which can delay the availability of planned resources and thus threaten system reliability. This paper introduces a multi-stage supply chain-constrained generation expansion planning (SC-GEP) model that optimizes long-term investments while capturing material availability, production limits, spatial and temporal constraints, and material reuse from retired assets. A decomposition algorithm efficiently solves the resulting MILP. A Maryland case study shows that supply chain constraints shift technology choices, amplify deployment delays caused by lead times, and prompt earlier investment in shorter lead-time, low-material-intensity options. In the low-demand scenario, supply chain constraints raise investment costs by $1.2 billion. Under high demand, persistent generation and reserve shortfalls emerge, underscoring the need to integrate upstream constraints into long-term planning.
comment: 16 pages, 10 figures
Subtransmission Grid Control via Online Feedback Optimization
The increasing electric power consumption and the shift towards renewable energy resources demand new ways to operate transmission and subtransmission grids. Online Feedback Optimization (OFO) is a real-time feedback control method that can be employed to enable optimal operation of these grids. Such controllers can maximize grid efficiency (e.g., minimizing curtailment) while satisfying grid constraints like voltage and current limits. The OFO control method is tailored and extended to handle discrete inputs, and it is explained how to design an OFO controller for the subtransmission grid. A novel benchmark is presented and published that corresponds to the real French subtransmission grid, on which the proposed controller is analyzed in terms of robustness against model mismatch, constraint satisfaction, and tracking performance. It is shown that OFO controllers can help utilize the grid to its full extent, virtually reinforce it, and operate it optimally and in real-time by using the flexibility offered by renewable generators connected to distribution grids.
A robot-assisted pipeline to rapidly scan 1.7 million historical aerial photographs
During the 20th Century, aerial surveys captured hundreds of millions of high-resolution photographs of the earth's surface. These images, the precursors to modern satellite imagery, represent an extraordinary visual record of the environmental and social upheavals of the 20th Century. However, most of these images currently languish in physical archives where retrieval is difficult and costly. Digitization could revolutionize access, but manual scanning is slow and expensive. Automated scanning could make at-scale digitization feasible, unlocking this visual record of the 20th Century for the digital era. Here, we describe and validate a novel robot-assisted pipeline that increases worker productivity in scanning 30-fold, applied at scale to digitize an archive of 1.7 million historical aerial photographs from 65 countries.
Optimal Power Scheduling for High Renewables-Integrated Energy Systems with Battery Storage
In high renewables-integrated power systems, irrespective of their size, energy storage is commonly included and utilized to mitigate fluctuations from both the load and renewable power generation, ensuring system reliability; among storage options, battery energy storage systems (BESS) have experienced fast growth in recent years. These systems, predominantly employing lithium-ion batteries, have been extensively deployed. The degradation of these batteries significantly affects system efficiency. Deep neural networks can accurately quantify the battery degradation; however, the model complexity hinders their application in energy scheduling for power systems at different scales. To address this issue, this paper presents a novel approach, introducing a linearized sparse neural network-based battery degradation model (SNNBD), specifically tailored to quantify battery degradation based on the scheduled BESS operational profiles. This approach achieves accurate degradation prediction while substantially reducing the complexity associated with a dense neural network model. The computational burden of day-ahead energy scheduling when integrating battery degradation can thus be substantially alleviated. Case studies, conducted on both small-scale microgrids and large-scale bulk power grids, demonstrated the efficiency and suitability of the proposed optimal energy scheduling model, which can effectively address battery degradation concerns while optimizing day-ahead energy scheduling operations.
Levelised Cost of Demand Response: Estimating the Cost-Competitiveness of Flexible Demand
To make well-informed investment decisions, energy system stakeholders require reliable cost frameworks for demand response (DR) and storage technologies. While the levelised cost of storage (LCOS) permits comprehensive cost comparisons between different storage technologies, no generic cost measure for the comparison of different DR schemes exists. This paper introduces the levelised cost of demand response (LCODR) which is an analogous measure to the LCOS but crucially differs from it by considering consumer reward payments. Additionally, the value factor from cost estimations of variable renewable energy is adapted to account for the variable availability of DR. The LCODRs for four direct load control (DLC) schemes and twelve storage applications are estimated and contrasted against LCOS literature values for the most competitive storage technologies. The DLC schemes are vehicle-to-grid, smart charging, smart heat pumps, and heat pumps with thermal storage. The results show that only heat pumps with thermal storage consistently outcompete storage technologies with EV-based DR schemes being competitive for some applications. The results and the underlying methodology offer a tool for energy system stakeholders to assess the competitiveness of DR schemes even with limited user data.
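As a generic reference point only: a levelised cost is a ratio of discounted lifetime costs to discounted delivered (or shifted) energy. The sketch below implements that textbook ratio with a placeholder consumer-reward term; it is not the paper's exact LCODR definition, and all numbers are invented.

    def levelised_cost(capex, annual_cost, annual_reward, annual_energy_mwh,
                       lifetime_years, discount_rate):
        """Discounted (costs + consumer rewards) per discounted MWh of shifted energy."""
        disc = [(1 + discount_rate) ** -t for t in range(1, lifetime_years + 1)]
        total_cost = capex + sum((annual_cost + annual_reward) * d for d in disc)
        total_energy = sum(annual_energy_mwh * d for d in disc)
        return total_cost / total_energy

    # Placeholder numbers loosely in the spirit of a smart-charging scheme.
    print(levelised_cost(capex=200.0, annual_cost=20.0, annual_reward=50.0,
                         annual_energy_mwh=2.0, lifetime_years=10, discount_rate=0.05),
          "currency units per MWh")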
Semantic Communication and Control Co-Design for Multi-Objective Distinct Dynamics
This letter introduces a machine-learning approach to learning the semantic dynamics of correlated systems with different control rules and dynamics. By leveraging the Koopman operator in an autoencoder (AE) framework, the system's state evolution is linearized in the latent space using a dynamic semantic Koopman (DSK) model, capturing the baseline semantic dynamics. Signal temporal logic (STL) is incorporated through a logical semantic Koopman (LSK) model to encode system-specific control rules. These models form the proposed logical Koopman AE framework that reduces communication costs while improving state prediction accuracy and control performance, showing a 91.65% reduction in communication samples and significant performance gains in simulation.
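A bare-bones Koopman-autoencoder skeleton in the spirit of the DSK model (PyTorch; the dimensions, layer sizes, and loss terms are assumptions, and the STL-based LSK component is omitted):

    import torch
    import torch.nn as nn

    class KoopmanAE(nn.Module):
        """Autoencoder with a linear (Koopman) transition matrix in the latent space."""
        def __init__(self, state_dim=4, latent_dim=16):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                         nn.Linear(64, latent_dim))
            self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                         nn.Linear(64, state_dim))
            self.K = nn.Linear(latent_dim, latent_dim, bias=False)   # Koopman operator

        def forward(self, x_t):
            z_t = self.encoder(x_t)
            z_next = self.K(z_t)                 # linear evolution in the latent space
            return self.decoder(z_t), self.decoder(z_next)

    def loss_fn(model, x_t, x_next):
        recon, pred = model(x_t)
        return nn.functional.mse_loss(recon, x_t) + nn.functional.mse_loss(pred, x_next)

    model = KoopmanAE()
    x_t, x_next = torch.randn(32, 4), torch.randn(32, 4)
    print(loss_fn(model, x_t, x_next).item())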
Luré-Postnikov Stability Analysis of Closed-Loop Control Systems with Gated Recurrent Neural Network-based Virtual Sensors
This article addresses certification of closed-loop stability when a virtual sensor based on a gated recurrent neural network operates in the feedback path of a nonlinear control system. The Hadamard gating used in standard GRU/LSTM cells is shown to violate the Luré-Postnikov Lyapunov conditions of absolute-stability theory, leading to conservative analysis. To overcome this limitation, a modified architecture, termed the Luré-Postnikov gated recurrent neural network (LP-GRNN), is proposed; its affine update law is compatible with the Luré-Postnikov framework while matching the prediction accuracy of vanilla GRU/LSTM models on the NASA CMAPSS benchmark. Embedding the LP-GRNN, the plant, and a saturated PI controller in a unified standard nonlinear operator form (SNOF) reduces the stability problem to a compact set of tractable linear matrix inequalities (LMIs) whose feasibility certifies global asymptotic stability. A linearized boiler case study illustrates the workflow and validates the closed-loop performance, thereby bridging modern virtual-sensor design with formal stability guarantees.
Wasserstein Distributionally Robust Nash Equilibrium Seeking with Heterogeneous Data: A Lagrangian Approach
We study a class of distributionally robust games where agents are allowed to heterogeneously choose their risk aversion with respect to distributional shifts of the uncertainty. In our formulation, heterogeneous Wasserstein ball constraints on each distribution are enforced through a penalty function leveraging a Lagrangian formulation. We then formulate the distributionally robust Nash equilibrium problem and show that under certain assumptions it is equivalent to a finite-dimensional variational inequality problem with a strongly monotone mapping. We then design an approximate Nash equilibrium seeking algorithm and prove convergence of the average regret to a quantity that diminishes with the number of iterations, thus learning the desired equilibrium up to an a priori specified accuracy. Numerical simulations corroborate our theoretical findings.
Accelerated ADMM: Automated Parameter Tuning and Improved Linear Convergence
This work studies the linear convergence of an accelerated scheme of the Alternating Direction Method of Multipliers (ADMM) for strongly convex and Lipschitz-smooth problems. We use the methodology of expressing the accelerated ADMM as a Lur'e system, i.e., an interconnection of a linear dynamical system in feedback with a slope-restricted operator, and we use Integral Quadratic Constraints to establish linear convergence. In addition, we propose several parameter tuning heuristics and study their impact on the convergence rate through numerical analyses. Our new bounds show improved linear convergence rates compared to the vanilla algorithm and previously proposed accelerated variants, which is also empirically validated on a LASSO regression benchmark.
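For context, the LASSO benchmark mentioned above is commonly solved with the standard (non-accelerated) ADMM iteration sketched below; rho, the problem sizes, and the regularization weight are arbitrary, and neither the accelerated variant nor the IQC analysis from the paper is reproduced.

    import numpy as np

    def lasso_admm(A, b, lam, rho=1.0, n_iter=200):
        """Vanilla ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z."""
        n = A.shape[1]
        x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
        L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # reused in every x-update
        Atb = A.T @ b
        for _ in range(n_iter):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)  # soft threshold
            u = u + x - z
        return z

    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 20))
    x_true = np.zeros(20)
    x_true[:3] = [1.0, -2.0, 0.5]
    b = A @ x_true + 0.01 * rng.standard_normal(100)
    print(np.round(lasso_admm(A, b, lam=1.0), 2))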
Inertia-Constrained Generation Scheduling: Sample Selection, Learning-Embedded Optimization Modeling, and Computational Enhancement
Day-ahead generation scheduling is typically conducted by solving the security-constrained unit commitment (SCUC) problem. However, with the fast growth of inverter-based resources, grid inertia has been dramatically reduced, compromising the system's dynamic stability. Traditional SCUC (T-SCUC), without any inertia requirements, may no longer be effective for renewables-dominated grids. To address this, we propose the active linearized sparse neural network-embedded SCUC (ALSNN-SCUC) model, utilizing machine learning (ML) to incorporate system dynamic performance. A multi-output deep neural network (DNN) model is trained offline on strategically selected data samples to accurately predict frequency stability metrics: locational RoCoF and frequency nadir. Structured sparsity and active ReLU linearization are implemented to prune redundant DNN neurons, significantly reducing its size while ensuring prediction accuracy even at high sparsity levels. By embedding this ML-based frequency stability predictor into SCUC as constraints, the proposed ALSNN-SCUC model minimizes its computational complexity while ensuring frequency stability following a G-1 contingency. Case studies show that the proposed ALSNN-SCUC can enforce pre-specified frequency requirements without being overly conservative, outperforming five benchmark models including T-SCUC, two physics-based SCUC, and two ML-based SCUC. The proposed sparsification and active linearization strategies can reduce the DNN-SCUC computing time by over 95% for both IEEE 24-bus and 118-bus systems, demonstrating the effectiveness and scalability of the proposed ALSNN-SCUC model.
Computing Sound and Accurate Upper and Lower Bounds on Hamilton-Jacobi Reachability Value Functions
Hamilton-Jacobi (HJ) reachability analysis is a fundamental tool for safety verification and control synthesis for nonlinear control systems. Classical HJ reachability analysis methods discretize the continuous state space and solve the HJ partial differential equation over a grid, but these approaches do not account for discretization errors and can under-approximate backward reachable sets, which represent unsafe sets of states. We present a framework for computing sound upper and lower bounds on the HJ value functions via value iteration over grids. Additionally, we develop a refinement algorithm that splits cells that could not be classified as safe or unsafe given the computed bounds. This algorithm enables computing accurate over-approximations of backward reachable sets even when starting from coarse grids. Finally, we validate the effectiveness of our method in two case studies.
comment: Theoretical results are incorrect. The paper is currently under revision and will be updated once corrected proofs and results are available
Robotics
STARE-VLA: Progressive Stage-Aware Reinforcement for Fine-Tuning Vision-Language-Action Models
Recent advances in Vision-Language-Action (VLA) models, powered by large language models and reinforcement learning-based fine-tuning, have shown remarkable progress in robotic manipulation. Existing methods often treat long-horizon actions as linguistic sequences and apply trajectory-level optimization methods such as Trajectory-wise Preference Optimization (TPO) or Proximal Policy Optimization (PPO), leading to coarse credit assignment and unstable training. However, unlike language, where a unified semantic meaning is preserved despite flexible sentence order, action trajectories progress through causally chained stages with different learning difficulties, motivating progressive stage optimization. To this end, we present Stage-Aware Reinforcement (STARE), a module that decomposes a long-horizon action trajectory into semantically meaningful stages and provides dense, interpretable, and stage-aligned reinforcement signals. Integrating STARE into TPO and PPO, we obtain Stage-Aware TPO (STA-TPO) and Stage-Aware PPO (STA-PPO) for offline stage-wise preference and online intra-stage interaction, respectively. Further building on supervised fine-tuning as initialization, we propose Imitation -> Preference -> Interaction (IPI), a serial fine-tuning pipeline for improving action accuracy in VLA models. Experiments on SimplerEnv and ManiSkill3 demonstrate substantial gains, achieving state-of-the-art success rates of 98.0 percent on SimplerEnv and 96.4 percent on ManiSkill3 tasks.
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/
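A minimal sketch of what preserving input phase while randomizing magnitude could look like for an image array (one plausible reading of the abstract, not the authors' released implementation; the frequency-cutoff control of FSS noise would additionally mask which frequencies keep their phase):

```python
import numpy as np

def phase_preserving_noise(x, rng=None):
    """Return noise whose Fourier phase matches x but whose magnitudes
    come from white Gaussian noise, so spatial structure is retained."""
    rng = np.random.default_rng(rng)
    X = np.fft.fft2(x, axes=(-2, -1))                # spectrum of the input
    noise = rng.standard_normal(x.shape)
    N = np.fft.fft2(noise, axes=(-2, -1))            # spectrum of white noise
    # Keep the phase of x, take the magnitude from the random noise.
    mixed = np.abs(N) * np.exp(1j * np.angle(X))
    return np.fft.ifft2(mixed, axes=(-2, -1)).real
```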
From Generated Human Videos to Physically Plausible Robot Trajectories
Video generation models are rapidly improving in their ability to synthesize human actions in novel contexts, holding the potential to serve as high-level planners for contextual robot control. To realize this potential, a key research question remains open: how can a humanoid execute the human actions from generated videos in a zero-shot manner? This challenge arises because generated videos are often noisy and exhibit morphological distortions that make direct imitation difficult compared to real video. To address this, we introduce a two-stage pipeline. First, we lift video pixels into a 4D human representation and then retarget to the humanoid morphology. Second, we propose GenMimic, a physics-aware reinforcement learning policy conditioned on 3D keypoints and trained with symmetry regularization and keypoint-weighted tracking rewards. As a result, GenMimic can mimic human actions from noisy, generated videos. We curate GenMimicBench, a synthetic human-motion dataset generated using two video generation models across a spectrum of actions and contexts, establishing a benchmark for assessing zero-shot generalization and policy robustness. Extensive experiments demonstrate improvements over strong baselines in simulation and confirm coherent, physically stable motion tracking on a Unitree G1 humanoid robot without fine-tuning. This work offers a promising path to realizing the potential of video generation models as high-level policies for robot control.
comment: For project website, see https://genmimic.github.io
Object Reconstruction under Occlusion with Generative Priors and Contact-induced Constraints
Object geometry is key information for robot manipulation. Yet, object reconstruction is a challenging task because cameras only capture partial observations of objects, especially when occlusion occurs. In this paper, we leverage two extra sources of information to reduce the ambiguity of vision signals. First, generative models learn priors of the shapes of commonly seen objects, allowing us to make reasonable guesses of the unseen part of geometry. Second, contact information, which can be obtained from videos and physical interactions, provides sparse constraints on the boundary of the geometry. We combine the two sources of information through contact-guided 3D generation. The guidance formulation is inspired by drag-based editing in generative models. Experiments on synthetic and real-world data show that our approach improves the reconstruction compared to pure 3D generation and contact-based optimization.
comment: Project page: https://contactgen3d.github.io/
Contact-Implicit Modeling and Simulation of a Snake Robot on Compliant and Granular Terrain
This thesis presents a unified modeling and simulation framework for analyzing sidewinding and tumbling locomotion of the COBRA snake robot across rigid, compliant, and granular terrains. A contact-implicit formulation is used to model distributed frictional interactions during sidewinding, and validated through MATLAB Simscape simulations and physical experiments on rigid ground and loose sand. To capture terrain deformation effects, Project Chrono's Soil Contact Model (SCM) is integrated with the articulated multibody dynamics, enabling prediction of slip, sinkage, and load redistribution that reduce stride efficiency on deformable substrates. For high-energy rolling locomotion on steep slopes, the Chrono DEM Engine is used to simulate particle-resolved granular interactions, revealing soil failure, intermittent lift-off, and energy dissipation mechanisms not captured by rigid models. Together, these methods span real-time control-oriented simulation and high-fidelity granular physics. Results demonstrate that rigid-ground models provide accurate short-horizon motion prediction, while continuum and particle-based terrain modeling becomes necessary for reliable mobility analysis in soft and highly dynamic environments. This work establishes a hierarchical simulation pipeline that advances robust, terrain-aware locomotion for robots operating in challenging unstructured settings.
Introducing V-Soft Pro: a Modular Platform for a Transhumeral Prosthesis with Controllable Stiffness
Current upper limb prostheses aim to enhance user independence in daily activities by incorporating basic motor functions. However, they fall short of replicating the natural movement and interaction capabilities of the human arm. In contrast, human limbs leverage intrinsic compliance and actively modulate joint stiffness, enabling adaptive responses to varying tasks, impact absorption, and efficient energy transfer during dynamic actions. Inspired by this adaptability, we developed a transhumeral prosthesis with Variable Stiffness Actuators (VSAs) to replicate the controllable compliance found in biological joints. The proposed prosthesis features a modular design, allowing customization for different residual limb shapes and accommodating a range of independent control signals derived from users' biological cues. Integrated elastic elements passively support more natural movements, facilitate safe interactions with the environment, and adapt to diverse task requirements. This paper presents a comprehensive overview of the platform and its functionalities, highlighting its potential applications in the field of prosthetics.
comment: This article has been accepted for publication in Proceedings of the International Conference On Rehabilitation Robotics (ICORR), 2025. This is the author's version, which has not been fully edited, and content may change prior to final publication. Citation information: DOI 10.1109/ICORR66766.2025.11062964
Preliminary Analysis and Simulation of a Compact Variable Stiffness Wrist
Variable Stiffness Actuators prove invaluable for robotics applications in unstructured environments, fostering safe interactions and enhancing task adaptability. Nevertheless, their mechanical design inevitably results in larger and heavier structures compared to classical rigid actuators. This paper introduces a novel 3 Degrees of Freedom (DoFs) parallel wrist that achieves variable stiffness through redundant elastic actuation. Leveraging its parallel architecture, the device employs only four motors, rendering it compact and lightweight. This characteristic makes it particularly well-suited for applications in prosthetics or humanoid robotics. The manuscript delves into the theoretical model of the device and proposes a sophisticated control strategy for independent regulation of joint position and stiffness. Furthermore, it validates the proposed controller through simulation, utilizing a comprehensive analysis of the system dynamics. The reported results affirm the ability of the device to achieve high accuracy and disturbance rejection in rigid configurations while minimizing interaction forces with its compliant behavior.
comment: This article has been accepted for publication in Springer Proceedings in Advanced Robotics, vol 31. Springer, Cham. This is the author's version, which has not been fully edited, and the content may change prior to final publication. Citation information: DOI https://doi.org/10.1007/978-3-031-64057-5_9
Hybrid-Diffusion Models: Combining Open-loop Routines with Visuomotor Diffusion Policies
Although visuomotor policies obtained via imitation learning demonstrate good performance in complex manipulation tasks, they usually struggle to achieve the same accuracy and speed as traditional control-based methods. In this work, we introduce Hybrid-Diffusion models that combine open-loop routines with visuomotor diffusion policies. We develop Teleoperation Augmentation Primitives (TAPs) that allow the operator to perform predefined routines, such as locking specific axes, moving to perching waypoints, or triggering task-specific routines, seamlessly during demonstrations. Our Hybrid-Diffusion method learns to trigger such TAPs during inference. We validate the method on challenging real-world tasks: Vial Aspiration, Open-Container Liquid Transfer, and Container Unscrewing. All experimental videos are available on the project's website: https://hybriddiffusion.github.io/
FASTer: Toward Efficient Autoregressive Vision Language Action Modeling via Neural Action Tokenization
Autoregressive vision-language-action (VLA) models have recently demonstrated strong capabilities in robotic manipulation. However, their core process of action tokenization often involves a trade-off between reconstruction fidelity and inference efficiency. We introduce FASTer, a unified framework for efficient and generalizable robot learning that integrates a learnable tokenizer with an autoregressive policy built upon it. FASTerVQ encodes action chunks as single-channel images, capturing global spatio-temporal dependencies while maintaining a high compression ratio. FASTerVLA builds on this tokenizer with block-wise autoregressive decoding and a lightweight action expert, achieving both faster inference and higher task performance. Extensive experiments across simulated and real-world benchmarks show that FASTerVQ delivers superior reconstruction quality, high token utilization, and strong cross-task and cross-embodiment generalization, while FASTerVLA further improves overall capability, surpassing previous state-of-the-art VLA models in both inference speed and task performance.
On Disturbance-Aware Minimum-Time Trajectory Planning: Evidence from Tests on a Dynamic Driving Simulator
This work investigates how disturbance-aware, robustness-embedded reference trajectories translate into driving performance when executed by professional drivers in a dynamic simulator. Three planned reference trajectories are compared against a free-driving baseline (NOREF) to assess trade-offs between lap time (LT) and steering effort (SE): NOM, the nominal time-optimal trajectory; TLC, a track-limit-robust trajectory obtained by tightening margins to the track edges; and FLC, a friction-limit-robust trajectory obtained by tightening against axle and tire saturation. All trajectories share the same minimum lap-time objective with a small steering-smoothness regularizer and are evaluated by two professional drivers using a high-performance car on a virtual track. The trajectories derive from a disturbance-aware minimum-lap-time framework recently proposed by the authors, where worst-case disturbance growth is propagated over a finite horizon and used to tighten tire-friction and track-limit constraints, preserving performance while providing probabilistic safety margins. LT and SE are used as performance indicators, while RMS lateral deviation, speed error, and drift angle characterize driving style. Results show a Pareto-like LT-SE trade-off: NOM yields the shortest LT but highest SE; TLC minimizes SE at the cost of longer LT; FLC lies near the efficient frontier, substantially reducing SE relative to NOM with only a small LT increase. Removing trajectory guidance (NOREF) increases both LT and SE, confirming that reference trajectories improve pace and control efficiency. Overall, the findings highlight reference-based and disturbance-aware planning, especially FLC, as effective tools for training and for achieving fast yet stable trajectories.
comment: 18 pages, 11 figures, 5 tables
Hoi! -- A Multimodal Dataset for Force-Grounded, Cross-View Articulated Manipulation
We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction. The dataset contains 3048 sequences across 381 articulated objects in 38 environments. Each object is operated under four embodiments - (i) human hand, (ii) human hand with a wrist-mounted camera, (iii) handheld UMI gripper, and (iv) a custom Hoi! gripper - where the tool embodiment provides synchronized end-effector forces and tactile sensing. Our dataset offers a holistic view of interaction understanding from video, enabling researchers not only to evaluate how well methods transfer between human and robotic viewpoints, but also to investigate underexplored modalities such as force sensing and prediction.
MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation
Imitation learning has shown immense promise for robotic manipulation, yet its practical deployment is fundamentally constrained by data scarcity. Despite prior work on collecting large-scale datasets, there still remains a significant gap to robust spatial generalization. We identify a key limitation: individual trajectories, regardless of their length, are typically collected from a single, static spatial configuration of the environment. This includes fixed object and target spatial positions as well as unchanging camera viewpoints, which significantly restricts the diversity of spatial information available for learning. To address this critical bottleneck in data efficiency, we propose MOtion-Based Variability Enhancement (MOVE), a simple yet effective data collection paradigm that enables the acquisition of richer spatial information from dynamic demonstrations. Our core contribution is an augmentation strategy that injects motion into any movable objects within the environment for each demonstration. This process implicitly generates a dense and diverse set of spatial configurations within a single trajectory. We conduct extensive experiments in both simulation and real-world environments to validate our approach. For example, in simulation tasks requiring strong spatial generalization, MOVE achieves an average success rate of 39.1%, a 76.1% relative improvement over the static data collection paradigm (22.2%), and yields up to 2-5x gains in data efficiency on certain tasks. Our code is available at https://github.com/lucywang720/MOVE.
comment: 9 pages, 9 figures
SIMA 2: A Generalist Embodied Agent for Virtual Worlds
We introduce SIMA 2, a generalist embodied agent that understands and acts in a wide variety of 3D virtual worlds. Built upon a Gemini foundation model, SIMA 2 represents a significant step toward active, goal-directed interaction within an embodied environment. Unlike prior work (e.g., SIMA 1) limited to simple language commands, SIMA 2 acts as an interactive partner, capable of reasoning about high-level goals, conversing with the user, and handling complex instructions given through language and images. Across a diverse portfolio of games, SIMA 2 substantially closes the gap with human performance and demonstrates robust generalization to previously unseen environments, all while retaining the base model's core reasoning capabilities. Furthermore, we demonstrate a capacity for open-ended self-improvement: by leveraging Gemini to generate tasks and provide rewards, SIMA 2 can autonomously learn new skills from scratch in a new environment. This work validates a path toward creating versatile and continuously learning agents for both virtual and, eventually, physical worlds.
Using Machine Learning to Take Stay-or-Go Decisions in Data-driven Drone Missions
Drones are becoming indispensable in many application domains. In data-driven missions, besides sensing, the drone must process the collected data at runtime to decide whether additional action must be taken on the spot, before moving to the next point of interest. If processing does not reveal an event or situation that requires such an action, the drone has waited in vain instead of moving to the next point. If, however, the drone starts moving to the next point and it turns out that a follow-up action is needed at the previous point, it must spend time flying back. To take this decision, we propose different machine-learning methods based on branch prediction and reinforcement learning. We evaluate these methods for a wide range of scenarios where the probability of event occurrence changes with time. Our results show that the proposed methods consistently outperform the regression-based method proposed in the literature and can significantly improve the worst-case mission time by up to 4.1x. Also, the achieved median mission time is very close to that of a method with perfect knowledge of the current underlying event probability at each point of interest, being merely up to 2.7% higher.
comment: 19 pages, 3 figures, to appear in the proceedings of MobiQuitous 2025
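As a generic illustration of the branch-prediction-style stay-or-go rule described in the paper above (a textbook 2-bit saturating counter, not necessarily the authors' exact predictor):

```python
class TwoBitStayGoPredictor:
    """2-bit saturating counter: states 0-1 predict 'go', 2-3 predict 'stay'.
    'stay' means waiting at the point of interest for a possible follow-up action."""

    def __init__(self):
        self.state = 1  # start weakly biased toward 'go'

    def predict(self):
        return "stay" if self.state >= 2 else "go"

    def update(self, event_occurred):
        # An observed event means staying would have been the right call.
        if event_occurred:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)
```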
TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards
In recent years, precision agriculture has been introducing groundbreaking innovations in the field, with a strong focus on automation. However, research studies in robotics and autonomous navigation often rely on controlled simulations or isolated field trials. The absence of a realistic common benchmark represents a significant limitation for the adoption of robust autonomous systems in real, complex agricultural conditions. Vineyards pose significant challenges due to their dynamic nature, and they are increasingly drawing attention from both academic and industrial stakeholders interested in automation. In this context, we introduce the TEMPO-VINE dataset, a large-scale multi-temporal dataset specifically designed for evaluating sensor fusion, simultaneous localization and mapping (SLAM), and place recognition techniques within operational vineyard environments. TEMPO-VINE is the first multi-modal public dataset that brings together data from heterogeneous LiDARs of different price levels, AHRS, RTK-GPS, and cameras in real trellis and pergola vineyards, with multiple rows exceeding 100 m in length. In this work, we address a critical gap in the landscape of agricultural datasets by providing researchers with a comprehensive data collection and ground truth trajectories in different seasons, vegetation growth stages, terrain and weather conditions. The sequence paths with multiple runs and revisits will foster the development of sensor fusion, localization, mapping and place recognition solutions for agricultural fields. The dataset, the processing tools and the benchmarking results will be available at the dedicated webpage upon acceptance.
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative paradigm for creating intelligent agents, from virtual creatures to physical robots, by jointly optimizing their morphologies and controllers rather than treating control in isolation. This integrated approach facilitates richer environmental interactions and robust task performance. In this survey, we provide a systematic overview of recent advances in ECD. We first formalize the concept of ECD and position it within related fields. We then introduce a hierarchical taxonomy: a lower layer that breaks down agent design into three fundamental components (controlling brain, body morphology, and task environment), and an upper layer that integrates these components into four major ECD frameworks: bi-level, single-level, generative, and open-ended. This taxonomy allows us to synthesize insights from more than one hundred recent studies. We further review notable benchmarks, datasets, and applications in both simulated and real-world scenarios. Finally, we identify significant challenges and offer insights into promising future research directions. A project associated with this survey has been created at https://github.com/Yuxing-Wang-THU/SurveyBrainBody.
Bridging Simulation and Reality: Cross-Domain Transfer with Semantic 2D Gaussian Splatting
Cross-domain transfer in robotic manipulation remains a longstanding challenge due to the significant domain gap between simulated and real-world environments. Existing methods such as domain randomization, adaptation, and sim-real calibration often require extensive tuning or fail to generalize to unseen scenarios. To address this issue, we observe that if domain-invariant features are utilized during policy training in simulation, and the same features can be extracted and provided as the input to the policy during real-world deployment, the domain gap can be effectively bridged, leading to significantly improved policy generalization. Accordingly, we propose Semantic 2D Gaussian Splatting (S2GS), a novel representation method that extracts object-centric, domain-invariant spatial features. S2GS constructs multi-view 2D semantic fields and projects them into a unified 3D space via feature-level Gaussian splatting. A semantic filtering mechanism removes irrelevant background content, ensuring clean and consistent inputs for policy learning. To evaluate the effectiveness of S2GS, we adopt Diffusion Policy as the downstream learning algorithm and conduct experiments in the ManiSkill simulation environment, followed by real-world deployment. Results demonstrate that S2GS significantly improves sim-to-real transferability, maintaining high and stable task performance in real-world scenarios.
When Robots Should Say "I Don't Know": Benchmarking Abstention in Embodied Question Answering
Embodied Question Answering (EQA) requires an agent to interpret language, perceive its environment, and navigate within 3D scenes to produce responses. Existing EQA benchmarks assume that every question must be answered, but embodied agents should know when they do not have sufficient information to answer. In this work, we focus on a minimal requirement for EQA agents, abstention: knowing when to withhold an answer. From an initial study of 500 human queries, we find that 32.4% contain missing or underspecified context. Drawing on this initial study and cognitive theories of human communication errors, we derive five representative categories requiring abstention: actionability limitation, referential underspecification, preference dependence, information unavailability, and false presupposition. We augment OpenEQA by having annotators transform well-posed questions into ambiguous variants outlined by these categories. The resulting dataset, AbstainEQA, comprises 1,636 annotated abstention cases paired with 1,636 original OpenEQA instances for balanced evaluation. Evaluating on AbstainEQA, we find that even the best frontier model only attains 42.79% abstention recall, while humans achieve 91.17%. We also find that scaling, prompting, and reasoning only yield marginal gains, and that fine-tuned models overfit to textual cues. Together, these results position abstention as a fundamental prerequisite for reliable interaction in embodied settings and as a necessary basis for effective clarification.
Gauss-Newton accelerated MPPI Control
Model Predictive Path Integral (MPPI) control is a sampling-based optimization method that has recently attracted attention, particularly in the robotics and reinforcement learning communities. MPPI has been widely applied as a GPU-accelerated random search method to deterministic direct single-shooting optimal control problems arising in model predictive control (MPC) formulations. MPPI offers several key advantages, including flexibility, robustness, ease of implementation, and inherent parallelizability. However, its performance can deteriorate in high-dimensional settings since the optimal control problem is solved via Monte Carlo sampling. To address this limitation, this paper proposes an enhanced MPPI method that incorporates a Jacobian reconstruction technique and the second-order Generalized Gauss-Newton method. This novel approach is called Gauss-Newton accelerated MPPI. The numerical results show that the Gauss-Newton accelerated MPPI approach substantially improves MPPI scalability and computational efficiency while preserving the key benefits of the classical MPPI framework, making it a promising approach even for high-dimensional problems.
comment: 6 pages, 3 figures, submitted to the IFAC World Congress 2026
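For reference, the classical MPPI update that the proposed method accelerates weights sampled control perturbations by an exponentiated rollout cost (a standard textbook form, not the paper's Gauss-Newton-enhanced scheme):

```python
import numpy as np

def mppi_update(u_nom, sample_costs, noise, lam=1.0):
    """u_nom: (T, m) nominal control sequence; noise: (K, T, m) sampled perturbations;
    sample_costs: (K,) rollout costs of u_nom + noise[k]; lam: temperature parameter."""
    costs = sample_costs - sample_costs.min()           # shift for numerical stability
    w = np.exp(-costs / lam)
    w /= w.sum()                                        # importance weights
    return u_nom + np.einsum("k,ktm->tm", w, noise)     # weighted perturbation update
```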
One Ring to Rule Them All: Constrained Distributional Control for Massive-Scale Heterogeneous Robotic Ensemble Systems
Ensemble control aims to steer a population of dynamical systems using a shared control input. This paper introduces a constrained ensemble control framework for parameterized, heterogeneous robotic systems operating under state and environmental constraints, such as obstacle avoidance. We develop a moment kernel transform that maps the parameterized ensemble dynamics to the moment system in a kernel space, enabling the characterization of population-level behavior. The state-space constraints, such as polyhedral waypoints to be visited and obstacles to be avoided, are also transformed into the moment space, leading to a unified formulation for safe, large-scale ensemble control. Expressive signal temporal logic specifications are employed to encode complex visit-avoid tasks, which are achieved through a single shared controller synthesized from our constrained ensemble control formulation. Simulation and hardware experiments demonstrate the effectiveness of the proposed approach in safely and efficiently controlling robotic ensembles within constrained environments.
comment: 9 pages, 8 figures
MARL Warehouse Robots
We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environment and a custom Unity 3D simulation. Our experiments reveal that QMIX's value decomposition significantly outperforms independent learning approaches (achieving 3.25 mean return vs. 0.38 for advanced IPPO), but requires extensive hyperparameter tuning -- particularly extended epsilon annealing (5M+ steps) for sparse reward discovery. We demonstrate successful deployment in Unity ML-Agents, achieving consistent package delivery after 1M training steps. While MARL shows promise for small-scale deployments (2-4 robots), significant scaling challenges remain. Code and analyses: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
comment: 6 pages, 4 tables. Project documentation: https://pallman14.github.io/MARL-QMIX-Warehouse-Robots/
Open-Ended Goal Inference through Actions and Language for Human-Robot Collaboration
To collaborate with humans, robots must infer goals that are often ambiguous, difficult to articulate, or not drawn from a fixed set. Prior approaches restrict inference to a predefined goal set, rely only on observed actions, or depend exclusively on explicit instructions, making them brittle in real-world interactions. We present BALI (Bidirectional Action-Language Inference) for goal prediction, a method that integrates natural language preferences with observed human actions in a receding-horizon planning tree. BALI combines language and action cues from the human, asks clarifying questions only when the expected information gain from the answer outweighs the cost of interruption, and selects supportive actions that align with inferred goals. We evaluate the approach in collaborative cooking tasks, where goals may be novel to the robot and unbounded. Compared to baselines, BALI yields more stable goal predictions and significantly fewer mistakes.
comment: Accepted to ACM/IEEE International Conference on Human-Robot Interaction, 2026 (HRI 2026), 10 pages, 4 figures
Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops
Automating disassembly of critical components from end-of-life (EoL) desktops, such as high-value items like RAM modules and CPUs, as well as sensitive parts like hard disk drives, remains challenging due to the inherent variability and uncertainty of these products. Moreover, their disassembly requires sequential, precise, and dexterous operations, further increasing the complexity of automation. Current robotic disassembly processes are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. Although VLAs have demonstrated promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can faithfully complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we observed that a simple hybrid strategy that combines VLA with a rule-based controller can successfully perform the entire disassembly operation. These findings highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end robotic automated disassembly.
RoboBPP: Benchmarking Robotic Online Bin Packing with Physics-based Simulation
Physical feasibility in 3D bin packing is a key requirement in modern industrial logistics and robotic automation. With the growing adoption of industrial automation, online bin packing has gained increasing attention. However, inconsistencies in problem settings, test datasets, and evaluation metrics have hindered progress in the field, and there is a lack of a comprehensive benchmarking system. Direct testing on real hardware is costly, and building a realistic simulation environment is also challenging. To address these limitations, we introduce RoboBPP, a benchmarking system designed for robotic online bin packing. RoboBPP integrates a physics-based simulator to assess physical feasibility. In our simulation environment, we introduce a robotic arm and boxes at real-world scales to replicate real industrial packing workflows. By simulating conditions that arise in real industrial applications, we ensure that evaluated algorithms are practically deployable. In addition, prior studies often rely on synthetic datasets whose distributions differ from real-world industrial data. To address this issue, we collect three datasets from real industrial workflows, including assembly-line production, logistics packing, and furniture manufacturing. The benchmark comprises three carefully designed test settings and extends existing evaluation metrics with new metrics for structural stability and operational safety. We design a scoring system and derive a range of insights from the evaluation results. RoboBPP is fully open-source and is equipped with visualization tools and an online leaderboard, providing a reproducible and extensible foundation for future research and industrial applications (https://robot-bin-packing-benchmark.github.io).
comment: Under review at the International Journal of Robotics Research (IJRR)
Bridging Probabilistic Inference and Behavior Trees: An Interactive Framework for Adaptive Multi-Robot Cooperation
This paper proposes an Interactive Inference Behavior Tree (IIBT) framework that integrates behavior trees (BTs) with active inference under the free energy principle for distributed multi-robot decision-making. The proposed IIBT node extends conventional BTs with probabilistic reasoning, enabling online joint planning and execution across multiple robots. It remains fully compatible with standard BT architectures, allowing seamless integration into existing multi-robot control systems. Within this framework, multi-robot cooperation is formulated as a free-energy minimization process, where each robot dynamically updates its preference matrix based on perceptual inputs and peer intentions, thereby achieving adaptive coordination in partially observable and dynamic environments. The proposed approach is validated through both simulation and real-world experiments, including a multi-robot maze navigation and a collaborative manipulation task, compared against traditional BTs (https://youtu.be/KX_oT3IDTf4). Experimental results demonstrate that the IIBT framework reduces BT node complexity by over 70%, while maintaining robust, interpretable, and adaptive cooperative behavior under environmental uncertainty.
comment: 34 pages, submitted to the RAS Journal
Development of a 15-Degree-of-Freedom Bionic Hand with Cable-Driven Transmission and Distributed Actuation
In robotic hand research, minimizing the number of actuators while maintaining human-hand-consistent dimensions and degrees of freedom constitutes a fundamental challenge. Drawing bio-inspiration from human hand kinematic configurations and muscle distribution strategies, this work proposes a novel 15-DoF dexterous robotic hand, with detailed analysis of its mechanical architecture, electrical system, and control system. The bionic hand employs a new tendon-driven mechanism, significantly reducing the number of motors required by traditional tendon-driven systems while enhancing motion performance and simplifying the mechanical structure. This design integrates five motors in the forearm to provide strong gripping force, while ten small motors are installed in the palm to support fine manipulation tasks. Additionally, a corresponding joint sensing and motor driving electrical system was developed to ensure efficient control and feedback. The entire system weighs only 1.4 kg, combining lightweight and high-performance features. Through experiments, the bionic hand exhibited exceptional dexterity and robust grasping capabilities, demonstrating significant potential for robotic manipulation tasks.
FALCON: Actively Decoupled Visuomotor Policies for Loco-Manipulation with Foundation-Model-Based Coordination
We present FoundAtion-model-guided decoupled LoCO-maNipulation visuomotor policies (FALCON), a framework for loco-manipulation that combines modular diffusion policies with a vision-language foundation model as the coordinator. Our approach explicitly decouples locomotion and manipulation into two specialized visuomotor policies, allowing each subsystem to rely on its own observations. This mitigates the performance degradation that arises when a single policy is forced to fuse heterogeneous, potentially mismatched observations from locomotion and manipulation. Our key innovation lies in restoring coordination between these two independent policies through a vision-language foundation model, which encodes global observations and language instructions into a shared latent embedding conditioning both diffusion policies. On top of this backbone, we introduce a phase-progress head that uses textual descriptions of task stages to infer discrete phase and continuous progress estimates without manual phase labels. To further structure the latent space, we incorporate a coordination-aware contrastive loss that explicitly encodes cross-subsystem compatibility between arm and base actions. We evaluate FALCON on two challenging loco-manipulation tasks requiring navigation, precise end-effector placement, and tight base-arm coordination. Results show that it surpasses centralized and decentralized baselines while exhibiting improved robustness and generalization to out-of-distribution scenarios.
Vertical Planetary Landing on Sloped Terrain Using Optical Flow Divergence Estimates
Autonomous landing on sloped terrain poses significant challenges for small, lightweight spacecraft, such as rotorcraft and landers. These vehicles have limited processing capability and payload capacity, which makes advanced deep learning methods and heavy sensors impractical. Flying insects, such as bees, achieve remarkable landings with minimal neural and sensory resources, relying heavily on optical flow. By regulating flow divergence, a measure of vertical velocity divided by height, they perform smooth landings in which velocity and height decay exponentially together. However, adapting this bio-inspired strategy for spacecraft landings on sloped terrain presents two key challenges: global flow-divergence estimates obscure terrain inclination, and the nonlinear nature of divergence-based control can lead to instability when using conventional controllers. This paper proposes a nonlinear control strategy that leverages two distinct local flow divergence estimates to regulate both thrust and attitude during vertical landings. The control law is formulated based on Incremental Nonlinear Dynamic Inversion to handle the nonlinear flow divergence. The thrust control ensures a smooth vertical descent by keeping a constant average of the local flow divergence estimates, while the attitude control aligns the vehicle with the inclined surface at touchdown by exploiting their difference. The approach is evaluated in numerical simulations using a simplified 2D spacecraft model across varying slopes and divergence setpoints. Results show that regulating the average divergence yields stable landings with exponential decay of velocity and height, and using the divergence difference enables effective alignment with inclined terrain. Overall, the method offers a robust, low-resource landing strategy that enhances the feasibility of autonomous planetary missions with small spacecraft.
comment: This paper is accepted at International Astronautical Congress (IAC 2025)
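To make the exponential-decay property concrete (this is the standard constant-divergence landing argument, not specific to the paper): writing the flow divergence as D = -ḣ/h, a controller that holds D at a constant setpoint D* > 0 enforces ḣ = -D* h, whose solution is

\[
h(t) = h_0\, e^{-D^* t}, \qquad \dot h(t) = -D^* h_0\, e^{-D^* t},
\]

so height and vertical velocity decay together and both approach zero smoothly at touchdown.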
Seabed-to-Sky Mapping of Maritime Environments with a Dual Orthogonal SONAR and LiDAR Sensor Suite
Critical maritime infrastructure increasingly demands situational awareness both above and below the surface, yet existing "seabed-to-sky" mapping pipelines either rely on GNSS (vulnerable to shadowing/spoofing) or on expensive bathymetric sonars. We present a unified, GNSS-independent mapping system that fuses LiDAR-IMU with dual, orthogonally mounted Forward Looking Sonars (FLS) to generate consistent seabed-to-sky maps from an Autonomous Surface Vehicle. On the acoustic side, we extend orthogonal wide-aperture fusion to handle arbitrary inter-sonar translations (enabling heterogeneous, non-co-located models) and extract a leading edge from each FLS to form line-scans. On the mapping side, we modify LIO-SAM to ingest both stereo-derived 3D sonar points and leading-edge line-scans at and between keyframes via motion-interpolated poses, allowing sparse acoustic updates to contribute continuously to a single factor-graph map. We validate the system on real-world data from Belvederekanalen (Copenhagen), demonstrating real-time operation with approx. 2.65 Hz map updates and approx. 2.85 Hz odometry while producing a unified 3D model that spans air-water domains.
ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety
Vulnerable road users (VRUs) face high collision risks in mixed traffic, yet most existing safety systems prioritize driver or vehicle assistance over direct VRU support. This paper presents ARCAS, a real-time augmented reality collision avoidance system that provides personalized spatial alerts to VRUs via wearable AR headsets. By fusing roadside 360-degree 3D LiDAR with SLAM-based headset tracking and an automatic 3D calibration procedure, ARCAS accurately overlays world-locked 3D bounding boxes and directional arrows onto approaching hazards in the user's passthrough view. The system also enables multi-headset coordination through shared world anchoring. Evaluated in real-world pedestrian interactions with e-scooters and vehicles (180 trials), ARCAS nearly doubled pedestrians' time-to-collision and increased counterparts' reaction margins by up to 4x compared to unaided-eye conditions. Results validate the feasibility and effectiveness of LiDAR-driven AR guidance and highlight the potential of wearable AR as a promising next-generation safety tool for urban mobility.
comment: 8 pages, 3 figures, 1 table
Disturbance Compensation for Safe Kinematic Control of Robotic Systems with Closed Architecture
In commercial robotic systems, it is common to encounter a closed inner-loop torque controller that is not user-modifiable. However, the outer-loop controller, which sends kinematic commands such as position or velocity for the inner-loop controller to track, is typically exposed to users. In this work, we focus on the development of an easily integrated add-on at the outer-loop layer by combining disturbance rejection control and robust control barrier function for high-performance tracking and safe control of the whole dynamic system of an industrial manipulator. This is particularly beneficial when 1) the inner-loop controller is imperfect, unmodifiable, and uncertain; and 2) the dynamic model exhibits significant uncertainty. Stability analysis, formal safety guarantee proof, and hardware experiments with a PUMA robotic manipulator are presented. Our solution demonstrates superior performance in terms of simplicity of implementation, robustness, tracking precision, and safety compared to the state of the art. Video: https://youtu.be/zw1tanvrV8Q
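As background on how such an outer-loop safety filter is typically posed (a generic kinematic robust-CBF condition; the paper's exact formulation may differ), let the safe set be {q : h(q) ≥ 0}, let u be the commanded joint or task-space velocity, and let d̄ bound the residual velocity error of the imperfect inner-loop controller. The filtered command is then required to satisfy

\[
\nabla h(q)^\top u \;-\; \|\nabla h(q)\|\,\bar d \;\ge\; -\alpha\, h(q),
\]

for a suitable gain (or extended class-K function) α, which keeps the safe set forward invariant despite the bounded inner-loop error; in practice this constraint is enforced by a small quadratic program that minimally modifies the nominal command.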
XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots
As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While substantial progress has been devoted to human behavior prediction, limited attention has been paid to how humans perceive, interpret, and trust robots' inferences, impeding deployment in safety-critical and socially embedded environments. This paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for agentic mobile robots, that bridges physical and virtual spaces to enable bi-directional understanding between humans and robots. Our hierarchical XR-DT architecture integrates virtual-, augmented-, and mixed-reality layers, fusing real-time sensor data, simulated environments in the Unity game engine, and human feedback captured through wearable AR devices. Within this framework, we design an agentic mobile robot system with a unified diffusion policy for context-aware task adaptation. We further propose a chain-of-thought prompting mechanism that allows multimodal large language models to reason over human instructions and environmental context, while leveraging an AutoGen-based multi-agent coordination layer to enhance robustness and collaboration in dynamic tasks. Initial experimental results demonstrate accurate human and robot trajectory prediction, validating the XR-DT framework's effectiveness in HRI tasks. By embedding human intention, environmental dynamics, and robot cognition into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.
comment: 10 pages, 5 figures
Invariance Co-training for Robot Visual Generalization
Reasoning from diverse observations is a fundamental capability for generalist robot policies to operate in a wide range of environments. Despite recent advancements, many large-scale robotic policies still remain sensitive to key sources of observational variation such as changes in camera perspective, lighting, and the presence of distractor objects. We posit that the limited generalizability of these models arises from the substantial diversity required to robustly cover these quasistatic axes, coupled with the current scarcity of large-scale robotic datasets that exhibit rich variation across them. In this work, we propose to systematically examine what robots need to generalize across these challenging axes by introducing two key auxiliary tasks, state similarity and invariance to observational perturbations, applied to both demonstration data and static visual data. We then show that via these auxiliary tasks, leveraging both more-expensive robotic demonstration data and less-expensive, visually rich synthetic images generated from non-physics-based simulation (for example, Unreal Engine) can lead to substantial increases in generalization to unseen camera viewpoints, lighting configurations, and distractor conditions. Our results demonstrate that co-training on this diverse data improves performance by 18 percent over existing generative augmentation methods. For more information and videos, please visit https://invariance-cotraining.github.io
comment: 14 pages, 10 figures
Search at Scale: Improving Numerical Conditioning of Ergodic Coverage Optimization for Multi-Scale Domains
Recent methods in ergodic coverage planning have shown promise as tools that can adapt to a wide range of geometric coverage problems with general constraints, but are highly sensitive to the numerical scaling of the problem space. The underlying challenge is that the optimization formulation becomes brittle and numerically unstable with changing scales, especially under potentially nonlinear constraints that impose dynamic restrictions, due to the kernel-based formulation. This paper proposes to address this problem via the development of a scale-agnostic and adaptive ergodic coverage optimization method based on the maximum mean discrepancy metric (MMD). Our approach allows the optimizer to solve for the scale of differential constraints while annealing the hyperparameters to best suit the problem domain and ensure physical consistency. We also derive a variation of the ergodic metric in the log space, providing additional numerical conditioning without loss of performance. We compare our approach with existing coverage planning methods and demonstrate the utility of our approach on a wide range of coverage problems.
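For reference, the squared maximum mean discrepancy between the target coverage distribution p and the trajectory's empirical visitation distribution q, with kernel k, is the standard quantity

\[
\mathrm{MMD}^2(p,q) = \mathbb{E}_{x,x'\sim p}[k(x,x')] - 2\,\mathbb{E}_{x\sim p,\, y\sim q}[k(x,y)] + \mathbb{E}_{y,y'\sim q}[k(y,y')],
\]

and the scale sensitivity discussed above stems from how the kernel bandwidth and the magnitudes of these terms interact with the size of the domain; the log-space variant of the metric is one way to improve the conditioning of this objective.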
Wake Vectoring for Efficient Morphing Flight
Morphing aerial robots have the potential to transform autonomous flight, enabling navigation through cluttered environments, perching, and seamless transitions between aerial and terrestrial locomotion. Yet mid-flight reconfiguration presents a critical aerodynamic challenge: tilting propulsors to achieve shape change reduces vertical thrust, undermining stability and control authority. Here, we introduce a passive wake vectoring mechanism that recovers lost thrust during morphing. Integrated into a novel robotic system, Aerially Transforming Morphobot (ATMO), internal deflectors intercept and redirect rotor wake downward, passively steering airflow momentum that would otherwise be wasted. This electronics-free solution achieves up to a 40% recovery of vertical thrust in configurations where no useful thrust would otherwise be produced, substantially extending hover and maneuvering capabilities during transformation. Our findings highlight a new direction for morphing aerial robot design, where passive aerodynamic structures, inspired by thrust vectoring in rockets and aircraft, enable efficient, agile flight without added mechanical complexity.
Two-Stage Camera Calibration Method for Multi-Camera Systems Using Scene Geometry
Calibration of multi-camera systems is a key task for accurate object tracking. However, it remains a challenging problem in real-world conditions, where traditional methods are not applicable due to the lack of accurate floor plans, physical access to place calibration patterns, or synchronized video streams. This paper presents a novel two-stage calibration method that overcomes these limitations. In the first stage, partial calibration of individual cameras is performed based on an operator's annotation of natural geometric primitives (parallel, perpendicular, and vertical lines, or line segments of equal length). This allows estimating key parameters (roll, pitch, focal length) and projecting the camera's Effective Field of View (EFOV) onto the horizontal plane in a base 3D coordinate system. In the second stage, precise system calibration is achieved through interactive manipulation of the projected EFOV polygons. The operator adjusts their position, scale, and rotation to align them with the floor plan or, in its absence, using virtual calibration elements projected onto all cameras in the system. This determines the remaining extrinsic parameters (camera position and yaw). Calibration requires only a static image from each camera, eliminating the need for physical access or synchronized video. The method is implemented as a practical web service. Comparative analysis and demonstration videos confirm the method's applicability, accuracy, and flexibility, enabling the deployment of precise multi-camera tracking systems in scenarios previously considered infeasible.
SO-Bench: A Structural Output Evaluation of Multimodal LLMs
Multimodal large language models (MLLMs) are increasingly deployed in real-world, agentic settings where outputs must not only be correct, but also conform to predefined data schemas. Despite recent progress in structured generation in the textual domain, there is still no benchmark that systematically evaluates schema-grounded information extraction and reasoning over visual inputs. In this work, we conduct a comprehensive study of visual structural output capabilities for MLLMs with our carefully designed SO-Bench benchmark. Covering four visual domains, including UI screens, natural images, documents, and charts, SO-Bench is built from over 6.5K diverse JSON schemas and 1.8K curated image-schema pairs with human-verified quality. Benchmarking experiments on open-sourced and frontier proprietary models reveal persistent gaps in predicting accurate, schema-compliant outputs, highlighting the need for better multimodal structured reasoning. Beyond benchmarking, we further conduct training experiments to largely improve the model's structured output capability. We plan to make the benchmark available to the community.
comment: v2 preprint. Fixed some typos, added a discussion of limitations, and provided pseudo-code for evaluation
The Autonomy-Alignment Problem in Open-Ended Learning Robots: Formalising the Purpose Framework
The rapid advancement of artificial intelligence is enabling the development of increasingly autonomous robots capable of operating beyond engineered factory settings and into the unstructured environments of human life. This shift raises a critical autonomy-alignment problem: how to ensure that a robot's autonomous learning focuses on acquiring knowledge and behaviours that serve human practical objectives while remaining aligned with broader human values (e.g., safety and ethics). This problem remains largely underexplored and lacks a unifying conceptual and formal framework. Here, we address one of its most challenging instances: open-ended learning (OEL) robots, which autonomously acquire new knowledge and skills through interaction with the environment, guided by intrinsic motivations and self-generated goals. We propose a computational framework, introduced qualitatively and then formalised, to guide the design of OEL architectures that balance autonomy with human control. At its core is the novel concept of purpose, which specifies what humans (designers or users) want the robot to learn, do, or avoid, independently of specific task domains. The framework decomposes the autonomy-alignment problem into four tractable sub-problems: the alignment of robot purposes (hardwired or learnt) with human purposes; the arbitration between multiple purposes; the grounding of abstract purposes into domain-specific goals; and the acquisition of competence to achieve those goals. The framework supports formal definitions of alignment across multiple cases and proofs of necessary and sufficient conditions under which alignment holds. Illustrative hypothetical scenarios showcase the applicability of the framework for guiding the development of purpose-aligned autonomous robots.
comment: 33 pages, 5 figures
HAFO: A Force-Adaptive Control Framework for Humanoid Robots in Intense Interaction Environments
Reinforcement learning (RL) controllers have made impressive progress in humanoid locomotion and lightweight object manipulation. However, achieving robust and precise motion control under intense force interaction remains a significant challenge. To address these limitations, this paper proposes HAFO, a dual-agent reinforcement learning framework that concurrently optimizes both a robust locomotion strategy and a precise upper-body manipulation strategy via coupled training in environments with external disturbances. The external pulling disturbances are explicitly modeled using a spring-damper system, allowing for fine-grained force control through manipulation of the virtual spring. In this process, the reinforcement learning policy autonomously generates a disturbance-rejection response by utilizing environmental feedback. Furthermore, HAFO employs an asymmetric Actor-Critic framework in which the Critic network's access to privileged external forces guides the actor network to acquire generalizable force adaptation for resisting external disturbances. The experimental results demonstrate that HAFO achieves whole-body control for humanoid robots across diverse force-interaction environments, delivering outstanding performance in load-bearing tasks and maintaining stable operation even when suspended by a rope.
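A minimal sketch of the kind of virtual spring-damper pulling disturbance described above (one plausible reading of the abstract; function and parameter names are illustrative):

```python
import numpy as np

def spring_damper_pull(p_ee, v_ee, anchor, k=200.0, c=10.0, rest_len=0.0):
    """External pulling force on the end-effector from a virtual rope/spring
    attached at `anchor`: stiffness k, damping c, natural length rest_len."""
    d = anchor - p_ee
    dist = np.linalg.norm(d) + 1e-9
    direction = d / dist
    stretch = max(dist - rest_len, 0.0)      # a rope only pulls, never pushes
    f_spring = k * stretch * direction       # pull toward the anchor
    f_damp = -c * v_ee                       # dissipate end-effector velocity
    return f_spring + f_damp
```

Varying k, c, the anchor point, and the rest length is one simple way to expose the policy to a spectrum of pulling-force magnitudes during training.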
GigaBrain-0: A World Model-Powered Vision-Language-Action Model
Training Vision-Language-Action (VLA) models for generalist robots typically requires large-scale real-world robot data, which is expensive and time-consuming to collect. The inefficiency of physical data collection severely limits the scalability and generalization capacity of current VLA systems. To address this challenge, we introduce GigaBrain-0, a novel VLA foundation model empowered by world model-generated data (e.g., video generation, real2real transfer, human transfer, view transfer, sim2real transfer data). By leveraging world models to generate diverse data at scale, GigaBrain-0 significantly reduces reliance on real robot data while improving cross-task generalization. Our approach further improves policy robustness through RGBD input modeling and embodied Chain-of-Thought (CoT) supervision, enabling the model to reason about spatial geometry, object states, and long-horizon dependencies during task execution. This leads to substantial gains in real-world performance on dexterous, long-horizon, and mobile manipulation tasks. Extensive experiments demonstrate that GigaBrain-0 achieves superior generalization across variations in appearances (e.g., textures, colors), object placements, and camera viewpoints. Additionally, we present GigaBrain-0-Small, an optimized lightweight variant designed to run efficiently on devices such as the NVIDIA Jetson AGX Orin.
comment: https://gigabrain0.github.io/
Surfel-LIO: Fast LiDAR-Inertial Odometry with Pre-computed Surfels and Hierarchical Z-order Voxel Hashing
LiDAR-inertial odometry (LIO) is an active research area, as it enables accurate real-time state estimation in GPS-denied environments. Recent advances in map data structures and spatial indexing have significantly improved the efficiency of LIO systems. Nevertheless, we observe that two aspects may still leave room for improvement: (1) nearest neighbor search often requires examining multiple spatial units to gather sufficient points for plane fitting, and (2) plane parameters are typically recomputed at every iteration despite unchanged map geometry. Motivated by these observations, we propose Surfel-LIO, which employs a hierarchical voxel structure (hVox) with pre-computed surfel representation. This design enables O(1) correspondence retrieval without runtime neighbor enumeration or plane fitting, combined with Z-order curve encoding for cache-friendly spatial indexing. Experimental results on the M3DGR dataset demonstrate that our method achieves significantly faster processing speed compared to recent state-of-the-art methods while maintaining comparable state estimation accuracy. Our implementation is publicly available at https://github.com/93won/lidar_inertial_odometry.
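As a rough illustration of the Z-order indexing idea mentioned above, the sketch below computes Morton keys for 3D voxel indices so that spatially adjacent voxels tend to receive nearby hash keys. It is a generic Morton-code routine under assumed bit widths and voxel size, not the Surfel-LIO implementation or its hierarchical hVox structure.

```python
# Minimal sketch of Z-order (Morton) encoding for 3D voxel hashing.
# Illustrative only; bit widths, voxel size, and offset are assumptions.
import numpy as np

def spread_bits(v: int) -> int:
    """Interleave two zero bits after each of the lowest 21 bits of v."""
    v &= (1 << 21) - 1
    v = (v | (v << 32)) & 0x1F00000000FFFF
    v = (v | (v << 16)) & 0x1F0000FF0000FF
    v = (v | (v << 8))  & 0x100F00F00F00F00F
    v = (v | (v << 4))  & 0x10C30C30C30C30C3
    v = (v | (v << 2))  & 0x1249249249249249
    return v

def morton_key(point, voxel_size=0.5, offset=1 << 20):
    """Map a 3D point to the Z-order key of its voxel index.

    The offset shifts negative indices into the non-negative range
    expected by the bit-interleaving step."""
    ix, iy, iz = (int(np.floor(c / voxel_size)) + offset for c in point)
    return spread_bits(ix) | (spread_bits(iy) << 1) | (spread_bits(iz) << 2)

# Nearby points map to nearby keys, so a hash map keyed by Morton codes
# tends to keep spatially adjacent voxels close together in memory.
voxel_map = {}
for p in np.random.rand(1000, 3) * 10.0:
    voxel_map.setdefault(morton_key(p), []).append(p)
```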
Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models
This paper presents the first application of Gaussian Mixture Copula Models to the statistical modeling of driving scenarios for the safety validation of automated driving systems. Knowledge of the joint probability distribution of scenario parameters is essential for scenario-based safety assessment, where risk quantification depends on the likelihood of concrete parameter combinations. Gaussian Mixture Copula Models bring together the multimodal expressivity of Gaussian Mixture Models and the flexibility of copulas, enabling separate modeling of marginal distributions and dependencies. We benchmark Gaussian Mixture Copula Models against previously proposed approaches - Gaussian Mixture Models and Gaussian Copula Models - using real-world driving data drawn from two scenarios defined in United Nations Regulation No. 157. Our evaluation on approximately 18 million instances of these two scenarios demonstrates that Gaussian Mixture Copula Models consistently surpass Gaussian Copula Models and perform better than, or at least comparably to, Gaussian Mixture Models, as measured by both log-likelihood and Sinkhorn distance. These results are promising for the adoption of Gaussian Mixture Copula Models as a statistical foundation for future scenario-based validation frameworks.
comment: 9 pages, 4 figures; This work has been submitted to the IEEE for possible publication; Code available at: https://codeocean.com/capsule/1003615/tree
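For readers unfamiliar with the copula construction, the sketch below illustrates the general idea of fitting a Gaussian mixture on the copula (normal-score) scale after transforming each marginal empirically. The toy data, component count, and estimator are illustrative assumptions; the paper's actual GMCM fitting procedure is not reproduced here.

```python
# Illustrative sketch of a Gaussian-mixture-on-copula-scale model:
# fit marginals empirically, map data to normal scores, then fit a
# Gaussian mixture on the transformed data.
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.mixture import GaussianMixture

def to_normal_scores(X):
    """Empirical-CDF transform of each column, then inverse normal CDF."""
    n = X.shape[0]
    U = rankdata(X, axis=0) / (n + 1.0)   # pseudo-observations in (0, 1)
    return norm.ppf(U)

# Toy 2D scenario parameters (e.g., relative speed and gap at cut-in).
rng = np.random.default_rng(0)
X = np.column_stack([rng.gamma(2.0, 2.0, 5000), rng.normal(30.0, 5.0, 5000)])

Z = to_normal_scores(X)
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(Z)

# Copula-scale log-density of some points (marginal densities would be
# added back to obtain the full joint likelihood).
print(gmm.score_samples(Z[:5]))
```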
WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization
Visual geo-localization for drones faces critical degradation under weather perturbations, e.g., rain and fog, where existing methods struggle with two inherent limitations: 1) heavy reliance on limited weather categories that constrains generalization, and 2) suboptimal disentanglement of entangled scene-weather features through pseudo weather categories. We present WeatherPrompt, a multi-modality learning paradigm that establishes weather-invariant representations by fusing the image embedding with the text context. Our framework introduces two key contributions. First, a Training-free Weather Reasoning mechanism employs off-the-shelf large multi-modality models to synthesize multi-weather textual descriptions through human-like reasoning; this improves scalability to unseen or complex weather and can reflect different weather intensities. Second, to better disentangle scene and weather features, we propose a multi-modality framework with a dynamic gating mechanism driven by the text embedding to adaptively reweight and fuse visual features across modalities. The framework is further optimized with cross-modal objectives, including image-text contrastive learning and image-text matching, which map the same scene under different weather conditions closer together in the representation space. Extensive experiments validate that, under diverse weather conditions, our method achieves competitive recall rates compared to state-of-the-art drone geo-localization methods. Notably, it improves Recall@1 by 13.37% under night conditions and by 18.69% under fog and snow conditions.
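As a loose illustration of text-driven gating over visual features, the minimal PyTorch module below produces per-stream gate values from a text embedding and uses them to reweight and fuse visual feature streams. The dimensions, the two-stream split, and the weighted-sum fusion are assumptions for the sketch, not the WeatherPrompt architecture.

```python
# Minimal sketch of a text-driven gating module that reweights visual
# feature streams; dimensions and fusion choices are illustrative.
import torch
import torch.nn as nn

class TextDrivenGate(nn.Module):
    def __init__(self, text_dim=512, visual_dim=768, num_streams=2):
        super().__init__()
        # One gate value per visual stream (e.g., a scene vs. a weather branch).
        self.gate = nn.Sequential(
            nn.Linear(text_dim, 128), nn.ReLU(),
            nn.Linear(128, num_streams), nn.Sigmoid(),
        )
        self.proj = nn.Linear(visual_dim, visual_dim)

    def forward(self, text_emb, visual_feats):
        # text_emb: (B, text_dim); visual_feats: (B, num_streams, visual_dim)
        g = self.gate(text_emb).unsqueeze(-1)   # (B, num_streams, 1)
        fused = (g * visual_feats).sum(dim=1)   # weighted sum over streams
        return self.proj(fused)

gate = TextDrivenGate()
out = gate(torch.randn(4, 512), torch.randn(4, 2, 768))
print(out.shape)  # torch.Size([4, 768])
```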
PPL: Point Cloud Supervised Proprioceptive Locomotion Reinforcement Learning for Legged Robots in Crawl Spaces
Legged locomotion in constrained spaces (called crawl spaces) is challenging. In crawl spaces, current proprioceptive locomotion learning methods struggle to achieve traversal because only ground features are inferred. In this study, a point cloud supervised RL framework for proprioceptive locomotion in crawl spaces is proposed. A state estimation network is designed to estimate the robot's collision states as well as ground and spatial features for locomotion. A point cloud feature extraction method is proposed to supervise the state estimation network. The method uses a polar-coordinate representation of the point cloud and MLPs for efficient feature extraction. Experiments demonstrate that, compared with existing methods, our method exhibits faster iteration times during training and more agile locomotion in crawl spaces. This study enhances the ability of legged robots to traverse constrained spaces without requiring exteroceptive sensors.
comment: Accepted by RA-L
SkillWrapper: Generative Predicate Invention for Skill Abstraction
Generalizing from individual skill executions to solving long-horizon tasks remains a core challenge in building autonomous agents. A promising direction is learning high-level, symbolic abstractions of the low-level skills of the agents, enabling reasoning and planning independent of the low-level state space. Among possible high-level representations, object-centric skill abstraction with symbolic predicates has been proven to be efficient because of its compatibility with domain-independent planners. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs, a process we call generative predicate invention, to facilitate downstream abstraction learning. However, it remains unclear which formal properties the learned representations must satisfy, and how they can be learned to guarantee these properties. In this paper, we address both questions by presenting a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. Within this framework, we propose SkillWrapper, a method that leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable solving unseen, long-horizon tasks in the real world with black-box skills.
Energy-Aware Lane Planning for Connected Electric Vehicles in Urban Traffic: Design and Vehicle-in-the-Loop Validation
Urban driving with connected and automated vehicles (CAVs) offers potential for energy savings, yet most eco-driving strategies focus solely on longitudinal speed control within a single lane. This neglects the significant impact of lateral decisions, such as lane changes, on overall energy efficiency, especially in environments with traffic signals and heterogeneous traffic flow. To address this gap, we propose a novel energy-aware motion planning framework that jointly optimizes longitudinal speed and lateral lane-change decisions using vehicle-to-infrastructure (V2I) communication. Our approach estimates long-term energy costs using a graph-based approximation and solves short-horizon optimal control problems under traffic constraints. Using a data-driven energy model calibrated to an actual battery electric vehicle, we demonstrate with vehicle-in-the-loop experiments that our method reduces motion energy consumption by up to 24 percent compared to a human driver, highlighting the potential of connectivity-enabled planning for sustainable urban autonomy.
comment: Accepted at 2025 IEEE Conference on Decision and Control (CDC25')
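To make the graph-based long-term cost idea concrete, the sketch below runs Dijkstra over a small (segment, lane) graph where edge weights stand in for energy costs of lane keeping versus lane changing. The topology and all cost values are made up for illustration; the paper's energy model is calibrated to a real battery electric vehicle and is not reproduced here.

```python
# Sketch of a graph-based long-term energy estimate: nodes are
# (segment, lane) pairs; edge weights are assumed energy costs.
import heapq

def min_energy(num_segments, num_lanes, stay_cost, change_cost, start_lane=0):
    """Dijkstra over the (segment, lane) graph; returns the minimum
    total energy to reach the end of the corridor."""
    dist = {(0, start_lane): 0.0}
    pq = [(0.0, 0, start_lane)]
    best = float("inf")
    while pq:
        d, seg, lane = heapq.heappop(pq)
        if d > dist.get((seg, lane), float("inf")):
            continue
        if seg == num_segments:
            best = min(best, d)
            continue
        for nxt in range(num_lanes):
            # Lane keeping vs. lane changing incur different energy costs.
            w = stay_cost[seg][lane] if nxt == lane else change_cost[seg][lane][nxt]
            nd = d + w
            if nd < dist.get((seg + 1, nxt), float("inf")):
                dist[(seg + 1, nxt)] = nd
                heapq.heappush(pq, (nd, seg + 1, nxt))
    return best

stay = [[1.0, 1.2], [0.9, 1.5], [1.1, 1.0]]
change = [[[0, 1.6], [1.4, 0]], [[0, 1.3], [1.2, 0]], [[0, 1.5], [1.3, 0]]]
print(min_energy(3, 2, stay, change))
```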
A Fast and Model Based Approach for Evaluating Task-Competence of Antagonistic Continuum Arms
Soft robot arms have made significant progress towards completing human-scale tasks, but designing arms for tasks with specific load and workspace requirements remains difficult. A key challenge is the lack of model-based design tools, forcing advancement to occur through empirical iteration and observation. Existing models are focused on control and rely on parameter fits, which means they cannot provide general conclusions about the mapping between design and performance or the influence of factors outside the fitting data. As a first step toward model-based design tools, we introduce a novel method of analyzing whether a proposed arm design can complete desired tasks. Our method is informative, interpretable, and fast: it provides novel metrics for quantifying a proposed arm design's ability to perform a task, it yields a graphical interpretation of performance through segment forces, and computing it is over 80x faster than optimization-based methods. Our formulation focuses on antagonistic, pneumatically-driven soft arms. We demonstrate our approach through example analysis, and also through consideration of antagonistic vs. non-antagonistic designs. Our method enables fast, direct, and task-specific comparison of these two architectures, and provides a new visualization of the comparative mechanics. While only a first step, the proposed approach will support the advancement of model-based design tools, leading to highly capable soft arms.
comment: Published in the 8th IEEE-RAS International Conference on Soft Robotics (RoboSoft 2025). See https://github.com/wfan19/antagonistic-task-competency for code, proofs, and supplementary information. Please note the officially published version of the paper in IEEE contains an error in Equation 7. That has been corrected here, so this is the final version of the paper. Apologies for the confusion!
BOP-ASK: Object-Interaction Reasoning for Vision-Language Models
Vision Language Models (VLMs) have achieved impressive performance on spatial reasoning benchmarks, yet these evaluations mask critical weaknesses in understanding object interactions. Current benchmarks test high-level relationships ('left of', 'behind', etc.) but ignore the fine-grained spatial understanding needed for real-world applications: precise 3D localization, physical compatibility between objects, object affordances, and multi-step spatial planning. In this work, we present BOP-ASK, a novel large-scale dataset for object-interaction reasoning for both training and benchmarking. Our data generation pipeline leverages 6D object poses from the Benchmark for Object Pose Estimation (BOP) datasets, from which we derive fine-grained annotations such as grasp poses, referred object poses, path planning trajectories, relative spatial and depth relationships, and object-to-object relationships. BOP-ASK comprises over 150k images and 33M question-answer pairs spanning six tasks (four novel), providing a rich resource for training and evaluating VLMs. We evaluate proprietary and open-source VLMs, and conduct human evaluations on BOP-ASK-core, a contributed test benchmark. We also release BOP-ASK-lab, an out-of-distribution benchmark with images not sourced from BOP, enabling testing of generalization. Our experiments demonstrate that models trained on BOP-ASK outperform baselines and exhibit emergent capabilities such as precise object and grasp pose estimation, trajectory planning, and fine-grained object-centric spatial reasoning in cluttered environments. We will publicly release our datasets and dataset generation pipeline.
Designing for Distributed Heterogeneous Modularity: On Software Architecture and Deployment of MoonBots SP
This paper presents the software architecture and deployment strategy behind the MoonBot platform: a modular space robotic system composed of heterogeneous components distributed across multiple computers, networks and ultimately celestial bodies. We introduce a principled approach to distributed, heterogeneous modularity, extending modular robotics beyond physical reconfiguration to software, communication and orchestration. We detail the architecture of our system that integrates component-based design, a data-oriented communication model using ROS2 and Zenoh, and a deployment orchestrator capable of managing complex multi-module assemblies. These abstractions enable dynamic reconfiguration, decentralized control, and seamless collaboration between numerous operators and modules. At the heart of this system lies our open-source Motion Stack software, validated by months of field deployment with self-assembling robots, inter-robot cooperation, and remote operation. Our architecture tackles the significant hurdles of modular robotics by significantly reducing integration and maintenance overhead, while remaining scalable and robust. Although tested with space in mind, we propose generalizable patterns for designing robotic systems that must scale across time, hardware, teams and operational environments.
comment: 6 pages, 8 figures. Accepted at ISPARO 2025
Scalable Policy Evaluation with Video World Models
Training generalist policies for robotic manipulation has shown great promise, as they enable language-conditioned, multi-task behaviors across diverse scenarios. However, evaluating these policies remains difficult because real-world testing is expensive, time-consuming, and labor-intensive. It also requires frequent environment resets and carries safety risks when deploying unproven policies on physical robots. Manually creating and populating simulation environments with assets for robotic manipulation has not addressed these issues, primarily due to the significant engineering effort required and the substantial sim-to-real gap, both in terms of physics and rendering. In this paper, we explore the use of action-conditional video generation models as a scalable way to learn world models for policy evaluation. We demonstrate how to incorporate action conditioning into existing pre-trained video generation models. This allows leveraging internet-scale in-the-wild online videos during the pre-training stage and alleviates the need for a large dataset of paired video-action data, which is expensive to collect for robotic manipulation. Our paper examines the effect of dataset diversity, pre-trained weights, and common failure cases for the proposed evaluation pipeline. Our experiments demonstrate that across various metrics, including policy ranking and the correlation between actual policy values and predicted policy values, these models offer a promising approach for evaluating policies without requiring real-world interactions.
Q-STAC: Q-Guided Stein Variational Model Predictive Actor-Critic
Deep reinforcement learning (DRL) often struggles with complex robotic manipulation tasks due to low sample efficiency and biased value estimation. Model-based reinforcement learning (MBRL) improves efficiency by leveraging environment dynamics, with prior work integrating Model Predictive Control (MPC) to enhance policy robustness through online trajectory optimization. However, existing MBRL approaches still suffer from high model bias, task-specific cost function design, and significant computational overhead. To address these challenges, we propose Q-guided Stein Variational Model Predictive Actor-Critic (Q-STAC)--a unified framework that bridges Bayesian MPC and Soft Actor-Critic (SAC). Q-STAC employs Stein Variational Gradient Descent (SVGD) to iteratively optimize action sequences sampled from a learned prior distribution guided by Q-values, thereby eliminating manual cost-function engineering. By performing short-horizon model-predictive rollouts, Q-STAC reduces cumulative prediction errors, improves training stability and reduces computational complexity. Experiments on simulated particle navigation, diverse robotic manipulation tasks, and a real-world fruit-picking scenario demonstrate that Q-STAC consistently achieves superior sample efficiency, stability, and overall performance compared to both model-free and model-based baselines.
comment: 9 pages, 10 figures
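The sketch below shows one way SVGD can push a set of action-sequence particles toward high-Q regions: Q-values act as an unnormalized log-density, and an RBF kernel supplies the attractive and repulsive terms. The toy quadratic Q-function, kernel bandwidth, and step size are placeholder assumptions, not the Q-STAC algorithm.

```python
# Minimal sketch of one SVGD update over action-sequence particles,
# using Q-values as an (unnormalized) log-density. Illustrative only.
import torch

def rbf_kernel(X, h=1.0):
    # X: (N, D) flattened particles
    sq = torch.cdist(X, X) ** 2
    K = torch.exp(-sq / (2 * h ** 2))
    # Kernel gradients w.r.t. the first argument, summed over particles.
    grad_K = -(X.unsqueeze(1) - X.unsqueeze(0)) * K.unsqueeze(-1) / h ** 2
    return K, grad_K.sum(dim=0)

def svgd_step(particles, q_fn, step=0.05):
    # particles: (N, H, A) action sequences; q_fn maps them to scalar values.
    X = particles.clone().requires_grad_(True)
    score = torch.autograd.grad(q_fn(X).sum(), X)[0]   # treated as d log p / d a
    N = X.shape[0]
    Xf, Sf = X.reshape(N, -1), score.reshape(N, -1)
    K, grad_K = rbf_kernel(Xf.detach())
    phi = (K @ Sf.detach() + grad_K) / N               # SVGD update direction
    return (Xf.detach() + step * phi).reshape_as(particles)

# Toy quadratic "Q-function" peaked at zero actions.
q_fn = lambda a: -(a ** 2).sum(dim=(1, 2))
particles = torch.randn(16, 8, 2)   # 16 particles, horizon 8, 2-D actions
for _ in range(50):
    particles = svgd_step(particles, q_fn)
print(particles.abs().mean())        # shrinks toward the Q-maximizing region
```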
Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning
Despite strong results on recognition and segmentation, current 3D visual pre-training methods often underperform on robotic manipulation. We attribute this gap to two factors: the lack of state-action-state dynamics modeling and the unnecessary redundancy of explicit geometric reconstruction. We introduce AFRO, a self-supervised framework that learns dynamics-aware 3D representations without action or reconstruction supervision. AFRO casts state prediction as a generative diffusion process and jointly models forward and inverse dynamics in a shared latent space to capture causal transition structure. To prevent feature leakage in action learning, we employ feature differencing and inverse-consistency supervision, improving the quality and stability of visual features. When combined with Diffusion Policy, AFRO substantially increases manipulation success rates across 16 simulated and 4 real-world tasks, outperforming existing pre-training approaches. The framework also scales favorably with data volume and task complexity. Qualitative visualizations indicate that AFRO learns semantically rich, discriminative features, offering an effective pre-training solution for 3D representation learning in robotics. Project page: https://kolakivy.github.io/AFRO/
Beyond Description: Cognitively Benchmarking Fine-Grained Action for Embodied Agents
Multimodal Large Language Models (MLLMs) show promising results as decision-making engines for embodied agents operating in complex, physical environments. However, existing benchmarks often prioritize high-level planning or spatial reasoning, leaving the fine-grained action intelligence required for embodied physical interaction underexplored. To address this gap, we introduce CFG-Bench, a new benchmark designed to systematically evaluate this crucial capability. CFG-Bench consists of 1,368 curated videos paired with 19,562 three-modality question-answer pairs targeting four cognitive abilities: 1) Physical Interaction, 2) Temporal-Causal Relation, 3) Intentional Understanding, and 4) Evaluative Judgment. Together, these dimensions provide a systematic framework for assessing a model's ability to translate visual observations into actionable knowledge, moving beyond mere surface-level recognition. Our comprehensive evaluation on CFG-Bench reveals that leading MLLMs struggle to produce detailed instructions for physical interactions and exhibit profound limitations in the higher-order reasoning of intention and evaluation. Moreover, supervised fine-tuning (SFT) on our data demonstrates that teaching an MLLM to articulate fine-grained actions directly translates to significant performance gains on established embodied benchmarks. Our analysis highlights these limitations and offers insights for developing more capable and grounded embodied agents. Project page: https://cfg-bench.github.io/.
WiSER-X: Wireless Signals-based Efficient Decentralized Multi-Robot Exploration without Explicit Information Exchange
We introduce a Wireless Signal based Efficient multi-Robot eXploration (WiSER-X) algorithm applicable to a decentralized team of robots exploring an unknown environment under communication bandwidth constraints. WiSER-X relies only on local inter-robot relative position estimates, which can be obtained by exchanging signal pings from onboard sensors such as WiFi and Ultra-Wide Band, amongst others, to inform the exploration decisions of individual robots and minimize redundant coverage overlaps. Furthermore, WiSER-X enables asynchronous termination without requiring a shared map between the robots. It also adapts to heterogeneous robot behaviors, and even complete failures, in unknown environments while ensuring complete coverage. Simulations show that WiSER-X leads to 58% lower overlap than a zero-information-sharing baseline (Algorithm 1) and only 23% more overlap than a full-information-sharing baseline (Algorithm 2). Hardware experiments further validate the feasibility of WiSER-X using full onboard sensing.
MIGHTY: Hermite Spline-based Efficient Trajectory Planning
Hard-constraint trajectory planners often rely on commercial solvers and demand substantial computational resources. Existing soft-constraint methods achieve faster computation, but either (1) decouple spatial and temporal optimization or (2) restrict the search space. To overcome these limitations, we introduce MIGHTY, a Hermite spline-based planner that performs spatiotemporal optimization while fully leveraging the continuous search space of a spline. In simulation, MIGHTY achieves a 9.3% reduction in computation time and a 13.1% reduction in travel time over state-of-the-art baselines, with a 100% success rate. In hardware, MIGHTY completes multiple high-speed flights up to 6.7 m/s in a cluttered static environment and long-duration flights with dynamically added obstacles.
comment: 9 pages, 10 figures
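For reference, the snippet below evaluates a standard cubic Hermite segment, the spline primitive the planner above builds on; the spatiotemporal optimization itself is not shown, and the endpoint values and tangents are arbitrary.

```python
# Standard cubic Hermite segment evaluation (illustrative values only).
import numpy as np

def hermite(p0, p1, m0, m1, t):
    """Evaluate a cubic Hermite segment at t in [0, 1].

    p0, p1: endpoint positions; m0, m1: endpoint tangents (scaled by the
    segment duration when a time parameterization is used)."""
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

p0, p1 = np.array([0.0, 0.0]), np.array([1.0, 2.0])
m0, m1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
ts = np.linspace(0.0, 1.0, 5)
print(np.array([hermite(p0, p1, m0, m1, t) for t in ts]))
```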
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models
Reasoning about motion and space is a fundamental cognitive capability that is required by multiple real-world applications. While many studies highlight that large multimodal language models (MLMs) struggle to reason about space, they only focus on static spatial relationships, and not dynamic awareness of motion and space, i.e., reasoning about the effect of egocentric and object motions on spatial relationships. Manually annotating such object and camera movements is expensive. Hence, we introduce SAT, a simulated spatial aptitude training dataset utilizing 3D simulators, comprising both static and dynamic spatial reasoning across 175K question-answer (QA) pairs and 20K scenes. Complementing this, we also construct a small (150 image-QAs) yet challenging dynamic spatial test set using real-world images. Leveraging our SAT datasets and 6 existing static spatial benchmarks, we systematically investigate what improves both static and dynamic spatial awareness. Our results reveal that simulations are surprisingly effective at imparting spatial aptitude to MLMs that translate to real images. We show that perfect annotations in simulation are more effective than existing approaches of pseudo-annotating real images. For instance, SAT training improves a LLaVA-13B model by an average 11% and a LLaVA-Video-7B model by an average 8% on multiple spatial benchmarks, including our real-image dynamic test set and spatial reasoning on long videos -- even outperforming some large proprietary models. While reasoning over static relationships improves with synthetic training data, there is still considerable room for improvement for dynamic reasoning questions.
comment: Accepted to COLM 2025. Project webpage: https://arijitray.com/SAT/
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs over several days. We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. Specifically, we posit motion tracking as a natural and scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (from 1.2M to 42M parameters), dataset volume (over 100M frames, 700 hours of high-quality motion data), and compute (9k GPU hours). Beyond demonstrating the benefits of scale, we show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces, such as VR teleoperation devices, human videos, and vision-language-action (VLA) models, all using the same policy. Scaling motion tracking exhibits favorable properties: performance improves steadily with increased compute and data diversity, and learned representations generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
comment: Project page: https://nvlabs.github.io/SONIC/
Multiagent Systems
Detecting Perspective Shifts in Multi-agent Systems
Generative models augmented with external tools and update mechanisms (or \textit{agents}) have demonstrated capabilities beyond intelligent prompting of base models. As agent use proliferates, dynamic multi-agent systems have naturally emerged. Recent work has investigated the theoretical and empirical properties of low-dimensional representations of agents based on query responses at a single time point. This paper introduces the Temporal Data Kernel Perspective Space (TDKPS), which jointly embeds agents across time, and proposes several novel hypothesis tests for detecting behavioral change at the agent- and group-level in black-box multi-agent systems. We characterize the empirical properties of our proposed tests, including their sensitivity to key hyperparameters, in simulations motivated by a multi-agent system of evolving digital personas. Finally, we demonstrate via natural experiment that our proposed tests detect changes that correlate sensitively, specifically, and significantly with a real exogenous event. As far as we are aware, TDKPS is the first principled framework for monitoring behavioral dynamics in black-box multi-agent systems -- a critical capability as generative agent deployment continues to scale.
Strategic Self-Improvement for Competitive Agents in AI Labour Markets
As artificial intelligence (AI) agents are deployed across economic domains, understanding their strategic behavior and market-level impact becomes critical. This paper puts forward a groundbreaking new framework that is the first to capture the real-world economic forces that shape agentic labor markets: adverse selection, moral hazard, and reputation dynamics. Our framework encapsulates three core capabilities that successful LLM-agents will need: \textbf{metacognition} (accurate self-assessment of skills), \textbf{competitive awareness} (modeling rivals and market dynamics), and \textbf{long-horizon strategic planning}. We illustrate our framework through a tractable simulated gig economy where agentic Large Language Models (LLMs) compete for jobs, develop skills, and adapt their strategies under competitive pressure. Our simulations illustrate how LLM agents explicitly prompted with reasoning capabilities learn to strategically self-improve and demonstrate superior adaptability to changing market conditions. At the market level, our simulations reproduce classic macroeconomic phenomena found in human labor markets, while controlled experiments reveal potential AI-driven economic trends, such as rapid monopolization and systemic price deflation. This work provides a foundation to further explore the economic properties of AI-driven labour markets, and a conceptual framework to study the strategic reasoning capabilities in agents competing in the emerging economy.
Chameleon: Adaptive Adversarial Agents for Scaling-Based Visual Prompt Injection in Multimodal AI Systems
Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs), have become integral to critical applications ranging from autonomous decision-making to automated document processing. As these systems scale, they rely heavily on preprocessing pipelines to handle diverse inputs efficiently. However, this dependency on standard preprocessing operations, specifically image downscaling, creates a significant yet often overlooked security vulnerability. While intended for computational optimization, scaling algorithms can be exploited to conceal malicious visual prompts that are invisible to human observers but become active semantic instructions once processed by the model. Current adversarial strategies remain largely static, failing to account for the dynamic nature of modern agentic workflows. To address this gap, we propose Chameleon, a novel, adaptive adversarial framework designed to expose and exploit scaling vulnerabilities in production VLMs. Unlike traditional static attacks, Chameleon employs an iterative, agent-based optimization mechanism that dynamically refines image perturbations based on the target model's real-time feedback. This allows the framework to craft highly robust adversarial examples that survive standard downscaling operations to hijack downstream execution. We evaluate Chameleon against Gemini 2.5 Flash model. Our experiments demonstrate that Chameleon achieves an Attack Success Rate (ASR) of 84.5% across varying scaling factors, significantly outperforming static baseline attacks which average only 32.1%. Furthermore, we show that these attacks effectively compromise agentic pipelines, reducing decision-making accuracy by over 45% in multi-step tasks. Finally, we discuss the implications of these vulnerabilities and propose multi-scale consistency checks as a necessary defense mechanism.
comment: 5 pages, 2 figures, IEEE Transactions on Dependable and Secure Computing
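A minimal version of the multi-scale consistency check suggested as a defense above could compare an input image downscaled with different resampling filters and flag large disagreement. The target size, filters, metric, and threshold below are illustrative assumptions, not a validated detector.

```python
# Sketch of a multi-scale consistency check: downscale with different
# resamplers and flag inputs whose downscaled versions disagree strongly.
import numpy as np
from PIL import Image

def scale_consistency_score(img: Image.Image, size=(336, 336)):
    variants = [
        np.asarray(img.resize(size, resample=r), dtype=np.float32)
        for r in (Image.Resampling.NEAREST, Image.Resampling.BILINEAR,
                  Image.Resampling.LANCZOS)
    ]
    # Mean absolute disagreement between resampling methods, in [0, 255].
    diffs = [np.abs(a - b).mean() for i, a in enumerate(variants)
             for b in variants[i + 1:]]
    return float(np.mean(diffs))

def looks_suspicious(img, threshold=8.0):
    """Scaling-based payloads tend to produce larger cross-resampler
    disagreement than typical benign photos (threshold is a placeholder)."""
    return scale_consistency_score(img) > threshold

img = Image.fromarray((np.random.rand(1024, 1024, 3) * 255).astype(np.uint8))
print(scale_consistency_score(img), looks_suspicious(img))
```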
Complementary Characterization of Agent-Based Models via Computational Mechanics and Diffusion Models
This article extends the preprint "Characterizing Agent-Based Model Dynamics via $\epsilon$-Machines and Kolmogorov-Style Complexity" by introducing diffusion models as orthogonal and complementary tools for characterizing the output of agent-based models (ABMs). Where $\epsilon$-machines capture the predictive temporal structure and intrinsic computation of ABM-generated time series, diffusion models characterize high-dimensional cross-sectional distributions, learn underlying data manifolds, and enable synthetic generation of plausible population-level outcomes. We provide a formal analysis demonstrating that the two approaches operate on distinct mathematical domains -- processes vs. distributions -- and show that their combination yields a two-axis representation of ABM behavior based on temporal organization and distributional geometry. To our knowledge, this is the first framework to integrate computational mechanics with score-based generative modeling for the structural analysis of ABM outputs, thereby situating ABM characterization within the broader landscape of modern machine-learning methods for density estimation and intrinsic computation. The framework is validated using the same elder-caregiver ABM dataset introduced in the companion paper, and we provide precise definitions and propositions formalizing the mathematical complementarity between $\epsilon$-machines and diffusion models. This establishes a principled methodology for jointly analyzing temporal predictability and high-dimensional distributional structure in complex simulation models.
comment: 11 pages. Methods paper introducing a dual-domain framework for analyzing ABM dynamics. Companion temporal-analysis preprint: arXiv:2510.12729
Towards Ethical Multi-Agent Systems of Large Language Models: A Mechanistic Interpretability Perspective AAAI'26
Large language models (LLMs) have been widely deployed in various applications, often functioning as autonomous agents that interact with each other in multi-agent systems. While these systems have shown promise in enhancing capabilities and enabling complex tasks, they also pose significant ethical challenges. This position paper outlines a research agenda aimed at ensuring the ethical behavior of multi-agent systems of LLMs (MALMs) from the perspective of mechanistic interpretability. We identify three key research challenges: (i) developing comprehensive evaluation frameworks to assess ethical behavior at individual, interactional, and systemic levels; (ii) elucidating the internal mechanisms that give rise to emergent behaviors through mechanistic interpretability; and (iii) implementing targeted parameter-efficient alignment techniques to steer MALMs towards ethical behaviors without compromising their performance.
comment: Accepted to LaMAS 2026@AAAI'26 (https://sites.google.com/view/lamas2026)
Semi Centralized Training Decentralized Execution Architecture for Multi Agent Deep Reinforcement Learning in Traffic Signal Control
Multi-agent reinforcement learning (MARL) has emerged as a promising paradigm for adaptive traffic signal control (ATSC) of multiple intersections. Existing approaches typically follow either a fully centralized or a fully decentralized design. Fully centralized approaches suffer from the curse of dimensionality and reliance on a single learning server, whereas purely decentralized approaches operate under severe partial observability and lack explicit coordination, resulting in suboptimal performance. These limitations motivate region-based MARL, where the network is partitioned into smaller, tightly coupled intersections that form regions, and training is organized around these regions. This paper introduces a Semi-Centralized Training, Decentralized Execution (SEMI-CTDE) architecture for multi-intersection ATSC. Within each region, SEMI-CTDE performs centralized training with regional parameter sharing and employs composite state and reward formulations that jointly encode local and regional information. The architecture is highly transferable across different policy backbones and state-reward instantiations. Building on this architecture, we implement two models with distinct design objectives. A multi-perspective experimental analysis of the two implemented SEMI-CTDE-based models, covering ablations of the architecture's core elements and including rule-based and fully decentralized baselines, shows that they achieve consistently superior performance and remain effective across a wide range of traffic densities and distributions.
comment: Co-first authors: Pouria Yazdani and Arash Rezaali
A Modular Cognitive Architecture for Assisted Reasoning: The Nemosine Framework
This paper presents the Nemosine Framework, a modular cognitive architecture designed to support assisted reasoning, structured thinking, and systematic analysis. The model operates through functional cognitive modules ("personas") that organize tasks such as planning, evaluation, cross-checking, and narrative synthesis. The framework combines principles from metacognition, distributed cognition, and modular cognitive systems to offer an operational structure for assisted problem-solving and decision support. The architecture is documented through formal specification, internal consistency criteria, and reproducible structural components. The goal is to provide a clear conceptual basis for future computational implementations and to contribute to the study of symbolic-modular architectures for reasoning.
comment: 6 pages, 1 figure. First version
XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots
As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While substantial progress has been devoted to human behavior prediction, limited attention has been paid to how humans perceive, interpret, and trust robots' inferences, impeding deployment in safety-critical and socially embedded environments. This paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for agentic mobile robots, that bridges physical and virtual spaces to enable bi-directional understanding between humans and robots. Our hierarchical XR-DT architecture integrates virtual-, augmented-, and mixed-reality layers, fusing real-time sensor data, simulated environments in the Unity game engine, and human feedback captured through wearable AR devices. Within this framework, we design an agentic mobile robot system with a unified diffusion policy for context-aware task adaptation. We further propose a chain-of-thought prompting mechanism that allows multimodal large language models to reason over human instructions and environmental context, while leveraging an AutoGen-based multi-agent coordination layer to enhance robustness and collaboration in dynamic tasks. Initial experimental results demonstrate accurate human and robot trajectory prediction, validating the XR-DT framework's effectiveness in HRI tasks. By embedding human intention, environmental dynamics, and robot cognition into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.
comment: 10 pages, 5 figures
Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem ICML
Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative topologies (VNEAP) was introduced to capture malleable VNRs, where each request can be instantiated using one of several functionally equivalent topologies that trade resources differently. While this flexibility enlarges the feasible space, it also introduces an additional decision layer, making dynamic embedding more challenging. This paper proposes HRL-VNEAP, a hierarchical reinforcement learning approach for VNEAP under dynamic arrivals. A high-level policy selects the most suitable alternative topology (or rejects the request), and a low-level policy embeds the chosen topology onto the substrate network. Experiments on realistic substrate topologies under multiple traffic loads show that naive exploitation strategies provide only modest gains, whereas HRL-VNEAP consistently achieves the best performance across all metrics. Compared to the strongest tested baselines, HRL-VNEAP improves acceptance ratio by up to 20.7%, total revenue by up to 36.2%, and revenue-over-cost by up to 22.1%. Finally, we benchmark against an MILP formulation on tractable instances to quantify the remaining gap to optimality and motivate future work on learning- and optimization-based VNEAP solutions.
comment: Submitted to IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN) 2026
Norm-Governed Multi-Agent Decision-Making in Simulator-Coupled Environments:The Reinsurance Constrained Multi-Agent Simulation Process (R-CMASP)
Reinsurance decision-making exhibits the core structural properties that motivate multi-agent models: distributed and asymmetric information, partial observability, heterogeneous epistemic responsibilities, simulator-driven environment dynamics, and binding prudential and regulatory constraints. Deterministic workflow automation cannot meet these requirements, as it lacks the epistemic flexibility, cooperative coordination mechanisms, and norm-sensitive behaviour required for institutional risk-transfer. We propose the Reinsurance Constrained Multi-Agent Simulation Process (R-CMASP), a formal model that extends stochastic games and Dec-POMDPs by adding three missing elements: (i) simulator-coupled transition dynamics grounded in catastrophe, capital, and portfolio engines; (ii) role-specialized agents with structured observability, belief updates, and typed communication; and (iii) a normative feasibility layer encoding solvency, regulatory, and organizational rules as admissibility constraints on joint actions. Using LLM-based agents with tool access and typed message protocols, we show in a domain-calibrated synthetic environment that governed multi-agent coordination yields more stable, coherent, and norm-adherent behaviour than deterministic automation or monolithic LLM baselines--reducing pricing variance, improving capital efficiency, and increasing clause-interpretation accuracy. Embedding prudential norms as admissibility constraints and structuring communication into typed acts measurably enhances equilibrium stability. Overall, the results suggest that regulated, simulator-driven decision environments are most naturally modelled as norm-governed, simulator-coupled multi-agent systems.
Self-Propelled Agents and Group Social Force
Brownian motion has long been studied in a diversity of fields, not only in the statistical mechanics of physics, but also in biological models, financial and economic processes, and social systems. Over the past twenty years, there has been growing interest in extending the model with self-propulsion and interaction forces so that it also applies to social phenomena involving many individuals. This article continues this research trend and investigates the model as a paradigm for a quantitative description of social and economic processes. We mainly discuss a class of collective decision processes of Brownian agents/particles, where the stochasticity lies not in the fluctuations of traditional Brownian motion, but in the selection among several discrete choices. The agents' decisions interact with each other over a given social topology. To simplify the discussion, we focus on the binary-choice problem, where each agent selects one of two alternatives. Mathematically, we introduce a set of arrays to describe the social relationships of agents in a quantitative manner; from these arrays we derive the group social force and opinion dynamics, which are useful for studying complex social movements and self-organization phenomena, including discrete-choice activities, social groups, and de-individualization effects. Such agent-based simulations represent a variety of collective activities in human society, especially in the fields of economics and social science.
comment: 16 pages, 10 figures
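A minimal agent-based sketch of the binary-choice setting described above: an adjacency matrix encodes social ties, each agent feels a local "social force" equal to the mean opinion of its neighbours, and choices are resampled stochastically. The logistic update rule and all parameters are illustrative assumptions rather than the article's exact equations.

```python
# Minimal agent-based sketch of binary choice under social influence.
import numpy as np

rng = np.random.default_rng(1)
N, steps, beta = 100, 200, 2.0               # agents, iterations, interaction strength

A = (rng.random((N, N)) < 0.05).astype(float)   # sparse random social ties
np.fill_diagonal(A, 0.0)
s = rng.choice([-1.0, 1.0], size=N)              # initial binary choices

for _ in range(steps):
    field = A @ s / np.maximum(A.sum(axis=1), 1.0)   # mean neighbour opinion
    p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
    s = np.where(rng.random(N) < p_plus, 1.0, -1.0)  # stochastic choice update

print("final mean opinion:", s.mean())
```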
Systems and Control (EESS)
The Evolving Landscape of Interactive Surface Sensing Technologies
Interactive surfaces have evolved from capacitive touch and IR-based systems into a diverse ecosystem of sensing technologies that support rich and expressive human-computer interaction. This survey traces that progression, beginning with infrared vision-based approaches, such as FTIR and diffuse illumination, and the rise of capacitive touch as the dominant technology in modern devices, before focusing on contemporary modalities including vision and acoustic sensing. New technologies under development are also discussed, including mmWave radar and vibration-based techniques. Each sensing technique is examined in terms of its operating principles, resolution, scalability, and applications, along with discussions of multimodal integration. By comparing trade-offs between sensing modalities, the survey highlights the technical and design factors that shape interactive surface performance and user experience. The review concludes by identifying persistent challenges, including sensing accuracy, power constraints, and privacy concerns, and outlines how emerging sensing modalities can enable future interactive environments to be ubiquitous and intelligent.
comment: 12 pages, 10 figures
A Randomized Scheduling Framework for Privacy-Preserving Multi-robot Rendezvous given Prior Information
Privacy has become a critical concern in modern multi-robot systems, driven by both ethical considerations and operational constraints. As a result, growing attention has been directed toward privacy-preserving coordination in dynamical multi-robot systems. This work introduces a randomized scheduling mechanism for privacy-preserving robot rendezvous. The proposed approach achieves improved privacy even at lower communication rates, where privacy is quantified via pointwise maximal leakage. We show that lower transmission rates provide stronger privacy guarantees and prove that rendezvous is still achieved under the randomized scheduling mechanism. Numerical simulations are provided to demonstrate the effectiveness of the method.
On Disturbance-Aware Minimum-Time Trajectory Planning: Evidence from Tests on a Dynamic Driving Simulator
This work investigates how disturbance-aware, robustness-embedded reference trajectories translate into driving performance when executed by professional drivers in a dynamic simulator. Three planned reference trajectories are compared against a free-driving baseline (NOREF) to assess trade-offs between lap time (LT) and steering effort (SE): NOM, the nominal time-optimal trajectory; TLC, a track-limit-robust trajectory obtained by tightening margins to the track edges; and FLC, a friction-limit-robust trajectory obtained by tightening against axle and tire saturation. All trajectories share the same minimum lap-time objective with a small steering-smoothness regularizer and are evaluated by two professional drivers using a high-performance car on a virtual track. The trajectories derive from a disturbance-aware minimum-lap-time framework recently proposed by the authors, where worst-case disturbance growth is propagated over a finite horizon and used to tighten tire-friction and track-limit constraints, preserving performance while providing probabilistic safety margins. LT and SE are used as performance indicators, while RMS lateral deviation, speed error, and drift angle characterize driving style. Results show a Pareto-like LT-SE trade-off: NOM yields the shortest LT but highest SE; TLC minimizes SE at the cost of longer LT; FLC lies near the efficient frontier, substantially reducing SE relative to NOM with only a small LT increase. Removing trajectory guidance (NOREF) increases both LT and SE, confirming that reference trajectories improve pace and control efficiency. Overall, the findings highlight reference-based and disturbance-aware planning, especially FLC, as effective tools for training and for achieving fast yet stable trajectories.
comment: 18 pages, 11 figures, 5 tables
Small-Signal Stability Oriented Real-Time Operation of Power Systems with a High Penetration of Inverter-Based Resources
This study proposes a control strategy to ensure the safe operation of modern power systems with high penetration of inverter-based resources (IBRs) within an optimal operation framework. The objective is to obtain operating points that satisfy the optimality conditions of a predefined problem while guaranteeing small-signal stability. The methodology consists of two stages. First, an offline analysis of a set of operating points is performed to derive a data-driven regression-based expression that captures a damping-based stability index as a function of the operating conditions. Second, an Online Feedback Optimization (OFO) controller is employed to drive the system toward an optimal operating point while maintaining a secure distance from the instability region. The proposed strategy is evaluated on an academic test case based on a modified version of the IEEE 9-bus system, in which synchronous generators are replaced by IBRs operating under both grid-following and grid-forming control modes. The results demonstrate the effectiveness of the method and are discussed in detail.
Stability-Guaranteed Dual Kalman Filtering for Electrochemical Battery State Estimation
Accurate and stable state estimation is critical for battery management. Although dual Kalman filtering can jointly estimate states and parameters, the strong coupling between filters may cause divergence under large initialization errors or model mismatch. This paper proposes a Stability Guaranteed Dual Kalman Filtering (SG-DKF) method. A Lyapunov-based analysis yields a sufficient stability condition, leading to an adaptive dead-zone rule that suspends parameter updates when the innovation exceeds a stability bound. Applied to an electrochemical battery model, SG-DKF achieves accuracy comparable to a dual EKF and reduces state of charge RMSE by over 45% under large initial state errors.
comment: This work has been submitted to 23rd IFAC World Congress for possible publication
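The scalar toy below sketches the dead-zone idea in a dual Kalman filter: the parameter update is suspended whenever the normalized innovation exceeds a bound, in the spirit of the stability-motivated rule described above. The linear system, noise levels, outliers, and bound are illustrative assumptions, not the electrochemical battery model.

```python
# Scalar toy sketch of dual Kalman filtering with an innovation dead-zone.
import numpy as np

rng = np.random.default_rng(0)
a_true, T = 0.95, 400                      # true state-transition parameter
x_true, y = 0.0, []
for k in range(T):
    x_true = a_true * x_true + 0.1 + rng.normal(0, 0.05)
    yk = x_true + rng.normal(0, 0.1)
    if k % 97 == 0 and k > 0:
        yk += 3.0                          # occasional measurement outliers
    y.append(yk)

x_hat, P = 0.0, 1.0                        # state filter
a_hat, Pa = 0.90, 0.1                      # parameter filter, biased initial guess
Q, R, Qa, bound = 0.05**2, 0.1**2, 1e-5, 3.0
suspended = 0

for yk in y:
    # State filter, using the current parameter estimate.
    x_pred = a_hat * x_hat + 0.1
    P_pred = a_hat**2 * P + Q
    innov = yk - x_pred
    S = P_pred + R
    K = P_pred / S
    x_new = x_pred + K * innov
    P = (1 - K) * P_pred

    # Parameter filter with a dead-zone on the normalized innovation.
    if abs(innov) / np.sqrt(S) <= bound:
        Ha = x_hat                         # d x_pred / d a (previous state estimate)
        Pa += Qa
        Sa = Ha**2 * Pa + R
        Ka = Pa * Ha / Sa
        a_hat += Ka * innov
        Pa = (1 - Ka * Ha) * Pa
    else:
        suspended += 1                     # large innovation: freeze parameter update
    x_hat = x_new

print(f"a_hat = {a_hat:.3f} (true {a_true}), suspended updates: {suspended}")
```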
Safe model-based Reinforcement Learning via Model Predictive Control and Control Barrier Functions
Optimal control strategies are often combined with safety certificates to ensure both performance and safety in safety-critical systems. A prominent example is combining Model Predictive Control (MPC) with Control Barrier Functions (CBF). Yet, efficient tuning of MPC parameters and choosing an appropriate class $\mathcal{K}$ function in the CBF is challenging and problem dependent. This paper introduces a safe model-based Reinforcement Learning (RL) framework where a parametric MPC controller incorporates a CBF constraint with a parameterized class $\mathcal{K}$ function and serves as a function approximator to learn improved safe control policies from data. Three variations of the framework are introduced, distinguished by the way the optimization problem is formulated and the class $\mathcal{K}$ function is parameterized, including neural architectures. Numerical experiments on a discrete double-integrator with static and dynamic obstacles demonstrate that the proposed methods improve performance while ensuring safety.
comment: Submitted to IFAC WC 2026, 7 pages, 3 figures
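As background for the CBF constraint discussed above, the sketch below implements the simplest version of the idea on a single-integrator toy: a closed-form safety filter that projects a nominal input onto the half-space defined by a linear class-K condition. The dynamics, obstacle, and gains are assumptions, and the paper's MPC-based RL formulation with a learned class-K function is not reproduced.

```python
# Minimal CBF safety-filter sketch with a linear class-K function
# alpha(h) = gamma * h, on a single-integrator toy.
import numpy as np

obs, r, gamma, dt = np.array([2.0, 0.0]), 0.8, 1.0, 0.05

def h(p):
    return np.dot(p - obs, p - obs) - r**2           # barrier: h >= 0 means safe

def safe_input(p, u_nom):
    """Closed-form solution of min ||u - u_nom||^2 s.t. grad_h(p)^T u >= -gamma*h(p)."""
    a = 2.0 * (p - obs)                               # gradient of h at p
    b = -gamma * h(p)
    if a @ u_nom >= b:
        return u_nom                                  # nominal input already safe
    return u_nom + (b - a @ u_nom) / (a @ a) * a      # project onto the half-space

p, goal, min_h = np.array([0.0, 0.1]), np.array([4.0, 0.0]), np.inf
for _ in range(400):
    u_nom = 1.0 * (goal - p)                          # simple go-to-goal controller
    p = p + dt * safe_input(p, u_nom)
    min_h = min(min_h, h(p))

print("final position:", p.round(2), "minimum barrier value:", round(min_h, 3))
```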
Counterfactual Explanations for Power System Optimisation
Enhanced computational capabilities of modern decision-making software have allowed us to solve increasingly sophisticated optimisation problems. But in complex socio-economic and technical environments such as electricity markets, transparent operation is key to ensuring fair treatment of all parties involved, particularly regarding dispatch decisions. We address this issue by building on the concept of counterfactual explanations, answering questions such as "Why was this generator not dispatched?" by identifying minimum changes in the input parameters that would have changed the optimal solution. Both DC Optimal Power Flow and Unit Commitment problems are considered, wherein the variable parameters are the spatial and temporal demand profiles, respectively. The resulting explanations allow users to identify the most important differences between the real and expected market outcomes and observe which constraints have led to the solution. The framework uses a bilevel optimisation problem to find the counterfactual demand scenarios. State-of-the-art methods are compared with data-driven heuristics on the basis of computational efficiency and explanation accuracy. Results show that leveraging historical data from previously solved instances can provide significant speed benefits and allows us to derive explanations in cases where conventional methods would not be tractable.
Constrained Control of PDE Traffic Flow via Spatial Control Barrier Functions
In this paper, a constrained control approach to variable speed limit (VSL) control for macroscopic partial differential equations (PDE) traffic models is developed. Control Lyapunov function (CLF) theory for ordinary differential equations (ODE) is extended to account for spatially and temporally varying states and control inputs. The stabilizing CLF is then unified with safety constraints through the introduction of spatially varying control barrier functions (sCBF). These methods are applied to in-domain VSL control of the Lighthill-Whitham-Richards (LWR) model to regulate traffic density to a desired profile while ensuring the density remains below prescribed limits enforced by the sCBF. Results show that incorporating constrained control minimally affects the stabilizing control input while successfully maintaining the density with the defined safe set.
comment: Submitted to 2026 European Control Conference, 6 pages, 7 figures
Pick-to-Learn for Systems and Control: Data-driven Synthesis with State-of-the-art Safety Guarantees
Data-driven methods have become paramount in modern systems and control problems characterized by growing levels of complexity. In safety-critical environments, deploying these methods requires rigorous guarantees, a need that has motivated much recent work at the interface of statistical learning and control. However, many existing approaches achieve this goal at the cost of sacrificing valuable data for testing and calibration, or by constraining the choice of learning algorithm, thus leading to suboptimal performances. In this paper, we describe Pick-to-Learn (P2L) for Systems and Control, a framework that allows any data-driven control method to be equipped with state-of-the-art safety and performance guarantees. P2L enables the use of all available data to jointly synthesize and certify the design, eliminating the need to set aside data for calibration or validation purposes. In presenting a comprehensive version of P2L for systems and control, this paper demonstrates its effectiveness across a range of core problems, including optimal control, reachability analysis, safe synthesis, and robust control. In many of these applications, P2L delivers designs and certificates that outperform commonly employed methods, and shows strong potential for broad applicability in diverse practical settings.
comment: 27 double-column pages, 18 figures
Adaptive Online Optimization for Microgrids with Renewable Energy Sources
In this paper we propose a novel adaptive online optimization algorithm tailored to the management of microgrids with high renewable energy penetration, which can be formulated as a constrained, online optimization problem. The proposed algorithm is characterized by a control-based design that applies the internal model principle, and a system identification routine tasked with identifying such an internal model. In addition, to ensure that the constraints are satisfied, we integrate a projection onto the constraint set. We showcase promising numerical results for the microgrid use case, highlighting in particular the enhanced adaptability of the proposed algorithm to changes in the internal model. The proposed algorithm is shown to outperform state-of-the-art alternatives in the long term, ensuring efficient management of the grid.
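The projection step mentioned above can be pictured with a generic projected online update on box-constrained setpoints, as sketched below. The internal-model-based controller and identification routine of the paper are not reproduced, and the cost, constraints, and step size are illustrative assumptions.

```python
# Generic sketch of an online update with projection onto a box constraint
# set on power setpoints (illustrative of the projection step only).
import numpy as np

p_min, p_max, step = 0.0, 5.0, 0.2

def project(p):
    return np.clip(p, p_min, p_max)          # projection onto the box constraints

def cost_grad(p, demand):
    # Gradient of (sum(p) - demand)^2 + 0.05 * ||p||^2 (illustrative cost).
    return 2.0 * (p.sum() - demand) + 0.1 * p

p = np.array([1.0, 1.0])                      # two dispatchable units
for t in range(100):
    demand = 4.0 + 2.0 * np.sin(0.1 * t)      # time-varying net demand
    p = project(p - step * cost_grad(p, demand))

print("final setpoints:", p.round(2))
```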
TEMPO-VINE: A Multi-Temporal Sensor Fusion Dataset for Localization and Mapping in Vineyards
In recent years, precision agriculture has been introducing groundbreaking innovations in the field, with a strong focus on automation. However, research studies in robotics and autonomous navigation often rely on controlled simulations or isolated field trials. The absence of a realistic common benchmark represents a significant limitation for the diffusion of robust autonomous systems under real complex agricultural conditions. Vineyards pose significant challenges due to their dynamic nature, and they are increasingly drawing attention from both academic and industrial stakeholders interested in automation. In this context, we introduce the TEMPO-VINE dataset, a large-scale multi-temporal dataset specifically designed for evaluating sensor fusion, simultaneous localization and mapping (SLAM), and place recognition techniques within operational vineyard environments. TEMPO-VINE is the first multi-modal public dataset that brings together data from heterogeneous LiDARs of different price levels, AHRS, RTK-GPS, and cameras in real trellis and pergola vineyards, with multiple rows exceeding 100 m in length. In this work, we address a critical gap in the landscape of agricultural datasets by providing researchers with a comprehensive data collection and ground truth trajectories in different seasons, vegetation growth stages, terrain and weather conditions. The sequence paths with multiple runs and revisits will foster the development of sensor fusion, localization, mapping and place recognition solutions for agricultural fields. The dataset, the processing tools and the benchmarking results will be available at the dedicated webpage upon acceptance.
Embodied Co-Design for Rapidly Evolving Agents: Taxonomy, Frontiers, and Challenges
Brain-body co-evolution enables animals to develop complex behaviors in their environments. Inspired by this biological synergy, embodied co-design (ECD) has emerged as a transformative paradigm for creating intelligent agents-from virtual creatures to physical robots-by jointly optimizing their morphologies and controllers rather than treating control in isolation. This integrated approach facilitates richer environmental interactions and robust task performance. In this survey, we provide a systematic overview of recent advances in ECD. We first formalize the concept of ECD and position it within related fields. We then introduce a hierarchical taxonomy: a lower layer that breaks down agent design into three fundamental components-controlling brain, body morphology, and task environment-and an upper layer that integrates these components into four major ECD frameworks: bi-level, single-level, generative, and open-ended. This taxonomy allows us to synthesize insights from more than one hundred recent studies. We further review notable benchmarks, datasets, and applications in both simulated and real-world scenarios. Finally, we identify significant challenges and offer insights into promising future research directions. A project associated with this survey has been created at https://github.com/Yuxing-Wang-THU/SurveyBrainBody.
Neural Policy Composition from Free Energy Minimization
The ability to compose acquired skills to plan and execute behaviors is a hallmark of natural intelligence. Yet, despite remarkable cross-disciplinary efforts, a principled account of how task structure shapes gating and how such computations could be delivered in neural circuits remains elusive. Here we introduce GateMod, an interpretable, theoretically grounded computational model linking the emergence of gating to the underlying decision-making task and to a neural circuit architecture. We first develop GateFrame, a normative framework casting policy gating as minimization of the free energy. This framework, which relates gating rules to the task, applies broadly across the neuroscience, cognitive, and computational sciences. We then derive GateFlow, a continuous-time, energy-based dynamics that provably converges to the GateFrame-optimal solution. Convergence, exponential and global, follows from a contractivity property that also yields robustness and other desirable properties. Finally, we derive a neural circuit, GateNet, from GateFlow. This is a soft-competitive recurrent circuit whose components perform local and contextual computations consistent with known dendritic and neural processing motifs. We evaluate GateMod across two different settings: collective behaviors in multi-agent systems and human decision-making in multi-armed bandits. In both settings, GateMod provides interpretable mechanistic explanations of gating and quantitatively matches or outperforms established models. GateMod offers a unifying framework for neural policy gating, linking task objectives, dynamical computation, and circuit-level mechanisms. It provides a framework to understand gating in natural agents beyond current explanations and to equip machines with this ability.
Timely Information for Strategic Persuasion
This work investigates a dynamic variant of Bayesian persuasion, in which a strategic sender seeks to influence a receiver's belief over time by controlling the timing of information disclosure, under resource constraints. We consider a binary information source (i.e., taking values 0 or 1), where the source's state evolves according to a continuous-time Markov chain (CTMC). In this setting, the receiver aims to estimate the source's state as accurately as possible. In contrast, the sender seeks to persuade the receiver to estimate the state to be 1, regardless of whether this estimate reflects the true state. This misalignment between their objectives naturally leads to a Stackelberg game formulation where the sender, acting as the leader, chooses an information-revelation policy, and the receiver, as the follower, decides whether to follow the sender's messages. As a result, the sender's objective is to maximize the long-term average time that the receiver's estimate equals 1, subject to a total sampling constraint and an incentive compatibility (IC) constraint that requires the receiver to follow the sender's messages. We first consider the single-source problem and show that the sender's optimal policy is to allocate a minimal sampling rate to the undesired state 0 (just enough to satisfy the IC constraint) and assign the remaining sampling rate to the desired state 1. Next, we extend the analysis to the multi-source case, where each source has a different minimal sampling rate. Our results show that the sender can leverage the timeliness of the revealed information to influence the receiver, thereby achieving a higher utility.
A Unified Low-rank ADI Framework with Shared Linear Solves for Simultaneously Solving Multiple Lyapunov, Sylvester, and Riccati Equations
It is known in the literature that the low-rank ADI method for Lyapunov equations is a Petrov-Galerkin projection algorithm that implicitly performs model order reduction. In this paper, we show that the low-rank ADI methods for Sylvester and Riccati equations are also Petrov-Galerkin projection algorithms that implicitly perform model order reduction. By observing that the ADI methods for Lyapunov, Sylvester, and Riccati equations differ only in pole placement and not in their interpolatory nature, we show that the shifted linear solves-which constitute the bulk of the computational cost-can be shared. The pole-placement step involves only small-scale operations and is therefore inexpensive. We propose a unified ADI framework that requires only two shifted linear solves per iteration to simultaneously solve six Lyapunov equations, one Sylvester equation, and ten Riccati equations, thus substantially increasing the return on investment for the computational cost spent on the linear solves. All operations needed to extract the individual solutions from these shared linear solves are small-scale and inexpensive. Since all ADI methods implicitly perform model order reduction when solving these linear matrix equations, we show that the resulting reduced-order models can be obtained as an additional byproduct. These models not only interpolate the original transfer function at the mirror images of the ADI shifts but also preserve important system properties such as stability, minimum-phase property, positive-realness, bounded-realness, and passivity. Consequently, the proposed unified ADI framework also serves as a recursive, interpolation-based model order reduction method, which can preserve several important properties of the original model in the reduced-order model.
Auto-Optimization with Active Learning in Uncertain Environment: A Predictive Control Approach
This paper presents an auto-optimal model predictive control (MPC) framework enhanced with active learning, designed to autonomously track optimal operational conditions in an unknown environment, where the conditions may dynamically adjust to environmental changes. First, an exploitation-oriented MPC (EO-MPC) is proposed, integrating real-time sampling data with robust set-based parameter estimation techniques to address the critical challenge of parameter identification. By introducing virtual excitation signals into the terminal constraint and establishing a validation mechanism for the persistent excitation condition, the EO-MPC effectively resolves the issue of insufficient persistent excitation in parameter identification. Building upon this foundation, an active learning MPC (AL-MPC) approach is developed to integrate both available and virtual future data to resolve the fundamental conflict between tracking an unknown optimal operational condition and parameter identification. The recursive feasibility and convergence of the proposed methods are rigorously established, and numerous examples substantiate the reliability and effectiveness of the approach in practical applications.
Strategies for zero boil-off liquid hydrogen transfer: an export terminal case-study
To ensure economic viability, LH2 export terminals must minimize boil-off losses. We show two strategies to achieve zero boil-off losses for the transfer of 160 000 m3 LH2 (11 248 tons) using a centrifugal pump. In the first strategy, a pump with variable speed drive (VSD) and split-range control for the flow rate achieves losses from 0 wt% to 0.24 wt% in an uncertainty analysis. A pump efficiency approaching 70% is the most important factor to minimize losses. In contrast, a fixed-speed pump has unacceptably high losses ranging from 0.76 wt% to 1.06 wt% (119 tons per ship). The second strategy is to increase the maximum pressure in the seaborne tank (base case is 1.15 bara). Zero loss is achieved for the fixed speed pump if the maximum pressure is increased to 1.35 bara, while 1.22 bara is required for the pump with VSD assuming an efficiency of 60%.
comment: 9 pages, 9 figures
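As a quick arithmetic check on the figures quoted above, the 119 tons per ship follows directly from applying the worst-case fixed-speed loss fraction to the transferred mass stated in the abstract:

```latex
% Worst-case fixed-speed-pump loss, using only numbers from the abstract
m_{\text{loss}} \approx 0.0106 \times 11\,248\ \text{t} \approx 119\ \text{t per ship}
```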
Gauss-Newton accelerated MPPI Control
Model Predictive Path Integral (MPPI) control is a sampling-based optimization method that has recently attracted attention, particularly in the robotics and reinforcement learning communities. MPPI has been widely applied as a GPU-accelerated random search method to deterministic direct single-shooting optimal control problems arising in model predictive control (MPC) formulations. MPPI offers several key advantages, including flexibility, robustness, ease of implementation, and inherent parallelizability. However, its performance can deteriorate in high-dimensional settings since the optimal control problem is solved via Monte Carlo sampling. To address this limitation, this paper proposes an enhanced MPPI method that incorporates a Jacobian reconstruction technique and the second-order Generalized Gauss-Newton method. This novel approach is called \textit{Gauss-Newton accelerated MPPI}. The numerical results show that the Gauss-Newton accelerated MPPI approach substantially improves MPPI scalability and computational efficiency while preserving the key benefits of the classical MPPI framework, making it a promising approach even for high-dimensional problems.
comment: 6 pages, 3 figures, submitted to the IFAC World Congress 2026
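For readers unfamiliar with the baseline being accelerated, the following is a minimal sketch of a classical MPPI update for a generic discrete-time system. The dynamics, cost, and hyperparameters are placeholders, and the Gauss-Newton acceleration proposed in the paper is not reproduced here.

```python
import numpy as np

def mppi_step(x0, u_nom, dynamics, stage_cost, n_samples=256, sigma=0.5, lam=1.0):
    """One classical MPPI update: sample perturbed control sequences,
    roll them out, and return the cost-weighted average sequence."""
    horizon, n_u = u_nom.shape
    noise = sigma * np.random.randn(n_samples, horizon, n_u)
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0.copy()
        for t in range(horizon):
            u = u_nom[t] + noise[k, t]
            costs[k] += stage_cost(x, u)
            x = dynamics(x, u)
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return u_nom + np.einsum("k,kto->to", weights, noise)

# Toy example: double integrator driven toward the origin (placeholder model).
dyn = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])
cost = lambda x, u: x @ x + 0.01 * u @ u
u_plan = mppi_step(np.array([1.0, 0.0]), np.zeros((20, 1)), dyn, cost)
```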
Adapt and Stabilize, Then Learn and Optimize: A New Approach to Adaptive LQR
This paper focuses on adaptive control of the discrete-time linear quadratic regulator (adaptive LQR). Recent literature has made significant contributions in proving non-asymptotic convergence rates, but existing approaches have a few drawbacks that pose barriers for practical implementation. These drawbacks include (i) a requirement of an initial stabilizing controller, (ii) a reliance on exploration for closed-loop stability, and/or (iii) computationally intensive algorithms. This paper proposes a new algorithm that overcomes these drawbacks for a particular class of discrete-time systems. This algorithm leverages direct Model-Reference Adaptive Control (direct MRAC) and combines it with an epoch-based approach in order to address the drawbacks (i)-(iii) with a provable high-probability regret bound comparable to existing literature. Simulations demonstrate that the proposed approach yields regrets that are comparable to those from existing methods when the conditions (i) and (ii) are met, and yields regrets that are significantly smaller when either of these two conditions is not met.
Omniscient Attacker in Stochastic Security Games with Interdependent Nodes
The adoption of reinforcement learning for critical infrastructure defense introduces a vulnerability where sophisticated attackers can strategically exploit the defense algorithm's learning dynamics. While prior work addresses this vulnerability in the context of repeated normal-form games, its extension to stochastic games remains an open research gap. We close this gap by examining stochastic security games between an RL defender and an omniscient attacker, utilizing a tractable linear influence network model. To overcome the structural limitations of prior methods, we propose and apply neuro-dynamic programming. Our experimental results demonstrate that the omniscient attacker can significantly outperform a naive defender, highlighting the critical vulnerability introduced by the learning dynamics and the effectiveness of the proposed strategy.
Efficient Safety Verification of Autonomous Vehicles with Neural Network Operator
When autonomous vehicles encounter untrained scenarios, ensuring safety hinges on effective safety verification to prevent accidents stemming from unexpected model decisions. Reachability analysis, a method of safety verification, offers relatively high precision but at the cost of significant computational complexity. Our method leverages end-to-end neural network operators to compute reachable sets, replacing traditional mathematical set operators, thereby achieving higher efficiency in safety verification without substantially compromising accuracy or increasing conservativeness. We define vehicle dynamics on discrete time series and detail the safety verification process and safety standard based on reachable sets. Experimental evaluations conducted in several typical road driving scenarios demonstrate the superior efficiency of our proposed operator over classical methods.
An accelerated proximal bundle method for convex optimization
The proximal bundle method (PBM) is a powerful and widely used approach for minimizing nonsmooth convex functions. However, for smooth objectives, its best-known convergence rate remains suboptimal, and whether PBM can be accelerated remains open. In this work, we present the first accelerated proximal bundle method that achieves the optimal $\mathscr{O}(1/\sqrt{\varepsilon})$ iteration complexity for obtaining an $\varepsilon$-accurate solution in smooth convex optimization. The proposed method is conceptually simple: it differs from Nesterov's accelerated gradient descent by only a single line and retains all key structural properties of the classical PBM. In particular, it relies on the same minimal assumptions on model approximations and preserves the standard bundle testing criterion. Numerical experiments confirm the accelerated $\mathscr{O}(1/\sqrt{\varepsilon})$ convergence rate predicted by our theory.
comment: 21 pages, 1 figure
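Since the abstract positions the method relative to Nesterov's accelerated gradient descent, a minimal sketch of that classical baseline is shown below for reference; the paper's bundle-based variant and its single-line modification are not reproduced here.

```python
import numpy as np

def nesterov_agd(grad, x0, L, n_iter=100):
    """Classical Nesterov accelerated gradient descent for an L-smooth
    convex objective (reference baseline, not the paper's bundle method)."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_next = y - grad(y) / L                         # gradient step from the extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t**2)) / 2
        y = x_next + (t - 1) / t_next * (x_next - x)     # momentum extrapolation
        x, t = x_next, t_next
    return x

# Example: minimize the convex quadratic 0.5 * x' A x with A positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x_star = nesterov_agd(lambda x: A @ x, np.array([5.0, -3.0]), L=np.linalg.eigvalsh(A).max())
```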
Adaptive Time-Domain Harmonic Control for Noise-Vibration-Harshness Reduction of Electric Drives
Reducing Noise, Vibration, and Harshness (NVH) in electric drives is crucial for applications such as electric vehicle drivetrains and heat-pump compressors, where strict NVH requirements directly affect user satisfaction and component longevity. This work presents the integration of an adaptive time-domain harmonic controller into an existing electric-drive control loop to attenuate harmonic disturbances. Three control structures are proposed and analyzed, along with a modified parameter-estimation scheme that reduces computational effort while preserving estimation accuracy, making the method suitable for embedded real-time implementation. To cope with fast operating-point changes, a delta-learning approach combines adaptive control with a lookup-table-based feedforward estimator, ensuring fast convergence and robustness. The proposed controller architectures are validated through simulation and testbench experiments on a permanent-magnet synchronous machine drive, demonstrating substantial NVH reductions across operating conditions. The results confirm that time-domain adaptive harmonic control offers a practical and theoretically grounded solution for real-time NVH mitigation in electric drives.
AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Substitutions
In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or predictive models that merely mimic historical biases. This paper introduces a Fuzzy Logic-based Decision Support System (DSS) designed for real-time, prescriptive game management. Unlike traditional Machine Learning approaches that encounter a predictive ceiling by attempting to replicate human behavior, our system audits performance through an objective, rule-based inference engine. We propose a methodological advancement by reformulating the PlayeRank metric into a Cumulative Mean with Role-Aware Normalization, eliminating the play-time exposure bias inherent in cumulative-sum models to enable accurate intra-match comparison. The system integrates this refined metric with physiological proxies (fatigue) and contextual variables (disciplinary risk modulated by tactical role) to calculate a dynamic Substitution Priority ($P_{final}$). Validation via a case study of the 2018 FIFA World Cup match between Brazil and Belgium demonstrates the system's ecological validity: it not only aligned with expert consensus on executed substitutions (for example, Gabriel Jesus) but, crucially, identified high-risk scenarios ignored by human decision makers. Specifically, the model flagged the "FAGNER Paradox", a maximum-priority defensive risk, minutes before a critical yellow card, and detected the "Lukaku Paradox", where an isolated assist masked a severe drop in participation. These results confirm that Fuzzy Logic offers a transparent, explainable, and superior alternative to black-box models for optimizing real-time tactical decisions.
comment: 33 pages, 7 figures
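The abstract does not give the exact membership functions or rule base, so the following is only a minimal, hypothetical sketch of how a Mamdani-style fuzzy rule could combine performance, fatigue, and disciplinary risk into a substitution priority; all shapes, thresholds, and variable names are illustrative assumptions, not the paper's model.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def substitution_priority(performance, fatigue, card_risk):
    """Hypothetical min-rule inference: priority is high when performance is
    low, or when fatigue and disciplinary risk are simultaneously high."""
    low_perf  = tri(performance, 0.0, 0.0, 0.5)   # illustrative shapes, not the paper's
    high_fat  = tri(fatigue,     0.5, 1.0, 1.0)
    high_risk = tri(card_risk,   0.5, 1.0, 1.0)
    rule1 = low_perf                              # "poor recent performance"
    rule2 = min(high_fat, high_risk)              # "tired AND one booking from red"
    return max(rule1, rule2)                      # aggregated priority in [0, 1]

print(substitution_priority(performance=0.3, fatigue=0.8, card_risk=0.9))
```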
Optimizing DER Aggregate Flexibility via Network Reconfiguration
The aggregate flexibility region of distributed energy resources (DERs) quantifies the aggregate power shaping capabilities of DERs. It characterizes the distribution network's potential for wholesale market participation and grid service provision at the transmission level. To enhance flexibility and fully exploit the potential of DERs, this paper proposes a method to optimize the aggregate flexibility region through distribution network reconfiguration. First, we formulate the ellipsoidal aggregate flexibility region characterization problem as a two-stage adaptive robust optimization problem and derive an exact convex reformulation with a large number of second-order cone constraints. By exploiting the problem structure, we propose a scalable Benders decomposition algorithm with provable finite convergence to the optimal solution. Finally, we propose an optimal reconfiguration problem for aggregate flexibility region optimization and solve it using the custom Benders decomposition. Numerical simulations on the IEEE 123-bus test feeder demonstrate that, compared to existing approaches, substantial improvements in the aggregate flexibility region can be achieved over multiple scenarios with the optimized topology.
comment: 10 pages, 4 figures, 5 tables
Vision-Language-Action Models for Selective Robotic Disassembly: A Case Study on Critical Component Extraction from Desktops
Automating disassembly of critical components from end-of-life (EoL) desktops, such as high-value items like RAM modules and CPUs, as well as sensitive parts like hard disk drives, remains challenging due to the inherent variability and uncertainty of these products. Moreover, their disassembly requires sequential, precise, and dexterous operations, further increasing the complexity of automation. Current robotic disassembly processes are typically divided into several stages: perception, sequence planning, task planning, motion planning, and manipulation. Each stage requires explicit modeling, which limits generalization to unfamiliar scenarios. Recent development of vision-language-action (VLA) models has presented an end-to-end approach for general robotic manipulation tasks. Although VLAs have demonstrated promising performance on simple tasks, the feasibility of applying such models to complex disassembly remains largely unexplored. In this paper, we collected a customized dataset for robotic RAM and CPU disassembly and used it to fine-tune two well-established VLA approaches, OpenVLA and OpenVLA-OFT, as a case study. We divided the whole disassembly task into several small steps, and our preliminary experimental results indicate that the fine-tuned VLA models can faithfully complete multiple early steps but struggle with certain critical subtasks, leading to task failure. However, we observed that a simple hybrid strategy that combines VLA with a rule-based controller can successfully perform the entire disassembly operation. These findings highlight the current limitations of VLA models in handling the dexterity and precision required for robotic EoL product disassembly. By offering a detailed analysis of the observed results, this study provides insights that may inform future research to address current challenges and advance end-to-end robotic automated disassembly.
Quantum-Accelerated Deep Reinforcement Learning for Frequency Regulation Enhancement
In modern power systems, frequency regulation is a fundamental prerequisite for ensuring system reliability and assessing the robustness of expansion projects. Conventional feedback control schemes, however, exhibit limited accuracy under varying operating conditions because their gains remain static. Consequently, deep reinforcement learning methods are increasingly employed to design adaptive controllers that can be generalized to diverse frequency control tasks. At the same time, recent advances in quantum computing provide avenues for embedding quantum capabilities into such critical applications. In particular, the potential of quantum algorithms can be more effectively explored and harnessed on near-term quantum devices by leveraging insights from active controller design. In this work, we incorporate a quantum circuit together with an ansatz into the operation of a deep deterministic policy gradient agent. Simulation results on the IEEE 14-bus test system demonstrate the potential of this integrated approach to achieve reliable, robust performance across diverse real-world challenges.
Distributed Articulation Point Identification in Time-Varying Undirected Networks
Identifying articulation points (APs) is fundamental to assessing the robustness of time-varying networks. In such dynamic environments, topological changes including edge additions and deletions can instantly alter the set of APs, demanding rapid and efficient re-assessment. This paper proposes a fully distributed algorithm for identifying APs and monitoring biconnectivity. Our core contribution is an incremental update protocol. Unlike static methods that require global re-initialization which incurs high communication overhead, our algorithm propagates information from the site of the change, updating only the affected nodes' state values. This approach, which builds upon a maximum consensus protocol, not only ensures convergence to the correct AP set following topological changes but also preserves network privacy by preventing nodes from reconstructing the global topology. We provide rigorous proofs of correctness for this eventual convergence and demonstrate its applicability and efficiency through experiments.
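Since the abstract states that the protocol builds upon a maximum consensus scheme, a minimal sketch of generic synchronous max-consensus over an undirected graph is shown below for orientation; the paper's actual AP-identification protocol exchanges richer state and handles incremental topology changes, which are not reproduced here.

```python
import numpy as np

def max_consensus(values, neighbors, n_rounds):
    """Synchronous max-consensus: at every round each node replaces its state
    with the maximum over itself and its neighbors. On a connected undirected
    graph, all states equal the global maximum after at most diameter rounds."""
    state = dict(values)
    for _ in range(n_rounds):
        state = {i: max([state[i]] + [state[j] for j in neighbors[i]]) for i in state}
    return state

# Path graph 0-1-2-3 with arbitrary initial values (illustrative only).
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(max_consensus({0: 0.0, 1: 3.5, 2: 1.0, 3: 2.0}, nbrs, n_rounds=3))
```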
Probabilistic Dynamic Line Rating with Line Graph Convolutional LSTM
Dynamic line rating (DLR) is an effective approach to enhancing the utilization of existing transmission line infrastructure by adapting line ratings according to real-time weather conditions. Accurate DLR forecasts are essential for grid operators to effectively schedule generation, manage transmission congestion, and lower operating costs. As renewable generation becomes increasingly variable and weather-dependent, accurate DLR forecasts are also crucial for improving renewable utilization and reducing curtailment during congested periods. Deterministic forecasts, however, often inadequately represent actual line capacities under uncertain weather conditions, leading to operational risks and costly real-time adjustments. To overcome these limitations, we propose a novel network-wide probabilistic DLR forecasting model that leverages both spatial and temporal information, significantly reducing the operational risks and inefficiencies inherent in deterministic methods. Case studies on a synthetic Texas 123-bus system demonstrate that the proposed method not only enhances grid reliability by effectively capturing true DLR values, but also substantially reduces operational costs.
comment: 10 pages, 8 figures
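The abstract does not state the exact probabilistic training objective, but one common way to obtain probabilistic forecasts from a recurrent model is to train it against the quantile (pinball) loss; the sketch below shows only that generic loss, with the network itself omitted, and all numbers are made up.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile (pinball) loss: penalizes under-prediction with weight q and
    over-prediction with weight (1 - q); minimized by the q-th conditional quantile."""
    err = y_true - y_pred
    return np.mean(np.maximum(q * err, (q - 1) * err))

# Example: score a hypothetical 90th-percentile line-rating forecast (arbitrary values, in amperes).
observed = np.array([820.0, 790.0, 860.0])
forecast = np.array([900.0, 810.0, 840.0])
print(pinball_loss(observed, forecast, q=0.9))
```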
ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety
Vulnerable road users (VRUs) face high collision risks in mixed traffic, yet most existing safety systems prioritize driver or vehicle assistance over direct VRU support. This paper presents ARCAS, a real-time augmented reality collision avoidance system that provides personalized spatial alerts to VRUs via wearable AR headsets. By fusing roadside 360-degree 3D LiDAR with SLAM-based headset tracking and an automatic 3D calibration procedure, ARCAS accurately overlays world-locked 3D bounding boxes and directional arrows onto approaching hazards in the user's passthrough view. The system also enables multi-headset coordination through shared world anchoring. Evaluated in real-world pedestrian interactions with e-scooters and vehicles (180 trials), ARCAS nearly doubled pedestrians' time-to-collision and increased counterparts' reaction margins by up to 4x compared to unaided-eye conditions. Results validate the feasibility and effectiveness of LiDAR-driven AR guidance and highlight the potential of wearable AR as a promising next-generation safety tool for urban mobility.
comment: 8 pages, 3 figures, 1 table
Disturbance Compensation for Safe Kinematic Control of Robotic Systems with Closed Architecture
In commercial robotic systems, it is common to encounter a closed inner-loop torque controller that is not user-modifiable. However, the outer-loop controller, which sends kinematic commands such as position or velocity for the inner-loop controller to track, is typically exposed to users. In this work, we focus on the development of an easily integrated add-on at the outer-loop layer by combining disturbance rejection control and robust control barrier function for high-performance tracking and safe control of the whole dynamic system of an industrial manipulator. This is particularly beneficial when 1) the inner-loop controller is imperfect, unmodifiable, and uncertain; and 2) the dynamic model exhibits significant uncertainty. Stability analysis, formal safety guarantee proof, and hardware experiments with a PUMA robotic manipulator are presented. Our solution demonstrates superior performance in terms of simplicity of implementation, robustness, tracking precision, and safety compared to the state of the art. Video: https://youtu.be/zw1tanvrV8Q
XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots
As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While substantial progress has been devoted to human behavior prediction, limited attention has been paid to how humans perceive, interpret, and trust robots' inferences, impeding deployment in safety-critical and socially embedded environments. This paper presents XR-DT, an eXtended Reality-enhanced Digital Twin framework for agentic mobile robots, that bridges physical and virtual spaces to enable bi-directional understanding between humans and robots. Our hierarchical XR-DT architecture integrates virtual-, augmented-, and mixed-reality layers, fusing real-time sensor data, simulated environments in the Unity game engine, and human feedback captured through wearable AR devices. Within this framework, we design an agentic mobile robot system with a unified diffusion policy for context-aware task adaptation. We further propose a chain-of-thought prompting mechanism that allows multimodal large language models to reason over human instructions and environmental context, while leveraging an AutoGen-based multi-agent coordination layer to enhance robustness and collaboration in dynamic tasks. Initial experimental results demonstrate accurate human and robot trajectory prediction, validating the XR-DT framework's effectiveness in HRI tasks. By embedding human intention, environmental dynamics, and robot cognition into the XR-DT framework, our system enables interpretable, trustworthy, and adaptive HRI.
comment: 10 pages, 5 figures
Using Dynamic Safety Margins as Control Barrier Functions
This paper presents an approach to design control barrier functions (CBFs) for arbitrary state and input constraints using tools from the reference governor literature. In particular, it is shown that dynamic safety margins (DSMs) are CBFs for an augmented system obtained by concatenating the state with a virtual reference. The proposed approach is agnostic to the relative degree and can handle multiple state and input constraints using the control-sharing property of CBFs. The construction of CBFs using Lyapunov-based DSMs is then investigated in further detail. Numerical simulations show that the method outperforms existing DSM-based approaches, while also guaranteeing safety and persistent feasibility of the associated optimization program.
comment: 12 pages, 5 figures, 2 tables
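As background for how a CBF is typically enforced once it is available (whether constructed from a dynamic safety margin, as proposed here, or otherwise), the following is a minimal sketch of the standard CBF quadratic-program safety filter for a control-affine system; the single-integrator model, barrier, and gains are placeholder assumptions rather than the paper's construction.

```python
import numpy as np

def cbf_safety_filter(x, u_des, r_max=1.0, gamma=2.0):
    """Minimal CBF-QP safety filter for a single integrator x_dot = u with
    barrier h(x) = r_max**2 - ||x||**2 (keep the state inside a disk).
    The QP min ||u - u_des||^2 s.t. grad(h)'u + gamma*h >= 0 has a single
    halfspace constraint, so its solution is a closed-form projection."""
    h = r_max**2 - x @ x
    a = -2.0 * x                 # gradient of h
    b = gamma * h
    if a @ u_des + b >= 0.0:     # desired input is already safe
        return u_des
    return u_des - (a @ u_des + b) / (a @ a) * a   # project onto the constraint boundary

# Example: a nominal controller pushing outward gets bent back toward the safe set.
print(cbf_safety_filter(x=np.array([0.9, 0.0]), u_des=np.array([1.0, 0.0])))
```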
Analytical Phasor-Based Fault Location Enhancement for Wind Farm Collector Networks
The increasing integration of Inverter-Based Resources (IBRs) is reshaping fault current characteristics, presenting significant challenges to traditional protection and fault location methods. This paper addresses a key limitation in fault location within wind farm collector networks, i.e., one-terminal phasor-based methods become inaccurate when IBRs are electrically located downstream from the fault. In such cases, the voltage drop caused by IBR fault current injections is not captured by the Intelligent Electronic Device, resulting in a systematic overestimation of fault distance. To mitigate this issue, a general compensation framework was proposed by augmenting classical loop formulations with a distance-dependent voltage correction term. The methodology was derived analytically using a sequence-domain representation and generalized to multiple fault types through a unified notation. It maintains the simplicity and interpretability of conventional approaches and can be implemented using only local measurements. The method was evaluated through EMT simulations in PSCAD using a realistic wind farm model. Results show significant improvements in location accuracy, with average and maximum errors notably reduced, especially for ground-involved faults where reductions exceed 90\%. Furthermore, the compensation eliminates sensitivity to wind penetration levels and ensures uniform performance across feeders, positioning the method as a practical solution for modern renewable-dominated grids.
An Accelerated Distributed Optimization with Equality and Inequality Coupling Constraints
This paper studies distributed convex optimization with both affine equality and nonlinear inequality couplings through duality analysis. We first formulate the dual of the coupling-constraint problem and reformulate it as a consensus optimization problem over a connected network. To efficiently solve this dual problem and hence the primal problem, we design an accelerated linearized algorithm in which, at each round, a look-ahead linearization of the separable objective is combined with a quadratic penalty on the Laplacian constraint, a proximal step, and an aggregation of iterates. On the theory side, we prove non-ergodic rates for both the primal optimality error and the feasibility error. On the practical side, numerical experiments show a faster decrease of the optimality error and feasibility residual than augmented-Lagrangian tracking and distributed subgradient baselines under the same communication budget.
Approximation-free Control for Signal Temporal Logic Specifications using Spatiotemporal Tubes
This paper presents a spatiotemporal tube (STT)-based control framework for satisfying Signal Temporal Logic (STL) specifications in unknown control-affine systems. We formulate STL constraints as a robust optimization problem (ROP) and recast it as a scenario optimization program (SOP) to construct STTs with formal correctness guarantees. We also propose a closed-form control law that operates independently of the system dynamics, and ensures the system trajectory evolves within the STTs, thereby satisfying the STL specifications. The proposed approach is validated through case studies and comparisons with state-of-the-art methods, demonstrating superior computational efficiency, trajectory quality, and applicability to complex STL tasks.
comment: This paper is a revised version of the article published in IEEE Control Systems Letters (L-CSS). This version corrects the errors in Theorem 3.2 and the proof of Theorem 3.3, that was present in the initial submitted manuscript
An Information Theory of Finite Abstractions and their Fundamental Scalability Limits
Finite abstractions are discrete approximations of dynamical systems, such that the set of abstraction trajectories contains, in a formal sense, all system trajectories. There is a consensus that abstractions suffer from the curse of dimensionality: for the same ``accuracy" (how closely the abstraction represents the system), the abstraction size scales poorly with system dimensions. And, yet, after decades of research on abstractions, there are no formal results concerning their accuracy-size tradeoff. In this work, we derive a statistical, quantitative theory of abstractions' accuracy-size tradeoff and uncover fundamental limits on their scalability, through rate-distortion theory -- the branch of information theory studying lossy compression. Abstractions are viewed as encoder-decoder pairs, encoding trajectories of dynamical systems in a higher-dimensional ambient space. Rate represents abstraction size, while distortion describes abstraction accuracy, defined as the spatial average deviation between abstract trajectories and system ones. We obtain a fundamental lower bound on the minimum abstraction distortion, given the system dynamics and a threshold on abstraction size. The bound depends on the complexity of the dynamics, through generalized entropy. We demonstrate the bound's tightness on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system.
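For readers less familiar with the information-theoretic machinery invoked here, the classical rate-distortion function that such an analysis builds on has the standard form below; the paper's system-specific bound and its dependence on generalized entropy are not reproduced.

```latex
% Classical rate-distortion function for a source X, reproduction \hat{X},
% and distortion measure d (standard definition, not the paper's bound):
R(D) \;=\; \min_{\,p(\hat{x}\mid x)\,:\ \mathbb{E}[d(X,\hat{X})]\le D}\; I(X;\hat{X})
```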
Dual-Objective Reinforcement Learning with Novel Hamilton-Jacobi-Bellman Formulations
Hard constraints in reinforcement learning (RL) often degrade policy performance. Lagrangian methods offer a way to blend objectives with constraints, but require intricate reward engineering and parameter tuning. In this work, we extend recent advances that connect Hamilton-Jacobi (HJ) equations with RL to propose two novel value functions for dual-objective satisfaction. Namely, we address: 1) the Reach-Always-Avoid (RAA) problem -- of achieving distinct reward and penalty thresholds -- and 2) the Reach-Reach (RR) problem -- of achieving thresholds of two distinct rewards. In contrast with temporal logic approaches, which typically involve representing an automaton, we derive explicit, tractable Bellman forms in this context via decomposition. Specifically, we prove that the RAA and RR problems may be rewritten as compositions of previously studied HJ-RL problems. We leverage our analysis to propose a variation of Proximal Policy Optimization (DOHJ-PPO), and demonstrate that it produces distinct behaviors from previous approaches, outcompeting a number of baselines in success, safety and speed across a range of tasks for safe-arrival and multi-target achievement.
Joint Scheduling of Workload Demand and Energy Supply in Low-carbon Data Centers with Decision-Dependent Uncertainty Set
This paper addresses the joint scheduling problem of stochastic workloads and a hydrogen-enabled distributed energy system in a low-carbon Internet data center (IDC). Although such workloads can be shifted over temporal and spatial horizons, this poses challenges when they cannot be accurately predicted, resulting in significant efficiency degradation and high operational cost of the energy system. The problem becomes even more difficult when the workload-shifting decisions influence their randomness, which is natural for IDC workloads. To tackle these issues, we propose a workload classification model based on a decision-dependent uncertainty set, where the spatiotemporal elasticity of different types of random workloads is clearly identified and the decision dependencies are explicitly described as linear constraints. A mixed-integer program is then established for the optimal scheduling of both workload demand and energy supply. To enhance system resilience against high-volatility scenarios, a rolling horizon algorithm is developed to ensure nonanticipativity and full-scenario feasibility. Numerical tests demonstrate that the proposed method yields effective workload scheduling decisions with dramatic reductions in energy operational cost compared to the benchmarks under most of the uncertain scenarios.
comment: The upper and lower bounds of the uncertainty set relied on in decision-making need to be reconsidered
Principles of Physiological Closed-Loop Controllers in Neuromodulation
As neurostimulation devices increasingly incorporate closed-loop functionality, the greater design complexity brings additional requirements for risk management and special considerations to optimise benefit. This manuscript creates a common framework upon which all current and planned neuromodulation-based physiological closed-loop controllers (PCLCs) can be mapped including integration of the Technical Considerations of Medical Devices with Physiologic Closed-Loop Control Technology guidance published in 2023 by the United States Food and Drug Administration (FDA), a classification of feedback (reactive) and feedforward (predictive) biomarkers, and control systems theory. We explain risk management in the context of this framework and illustrate its applications for three exemplary technologies. This manuscript serves as guidance to the emerging field of PCLCs in neuromodulation, mitigating risk through standardized nomenclature and a systematic outline for rigorous device development, testing, and implementation.
comment: 36 pages, 10 figures
EmbedGenius: Towards Automated Software Development for Generic Embedded IoT Systems
Embedded IoT system development is crucial for enabling seamless connectivity and functionality across a wide range of applications. However, such a complex process requires cross-domain knowledge of hardware and software and hence often necessitates direct developer involvement, making it labor-intensive, time-consuming, and error-prone. To address this challenge, this paper introduces EmbedGenius, the first fully automated software development platform for general-purpose embedded IoT systems. The key idea is to leverage the reasoning ability of Large Language Models (LLMs) and embedded system expertise to automate the hardware-in-the-loop development process. The main methods include a component-aware library resolution method for addressing hardware dependencies, a library knowledge generation method that injects utility domain knowledge into LLMs, and an auto-programming method that ensures successful deployment. We evaluate EmbedGenius's performance across 71 modules and four mainstream embedded development platforms with over 350 IoT tasks. Experimental results show that EmbedGenius can generate codes with an accuracy of 95.7% and complete tasks with a success rate of 86.5%, surpassing human-in-the-loop baselines by 15.6%--37.7% and 25.5%--53.4%, respectively. We also show EmbedGenius's potential through case studies in environmental monitoring and remote control systems development.
Flight-Ready Precise and Robust Carrier-Phase GNSS Navigation Software for Distributed Space Systems
This paper presents the full requirements analysis, design, development, and testing of high-precision navigation flight software for Distributed Space Systems (DSS) using Carrier Phase Differential GNSS (CDGNSS). Five main contributions are made. First, a survey of flown and upcoming DSS missions with stringent precision requirements is conducted, from which a thorough requirements analysis is distilled to guide development and testing. Second, a real-time navigation functional architecture is designed, and adopts a sparse and regularized Consider Kalman Filter with options for numerical stability in-flight. The filter rigorously accounts for uncertainties in process noise, measurement noise, and biases. It tracks float ambiguities with integer resolution where possible. The covariance correlation structure is preserved under all navigation modes, including contingencies and outages. Third, a lightweight, memoryless Fault Detection, Isolation, and Recovery (FDIR) module is developed to guard against anomalous measurements, providing statistical screening and ensuring robust navigation. Fourth, the software architecture is proposed for ease of integration, with strategies presented for modularity and computational efficiency tailored to constrained flight systems. Fifth, a comprehensive test campaign is conducted, mapped to a requirements verification matrix, spanning unit, interface, software-in-the-loop, and real-time hardware-in-the-loop tests, emphasizing gradual test fidelity for efficient fault isolation. Finally, flight-like results are demonstrated using the VISORS mission, due to the generalizability of the VISORS navigation operations, and the stringency which demands sub-centimeter relative position and sub-millimeter-per-second velocity accuracy. This architecture aims to serve as a reference for next-generation DSS missions adopting CDGNSS.
Energy-Aware Lane Planning for Connected Electric Vehicles in Urban Traffic: Design and Vehicle-in-the-Loop Validation
Urban driving with connected and automated vehicles (CAVs) offers potential for energy savings, yet most eco-driving strategies focus solely on longitudinal speed control within a single lane. This neglects the significant impact of lateral decisions, such as lane changes, on overall energy efficiency, especially in environments with traffic signals and heterogeneous traffic flow. To address this gap, we propose a novel energy-aware motion planning framework that jointly optimizes longitudinal speed and lateral lane-change decisions using vehicle-to-infrastructure (V2I) communication. Our approach estimates long-term energy costs using a graph-based approximation and solves short-horizon optimal control problems under traffic constraints. Using a data-driven energy model calibrated to an actual battery electric vehicle, we demonstrate with vehicle-in-the-loop experiments that our method reduces motion energy consumption by up to 24 percent compared to a human driver, highlighting the potential of connectivity-enabled planning for sustainable urban autonomy.
comment: Accepted at 2025 IEEE Conference on Decision and Control (CDC25')
CaFTRA: Frequency-Domain Correlation-Aware Feedback-Free MIMO Transmission and Resource Allocation for 6G and Beyond
The fundamental design of wireless systems toward AI-native 6G and beyond is driven by the ever-increasing demand for mobile data traffic, extreme spectral efficiency, and adaptability across diverse service scenarios. To overcome the limitations posed by feedback-based multiple-input and multiple-output (MIMO) transmission, we propose a novel frequency-domain Correlation-aware Feedback-free MIMO Transmission and Resource Allocation (CaFTRA) framework tailored for fully-decoupled radio access networks (FD-RAN) to meet the emerging requirements of AI-Native 6G and beyond. By leveraging artificial intelligence (AI), CaFTRA effectively eliminates real-time uplink feedback by predicting channel state information (CSI) based solely on user geolocation. We introduce a Learnable Queries-driven Transformer Network for CSI mapping from user geolocation, which utilizes multi-head attention and learnable query embeddings to accurately capture frequency-domain correlations among resource blocks (RBs), thereby significantly improving the precision of CSI prediction. Once base stations (BSs) adopt feedback-free transmission, their downlink transmission coverage can be significantly expanded due to the elimination of frequent uplink feedback. To enable efficient resource scheduling under such extensive-coverage scenarios, we apply a low-complexity many-to-one matching theory-based algorithm for efficient multi-BS association and multi-RB resource allocation, which is proven to converge to a stable matching within a limited number of iterations. Simulation results demonstrate that CaFTRA achieves stable matching convergence and significant gains in spectral efficiency and user fairness compared to 5G, underscoring its potential value for 6G standardization efforts.
Digital requirements engineering with an INCOSE-derived SysML meta-model
Traditional requirements engineering tools do not readily access the SysML-defined system architecture model, often resulting in ad-hoc duplication of model elements that lacks the connectivity and expressive detail possible in a SysML-defined model. Further integration of requirements engineering activities with MBSE contributes to the Authoritative Source of Truth while facilitating deep access to system architecture model elements for V&V activities. We explore the application of MBSE to requirements engineering by extending the Model-Based Structured Requirement SysML Profile to comply with the INCOSE Guide to Writing Requirements while conforming to the ISO/IEC/IEEE 29148 standard requirement statement patterns. Rules, Characteristics, and Attributes were defined in SysML according to the Guide to facilitate requirements definition, verification & validation. The resulting SysML Profile was applied in two system architecture models at NASA Jet Propulsion Laboratory, allowing us to assess its applicability and value in real-world project environments. Initial results indicate that INCOSE-derived Model-Based Structured Requirements may rapidly improve requirement expression quality while complementing the NASA Systems Engineering Handbook checklist and guidance, but typical requirement management activities still have challenges related to automation and support in the system architecture modeling software.
comment: 24 pages; 11 figures; 3 tables; journal preprint
System-level Analysis of Dual-Mode Networked Sensing: ISAC Integration & Coordination Gains
This paper characterizes integration and coordination gains in dense millimeter-wave ISAC networks through a dual-mode framework that combines monostatic and multistatic sensing. A comprehensive system-level analysis is conducted, accounting for base station (BS) density, power allocation, antenna misalignment, radar cross-section (RCS) fluctuations, clutter, bistatic geometry, channel fading, and self-interference cancellation (SIC) efficiency. Using stochastic geometry, coverage probabilities and ergodic rates for sensing and communication are derived, revealing tradeoffs among BS density, beamwidth, and power allocation. It is shown that the communication performance sustained reliable operation despite the overlaid sensing functionality. In contrast, the results reveal the foundational role of spatial sensing diversity, driven by the dual-mode operation, to compensate for the weak sensing reflections and vulnerability to imperfect SIC along with interference and clutter. To this end, we identify a system transition from monostatic to multistatic-dominant sensing operation as a function of the SIC efficiency. In the latter case, using six multistatic BSs instead of a single bistatic receiver improved sensing coverage probability by over 100%, highlighting the coordination gain. Moreover, comparisons with pure communication networks confirm substantial integration gain. Specifically, dual-mode networked sensing with four cooperative BSs can double throughput, while multistatic sensing alone improves throughput by over 50%.
comment: This work has been accepted for publication in IEEE Transactions on Wireless Communications
VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution
3D volumetric video provides an immersive experience and is gaining traction in digital media. Despite its rising popularity, the streaming of volumetric video content poses significant challenges due to the high data bandwidth requirement. A natural approach to mitigate the bandwidth issue is to reduce the volumetric video's data rate by downsampling the content prior to transmission. The video can then be upsampled at the receiver's end using a super-resolution (SR) algorithm to reconstruct the high-resolution details. While super-resolution techniques have been extensively explored and advanced for 2D video content, there is limited work on SR algorithms tailored for volumetric videos. To address this gap and the growing need for efficient volumetric video streaming, we have developed VoLUT with a new SR algorithm specifically designed for volumetric content. Our algorithm uniquely harnesses the power of lookup tables (LUTs) to facilitate the efficient and accurate upscaling of low-resolution volumetric data. The use of LUTs enables our algorithm to quickly reference precomputed high-resolution values, thereby significantly reducing the computational complexity and time required for upscaling. We further apply an adaptive video bit rate (ABR) algorithm to dynamically determine the downsampling rate according to the network condition and stream the selected video rate to the receiver. Compared to related work, VoLUT is the first to enable high-quality 3D SR on commodity mobile devices at line-rate. Our evaluation shows VoLUT can reduce bandwidth usage by 70%, boost QoE by 36.7% for volumetric video streaming, and achieve 3D SR speed-up with no quality compromise.
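VoLUT's actual lookup tables are tailored to volumetric content; purely as a structural illustration of the LUT idea (precompute once, index at streaming time instead of running per-sample inference), the toy sketch below 2x-upsamples a 1D signal by looking up interpolation weights from a quantized local-pattern index. Everything in it, including the hand-set weights, is an assumption for illustration only.

```python
import numpy as np

# Build a tiny LUT offline: for each quantized local-gradient bin, store the
# blending weights used to synthesize one new sample between two neighbors.
# (Weights are hand-set here; in a learned LUT they would come from training data.)
N_BINS = 16
lut = np.tile(np.array([0.5, 0.5]), (N_BINS, 1))   # default: plain midpoint
lut[:N_BINS // 2] = np.array([0.65, 0.35])         # illustrative bias for "falling" patterns

def lut_upsample_1d(signal):
    """2x upsampling by table lookup: quantize each neighbor pair's difference
    into a bin index, fetch precomputed weights, and blend; no per-sample
    optimization or network inference is needed at streaming time."""
    diffs = np.diff(signal)
    scale = np.abs(diffs).max() + 1e-9
    bins = np.clip(((diffs / scale + 1.0) * 0.5 * (N_BINS - 1)).astype(int), 0, N_BINS - 1)
    w = lut[bins]                                   # (len-1, 2) weights, one pair per gap
    mids = w[:, 0] * signal[:-1] + w[:, 1] * signal[1:]
    out = np.empty(2 * len(signal) - 1)
    out[0::2] = signal
    out[1::2] = mids
    return out

print(lut_upsample_1d(np.array([0.0, 1.0, 0.5, 0.8])))
```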
Baseline-improved Economic Model Predictive Control for Optimal Microgrid Dispatch
As opposed to stabilizing to a reference trajectory or state, Economic Model Predictive Control (EMPC) optimizes economic performance over a prediction horizon, making it particularly attractive for economic microgrid (MG) dispatch. However, as load and generation forecasts are only known 24-48 h in advance, economically optimal steady states or periodic trajectories are not available and the EMPC-based works that rely on these signals are inadequate. In addition, demand charges, based on maximum monthly grid import power of the MG, cannot be easily cast as an additive cost, which prevents the application of the principle of optimality if introduced naively. In this work, we propose to close this mismatch between the EMPC prediction horizon and existing monthly timescales by means of an appropriately generated baseline reference trajectory. To do this, we first propose an EMPC formulation for a generic deterministic discrete non-linear time-varying system subject to hard state and input constraints. We then show that, under mild assumptions on the terminal cost and region, the asymptotic average economic cost of the proposed method is no worse than a baseline given by any arbitrary reference trajectory that is only known online. In particular, this results in a practical, finite-time upper bound on the average economic cost difference with the baseline that decreases linearly to zero as time goes to infinity. We then show how the proposed EMPC framework can be used to solve optimal MG dispatch problems, introducing various costs and constraints that conform to the required assumptions. By means of this framework, we conduct realistic simulations with data from the Port of San Diego MG, which demonstrate that the proposed method can reduce monthly electricity costs in closed loop with respect to established baseline reference trajectories.
comment: 18 pages, 4 tables, Manuscript submitted to Automatica
A Multi-Objective Simultaneous Routing, Facility Location and Allocation Model for Earthquake Emergency Logistics
Emergency preparedness reduces the severity and impact of major disasters. In the case of earthquakes, a rapid and efficient emergency response is essential to reduce the number of fatalities. Therefore, the design and planning of an adequate emergency transportation network are crucial in earthquake-prone locations. In the context of emergency transportation modeling, the aim of emergency routing is to find the network with the minimum length that can provide access between the maximum number of Emergency Response Centers (ERCs) and damaged areas. Meanwhile, the goal of the facility location and allocation problem is to optimize the placement of temporary hospitals to increase coverage and accessibility, particularly in remote or severely impacted areas. This paper proposes a multi-objective, robust, multi-modal, and multi-time-period optimization problem that simultaneously optimizes routing, facility location, and hospital allocation. The objective function is to minimize unmet commodity demand, unserved injuries, and economic costs. We adopt a fuzzy goal programming approach to solve the multi-objective simultaneous routing, facility location, and allocation model.
comment: One of the authors does not agree to publish the paper on arXiv
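The abstract adopts fuzzy goal programming without detailing the formulation; a common max-min variant turns each objective into a linear membership function and maximizes the smallest membership, which reduces to a linear program. The sketch below shows that generic reduction on made-up numbers, not the paper's emergency-logistics model.

```python
import numpy as np
from scipy.optimize import linprog

# Max-min fuzzy goal programming on a toy 2-variable allocation problem
# (all coefficients are invented for illustration; not the paper's model).
c_cost, g_cost, U_cost = np.array([3.0, 1.0]), 10.0, 30.0   # goal 1: cost (aspiration g, tolerance U)
c_time, g_time, U_time = np.array([1.0, 4.0]), 10.0, 40.0   # goal 2: response time

# Decision vector z = [x1, x2, lam]; maximize lam  <=>  minimize -lam.
obj = np.array([0.0, 0.0, -1.0])
A_ub = np.array([
    [*c_cost, U_cost - g_cost],   # cost.x + (U - g)*lam <= U  (lam <= membership of goal 1)
    [*c_time, U_time - g_time],   # same for goal 2
    [-1.0, -1.0, 0.0],            # coverage constraint x1 + x2 >= 10
])
b_ub = np.array([U_cost, U_time, -10.0])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None), (0, 1)], method="highs")
x1, x2, lam = res.x
print(f"allocation=({x1:.2f}, {x2:.2f}), worst satisfied membership={lam:.2f}")
```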
Design and Analysis of a Robust Control System for Triple Inverted Pendulum Stabilization
The design of robust controllers for triple inverted pendulum systems presents significant challenges due to their inherent instability and nonlinear dynamics. Uncertainties in system parameters further complicate the control design. This paper investigates a robust control strategy for triple inverted pendulums under parameter uncertainty. Two control approaches, namely the $H_\infty$ controller and the $\mu$-synthesis controller, are compared in terms of their ability to achieve reference tracking and disturbance rejection. Simulation results demonstrate that the $H_\infty$ controller provides superior transient performance, making it a promising solution for the robust stabilization of such complex systems.
comment: One of the authors does not agree to publish the paper on arXiv
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Despite the rise of billion-parameter foundation models trained across thousands of GPUs, similar scaling gains have not been shown for humanoid control. Current neural controllers for humanoids remain modest in size, target a limited set of behaviors, and are trained on a handful of GPUs over several days. We show that scaling up model capacity, data, and compute yields a generalist humanoid controller capable of creating natural and robust whole-body movements. Specifically, we posit motion tracking as a natural and scalable task for humanoid control, leveraging dense supervision from diverse motion-capture data to acquire human motion priors without manual reward engineering. We build a foundation model for motion tracking by scaling along three axes: network size (from 1.2M to 42M parameters), dataset volume (over 100M frames, 700 hours of high-quality motion data), and compute (9k GPU hours). Beyond demonstrating the benefits of scale, we show the practical utility of our model through two mechanisms: (1) a real-time universal kinematic planner that bridges motion tracking to downstream task execution, enabling natural and interactive control, and (2) a unified token space that supports various motion input interfaces, such as VR teleoperation devices, human videos, and vision-language-action (VLA) models, all using the same policy. Scaling motion tracking exhibits favorable properties: performance improves steadily with increased compute and data diversity, and learned representations generalize to unseen motions, establishing motion tracking at scale as a practical foundation for humanoid control.
comment: Project page: https://nvlabs.github.io/SONIC/
OPTIKS: Optimized Gradient Properties Through Timing in K-Space
A customizable method (OPTIKS) for designing fast trajectory-constrained gradient waveforms with optimized time domain properties was developed. Given a specified multidimensional k-space trajectory, the method optimizes traversal speed (and therefore timing) with position along the trajectory. OPTIKS facilitates optimization of objectives dependent on the time domain gradient waveform and the arc-length domain k-space speed. OPTIKS is applied to design waveforms which limit peripheral nerve stimulation (PNS), minimize mechanical resonance excitation, and reduce acoustic noise. A variety of trajectory examples are presented including spirals, circular echo-planar-imaging, and rosettes. Design performance is evaluated based on duration, standardized PNS models, field measurements, gradient coil back-EMF measurements, and calibrated acoustic measurements. We show reductions in back-EMF of up to 94% and field oscillations up to 91.1%, acoustic noise decreases of up to 9.22 dB, and with efficient use of PNS models speed increases of up to 11.4%. The design method implementation is made available as an open source Python package through GitHub.
comment: This work has been accepted for publication in IEEE Transactions on Medical Imaging. Please cite the IEEE version at DOI 10.1109/TMI.2025.3639398. GitHub Repo: https://github.com/mamccready/optiks
Robotics
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
Vision Language Models (VLMs) demonstrate strong qualitative visual understanding, but struggle with metrically precise spatial reasoning required for embodied applications. The agentic paradigm promises that VLMs can use a wide variety of tools that could augment these capabilities, such as depth estimators, segmentation models, and pose estimators. Yet it remains an open challenge how to realize this vision without solely relying on handcrafted prompting strategies or enforcing fixed, predefined tool pipelines that limit VLMs' ability to discover optimal tool-use patterns. Reinforcement Learning could overcome this gap, but has so far been limited to reasoning with a single visual tool due to the large search space in multi-tool reasoning. We introduce Double Interactive Reinforcement Learning (DIRL), a two-phase training framework where VLMs learn to coordinate multiple tools through interactive exploration and feedback. In the teaching phase, we combine demonstrations from a single tool specialist trained via interactive RL with traces from a frontier model using all tools. In the exploration phase, the model further refines multi-tool coordination through continued RL. Our model, SpaceTools, with tool-augmented spatial reasoning ability, achieves state-of-the-art performance on spatial understanding benchmarks (RoboSpatial-Home, BLINK, BOP-ASK) and demonstrates reliable real-world manipulation using a 7-DOF robot as a tool. DIRL provides substantial improvements over the vanilla SFT (+12% on RoboSpatial) and RL (+16% on RoboSpatial) baselines. Project page: https://spacetools.github.io/.
Artificial Microsaccade Compensation: Stable Vision for an Ornithopter
Animals with foveated vision, including humans, experience microsaccades: small, rapid eye movements of which they are not aware. Inspired by this phenomenon, we develop a method for "Artificial Microsaccade Compensation". It can stabilize video captured by a tailless ornithopter that has resisted attempts to use camera-based sensing because it shakes at 12-20 Hz. Our approach minimizes changes in image intensity by optimizing over 3D rotation represented in SO(3). This results in a stabilized video, computed in real time, suitable for human viewing, and free from distortion. When adapted to hold a fixed viewing orientation, up to occasional saccades, it can dramatically reduce inter-frame motion while also benefiting from an efficient recursive update. When compared to Adobe Premiere Pro's warp stabilizer, which is widely regarded as the best commercial video stabilization software available, our method achieves higher quality results while also running in real time.
comment: 29 pages, 5 figures, 2 tables, under review
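To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of intensity-based stabilization over SO(3): a single rotation, parameterized as an axis-angle vector, is optimized so that warping the current frame by the induced pure-rotation homography minimizes the intensity difference to the previous frame. The intrinsics K, image sizes, and the Nelder-Mead solver are illustrative assumptions.

```python
# Sketch: intensity-change minimization over SO(3) for frame stabilization.
import cv2
import numpy as np
from scipy.optimize import minimize

def rotation_warp(img, r, K):
    """Warp img by the homography H = K R K^-1 induced by rotation r (axis-angle)."""
    R, _ = cv2.Rodrigues(r.astype(np.float64))
    H = K @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))

def intensity_cost(r, prev, curr, K):
    """Mean squared intensity difference after de-rotating the current frame."""
    return float(np.mean((rotation_warp(curr, r, K) - prev) ** 2))

def stabilize(prev, curr, K, r0=np.zeros(3)):
    """Return the axis-angle rotation that best aligns curr with prev."""
    res = minimize(intensity_cost, r0, args=(prev, curr, K), method="Nelder-Mead")
    return res.x

# toy usage with a synthetic frame pair and a small simulated shake
K = np.array([[300.0, 0, 160], [0, 300.0, 120], [0, 0, 1]])
prev = cv2.GaussianBlur(np.random.rand(240, 320).astype(np.float32), (15, 15), 5.0)
curr = rotation_warp(prev, np.array([0.0, 0.01, 0.0]), K)
print("estimated de-rotation (rad):", stabilize(prev, curr, K))
```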
When to Say "Hi" - Learn to Open a Conversation with an in-the-wild Dataset
The social capabilities of socially interactive agents (SIA) are a key to successful and smooth interactions between the user and the SIA. A successful start of the interaction is one of the essential factors for satisfying SIA interactions. For a service and information task in which the SIA helps with information, e.g. about the location, it is an important skill to master the opening of the conversation and to recognize which interlocutor opens the conversation and when. We therefore investigate the extent to which the opening of the conversation can be trained using the user's body language as input for machine learning, in order to ensure smooth conversation starts for the interaction. In this paper we propose the Interaction Initiation System (IIS), which we developed, trained and validated using an in-the-wild dataset. In a field test at the Deutsches Museum Bonn, a Furhat robot from Furhat Robotics was used as a service and information point. Over the period of use we collected data from \textit{N} = 201 single-user interactions for training the algorithms. We show that the IIS achieves a performance indicating that the system is able to determine the greeting period and the opener of the interaction.
comment: 6 pages, 3 figures, 5 tables. This paper has been accepted for publication at IEEE ROMAN 2025
MDE-AgriVLN: Agricultural Vision-and-Language Navigation with Monocular Depth Estimation
Agricultural robots are serving as powerful assistants across a wide range of agricultural tasks, yet they still rely heavily on manual operation or railway systems for movement. The AgriVLN method and the A2A benchmark were the first to extend Vision-and-Language Navigation (VLN) to the agricultural domain, enabling a robot to navigate to a target position following a natural language instruction. Unlike human binocular vision, most agricultural robots are only given a single camera for monocular vision, which results in limited spatial perception. To bridge this gap, we present the method of Agricultural Vision-and-Language Navigation with Monocular Depth Estimation (MDE-AgriVLN), in which we propose an MDE module that generates depth features from RGB images to assist the decision-maker in reasoning. When evaluated on the A2A benchmark, our MDE-AgriVLN method increases Success Rate from 0.23 to 0.32 and decreases Navigation Error from 4.43m to 4.08m, demonstrating state-of-the-art performance in the agricultural VLN domain. Code: https://github.com/AlexTraveling/MDE-AgriVLN.
Classification of User Satisfaction in HRI with Social Signals in the Wild
Socially interactive agents (SIAs) are being used in various scenarios and are nearing productive deployment. Evaluating user satisfaction with SIAs' performance is a key factor in designing the interaction between the user and SIA. Currently, subjective user satisfaction is primarily assessed manually through questionnaires or indirectly via system metrics. This study examines the automatic classification of user satisfaction through analysis of social signals, aiming to enhance both manual and autonomous evaluation methods for SIAs. During a field trial at the Deutsches Museum Bonn, a Furhat Robotics head was employed as a service and information hub, collecting an "in-the-wild" dataset. This dataset comprises 46 single-user interactions, including questionnaire responses and video data. Our method focuses on automatically classifying user satisfaction based on time series classification. We use time series of social signal metrics derived from the body pose, time series of facial expressions, and physical distance. This study compares three feature engineering approaches on different machine learning models. The results confirm the method's effectiveness in reliably identifying interactions with low user satisfaction without the need for manually annotated datasets. This approach offers significant potential for enhancing SIA performance and user experience through automated feedback mechanisms.
comment: 15 pages, 3 figures. This paper has been accepted for publication at ICSR+AI 2025
MUT3R: Motion-aware Updating Transformer for Dynamic 3D Reconstruction
Recent stateful recurrent neural networks have achieved remarkable progress on static 3D reconstruction but remain vulnerable to motion-induced artifacts, where non-rigid regions corrupt attention propagation between the spatial memory and image feature. By analyzing the internal behaviors of the state and image token updating mechanism, we find that aggregating self-attention maps across layers reveals a consistent pattern: dynamic regions are naturally down-weighted, exposing an implicit motion cue that the pretrained transformer already encodes but never explicitly uses. Motivated by this observation, we introduce MUT3R, a training-free framework that applies the attention-derived motion cue to suppress dynamic content in the early layers of the transformer during inference. Our attention-level gating module suppresses the influence of dynamic regions before their artifacts propagate through the feature hierarchy. Notably, we do not retrain or fine-tune the model; we let the pretrained transformer diagnose its own motion cues and correct itself. This early regulation stabilizes geometric reasoning in streaming scenarios and leads to improvements in temporal consistency and camera pose robustness across multiple dynamic benchmarks, offering a simple and training-free pathway toward motion-aware streaming reconstruction.
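A rough sketch of the attention-derived gating idea follows (a simplification, not the MUT3R code): the attention each image token receives is averaged across layers and heads, normalized, and used to down-weight token features so that low-attention, likely dynamic regions contribute less to later layers. The floor value and the min-max normalization are assumptions.

```python
# Sketch: gate token features using an attention-derived motion cue.
import torch

def motion_gate(attn_maps, features, floor=0.2):
    """
    attn_maps: list of (heads, N, N) self-attention tensors, one per layer
    features:  (N, D) image-token features to be gated
    """
    received = torch.stack([a.mean(0).mean(0) for a in attn_maps])  # (L, N): avg attention each token receives
    cue = received.mean(0)                                          # aggregate across layers -> (N,)
    cue = (cue - cue.min()) / (cue.max() - cue.min() + 1e-8)        # normalize to [0, 1]
    gate = floor + (1.0 - floor) * cue                              # keep a floor so no token is zeroed out
    return features * gate.unsqueeze(-1), gate

# toy usage with random attention maps and features
layers, heads, n_tokens, dim = 4, 8, 16, 32
attn = [torch.softmax(torch.randn(heads, n_tokens, n_tokens), dim=-1) for _ in range(layers)]
feats = torch.randn(n_tokens, dim)
gated, gate = motion_gate(attn, feats)
print(gate.shape, gated.shape)
```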
Driving is a Game: Combining Planning and Prediction with Bayesian Iterative Best Response
Autonomous driving planning systems perform nearly perfectly in routine scenarios using lightweight, rule-based methods but still struggle in dense urban traffic, where lane changes and merges require anticipating and influencing other agents. Modern motion predictors offer highly accurate forecasts, yet their integration into planning is mostly rudimentary: discarding unsafe plans. Similarly, end-to-end models offer a one-way integration that avoids the challenges of joint prediction and planning modeling under uncertainty. In contrast, game-theoretic formulations offer a principled alternative but have seen limited adoption in autonomous driving. We present Bayesian Iterative Best Response (BIBeR), a framework that unifies motion prediction and game-theoretic planning into a single interaction-aware process. BIBeR is the first to integrate a state-of-the-art predictor into an Iterative Best Response (IBR) loop, repeatedly refining the strategies of the ego vehicle and surrounding agents. This repeated best-response process approximates a Nash equilibrium, enabling bidirectional adaptation where the ego both reacts to and shapes the behavior of others. In addition, our proposed Bayesian confidence estimation quantifies prediction reliability and modulates update strength: more conservative under low confidence and more decisive under high confidence. BIBeR is compatible with modern predictors and planners, combining the transparency of structured planning with the flexibility of learned models. Experiments show that BIBeR achieves an 11% improvement over state-of-the-art planners on highly interactive interPlan lane-change scenarios, while also outperforming existing approaches on standard nuPlan benchmarks.
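The iterative-best-response idea can be illustrated with a toy 1-D longitudinal game (a stand-in, not the BIBeR planner): the ego and one surrounding agent alternately compute best responses to each other's current plan, and a predictor-confidence weight scales how strongly each best response is blended in. All costs, dynamics, and the confidence value are illustrative assumptions.

```python
# Sketch: confidence-modulated iterative best response on 1-D acceleration profiles.
import numpy as np
from scipy.optimize import minimize

H, DT = 10, 0.5                     # horizon steps, time step

def rollout(x0, v0, acc):
    v = v0 + DT * np.cumsum(acc)
    return x0 + DT * np.cumsum(v)   # positions over the horizon

def ego_cost(a_ego, a_other):
    x_e = rollout(0.0, 10.0, a_ego)
    x_o = rollout(20.0, 8.0, a_other)
    gap = x_o - x_e
    # effort + soft gap-keeping penalty - progress reward
    return np.sum(a_ego**2) + 50.0 * np.sum(np.maximum(5.0 - gap, 0.0)**2) - 0.5 * x_e[-1]

def other_cost(a_other, a_ego):
    nominal = 20.0 + 8.0 * DT * np.arange(1, H + 1)   # keep roughly constant speed
    return np.sum(a_other**2) + 0.1 * np.sum((rollout(20.0, 8.0, a_other) - nominal)**2)

a_ego, a_other = np.zeros(H), np.zeros(H)
confidence = 0.7                                      # assumed predictor confidence
for _ in range(10):                                   # iterative best response loop
    br_ego = minimize(ego_cost, a_ego, args=(a_other,)).x
    br_other = minimize(other_cost, a_other, args=(a_ego,)).x
    a_ego += confidence * (br_ego - a_ego)            # confidence scales update strength
    a_other += confidence * (br_other - a_other)
print("final ego acceleration profile:", np.round(a_ego, 2))
```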
Hierarchical Vision Language Action Model Using Success and Failure Demonstrations
Prior Vision-Language-Action (VLA) models are typically trained on teleoperated successful demonstrations, while discarding numerous failed attempts that occur naturally during data collection. However, these failures encode where and how policies can be fragile, information that can be exploited to improve robustness. We address this problem by leveraging mixed-quality datasets to learn failure-aware reasoning at planning time. We introduce VINE, a hierarchical vision-language-action model that separates high-level reasoning (System 2) from low-level control (System 1) under a hierarchical reinforcement learning formalism, making failures usable as a structured learning signal rather than noisy supervision. System 2 performs feasibility-guided tree search over a 2D scene-graph abstraction: it proposes subgoal transitions, predicts success probabilities from both successes and failures, and prunes brittle branches before execution, effectively casting plan evaluation as feasibility scoring. The selected subgoal sequence is then passed to System 1, which executes low-level actions without modifying the agent's core skills. Trained entirely from offline teleoperation data, VINE integrates negative experience directly into the decision loop. Across challenging manipulation tasks, this approach consistently improves success rates and robustness, demonstrating that failure data is an essential resource for converting the broad competence of VLAs into robust execution.
comment: https://vine-vla.github.io/
Autonomous Reinforcement Learning Robot Control with Intel's Loihi 2 Neuromorphic Hardware
We present an end-to-end pipeline for deploying reinforcement learning (RL) trained Artificial Neural Networks (ANNs) on neuromorphic hardware by converting them into spiking Sigma-Delta Neural Networks (SDNNs). We demonstrate that an ANN policy trained entirely in simulation can be transformed into an SDNN compatible with Intel's Loihi 2 architecture, enabling low-latency and energy-efficient inference. As a test case, we use an RL policy for controlling the Astrobee free-flying robot, similar to a controller previously validated in hardware in space. The policy, trained with Rectified Linear Units (ReLUs), is converted to an SDNN and deployed on Intel's Loihi 2, then evaluated in NVIDIA's Omniverse Isaac Lab simulation environment for closed-loop control of Astrobee's motion. We compare execution performance between GPU and Loihi 2. The results highlight the feasibility of using neuromorphic platforms for robotic control and establish a pathway toward energy-efficient, real-time neuromorphic computation in future space and terrestrial robotics applications.
comment: Submitted for review at NICE 2026 (Neuro-Inspired Computational Elements) conference
Digital Twin-based Control Co-Design of Full Vehicle Active Suspensions via Deep Reinforcement Learning
Active suspension systems are critical for enhancing vehicle comfort, safety, and stability, yet their performance is often limited by fixed hardware designs and control strategies that cannot adapt to uncertain and dynamic operating conditions. Recent advances in digital twins (DTs) and deep reinforcement learning (DRL) offer new opportunities for real-time, data-driven optimization across a vehicle's lifecycle. However, integrating these technologies into a unified framework remains an open challenge. This work presents a DT-based control co-design (CCD) framework for full-vehicle active suspensions using multi-generation design concepts. By integrating automatic differentiation into DRL, we jointly optimize physical suspension components and control policies under varying driver behaviors and environmental uncertainties. DRL also addresses the challenge of partial observability, where only limited states can be sensed and fed back to the controller, by learning optimal control actions directly from available sensor information. The framework incorporates model updating with quantile learning to capture data uncertainty, enabling real-time decision-making and adaptive learning from digital-physical interactions. The approach demonstrates personalized optimization of suspension systems under two distinct driving settings (mild and aggressive). Results show that the optimized systems achieve smoother trajectories and reduce control efforts by approximately 43% and 52% for the mild and aggressive settings, respectively, while maintaining ride comfort and stability. Contributions include: developing a DT-enabled CCD framework integrating DRL and uncertainty-aware model updating for full-vehicle active suspensions, introducing a multi-generation design strategy for self-improving systems, and demonstrating personalized optimization of active suspension systems for distinct driver types.
comment: 28 pages, 17 figures
A Modular Architecture Design for Autonomous Driving Racing in Controlled Environments
This paper presents an Autonomous System (AS) architecture for vehicles in a closed circuit. The AS performs precision tasks including computer vision for environment perception, positioning and mapping for accurate localization, path planning for optimal trajectory generation, and control for precise vehicle actuation. Each subsystem operates independently while connecting data through a cohesive pipeline architecture. The system implements a modular design that combines state-of-the-art technologies for real-time autonomous navigation in controlled environments.
OmniDexVLG: Learning Dexterous Grasp Generation from Vision Language Model-Guided Grasp Semantics, Taxonomy and Functional Affordance
Dexterous grasp generation aims to produce grasp poses that align with task requirements and human interpretable grasp semantics. However, achieving semantically controllable dexterous grasp synthesis remains highly challenging due to the lack of unified modeling of multiple semantic dimensions, including grasp taxonomy, contact semantics, and functional affordance. To address these limitations, we present OmniDexVLG, a multimodal, semantics aware grasp generation framework capable of producing structurally diverse and semantically coherent dexterous grasps under joint language and visual guidance. Our approach begins with OmniDexDataGen, a semantic rich dexterous grasp dataset generation pipeline that integrates grasp taxonomy guided configuration sampling, functional affordance contact point sampling, taxonomy aware differential force closure grasp sampling, and physics based optimization and validation, enabling systematic coverage of diverse grasp types. We further introduce OmniDexReasoner, a multimodal grasp type semantic reasoning module that leverages multi agent collaboration, retrieval augmented generation, and chain of thought reasoning to infer grasp related semantics and generate high quality annotations that align language instructions with task specific grasp intent. Building upon these components, we develop a unified Vision Language Grasping generation model that explicitly incorporates grasp taxonomy, contact structure, and functional affordance semantics, enabling fine grained control over grasp synthesis from natural language instructions. Extensive experiments in simulation and real world object grasping and ablation studies demonstrate that our method substantially outperforms state of the art approaches in terms of grasp diversity, contact semantic diversity, functional affordance diversity, and semantic consistency.
comment: Project Website: https://sites.google.com/view/omnidexvlg, 16 pages
IM HERE: Interaction Model for Human Effort Based Robot Engagement
The effectiveness of human-robot interaction often hinges on the ability to cultivate engagement - a dynamic process of cognitive involvement that supports meaningful exchanges. Many existing definitions and models of engagement are either too vague or lack the ability to generalize across different contexts. We introduce IM HERE, a novel framework that models engagement effectively in human-human, human-robot, and robot-robot interactions. By employing an effort-based description of bilateral relationships between entities, we provide an accurate breakdown of relationship patterns, simplifying them to focus placement and four key states. This framework captures mutual relationships, group behaviors, and actions conforming to social norms, translating them into specific directives for autonomous systems. By integrating both subjective perceptions and objective states, the model precisely identifies and describes miscommunication. The primary objective of this paper is to automate the analysis, modeling, and description of social behavior, and to determine how autonomous systems can behave in accordance with social norms for full social integration while simultaneously pursuing their own social goals.
comment: 8 pages, 5 figures
MPCFormer: A physics-informed data-driven approach for explainable socially-aware autonomous driving
Autonomous Driving (AD) vehicles still struggle to exhibit human-like behavior in highly dynamic and interactive traffic scenarios. The key challenge lies in AD's limited ability to interact with surrounding vehicles, largely due to a lack of understanding of the underlying mechanisms of social interaction. To address this issue, we introduce MPCFormer, an explainable socially-aware autonomous driving approach with physics-informed and data-driven coupled social interaction dynamics. In this model, the dynamics are formulated into a discrete state-space representation, which embeds physics priors to enhance modeling explainability. The dynamics coefficients are learned from naturalistic driving data via a Transformer-based encoder-decoder architecture. To the best of our knowledge, MPCFormer is the first approach to explicitly model the dynamics of multi-vehicle social interactions. The learned social interaction dynamics enable the planner to generate manifold, human-like behaviors when interacting with surrounding traffic. By leveraging the MPC framework, the approach mitigates the potential safety risks typically associated with purely learning-based methods. Open-loop evaluation on the NGSIM dataset demonstrates that MPCFormer achieves superior social interaction awareness, yielding the lowest trajectory prediction errors compared with other state-of-the-art approaches. The prediction achieves an ADE as low as 0.86 m over a long prediction horizon of 5 seconds. Closed-loop experiments in highly intense interaction scenarios, where consecutive lane changes are required to exit an off-ramp, further validate the effectiveness of MPCFormer. Results show that MPCFormer achieves the highest planning success rate of 94.67%, improves driving efficiency by 15.75%, and reduces the collision rate from 21.25% to 0.5%, outperforming a frontier Reinforcement Learning (RL) based planner.
comment: 17 pages, 18 figures
Safety Reinforced Model Predictive Control (SRMPC): Improving MPC with Reinforcement Learning for Motion Planning in Autonomous Driving
Model predictive control (MPC) is widely used for motion planning, particularly in autonomous driving. Real-time capability of the planner requires utilizing convex approximations of the underlying optimal control problems (OCPs). However, such approximations confine the solution to a subspace, which might not contain the global optimum. To address this, we propose using safe reinforcement learning (SRL) to obtain a new and safe reference trajectory within MPC. By employing a learning-based approach, the MPC can explore solutions beyond the close neighborhood of the previous one, potentially finding global optima. We incorporate constrained reinforcement learning (CRL) to ensure safety in automated driving, using a handcrafted energy function-based safety index as the constraint objective to model safe and unsafe regions. Our approach utilizes a state-dependent Lagrangian multiplier, learned concurrently with the safe policy, to solve the CRL problem. Through experimentation in a highway scenario, we demonstrate the superiority of our approach over both MPC and SRL in terms of safety and performance measures.
Bayesian Optimization for Automatic Tuning of Torque-Level Nonlinear Model Predictive Control
This paper presents an auto-tuning framework for torque-based Nonlinear Model Predictive Control (nMPC), where the MPC serves as a real-time controller for optimal joint torque commands. The MPC parameters, including cost function weights and low-level controller gains, are optimized using high-dimensional Bayesian Optimization (BO) techniques, specifically Sparse Axis-Aligned Subspace (SAASBO) with a digital twin (DT), to achieve precise real-time end-effector trajectory tracking on a UR10e robot arm. The simulation model allows efficient exploration of the high-dimensional parameter space, and it ensures safe transfer to hardware. Our simulation results demonstrate significant improvements in tracking performance (+41.9%) and reduction in solve times (-2.5%) compared to manually-tuned parameters. Moreover, experimental validation on the real robot follows the trend (with a +25.8% improvement), emphasizing the importance of digital twin-enabled automated parameter optimization for robotic operations.
comment: 6 pages, 7 figures, 3 tables
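As an illustration of the auto-tuning loop, the sketch below uses a plain Gaussian-process surrogate with expected improvement rather than the SAASBO method used in the paper, and a placeholder `simulate_tracking_error` function standing in for the digital-twin rollout; the two tuned parameters and their bounds are assumptions.

```python
# Sketch: Bayesian optimization of two hypothetical MPC parameters against a simulated objective.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
bounds = np.array([[0.1, 10.0],     # tracking-cost weight
                   [1.0, 100.0]])   # low-level controller gain

def simulate_tracking_error(params):            # placeholder for a digital-twin evaluation
    w, k = params
    return (np.log(w) - 0.5)**2 + 0.01 * (k - 40.0)**2 + 0.05 * rng.normal()

def expected_improvement(gp, X, best):
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))   # initial random design
y = np.array([simulate_tracking_error(x) for x in X])

for _ in range(20):                                        # BO loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, 2))
    x_next = cand[np.argmax(expected_improvement(gp, cand, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, simulate_tracking_error(x_next))

print("best parameters:", X[np.argmin(y)], "error:", y.min())
```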
Prediction-Driven Motion Planning: Route Integration Strategies in Attention-Based Prediction Models SC
Combining motion prediction and motion planning offers a promising framework for enhancing interactions between automated vehicles and other traffic participants. However, this introduces challenges in conditioning predictions on navigation goals and ensuring stable, kinematically feasible trajectories. Addressing the former challenge, this paper investigates the extension of attention-based motion prediction models with navigation information. By integrating the ego vehicle's intended route and goal pose into the model architecture, we bridge the gap between multi-agent motion prediction and goal-based motion planning. We propose and evaluate several architectural navigation integration strategies to our model on the nuPlan dataset. Our results demonstrate the potential of prediction-driven motion planning, highlighting how navigation information can enhance both prediction and planning tasks. Our implementation is at: https://github.com/KIT-MRT/future-motion.
comment: In Proceedings of the IEEE International Conference on Intelligent Transportation Systems (ITSC), Gold Coast, AUSTRALIA, 18-21 November 2025
Cross-embodied Co-design for Dexterous Hands
Dexterous manipulation is limited by both control and design, without consensus as to what makes manipulators best for performing dexterous tasks. This raises a fundamental challenge: how should we design and control robot manipulators that are optimized for dexterity? We present a co-design framework that learns task-specific hand morphology and complementary dexterous control policies. The framework supports 1) an expansive morphology search space including joint, finger, and palm generation, 2) scalable evaluation across the wide design space via morphology-conditioned cross-embodied control, and 3) real-world fabrication with accessible components. We evaluate the approach across multiple dexterous tasks, including in-hand rotation with simulation and real deployment. Our framework enables an end-to-end pipeline that can design, train, fabricate, and deploy a new robotic hand in under 24 hours. The full framework will be open-sourced and available on our website.
Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control
Reinforcement learning (RL) offers transformative potential for robotic control in space. We present the first on-orbit demonstration of RL-based autonomous control of a free-flying robot, the NASA Astrobee, aboard the International Space Station (ISS). Using NVIDIA's Omniverse physics simulator and curriculum learning, we trained a deep neural network to replace Astrobee's standard attitude and translation control, enabling it to navigate in microgravity. Our results validate a novel training pipeline that bridges the simulation-to-reality (Sim2Real) gap, utilizing a GPU-accelerated, scientific-grade simulation environment for efficient Monte Carlo RL training. This successful deployment demonstrates the feasibility of training RL policies terrestrially and transferring them to space-based applications. This paves the way for future work in In-Space Servicing, Assembly, and Manufacturing (ISAM), enabling rapid on-orbit adaptation to dynamic mission requirements.
comment: published at iSpaRo 2025
Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing
The US Naval Research Laboratory's (NRL's) Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) experiment pioneers the use of reinforcement learning (RL) for control of free-flying robots in the zero-gravity (zero-G) environment of space. On Tuesday, May 27th 2025 the APIARY team conducted the first ever, to our knowledge, RL control of a free-flyer in space using the NASA Astrobee robot on-board the International Space Station (ISS). A robust 6-degrees of freedom (DOF) control policy was trained using an actor-critic Proximal Policy Optimization (PPO) network within the NVIDIA Isaac Lab simulation environment, randomizing over goal poses and mass distributions to enhance robustness. This paper details the simulation testing, ground testing, and flight validation of this experiment. This on-orbit demonstration validates the transformative potential of RL for improving robotic autonomy, enabling rapid development and deployment (in minutes to hours) of tailored behaviors for space exploration, logistics, and real-time mission needs.
comment: iSpaRo 2025, Best Paper Award in Orbital Robotics
PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention
Vision-Language-Action (VLA) models have demonstrated remarkable performance on embodied tasks and shown promising potential for real-world applications. However, current VLAs still struggle to produce consistent and precise target-oriented actions, as they often generate redundant or unstable motions along trajectories, limiting their applicability in time-sensitive scenarios. In this work, we attribute these redundant actions to the spatially uniform perception field of existing VLAs, which causes them to be distracted by target-irrelevant objects, especially in complex environments. To address this issue, we propose an efficient PosA-VLA framework that anchors visual attention via pose-conditioned supervision, consistently guiding the model's perception toward task-relevant regions. The pose-conditioned anchor attention mechanism enables the model to better align instruction semantics with actionable visual cues, thereby improving action generation precision and efficiency. Moreover, our framework adopts a lightweight architecture and requires no auxiliary perception modules (e.g., segmentation or grounding networks), ensuring efficient inference. Extensive experiments verify that our method executes embodied tasks with precise and time-efficient behavior across diverse robotic manipulation benchmarks and shows robust generalization in a variety of challenging environments.
ContactRL: Safe Reinforcement Learning based Motion Planning for Contact based Human Robot Collaboration
In collaborative human-robot tasks, safety requires not only avoiding collisions but also ensuring safe, intentional physical contact. We present ContactRL, a reinforcement learning (RL) based framework that directly incorporates contact safety into the reward function through force feedback. This enables a robot to learn adaptive motion profiles that minimize human-robot contact forces while maintaining task efficiency. In simulation, ContactRL achieves a low safety violation rate of 0.2% with a high task success rate of 87.7%, outperforming state-of-the-art constrained RL baselines. In order to guarantee deployment safety, we augment the learned policy with a kinetic energy based Control Barrier Function (eCBF) shield. Real-world experiments on a UR3e robotic platform performing small object handovers from a human hand across 360 trials confirm safe contact, with measured normal forces consistently below 10 N. These results demonstrate that ContactRL enables safe and efficient physical collaboration, thereby advancing the deployment of collaborative robots in contact-rich tasks.
comment: 8 pages, 7 figures
A Novel Approach to Tomato Harvesting Using a Hybrid Gripper with Semantic Segmentation and Keypoint Detection
This paper presents an autonomous tomato-harvesting system built around a hybrid robotic gripper that combines six soft auxetic fingers with a rigid exoskeleton and a latex basket to achieve gentle, cage-like grasping. The gripper is driven by a servo-actuated Scotch-yoke mechanism, and includes separator leaves that form a conical frustum for fruit isolation, with an integrated micro-servo cutter for pedicel cutting. For perception, an RGB-D camera and a Detectron2-based pipeline perform semantic segmentation of ripe/unripe tomatoes and keypoint localization of the pedicel and fruit center under occlusion and variable illumination. An analytical model derived using the principle of virtual work relates servo torque to grasp force, enabling design-level reasoning about actuation requirements. During execution, closed-loop grasp-force regulation is achieved using a proportional-integral-derivative (PID) controller with feedback from force-sensitive resistors mounted on selected fingers to prevent slip and bruising. Motion execution is supported by Particle Swarm Optimization (PSO)-based trajectory planning for a 5-DOF manipulator. Experiments demonstrate complete picking cycles (approach, separation, cutting, grasping, transport, release) with an average cycle time of 24.34 s and an overall success rate of approximately 80%, while maintaining low grasp forces (0.20-0.50 N). These results validate the proposed hybrid gripper and integrated vision-control pipeline for reliable harvesting in cluttered environments.
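The grasp-force regulation loop can be sketched as a textbook PID controller around a force-sensitive-resistor (FSR) reading (not the authors' firmware); the `FakeGripper` class, the gains, and the 0.35 N setpoint inside the reported 0.20-0.50 N band are illustrative assumptions.

```python
# Sketch: PID regulation of grasp force from FSR feedback.
KP, KI, KD = 2.0, 0.5, 0.05
SETPOINT_N = 0.35              # target grasp force within the reported 0.20-0.50 N band
DT = 0.02                      # 50 Hz control loop

class FakeGripper:             # stands in for the servo + FSR hardware
    def __init__(self):
        self.angle, self.force = 0.0, 0.0
    def read_force(self):
        self.force += 0.1 * self.angle - 0.02 * self.force   # toy contact model
        return self.force
    def command(self, delta_angle):
        self.angle = max(0.0, min(90.0, self.angle + delta_angle))

gripper = FakeGripper()
integral, prev_error = 0.0, 0.0
for step in range(200):
    force = gripper.read_force()
    error = SETPOINT_N - force
    integral += error * DT
    derivative = (error - prev_error) / DT
    prev_error = error
    gripper.command(KP * error + KI * integral + KD * derivative)
    if step % 50 == 0:
        print(f"t={step * DT:.2f}s force={force:.3f} N")
```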
Context-Triggered Contingency Games for Strategic Multi-Agent Interaction
We address the challenge of reliable and efficient interaction in autonomous multi-agent systems, where agents must balance long-term strategic objectives with short-term dynamic adaptation. We propose context-triggered contingency games, a novel integration of strategic games derived from temporal logic specifications with dynamic contingency games solved in real time. Our two-layered architecture leverages strategy templates to guarantee satisfaction of high-level objectives, while a new factor-graph-based solver enables scalable, real-time model predictive control of dynamic interactions. The resulting framework ensures both safety and progress in uncertain, interactive environments. We validate our approach through simulations and hardware experiments in autonomous driving and robotic navigation, demonstrating efficient, reliable, and adaptive multi-agent interaction.
Multimodal Control of Manipulators: Coupling Kinematics and Vision for Self-Driving Laboratory Operations
Motion planning schemes are used for planning motions of a manipulator from an initial pose to a final pose during task execution. A motion planning scheme generally comprises a trajectory planning method and an inverse kinematic solver to determine trajectories and joint solutions, respectively. In this paper, three motion planning schemes developed based on Jacobian methods are implemented to traverse a redundant manipulator with a coupled finger gripper through given trajectories. The RRT* algorithm is used for planning trajectories, and screw theory based forward kinematic equations are solved to determine joint solutions of the manipulator and gripper. Inverse solutions are computed separately using three Jacobian based methods: Jacobian Transpose (JT), Pseudo Inverse (PI), and Damped Least Squares (DLS). The space Jacobian and manipulability measurements of the manipulator and gripper are obtained using screw theory formulations. Smoothness and RMSE of the generated trajectories, as well as velocity continuity, acceleration profile, jerk, and snap values of joint motions, are analysed to determine an efficient motion planning method for a given task. Advantages and disadvantages of the proposed motion planning schemes are analysed using simulation studies to determine a suitable inverse solution technique for the tasks.
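For reference, the three inverse-solution rules compared above reduce to short update formulas; the sketch below implements them for a generic Jacobian and task-space error, together with the Yoshikawa manipulability measure, without being tied to the specific manipulator or the screw-theory kinematics used in the study.

```python
# Sketch: JT, PI, and DLS inverse-kinematics update rules.
import numpy as np

def jacobian_transpose_step(J, dx, alpha=0.1):
    """Jacobian Transpose (JT): dq = alpha * J^T dx."""
    return alpha * J.T @ dx

def pseudo_inverse_step(J, dx):
    """Pseudo Inverse (PI): dq = J^+ dx (least-squares, ill-conditioned near singularities)."""
    return np.linalg.pinv(J) @ dx

def damped_least_squares_step(J, dx, damping=0.05):
    """Damped Least Squares (DLS): dq = J^T (J J^T + lambda^2 I)^-1 dx."""
    m = J.shape[0]
    return J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(m), dx)

def manipulability(J):
    """Yoshikawa manipulability measure sqrt(det(J J^T))."""
    return float(np.sqrt(max(np.linalg.det(J @ J.T), 0.0)))

# toy usage: a random 6x7 Jacobian for a redundant arm and a small Cartesian error
rng = np.random.default_rng(1)
J = rng.normal(size=(6, 7))
dx = np.array([0.01, 0.0, -0.02, 0.0, 0.0, 0.005])
for name, dq in [("JT", jacobian_transpose_step(J, dx)),
                 ("PI", pseudo_inverse_step(J, dx)),
                 ("DLS", damped_least_squares_step(J, dx))]:
    print(name, np.round(dq, 4))
print("manipulability:", round(manipulability(J), 3))
```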
RoboScape-R: Unified Reward-Observation World Models for Generalizable Robotics Training via RL
Achieving generalizable embodied policies remains a key challenge. Traditional policy learning paradigms, including both Imitation Learning (IL) and Reinforcement Learning (RL), struggle to cultivate generalizability across diverse scenarios. While IL policies often overfit to specific expert trajectories, RL suffers from the inherent lack of a unified and general reward signal necessary for effective multi-scene generalization. We posit that the world model is uniquely capable of serving as a universal environment proxy to address this limitation. However, current world models primarily focus on their ability to predict observations and still rely on task-specific, handcrafted reward functions, thereby failing to provide a truly general training environment. To address this problem, we propose RoboScape-R, a framework leveraging the world model to serve as a versatile, general-purpose proxy for the embodied environment within the RL paradigm. We introduce a novel world model-based general reward mechanism that generates "endogenous" rewards derived from the model's intrinsic understanding of real-world state transition dynamics. Extensive experiments demonstrate that RoboScape-R effectively addresses the limitations of traditional RL methods by providing an efficient and general training environment that substantially enhances the generalization capability of embodied policies. Our approach offers critical insights into utilizing the world model as an online training strategy and achieves an average 37.5% performance improvement over baselines under out-of-domain scenarios.
A Learning-based Control Methodology for Transitioning VTOL UAVs
Transition control poses a critical challenge in Vertical Take-Off and Landing Unmanned Aerial Vehicle (VTOL UAV) development due to the tilting rotor mechanism, which shifts the center of gravity and thrust direction during transitions. Current methods decouple the control of altitude and position, which leads to significant vibration and limits both the consideration of their interaction and overall adaptability. In this study, we propose a novel coupled transition control methodology based on a reinforcement learning (RL) driven controller. Moreover, in contrast to the conventional phase-transition approach, the ST3M method takes a new perspective by treating cruise mode as a special case of hover. We validate the feasibility of applying our method in simulation and real-world environments, demonstrating efficient controller development and migration while accurately controlling UAV position and attitude, and exhibiting outstanding trajectory tracking and reduced vibrations during the transition process.
AdaPower: Specializing World Foundation Models for Predictive Manipulation
World Foundation Models (WFMs) offer remarkable visual dynamics simulation capabilities, yet their application to precise robotic control remains limited by the gap between generative realism and control-oriented precision. While existing approaches use WFMs as synthetic data generators, they suffer from high computational costs and underutilization of pre-trained VLA policies. We introduce \textbf{AdaPower} (\textbf{Ada}pt and Em\textbf{power}), a lightweight adaptation framework that transforms general-purpose WFMs into specialist world models through two novel components: Temporal-Spatial Test-Time Training (TS-TTT) for inference-time adaptation and Memory Persistence (MP) for long-horizon consistency. Integrated within a Model Predictive Control framework, our adapted world model empowers pre-trained VLAs, achieving over 41\% improvement in task success rates on LIBERO benchmarks without policy retraining, while preserving computational efficiency and generalist capabilities.
Mobility-Induced Sensitivity of UAV-based Nodes to Jamming in Private 5G Airfield Networks: An Experimental Study
This work presents an experimental performance evaluation of a private 5G airfield network under controlled directional SDR jamming attacks targeting UAV-based UE nodes. Using a QualiPoc Android UE, mounted as a payload on a quadcopter UAV, we conducted a series of experiments to evaluate signal degradation, handover performance, and service stability in the presence of constant directional jamming. The experiments examined the effects of varying travel speeds, altitudes, and moving patterns of a UAV-based UE, recording and analyzing key physical-layer and network-layer metrics such as CQI, MCS, RSRP, SINR, BLER, Net PDSCH Throughput, and RLF. The results of this work describe the link stability and signal degradation dependencies caused by the level of mobility of the UAV-based UE nodes during autonomous and automatic operation in private 5G airfield networks.
comment: 4 pages, 4 figures
MSG-Loc: Multi-Label Likelihood-based Semantic Graph Matching for Object-Level Global Localization
Robots are often required to localize in environments with unknown object classes and semantic ambiguity. However, when performing global localization using semantic objects, high semantic ambiguity intensifies object misclassification and increases the likelihood of incorrect associations, which in turn can cause significant errors in the estimated pose. Thus, in this letter, we propose a multi-label likelihood-based semantic graph matching framework for object-level global localization. The key idea is to exploit multi-label graph representations, rather than single-label alternatives, to capture and leverage the inherent semantic context of object observations. Based on these representations, our approach enhances semantic correspondence across graphs by combining the likelihood of each node with the maximum likelihood of its neighbors via context-aware likelihood propagation. For rigorous validation, data association and pose estimation performance are evaluated under both closed-set and open-set detection configurations. In addition, we demonstrate the scalability of our approach to large-vocabulary object categories in both real-world indoor scenes and synthetic environments.
comment: Accepted in IEEE Robotics and Automation Letters (2025)
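A minimal sketch of the propagation rule as described above (not the released code): each node's multi-label likelihood vector is blended with the element-wise maximum likelihood among its graph neighbors; the blending weight is an assumption.

```python
# Sketch: context-aware likelihood propagation over an object graph.
import numpy as np

def propagate(likelihoods, adjacency, w_self=0.6):
    """
    likelihoods: (N, C) per-node class likelihoods
    adjacency:   list of neighbor-index lists, one per node
    returns:     (N, C) context-aware likelihoods
    """
    out = np.zeros_like(likelihoods)
    for i, nbrs in enumerate(adjacency):
        neighbor_max = likelihoods[nbrs].max(axis=0) if nbrs else np.zeros(likelihoods.shape[1])
        out[i] = w_self * likelihoods[i] + (1.0 - w_self) * neighbor_max
    return out

# toy graph: three objects, three candidate classes (e.g. chair / table / sofa)
L = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.7, 0.1],
              [0.4, 0.4, 0.2]])
adj = [[1], [0, 2], [1]]
print(np.round(propagate(L, adj), 3))
```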
CSMapping: Scalable Crowdsourced Semantic Mapping and Topology Inference for Autonomous Driving
Crowdsourcing enables scalable autonomous driving map construction, but low-cost sensor noise hinders quality from improving with data volume. We propose CSMapping, a system that produces accurate semantic maps and topological road centerlines whose quality consistently increases with more crowdsourced data. For semantic mapping, we train a latent diffusion model on HD maps (optionally conditioned on SD maps) to learn a generative prior of real-world map structure, without requiring paired crowdsourced/HD-map supervision. This prior is incorporated via constrained MAP optimization in latent space, ensuring robustness to severe noise and plausible completion in unobserved areas. Initialization uses a robust vectorized mapping module followed by diffusion inversion; optimization employs efficient Gaussian-basis reparameterization, projected gradient descent with multi-start, and a latent-space factor graph for global consistency. For topological mapping, we apply confidence-weighted k-medoids clustering and kinematic refinement to trajectories, yielding smooth, human-like centerlines robust to trajectory variation. Experiments on nuScenes, Argoverse 2, and a large proprietary dataset achieve state-of-the-art semantic and topological mapping performance, with thorough ablation and scalability studies.
Variable-Impedance Muscle Coordination under Slow-Rate Control Frequencies and Limited Observation Conditions Evaluated through Legged Locomotion
Human motor control remains agile and robust despite limited sensory information for feedback, a property attributed to the body's ability to perform morphological computation through muscle coordination with variable impedance. However, it remains unclear how such low-level mechanical computation reduces the control requirements of the high-level controller. In this study, we implement a hierarchical controller consisting of a high-level neural network trained by reinforcement learning and a low-level variable-impedance muscle coordination model with mono- and biarticular muscles in a monoped locomotion task. We systematically restrict the high-level controller by varying the control frequency and by introducing biologically inspired observation conditions: delayed, partial, and substituted observation. Under these conditions, we evaluate how the low-level variable-impedance muscle coordination contributes to the learning process of the high-level neural network. The results show that variable-impedance muscle coordination enables stable locomotion even under slow-rate control frequencies and limited observation conditions. These findings demonstrate that the morphological computation of muscle coordination effectively offloads high-frequency feedback from the high-level controller and provide a design principle for the controller in motor control.
comment: 12 pages, 11 figures. Submitted to IEEE Transactions on Systems, Man, and Cybernetics: Systems
PerFACT: Motion Policy with LLM-Powered Dataset Synthesis and Fusion Action-Chunking Transformers
Deep learning methods have significantly enhanced motion planning for robotic manipulators by leveraging prior experiences within planning datasets. However, state-of-the-art neural motion planners are primarily trained on small datasets collected in manually generated workspaces, limiting their generalizability to out-of-distribution scenarios. Additionally, these planners often rely on monolithic network architectures that struggle to encode critical planning information. To address these challenges, we introduce Motion Policy with Dataset Synthesis powered by large language models (LLMs) and Fusion Action-Chunking Transformers (PerFACT), which incorporates two key components. Firstly, a novel LLM-powered workspace generation method, MotionGeneralizer, enables large-scale planning data collection by producing a diverse set of semantically feasible workspaces. Secondly, we introduce Fusion Motion Policy Networks (MpiNetsFusion), a generalist neural motion planner that uses a fusion action-chunking transformer to better encode planning signals and attend to multiple feature modalities. Leveraging MotionGeneralizer, we collect 3.5M trajectories to train and evaluate MpiNetsFusion against state-of-the-art planners, which shows that the proposed MpiNetsFusion can plan several times faster on the evaluated tasks.
World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations
Autonomous navigation of terrestrial robots using Reinforcement Learning (RL) from LIDAR observations remains challenging due to the high dimensionality of sensor data and the sample inefficiency of model-free approaches. Conventional policy networks struggle to process full-resolution LIDAR inputs, forcing prior works to rely on simplified observations that reduce spatial awareness and navigation robustness. This paper presents a novel model-based RL framework built on top of the DreamerV3 algorithm, integrating a Multi-Layer Perceptron Variational Autoencoder (MLP-VAE) within a world model to encode high-dimensional LIDAR readings into compact latent representations. These latent features, combined with a learned dynamics predictor, enable efficient imagination-based policy optimization. Experiments on simulated TurtleBot3 navigation tasks demonstrate that the proposed architecture achieves faster convergence and higher success rates compared to model-free baselines such as SAC, DDPG, and TD3. It is worth emphasizing that the DreamerV3-based agent attains a 100% success rate across all evaluated environments when using the full TurtleBot3 LIDAR scan (360 readings), while model-free methods plateau below 85%. These findings demonstrate that integrating predictive world models with learned latent representations enables more efficient and robust navigation from high-dimensional sensory data.
comment: Accepted for publication in the Journal of Intelligent and Fuzzy Systems
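A minimal PyTorch sketch of the MLP-VAE component is shown below; layer sizes, the latent dimension, and the KL weight are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch: MLP variational autoencoder compressing a 360-beam LIDAR scan.
import torch
import torch.nn as nn

class LidarVAE(nn.Module):
    def __init__(self, n_beams=360, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_beams, 256), nn.ReLU(),
                                     nn.Linear(256, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, 256), nn.ReLU(),
                                     nn.Linear(256, n_beams))

    def forward(self, scan):
        h = self.encoder(scan)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return self.decoder(z), mu, logvar

def vae_loss(recon, scan, mu, logvar, beta=1e-3):
    recon_loss = nn.functional.mse_loss(recon, scan)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + beta * kl

# toy usage: a batch of normalized range scans
vae = LidarVAE()
scan = torch.rand(8, 360)
recon, mu, logvar = vae(scan)
print(vae_loss(recon, scan, mu, logvar).item())
```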
What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models
In this paper, we provide a comprehensive overview of existing scene representation methods for robotics, covering traditional representations such as point clouds, voxels, signed distance functions (SDF), and scene graphs, as well as more recent neural representations like Neural Radiance Fields (NeRF), 3D Gaussian Splatting (3DGS), and the emerging Foundation Models. While current SLAM and localization systems predominantly rely on sparse representations like point clouds and voxels, dense scene representations are expected to play a critical role in downstream tasks such as navigation and obstacle avoidance. Moreover, neural representations such as NeRF, 3DGS, and foundation models are well-suited for integrating high-level semantic features and language-based priors, enabling more comprehensive 3D scene understanding and embodied intelligence. We categorize the core modules of robotics into five parts (Perception, Mapping, Localization, Navigation, Manipulation). We start by presenting the standard formulation of different scene representation methods and comparing the advantages and disadvantages of scene representations across the different modules. This survey is centered around the question: What is the best 3D scene representation for robotics? We then discuss the future development trends of 3D scene representations, with a particular focus on how a 3D Foundation Model could replace current methods as the unified solution for future robotic applications. The remaining challenges in fully realizing this model are also explored. We aim to offer a valuable resource for both newcomers and experienced researchers to explore the future of 3D scene representations and their application in robotics. We have published an open-source project on GitHub and will continue to add new works and technologies to this project.
Surfel-LIO: Fast LiDAR-Inertial Odometry with Pre-computed Surfels and Hierarchical Z-order Voxel Hashing
LiDAR-inertial odometry (LIO) is an active research area, as it enables accurate real-time state estimation in GPS-denied environments. Recent advances in map data structures and spatial indexing have significantly improved the efficiency of LIO systems. Nevertheless, we observe that two aspects may still leave room for improvement: (1) nearest neighbor search often requires examining multiple spatial units to gather sufficient points for plane fitting, and (2) plane parameters are typically recomputed at every iteration despite unchanged map geometry. Motivated by these observations, we propose Surfel-LIO, which employs a hierarchical voxel structure (hVox) with pre-computed surfel representation. This design enables O(1) correspondence retrieval without runtime neighbor enumeration or plane fitting, combined with Z-order curve encoding for cache-friendly spatial indexing. Experimental results on the M3DGR dataset demonstrate that our method achieves significantly faster processing speed compared to recent state-of-the-art methods while maintaining comparable state estimation accuracy. Our implementation is publicly available at https://github.com/93won/lidar_inertial_odometry.
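For intuition, Z-order (Morton) encoding interleaves the bits of the three voxel indices into a single integer key, so that spatially nearby voxels tend to land close together in the hash map. The sketch below is generic, not the Surfel-LIO source; the voxel size, bit width, and index offset are assumptions.

```python
# Sketch: Z-order (Morton) hash keys for 3D voxel indices.
def spread_bits(n, bits=21):
    """Spread the low `bits` bits of n so that two zero bits separate each bit."""
    code = 0
    for i in range(bits):
        code |= ((n >> i) & 1) << (3 * i)
    return code

def morton3d(ix, iy, iz, bits=21):
    """Interleave the bits of three non-negative voxel indices into one key."""
    return spread_bits(ix, bits) | (spread_bits(iy, bits) << 1) | (spread_bits(iz, bits) << 2)

def voxel_key(x, y, z, voxel_size=0.5, offset=1 << 20):
    """Quantize a 3D point to a voxel and return its Z-order hash key."""
    ix = int(x / voxel_size) + offset   # offset keeps indices non-negative
    iy = int(y / voxel_size) + offset
    iz = int(z / voxel_size) + offset
    return morton3d(ix, iy, iz)

# toy usage: nearby points map to nearby (often identical) keys
print(voxel_key(1.20, -3.4, 0.7), voxel_key(1.25, -3.4, 0.7), voxel_key(50.0, 10.0, -2.0))
```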
GOMP: Grasped Object Manifold Projection for Multimodal Imitation Learning of Manipulation
Imitation Learning (IL) holds great potential for learning repetitive manipulation tasks, such as those in industrial assembly. However, its effectiveness is often limited by insufficient trajectory precision due to compounding errors. In this paper, we introduce Grasped Object Manifold Projection (GOMP), an interactive method that mitigates these errors by constraining a non-rigidly grasped object to a lower-dimensional manifold. GOMP assumes a precise task in which a manipulator holds an object that may shift within the grasp in an observable manner and must be mated with a grounded part. Crucially, all GOMP enhancements are learned from the same expert dataset used to train the base IL policy, and are adjusted with an n-arm bandit-based interactive component. We propose a theoretical basis for GOMP's improvement upon the well-known compounding error bound in IL literature. We demonstrate the framework on four precise assembly tasks using tactile feedback, and note that the approach remains modality-agnostic. Data and videos are available at williamvdb.github.io/GOMPsite.
comment: 8 pages, 8 figures, 2 tables
NavMapFusion: Diffusion-based Fusion of Navigation Maps for Online Vectorized HD Map Construction WACV 2026
Accurate environmental representations are essential for autonomous driving, providing the foundation for safe and efficient navigation. Traditionally, high-definition (HD) maps have provided this representation of the static road infrastructure to the autonomous system a priori. However, because the real world is constantly changing, such maps must be constructed online from on-board sensor data. Navigation-grade standard-definition (SD) maps are widely available, but their resolution is insufficient for direct deployment. Instead, they can be used as a coarse prior to guide the online map construction process. We propose NavMapFusion, a diffusion-based framework that performs iterative denoising conditioned on high-fidelity sensor data and on low-fidelity navigation maps. This paper strives to answer: (1) How can coarse, potentially outdated navigation maps guide online map construction? (2) What advantages do diffusion models offer for map fusion? We demonstrate that diffusion-based map construction provides a robust framework for map fusion. Our key insight is that discrepancies between the prior map and online perception naturally correspond to noise within the diffusion process; consistent regions reinforce the map construction, whereas outdated segments are suppressed. On the nuScenes benchmark, NavMapFusion conditioned on coarse road lines from OpenStreetMap data achieves a 21.4% relative improvement at the 100 m perception range, and even stronger improvements at larger perception ranges, while maintaining real-time capabilities. By fusing low-fidelity priors with high-fidelity sensor data, the proposed method generates accurate and up-to-date environment representations, guiding towards safer and more reliable autonomous driving. The code is available at https://github.com/tmonnin/navmapfusion
comment: Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)
ResponsibleRobotBench: Benchmarking Responsible Robot Manipulation using Multi-modal Large Language Models
Recent advances in large multimodal models have enabled new opportunities in embodied AI, particularly in robotic manipulation. These models have shown strong potential in generalization and reasoning, but achieving reliable and responsible robotic behavior in real-world settings remains an open challenge. In high-stakes environments, robotic agents must go beyond basic task execution to perform risk-aware reasoning, moral decision-making, and physically grounded planning. We introduce ResponsibleRobotBench, a systematic benchmark designed to evaluate and accelerate progress in responsible robotic manipulation from simulation to real world. This benchmark consists of 23 multi-stage tasks spanning diverse risk types, including electrical, chemical, and human-related hazards, and varying levels of physical and planning complexity. These tasks require agents to detect and mitigate risks, reason about safety, plan sequences of actions, and engage human assistance when necessary. Our benchmark includes a general-purpose evaluation framework that supports multimodal model-based agents with various action representation modalities. The framework integrates visual perception, context learning, prompt construction, hazard detection, reasoning and planning, and physical execution. It also provides a rich multimodal dataset, supports reproducible experiments, and includes standardized metrics such as success rate, safety rate, and safe success rate. Through extensive experimental setups, ResponsibleRobotBench enables analysis across risk categories, task types, and agent configurations. By emphasizing physical reliability, generalization, and safety in decision-making, this benchmark provides a foundation for advancing the development of trustworthy, real-world responsible dexterous robotic systems. https://sites.google.com/view/responsible-robotbench
comment: https://sites.google.com/view/responsible-robotbench
Driving Beyond Privilege: Distilling Dense-Reward Knowledge into Sparse-Reward Policies
We study how to exploit dense simulator-defined rewards in vision-based autonomous driving without inheriting their misalignment with deployment metrics. In realistic simulators such as CARLA, privileged state (e.g., lane geometry, infractions, time-to-collision) can be converted into dense rewards that stabilize and accelerate model-based reinforcement learning, but policies trained directly on these signals often overfit and fail to generalize when evaluated on sparse objectives such as route completion and collision-free overtaking. We propose reward-privileged world model distillation, a two-stage framework in which a teacher DreamerV3-style agent is first trained with a dense privileged reward, and only its latent dynamics are distilled into a student trained solely on sparse task rewards. Teacher and student share the same observation space (semantic bird's-eye-view images); privileged information enters only through the teacher's reward, and the student does not imitate the teacher's actions or value estimates. Instead, the student's world model is regularized to match the teacher's latent dynamics while its policy is learned from scratch on sparse success/failure signals. In CARLA lane-following and overtaking benchmarks, sparse-reward students outperform both dense-reward teachers and sparse-from-scratch baselines. On unseen lane-following routes, reward-privileged distillation improves success by about 23 percent relative to the dense teacher while maintaining comparable or better safety. On overtaking, students retain near-perfect performance on training routes and achieve up to a 27x improvement in success on unseen routes, with improved lane keeping. These results show that dense rewards can be leveraged to learn richer dynamics models while keeping the deployed policy optimized strictly for sparse, deployment-aligned objectives.
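Schematically, the student objective couples a sparse-reward prediction term with a latent-dynamics matching term against the frozen teacher, without imitating actions or values. The sketch below uses tiny GRU stand-ins for the recurrent state-space models and is not the paper's DreamerV3 code; all module names and sizes are assumptions.

```python
# Sketch: reward-privileged world-model distillation objective.
import torch
import torch.nn as nn

class TinyWorldModel(nn.Module):
    """Stand-in latent dynamics model: encodes (obs, action) sequences into latents."""
    def __init__(self, obs_dim=64, act_dim=2, latent_dim=32):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, latent_dim, batch_first=True)
    def forward(self, obs, act):
        latents, _ = self.gru(torch.cat([obs, act], dim=-1))
        return latents                                  # (B, T, latent_dim)

def distillation_loss(student, teacher, reward_head, obs, act, sparse_reward, beta=1.0):
    z_student = student(obs, act)
    with torch.no_grad():                               # teacher latents are a fixed target
        z_teacher = teacher(obs, act)
    # (1) sparse-reward prediction keeps the student aligned with deployment objectives
    reward_loss = nn.functional.mse_loss(reward_head(z_student).squeeze(-1), sparse_reward)
    # (2) latent-dynamics matching transfers what the dense-reward teacher learned
    dynamics_loss = nn.functional.mse_loss(z_student, z_teacher)
    return reward_loss + beta * dynamics_loss

# toy usage on random sequences with a mostly-zero success signal
student, teacher = TinyWorldModel(), TinyWorldModel()
reward_head = nn.Linear(32, 1)
obs, act = torch.randn(4, 16, 64), torch.randn(4, 16, 2)
sparse_reward = (torch.rand(4, 16) > 0.95).float()
loss = distillation_loss(student, teacher, reward_head, obs, act, sparse_reward)
loss.backward()
print(float(loss))
```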
Sliding Mode Control and Subspace Stabilization Methodology for the Orbital Stabilization of Periodic Trajectories
This paper presents a combined sliding-mode control and subspace stabilization methodology for orbital stabilization of periodic trajectories in underactuated mechanical systems with one degree of underactuation. The approach starts with partial feedback linearization and stabilization. Then, transverse linearization along the reference orbit is computed, resulting in a periodic linear time-varying system with a stable subspace. Sliding-mode control drives trajectories toward this subspace. The proposed design avoids solving computationally intensive periodic LQR problems and improves robustness to matched disturbances. The methodology is validated through experiments on the Butterfly robot.
CRAFT-E: A Neuro-Symbolic Framework for Embodied Affordance Grounding
Assistive robots operating in unstructured environments must understand not only what objects are, but what they can be used for. This requires grounding language-based action queries to objects that both afford the requested function and can be physically retrieved. Existing approaches often rely on black-box models or fixed affordance labels, limiting transparency, controllability, and reliability for human-facing applications. We introduce CRAFT-E, a modular neuro-symbolic framework that composes a structured verb-property-object knowledge graph with visual-language alignment and energy-based grasp reasoning. The system generates interpretable grounding paths that expose the factors influencing object selection and incorporates grasp feasibility as an integral part of affordance inference. We further construct a benchmark dataset with unified annotations for verb-object compatibility, segmentation, and grasp candidates, and deploy the full pipeline on a physical robot. CRAFT-E achieves competitive performance in static scenes, ImageNet-based functional retrieval, and real-world trials involving 20 verbs and 39 objects. The framework remains robust under perceptual noise and provides transparent, component-level diagnostics. By coupling symbolic reasoning with embodied perception, CRAFT-E offers an interpretable and customizable alternative to end-to-end models for affordance-grounded object selection, supporting trustworthy decision-making in assistive robotic systems.
comment: 20 pages, 3 figures, 4 tables. Under review
Supercomputing for High-speed Avoidance and Reactive Planning in Robots
This paper presents SHARP (Supercomputing for High-speed Avoidance and Reactive Planning), a proof-of-concept study demonstrating how high-performance computing (HPC) can enable millisecond-scale responsiveness in robotic control. While modern robots face increasing demands for reactivity in human-robot shared workspaces, onboard processors are constrained by size, power, and cost. Offloading to HPC offers massive parallelism for trajectory planning, but its feasibility for real-time robotics remains uncertain due to network latency and jitter. We evaluate SHARP in a stress-test scenario where a 7-DOF manipulator must dodge high-speed foam projectiles. Using a parallelized multi-goal A* search implemented with MPI on both local and remote HPC clusters, the system achieves mean planning latencies of 22.9 ms (local) and 30.0 ms (remote, ~300 km away), with avoidance success rates of 84% and 88%, respectively. These results show that when round-trip latency remains within the tens-of-milliseconds regime, HPC-side computation is no longer the bottleneck, enabling avoidance well below human reaction times. The SHARP results motivate hybrid control architectures: low-level reflexes remain onboard for safety, while bursty, high-throughput planning tasks are offloaded to HPC for scalability. By reporting per-stage timing and success rates, this study provides a reproducible template for assessing real-time feasibility of HPC-driven robotics. Collectively, SHARP reframes HPC offloading as a viable pathway toward dependable, reactive robots in dynamic environments.
comment: Error in the graph calculation
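As a rough illustration of the offloading pattern described above, the sketch below splits candidate avoidance goals across MPI ranks, each of which runs a plain A* search on a toy grid; the grid, goal set, and round-robin split are assumptions for illustration, not the SHARP implementation.

```python
# Illustrative sketch of multi-goal A* offloaded via MPI (not the SHARP code).
# Run with, e.g.: mpiexec -n 4 python sharp_sketch.py
import heapq
from mpi4py import MPI

def astar(grid, start, goal):
    """Plain 4-connected A* on an occupancy grid; returns path cost or None if unreachable."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier, g = [(h(start), start)], {start: 0}
    while frontier:
        _, cur = heapq.heappop(frontier)
        if cur == goal:
            return g[cur]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dx, cur[1] + dy)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and not grid[nxt[0]][nxt[1]] and g[cur] + 1 < g.get(nxt, float("inf"))):
                g[nxt] = g[cur] + 1
                heapq.heappush(frontier, (g[nxt] + h(nxt), nxt))
    return None

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
grid = [[0] * 20 for _ in range(20)]           # toy obstacle-free workspace grid
goals = [(19, c) for c in range(20)]           # candidate avoidance goals (hypothetical)
local_best = None
for goal in goals[rank::size]:                 # round-robin split of goals across ranks
    cost = astar(grid, (0, 0), goal)
    if cost is not None and (local_best is None or cost < local_best[0]):
        local_best = (cost, goal)
gathered = comm.gather(local_best, root=0)
if rank == 0:
    cost, goal = min(g for g in gathered if g is not None)
    print(f"cheapest avoidance goal {goal} at cost {cost}")
```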
SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversarial imitation learning has been a highly effective method for learning motion priors from reference motion data. However, adversarial priors, with few exceptions, need to be retrained for each new controller, thereby limiting their reusability and necessitating the retention of the reference motion data when training on downstream tasks. In this work, we present Score-Matching Motion Priors (SMP), which leverages pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable task-agnostic motion priors. SMPs can be pre-trained on a motion dataset, independent of any control policy or task. Once trained, SMPs can be kept frozen and reused as general-purpose reward functions to train policies to produce naturalistic behaviors for downstream tasks. We show that a general motion prior trained on large-scale datasets can be repurposed into a variety of style-specific priors. Furthermore, SMP can compose different styles to synthesize new styles not present in the original dataset. Our method produces high-quality motion comparable to state-of-the-art adversarial imitation learning methods through reusable and modular motion priors. We demonstrate the effectiveness of SMP across a diverse suite of control tasks with physically simulated humanoid characters. Video demo available at https://youtu.be/ravlZJteS20
comment: 14 pages, 9 figures
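The sketch below illustrates the general idea of using a frozen, pre-trained noise-prediction model as a reusable motion reward: a policy-generated motion window that denoises well under the prior scores higher. The network, noise schedule, and the exact reward form are assumptions (an SDS-flavored denoising error), not the SMP formulation.

```python
# Hedged sketch of a score-distillation-style motion reward (illustrative, not the SMP code).
import torch
import torch.nn as nn

class NoisePredictor(nn.Module):
    """Stand-in for a pre-trained motion diffusion model's epsilon-network."""
    def __init__(self, motion_dim=60):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(motion_dim + 1, 128), nn.SiLU(),
                                 nn.Linear(128, motion_dim))

    def forward(self, noisy_motion, t):
        return self.net(torch.cat([noisy_motion, t], dim=-1))

eps_model = NoisePredictor().eval()            # assume pre-trained on reference motion, kept frozen
alphas = torch.linspace(0.99, 0.90, 100)       # toy noise schedule (assumption)

@torch.no_grad()
def smp_style_reward(motion_window):
    """Reward = negative denoising error at a random diffusion step (SDS-flavored)."""
    t = torch.randint(0, 100, (motion_window.shape[0], 1))
    a_bar = alphas[t]
    eps = torch.randn_like(motion_window)
    noisy = a_bar.sqrt() * motion_window + (1 - a_bar).sqrt() * eps
    pred = eps_model(noisy, t.float() / 100.0)
    return -((pred - eps) ** 2).mean(dim=-1)   # per-sample reward fed to the RL objective

rewards = smp_style_reward(torch.randn(8, 60)) # 8 simulated motion windows
print(rewards.shape)                           # torch.Size([8])
```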
MP1: MeanFlow Tames Policy Learning in 1-step for Robotic Manipulation AAAI 2026
In robot manipulation, robot learning has become a prevailing approach. However, generative models within this field face a fundamental trade-off between the slow, iterative sampling of diffusion models and the architectural constraints of faster Flow-based methods, which often rely on explicit consistency losses. To address these limitations, we introduce MP1, which pairs 3D point-cloud inputs with the MeanFlow paradigm to generate action trajectories in one network function evaluation (1-NFE). By directly learning the interval-averaged velocity via the "MeanFlow Identity", our policy avoids any additional consistency constraints. This formulation eliminates numerical ODE-solver errors during inference, yielding more precise trajectories. MP1 further incorporates CFG for improved trajectory controllability while retaining 1-NFE inference without reintroducing structural constraints. Because subtle scene-context variations are critical for robot learning, especially in few-shot learning, we introduce a lightweight Dispersive Loss that repels state embeddings during training, boosting generalization without slowing inference. We validate our method on the Adroit and Meta-World benchmarks, as well as in real-world scenarios. Experimental results show MP1 achieves superior average task success rates, outperforming DP3 by 10.2% and FlowPolicy by 7.3%. Its average inference time is only 6.8 ms, 19x faster than DP3 and nearly 2x faster than FlowPolicy. Our project page is available at https://mp1-2254.github.io/, and the code can be accessed at https://github.com/LogSSim/MP1.
comment: This paper has been accepted by AAAI 2026
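For intuition about the 1-NFE sampling idea, here is a minimal sketch of MeanFlow-style one-step action generation: a network predicting the interval-averaged velocity maps pure noise at t=1 to an action at r=0 in a single evaluation. The network architecture, conditioning, and dimensions are assumptions rather than MP1's actual design.

```python
# Minimal sketch of MeanFlow-style 1-NFE action sampling (illustrative only).
import torch
import torch.nn as nn

class MeanVelocityNet(nn.Module):
    """Predicts the interval-averaged velocity u(z, r, t | obs)."""
    def __init__(self, act_dim=7, obs_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(act_dim + obs_dim + 2, 256), nn.SiLU(),
                                 nn.Linear(256, act_dim))

    def forward(self, z, obs, r, t):
        return self.net(torch.cat([z, obs, r, t], dim=-1))

u_net = MeanVelocityNet()  # assume trained with the MeanFlow identity on demonstrations

@torch.no_grad()
def sample_action(obs):
    """One network function evaluation: move from pure noise (t=1) to data (r=0)."""
    z1 = torch.randn(obs.shape[0], 7)                      # noise at t = 1
    r = torch.zeros(obs.shape[0], 1)
    t = torch.ones(obs.shape[0], 1)
    u = u_net(z1, obs, r, t)                               # interval-averaged velocity
    return z1 - (t - r) * u                                # z_r = z_t - (t - r) * u

action = sample_action(torch.randn(4, 128))                # e.g. encoded point-cloud features
print(action.shape)                                        # torch.Size([4, 7])
```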
DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception
Robust semantic perception for autonomous vehicles relies on effectively combining multiple sensors with complementary strengths and weaknesses. State-of-the-art sensor fusion approaches to semantic perception often treat sensor data uniformly across the spatial extent of the input, which hinders performance when faced with challenging conditions. By contrast, we propose a novel depth-guided multimodal fusion method that upgrades condition-aware fusion by integrating depth information. Our network, DGFusion, poses multimodal segmentation as a multi-task problem, utilizing the lidar measurements, which are typically available in outdoor sensor suites, both as one of the model's inputs and as ground truth for learning depth. Our corresponding auxiliary depth head helps to learn depth-aware features, which are encoded into spatially varying local depth tokens that condition our attentive cross-modal fusion. Together with a global condition token, these local depth tokens dynamically adapt sensor fusion to the spatially varying reliability of each sensor across the scene, which largely depends on depth. In addition, we propose a robust loss for our depth, which is essential for learning from lidar inputs that are typically sparse and noisy in adverse conditions. Our method achieves state-of-the-art panoptic and semantic segmentation performance on the challenging MUSES and DeLiVER datasets. Code and models will be available at https://github.com/timbroed/DGFusion
comment: Code and models will be available at https://github.com/timbroed/DGFusion
Anti-bullying Adaptive Cruise Control: A proactive right-of-way protection approach
Adaptive Cruise Control (ACC) systems have been widely commercialized in recent years. However, existing ACC systems remain vulnerable to close-range cut-ins, a behavior that resembles "road bullying". To address this issue, this research proposes an Anti-bullying Adaptive Cruise Control (AACC) approach, which is capable of proactively protecting right-of-way against such "road bullying" cut-ins. To handle diverse "road bullying" cut-in scenarios smoothly, the proposed approach first leverages an online Inverse Optimal Control (IOC) based algorithm for individual driving style identification. Then, based on Stackelberg competition, a game-theoretic-based motion planning framework is presented in which the identified individual driving styles are utilized to formulate cut-in vehicles' reaction functions. By integrating such reaction functions into the ego vehicle's motion planning, the ego vehicle could consider cut-in vehicles' all possible reactions to find its optimal right-of-way protection maneuver. To the best of our knowledge, this research is the first to model vehicles' interaction dynamics and develop an interactive planner that adapts to cut-in vehicles' various driving styles. Simulation results show that the proposed approach can prevent "road bullying" cut-ins and adapt to different cut-in vehicles' driving styles. It improves safety and comfort by up to 79.8% and 20.4%, respectively, and traffic-flow efficiency by up to 19.33%. The proposed approach can also adopt more flexible driving strategies. Furthermore, it supports real-time field implementation with a computation time of less than 50 milliseconds.
comment: 15 pages, 19 figures
HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving
Realistic and controllable simulation is critical for advancing end-to-end autonomous driving, yet existing approaches often struggle to support novel view synthesis under large viewpoint changes or to ensure geometric consistency. We introduce HybridWorldSim, a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents. This unified design addresses key limitations of previous methods, enabling the creation of diverse and high-fidelity driving scenarios with reliable visual and spatial consistency. To facilitate robust benchmarking, we further release a new multi-traversal dataset MIRROR that captures a wide range of routes and environmental conditions across different cities. Extensive experiments demonstrate that HybridWorldSim surpasses previous state-of-the-art methods, providing a practical and scalable solution for high-fidelity simulation and a valuable resource for research and development in autonomous driving.
comment: Project page: https://hybridworldsim.github.io/
SwarmDiffusion: End-To-End Traversability-Guided Diffusion for Embodiment-Agnostic Navigation of Heterogeneous Robots
Visual traversability estimation is critical for autonomous navigation, but existing VLM-based methods rely on hand-crafted prompts, generalize poorly across embodiments, and output only traversability maps, leaving trajectory generation to slow external planners. We propose SwarmDiffusion, a lightweight end-to-end diffusion model that jointly predicts traversability and generates a feasible trajectory from a single RGB image. To remove the need for annotated or planner-produced paths, we introduce a planner-free trajectory construction pipeline based on randomized waypoint sampling, Bezier smoothing, and regularization enforcing connectivity, safety, directionality, and path thinness. This enables learning stable motion priors without demonstrations. SwarmDiffusion leverages VLM-derived supervision without prompt engineering and conditions the diffusion process on a compact embodiment state, producing physically consistent, traversable paths that transfer across different robot platforms. Across indoor environments and two embodiments (quadruped and aerial), the method achieves 80-100% navigation success and 0.09s inference, and adapts to a new robot using only about 500 additional visual samples. It generalizes reliably to unseen environments in simulation and real-world trials, offering a scalable, prompt-free approach to unified traversability reasoning and trajectory generation.
comment: This work has been submitted for publication and is currently under review
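The following sketch illustrates the planner-free trajectory construction step in its simplest form: random waypoints in image space smoothed by a Bezier curve. The image size, number of waypoints, and start point are made-up values; the regularization terms mentioned in the abstract are omitted.

```python
# Illustrative sketch of random-waypoint sampling plus Bezier smoothing (details are assumptions).
import numpy as np
from math import comb

def bezier_curve(control_points, n_samples=50):
    """Evaluate a Bezier curve defined by (k+1) control points at n_samples parameters."""
    pts = np.asarray(control_points, dtype=float)
    k = len(pts) - 1
    ts = np.linspace(0.0, 1.0, n_samples)
    # Bernstein basis: B_{i,k}(t) = C(k, i) * t^i * (1-t)^(k-i)
    basis = np.stack([comb(k, i) * ts**i * (1 - ts)**(k - i) for i in range(k + 1)], axis=1)
    return basis @ pts                                  # (n_samples, 2) smooth pixel path

rng = np.random.default_rng(0)
start = np.array([240.0, 470.0])                        # e.g. bottom-center of an RGB frame
waypoints = [start] + [rng.uniform([0, 0], [480, 480]) for _ in range(3)]
path = bezier_curve(waypoints)
print(path[0], path[-1])                                # starts at `start`, ends at the last waypoint
```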
DynamicCity: Large-Scale 4D Occupancy Generation from Dynamic Scenes ICLR 2025
Urban scene generation has been developing rapidly in recent years. However, existing methods primarily focus on generating static and single-frame scenes, overlooking the inherently dynamic nature of real-world driving environments. In this work, we introduce DynamicCity, a novel 4D occupancy generation framework capable of generating large-scale, high-quality dynamic 4D scenes with semantics. DynamicCity mainly consists of two key models. 1) A VAE model for learning HexPlane as the compact 4D representation. Instead of using naive averaging operations, DynamicCity employs a novel Projection Module to effectively compress 4D features into six 2D feature maps for HexPlane construction, which significantly enhances HexPlane fitting quality (up to 12.56 mIoU gain). Furthermore, we utilize an Expansion & Squeeze Strategy to reconstruct 3D feature volumes in parallel, which improves both network training efficiency and reconstruction accuracy compared to naively querying each 3D point (up to 7.05 mIoU gain, 2.06x training speedup, and 70.84% memory reduction). 2) A DiT-based diffusion model for HexPlane generation. To make HexPlane feasible for DiT generation, a Padded Rollout Operation is proposed to reorganize all six feature planes of the HexPlane as a square 2D feature map. In particular, various conditions can be introduced in the diffusion or sampling process, supporting versatile 4D generation applications, such as trajectory- and command-driven generation, inpainting, and layout-conditioned generation. Extensive experiments on the CarlaSC and Waymo datasets demonstrate that DynamicCity significantly outperforms existing state-of-the-art 4D occupancy generation methods across multiple metrics. The code and models have been released to facilitate future research.
comment: ICLR 2025 Spotlight; 35 pages, 18 figures, 15 tables; Project Page at https://dynamic-city.github.io/
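To illustrate what a padded-rollout style packing can look like, the sketch below zero-pads six 2D feature planes of different sizes and tiles them into one rectangular feature map that a 2D backbone can consume. The 2x3 tiling and plane shapes are assumptions, not DynamicCity's exact layout.

```python
# Hedged sketch of packing six HexPlane feature maps into one 2D canvas (illustrative only).
import numpy as np

def padded_rollout(planes):
    """planes: list of six arrays shaped (C, h_i, w_i) -> one (C, 2*H, 3*W) canvas."""
    C = planes[0].shape[0]
    H = max(p.shape[1] for p in planes)
    W = max(p.shape[2] for p in planes)
    canvas = np.zeros((C, 2 * H, 3 * W), dtype=planes[0].dtype)
    for idx, p in enumerate(planes):
        r, c = divmod(idx, 3)                       # row/column cell in the 2x3 tiling
        canvas[:, r * H:r * H + p.shape[1], c * W:c * W + p.shape[2]] = p
    return canvas

C, X, Y, Z, T = 8, 32, 32, 16, 4
hexplane = [np.random.randn(C, X, Y), np.random.randn(C, X, Z), np.random.randn(C, Y, Z),
            np.random.randn(C, X, T), np.random.randn(C, Y, T), np.random.randn(C, Z, T)]
rolled = padded_rollout(hexplane)
print(rolled.shape)                                 # (8, 64, 96)
```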
Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits
This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicles (AVs) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories, subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. Built on this base, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may render multiple solutions, hence the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
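As a small worked example of the describing-function idea used above, the snippet below evaluates the classical DF of a unit-slope saturation nonlinearity for a sinusoidal input of amplitude A; this is the standard textbook first-harmonic formula, included only to illustrate the approximation, not the paper's AV-specific derivation.

```python
# Describing function of a unit-slope saturation with limit `a` (textbook first-harmonic gain).
import numpy as np

def saturation_df(A, a=1.0):
    """N(A) = 1 for A <= a; otherwise (2/pi) * (arcsin(a/A) + (a/A) * sqrt(1 - (a/A)^2))."""
    A = np.asarray(A, dtype=float)
    ratio = np.clip(a / A, 0.0, 1.0)
    gain = (2.0 / np.pi) * (np.arcsin(ratio) + ratio * np.sqrt(1.0 - ratio**2))
    return np.where(A <= a, 1.0, gain)

amplitudes = np.array([0.5, 1.0, 2.0, 5.0])
print(saturation_df(amplitudes))   # the gain falls below 1 once the amplitude exceeds the limit
```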
LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving
Recent advancements in vision foundation models (VFMs) have revolutionized visual perception in 2D, yet their potential for 3D scene understanding, particularly in autonomous driving applications, remains underexplored. In this paper, we introduce LargeAD, a versatile and scalable framework designed for large-scale 3D pretraining across diverse real-world driving datasets. Our framework leverages VFMs to extract semantically rich superpixels from 2D images, which are aligned with LiDAR point clouds to generate high-quality contrastive samples. This alignment facilitates cross-modal representation learning, enhancing the semantic consistency between 2D and 3D data. We introduce several key innovations: (i) VFM-driven superpixel generation for detailed semantic representation, (ii) a VFM-assisted contrastive learning strategy to align multimodal features, (iii) superpoint temporal consistency to maintain stable representations across time, and (iv) multi-source data pretraining to generalize across various LiDAR configurations. Our approach achieves substantial gains over state-of-the-art methods in linear probing and fine-tuning for LiDAR-based segmentation and object detection. Extensive experiments on 11 large-scale multi-sensor datasets highlight our superior performance, demonstrating adaptability, efficiency, and robustness in real-world autonomous driving scenarios.
comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
3D and 4D World Modeling: A Survey
World modeling has become a cornerstone in AI research, enabling agents to understand, represent, and predict the dynamic environments they inhabit. While prior work largely emphasizes generative methods for 2D image and video data, they overlook the rapidly growing body of work that leverages native 3D and 4D representations such as RGB-D imagery, occupancy grids, and LiDAR point clouds for large-scale scene modeling. At the same time, the absence of a standardized definition and taxonomy for ``world models'' has led to fragmented and sometimes inconsistent claims in the literature. This survey addresses these gaps by presenting the first comprehensive review explicitly dedicated to 3D and 4D world modeling and generation. We establish precise definitions, introduce a structured taxonomy spanning video-based (VideoGen), occupancy-based (OccGen), and LiDAR-based (LiDARGen) approaches, and systematically summarize datasets and evaluation metrics tailored to 3D/4D settings. We further discuss practical applications, identify open challenges, and highlight promising research directions, aiming to provide a coherent and foundational reference for advancing the field. A systematic summary of existing literature is available at https://github.com/worldbench/awesome-3d-4d-world-models
comment: Survey; 50 pages, 10 figures, 14 tables; GitHub Repo at https://github.com/worldbench/awesome-3d-4d-world-models
Robust Deterministic Policy Gradient for Disturbance Attenuation and Its Application to Quadrotor Control
This paper presents a robust reinforcement learning algorithm called robust deterministic policy gradient (RDPG), which reformulates the H-infinity control problem as a two-player zero-sum dynamic game between a user and an adversary. The method combines deterministic policy gradients with deep reinforcement learning to train a robust policy that attenuates disturbances efficiently. A practical variant, robust deep deterministic policy gradient (RDDPG), integrates twin-delayed updates for stability and sample efficiency. Experiments on an unmanned aerial vehicle demonstrate superior robustness and tracking accuracy under severe disturbance conditions.
comment: 8 pages
Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols
Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic manipulation, yet they remain limited in failure diagnosis and learning from failures. Additionally, existing failure datasets are mostly generated programmatically in simulation, which limits their generalization to the real world. In light of these limitations, we introduce ViFailback, a framework designed to diagnose robotic manipulation failures and provide both textual and visual correction guidance. Our framework utilizes explicit visual symbols to enhance annotation efficiency. We further release the ViFailback dataset, a large-scale collection of 58,126 Visual Question Answering (VQA) pairs along with their corresponding 5,202 real-world manipulation trajectories. Based on the dataset, we establish ViFailback-Bench, a benchmark of 11 fine-grained VQA tasks designed to assess the failure diagnosis and correction abilities of Vision-Language Models (VLMs), featuring ViFailback-Bench Lite for closed-ended and ViFailback-Bench Hard for open-ended evaluation. To demonstrate the effectiveness of our framework, we built the ViFailback-8B VLM, which not only achieves significant overall performance improvement on ViFailback-Bench but also generates visual symbols for corrective action guidance. Finally, by integrating ViFailback-8B with a VLA model, we conduct real-world robotic experiments demonstrating its ability to assist the VLA model in recovering from failures. Project Website: https://x1nyuzhou.github.io/vifailback.github.io/
FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction
Robotic manipulation is a fundamental component of automation. However, traditional perception-planning pipelines often fall short in open-ended tasks due to limited flexibility, while the architecture of a single end-to-end Vision-Language-Action (VLA) offers promising capabilities but lacks crucial mechanisms for anticipating and recovering from failure. To address these challenges, we propose FPC-VLA, a dual-model framework that integrates VLA with a supervisor for failure prediction and correction. The supervisor evaluates action viability through vision-language queries and generates corrective strategies when risks arise, trained efficiently without manual labeling. A dual-stream fusion module further refines actions by leveraging past predictions. Evaluation results on multiple simulation platforms (SIMPLER and LIBERO) and robot embodiments (WidowX, Google Robot, Franka) show that FPC-VLA outperforms state-of-the-art models in both zero-shot and fine-tuned settings. Successful real-world deployments on diverse, long-horizon tasks confirm FPC-VLA's strong generalization and practical utility for building more reliable autonomous systems.
Quaternion-Based Sliding Mode Control for Six Degrees of Freedom Flight Control of Quadrotors
Despite extensive research on sliding mode control (SMC) design for quadrotors, the existing approaches suffer from certain limitations. Euler angle-based SMC formulations suffer from poor performance in high-pitch or -roll maneuvers. Quaternion-based SMC approaches have unwinding issues and complex architecture. Coordinate-free methods are slow and only almost globally stable. This paper presents a new six degrees of freedom SMC flight controller to address the above limitations. We use a cascaded architecture with a position controller in the outer loop and a quaternion-based attitude controller in the inner loop. The position controller generates the desired trajectory for the attitude controller using a coordinate-free approach. The quaternion-based attitude controller uses the natural characteristics of the quaternion hypersphere, featuring a simple structure while providing global stability and avoiding unwinding issues. We compare our controller with three other common control methods conducting challenging maneuvers like flip-over and high-speed trajectory tracking in the presence of model uncertainties and disturbances. Our controller consistently outperforms the benchmark approaches with less control effort and actuator saturation, offering highly effective and efficient flight control.
AugMapNet: Improving Spatial Latent Structure via BEV Grid Augmentation for Enhanced Vectorized Online HD Map Construction WACV 2026
Autonomous driving requires understanding infrastructure elements, such as lanes and crosswalks. To navigate safely, this understanding must be derived from sensor data in real-time and needs to be represented in vectorized form. Learned Bird's-Eye View (BEV) encoders are commonly used to combine a set of camera images from multiple views into one joint latent BEV grid. Traditionally, from this latent space, an intermediate raster map is predicted, providing dense spatial supervision but requiring post-processing into the desired vectorized form. More recent models directly derive infrastructure elements as polylines using vectorized map decoders, providing instance-level information. Our approach, Augmentation Map Network (AugMapNet), proposes latent BEV feature grid augmentation, a novel technique that significantly enhances the latent BEV representation. AugMapNet combines vector decoding and dense spatial supervision more effectively than existing architectures while remaining easy to integrate compared to other hybrid approaches. It additionally benefits from extra processing on its latent BEV features. Experiments on nuScenes and Argoverse2 datasets demonstrate significant improvements on vectorized map prediction of up to 13.3% over the StreamMapNet baseline on 60 m range and greater improvements on larger ranges. We confirm transferability by applying our method to another baseline, SQD-MapNet, and find similar improvements. A detailed analysis of the latent BEV grid confirms a more structured latent space of AugMapNet and shows the value of our novel concept beyond pure performance improvement. The code can be found at https://github.com/tmonnin/augmapnet
comment: Accepted to 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2026)
Efficient Preference-Based Reinforcement Learning: Randomized Exploration Meets Experimental Design
We study reinforcement learning from human feedback in general Markov decision processes, where agents learn from trajectory-level preference comparisons. A central challenge in this setting is to design algorithms that select informative preference queries to identify the underlying reward while ensuring theoretical guarantees. We propose a meta-algorithm based on randomized exploration, which avoids the computational challenges associated with optimistic approaches and remains tractable. We establish both regret and last-iterate guarantees under mild reinforcement learning oracle assumptions. To improve query complexity, we introduce and analyze an improved algorithm that collects batches of trajectory pairs and applies optimal experimental design to select informative comparison queries. The batch structure also enables parallelization of preference queries, which is relevant in practical deployment as feedback can be gathered concurrently. Empirical evaluation confirms that the proposed method is competitive with reward-based reinforcement learning while requiring a small number of preference queries.
Multiagent Systems
SRPG: Semantically Reconstructed Privacy Guard for Zero-Trust Privacy in Educational Multi-Agent Systems
Multi-Agent Systems (MAS) with large language models (LLMs) enable personalized education but risk leaking minors' personally identifiable information (PII) via unstructured dialogue. Existing privacy methods struggle to balance security and utility: role-based access control fails on unstructured text, while naive masking destroys pedagogical context. We propose SRPG, a privacy guard for educational MAS, using a Dual-Stream Reconstruction Mechanism: a strict sanitization stream ensures zero PII leakage, and a context reconstruction stream (LLM-driven) recovers mathematical logic. This decouples instructional content from private data, preserving teaching efficacy. Tests on MathDial show SRPG works across models; with GPT-4o, it achieves 0.0000 Attack Success Rate (ASR) (zero leakage) and 0.8267 Exact Match, far outperforming the zero-trust Pure LLM baseline (0.2138). SRPG effectively protects minors' privacy without sacrificing mathematical instructional quality.
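A highly simplified sketch of the dual-stream idea follows: a strict sanitization stream masks PII with typed placeholders, and a second stream re-expresses the pedagogical content. The regex patterns are toy examples, and `reconstruct_context` is a stub standing in for the LLM-driven reconstruction stream.

```python
# Simplified dual-stream privacy-guard sketch (illustrative, not the SRPG implementation).
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "NAME":  re.compile(r"\bMy name is ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
}

def sanitize(text):
    """Strict stream: replace every PII match with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def reconstruct_context(sanitized):
    """Stub for the LLM-driven stream that would re-express the pedagogical content."""
    return sanitized.replace("[NAME]", "the student")

msg = "My name is Alice Smith, email alice@example.com. I don't get why 3x + 5 = 11 gives x = 2."
print(reconstruct_context(sanitize(msg)))   # PII is gone, the math question is preserved
```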
Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks AAAI 2026
Despite recent advances, autonomous agents often struggle to solve complex tasks in enterprise domains that require coordinating multiple tools and processing diverse data sources. This struggle is driven by two main limitations. First, single-agent architectures enforce a monolithic plan-execute loop, which directly causes trajectory instability. Second, the requirement to use local open-weight models for data privacy introduces smaller context windows, leading to the rapid consumption of context from large tool outputs. To solve these problems, we introduce RP-ReAct (Reasoner Planner-ReAct), a novel multi-agent approach that fundamentally decouples strategic planning from low-level execution to achieve superior reliability and efficiency. RP-ReAct consists of a Reasoner Planner Agent (RPA), responsible for planning each sub-step and continuously analysing the execution results using the strong reasoning capabilities of a Large Reasoning Model, and one or more Proxy-Execution Agents (PEAs) that translate sub-steps into concrete tool interactions using a ReAct approach. Crucially, we incorporate a context-saving strategy within the PEA to mitigate context window overflow by managing large tool outputs via external storage and on-demand access. We evaluate RP-ReAct on the challenging, multi-domain ToolQA benchmark using a diverse set of six open-weight reasoning models. Our empirical results show that RP-ReAct achieves superior performance and improved generalization ability over state-of-the-art baselines when addressing diverse complex tasks across the evaluated domains. Furthermore, we establish the enhanced robustness and stability of our approach across different model scales, paving the way for effective and deployable agentic solutions for enterprises.
comment: 11 pages, 1 figure, 2 tables, Workshop AAAI 2026 agentic AI Benchmarks and Applications for Enterprise Tasks
Multi-Agent Reinforcement Learning with Communication-Constrained Priors
Communication is an effective means of improving the learning of cooperative policies in multi-agent systems. However, in most real-world scenarios, lossy communication is a prevalent issue. Existing multi-agent reinforcement learning with communication, due to its limited scalability and robustness, struggles to apply to complex and dynamic real-world environments. To address these challenges, we propose a generalized communication-constrained model to uniformly characterize communication conditions across different scenarios. Based on this, we utilize it as a learning prior to distinguish between lossy and lossless messages for specific scenarios. Additionally, we decouple the impact of lossy and lossless messages on distributed decision-making, drawing on a dual mutual information estimator, and introduce a communication-constrained multi-agent reinforcement learning framework, quantifying the impact of communication messages into the global reward. Finally, we validate the effectiveness of our approach across several communication-constrained benchmarks.
Market share maximizing strategies of CAV fleet operators may cause chaos in our cities
We study the dynamics and equilibria of a new kind of routing game, where players - drivers of future autonomous vehicles - may switch between individual (HDV) and collective (CAV) routing. In individual routing, just like today, drivers select routes minimizing expected travel costs, whereas in collective routing an operator centrally assigns vehicles to routes. The utility is then the average experienced travel time discounted with individually perceived attractiveness of automated driving. The market share maximising strategy amounts to offering utility greater than for individual routing to as many drivers as possible. Our theoretical contribution consists in developing a rigorous mathematical framework of individualized collective routing and studying algorithms which fleets of CAVs may use for their market-share optimization. We also define bi-level CAV - HDV equilibria and derive conditions which link the potential marketing behaviour of CAVs to the behavioural profile of the human population. Practically, we find that the fleet operator may often be able to equilibrate at full market share by simply mimicking the choices HDVs would make. In more realistic heterogeneous human population settings, however, we discover that the market-share maximizing fleet controller should use highly variable mixed strategies as a means to attract or retain customers. The reason is that in mixed routing the powerful group player can control which vehicles are routed via congested and uncongested alternatives. The congestion pattern generated by CAVs is, however, not known to HDVs before departure and so HDVs cannot select faster routes and face huge uncertainty whichever alternative they choose. Consequently, mixed market-share maximising fleet strategies resulting in unpredictable day-to-day driving conditions may, alarmingly, become pervasive in our future cities.
comment: 34 pages, 9 figures
Modal Logical Neural Networks
We propose Modal Logical Neural Networks (MLNNs), a neurosymbolic framework that integrates deep learning with the formal semantics of modal logic, enabling reasoning about necessity and possibility. Drawing on Kripke semantics, we introduce specialized neurons for the modal operators $\Box$ and $\Diamond$ that operate over a set of possible worlds, enabling the framework to act as a differentiable "logical guardrail." The architecture is highly flexible: the accessibility relation between worlds can either be fixed by the user to enforce known rules or, as an inductive feature, be parameterized by a neural network. This allows the model to optionally learn the relational structure of a logical system from data while simultaneously performing deductive reasoning within that structure. This versatile construction is designed for flexibility. The entire framework is differentiable from end to end, with learning driven by minimizing a logical contradiction loss. This not only makes the system resilient to inconsistent knowledge but also enables it to learn nonlinear relationships that can help define the logic of a problem space. We illustrate MLNNs on four case studies: grammatical guardrailing, axiomatic detection of the unknown, multi-agent epistemic trust, and detecting constructive deception in natural language negotiation. These experiments demonstrate how enforcing or learning accessibility can increase logical consistency and interpretability without changing the underlying task architecture.
comment: 27 pages, 10 figures, 7 tables
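For a concrete picture of differentiable modal operators over possible worlds, here is a minimal sketch using a product t-norm relaxation of $\Box$ and its dual $\Diamond$, with a soft (potentially learnable) accessibility matrix; the specific relaxation and the contradiction-style penalty are assumptions, not the paper's exact neurons.

```python
# Hedged sketch of differentiable Box/Diamond operators over possible worlds (illustrative only).
# Truth values live in [0, 1]; accessibility is a soft W x W matrix in [0, 1].
import torch

def box(truth, access):
    """Box p at world w: soft conjunction of p(v) over worlds v accessible from w.
    Non-accessible worlds contribute a factor near 1, so they do not constrain Box."""
    return torch.prod(1.0 - access * (1.0 - truth.unsqueeze(0)), dim=1)

def diamond(truth, access):
    """Diamond p = not Box (not p): some accessible world where p holds."""
    return 1.0 - box(1.0 - truth, access)

W = 3
truth = torch.tensor([0.9, 0.2, 0.8])                        # degree to which p holds per world
access_logits = torch.zeros(W, W, requires_grad=True)        # learnable accessibility (assumption)
access = torch.sigmoid(access_logits)

print(box(truth, access))      # high only where all (softly) accessible worlds satisfy p
print(diamond(truth, access))  # high where some accessible world satisfies p
loss = (1.0 - box(truth, access)[0]) ** 2                    # e.g. a contradiction-style penalty
loss.backward()                                              # gradients flow into accessibility
```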
ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms
Bridging the gap between theoretical conceptualization and computational implementation is a major bottleneck in Scientific Computing (SciC) and Scientific Machine Learning (SciML). We introduce ATHENA (Agentic Team for Hierarchical Evolutionary Numerical Algorithms), an agentic framework designed as an Autonomous Lab to manage the end-to-end computational research lifecycle. Its core is the HENA loop, a knowledge-driven diagnostic process framed as a Contextual Bandit problem. Acting as an online learner, the system analyzes prior trials to select structural "actions" ($A_n$) from combinatorial spaces guided by expert blueprints (e.g., Universal Approximation, Physics-Informed constraints). These actions are translated into executable code ($S_n$) to generate scientific rewards ($R_n$). ATHENA transcends standard automation: in SciC, it autonomously identifies mathematical symmetries for exact analytical solutions or derives stable numerical solvers where foundation models fail. In SciML, it performs deep diagnosis to tackle ill-posed formulations and combines hybrid symbolic-numeric workflows (e.g., coupling PINNs with FEM) to resolve multiphysics problems. The framework achieves super-human performance, reaching validation errors of $10^{-14}$. Furthermore, collaborative "human-in-the-loop" intervention allows the system to bridge stability gaps, improving results by an order of magnitude. This paradigm shifts the focus from implementation mechanics to methodological innovation, accelerating scientific discovery.
AsymPuzl: An Asymmetric Puzzle for multi-agent cooperation NeurIPS
Large Language Model (LLM) agents are increasingly studied in multi-turn, multi-agent scenarios, yet most existing setups emphasize open-ended role-play rather than controlled evaluation. We introduce AsymPuzl, a minimal but expressive two-agent puzzle environment designed to isolate communication under information asymmetry. Each agent observes complementary but incomplete views of a symbolic puzzle and must exchange messages to solve it cooperatively. Using a diverse set of current-generation and open-source LLMs, we show that (i) strong models such as GPT-5 and Claude-4.0 reliably converge on the solution across puzzle sizes by sharing complete information in two turns, (ii) weaker models often ignore partner messages or over-correct their hypotheses, and (iii) feedback design is non-trivial: simple self-feedback improves success rates, while detailed joint feedback can hurt performance. These findings show that even in simple cooperative tasks, LLM communication strategies diverge and depend on the granularity of feedback signals. AsymPuzl thus provides a testbed for probing the limits of multi-turn cooperation and opens avenues for studying coordination mechanisms.
comment: Accepted at NeurIPS MTI-LLM 2025
A Monad-Based Clause Architecture for Artificial Age Score (AAS) in Large Language Models
Large language models (LLMs) are often deployed as powerful yet opaque systems, leaving open how their internal memory and "self-like" behavior should be governed in a principled and auditable way. The Artificial Age Score (AAS) was previously introduced and mathematically justified through three theorems that characterise it as a metric of artificial memory aging. Building on this foundation, the present work develops an engineering-oriented, clause-based architecture that imposes law-like constraints on LLM memory and control. Twenty selected monads from Leibniz's Monadology are grouped into six bundles: ontology, dynamics, representation and consciousness, harmony and reason, body and organisation, and teleology, and each bundle is realised as an executable specification on top of the AAS kernel. Across six minimal Python implementations, these clause families are instantiated in numerical experiments acting on channel-level quantities such as recall scores, redundancy, and weights. Each implementation follows a four-step pattern: inputs and setup, clause implementation, numerical results, and implications for LLM design, emphasising that the framework is not only philosophically motivated but also directly implementable. The experiments show that the clause system exhibits bounded and interpretable behavior: AAS trajectories remain continuous and rate-limited, contradictions and unsupported claims trigger explicit penalties, and hierarchical refinement reveals an organic structure in a controlled manner. Dual views and goal-action pairs are aligned by harmony terms, and windowed drift in perfection scores separates sustained improvement from sustained degradation. Overall, the monad-based clause framework uses AAS as a backbone and provides a transparent, code-level blueprint for constraining and analyzing internal dynamics in artificial agents.
comment: 42 pages, 6 toy simulation Python implementations, 20 monad clauses instantiated across six system bundles (ontology, dynamics, representation and consciousness, harmony and reason, body and organisation, teleology)
Welfare and Cost Aggregation for Multi-Agent Control: When to Choose Which Social Cost Function, and Why?
Many multi-agent socio-technical systems rely on aggregating heterogeneous agents' costs into a social cost function (SCF) to coordinate resource allocation in domains like energy grids, water allocation, or traffic management. The choice of SCF often entails implicit assumptions and may lead to undesirable outcomes if not rigorously justified. In this paper, we demonstrate that what determines which SCF ought to be used is the degree to which individual costs can be compared across agents and which axioms the aggregation shall fulfill. Drawing on the results from social choice theory, we provide guidance on how this process can be used in control applications. We demonstrate which assumptions about interpersonal utility comparability - ranging from ordinal level comparability to full cardinal comparability - together with a choice of desirable axioms, inform the selection of a correct SCF, be it the classical utilitarian sum, the Nash SCF, or maximin. We then demonstrate how the proposed framework can be applied for principled allocations of water and transportation resources.
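As a tiny illustration of how the choice of social cost function changes the selected allocation, the snippet below scores three made-up allocations for three agents under the utilitarian, Nash, and maximin SCFs; the numbers are fabricated purely for the example and the comparability remarks in the comments summarize standard social-choice facts.

```python
# Three social cost functions evaluated on illustrative per-agent utilities.
import numpy as np

def utilitarian(u): return u.sum()          # requires cardinal, unit-comparable utilities
def nash(u):        return np.prod(u)       # invariant to individual rescaling (u > 0)
def maximin(u):     return u.min()          # needs only ordinal level comparability

candidates = {                               # utility of 3 agents under 3 candidate allocations
    "A": np.array([10.0, 1.0, 1.0]),
    "B": np.array([5.0, 4.0, 1.5]),
    "C": np.array([3.0, 3.0, 3.0]),
}
for name, scf in [("utilitarian", utilitarian), ("Nash", nash), ("maximin", maximin)]:
    best = max(candidates, key=lambda k: scf(candidates[k]))
    print(f"{name:12s} picks allocation {best}")
# The three SCFs disagree here: utilitarian picks A, Nash picks B, maximin picks C,
# which is exactly why the comparability assumptions and axioms matter.
```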
VICoT-Agent: A Vision-Interleaved Chain-of-Thought Framework for Interpretable Multimodal Reasoning and Scalable Remote Sensing Analysis
Remote sensing image analysis is increasingly evolving from traditional object recognition to complex intelligence reasoning, which places higher requirements on the model's reasoning ability and the flexibility of tool invocation. To this end, we propose a new multimodal agent framework, Vision-Interleaved Chain-of-Thought Framework (VICoT), which implements explicit multi-round reasoning by dynamically incorporating visual tools into the chain of thought. Through a stack-based reasoning structure and a modular MCP-compatible tool suite, VICoT enables LLMs to efficiently perform multi-round, interleaved vision-language reasoning tasks with strong generalization and flexibility. We also propose the Reasoning Stack distillation method to migrate complex Agent behaviors to small, lightweight models, preserving reasoning capability while significantly reducing complexity. Experiments on multiple remote sensing benchmarks demonstrate that VICoT significantly outperforms existing SOTA frameworks in reasoning transparency, execution efficiency, and generation quality.
Structuring Collective Action with LLM-Guided Evolution: From Ill-Structured Problems to Executable Heuristics
Collective action problems, which require aligning individual incentives with collective goals, are classic examples of Ill-Structured Problems (ISPs). For an individual agent, the causal links between local actions and global outcomes are unclear, stakeholder objectives often conflict, and no single, clear algorithm can bridge micro-level choices with macro-level welfare. We present ECHO-MIMIC, a general computational framework that converts this global complexity into a tractable, Well-Structured Problem (WSP) for each agent by discovering executable heuristics and persuasive rationales. The framework operates in two stages: ECHO (Evolutionary Crafting of Heuristics from Outcomes) evolves snippets of Python code that encode candidate behavioral policies, while MIMIC (Mechanism Inference & Messaging for Individual-to-Collective Alignment) evolves companion natural language messages that motivate agents to adopt those policies. Both phases employ a large-language-model-driven evolutionary search: the LLM proposes diverse and context-aware code or text variants, while population-level selection retains those that maximize collective performance in a simulated environment. We demonstrate this framework on two distinct ISPs: a canonical agricultural landscape management problem and a carbon-aware EV charging time slot usage problem. Results show that ECHO-MIMIC discovers high-performing heuristics compared to baselines and crafts tailored messages that successfully align simulated agent behavior with system-level goals. By coupling algorithmic rule discovery with tailored communication, ECHO-MIMIC transforms the cognitive burden of collective action into an implementable set of agent-level instructions, making previously ill-structured problems solvable in practice and opening a new path toward scalable, adaptive policy design.
Multi-Agent Code Verification via Information Theory
LLMs generate buggy code: 29.6% of SWE-bench solved patches fail, 62% of BaxBench solutions have vulnerabilities, and existing tools only catch 65% of bugs with 35% false positives. We built CodeX-Verify, a multi-agent system that uses four specialized agents to detect different types of bugs. We prove mathematically that combining agents with different detection patterns finds more bugs than any single agent when the agents look for different problems, using submodularity of mutual information under conditional independence. Measured agent correlations of rho = 0.05 to 0.25 confirm that they detect different bugs. Testing on 99 code samples with verified labels shows our system catches 76.1% of bugs, matching the best existing method (Meta Prompt Testing: 75%) while running faster and without test execution. We tested all 15 agent combinations and found that using multiple agents improves accuracy by 39.7 percentage points (from 32.8% to 72.4%) compared to single agents, with diminishing returns of +14.9pp, +13.5pp, and +11.2pp for agents 2, 3, and 4, validating our theoretical model. The best two-agent combination (Correctness + Performance) reaches 79.3% accuracy. On 300 real patches from Claude Sonnet 4.5, verification runs in under 200 ms per sample, making this practical for production use.
comment: 18 pages, 3 figures, 9 tables
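The toy simulation below illustrates why combining weakly correlated specialist detectors helps: each synthetic agent catches a different bug type, their pairwise verdict correlations stay low, and the OR-combined verdict has much higher recall than any single agent. The detectors, rates, and data here are synthetic stand-ins, not CodeX-Verify's agents.

```python
# Synthetic illustration of OR-combining weakly correlated specialist detectors.
import numpy as np

rng = np.random.default_rng(0)
n = 600
is_buggy = rng.random(n) < 0.4
bug_type = rng.integers(0, 3, size=n)                    # which specialist "should" catch it

def make_agent(speciality, recall=0.85, fpr=0.05):
    """Toy specialist: catches its own bug type with high recall, other types almost never."""
    catch = (bug_type == speciality) & (rng.random(n) < recall)
    noise = rng.random(n) < fpr
    return np.where(is_buggy, catch | noise, noise)

agents = [make_agent(0), make_agent(1), make_agent(2)]   # e.g. correctness / security / performance
combined = np.any(agents, axis=0)                        # flag a sample if any agent flags it

for i in range(3):
    for j in range(i + 1, 3):
        rho = np.corrcoef(agents[i], agents[j])[0, 1]
        print(f"corr(agent{i}, agent{j}) = {rho:.2f}")
recall = lambda flags: (flags & is_buggy).sum() / is_buggy.sum()
print(f"recall: single {recall(agents[0]):.2f} -> combined {recall(combined):.2f}")
```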
Systems and Control (EESS)
Semi-Markov Decision Process Framework for Age of Incorrect Information Minimization
For a remote estimation system, we study age of incorrect information (AoII), which is a recently proposed semantic-aware freshness metric. In particular, we assume an information source observing a discrete-time finite-state Markov chain (DTMC) and employing push-based transmissions of status update packets towards the monitor, which is tasked with remote estimation of the source. The source-to-monitor channel delay is assumed to have a general discrete-time phase-type (DPH) distribution, whereas the zero-delay reverse channel ensures that the source has perfect information on AoII and the remote estimate. A multi-threshold transmission policy is employed where packet transmissions are initiated when the AoII process exceeds a threshold which may be different for each estimation value. In this general setting, our goal is to minimize the weighted sum of the time average of an arbitrary function of AoII and estimation, and transmission costs, by suitable choice of the thresholds. We formulate the problem as a semi-Markov decision process (SMDP) with the same state-space as the original DTMC to obtain the optimum multi-threshold policy, whereas the parameters of the SMDP are obtained by using a novel stochastic tool called dual-regime absorbing Markov chain (DR-AMC), and its corresponding absorption time distribution named dual-regime DPH (DR-DPH).
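To make the AoII threshold policy concrete, the following is a hedged simulation sketch for a two-state symmetric source with a geometric channel delay (a special case of a discrete phase-type distribution); the cost function, thresholds, and transition probabilities are illustrative choices, not the paper's setting.

```python
# Hedged AoII simulation under a per-estimate threshold transmission policy (illustrative only).
import random

random.seed(0)
p_flip, p_done = 0.1, 0.4        # source transition prob., per-slot delivery prob. (geometric delay)
thresholds = {0: 1, 1: 3}        # the AoII threshold may differ per estimated value
tx_cost, horizon = 1.0, 200_000

source, estimate, aoii = 0, 0, 0
in_flight, payload, total_cost = False, 0, 0.0
for _ in range(horizon):
    if random.random() < p_flip:                 # source evolves
        source ^= 1
    aoii = 0 if source == estimate else aoii + 1 # semantic freshness penalty
    total_cost += aoii                           # here the AoII cost function is f(AoII) = AoII
    if not in_flight and aoii > thresholds[estimate]:
        in_flight, payload = True, source        # push-based transmission starts
        total_cost += tx_cost
    if in_flight and random.random() < p_done:   # delivery after a geometric delay
        estimate, in_flight = payload, False

print("time-average cost:", total_cost / horizon)
```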
Applied Neural Network-Based Active Control for Vortex-Induced Vibrations Suppression in a Two-Degree-of-Freedom Cylinder
Vortex-Induced Vibrations (VIVs) of cylindrical structures present significant challenges in various engineering applications, including marine risers, tall buildings, and renewable energy systems. Hence, it is vital to control the VIV of cylindrical structures. For this purpose, this study introduces a novel approach to VIV control based on a model-based active control strategy integrated with a Neural Network (NN) in the presence of modeling uncertainty. The proposed method utilizes a closed-loop control system, where feedback from the system's dynamic state is used to generate adaptive control commands, enabling the system to respond to changing flow conditions and nonlinearities. A controllability analysis is then conducted to assess the efficiency of the control strategy in mitigating VIV. Two control approaches are implemented: simple learning and composite learning. Both strategies significantly enhance vibration suppression, achieving up to 99% reduction in vibrations despite uncertainties in the system. The results demonstrate the potential of the proposed method to enhance the efficiency, stability, and lifespan of structures subject to VIV.
An Information Theory of Finite Abstractions and their Fundamental Scalability Limits
Finite abstractions are discrete approximations of dynamical systems, such that the set of abstraction trajectories contains, in a formal sense, all system trajectories. There is a consensus that abstractions suffer from the curse of dimensionality: for the same "accuracy" (how closely the abstraction represents the system), the abstraction size scales poorly with system dimensions. And yet, after decades of research on abstractions, there are no formal results concerning their accuracy-size tradeoff. In this work, we derive a statistical, quantitative theory of abstractions' accuracy-size tradeoff and uncover fundamental limits on their scalability, through rate-distortion theory, the branch of information theory studying lossy compression. Abstractions are viewed as encoder-decoder pairs, encoding trajectories of dynamical systems in a higher-dimensional ambient space. Rate represents abstraction size, while distortion describes abstraction accuracy, defined as the spatial average deviation between abstract trajectories and system ones. We obtain a fundamental lower bound on the minimum abstraction distortion, given the system dynamics and a threshold on abstraction size. The bound depends on the complexity of the dynamics, through generalized entropy. We demonstrate the bound's tightness on certain dynamical systems. Finally, we showcase how the developed theory can be employed to construct optimal abstractions, in terms of the size-accuracy tradeoff, through an example on a chaotic system.
A Modular Architecture Design for Autonomous Driving Racing in Controlled Environments
This paper presents an Autonomous System (AS) architecture for vehicles in a closed circuit. The AS performs precision tasks including computer vision for environment perception, positioning and mapping for accurate localization, path planning for optimal trajectory generation, and control for precise vehicle actuation. Each subsystem operates independently while connecting data through a cohesive pipeline architecture. The system implements a modular design that combines state-of-the-art technologies for real-time autonomous navigation in controlled environments.
Fault-Tolerant Control of Steam Temperature in HRSG Superheater under Actuator Fault Using a Sliding Mode Observer and PINN
This paper presents a novel fault-tolerant control framework for steam temperature regulation in Heat Recovery Steam Generators (HRSGs) subject to actuator faults. Addressing the critical challenge of valve degradation in superheater spray attemperators, we propose a synergistic architecture comprising three components: (1) a Sliding Mode Observer (SMO) for estimation of unmeasured thermal states, (2) a Physics-Informed Neural Network (PINN) for estimating multiplicative actuator faults using physical laws as constraints, and (3) a one-sided Sliding Mode Controller (SMC) that adapts to the estimated faults while minimizing excessive actuation. The key innovation lies in the framework of closed-loop physics-awareness, where the PINN continuously informs both the observer and controller about fault severity while preserving thermodynamic consistency. Rigorous uniform ultimate boundedness (UUB) is established via Lyapunov analysis under practical assumptions. Validated on real HRSG operational data, the framework demonstrates effective fault adaptation, reduced temperature overshoot, and maintains steam temperature within 1°C of the setpoint under valve effectiveness loss. This work bridges control theory and physics-guided machine learning to deliver a practically deployable solution for power plant resilience, with extensions applicable to thermal systems subject to multiplicative faults.
Multi-Agent Deep Reinforcement Learning for UAV-Assisted 5G Network Slicing: A Comparative Study of MAPPO, MADDPG, and MADQN
The growing demand for robust, scalable wireless networks in the 5G-and-beyond era has led to the deployment of Unmanned Aerial Vehicles (UAVs) as mobile base stations to enhance coverage in dense urban and underserved rural areas. This paper presents a Multi-Agent Deep Reinforcement Learning (MADRL) framework that integrates Proximal Policy Optimization (MAPPO), Multi-Agent Deep Deterministic Policy Gradient (MADDPG), and Multi-Agent Deep Q-Networks (MADQN) to jointly optimize UAV positioning, resource allocation, Quality of Service (QoS), and energy efficiency through 5G network slicing. The framework adopts Centralized Training with Decentralized Execution (CTDE), enabling autonomous real-time decision-making while preserving global coordination. Users are prioritized into Premium (A), Silver (B), and Bronze (C) slices with distinct QoS requirements. Experiments in realistic urban and rural scenarios show that MAPPO achieves the best overall QoS-energy tradeoff, especially in interference-rich environments; MADDPG offers more precise continuous control and can attain slightly higher SINR in open rural settings at the cost of increased energy usage; and MADQN provides a computationally efficient baseline for discretized action spaces. These findings demonstrate that no single MARL algorithm is universally dominant; instead, algorithm suitability depends on environmental topology, user density, and service requirements. The proposed framework highlights the potential of MARL-driven UAV systems to enhance scalability, reliability, and differentiated QoS delivery in next-generation wireless networks.
comment: 19 pages, 13 figures. Under review for journal submission
Exact and Parametric Dynamical System Representation of Nonlinear Functions
Parametric representations of various functions are fundamental tools in science and engineering. This paper introduces a fixed-initial-state constant-input dynamical system (FISCIDS) representation, which provides an exact and parametric model for a broad class of nonlinear functions. A FISCIDS representation of a given nonlinear function consists of an input-affine dynamical system with a fixed initial state and constant input. The argument of the function is applied as the constant input to the input-affine system, and the value of the function is the output of the input-affine system at a fixed terminal time. We show that any differentially algebraic function has a quadratic FISCIDS representation. We also show that there exists an analytic function that is not differentially algebraic but has a quadratic FISCIDS representation. Therefore, most functions in practical problems in science and engineering can be represented by a quadratic FISCIDS representation.
comment: 7 pages, 3 figures
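As a simple illustrative instance (not taken from the paper): the exponential function admits such a representation, since its value at a point can be read off as the terminal output of a system driven by that point as a constant input,

$$ \dot{x}(t) = u\,x(t), \qquad x(0) = 1, \qquad y = x(1) = e^{u}. $$

Here the argument $u$ enters as a constant input to an input-affine system with fixed initial state, the right-hand side is bilinear (hence at most quadratic) in $(x,u)$, and $f(u)=e^{u}$ is recovered at the fixed terminal time $T=1$.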
Bayesian Optimization for Automatic Tuning of Torque-Level Nonlinear Model Predictive Control
This paper presents an auto-tuning framework for torque-based Nonlinear Model Predictive Control (nMPC), where the MPC serves as a real-time controller for optimal joint torque commands. The MPC parameters, including cost function weights and low-level controller gains, are optimized using high-dimensional Bayesian Optimization (BO) techniques, specifically Sparse Axis-Aligned Subspace BO (SAASBO), together with a digital twin (DT), to achieve precise real-time end-effector trajectory tracking on a UR10e robot arm. The simulation model allows efficient exploration of the high-dimensional parameter space, and it ensures safe transfer to hardware. Our simulation results demonstrate significant improvements in tracking performance (+41.9%) and reduction in solve times (-2.5%) compared to manually-tuned parameters. Moreover, experimental validation on the real robot follows the trend (with a +25.8% improvement), emphasizing the importance of digital twin-enabled automated parameter optimization for robotic operations.
comment: 6 pages, 7 figures, 3 tables
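To illustrate the shape of such an auto-tuning loop, here is a minimal, generic Bayesian-optimization sketch using a vanilla GP surrogate with expected improvement rather than the SAASBO method used in the paper; `rollout_cost`, the parameter count, and the bounds are placeholders standing in for a digital-twin rollout of the torque-level nMPC.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def rollout_cost(params):
    # Placeholder: run the simulator/digital twin with these MPC parameters
    # and return a tracking-error cost to be minimized.
    return float(np.sum((params - 0.3) ** 2))

bounds = np.array([[0.0, 1.0]] * 4)          # 4 normalized tunable parameters (assumption)
rng = np.random.default_rng(0)
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(8, 4))   # initial design
y = np.array([rollout_cost(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):
    gp.fit(X, y)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(512, 4))
    mu, sd = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sd, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sd * norm.pdf(z)      # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, rollout_cost(x_next))

print("best parameters:", X[np.argmin(y)])
```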
CaFTRA: Frequency-Domain Correlation-Aware Feedback-Free MIMO Transmission and Resource Allocation for 6G and Beyond
The fundamental design of wireless systems toward AI-native 6G and beyond is driven by the ever-increasing demand for mobile data traffic and the need for extreme spectral efficiency and adaptability across diverse service scenarios. To overcome the limitations posed by feedback-based multiple-input multiple-output (MIMO) transmission, we propose a novel frequency-domain Correlation-aware Feedback-free MIMO Transmission and Resource Allocation (CaFTRA) framework tailored for fully-decoupled radio access networks (FD-RAN) to meet the emerging requirements of AI-Native 6G and beyond. By leveraging artificial intelligence (AI), CaFTRA effectively eliminates real-time uplink feedback by predicting channel state information (CSI) based solely on user geolocation. We introduce a Learnable Queries-driven Transformer Network for CSI mapping from user geolocation, which utilizes multi-head attention and learnable query embeddings to accurately capture frequency-domain correlations among resource blocks (RBs), thereby significantly improving the precision of CSI prediction. Once base stations (BSs) adopt feedback-free transmission, their downlink transmission coverage can be significantly expanded due to the elimination of frequent uplink feedback. To enable efficient resource scheduling under such extensive-coverage scenarios, we apply a low-complexity many-to-one matching theory-based algorithm for efficient multi-BS association and multi-RB resource allocation, which is proven to converge to a stable matching within limited iterations. Simulation results demonstrate that CaFTRA achieves stable matching convergence and significant gains in spectral efficiency and user fairness compared to 5G, underscoring its potential value for 6G standardization efforts.
Sample-Efficient Model-Free Policy Gradient Methods for Stochastic LQR via Robust Linear Regression
Policy gradient algorithms are widely used in reinforcement learning and belong to the class of approximate dynamic programming methods. This paper studies two key policy gradient algorithms, the Natural Policy Gradient and the Gauss-Newton Method, for solving the Linear Quadratic Regulator (LQR) problem in unknown stochastic linear systems. The main challenge lies in obtaining an unbiased gradient estimate from noisy data due to errors-in-variables in linear regression. This issue is addressed by employing a primal-dual estimation procedure. Using this novel gradient estimation scheme, the paper establishes convergence guarantees with a sample complexity of order $O(1/\epsilon)$. Theoretical results are further supported by numerical experiments, which demonstrate the effectiveness of the proposed algorithms.
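For context, the exact (model-based) forms of the two updates for the LQR cost $C(K)$ with feedback $u_t = -Kx_t$ are standard in the LQR policy-gradient literature (e.g., Fazel et al., 2018); the paper's contribution lies in estimating the required quantities from noisy data without knowing the model.

$$ \nabla C(K) = 2\,E_K\,\Sigma_K, \qquad E_K = \big(R + B^{\top} P_K B\big)K - B^{\top} P_K A, $$
$$ \text{Natural PG:}\quad K \leftarrow K - \eta\,\nabla C(K)\,\Sigma_K^{-1} = K - 2\eta\,E_K, \qquad \text{Gauss-Newton:}\quad K \leftarrow K - \eta\,\big(R + B^{\top} P_K B\big)^{-1}\nabla C(K)\,\Sigma_K^{-1}, $$

where $P_K$ solves the closed-loop Bellman (Lyapunov) equation and $\Sigma_K$ is the state covariance under $K$.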
Crossing the Sim2Real Gap Between Simulation and Ground Testing to Space Deployment of Autonomous Free-flyer Control
Reinforcement learning (RL) offers transformative potential for robotic control in space. We present the first on-orbit demonstration of RL-based autonomous control of a free-flying robot, the NASA Astrobee, aboard the International Space Station (ISS). Using NVIDIA's Omniverse physics simulator and curriculum learning, we trained a deep neural network to replace Astrobee's standard attitude and translation control, enabling it to navigate in microgravity. Our results validate a novel training pipeline that bridges the simulation-to-reality (Sim2Real) gap, utilizing a GPU-accelerated, scientific-grade simulation environment for efficient Monte Carlo RL training. This successful deployment demonstrates the feasibility of training RL policies terrestrially and transferring them to space-based applications. This paves the way for future work in In-Space Servicing, Assembly, and Manufacturing (ISAM), enabling rapid on-orbit adaptation to dynamic mission requirements.
comment: published at iSpaRo 2025
Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing
The US Naval Research Laboratory's (NRL's) Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) experiment pioneers the use of reinforcement learning (RL) for control of free-flying robots in the zero-gravity (zero-G) environment of space. On Tuesday, May 27th 2025 the APIARY team conducted the first ever, to our knowledge, RL control of a free-flyer in space using the NASA Astrobee robot on-board the International Space Station (ISS). A robust 6-degrees of freedom (DOF) control policy was trained using an actor-critic Proximal Policy Optimization (PPO) network within the NVIDIA Isaac Lab simulation environment, randomizing over goal poses and mass distributions to enhance robustness. This paper details the simulation testing, ground testing, and flight validation of this experiment. This on-orbit demonstration validates the transformative potential of RL for improving robotic autonomy, enabling rapid development and deployment (in minutes to hours) of tailored behaviors for space exploration, logistics, and real-time mission needs.
comment: iSpaRo 2025, Best Paper Award in Orbital Robotics
A Hybrid Sequential Convex Programming Framework for Unbalanced Three-Phase AC OPF
This paper presents a hybrid Sequential Convex Programming (SCP) framework for solving the unbalanced three-phase AC Optimal Power Flow (OPF) problem. The method combines a fixed McCormick outer approximation of bilinear voltage-current terms, first-order Taylor linearisations, and an adaptive trust-region constraint to preserve feasibility and promote convergence. The resulting formulation remains convex at each iteration and ensures convergence to a stationary point that satisfies the first-order Karush-Kuhn-Tucker (KKT) conditions of the nonlinear OPF. Case studies on standard IEEE feeders and a real low-voltage (LV) network in Cyprus demonstrate high numerical accuracy with optimality gap below 0.1% and up to 2x faster runtimes compared to IPOPT. These results confirm that the method is accurate and computationally efficient for large-scale unbalanced distribution networks.
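For reference, the McCormick outer approximation mentioned above replaces each bilinear voltage-current product $w = xy$, with $x \in [\underline{x}, \overline{x}]$ and $y \in [\underline{y}, \overline{y}]$, by its convex envelope (the standard form, independent of this particular paper):

$$ w \ge \underline{x}\,y + x\,\underline{y} - \underline{x}\,\underline{y}, \qquad w \ge \overline{x}\,y + x\,\overline{y} - \overline{x}\,\overline{y}, $$
$$ w \le \overline{x}\,y + x\,\underline{y} - \overline{x}\,\underline{y}, \qquad w \le \underline{x}\,y + x\,\overline{y} - \underline{x}\,\overline{y}. $$

Keeping these bounds fixed while re-linearising the remaining nonlinearities is what keeps each SCP iteration convex.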
Output-Constrained Controller with Fuzzy-Tuned Parameters for Overhead Cranes
This study proposes a fuzzy-adjusted nonlinear control method based on torque jitter output limit constraints for overhead crane systems with double pendulum effects. The proposed control method can effectively suppress swing and achieve precise positioning. Firstly, by enhancing the coupling relationship between the trolley displacement and swing angle, a composite signal with an error term was designed. Then, an energy-based Lyapunov function was constructed using the composite error signal, which incorporated a new formulation of the inertia matrix and potential energy function. Subsequently, using the backstepping method in conjunction with the hyperbolic tangent function, a controller with partial performance constraints was designed. In addition, to further enhance the system's dynamic performance, a fuzzy control scheme with online adjustable system parameters was designed. Finally, the stability of the system is proven using Lyapunov theory combined with LaSalle's invariance principle. Simulation results demonstrate that the proposed controller exhibits superior performance and robustness.
Covariance Control for a class of Stochastic Discrete-time Linear Systems using the S-Variable Approach
This paper deals with the problem of covariance control for a class of linear stochastic discrete-time systems in the Stochastic Model Predictive Control (SMPC) framework. The considered systems are affected by independent and identically distributed (i.i.d.) additive and parametric stochastic uncertainties (potentially unbounded), in addition to polytopic deterministic uncertainties bounding the mean of the state and input parameters. The control design conditions presented in this paper are formulated as Linear Matrix Inequalities (LMIs), using the S-variable approach in order to reduce the potential conservatism. These conditions are derived using a deterministic exact characterization of the covariance dynamics, which involves bilinear terms in the control gain. A technique to linearize these dynamics is presented; it results in a descriptor representation that allows sufficient conditions for covariance control design to be derived. The derived condition is first compared to a known necessary and sufficient stability condition for systems without deterministic uncertainties and additive stochastic noise; although more conservative, it turns out to be more numerically tractable. Then, the same condition is used to design controllers that are robust to both deterministic and stochastic uncertainties. Several numerical examples are presented for comparison and illustration.
A Perception-feedback position-tracking control for quadrotors
In this paper, a position-tracking controller for quadrotors based on perception feedback is developed, which directly uses measurements from onboard sensors such as low-cost IMUs and GPS to generate control commands without state estimation. Biases in the gyro sensors are corrected to enhance tracking performance. Practical stability of the origin of the tracking-error system in the presence of external disturbances is proved using Lyapunov analysis, which strengthens to exponential stability in the absence of external disturbances. Numerical simulations illustrate the proposed control scheme and verify the robustness of the proposed controller under noisy measurements and parameter uncertainties.
Physics-Based Communication Compression via Lyapunov-Weighted Event-Triggered Control
Event-Triggered Control (ETC) reduces communication overhead in networked systems by transmitting only when stability requires it. Conventional mechanisms use isotropic error thresholds ($\|e\| \le \sigma\|x\|$), treating all directions equally. This ignores stability geometry and triggers conservatively. We propose a static directional triggering mechanism that exploits this asymmetry. By weighting errors via the Lyapunov matrix $P$, we define an anisotropic half-space scaling with instantaneous energy margins: larger deviations tolerated along stable modes, strict bounds where instability threatens. We prove global asymptotic stability and exclusion of Zeno behavior. Monte Carlo simulations ($N=100$) show 43.6\% fewer events than optimally tuned isotropic methods while achieving $2.1\times$ better control performance than time-varying alternatives. The mechanism functions as a runtime safety gate for learning-based controllers operating under communication constraints.
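A minimal sketch of the idea in code, assuming a linear closed loop with a Lyapunov pair $(P, Q)$; the specific half-space rule and tuning used in the paper may differ.

```python
import numpy as np

# Illustrative Lyapunov-weighted event trigger (not the paper's exact rule).
# P, Q are an assumed Lyapunov pair of the closed loop, i.e.
# A_cl^T P + P A_cl = -Q with A_cl = A - B K.
def should_transmit(x, x_held, P, Q, sigma=0.5):
    """Return True if a fresh state sample should be sent to the actuator."""
    e = x_held - x                      # sample-and-hold error
    # Isotropic rule (conventional): transmit when ||e|| > sigma * ||x||.
    # Directional variant (sketch): weight the error by P, so deviations that
    # barely perturb the Lyapunov energy are tolerated longer, and only a
    # fraction sigma of the instantaneous dissipation x^T Q x may be consumed.
    return 2.0 * abs(x @ P @ e) > sigma * (x @ Q @ x)

# Example usage with placeholder values:
P = np.array([[2.0, 0.3], [0.3, 1.0]])
Q = np.eye(2)
x = np.array([0.8, -0.2])
x_held = np.array([0.9, -0.1])
print(should_transmit(x, x_held, P, Q))
```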
Resilient AFE Drive Control using Neural Networks with Tracking Guarantees
Industrial installations across several sectors have seen a dramatic increase in productivity, accuracy and efficiency over the last decade due to expanded utilization of medium voltage, variable speed power electronic converters to drive their processes. Specifically, active front-end (AFE) drives have become popular due to their ability to deliver power while maintaining safe electrical setpoints. However, under abnormal grid conditions such as phase loss, conventional AFE control may fail to enforce safety constraints, potentially leading to drive shutdown and significant financial losses. In this work, we propose using reference-tracking Performance Boosting (rPB) to improve the resilience of standard AFE control to faults. This neural-network control framework provides a principled way to optimize transient performance while preserving the steady-state tracking properties of AFE-based drives. By carefully shaping the input signals to the rPB controller, we ensure that it activates only during grid faults, leaving nominal operation unaffected. Simulation results show that the proposed approach successfully maintains the DC bus voltage and the grid current within safe limits during single-phase loss events.
Real-Time Control and Automation Framework for Acousto-Holographic Microscopy
Manual operation of microscopes for repetitive tasks in cell biology is a significant bottleneck, consuming invaluable expert time and introducing human error. Automation is essential, and while Digital Holographic Microscopy (DHM) offers powerful, label-free quantitative phase imaging (QPI), its inherently noisy and low-contrast holograms make robust autofocus and object detection challenging. We present the design, integration, and validation of a fully automated closed-loop DHM system engineered for high-throughput mechanical characterization of biological cells. The system integrates automated serpentine scanning, real-time YOLO-based object detection, and a high-performance, multi-threaded software architecture using pinned memory and SPSC queues. This design enables the GPU-accelerated reconstruction pipeline to run fully in parallel with the 50 fps data acquisition, adding no sequential overhead. A key contribution is the validation of a robust, multi-stage holographic autofocus strategy; we demonstrate that a selected metric (based on a low-pass filter and standard deviation) provides reliable focusing for noisy holograms where conventional methods (e.g., Tenengrad, Laplacian) fail entirely. Performance analysis of the complete system identifies the 2.23-second autofocus operation (not reconstruction) as the primary throughput bottleneck, resulting in a 9.62-second analysis time per object. This work delivers a complete functional platform for autonomous DHM screening and provides a clear, data-driven path for future optimization, proposing a hybrid brightfield imaging modality to address current bottlenecks.
Market share maximizing strategies of CAV fleet operators may cause chaos in our cities
We study the dynamics and equilibria of a new class of routing games, where players (drivers of future autonomous vehicles) may switch between individual (HDV) and collective (CAV) routing. In individual routing, just like today, drivers select routes minimizing expected travel costs, whereas in collective routing an operator centrally assigns vehicles to routes. The utility is then the average experienced travel time discounted with individually perceived attractiveness of automated driving. The market share maximising strategy amounts to offering utility greater than for individual routing to as many drivers as possible. Our theoretical contribution consists in developing a rigorous mathematical framework of individualized collective routing and studying algorithms which fleets of CAVs may use for their market-share optimization. We also define bi-level CAV-HDV equilibria and derive conditions which link the potential marketing behaviour of CAVs to the behavioural profile of the human population. Practically, we find that the fleet operator may often be able to equilibrate at full market share by simply mimicking the choices HDVs would make. In more realistic heterogeneous human population settings, however, we discover that the market-share maximizing fleet controller should use highly variable mixed strategies as a means to attract or retain customers. The reason is that in mixed routing the powerful group player can control which vehicles are routed via congested and uncongested alternatives. The congestion pattern generated by CAVs is, however, not known to HDVs before departure, and so HDVs cannot select faster routes and face huge uncertainty whichever alternative they choose. Consequently, mixed market-share maximising fleet strategies resulting in unpredictable day-to-day driving conditions may, alarmingly, become pervasive in our future cities.
comment: 34 pages, 9 figures
Left shifting analysis of Human-Autonomous Team interactions to analyse risks of autonomy in high-stakes AI systems
Developing high-stakes autonomous systems that include Artificial Intelligence (AI) components is complex; the consequences of errors can be catastrophic, yet it is challenging to plan for all operational cases. In stressful scenarios for the human operator, such as short decision-making timescales, the risk of failures is exacerbated. A lack of understanding of AI failure modes obstructs such planning and so blocks the robust implementation of AI applications in smart systems. This prevents early risk identification, leading to increased time, risk and cost of projects. A key tenet of Systems Engineering and acquisition engineering is centred around a "left-shift" in test and evaluation activities to earlier in the system lifecycle, to allow for "accelerated delivery of [systems] that work". We argue it is therefore essential that this shift includes the analysis of AI failure cases as part of the design stages of the system life cycle. Our proposed framework enables the early characterisation of risks emerging from human-autonomy teaming (HAT) in operational contexts. The cornerstone of this is a new analysis of AI failure modes, built on the seminal modelling of human-autonomy teams laid out by LaMonica et al., 2022. Using the analysis of the interactions between human and autonomous systems and exploring the failure modes within each aspect, our approach provides a way to systematically identify human-AI interaction risks across the operational domain of the system of interest. The understanding of the emergent behaviour enables increased robustness of the system, for which the analysis should be undertaken over the whole scope of its operational design domain. This approach is illustrated through an example use case for an AI assistant supporting a Command & Control (C2) System.
comment: Published in: IfSE Annual Systems Engineering Conference Proceedings 2025
Variable-Impedance Muscle Coordination under Slow-Rate Control Frequencies and Limited Observation Conditions Evaluated through Legged Locomotion
Human motor control remains agile and robust despite limited sensory information for feedback, a property attributed to the body's ability to perform morphological computation through muscle coordination with variable impedance. However, it remains unclear how such low-level mechanical computation reduces the control requirements of the high-level controller. In this study, we implement a hierarchical controller consisting of a high-level neural network trained by reinforcement learning and a low-level variable-impedance muscle coordination model with mono- and biarticular muscles in a monoped locomotion task. We systematically restrict the high-level controller by varying the control frequency and by introducing biologically inspired observation conditions: delayed, partial, and substituted observation. Under these conditions, we evaluate how the low-level variable-impedance muscle coordination contributes to the learning process of the high-level neural network. The results show that variable-impedance muscle coordination enables stable locomotion even under slow-rate control frequencies and limited observation conditions. These findings demonstrate that the morphological computation of muscle coordination effectively offloads high-frequency feedback from the high-level controller, and they provide a design principle for controllers in motor control.
comment: 12 pages, 11 figures. Submitted to IEEE Transactions on Systems, Man, and Cybernetics: Systems
PerFACT: Motion Policy with LLM-Powered Dataset Synthesis and Fusion Action-Chunking Transformers
Deep learning methods have significantly enhanced motion planning for robotic manipulators by leveraging prior experiences within planning datasets. However, state-of-the-art neural motion planners are primarily trained on small datasets collected in manually generated workspaces, limiting their generalizability to out-of-distribution scenarios. Additionally, these planners often rely on monolithic network architectures that struggle to encode critical planning information. To address these challenges, we introduce Motion Policy with Dataset Synthesis powered by large language models (LLMs) and Fusion Action-Chunking Transformers (PerFACT), which incorporates two key components. Firstly, a novel LLM-powered workspace generation method, MotionGeneralizer, enables large-scale planning data collection by producing a diverse set of semantically feasible workspaces. Secondly, we introduce Fusion Motion Policy Networks (MpiNetsFusion), a generalist neural motion planner that uses a fusion action-chunking transformer to better encode planning signals and attend to multiple feature modalities. Leveraging MotionGeneralizer, we collect 3.5M trajectories to train and evaluate MpiNetsFusion against state-of-the-art planners, which shows that the proposed MpiNetsFusion can plan several times faster on the evaluated tasks.
Quantum Encrypted Control of Networked Systems
Encrypted control has been extensively studied to ensure the confidentiality of system states and control inputs for networked control systems. This paper presents a computationally efficient encrypted control framework for networked systems enabled by quantum communication. A quantum channel between sensors and actuators is used to generate identical secret keys, whose security is further enhanced through quantum key distribution. These keys enable lightweight encryption and decryption while preserving confidentiality and control accuracy. We develop a novel encryption-decryption architecture for state-feedback control of linear systems based on quantum keys, and characterize the impact of quantum state errors on closed-loop stability. In particular, we establish the existence of a critical threshold on intrinsic quantum noise below which stability is guaranteed. In contrast to classical encrypted control schemes, which may collapse under a single key-bit error, the proposed quantum encrypted control exhibits strong robustness to key imperfections. We further adopt quantization techniques to address the scenarios with limited communication bits in practical situations, and implement privacy protection for quantum keys based on a stochastic quantizer. These results demonstrate that integrating quantum technologies into control systems in a nontrivial and principled manner, even at their current level of maturity, can yield substantial performance gains in reducing computational complexity and improving resilience to key errors while ensuring security against multiple eavesdropping sources.
comment: 24 pages
Universal Quantum Interconnects via Phase-Coherent Four-Wave Mixing
Quantum transduction, which enables the coherent conversion of quantum information between disparate physical platforms, is a cornerstone for realizing scalable and interoperable quantum networks. Among various approaches, parametric frequency mixing processes such as four-wave mixing (FWM) offer a promising pathway toward efficient and low-noise transduction. In this work, we demonstrate the feasibility of coherent quantum state transfer by indirectly verifying high-fidelity phase mapping (>99%) of the wavefunction from the input field to the generated output field. Using a gas-filled hollow-core capillary fiber, we systematically investigate spectral phase evolution across a broad range, including infrared (IR) to ultraviolet (UV) transitions, as well as conversions from telecom-band (1550 nm) to visible (516 nm) and deep-UV (308 nm) wavelengths. Our results reveal that strong phase coherence can be maintained throughout these diverse conversion regimes. Because quantum properties such as coherence and entanglement are intrinsically encoded in both the amplitude and phase of a photonic wavefunction, preserving spectral phase is essential for faithful quantum information transfer. We further show that efficient and phase-preserving transduction can be achieved by tuning system parameters, offering valuable insights into nonlinear coupling dynamics. These findings establish a promising foundation for advancing FWM-based quantum transduction schemes and open new avenues for integrating heterogeneous quantum systems across wide spectral domains within future quantum communication networks.
Experimental Sensitivity Enhancement of a Quantum Rydberg Atom-Based RF Receiver with a Metamaterial GRIN Lens
We experimentally demonstrate enhanced sensitivity of an atom-based Rydberg radio frequency (RF) receiver integrated with a gradient refractive index (GRIN) Luneburg-type metamaterial lens. By analyzing the electromagnetically induced transparency (EIT) effect in Cesium vapor, we compare receiver performance with and without the GRIN lens under 2.2 GHz and 3.6 GHz far-field excitation. Our measurements reveal a significant amplification of the EIT transparency window when the lens is introduced, consistent with the theoretical prediction that the local E-field enhancement at the vapor cell reduces the minimum detectable electric field and increases the signal-to-noise ratio (SNR) of the Rydberg RF receiver. This experimental validation highlights the potential of metamaterial-assisted quantum sensing to overcome the inherent bandwidth and sensitivity limitations of bare Rydberg receivers for a variety of applications, such as electromagnetic compatibility (EMC) testing, quantum radar, and wireless communications.
Sliding Mode Control and Subspace Stabilization Methodology for the Orbital Stabilization of Periodic Trajectories
This paper presents a combined sliding-mode control and subspace stabilization methodology for orbital stabilization of periodic trajectories in underactuated mechanical systems with one degree of underactuation. The approach starts with partial feedback linearization and stabilization. Then, transverse linearization along the reference orbit is computed, resulting in a periodic linear time-varying system with a stable subspace. Sliding-mode control drives trajectories toward this subspace. The proposed design avoids solving computationally intensive periodic LQR problems and improves robustness to matched disturbances. The methodology is validated through experiments on the Butterfly robot.
Stability of Lyapunov redesign trajectory tracking control with unbounded perturbations -- A tube-based stability analysis
Considering a nonlinear system in Byrnes-Isidori form that is subject to unbounded perturbations, we apply Lyapunov redesign via feedback linearisation for trajectory tracking. Leveraging the ideas of tube-based geometric characterisation of the invariance properties of the closed loop, we generalise the classical stability criterion from the literature from constant to nonconstant reference trajectories. The proposed analysis is tailored to the Lyapunov redesign and the tracking problem insofar as we incorporate the reference trajectory and the transient decrease of the tracking error enforced by the controller. In particular, we exploit that the Lyapunov function of the tracking error satisfies a differential inequality, thereby guaranteeing that the solution of the closed loop remains in a contracting tube along the reference trajectory.
Configuration-Constrained Tube MPC for Periodic Operation
Periodic operation often emerges as the economically optimal mode in industrial processes, particularly under varying economic or environmental conditions. This paper proposes a robust model predictive control (MPC) framework for uncertain systems modeled as polytopic linear differential inclusions (LDIs), where the dynamics evolve as convex combinations of finitely many affine control systems with additive disturbances. The robust control problem is reformulated as a convex optimization program by optimizing over configuration-constrained polytopic tubes and tracks a periodic trajectory that is optimal for a given economic criterion. Artificial variables embedded in the formulation ensure recursive feasibility and robust constraint satisfaction when the economic criterion is updated online, while guaranteeing convergence to the corresponding optimal periodic tube when the criterion remains constant. To improve computational efficiency, we introduce a quadratic over-approximation of the periodic cost under a Lipschitz continuity assumption, yielding a Quadratic Program (QP) formulation that preserves the above theoretical guarantees. The effectiveness and scalability of the approach are demonstrated on a benchmark example and a ball-plate system with eight states.
comment: 11 pages, 3 figures, submitted for IEEE-TACON
Probabilistic Safety under Arbitrary Disturbance Distributions using Piecewise-Affine Control Barrier Functions
We propose a simple safety filter design for stochastic discrete-time systems based on piecewise affine probabilistic control barrier functions, providing an appealing balance between modeling flexibility and computational complexity. Exact evaluation of the safety filter consists of solving a mixed-integer quadratic program (MIQP) if the dynamics are control-affine (or a mixed-integer nonlinear program in general). We propose a heuristic search method that replaces this by a small number of small-scale quadratic programs (QPs), or nonlinear programs (NLPs) respectively. The proposed approach provides a flexible framework in which arbitrary (data-driven) quantile estimators can be used to bound the probability of safety violations. Through extensive numerical experiments, we demonstrate improvements in conservatism and computation time with respect to existing methods, and we illustrate the flexibility of the method for modeling complex safety sets. Supplementary material can be found at https://mathijssch.github.io/ecc26-supplementary/.
PINN vs LSTM: A Comparative Study for Steam Temperature Control in Heat Recovery Steam Generators
This paper introduces a direct comparative study of Physics-Informed Neural Networks (PINNs) and Long Short-Term Memory (LSTM) networks for adaptive steam temperature control in Heat Recovery Steam Generators (HRSGs), particularly under valve leakage faults. Maintaining precise steam temperature in HRSGs is critical for efficiency and safety, yet traditional control strategies struggle with nonlinear, fault-induced dynamics. Both architectures are designed to adaptively tune the gains of a PI-plus-feedforward control law in real-time. The LSTM controller, a purely data-driven approach, was trained offline on historical operational data, while the PINN controller integrates fundamental thermodynamic laws directly into its online learning process through a physics-based loss function. Their performance was evaluated using a model validated with data from a combined cycle power plant, under normal load changes and a challenging valve leakage fault scenario. Results demonstrate that while the LSTM controller offers significant improvement over conventional methods, its performance degrades under the unseen fault. The PINN controller consistently delivered superior robustness and performance, achieving a 54\% reduction in integral absolute error compared to the LSTM under fault conditions. This study concludes that embedding physical knowledge into data-driven control is essential for developing reliable, fault-tolerant autonomous control systems in complex industrial applications.
Adaptive Parameter Control Using AAN for Lower Limb Rehabilitation Exoskeletons
Exoskeletons play a crucial role in assisting patients with varying mobility levels during rehabilitation. However, existing control strategies face challenges such as imprecise trajectory tracking, interaction torque oscillations, and limited adaptability to diverse patient conditions. To address these issues, this paper proposes an assist-as-needed (AAN) control algorithm that integrates a human-robot coupling dynamics model, a human torque-momentum observer (HTMO), and an adaptive parameter controller (APC). The algorithm first employs inverse dynamics to compute the joint torques required for the rehabilitation trajectory. The HTMO then estimates the torque exerted by the patient's joints and determines the torque error, which the exoskeleton compensates for via a spring-damper system, ultimately generating the target trajectory. Finally, the APC ensures adaptive assistive control. The effectiveness of the proposed method is validated in MATLAB/Simulink.
comment: 6 pages, 7 figures, 2 tables. Proceedings of the 44th Chinese Control Conference, July 28-30, 2025, Chongqing, China
Computationally Efficient Signal Detection with Unknown Bandwidths
Signal detection in environments with unknown signal bandwidth and time intervals is a fundamental problem in adversarial and spectrum-sharing scenarios. This paper addresses the problem of detecting signals occupying unknown degrees of freedom from non-coherent power measurements, where the signal is constrained to an interval in one dimension or a hyper-cube in multiple dimensions. A generalized likelihood ratio test (GLRT) is derived, resulting in a straightforward metric involving normalized average signal energy for each candidate signal set. We present bounds on false alarm and missed detection probabilities, demonstrating their dependence on SNR and signal set sizes. To overcome the inherent computational complexity of exhaustive searches, we propose a computationally efficient binary search method, reducing the complexity from O(N^2) to O(N) for one-dimensional cases. Simulations indicate that the method maintains performance near exhaustive searches and achieves asymptotic consistency, with interval-of-overlap converging to one under constant SNR as measurement size increases. The simulation studies also demonstrate superior performance and reduced complexity compared to contemporary neural network-based approaches, specifically outperforming custom-trained U-Net models in spectrum detection tasks.
comment: Submitted to the IEEE Open Journal of the Communications Society
Anti-bullying Adaptive Cruise Control: A proactive right-of-way protection approach
Adaptive Cruise Control (ACC) systems have been widely commercialized in recent years. However, existing ACC systems remain vulnerable to close-range cut-ins, a behavior that resembles "road bullying". To address this issue, this research proposes an Anti-bullying Adaptive Cruise Control (AACC) approach, which is capable of proactively protecting right-of-way against such "road bullying" cut-ins. To handle diverse "road bullying" cut-in scenarios smoothly, the proposed approach first leverages an online Inverse Optimal Control (IOC) based algorithm for individual driving style identification. Then, based on Stackelberg competition, a game-theoretic-based motion planning framework is presented in which the identified individual driving styles are utilized to formulate cut-in vehicles' reaction functions. By integrating such reaction functions into the ego vehicle's motion planning, the ego vehicle can consider all possible reactions of cut-in vehicles to find its optimal right-of-way protection maneuver. To the best of our knowledge, this research is the first to model vehicles' interaction dynamics and develop an interactive planner that adapts to cut-in vehicles' various driving styles. Simulation results show that the proposed approach can prevent "road bullying" cut-ins and be adaptive to different cut-in vehicles' driving styles. It improves safety and comfort by up to 79.8% and 20.4%, respectively. Driving efficiency in traffic flow improves by up to 19.33%. The proposed approach can also adopt more flexible driving strategies. Furthermore, the proposed approach can support real-time field implementation by ensuring a computation time of less than 50 milliseconds.
comment: 15 pages, 19 figures
Welfare and Cost Aggregation for Multi-Agent Control: When to Choose Which Social Cost Function, and Why?
Many multi-agent socio-technical systems rely on aggregating heterogeneous agents' costs into a social cost function (SCF) to coordinate resource allocation in domains like energy grids, water allocation, or traffic management. The choice of SCF often entails implicit assumptions and may lead to undesirable outcomes if not rigorously justified. In this paper, we demonstrate that what determines which SCF ought to be used is the degree to which individual costs can be compared across agents and which axioms the aggregation shall fulfill. Drawing on the results from social choice theory, we provide guidance on how this process can be used in control applications. We demonstrate which assumptions about interpersonal utility comparability - ranging from ordinal level comparability to full cardinal comparability - together with a choice of desirable axioms, inform the selection of a correct SCF, be it the classical utilitarian sum, the Nash SCF, or maximin. We then demonstrate how the proposed framework can be applied for principled allocations of water and transportation resources.
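The three aggregation rules named above have familiar textbook forms, written here for agent utilities $u_i$ (with costs, one minimizes the analogous aggregations):

$$ W_{\text{util}}(u) = \sum_i u_i, \qquad W_{\text{Nash}}(u) = \prod_i \big(u_i - d_i\big), \qquad W_{\text{maximin}}(u) = \min_i u_i, $$

where $d_i$ denotes agent $i$'s disagreement (reference) utility in the Nash rule; which of these is admissible depends on the comparability assumptions and axioms discussed in the paper.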
Nonlinear Oscillatory Response of Automated Vehicle Car-following: Theoretical Analysis with Traffic State and Control Input Limits
This paper presents a framework grounded in the theory of describing function (DF) and incremental-input DF to theoretically analyze the nonlinear oscillatory response of automated vehicle (AV) car-following (CF) amidst traffic oscillations, considering the limits of traffic state and control input. While prevailing approaches largely ignore these limits (i.e., saturation of acceleration/deceleration and speed) and focus on linear string stability analysis, this framework establishes a basis for theoretically analyzing the frequency response of AV systems with nonlinearities imposed by these limits. To this end, trajectories of CF pairs are decomposed into nominal and oscillatory trajectories; subsequently, the controlled AV system is repositioned within the oscillatory trajectory coordinates. On this basis, DFs are employed to approximate the frequency responses of nonlinear saturation components by using their first harmonic output, thereby capturing the associated amplification ratio and phase shift. Considering the closed-loop nature of AV control systems, where system states and control input mutually influence each other, amplification ratios and phase shifts are balanced within the loop to ensure consistency. This balancing process may yield multiple solutions; hence, the incremental-input DF is further applied to identify the reasonable ones. The proposed method is validated by estimations from Simulink, and further comparisons with prevailing methods are conducted. Results confirm the alignment of our framework with Simulink results and exhibit its superior accuracy in analysis compared to the prevailing methods. Furthermore, the framework proves valuable in string stability analysis, especially when conventional linear methods offer misleading insights.
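For readers unfamiliar with the tool, the describing function of a nonlinearity driven by $e(t) = A\sin(\omega t)$ is its first-harmonic gain (standard definition, not specific to this paper):

$$ N(A,\omega) = \frac{b_1 + j\,a_1}{A}, \qquad a_1 = \frac{1}{\pi}\int_0^{2\pi} y(t)\cos(\omega t)\,d(\omega t), \qquad b_1 = \frac{1}{\pi}\int_0^{2\pi} y(t)\sin(\omega t)\,d(\omega t), $$

so $|N(A,\omega)|$ and $\angle N(A,\omega)$ are exactly the amplification ratio and phase shift that the framework balances around the closed loop.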
A Unified Bayesian Framework for Stochastic Data-Driven Smoothing, Prediction, and Control
Extending data-driven algorithms based on Willems' fundamental lemma to stochastic data often requires empirical and customized workarounds. This work presents a unified Bayesian framework that provides a systematic and general method for handling stochastic data-driven tasks, including smoothing, prediction, and control, via maximum a posteriori estimation. This framework formulates a unified trajectory estimation problem for the three tasks by specifying different types of trajectory knowledge. Then, a Bayesian problem is solved that optimally combines trajectory knowledge with a data-driven characterization of the trajectory from offline data for a general class of stochastic disturbances. Under specific conditions, this problem is shown to generalize existing data-driven prediction and control algorithms. Numerical examples demonstrate the performance of the unified approach for all three tasks against other data-driven and system identification approaches.
Marginalize, Rather than Impute: Probabilistic Wind Power Forecasting with Incomplete Data
Machine learning methods are widely and successfully used for probabilistic wind power forecasting, yet the pervasive issue of missing values (e.g., due to sensor faults or communication outages) has received limited attention. The prevailing practice is impute-then-predict, but conditioning on point imputations biases parameter estimates and fails to propagate uncertainty from missing features. Our approach treats missing features and forecast targets uniformly: we learn a joint generative model of features and targets from incomplete data and, at operational deployment, condition on the observed features and marginalize the unobserved ones to produce forecasts. This imputation-free procedure avoids the error introduced by imputation and preserves the uncertainty arising from missing features. In experiments, it improves forecast quality in terms of continuous ranked probability score relative to impute-then-predict baselines while incurring substantially lower computational cost than common alternatives.
comment: Submitted to INFORMS Journal on Data Science
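In the special case of a jointly Gaussian model over the target $y$, observed features $x_o$, and missing features $x_m$, the marginalize-rather-than-impute forecast has a closed form obtained by conditioning on $x_o$ alone (the paper's generative model need not be Gaussian; this is only the simplest instance):

$$ y \mid x_o \sim \mathcal{N}\!\big(\mu_y + \Sigma_{yo}\Sigma_{oo}^{-1}(x_o - \mu_o),\; \Sigma_{yy} - \Sigma_{yo}\Sigma_{oo}^{-1}\Sigma_{oy}\big), $$

so the missing block $x_m$ is integrated out exactly rather than replaced by a point imputation, and the predictive variance grows accordingly.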
MathBode: Measuring the Stability of LLM Reasoning using Frequency Response
This paper presents MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). Instead of one-shot accuracy, MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. This yields interpretable, frequency-resolved metrics -- gain (amplitude tracking) and phase (lag) -- that form Bode-style fingerprints. Across five closed-form families (linear solve, ratio/saturation, compound interest, 2x2 linear systems, similar triangles), the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures. We compare several models against a symbolic baseline that calibrates the instrument ($G \approx 1$, $\varphi \approx 0$). Results separate frontier from mid-tier models on dynamics, providing a compact, reproducible protocol that complements standard benchmarks with actionable measurements of reasoning fidelity and consistency. We open-source the dataset and code to enable further research and adoption.
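A minimal lock-in style sketch of the kind of first-harmonic gain/phase fit described above; the paper's exact fitting procedure may differ, and the example signals are placeholders. It assumes the time vector spans an integer number of periods of the drive frequency.

```python
import numpy as np

def first_harmonic(signal, t, omega):
    """Complex first-harmonic coefficient of signal(t) at frequency omega."""
    return 2.0 * np.mean(np.asarray(signal, dtype=float) * np.exp(-1j * omega * t))

def bode_point(model_out, exact_out, t, omega):
    """Gain and relative phase of the model response w.r.t. the exact solution."""
    h_model = first_harmonic(model_out, t, omega)
    h_exact = first_harmonic(exact_out, t, omega)
    gain = np.abs(h_model) / np.abs(h_exact)
    phase = np.angle(h_model / h_exact)   # negative value indicates lag
    return gain, phase

# Example: a "model" that tracks with gain 0.9 and a 0.2 rad lag.
omega = 2 * np.pi
t = np.linspace(0.0, 1.0, 1000, endpoint=False)
exact = np.sin(omega * t)
model = 0.9 * np.sin(omega * t - 0.2)
print(bode_point(model, exact, t, omega))   # approximately (0.9, -0.2)
```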
Quaternion-Based Sliding Mode Control for Six Degrees of Freedom Flight Control of Quadrotors
Despite extensive research on sliding mode control (SMC) design for quadrotors, the existing approaches suffer from certain limitations. Euler angle-based SMC formulations suffer from poor performance in high-pitch or -roll maneuvers. Quaternion-based SMC approaches have unwinding issues and complex architecture. Coordinate-free methods are slow and only almost globally stable. This paper presents a new six degrees of freedom SMC flight controller to address the above limitations. We use a cascaded architecture with a position controller in the outer loop and a quaternion-based attitude controller in the inner loop. The position controller generates the desired trajectory for the attitude controller using a coordinate-free approach. The quaternion-based attitude controller uses the natural characteristics of the quaternion hypersphere, featuring a simple structure while providing global stability and avoiding unwinding issues. We compare our controller with three other common control methods, conducting challenging maneuvers such as flip-over and high-speed trajectory tracking in the presence of model uncertainties and disturbances. Our controller consistently outperforms the benchmark approaches with less control effort and actuator saturation, offering highly effective and efficient flight control.
Fleet Size and Mix Capacitated Vehicle Routing Problem with Time Windows for Mobile Fast Chargers SP
The electrification of off-road heavy equipment presents operational challenges for agencies serving remote sites with limited fixed charging infrastructure. Existing mobile fast charging vehicle (MFCV) planning approaches typically treat fleet design and routing as separate problems, fixing vehicle characteristics before dispatch. This paper formulates a fleet size and mix capacitated vehicle routing problem with time windows (FSMCVRPTW) for MFCV deployment, jointly optimizing fleet composition, charger specifications, routing, and scheduling within a unified mixed-integer linear program. The model incorporates heterogeneous MFCV types with varying power ratings, battery capacities, fuel range, and cost structures, minimizing total daily cost from labor, fuel, amortized capital expenditure, and energy purchase under temporal service windows, resource budgets, and energy-delivery constraints. The formulation is implemented in Python/Gurobi and applied to two case studies using California Department of Transportation wheel-loader data in Los Angeles (dense urban) and Truckee (sparse mountainous). Results show that simultaneous optimization yields compact, well-utilized fleets that meet all service windows while revealing strong sensitivity of unit cost to demand density and geography. The proposed FSMCVRPTW framework provides a generalizable decision-support methodology that co-designs fleet size, charger power, routing, and service schedules in a single optimization layer for context-aware, cost-efficient mobile fast charging.
comment: 10 pages, submitted to IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION
Pricing Problems in Adoption of New Technologies
We propose a generalization of the Bass diffusion model in discrete-time that explicitly models the effect of price in adoption. Our model is different from earlier price-incorporated models and fits well to adoption data for various products. We then utilize this model to study two decision-making problems. First, we provide a series of structural results on optimal pricing strategies to maximize profits from product sales by a monopolist over a finite horizon. We fully characterize the optimal pricing strategy in the single-period problem, and establish several structural properties of the same for the multi-period counterpart. Second, we study a Stackelberg game between a policy-maker and a monopolist, where the former seeks to maximize adoption through rebates, while the latter focuses on profits. For this problem, we analytically characterize crucial properties of the equilibrium path of the single-period game, and demonstrate how they carry over to the multi-period variant.
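For orientation, the classical discrete-time Bass recursion for cumulative adopters $N_t$ out of market size $M$ is shown below; how price enters the adoption dynamics in the authors' generalization is not specified in the abstract, so the price factor $h(P_t)$ (with $P_t$ the period-$t$ price) is purely illustrative.

$$ N_{t+1} - N_t = \Big(p + q\,\frac{N_t}{M}\Big)\big(M - N_t\big)\,h(P_t), \qquad h(\cdot)\ \text{nonincreasing},\quad h \equiv 1\ \text{recovering the standard model}, $$

with $p$ the innovation coefficient and $q$ the imitation coefficient.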
Robotics
Video2Act: A Dual-System Video Diffusion Policy with Robotic Spatio-Motional Modeling
Robust perception and dynamics modeling are fundamental to real-world robotic policy learning. Recent methods employ video diffusion models (VDMs) to enhance robotic policies, improving their understanding and modeling of the physical world. However, existing approaches overlook the coherent and physically consistent motion representations inherently encoded across frames in VDMs. To this end, we propose Video2Act, a framework that efficiently guides robotic action learning by explicitly integrating spatial and motion-aware representations. Building on the inherent representations of VDMs, we extract foreground boundaries and inter-frame motion variations while filtering out background noise and task-irrelevant biases. These refined representations are then used as additional conditioning inputs to a diffusion transformer (DiT) action head, enabling it to reason about what to manipulate and how to move. To mitigate inference inefficiency, we propose an asynchronous dual-system design, where the VDM functions as the slow System 2 and the DiT head as the fast System 1, working collaboratively to generate adaptive actions. By providing motion-aware conditions to System 1, Video2Act maintains stable manipulation even with low-frequency updates from the VDM. For evaluation, Video2Act surpasses previous state-of-the-art VLA methods by 7.7% in simulation and 21.7% in real-world tasks in terms of average success rate, further exhibiting strong generalization capabilities.
SMP: Reusable Score-Matching Motion Priors for Physics-Based Character Control
Data-driven motion priors that can guide agents toward producing naturalistic behaviors play a pivotal role in creating life-like virtual characters. Adversarial imitation learning has been a highly effective method for learning motion priors from reference motion data. However, adversarial priors, with few exceptions, need to be retrained for each new controller, thereby limiting their reusability and necessitating the retention of the reference motion data when training on downstream tasks. In this work, we present Score-Matching Motion Priors (SMP), which leverages pre-trained motion diffusion models and score distillation sampling (SDS) to create reusable task-agnostic motion priors. SMPs can be pre-trained on a motion dataset, independent of any control policy or task. Once trained, SMPs can be kept frozen and reused as general-purpose reward functions to train policies to produce naturalistic behaviors for downstream tasks. We show that a general motion prior trained on large-scale datasets can be repurposed into a variety of style-specific priors. Furthermore, SMP can compose different styles to synthesize new styles not present in the original dataset. Our method produces high-quality motion comparable to state-of-the-art adversarial imitation learning methods through reusable and modular motion priors. We demonstrate the effectiveness of SMP across a diverse suite of control tasks with physically simulated humanoid characters. Video demo available at https://youtu.be/ravlZJteS20
comment: 14 pages, 9 figures
SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel Splatting
LiDAR-captured point clouds are often considered the gold standard in active 3D reconstruction. While their accuracy is exceptional in flat regions, the capture process is prone to missing small geometric structures and may fail with dark, absorbent materials. Alternatively, capturing multiple photos of the scene and applying 3D photogrammetry can infer these details as they often represent feature-rich regions. However, the accuracy of LiDAR for featureless regions is rarely reached. Therefore, we suggest combining the strengths of LiDAR and camera-based capture by introducing SurfFill: a Gaussian surfel-based LiDAR completion scheme. We analyze LiDAR captures and identify LiDAR beam divergence as a main cause of artifacts, which manifest mostly at thin structures and edges. We use this insight to introduce an ambiguity heuristic for completed scans by evaluating the change in density in the point cloud. This allows us to identify points close to missed areas, from which we then grow additional points to complete the scan. For this point growing, we constrain Gaussian surfel reconstruction [Huang et al. 2024] to focus optimization and densification on these ambiguous areas. Finally, Gaussian primitives of the reconstruction in ambiguous areas are extracted and sampled for points to complete the point cloud. To address the challenges of large-scale reconstruction, we extend this pipeline with a divide-and-conquer scheme for building-sized point cloud completion. We evaluate on the task of LiDAR point cloud completion of synthetic and real-world scenes and find that our method outperforms previous reconstruction methods.
comment: Project page: https://lfranke.github.io/surffill
U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences
Modeling dynamic 3D environments from LiDAR sequences is central to building reliable 4D worlds for autonomous driving and embodied AI. Existing generative frameworks, however, often treat all spatial regions uniformly, overlooking the varying uncertainty across real-world scenes. This uniform generation leads to artifacts in complex or ambiguous regions, limiting realism and temporal stability. In this work, we present U4D, an uncertainty-aware framework for 4D LiDAR world modeling. Our approach first estimates spatial uncertainty maps from a pretrained segmentation model to localize semantically challenging regions. It then performs generation in a "hard-to-easy" manner through two sequential stages: (1) uncertainty-region modeling, which reconstructs high-entropy regions with fine geometric fidelity, and (2) uncertainty-conditioned completion, which synthesizes the remaining areas under learned structural priors. To further ensure temporal coherence, U4D incorporates a mixture of spatio-temporal (MoST) block that adaptively fuses spatial and temporal representations during diffusion. Extensive experiments show that U4D produces geometrically faithful and temporally consistent LiDAR sequences, advancing the reliability of 4D world modeling for autonomous perception and simulation.
comment: Preprint; 19 pages, 7 figures, 8 tables
BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection AAAI26
Integrating LiDAR and camera information in the bird's eye view (BEV) representation has demonstrated its effectiveness in 3D object detection. However, because of the fundamental disparity in geometric accuracy between these sensors, indiscriminate fusion in previous methods often leads to degraded performance. In this paper, we propose BEVDilation, a novel LiDAR-centric framework that prioritizes LiDAR information in the fusion. By formulating image BEV features as implicit guidance rather than naive concatenation, our strategy effectively alleviates the spatial misalignment caused by image depth estimation errors. Furthermore, the image guidance can effectively help the LiDAR-centric paradigm to address the sparsity and semantic limitations of point clouds. Specifically, we propose a Sparse Voxel Dilation Block that mitigates the inherent point sparsity by densifying foreground voxels through image priors. Moreover, we introduce a Semantic-Guided BEV Dilation Block to enhance the LiDAR feature diffusion processing with image semantic guidance and long-range context capture. On the challenging nuScenes benchmark, BEVDilation achieves better performance than state-of-the-art methods while maintaining competitive computational efficiency. Importantly, our LiDAR-centric strategy demonstrates greater robustness to depth noise compared to naive fusion. The source code is available at https://github.com/gwenzhang/BEVDilation.
comment: Accept by AAAI26
Experimental Characterization of Fingertip Trajectory following for a 3-DoF Series-Parallel Hybrid Robotic Finger
Task-space control of robotic fingers is a critical enabler of dexterous manipulation, as manipulation objectives are most naturally specified in terms of fingertip motions and applied forces rather than individual joint angles. While task-space planning and control have been extensively studied for larger, arm-scale manipulators, demonstrations of precise task-space trajectory tracking in compact, multi-DoF robotic fingers remain scarce. In this paper, we present the physical prototyping and experimental characterization of a three-degree-of-freedom, linkage-driven, series-parallel robotic finger with analytic forward kinematics and a closed-form Jacobian. A resolved motion rate control (RMRC) scheme is implemented to achieve closed-loop task-space trajectory tracking. We experimentally evaluate the fingertip tracking performance across a variety of trajectories, including straight lines, circles, and more complex curves, and report millimeter-level accuracy. To the best of our knowledge, this work provides one of the first systematic experimental demonstrations of precise task-space trajectory tracking in a linkage-driven robotic finger, thereby establishing a benchmark for future designs aimed at dexterous in-hand manipulation.
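The resolved motion rate control scheme referred to above takes the standard textbook form (gains and the choice of pseudoinverse are implementation details not given in the abstract):

$$ \dot{q} = J^{\dagger}(q)\,\big(\dot{x}_d + K_p\,(x_d - x)\big), $$

where $J^{\dagger}$ is the (possibly damped) pseudoinverse of the analytic Jacobian, $x_d(t)$ the desired fingertip trajectory, $x$ the measured fingertip position from forward kinematics, and $K_p$ a positive-definite tracking gain.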
VLA Models Are More Generalizable Than You Think: Revisiting Physical and Spatial Modeling
Vision-language-action (VLA) models achieve strong in-distribution performance but degrade sharply under novel camera viewpoints and visual perturbations. We show that this brittleness primarily arises from misalignment in Spatial Modeling, rather than Physical Modeling. To address this, we propose a one-shot adaptation framework that recalibrates visual representations through lightweight, learnable updates. Our first method, Feature Token Modulation (FTM), applies a global affine transformation to visual tokens and improves Libero viewpoint accuracy from 48.5% to 87.1% with only 4K parameters. Building on this, Feature Linear Adaptation (FLA) introduces low-rank updates to the ViT encoder, achieving 90.8% success with 4.7M parameters -- matching LoRA-scale finetuning at far lower cost. Together, these results reveal substantial untapped robustness in pretrained VLA models and demonstrate that targeted, minimal visual adaptation is sufficient to restore viewpoint generalization.
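As a rough illustration of what a 4K-parameter global affine modulation of visual tokens can look like, the sketch below applies one per-channel scale and shift to every token. The hidden size of 2048 (which happens to give 4096 parameters) and the one-shot fitting setup are assumptions, not the paper's implementation.

```python
# Sketch of a token-wise global affine modulation in the spirit of FTM
# (per-channel scale and shift shared across all visual tokens).
import torch
import torch.nn as nn

class FeatureTokenModulation(nn.Module):
    def __init__(self, dim: int = 2048):  # dim is an assumed hidden size
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))   # ~dim params
        self.shift = nn.Parameter(torch.zeros(dim))  # ~dim params

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: (batch, num_tokens, dim); same affine map for every token
        return visual_tokens * self.scale + self.shift

ftm = FeatureTokenModulation(dim=2048)
tokens = torch.randn(2, 256, 2048)               # stand-in ViT visual tokens
adapted = ftm(tokens)
print(sum(p.numel() for p in ftm.parameters()))  # 4096 trainable parameters
```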
Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models
This work presents a systematic investigation into how alternative LiDAR-to-image projections affect metric place recognition when coupled with a state-of-the-art vision foundation model. We introduce a modular retrieval pipeline that controls for backbone, aggregation, and evaluation protocol, thereby isolating the influence of the 2-D projection itself. Using consistent geometric and structural channels across multiple datasets and deployment scenarios, we identify the projection characteristics that most strongly determine discriminative power, robustness to environmental variation, and suitability for real-time autonomy. Experiments with different datasets, including integration into an operational place recognition policy, validate the practical relevance of these findings and demonstrate that carefully designed projections can serve as an effective surrogate for end-to-end 3-D learning in LiDAR place recognition.
comment: 13 pages, 5 figures, 2 tables; under review
SwarmDiffusion: End-To-End Traversability-Guided Diffusion for Embodiment-Agnostic Navigation of Heterogeneous Robots
Visual traversability estimation is critical for autonomous navigation, but existing VLM-based methods rely on hand-crafted prompts, generalize poorly across embodiments, and output only traversability maps, leaving trajectory generation to slow external planners. We propose SwarmDiffusion, a lightweight end-to-end diffusion model that jointly predicts traversability and generates a feasible trajectory from a single RGB image. To remove the need for annotated or planner-produced paths, we introduce a planner-free trajectory construction pipeline based on randomized waypoint sampling, Bezier smoothing, and regularization enforcing connectivity, safety, directionality, and path thinness. This enables learning stable motion priors without demonstrations. SwarmDiffusion leverages VLM-derived supervision without prompt engineering and conditions the diffusion process on a compact embodiment state, producing physically consistent, traversable paths that transfer across different robot platforms. Across indoor environments and two embodiments (quadruped and aerial), the method achieves 80-100% navigation success and 0.09 s inference, and adapts to a new robot using only ~500 additional visual samples. It generalizes reliably to unseen environments in simulation and real-world trials, offering a scalable, prompt-free approach to unified traversability reasoning and trajectory generation.
comment: This work has been submitted for publication and is currently under review
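The planner-free trajectory construction described above (randomized waypoints plus Bezier smoothing) can be illustrated in a few lines of NumPy; the sampling ranges, curve degree, and the crude spacing check below are illustrative assumptions rather than the paper's regularizers.

```python
# Hedged sketch: sample a few random waypoints in normalized image space and
# smooth them with a Bezier curve to form a candidate training trajectory.
import numpy as np
from math import comb

def bezier(control_pts: np.ndarray, n_samples: int = 50) -> np.ndarray:
    """Evaluate a Bezier curve defined by (k+1, 2) control points."""
    k = len(control_pts) - 1
    t = np.linspace(0.0, 1.0, n_samples)
    basis = np.stack([comb(k, i) * t**i * (1 - t)**(k - i)
                      for i in range(k + 1)], axis=1)   # (n, k+1) Bernstein basis
    return basis @ control_pts                           # (n, 2) path points

rng = np.random.default_rng(0)
start = np.array([0.5, 1.0])                             # bottom-center of the image
waypoints = rng.uniform([0.2, 0.2], [0.8, 0.9], size=(3, 2))
goal = np.array([0.5, 0.1])
path = bezier(np.vstack([start, waypoints, goal]))

# Crude connectivity check: consecutive path points stay close together.
assert np.all(np.linalg.norm(np.diff(path, axis=0), axis=1) < 0.1)
```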
VLM as Strategist: Adaptive Generation of Safety-critical Testing Scenarios via Guided Diffusion
The safe deployment of autonomous driving systems (ADSs) relies on comprehensive testing and evaluation. However, safety-critical scenarios that can effectively expose system vulnerabilities are extremely sparse in the real world. Existing scenario generation methods face challenges in efficiently constructing long-tail scenarios that ensure fidelity, criticality, and interactivity, while particularly lacking real-time dynamic response capabilities to the vehicle under test (VUT). To address these challenges, this paper proposes a safety-critical testing scenario generation framework that integrates the high-level semantic understanding capabilities of Vision Language Models (VLMs) with the fine-grained generation capabilities of adaptive guided diffusion models. The framework establishes a three-layer hierarchical architecture comprising a strategic layer for VLM-directed scenario generation objective determination, a tactical layer for guidance function formulation, and an operational layer for guided diffusion execution. We first establish a high-quality fundamental diffusion model that learns the data distribution of real driving scenarios. Next, we design an adaptive guided diffusion method that enables real-time, precise control of background vehicles (BVs) in closed-loop simulation. The VLM is then incorporated to autonomously generate scenario generation objectives and guidance functions through deep scenario understanding and risk reasoning, ultimately guiding the diffusion model to achieve VLM-directed scenario generation. Experimental results demonstrate that the proposed method can efficiently generate realistic, diverse, and highly interactive safety-critical testing scenarios. Furthermore, case studies validate the adaptability and VLM-directed generation performance of the proposed method.
comment: 25 pages, 9 figures
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
Vision-Language-Action (VLA) models, trained via flow-matching or diffusion objectives, excel at learning complex behaviors from large-scale, multi-modal datasets (e.g., human teleoperation, scripted policies). However, since VLAs incorporate diverse data modes in the pre-training stage, and the finetuning dataset often contains demonstration data collected in a kinematically suboptimal or undesirable way, there exist redundant action modes that are irrelevant to the successful action modes of the downstream task. Specifically, we observe a critical inference-time fragility across different sampled noises after supervised finetuning of pre-trained VLAs. In this paper, we attribute this instability to the distribution shift between the VLA policy and the policy induced by stable success modes of the downstream task dataset. Thus, we propose TACO, a test-time-scaling (TTS) framework that applies a lightweight pseudo-count estimator as a high-fidelity verifier of action chunks. A VLA model integrated with TACO executes the action chunk with the maximum pseudo-count among all sampled chunks, thereby preventing distribution shift while preserving the generalization ability of VLAs, since the constraint is applied only during inference. Our method resembles the classical anti-exploration principle in offline reinforcement learning (RL), and being gradient-free, it offers significant computational savings compared to RL updates, especially for flow- or diffusion-based VLAs, for which RL updates are difficult due to the denoising process. Extensive experiments across four simulation benchmarks (RoboTwin2.0, RoboTwin, LIBERO, SimplerEnv) and a dual-arm platform demonstrate that our method significantly improves inference stability and success rates in downstream-task adaptations.
comment: The first two authors contributed equally. Yang Zhang leads the whole project
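A minimal sketch of the test-time selection idea: sample several action chunks from the frozen policy, score each with a pseudo-count style density estimate against dataset actions, and execute the highest-scoring one. The RBF density below stands in for the paper's pseudo-count estimator, and all names and shapes are hypothetical.

```python
# Illustrative test-time scaling loop in the spirit of the described verifier.
import numpy as np

def pseudo_count(chunk, dataset_chunks, bandwidth=0.5):
    """Higher when the flattened chunk lies near the dataset action modes."""
    d = np.linalg.norm(dataset_chunks - chunk.ravel(), axis=1)
    return np.exp(-(d / bandwidth) ** 2).sum()

def select_action_chunk(sample_fn, dataset_chunks, n_samples=8):
    """sample_fn() -> (horizon, action_dim) chunk from the frozen VLA policy."""
    candidates = [sample_fn() for _ in range(n_samples)]
    scores = [pseudo_count(c, dataset_chunks) for c in candidates]
    return candidates[int(np.argmax(scores))]

# Toy usage with random stand-ins for the policy and the finetuning dataset.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(1000, 16 * 7))             # 16-step, 7-DoF chunks
chunk = select_action_chunk(lambda: rng.normal(size=(16, 7)), dataset)
```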
Phase-Adaptive LLM Framework with Multi-Stage Validation for Construction Robot Task Allocation: A Systematic Benchmark Against Traditional Optimization Algorithms
Multi-robot task allocation in construction automation has traditionally relied on optimization methods such as Dynamic Programming and Reinforcement Learning. This research introduces the LangGraph-based Task Allocation Agent (LTAA), an LLM-driven framework that integrates phase-adaptive allocation strategies, multi-stage validation with hierarchical retries, and dynamic prompting for efficient robot coordination. Although recent LLM approaches show potential for construction robotics, they largely lack rigorous validation and benchmarking against established algorithms. This paper presents the first systematic comparison of LLM-based task allocation with traditional methods in construction scenarios. The study validates LLM feasibility through SMART-LLM replication and addresses implementation challenges using a Self-Corrective Agent Architecture. LTAA leverages natural-language reasoning combined with structured validation mechanisms, achieving major computational gains: dynamic prompting reduces token usage by 94.6% and allocation time by 86%. The framework adjusts its strategy across phases, emphasizing execution feasibility early and workload balance in later allocations. The authors evaluate LTAA against Dynamic Programming, Q-learning, and Deep Q-Network (DQN) baselines using construction operations from the TEACh human-robot collaboration dataset. In the Heavy Excels setting, where robots have strong task specializations, LTAA achieves 77% task completion with superior workload balance, outperforming all traditional methods. These findings show that LLM-based reasoning with structured validation can match established optimization algorithms while offering additional advantages such as interpretability, adaptability, and the ability to update task logic without retraining.
Diagnose, Correct, and Learn from Manipulation Failures via Visual Symbols
Vision-Language-Action (VLA) models have recently achieved remarkable progress in robotic manipulation, yet they remain limited in failure diagnosis and learning from failures. Additionally, existing failure datasets are mostly generated programmatically in simulation, which limits their generalization to the real world. In light of these, we introduce ViFailback, a framework designed to diagnose robotic manipulation failures and provide both textual and visual correction guidance. Our framework utilizes explicit visual symbols to enhance annotation efficiency. We further release the ViFailback dataset, a large-scale collection of 58,126 Visual Question Answering (VQA) pairs along with their corresponding 5,202 real-world manipulation trajectories. Based on the dataset, we establish ViFailback-Bench, a benchmark of 11 fine-grained VQA tasks designed to assess the failure diagnosis and correction abilities of Vision-Language Models (VLMs), featuring ViFailback-Bench Lite for closed-ended and ViFailback-Bench Hard for open-ended evaluation. To demonstrate the effectiveness of our framework, we built the ViFailback-8B VLM, which not only achieves significant overall performance improvement on ViFailback-Bench but also generates visual symbols for corrective action guidance. Finally, by integrating ViFailback-8B with a VLA model, we conduct real-world robotic experiments demonstrating its ability to assist the VLA model in recovering from failures. Project Website: https://x1nyuzhou.github.io/vifailback.github.io/
CogDrive: Cognition-Driven Multimodal Prediction-Planning Fusion for Safe Autonomy
Safe autonomous driving in mixed traffic requires a unified understanding of multimodal interactions and dynamic planning under uncertainty. Existing learning-based approaches struggle to capture rare but safety-critical behaviors, while rule-based systems often lack adaptability in complex interactions. To address these limitations, CogDrive introduces a cognition-driven multimodal prediction and planning framework that integrates explicit modal reasoning with safety-aware trajectory optimization. The prediction module adopts cognitive representations of interaction modes based on topological motion semantics and nearest-neighbor relational encoding. With a differentiable modal loss and multimodal Gaussian decoding, CogDrive learns sparse and unbalanced interaction behaviors and improves long-horizon trajectory prediction. The planning module incorporates an emergency-response concept and optimizes safety-stabilized trajectories, where short-term consistent branches ensure safety during replanning cycles and long-term branches support smooth and collision-free motion under low-probability switching modes. Experiments on the Argoverse2 and INTERACTION datasets show that CogDrive achieves strong performance in trajectory accuracy and miss rate, while closed-loop simulations confirm adaptive behavior in merge and intersection scenarios. By combining cognitive multimodal prediction with safety-oriented planning, CogDrive offers an interpretable and reliable paradigm for safe autonomy in complex traffic.
comment: 25 pages, 6 figures
RoboWheel: A Data Engine from Real-World Human Demonstrations for Cross-Embodiment Robotic Learning
We introduce RoboWheel, a data engine that converts human hand-object interaction (HOI) videos into training-ready supervision for cross-morphology robotic learning. From monocular RGB or RGB-D inputs, we perform high-precision HOI reconstruction and enforce physical plausibility via a reinforcement learning (RL) optimizer that refines hand-object relative poses under contact and penetration constraints. The reconstructed, contact-rich trajectories are then retargeted to diverse embodiments, including robot arms with simple end effectors, dexterous hands, and humanoids, yielding executable actions and rollouts. To scale coverage, we build a simulation-augmented framework on Isaac Sim with diverse domain randomization (embodiments, trajectories, object retrieval, background textures, hand motion mirroring), which enriches the distributions of trajectories and observations while preserving spatial relationships and physical plausibility. Together, these components form an end-to-end pipeline from video, through reconstruction, retargeting, and augmentation, to data acquisition. We validate the data on mainstream vision-language-action (VLA) and imitation learning architectures, demonstrating that trajectories produced by our pipeline are as stable as those from teleoperation and yield comparable continual performance gains. To our knowledge, this provides the first quantitative evidence that HOI modalities can serve as effective supervision for robotic learning. Compared with teleoperation, RoboWheel is lightweight: a single monocular RGB(-D) camera is sufficient to extract a universal, embodiment-agnostic motion representation that can be flexibly retargeted across embodiments. We further assemble a large-scale multimodal dataset combining multi-camera captures, monocular videos, and public HOI corpora for training and evaluating embodied models.
comment: 27 Pages, 21 figures
SAM2Grasp: Resolve Multi-modal Grasping via Prompt-conditioned Temporal Action Prediction
Imitation learning for robotic grasping is often plagued by the multimodal problem: when a scene contains multiple valid targets, demonstrations of grasping different objects create conflicting training signals. Standard imitation learning policies fail by averaging these distinct actions into a single, invalid action. In this paper, we introduce SAM2Grasp, a novel framework that resolves this issue by reformulating the task as a uni-modal, prompt-conditioned prediction problem. Our method leverages the frozen SAM2 model to use its powerful visual temporal tracking capability and introduces a lightweight, trainable action head that operates in parallel with its native segmentation head. This design allows for training only the small action head on pre-computed temporal-visual features from SAM2. During inference, an initial prompt, such as a bounding box provided by an upstream object detection model, designates the specific object to be grasped. This prompt conditions the action head to predict a unique, unambiguous grasp trajectory for that object alone. In all subsequent video frames, SAM2's built-in temporal tracking capability automatically maintains stable tracking of the selected object, enabling our model to continuously predict the grasp trajectory from the video stream without further external guidance. This temporal-prompted approach effectively eliminates ambiguity from the visuomotor policy. We demonstrate through extensive experiments that SAM2Grasp achieves state-of-the-art performance in cluttered, multi-object grasping tasks.
Reframing Human-Robot Interaction Through Extended Reality: Unlocking Safer, Smarter, and More Empathic Interactions with Virtual Robots and Foundation Models
This perspective reframes human-robot interaction (HRI) through extended reality (XR), arguing that virtual robots powered by large foundation models (FMs) can serve as cognitively grounded, empathic agents. Unlike physical robots, XR-native agents are unbound by hardware constraints and can be instantiated, adapted, and scaled on demand, while still affording embodiment and co-presence. We synthesize work across XR, HRI, and cognitive AI to show how such agents can support safety-critical scenarios, socially and cognitively empathic interaction across domains, and outreaching physical capabilities with XR and AI integration. We then discuss how multimodal large FMs (e.g., large language model, large vision model, and vision-language model) enable context-aware reasoning, affect-sensitive situations, and long-term adaptation, positioning virtual robots as cognitive and empathic mediators rather than mere simulation assets. At the same time, we highlight challenges and potential risks, including overtrust, cultural and representational bias, privacy concerns around biometric sensing, and data governance and transparency. The paper concludes by outlining a research agenda for human-centered, ethically grounded XR agents - emphasizing multi-layered evaluation frameworks, multi-user ecosystems, mixed virtual-physical embodiment, and societal and ethical design practices to envision XR-based virtual agents powered by FMs as reshaping future HRI into a more efficient and adaptive paradigm.
comment: This paper is under review
Robotic capabilities framework: A boundary object and intermediate-level knowledge artifact for co-designing robotic processes
As robots become more adaptable, responsive, and capable of interacting with humans, the design of effective human-robot collaboration becomes critical. Yet, this design process is typically led by monodisciplinary approaches, often overlooking interdisciplinary knowledge and the experiential knowledge of workers who will ultimately share tasks with these systems. To address this gap, we introduce the robotic capabilities framework, a vocabulary that enables transdisciplinary collaborations to meaningfully shape the future of work when robotic systems are integrated into the workplace. Rather than focusing on the internal workings of robots, the framework centers discussion on high-level capabilities, supporting dialogue around which elements of a task should remain human-led and which can be delegated to robots. We developed the framework through reflexive and iterative processes, and applied it in two distinct settings: by engaging roboticists in describing existing commercial robots using its vocabulary, and through a design activity with students working on robotics-related projects. The framework emerges as an intermediate-level knowledge artifact and a boundary object that bridges technical and experiential domains, guiding designers, empowering workers, and contributing to more just and collaborative futures of work.
AID: Agent Intent from Diffusion for Multi-Agent Informative Path Planning
Information gathering in large-scale or time-critical scenarios (e.g., environmental monitoring, search and rescue) requires broad coverage within limited time budgets, motivating the use of multi-agent systems. These scenarios are commonly formulated as multi-agent informative path planning (MAIPP), where multiple agents must coordinate to maximize information gain while operating under budget constraints. A central challenge in MAIPP is ensuring effective coordination while the belief over the environment evolves with incoming measurements. Recent learning-based approaches address this by using distributions over future positions as "intent" to support coordination. However, these autoregressive intent predictors are computationally expensive and prone to compounding errors. Inspired by the effectiveness of diffusion models as expressive, long-horizon policies, we propose AID, a fully decentralized MAIPP framework that leverages diffusion models to generate long-term trajectories in a non-autoregressive manner. AID first performs behavior cloning on trajectories produced by existing MAIPP planners and then fine-tunes the policy using reinforcement learning via Diffusion Policy Policy Optimization (DPPO). This two-stage pipeline enables the policy to inherit expert behavior while learning improved coordination through online reward feedback. Experiments demonstrate that AID consistently improves upon the MAIPP planners it is trained from, achieving up to 4x faster execution and 17% increased information gain, while scaling effectively to larger numbers of agents. Our implementation is publicly available at https://github.com/marmotlab/AID.
nuScenes Revisited: Progress and Challenges in Autonomous Driving
Autonomous Vehicles (AV) and Advanced Driver Assistance Systems (ADAS) have been revolutionized by Deep Learning. As a data-driven approach, Deep Learning relies on vast amounts of driving data, typically labeled in great detail. As a result, datasets, alongside hardware and algorithms, are foundational building blocks for the development of AVs. In this work we revisit one of the most widely used autonomous driving datasets: the nuScenes dataset. nuScenes exemplifies key trends in AV development, being the first dataset to include radar data, to feature diverse urban driving scenes from two continents, and to be collected using a fully autonomous vehicle operating on public roads, while also promoting multi-modal sensor fusion, standardized benchmarks, and a broad range of tasks including perception, localization & mapping, prediction and planning. We provide an unprecedented look into the creation of nuScenes, as well as its extensions nuImages and Panoptic nuScenes, summarizing many technical details that have hitherto not been revealed in academic publications. Furthermore, we trace how the influence of nuScenes impacted a large number of other datasets that were released later and how it defined numerous standards that are used by the community to this day. Finally, we present an overview of both official and unofficial tasks using the nuScenes dataset and review major methodological developments, thereby offering a comprehensive survey of the autonomous driving literature, with a particular focus on nuScenes.
comment: 18 pages, 17 figures
Vehicle Dynamics Embedded World Models for Autonomous Driving
World models have gained significant attention as a promising approach for autonomous driving. By emulating human-like perception and decision-making processes, these models can predict and adapt to dynamic environments. Existing methods typically map high-dimensional observations into compact latent spaces and learn optimal policies within these latent representations. However, prior work usually jointly learns ego-vehicle dynamics and environmental transition dynamics from the image input, leading to inefficiencies and a lack of robustness to variations in vehicle dynamics. To address these issues, we propose the Vehicle Dynamics embedded Dreamer (VDD) method, which decouples the modeling of ego-vehicle dynamics from environmental transition dynamics. This separation allows the world model to generalize effectively across vehicles with diverse parameters. Additionally, we introduce two strategies to further enhance the robustness of the learned policy: Policy Adjustment during Deployment (PAD) and Policy Augmentation during Training (PAT). Comprehensive experiments in simulated environments demonstrate that the proposed model significantly improves both driving performance and robustness to variations in vehicle dynamics, outperforming existing approaches.
On the Convergence of Density-Based Predictive Control for Multi-Agent Non-Uniform Area Coverage
This paper presents Density-based Predictive Control (DPC), a novel multi-agent control strategy for efficient non-uniform area coverage, grounded in optimal transport theory. In large-scale scenarios such as search and rescue or environmental monitoring, traditional uniform coverage fails to account for varying regional priorities. DPC leverages a pre-constructed reference distribution to allocate agents' coverage efforts, spending more time in high-priority or densely sampled regions. We analyze convergence conditions using the Wasserstein distance, derive an analytic optimal control law for unconstrained cases, and propose a numerical method for constrained scenarios. Simulations on first-order dynamics and linearized quadrotor models demonstrate that DPC achieves trajectories closely matching the non-uniform reference distribution, outperforming existing coverage methods.
comment: Accepted for publication in ASME JDSMC
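The convergence analysis above is phrased in terms of the Wasserstein distance between the agents' visitation and the reference density. A cheap, hedged way to monitor such a coverage gap is a sliced (axis-wise 1-D) Wasserstein distance, as sketched below with toy distributions; the reference density and the sliced approximation are simplifying assumptions, not the paper's metric.

```python
# Coverage-gap proxy: sliced 1-D Wasserstein distance between agent visitation
# samples and a non-uniform reference distribution.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
reference = rng.normal(loc=[0.7, 0.3], scale=0.1, size=(5000, 2))  # priority region
visited = rng.uniform(0.0, 1.0, size=(2000, 2))                    # agent positions so far

def sliced_w1(a, b):
    """Average of per-axis 1-D Wasserstein distances (a cheap coverage proxy)."""
    return np.mean([wasserstein_distance(a[:, k], b[:, k]) for k in range(a.shape[1])])

print(sliced_w1(visited, reference))  # should shrink as coverage matches the reference
```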
VIGS-SLAM: Visual Inertial Gaussian Splatting SLAM
We present VIGS-SLAM, a visual-inertial 3D Gaussian Splatting SLAM system that achieves robust real-time tracking and high-fidelity reconstruction. Although recent 3DGS-based SLAM methods achieve dense and photorealistic mapping, their purely visual design degrades under motion blur, low texture, and exposure variations. Our method tightly couples visual and inertial cues within a unified optimization framework, jointly refining camera poses, depths, and IMU states. It features robust IMU initialization, time-varying bias modeling, and loop closure with consistent Gaussian updates. Experiments on four challenging datasets demonstrate our superiority over state-of-the-art methods. Project page: https://vigs-slam.github.io
comment: Project page: https://vigs-slam.github.io
KALIKO: Kalman-Implicit Koopman Operator Learning For Prediction of Nonlinear Dynamical Systems
Long-horizon dynamical prediction is fundamental in robotics and control, underpinning canonical methods like model predictive control. Yet, many systems and disturbance phenomena are difficult to model due to effects like nonlinearity, chaos, and high-dimensionality. Koopman theory addresses this by modeling the linear evolution of embeddings of the state under an infinite-dimensional linear operator that can be approximated with a suitable finite basis of embedding functions, effectively trading model nonlinearity for representational complexity. However, explicitly computing a good choice of basis is nontrivial, and poor choices may cause inaccurate forecasts or overfitting. To address this, we present Kalman-Implicit Koopman Operator (KALIKO) Learning, a method that leverages the Kalman filter to implicitly learn embeddings corresponding to latent dynamics without requiring an explicit encoder. KALIKO produces interpretable representations consistent with both theory and prior works, yielding high-quality reconstructions and inducing a globally linear latent dynamics. Evaluated on wave data generated by a high-dimensional PDE, KALIKO surpasses several baselines in open-loop prediction and in a demanding closed-loop simulated control task: stabilizing an underactuated manipulator's payload by predicting and compensating for strong wave disturbances.
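For readers unfamiliar with the building block, the predict/update recursion of a plain linear Kalman filter is sketched below. This is only the textbook filter on toy matrices, not the paper's implicit embedding learning; the dynamics, observation model, and noise covariances are assumed values.

```python
# Textbook linear Kalman filter recursion on a toy 2-D latent state.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # assumed latent linear dynamics
C = np.array([[1.0, 0.0]])               # observe the first latent coordinate
Q = 1e-3 * np.eye(2)                     # process noise covariance
R = np.array([[1e-2]])                   # measurement noise covariance

def kalman_step(x, P, y):
    # Predict under the linear latent dynamics.
    x_pred, P_pred = A @ x, A @ P @ A.T + Q
    # Update with the new measurement y.
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(2) - K @ C) @ P_pred
    return x_new, P_new

x, P = np.zeros(2), np.eye(2)
for y in np.sin(0.1 * np.arange(100))[:, None]:   # toy wave-like measurements
    x, P = kalman_step(x, P, y)
```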
Flux4D: Flow-based Unsupervised 4D Reconstruction NeurIPS 2025
Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors, simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.
comment: NeurIPS 2025. Project page: https://waabi.ai/flux4d/
GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding
Large robot fleets are now common in warehouses and other logistics settings, where small control gains translate into large operational impacts. In this article, we address task scheduling for lifelong Multi-Agent Pickup-and-Delivery (MAPD) and propose a hybrid method that couples learning-based global guidance with lightweight optimization. A graph neural network policy trained via reinforcement learning outputs a desired distribution of free agents over an aggregated warehouse graph. This signal is converted into region-to-region rebalancing through a minimum-cost flow, and finalized by small, local assignment problems, preserving accuracy while keeping per-step latency within a 1 s compute budget. On congested warehouse benchmarks from the League of Robot Runners (LRR) with up to 500 agents, our approach improves throughput by up to 10% over the 2024 winning scheduler while maintaining real-time execution. The results indicate that coupling graph-structured learned guidance with tractable solvers reduces congestion and yields a practical, scalable blueprint for high-throughput scheduling in large fleets.
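The rebalancing step described above, which converts a desired free-agent distribution into region-to-region moves via a minimum-cost flow, can be sketched directly with networkx. The 4-region graph, agent counts, and edge costs below are toy assumptions, not the paper's warehouse maps.

```python
# Hedged sketch: turn (current, target) free-agent counts per region into
# region-to-region moves by solving a min-cost flow on the region graph.
import networkx as nx

current = {"A": 6, "B": 1, "C": 1, "D": 2}      # free agents per region
target = {"A": 2, "B": 3, "C": 3, "D": 2}       # learned guidance (rounded)
edges = [("A", "B", 1), ("B", "C", 1), ("C", "D", 1), ("A", "D", 2)]

G = nx.DiGraph()
for r in current:
    # Positive demand = region should receive agents; negative = it supplies them.
    G.add_node(r, demand=target[r] - current[r])
for u, v, w in edges:
    G.add_edge(u, v, weight=w, capacity=10)
    G.add_edge(v, u, weight=w, capacity=10)

flow = nx.min_cost_flow(G)                       # region-to-region agent moves
print({(u, v): f for u, d in flow.items() for v, f in d.items() if f > 0})
```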
Multi-Agent Reinforcement Learning and Real-Time Decision-Making in Robotic Soccer for Virtual Environments
The deployment of multi-agent systems in dynamic, adversarial environments like robotic soccer necessitates real-time decision-making, sophisticated cooperation, and scalable algorithms to avoid the curse of dimensionality. While Reinforcement Learning (RL) offers a promising framework, existing methods often struggle with the multi-granularity of tasks (long-term strategy vs. instant actions) and the complexity of large-scale agent interactions. This paper presents a unified Multi-Agent Reinforcement Learning (MARL) framework that addresses these challenges. First, we establish a baseline using Proximal Policy Optimization (PPO) within a client-server architecture for real-time action scheduling, with PPO demonstrating superior performance (4.32 avg. goals, 82.9% ball control). Second, we introduce a Hierarchical RL (HRL) structure based on the options framework to decompose the problem into a high-level trajectory planning layer (modeled as a Semi-Markov Decision Process) and a low-level action execution layer, improving global strategy (avg. goals increased to 5.26). Finally, to ensure scalability, we integrate mean-field theory into the HRL framework, simplifying many-agent interactions into a single agent vs. the population average. Our mean-field actor-critic method achieves a significant performance boost (5.93 avg. goals, 89.1% ball control, 92.3% passing accuracy) and enhanced training stability. Extensive simulations of 4v4 matches in the Webots environment validate our approach, demonstrating its potential for robust, scalable, and cooperative behavior in complex multi-agent domains.
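The mean-field simplification mentioned above reduces each agent's critic input to its own action plus the average action of the other agents. A minimal PyTorch sketch of such a critic follows; the network sizes, observation dimensions, and class name are chosen arbitrarily for illustration.

```python
# Minimal mean-field critic: Q(obs, own action, mean of neighbors' actions).
import torch
import torch.nn as nn

class MeanFieldCritic(nn.Module):
    def __init__(self, obs_dim=32, act_dim=4, hidden=128):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(obs_dim + act_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, obs, own_action, neighbor_actions):
        # neighbor_actions: (batch, n_neighbors, act_dim) -> population average
        mean_action = neighbor_actions.mean(dim=1)
        return self.q(torch.cat([obs, own_action, mean_action], dim=-1))

critic = MeanFieldCritic()
q_value = critic(torch.randn(8, 32), torch.randn(8, 4), torch.randn(8, 7, 4))
```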
Forecasting in Offline Reinforcement Learning for Non-stationary Environments NeurIPS 2025
Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing offline RL methods often assume stationarity or only consider synthetic perturbations at test time, assumptions that often fail in real-world scenarios characterized by abrupt, time-varying offsets. These offsets can lead to partial observability, causing agents to misperceive their true state and degrade performance. To overcome this challenge, we introduce Forecasting in Non-stationary Offline RL (FORL), a framework that unifies (i) conditional diffusion-based candidate state generation, trained without presupposing any specific pattern of future non-stationarity, and (ii) zero-shot time-series foundation models. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the onset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data to simulate realistic non-stationarity, demonstrate that FORL consistently improves performance compared to competitive baselines. By integrating zero-shot forecasting with the agent's experience, we aim to bridge the gap between offline RL and the complexities of real-world, non-stationary environments.
comment: The Thirty-Ninth Annual Conference on Neural Information Processing Systems, NeurIPS 2025
Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models
Robust robotic manipulation requires reliable failure detection and recovery. Although current Vision-Language Models (VLMs) show promise, their accuracy and generalization are limited by the scarcity of failure data. To address this data gap, we propose an automatic robot failure synthesis approach that procedurally perturbs successful trajectories to generate diverse planning and execution failures. This method produces not only binary classification labels but also fine-grained failure categories and step-by-step reasoning traces in both simulation and the real world. With it, we construct three new failure detection benchmarks: RLBench-Fail, BridgeDataV2-Fail, and UR5-Fail, substantially expanding the diversity and scale of existing failure datasets. We then train Guardian, a VLM with multi-view images for detailed failure reasoning and detection. Guardian achieves state-of-the-art performance on both existing and newly introduced benchmarks. It also effectively improves task success rates when integrated into a state-of-the-art manipulation system in simulation and real robots, demonstrating the impact of our generated failure data. Code, Data, and Models available at https://www.di.ens.fr/willow/research/guardian/.
comment: Code, Data, and Models available at https://www.di.ens.fr/willow/research/guardian/. The paper contains 8 pages, 9 figures, 6 tables
DYNEMO-SLAM: Dynamic Entity and Motion-Aware 3D Scene Graph SLAM
Robots operating in dynamic environments face significant challenges due to the presence of moving agents and displaced objects. Traditional SLAM systems typically assume a static world or treat dynamic elements as outliers, discarding their information to preserve map consistency. As a result, they cannot exploit dynamic entities as persistent landmarks, do not model and exploit their motion over time, and therefore quickly degrade in highly cluttered environments with few reliable static features. This paper presents a novel 3D scene graph-based SLAM framework that addresses the challenge of incorporating dynamic entities, and the estimation of their poses, into the SLAM backend. Our framework incorporates semantic motion priors and dynamic entity-aware constraints to jointly optimize the robot trajectory, dynamic entity poses, and the surrounding environment structure within a unified graph formulation. In parallel, a dynamic keyframe selection policy and a semantic loop-closure prefiltering step enable the system to remain robust and effective in highly dynamic environments by continuously adapting to scene changes and filtering inconsistent observations. The simulation and real-world experimental results show a 49.97% reduction in ATE compared to the baseline method employed, demonstrating the effectiveness of incorporating dynamic entities and estimating their poses for improved robustness and richer scene representation in complex scenarios while maintaining real-time performance.
comment: 8 pages, 4 figures, 5 tables
Is Image-based Object Pose Estimation Ready to Support Grasping?
We present a framework for evaluating 6-DoF instance-level object pose estimators, focusing on those that require a single RGB (not RGB-D) image as input. Besides gaining intuition about how accurate these estimators are, we are interested in the degree to which they can serve as the sole perception mechanism for robotic grasping. To assess this, we perform grasping trials in a physics-based simulator, using image-based pose estimates to guide a parallel gripper and an underactuated robotic hand in picking up 3D models of objects. Our experiments on a subset of the BOP (Benchmark for 6D Object Pose Estimation) dataset compare five open-source object pose estimators and provide insights that were missing from the literature.
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Existing VLA policies rest on the assumption that human demonstrations are optimal. However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations via reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress function, filters the demonstration trajectories, and only keeps the transitions that contribute positively to the progress. Specifically, we show that by directly applying offline RL with sparse reward, the resulting $Q$-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation that greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets with an 83.3% success rate, a task requiring long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that maximizes the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP), where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model within an uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for the standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach to the composite value function either, as the worst-case models may differ for the reward value function and the constraint value function. We propose a novel technique that minimizes the constraint value function whenever the constraints are violated; once all the constraints are satisfied, it simply maximizes the robust reward value function. We prove that such an algorithm finds a feasible policy that is at most $ε$ sub-optimal after $O(ε^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search; thus, we reduce the computation time by at least 4x for smaller values of the discount factor ($γ$) and by at least 6x for larger values of $γ$.
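The proposed technique alternates between two objectives depending on feasibility. A schematic sketch of that switching rule is given below, with toy quadratic stand-ins for the robust value functions and their gradients; the paper's robust evaluation and convergence machinery are not reproduced.

```python
# Schematic switching update: push the constraint value down while it exceeds
# the budget, otherwise ascend the robust reward value.
import numpy as np

def switched_update(theta, robust_reward_grad, robust_cost_value,
                    robust_cost_grad, budget, lr=0.05):
    if robust_cost_value(theta) > budget:
        return theta - lr * robust_cost_grad(theta)    # drive the constraint down
    return theta + lr * robust_reward_grad(theta)      # otherwise improve reward

# Toy quadratic stand-ins for the robust value functions.
theta = np.array([2.0, -1.0])
for _ in range(200):
    theta = switched_update(
        theta,
        robust_reward_grad=lambda t: -2 * (t - np.array([0.5, 0.5])),
        robust_cost_value=lambda t: float(np.sum(t ** 2)),
        robust_cost_grad=lambda t: 2 * t,
        budget=1.0)
```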
LAP: Fast LAtent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving
Diffusion models have demonstrated strong capabilities for modeling human-like driving behaviors in autonomous driving, but their iterative sampling process induces substantial latency, and operating directly on raw trajectory points forces the model to spend capacity on low-level kinematics, rather than high-level multi-modal semantics. To address these limitations, we propose LAtent Planner (LAP), a framework that plans in a VAE-learned latent space that disentangles high-level intents from low-level kinematics, enabling our planner to capture rich, multi-modal driving strategies. We further introduce a fine-grained feature distillation mechanism to guide a better interaction and fusion between the high-level semantic planning space and the vectorized scene context. Notably, LAP can produce high-quality plans in one single denoising step, substantially reducing computational overhead. Through extensive evaluations on the large-scale nuPlan benchmark, LAP achieves state-of-the-art closed-loop performance among learning-based planning methods, while demonstrating an inference speed-up of up to 10 times over previous SOTA approaches. Code will be released at: https://github.com/jhz1192/Latent-Planner.
AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention
Vision-Language-Action (VLA) models have demonstrated remarkable capabilities in embodied AI tasks. However, existing VLA models, often built upon Vision-Language Models (VLMs), typically process dense visual inputs independently at each timestep, implicitly modeling the task as a Markov Decision Process (MDP). This history-agnostic design is suboptimal for effective visual token processing in dynamic sequential decision-making, as it fails to leverage the context of history. To address this limitation, we reformulate the problem from a Partially Observable Markov Decision Process (POMDP) perspective and propose a novel framework named AVA-VLA. Inspired by the POMDP view that action generation should be conditioned on the belief state, AVA-VLA introduces Active Visual Attention (AVA) to dynamically modulate visual processing. It achieves this by leveraging the recurrent state, which is a neural approximation of the agent's belief state derived from the previous decision step. Specifically, the AVA module uses the recurrent state to compute soft weights that actively process task-relevant visual tokens based on historical context. Comprehensive evaluations demonstrate that AVA-VLA achieves state-of-the-art performance across popular robotic benchmarks, including LIBERO and CALVIN. Furthermore, real-world deployments on a dual-arm robot platform validate the framework's practical applicability and robust sim-to-real transferability.
comment: 18 pages, 10 figures
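A minimal sketch of the belief-conditioned token re-weighting described above, assuming a GRU-style belief update and arbitrary feature sizes; the real AVA module's interface and training procedure are not reproduced here.

```python
# Recurrent belief state produces soft weights that modulate visual tokens
# before action decoding (a hedged approximation of "active visual attention").
import torch
import torch.nn as nn

class ActiveVisualAttention(nn.Module):
    def __init__(self, token_dim=1024, belief_dim=512):
        super().__init__()
        self.belief = nn.GRUCell(token_dim, belief_dim)   # neural belief update
        self.scorer = nn.Linear(belief_dim + token_dim, 1)

    def forward(self, visual_tokens, h):
        # visual_tokens: (B, N, D); h: (B, belief_dim) carried from the last step.
        h = self.belief(visual_tokens.mean(dim=1), h)     # update the belief state
        ctx = h.unsqueeze(1).expand(-1, visual_tokens.size(1), -1)
        w = torch.sigmoid(self.scorer(torch.cat([ctx, visual_tokens], dim=-1)))
        return visual_tokens * w, h                       # softly re-weighted tokens

ava = ActiveVisualAttention()
tokens, h = torch.randn(2, 196, 1024), torch.zeros(2, 512)
weighted, h = ava(tokens, h)
```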
Multi-User Personalisation in Human-Robot Interaction: Resolving Preference Conflicts Using Gradual Argumentation
While personalisation in Human-Robot Interaction (HRI) has advanced significantly, most existing approaches focus on single-user adaptation, overlooking scenarios involving multiple stakeholders with potentially conflicting preferences. To address this, we propose the Multi-User Preferences Quantitative Bipolar Argumentation Framework (MUP-QBAF), a novel multi-user personalisation framework based on Quantitative Bipolar Argumentation Frameworks (QBAFs) that explicitly models and resolves multi-user preference conflicts. Unlike prior work in Argumentation Frameworks, which typically assumes static inputs, our approach is tailored to robotics: it incorporates both users' arguments and the robot's dynamic observations of the environment, allowing the system to adapt over time and respond to changing contexts. Preferences, both positive and negative, are represented as arguments whose strength is recalculated iteratively based on new information. The framework's properties and capabilities are presented and validated through a realistic case study, where an assistive robot mediates between the conflicting preferences of a caregiver and a care recipient during a frailty assessment task. This evaluation further includes a sensitivity analysis of argument base scores, demonstrating how preference outcomes can be shaped by user input and contextual observations. By offering a transparent, structured, and context-sensitive approach to resolving competing user preferences, this work advances the field of multi-user HRI. It provides a principled alternative to data-driven methods, enabling robots to navigate conflicts in real-world environments.
comment: Preprint submitted to a journal
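To make the iterative strength recalculation concrete, the sketch below runs a DF-QuAD-style gradual semantics on a tiny, hypothetical QBAF loosely inspired by the caregiver/care-recipient example; the arguments, base scores, and chosen semantics are illustrative and may differ from the paper's framework.

```python
# Iterative strength update on a small quantitative bipolar argumentation graph.
base = {"respect_rest": 0.6, "finish_assessment": 0.7, "patient_tired": 0.5}
attacks = {"finish_assessment": ["patient_tired"]}   # attackers per argument
supports = {"respect_rest": ["patient_tired"]}       # supporters per argument

def aggregate(values):
    """Probabilistic-sum aggregation: 1 - prod(1 - v)."""
    out = 0.0
    for v in values:
        out = out + v - out * v
    return out

strength = dict(base)
for _ in range(50):                                   # iterate toward a fixed point
    new = {}
    for arg, b in base.items():
        va = aggregate(strength[x] for x in attacks.get(arg, []))
        vs = aggregate(strength[x] for x in supports.get(arg, []))
        # DF-QuAD-style combination of base score with attack/support aggregates.
        new[arg] = b - b * (va - vs) if va >= vs else b + (1 - b) * (vs - va)
    strength = new
print(strength)
```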
Agentic UAVs: LLM-Driven Autonomy with Integrated Tool-Calling and Cognitive Reasoning
Unmanned Aerial Vehicles (UAVs) are increasingly used in defense, surveillance, and disaster response, yet most systems still operate at SAE Level 2 to 3 autonomy. Their dependence on rule-based control and narrow AI limits adaptability in dynamic and uncertain missions. Current UAV architectures lack context-aware reasoning, autonomous decision-making, and integration with external systems. Importantly, none make use of Large Language Model (LLM) agents with tool-calling for real-time knowledge access. This paper introduces the Agentic UAVs framework, a five-layer architecture consisting of Perception, Reasoning, Action, Integration, and Learning. The framework enhances UAV autonomy through LLM-driven reasoning, database querying, and interaction with third-party systems. A prototype built with ROS 2 and Gazebo combines YOLOv11 for object detection with GPT-4 for reasoning and a locally deployed Gemma 3 model. In simulated search-and-rescue scenarios, agentic UAVs achieved higher detection confidence (0.79 compared to 0.72), improved person detection rates (91% compared to 75%), and a major increase in correct action recommendations (92% compared to 4.5%). These results show that modest computational overhead can enable significantly higher levels of autonomy and system-level integration.
comment: 17 pages, 2 figures
Large Language Models for Robotics: A Survey
The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered increasing attention. LLMs possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. Researchers and engineers in the field of robotics have recognized the immense potential of LLMs in enhancing robot intelligence, human-robot interaction, and autonomy. Therefore, this comprehensive review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and planning. This survey first provides an overview of the background and development of LLMs for robotics, followed by a discussion of their benefits and recent advancements in LLM-based robotic models. It then explores various techniques, employed in perception, decision-making, control, and interaction, as well as cross-module coordination in practical tasks. Finally, we review current applications of LLMs in robotics and outline potential challenges they may face in the near future. Embodied intelligence represents the future of intelligent systems, and LLM-based robotics is one of the most promising yet challenging paths toward achieving it.
comment: Preprint. 9 figures, 7 tables
Learning-based 3D Reconstruction in Autonomous Driving: A Comprehensive Survey
Learning-based 3D reconstruction has emerged as a transformative technique in autonomous driving, enabling precise modeling of environments through advanced neural representations. It has inspired pioneering solutions for vital tasks in autonomous driving, such as dense mapping and closed-loop simulation, as well as comprehensive scene features for driving scene understanding and reasoning. Given the rapid growth in related research, this survey provides a comprehensive review of both technical evolutions and practical applications in autonomous driving. We begin with an introduction to the preliminaries of learning-based 3D reconstruction to provide a solid technical foundation, then progress to a rigorous, multi-dimensional examination of cutting-edge methodologies, systematically organized according to the distinctive technical requirements and fundamental challenges of autonomous driving. Through analyzing and summarizing development trends and cutting-edge research, we identify existing technical challenges, note the insufficient disclosure of on-board validation and safety verification details in the current literature, and ultimately suggest potential directions to guide future studies.
comment: Published in IEEE Trans. on Intelligent Transportation Systems
ST-Booster: An Iterative SpatioTemporal Perception Booster for Vision-and-Language Navigation in Continuous Environments
Vision-and-Language Navigation in Continuous Environments (VLN-CE) requires agents to navigate unknown, continuous spaces based on natural language instructions. Compared to discrete settings, VLN-CE poses two core perception challenges. First, the absence of predefined observation points leads to heterogeneous visual memories and weakened global spatial correlations. Second, cumulative reconstruction errors in three-dimensional scenes introduce structural noise, impairing local feature perception. To address these challenges, this paper proposes ST-Booster, an iterative spatiotemporal booster that enhances navigation performance through multi-granularity perception and instruction-aware reasoning. ST-Booster consists of three key modules -- Hierarchical SpatioTemporal Encoding (HSTE), Multi-Granularity Aligned Fusion (MGAF), and Value-Guided Waypoint Generation (VGWG). HSTE encodes long-term global memory using topological graphs and captures short-term local details via grid maps. MGAF aligns these dual-map representations with instructions through geometry-aware knowledge fusion. The resulting representations are iteratively refined through pretraining tasks. During reasoning, VGWG generates Guided Attention Heatmaps (GAHs) to explicitly model environment-instruction relevance and optimize waypoint selection. Extensive comparative experiments and performance analyses are conducted, demonstrating that ST-Booster outperforms existing state-of-the-art methods, particularly in complex, disturbance-prone environments.
comment: 11 pages, 7 figures
ViTaMIn-B: A Reliable and Efficient Visuo-Tactile Bimanual Manipulation Interface
Handheld devices have opened up unprecedented opportunities to collect large-scale, high-quality demonstrations efficiently. However, existing systems often lack robust tactile sensing or reliable pose tracking to handle complex interaction scenarios, especially for bimanual and contact-rich tasks. In this work, we propose ViTaMIn-B, a more capable and efficient handheld data collection system for such tasks. We first design DuoTact, a novel compliant visuo-tactile sensor built with a flexible frame to withstand large contact forces during manipulation while capturing high-resolution contact geometry. To enhance the cross-sensor generalizability, we propose reconstructing the sensor's global deformation as a 3D point cloud and using it as the policy input. We further develop a robust, unified 6-DoF bimanual pose acquisition process using Meta Quest controllers, which eliminates the trajectory drift issue in common SLAM-based methods. Comprehensive user studies confirm the efficiency and high usability of ViTaMIn-B among novice and expert operators. Furthermore, experiments on four bimanual manipulation tasks demonstrate its superior task performance relative to existing systems. Project page: https://chuanyune.github.io/ViTaMIn-B_page/
comment: Project page: https://chuanyune.github.io/ViTaMIn-B_page/
SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge SP
Articulated 3D objects are critical for embodied AI, robotics, and interactive scene understanding, yet creating simulation-ready assets remains labor-intensive and requires expert modeling of part hierarchies and motion structures. We introduce SPARK, a framework for reconstructing physically consistent, kinematic part-level articulated objects from a single RGB image. Given an input image, we first leverage VLMs to extract coarse URDF parameters and generate part-level reference images. We then integrate the part-image guidance and the inferred structure graph into a generative diffusion transformer to synthesize consistent part and complete shapes of articulated objects. To further refine the URDF parameters, we incorporate differentiable forward kinematics and differentiable rendering to optimize joint types, axes, and origins under VLM-generated open-state supervision. Extensive experiments show that SPARK produces high-quality, simulation-ready articulated assets across diverse categories, enabling downstream applications such as robotic manipulation and interaction modeling. Project page: https://heyumeng.com/SPARK/index.html.
comment: Project page: https://heyumeng.com/SPARK/index.html. 17 pages, 7 figures
EMMA: Scaling Mobile Manipulation via Egocentric Human Data
Scaling mobile manipulation imitation learning is bottlenecked by expensive mobile robot teleoperation. We present Egocentric Mobile MAnipulation (EMMA), an end-to-end framework training mobile manipulation policies from human mobile manipulation data with static robot data, sidestepping mobile teleoperation. To accomplish this, we co-train human full-body motion data with static robot data. In our experiments across three real-world tasks, EMMA demonstrates performance comparable to baselines trained on teleoperated mobile robot data (Mobile ALOHA), achieving equal or higher full-task success. We find that EMMA is able to generalize to new spatial configurations and scenes, and we observe positive performance scaling as we increase the hours of human data, opening new avenues for scalable robotic learning in real-world environments. Details of this project can be found at https://ego-moma.github.io/.
Predicting Human Perceptions of Robot Performance During Navigation Tasks
Understanding human perceptions of robot performance is crucial for designing socially intelligent robots that can adapt to human expectations. Current approaches often rely on surveys, which can disrupt ongoing human-robot interactions. As an alternative, we explore predicting people's perceptions of robot performance using non-verbal behavioral cues and machine learning techniques. We contribute the SEAN TOGETHER Dataset consisting of observations of an interaction between a person and a mobile robot in Virtual Reality, together with perceptions of robot performance provided by users on a 5-point scale. We then analyze how well humans and supervised learning techniques can predict perceived robot performance based on different observation types (like facial expression and spatial behavior features). Our results suggest that facial expressions alone provide useful information, but in the navigation scenarios that we considered, reasoning about spatial features in context is critical for the prediction task. Also, supervised learning techniques outperformed humans' predictions in most cases. Further, when predicting robot performance as a binary classification task on unseen users' data, the F1-Score of machine learning models more than doubled that of predictions on a 5-point scale. This suggested good generalization capabilities, particularly in identifying performance directionality over exact ratings. Based on these findings, we conducted a real-world demonstration where a mobile robot uses a machine learning model to predict how a human who follows it perceives it. Finally, we discuss the implications of our results for implementing these supervised learning models in real-world navigation. Our work paves the way toward automatically enhancing robot behavior based on observations of users and inferences about their perceptions of a robot.
MIMIC-MJX: Neuromechanical Emulation of Animal Behavior
The primary output of the nervous system is movement and behavior. While recent advances have democratized pose tracking during complex behavior, kinematic trajectories alone provide only indirect access to the underlying control processes. Here we present MIMIC-MJX, a framework for learning biologically-plausible neural control policies from kinematics. MIMIC-MJX models the generative process of motor control by training neural controllers that learn to actuate biomechanically-realistic body models in physics simulation to reproduce real kinematic trajectories. We demonstrate that our implementation is accurate, fast, data-efficient, and generalizable to diverse animal body models. Policies trained with MIMIC-MJX can be utilized to both analyze neural control strategies and simulate behavioral experiments, illustrating its potential as an integrative modeling framework for neuroscience.
comment: Corrected LaTeX issues. Project page available at https://mimic-mjx.talmolab.org
Sigma: The Key for Vision-Language-Action Models toward Telepathic Alignment
To address the lack, in humanoid robot cognitive systems, of a continuously updatable mediating thought space between semantics and continuous control, this study constructs and trains a VLA model named "Sigma" that runs on a single RTX 4090. It uses the open-source pi05_base model as a foundation and preprocesses svla_so101_pickplace into a training dataset. The author independently designed a vision-language-action architecture that combines deep semantic understanding with association to achieve telepathic communication. Training involved repeated optimization of data preprocessing, LoRA fine-tuning, and an inference-stage adapter. The experiment employed offline closed-loop replay, comparing Sigma with the untuned pi05_base model under identical data conditions. Results show that Sigma achieves a stable decrease in control MSE at the vector, fragment, and whole-trajectory timescales, while the telepathy norm and semantic-text alignment quality remain unchanged. This demonstrates that mind-responsive alignment control can be achieved, without retraining the base model, through an architecture that combines deep semantic understanding with association, providing reproducible experience for semantic alignment and intention-driven behavior in humanoid robots.
comment: The Sigma model has been open-sourced on Hugging Face. Weights, dataset, some scripts, and logs are all available. The link is: https://huggingface.co/Veltraxor/Sigma
Learning Massively Multitask World Models for Continuous Control
General-purpose control demands agents that act across many tasks and embodiments, yet research on reinforcement learning (RL) for continuous control remains dominated by single-task or offline regimes, reinforcing a view that online RL does not scale. Inspired by the foundation model recipe (large-scale pretraining followed by light RL) we ask whether a single agent can be trained on hundreds of tasks with online interaction. To accelerate research in this direction, we introduce a new benchmark with 200 diverse tasks spanning many domains and embodiments, each with language instructions, demonstrations, and optionally image observations. We then present \emph{Newt}, a language-conditioned multitask world model that is first pretrained on demonstrations to acquire task-aware representations and action priors, and then jointly optimized with online interaction across all tasks. Experiments show that Newt yields better multitask performance and data-efficiency than a set of strong baselines, exhibits strong open-loop control, and enables rapid adaptation to unseen tasks. We release our environments, demonstrations, code for training and evaluation, as well as 200+ checkpoints.
comment: Webpage: https://www.nicklashansen.com/NewtWM
Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments
Effective monitoring of underwater ecosystems is crucial for tracking environmental changes, guiding conservation efforts, and ensuring long-term ecosystem health. However, automating underwater ecosystem management with robotic platforms remains challenging due to the complexities of underwater imagery, which pose significant difficulties for traditional visual localization methods. We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes. Furthermore, we introduce the SQUIDLE+ VPR Benchmark-the first large-scale underwater VPR benchmark designed to leverage an extensive collection of unstructured data from multiple robotic platforms, spanning time intervals from days to years. The dataset encompasses diverse trajectories, arbitrary overlap and diverse seafloor types captured under varying environmental conditions, including differences in depth, lighting, and turbidity. Our code is available at: https://github.com/bev-gorry/underloc
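The feature-matching and rigid-alignment step described above can be illustrated with off-the-shelf tools. The sketch below is not the authors' pipeline (their code is at the linked repository); it only shows, under assumed placeholder image paths, how a VPR-retrieved reference image might be aligned to a query image with ORB features and a RANSAC-estimated similarity transform.

```python
# Illustrative sketch (not the authors' code): align a query image to a
# VPR-retrieved reference image using ORB features and RANSAC. The image
# paths are placeholders.
import cv2
import numpy as np

def align_revisit(query_path: str, reference_path: str):
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)
    ref = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)

    orb = cv2.ORB_create(nfeatures=2000)
    kq, dq = orb.detectAndCompute(query, None)
    kr, dr = orb.detectAndCompute(ref, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(dq, dr), key=lambda m: m.distance)

    src = np.float32([kq[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kr[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # Similarity transform (rotation + scale + translation), robust to outliers.
    M, inliers = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC,
                                             ransacReprojThreshold=3.0)
    n_inliers = int(inliers.sum()) if inliers is not None else 0
    return M, n_inliers
```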
LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences AAAI 2026
Generative world models have become essential data engines for autonomous driving, yet most existing efforts focus on videos or occupancy grids, overlooking the unique LiDAR properties. Extending LiDAR generation to dynamic 4D world modeling presents challenges in controllability, temporal coherence, and evaluation standardization. To this end, we present LiDARCrafter, a unified framework for 4D LiDAR generation and editing. Given free-form natural language inputs, we parse instructions into ego-centric scene graphs, which condition a tri-branch diffusion network to generate object structures, motion trajectories, and geometry. These structured conditions enable diverse and fine-grained scene editing. Additionally, an autoregressive module generates temporally coherent 4D LiDAR sequences with smooth transitions. To support standardized evaluation, we establish a comprehensive benchmark with diverse metrics spanning scene-, object-, and sequence-level aspects. Experiments on the nuScenes dataset using this benchmark demonstrate that LiDARCrafter achieves state-of-the-art performance in fidelity, controllability, and temporal consistency across all levels, paving the way for data augmentation and simulation. The code and benchmark are released to the community.
comment: AAAI 2026 Oral Presentation; 38 pages, 18 figures, 12 tables; Project Page at https://lidarcrafter.github.io
Are you a robot? Detecting Autonomous Vehicles from Behavior Analysis
The tremendous hype around autonomous driving is eagerly calling for emerging and novel technologies to support advanced mobility use cases. As car manufacturers keep developing SAE level 3+ systems to improve the safety and comfort of passengers, traffic authorities need to establish new procedures to manage the transition from human-driven to fully-autonomous vehicles while providing a feedback-loop mechanism to fine-tune envisioned autonomous systems. Thus, a way to automatically profile autonomous vehicles and differentiate them from human-driven ones is a must. In this paper, we present a fully-fledged framework that monitors active vehicles using camera images and state information in order to determine whether vehicles are autonomous, without requiring any active notification from the vehicles themselves. Essentially, it builds on cooperation among vehicles, which share data acquired on the road to feed a machine learning model that identifies autonomous cars. We extensively tested our solution and created the NexusStreet dataset, by means of the CARLA simulator, employing an autonomous driving control agent and a steering wheel maneuvered by licensed drivers. Experiments show it is possible to discriminate the two behaviors by analyzing video clips with an accuracy of 80%, which improves up to 93% when the target state information is available. Lastly, we deliberately degraded the state information to observe how the framework performs under non-ideal data collection conditions.
Magnetic Tactile-Driven Soft Actuator for Intelligent Grasping and Firmness Evaluation
Soft robots are powerful tools for manipulating delicate objects, yet their adoption is hindered by two gaps: the lack of integrated tactile sensing and sensor signal distortion caused by actuator deformations. This paper addresses these challenges by introducing the SoftMag actuator: a magnetic tactile-sensorized soft actuator. Unlike systems relying on attached sensors or treating sensing and actuation separately, SoftMag unifies them through a shared architecture while confronting the mechanical parasitic effect, where deformations corrupt tactile signals. A multiphysics simulation framework models this coupling, and a neural-network-based decoupling strategy removes the parasitic component, restoring sensing fidelity. Experiments including indentation, quasi-static and step actuation, and fatigue tests validate the actuator's performance and decoupling effectiveness. Building upon this foundation, the system is extended into a two-finger SoftMag gripper, where a multi-task neural network enables real-time prediction of tri-axial contact forces and position. Furthermore, a probing-based strategy estimates object firmness during grasping. Validation on apricots shows a strong correlation (Pearson r over 0.8) between gripper-estimated firmness and reference measurements, confirming the system's capability for non-destructive quality assessment. Results demonstrate that combining integrated magnetic sensing, learning-based correction, and real-time inference enables a soft robotic platform that adapts its grasp and quantifies material properties. The framework offers an approach for advancing sensorized soft actuators toward intelligent, material-aware robotics.
comment: 25 pages, 24 figures
Multiagent Systems
Lumos: Let there be Language Model System Certification
We introduce the first principled framework, Lumos, for specifying and formally certifying Language Model System (LMS) behaviors. Lumos is an imperative probabilistic programming DSL over graphs, with constructs to generate independent and identically distributed prompts for LMS. It offers a structured view of prompt distributions via graphs, forming random prompts from sampled subgraphs. Lumos supports certifying LMS for arbitrary prompt distributions via integration with statistical certifiers. We provide hybrid (operational and denotational) semantics for Lumos, providing a rigorous way to interpret the specifications. Using only a small set of composable constructs, Lumos can encode existing LMS specifications, including complex relational and temporal specifications. It also facilitates specifying new properties - we present the first safety specifications for vision-language models (VLMs) in autonomous driving scenarios developed with Lumos. Using these, we show that the state-of-the-art VLM Qwen-VL exhibits critical safety failures, producing incorrect and unsafe responses with at least 90% probability in right-turn scenarios under rainy driving conditions, revealing substantial safety risks. Lumos's modular structure allows easy modification of the specifications, enabling LMS certification to stay abreast with the rapidly evolving threat landscape. We further demonstrate that specification programs written in Lumos enable finding specific failure cases exhibited by state-of-the-art LMS. Lumos is the first systematic and extensible language-based framework for specifying and certifying LMS behaviors, paving the way for a wider adoption of LMS certification.
CogDrive: Cognition-Driven Multimodal Prediction-Planning Fusion for Safe Autonomy
Safe autonomous driving in mixed traffic requires a unified understanding of multimodal interactions and dynamic planning under uncertainty. Existing learning-based approaches struggle to capture rare but safety-critical behaviors, while rule-based systems often lack adaptability in complex interactions. To address these limitations, CogDrive introduces a cognition-driven multimodal prediction and planning framework that integrates explicit modal reasoning with safety-aware trajectory optimization. The prediction module adopts cognitive representations of interaction modes based on topological motion semantics and nearest-neighbor relational encoding. With a differentiable modal loss and multimodal Gaussian decoding, CogDrive learns sparse and unbalanced interaction behaviors and improves long-horizon trajectory prediction. The planning module incorporates an emergency-response concept and optimizes safety-stabilized trajectories, where short-term consistent branches ensure safety during replanning cycles and long-term branches support smooth and collision-free motion under low-probability switching modes. Experiments on the Argoverse2 and INTERACTION datasets show that CogDrive achieves strong performance in trajectory accuracy and miss rate, while closed-loop simulations confirm adaptive behavior in merge and intersection scenarios. By combining cognitive multimodal prediction with safety-oriented planning, CogDrive offers an interpretable and reliable paradigm for safe autonomy in complex traffic.
comment: 25 pages, 6 figures
Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions
This paper examines why safety mechanisms designed for human-model interaction do not scale to environments where large language models (LLMs) interact with each other. Most current governance practices still rely on single-agent safety containment, prompts, fine-tuning, and moderation layers that constrain individual model behavior but leave the dynamics of multi-model interaction ungoverned. These mechanisms assume a dyadic setting: one model responding to one user under stable oversight. Yet research and industrial development are rapidly shifting toward LLM-to-LLM ecosystems, where outputs are recursively reused as inputs across chains of agents. In such systems, local compliance can aggregate into collective failure even when every model is individually aligned. We propose a conceptual transition from model-level safety to system-level safety, introducing the framework of the Emergent Systemic Risk Horizon (ESRH) to formalize how instability arises from interaction structure rather than from isolated misbehavior. The paper contributes (i) a theoretical account of collective risk in interacting LLMs, (ii) a taxonomy connecting micro, meso, and macro-level failure modes, and (iii) a design proposal for InstitutionalAI, an architecture for embedding adaptive oversight within multi-agent systems.
IACT: A Self-Organizing Recursive Model for General AI Agents: A Technical White Paper on the Architecture Behind kragent.ai
This technical white paper introduces the Interactive Agents Call Tree (IACT), a computational model designed to address the limitations of static, hard-coded agent workflows. Unlike traditional systems that require pre-defined graphs or specialized programming, IACT operates as a general-purpose autonomous system driven purely by user dialogue. Given a high-level objective, the system autonomously grows a dynamic, recursive agent topology incrementally tailored to the problem's structure. This allows it to scale its organizational complexity to match open-ended tasks. To mitigate the error propagation inherent in unidirectional function calls, IACT introduces interactional redundancy by replacing rigid invocations with bidirectional, stateful dialogues. This mechanism enables runtime error correction and ambiguity resolution. We describe the architecture, design principles, and practical lessons behind the production deployment of this model in the kragent.ai system, presenting qualitative evidence from real-world workflows rather than exhaustive benchmark results.
comment: 13 pages, 2 figures, 1 table
Spoken Conversational Agents with Large Language Models EMNLP 2025
Spoken conversational agents are converging toward voice-native LLMs. This tutorial distills the path from cascaded ASR/NLU to end-to-end, retrieval-and vision-grounded systems. We frame adaptation of text LLMs to audio, cross-modal alignment, and joint speech-text training; review datasets, metrics, and robustness across accents and compare design choices (cascaded vs. E2E, post-ASR correction, streaming). We link industrial assistants to current open-domain and task-oriented agents, highlight reproducible baselines, and outline open problems in privacy, safety, and evaluation. Attendees leave with practical recipes and a clear systems-level roadmap.
comment: Accepted to EMNLP 2025 Tutorial
EZYer: A simulacrum of high school with generative agent SIGIR 2025
With the rapid development of online education and large language models, existing educational tools still suffer from incomplete services, insufficient performance, and weak interactivity in courseware generation, interactive notes, and content quality assurance. To address this, the proposed generative agent EZYer comprises: 1) Teacher Module: integrating text-corpus retrieval and in-depth generation technologies, it automatically generates structured teaching materials and LaTeX Beamer courseware aligned with the high school mathematics syllabus and supports user-defined image insertion. 2) Student Module: through the collaborative interaction of the four roles of Teacher, Assistant, Top Student, and Struggling Student, a Note Taker summarizes the discussion into academic notes to enhance the depth and interest of learning. 3) Controller: a keyword filtering system, a content scoring system, a role co-validation system, and a dynamic content correction system together ensure the academic rigor and pedagogical propriety of EZYer's inputs and outputs. To evaluate EZYer, this paper designs five evaluation dimensions (content accuracy, knowledge coverage, usability, formatting correctness, and visual design and appeal) and has five large language models separately score 100 Beamer decks and notes generated by EZYer; the results show that the quality of EZYer-generated content is excellent and has good application prospects.
comment: AgentIR@SIGIR 2025
Decentralized Multi-Agent System with Trust-Aware Communication
The emergence of Large Language Models (LLMs) is rapidly accelerating the development of autonomous multi-agent systems (MAS), paving the way for the Internet of Agents. However, traditional centralized MAS architectures present significant challenges, including single points of failure, vulnerability to censorship, inherent scalability limitations, and critical trust issues. We propose a novel Decentralized Multi-Agent System (DMAS) architecture designed to overcome these fundamental problems by enabling trust-aware, scalable, and censorship-resistant interactions among autonomous agents. Our DMAS features a decentralized agent runtime underpinned by a blockchain-based architecture. We formalize a trust-aware communication protocol that leverages cryptographic primitives and on-chain operations to provide security properties: verifiable interaction cycles, communication integrity, authenticity, non-repudiation, and conditional confidentiality, which we further substantiate through a comprehensive security analysis. Our performance analysis validates the DMAS as a scalable and efficient solution for building trustworthy multi-agent systems.
Local Dominance in Mixed-Strength Populations -- Fast Maximal Independent Set
In many natural and engineered systems, agents interact through local contests that determine which individuals become dominant within their neighborhoods. These interactions are shaped by inherent differences in strength, and they often lead to stable dominance patterns that emerge surprisingly quickly relative to the size of the population. This motivates the search for simple mathematical models that capture both heterogeneous agent strength and rapid convergence to stable local dominance. A widely studied abstraction of local dominance is the Maximal Independent Set (MIS) problem. In the Luby MIS protocol that provably converges quickly to an MIS, each agent repeatedly generates a strength value chosen uniformly and becomes locally dominant if its value is smaller than those of its neighbors. This provides a theoretical explanation for fast dominance convergence in populations of equal-strength agents and naturally raises the question of whether fast convergence also holds in the more realistic setting where agents are inherently mixed-strength. To investigate this question, we introduce the mixed-strength agents model, in which each agent draws its strength from its own distribution. We prove that the extension of the Luby MIS protocol where each agent repeatedly generates a strength value from its own distribution still exhibits fast dominance convergence, providing formal confirmation of the rapid convergence observed in many mixed-strength natural processes. We also show that heterogeneity can significantly change the dynamics of the process. In contrast to the equal-strength setting, a constant fraction of edges need not be eliminated per round. We construct a population and strength profile in which progress per round is asymptotically smaller, illustrating how inherent strength asymmetry produces qualitatively different global behavior.
comment: 11 pages
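The mixed-strength protocol described above is simple enough to simulate directly. The sketch below is an illustrative toy (the per-agent distributions and the random graph are made up, not taken from the paper): each active agent draws a strength from its own distribution, joins the independent set if its draw is strictly smaller than all active neighbors', and winners plus their neighbors then leave the process.

```python
# Minimal simulation sketch of the mixed-strength Luby-style MIS protocol
# described above. Distributions and graph are illustrative only.
import random
import networkx as nx

def mixed_strength_mis(G: nx.Graph, draw_strength, rng=random.Random(0)):
    """draw_strength(node, rng) -> float, one distribution per agent."""
    active = set(G.nodes)
    mis = set()
    while active:
        s = {v: draw_strength(v, rng) for v in active}
        # An agent wins if its draw is strictly smaller than all active neighbors'.
        winners = {v for v in active
                   if all(s[v] < s[u] for u in G.neighbors(v) if u in active)}
        mis |= winners
        # Winners and their neighbors leave the process.
        removed = set(winners)
        for v in winners:
            removed |= set(G.neighbors(v))
        active -= removed
    return mis

# Example: heterogeneous agents, each with its own exponential rate.
G = nx.erdos_renyi_graph(50, 0.1, seed=1)
rng0 = random.Random(42)
rates = {v: rng0.uniform(0.5, 2.0) for v in G}
mis = mixed_strength_mis(G, lambda v, rng: rng.expovariate(rates[v]))
assert all(not any(u in mis for u in G.neighbors(v)) for v in mis)  # independence
```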
A Gossip-Enhanced Communication Substrate for Agentic AI: Toward Decentralized Coordination in Large-Scale Multi-Agent Systems
As agentic platforms scale, agents are moving beyond fixed roles and predefined toolchains, creating an urgent need for flexible and decentralized coordination. Current structured communication protocols such as direct agent-to-agent messaging or MCP-style tool calls offer reliability, but they struggle to support the emergent and swarm-like intelligence required in large adaptive systems. Distributed agents must learn continuously, share context fluidly, and coordinate without depending solely on central planners. This paper revisits gossip protocols as a complementary substrate for agentic communication. Gossip mechanisms, long valued in distributed systems for their decentralized and fault-tolerant properties, provide scalable and adaptive diffusion of knowledge and fill gaps that structured protocols alone cannot efficiently address. However, gossip also introduces challenges, including semantic relevance, temporal staleness, and limited guarantees on action consistency in rapidly changing environments. We examine how gossip can support context-rich state propagation, resilient coordination under uncertainty, and emergent global awareness. We also outline open problems around semantic filtering, trust, and knowledge decay. Rather than proposing a complete framework, this paper presents a research agenda for integrating gossip into multi-agent communication stacks and argues that gossip is essential for future agentic ecosystems that must remain robust, adaptive, and self-organizing as their scale and autonomy increase.
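To make the substrate concrete, the toy below simulates plain push gossip, the classic mechanism the paper revisits; it is not a proposed protocol, and the TTL-based staleness handling is only one illustrative choice. Each round, every informed agent forwards the item to one random peer, which is what gives the well-known logarithmic-round diffusion.

```python
# Simple push-gossip simulation (illustrative, not a proposed protocol): each
# round, every agent holding a piece of context forwards it to one random peer.
# Staleness can be handled by attaching a round stamp and a time-to-live (TTL).
import random

def push_gossip(num_agents: int, ttl: int, rng=random.Random(0)):
    informed = {0: 0}          # agent id -> round at which it learned the item
    for rnd in range(1, ttl + 1):
        for sender in list(informed):      # snapshot: synchronous rounds
            peer = rng.randrange(num_agents)
            if peer not in informed:
                informed[peer] = rnd
        if len(informed) == num_agents:
            break
    return informed

rounds_to_cover = max(push_gossip(1000, ttl=50).values())
print(f"full coverage after ~{rounds_to_cover} rounds (expected O(log n))")
```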
Learning Network Sheaves for AI-native Semantic Communication
Recent advances in AI call for a paradigm shift from bit-centric communication to goal- and semantics-oriented architectures, paving the way for AI-native 6G networks. In this context, we address a key open challenge: enabling heterogeneous AI agents to exchange compressed latent-space representations while mitigating semantic noise and preserving task-relevant meaning. We cast this challenge as learning both the communication topology and the alignment maps that govern information exchange among agents, yielding a learned network sheaf equipped with orthogonal maps. This learning process is further supported by a semantic denoising and compression module that constructs a shared global semantic space and derives sparse, structured representations of each agent's latent space. This corresponds to a nonconvex dictionary learning problem solved iteratively with closed-form updates. Experiments with multiple AI agents pre-trained on real image data show that the semantic denoising and compression facilitates AI agent alignment and the extraction of semantic clusters, while preserving high accuracy in downstream tasks. The resulting communication network provides new insights into semantic heterogeneity across agents, highlighting the interpretability of our methodology.
GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding
Large robot fleets are now common in warehouses and other logistics settings, where small control gains translate into large operational impacts. In this article, we address task scheduling for lifelong Multi-Agent Pickup-and-Delivery (MAPD) and propose a hybrid method that couples learning-based global guidance with lightweight optimization. A graph neural network policy trained via reinforcement learning outputs a desired distribution of free agents over an aggregated warehouse graph. This signal is converted into region-to-region rebalancing through a minimum-cost flow, and finalized by small, local assignment problems, preserving accuracy while keeping per-step latency within a 1 s compute budget. On congested warehouse benchmarks from the League of Robot Runners (LRR) with up to 500 agents, our approach improves throughput by up to 10% over the 2024 winning scheduler while maintaining real-time execution. The results indicate that coupling graph-structured learned guidance with tractable solvers reduces congestion and yields a practical, scalable blueprint for high-throughput scheduling in large fleets.
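The rebalancing step in the middle of this pipeline (desired agent distribution in, region-to-region moves out) can be sketched with a textbook minimum-cost flow. The snippet below is illustrative only: region names, agent counts, and travel costs are made up, and the learned GNN guidance and final local assignment problems are not reproduced.

```python
# Illustrative sketch (region names and counts are made up): turn a desired
# distribution of free agents over warehouse regions into region-to-region
# rebalancing moves by solving a minimum-cost flow on the region graph.
import networkx as nx

current = {"A": 12, "B": 3, "C": 5}          # free agents currently per region
desired = {"A": 6, "B": 8, "C": 6}           # target distribution from the policy
travel_cost = {("A", "B"): 4, ("B", "A"): 4,
               ("B", "C"): 2, ("C", "B"): 2,
               ("A", "C"): 5, ("C", "A"): 5}

G = nx.DiGraph()
for r in current:
    # Positive demand = region needs agents, negative = region has a surplus.
    G.add_node(r, demand=desired[r] - current[r])
for (u, v), c in travel_cost.items():
    G.add_edge(u, v, weight=c, capacity=sum(current.values()))

flow = nx.min_cost_flow(G)
moves = [(u, v, f) for u, nbrs in flow.items() for v, f in nbrs.items() if f > 0]
print(moves)   # e.g. [('A', 'B', 5), ('A', 'C', 1)]
```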
AGENTSAFE: A Unified Framework for Ethical Assurance and Governance in Agentic AI
The rapid deployment of large language model (LLM)-based agents introduces a new class of risks, driven by their capacity for autonomous planning, multi-step tool integration, and emergent interactions. These capabilities strain existing governance approaches, which remain fragmented: current frameworks are largely static and taxonomy-driven, and they lack an integrated end-to-end pipeline from risk identification to operational assurance, especially for agentic platforms. We propose AGENTSAFE, a practical governance framework for LLM-based agentic systems. The framework operationalises the AI Risk Repository into design, runtime, and audit controls, offering a structured pipeline for risk identification and assurance. AGENTSAFE profiles agentic loops (plan -> act -> observe -> reflect) and toolchains, and maps risks onto structured taxonomies extended with agent-specific vulnerabilities. It introduces safeguards that constrain risky behaviours, escalates high-impact actions to human oversight, and evaluates systems through pre-deployment scenario banks spanning security, privacy, fairness, and systemic safety. During deployment, AGENTSAFE ensures continuous governance through semantic telemetry, dynamic authorization, anomaly detection, and interruptibility mechanisms. Provenance and accountability are reinforced through cryptographic tracing and organizational controls, enabling measurable, auditable assurance across the lifecycle of agentic AI systems. The key contributions of this paper are: (1) a unified governance framework that translates risk taxonomies into actionable design, runtime, and audit controls; (2) an Agent Safety Evaluation methodology that provides measurable pre-deployment assurance; and (3) a set of runtime governance and accountability mechanisms that institutionalise trust in agentic AI ecosystems.
comment: 12 pages, 1 figure
CrowdLLM: Building LLM-Based Digital Populations Augmented with Generative Models
The emergence of large language models (LLMs) has sparked much interest in creating LLM-based digital populations that can be applied to many applications such as social simulation, crowdsourcing, marketing, and recommendation systems. A digital population can reduce the cost of recruiting human participants and alleviate many concerns related to human-subject studies. However, research has found that most existing works rely solely on LLMs and cannot sufficiently capture the accuracy and diversity of a real human population. To address this limitation, we propose CrowdLLM, which integrates pretrained LLMs and generative models to enhance the diversity and fidelity of the digital population. We conduct a theoretical analysis of CrowdLLM, showing its potential to create cost-effective, sufficiently representative, and scalable digital populations that can match the quality of a real crowd. Comprehensive experiments across multiple domains (e.g., crowdsourcing, voting, user rating) and simulation studies demonstrate that CrowdLLM achieves promising performance in both accuracy and distributional fidelity to human data.
Agent-Based Modular Learning for Multimodal Emotion Recognition in Human-Agent Systems
Effective human-agent interaction (HAI) relies on accurate and adaptive perception of human emotional states. While multimodal deep learning models - leveraging facial expressions, speech, and textual cues - offer high accuracy in emotion recognition, their training and maintenance are often computationally intensive and inflexible to modality changes. In this work, we propose a novel multi-agent framework for training multimodal emotion recognition systems, where each modality encoder and the fusion classifier operate as autonomous agents coordinated by a central supervisor. This architecture enables modular integration of new modalities (e.g., audio features via emotion2vec), seamless replacement of outdated components, and reduced computational overhead during training. We demonstrate the feasibility of our approach through a proof-of-concept implementation supporting vision, audio, and text modalities, with the classifier serving as a shared decision-making agent. Our framework not only improves training efficiency but also contributes to the design of more flexible, scalable, and maintainable perception modules for embodied and virtual agents in HAI scenarios.
comment: 14 pages, 4 figures
iMAD: Intelligent Multi-Agent Debate for Efficient and Accurate LLM Inference AAAI 2026
Large Language Model (LLM) agent systems have advanced rapidly, driven by their strong generalization in zero-shot settings. To further enhance reasoning and accuracy on complex tasks, Multi-Agent Debate (MAD) has emerged as a promising framework that engages multiple LLM agents in structured debates to encourage diverse reasoning. However, triggering MAD for every query is inefficient, as it incurs substantial computational (token) cost and may even degrade accuracy by overturning correct single-agent answers. To address these limitations, we propose intelligent Multi-Agent Debate (iMAD), a token-efficient framework that selectively triggers MAD only when it is likely to be beneficial (i.e., correcting an initially wrong answer). To achieve this goal, iMAD learns generalizable model behaviors to make accurate debate decisions. Specifically, iMAD first prompts a single agent to produce a structured self-critique response, from which we extract 41 interpretable linguistic and semantic features capturing hesitation cues. Then, iMAD uses a lightweight debate-decision classifier, trained using our proposed FocusCal loss, to determine whether to trigger MAD, enabling robust debate decisions without test dataset-specific tuning. Through extensive experiments using six (visual) question answering datasets against five competitive baselines, we have shown that iMAD significantly reduces token usage (by up to 92%) while also improving final answer accuracy (by up to 13.5%).
comment: Accepted in AAAI 2026 (Oral)
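The gating idea at the heart of iMAD (decide from a single self-critique whether a full debate is worth its token cost) can be sketched with a handful of hesitation-style features and an off-the-shelf classifier. The features, toy data, and LogisticRegression below are illustrative stand-ins, not the paper's 41 features or its FocusCal loss.

```python
# Hedged sketch of the debate-gating idea only: extract a few hesitation-style
# features from a single agent's self-critique and train a lightweight
# classifier that predicts whether multi-agent debate is likely to help.
import re
import numpy as np
from sklearn.linear_model import LogisticRegression

HEDGES = ("might", "may", "possibly", "not sure", "unclear", "however")

def hesitation_features(self_critique: str) -> np.ndarray:
    text = self_critique.lower()
    return np.array([
        sum(text.count(h) for h in HEDGES),               # hedging-word count
        text.count("?"),                                  # self-questioning
        len(re.findall(r"\bbut\b|\balthough\b", text)),   # contrastive markers
        len(text.split()),                                # verbosity
    ], dtype=float)

# Toy training data: label 1 = debate corrected an initially wrong answer.
TRAINING_CRITIQUES = [
    ("The answer looks correct and complete.", 0),
    ("I might be wrong here; the second step is unclear. However...", 1),
    ("Possibly off? Not sure the units match, although the formula seems fine.", 1),
    ("All checks pass, the reasoning is consistent.", 0),
]
X = np.stack([hesitation_features(t) for t, _ in TRAINING_CRITIQUES])
y = np.array([label for _, label in TRAINING_CRITIQUES])
gate = LogisticRegression(max_iter=1000).fit(X, y)

def should_debate(self_critique: str, threshold: float = 0.5) -> bool:
    p = gate.predict_proba(hesitation_features(self_critique)[None])[0, 1]
    return p > threshold
```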
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning ICML 2025
Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods exclusively derive roles from an agent's past experience during training, neglecting their influence on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes the proposed objective through contrastive learning on past trajectories to first derive intermediate roles that shape intrinsic rewards to promote diversity in future behaviors across different roles through a learned dynamics model. Benchmarking on SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination to increase win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.
comment: 21 pages, To appear in the International Conference of Machine Learning (ICML 2025)
Multi-Agent Code Verification via Information Theory
LLMs generate buggy code: 29.6% of SWE-bench solved patches fail, 62% of BaxBench solutions have vulnerabilities, and existing tools only catch 65% of bugs with 35% false positives. We built CodeX-Verify, a multi-agent system that uses four specialized agents to detect different types of bugs. We prove mathematically that combining agents with different detection patterns finds more bugs than any single agent when the agents look for different problems, using submodularity of mutual information under conditional independence. Measuring agent correlation of rho = 0.05 to 0.25 confirms they detect different bugs. Testing on 99 code samples with verified labels shows our system catches 76.1% of bugs, matching the best existing method (Meta Prompt Testing: 75%) while running faster and without test execution. We tested all 15 agent combinations and found that using multiple agents improves accuracy by 39.7 percentage points (from 32.8% to 72.4%) compared to single agents, with diminishing returns of +14.9pp, +13.5pp, and +11.2pp for agents 2, 3, and 4, validating our theoretical model. The best two-agent combination (Correctness + Performance) reaches 79.3% accuracy. Testing on 300 real patches from Claude Sonnet 4.5 runs in under 200ms per sample, making this practical for production use.
comment: 18 pages, 3 figures, 9 tables
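The diversity argument above (weakly correlated detectors make a union worthwhile) is easy to see numerically. The snippet below uses synthetic verdicts with made-up recall and false-positive rates; it is not the CodeX-Verify system, only an illustration of measuring pairwise agent correlation and combining verdicts by union.

```python
# Illustrative numbers only: combine binary verdicts from several specialized
# verifier agents by union ("flag if any agent flags") and check that the
# agents are weakly correlated, which is what makes the union worthwhile.
import numpy as np

rng = np.random.default_rng(0)
n = 500                                   # code samples, 40% actually buggy
is_buggy = rng.random(n) < 0.4

def agent(recall, fpr):                   # detector with given recall / false-positive rate
    hits = np.where(is_buggy, rng.random(n) < recall, rng.random(n) < fpr)
    return hits.astype(int)

agents = [agent(0.45, 0.05), agent(0.40, 0.06), agent(0.35, 0.04), agent(0.30, 0.05)]
union = np.clip(np.sum(agents, axis=0), 0, 1)

for i in range(len(agents)):
    for j in range(i + 1, len(agents)):
        rho = np.corrcoef(agents[i], agents[j])[0, 1]
        print(f"corr(agent{i}, agent{j}) = {rho:+.2f}")

recall_union = union[is_buggy].mean()
single_best = max(a[is_buggy].mean() for a in agents)
print(f"best single-agent recall {single_best:.2f} vs union recall {recall_union:.2f}")
```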
Aligning Compound AI Systems via System-level DPO NeurIPS 2025
Compound AI systems, comprising multiple interacting components such as LLMs, foundation models, and external tools, have demonstrated remarkable improvements compared to single models in various tasks. To ensure their effective deployment in real-world applications, aligning these systems with human preferences is crucial. However, aligning the compound system via policy optimization, unlike the alignment of a single model, is challenging for two main reasons: (i) non-differentiable interactions between components make end-to-end gradient-based optimization method inapplicable, and (ii) system-level preferences cannot be directly transformed into component-level preferences. To address these challenges, we first formulate compound AI systems as Directed Acyclic Graphs (DAGs), explicitly modeling both component interactions and the associated data flows. Building on this formulation, we introduce $\textbf{SysDPO}$, a framework that extends Direct Preference Optimization (DPO) to enable joint system-level alignment. We propose two variants, SysDPO-Direct and SysDPO-Sampling, tailored for scenarios depending on whether we construct a system-specific preference dataset. We empirically demonstrate the effectiveness of our approach across two applications: the joint alignment of a language model and a diffusion model, and the joint alignment of an LLM collaboration system.
comment: NeurIPS 2025
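For reference, the single-model DPO objective that SysDPO generalizes to a DAG of components is the standard preference loss below (notation follows the original DPO formulation; the system-level extension itself is not reproduced here):

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[\log\sigma\!\left(
\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
-\beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
\right)\right]
$$

where $y_w$ and $y_l$ are the preferred and dispreferred outputs for prompt $x$, $\sigma$ is the logistic function, and $\beta$ controls the deviation from the reference policy.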
Dominated Actions in Imperfect-Information Games
Dominance is a fundamental concept in game theory. In normal-form games dominated strategies can be identified in polynomial time. As a consequence, iterative removal of dominated strategies can be performed efficiently as a preprocessing step for reducing the size of a game before computing a Nash equilibrium. For imperfect-information games in extensive form, we could convert the game to normal form and then iteratively remove dominated strategies in the same way; however, this conversion may cause an exponential blowup in game size. In this paper we define and study the concept of dominated actions in imperfect-information games. Our main result is a polynomial-time algorithm for determining whether an action is dominated (strictly or weakly) by any mixed strategy in n-player games, which can be extended to an algorithm for iteratively removing dominated actions. This allows us to efficiently reduce the size of the game tree as a preprocessing step for Nash equilibrium computation. We explore the role of dominated actions empirically in "All In or Fold" No-Limit Texas Hold'em poker.
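The normal-form building block referred to above (checking whether a single action is strictly dominated by some mixed strategy) is a small linear program. The sketch below shows only that normal-form check; the paper's contribution is performing the analogous test for actions in imperfect-information extensive-form games in polynomial time, which this does not reproduce.

```python
# Normal-form check only (a sketch): action `a` of the row player is strictly
# dominated by some mixture over the other actions iff the optimal epsilon
# below is strictly positive.
import numpy as np
from scipy.optimize import linprog

def strictly_dominated(payoff: np.ndarray, a: int) -> bool:
    """payoff[i, k]: row player's payoff for action i against opponent profile k."""
    others = [i for i in range(payoff.shape[0]) if i != a]
    m, K = len(others), payoff.shape[1]
    # Variables: mixture p over `others`, plus eps. Maximize eps (linprog minimizes).
    c = np.zeros(m + 1); c[-1] = -1.0
    # For every opponent profile k:  sum_j p_j * payoff[j, k] - eps >= payoff[a, k]
    A_ub = np.hstack([-payoff[others].T, np.ones((K, 1))])
    b_ub = -payoff[a]
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0        # mixture sums to one
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.success and -res.fun > 1e-9

# Example: the middle action is dominated by mixing the other two.
payoff = np.array([[3.0, 0.0],
                   [1.0, 1.0],
                   [0.0, 3.0]])
print(strictly_dominated(payoff, 1))   # True: a 0.5/0.5 mix earns 1.5 > 1 everywhere
```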
DynTaskMAS: A Dynamic Task Graph-driven Framework for Asynchronous and Parallel LLM-based Multi-Agent Systems
The emergence of Large Language Models (LLMs) in Multi-Agent Systems (MAS) has opened new possibilities for artificial intelligence, yet current implementations face significant challenges in resource management, task coordination, and system efficiency. While existing frameworks demonstrate the potential of LLM-based agents in collaborative problem-solving, they often lack sophisticated mechanisms for parallel execution and dynamic task management. This paper introduces DynTaskMAS, a novel framework that orchestrates asynchronous and parallel operations in LLM-based MAS through dynamic task graphs. The framework features four key innovations: (1) a Dynamic Task Graph Generator that intelligently decomposes complex tasks while maintaining logical dependencies, (2) an Asynchronous Parallel Execution Engine that optimizes resource utilization through efficient task scheduling, (3) a Semantic-Aware Context Management System that enables efficient information sharing among agents, and (4) an Adaptive Workflow Manager that dynamically optimizes system performance. Experimental evaluations demonstrate that DynTaskMAS achieves significant improvements over traditional approaches: a 21-33% reduction in execution time across task complexities (with higher gains for more complex tasks), a 35.4% improvement in resource utilization (from 65% to 88%), and near-linear throughput scaling up to 16 concurrent agents (3.47X improvement for 4X agents). Our framework establishes a foundation for building scalable, high-performance LLM-based multi-agent systems capable of handling complex, dynamic tasks efficiently.
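The scheduling idea (run independent branches of a task graph in parallel while dependent tasks wait on their parents) can be sketched with plain asyncio. This is not the DynTaskMAS implementation; the task names, dependency graph, and the stand-in llm_call are illustrative.

```python
# Minimal sketch (not the DynTaskMAS implementation): execute a task graph with
# asyncio so tasks whose dependencies are satisfied run in parallel.
import asyncio

async def run_graph(tasks, deps):
    """tasks: name -> async fn(results dict); deps: name -> list of parent names."""
    done, futures = {}, {}

    async def run_one(name):
        # Wait for all parents, then execute with their results available.
        await asyncio.gather(*(futures[p] for p in deps.get(name, [])))
        done[name] = await tasks[name](done)
        return done[name]

    for name in tasks:                       # create all tasks up front...
        futures[name] = asyncio.create_task(run_one(name))
    await asyncio.gather(*futures.values())  # ...and let the graph schedule itself
    return done

async def llm_call(prompt):                  # stand-in for an actual agent/LLM call
    await asyncio.sleep(0.1)
    return f"answer({prompt})"

tasks = {
    "plan":   lambda r: llm_call("decompose the task"),
    "search": lambda r: llm_call(f"search given {r['plan']}"),
    "code":   lambda r: llm_call(f"write code given {r['plan']}"),
    "report": lambda r: llm_call(f"merge {r['search']} and {r['code']}"),
}
deps = {"search": ["plan"], "code": ["plan"], "report": ["search", "code"]}
print(asyncio.run(run_graph(tasks, deps)))
```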
Truthful and Trustworthy IoT AI Agents via Immediate-Penalty Enforcement under Approximate VCG Mechanisms
The deployment of autonomous AI agents in Internet of Things (IoT) energy systems requires decision-making mechanisms that remain robust, efficient, and trustworthy under real-time constraints and imperfect monitoring. While reinforcement learning enables adaptive prosumer behaviors, ensuring economic consistency and preventing strategic manipulation remain open challenges, particularly when sensing noise or partial observability reduces the operator's ability to verify actions. This paper introduces a trust-enforcement framework for IoT energy trading that combines an approximate Vickrey-Clarke-Groves (VCG) double auction with an immediate one-shot penalty. Unlike reputation- or history-based approaches, the proposed mechanism restores truthful reporting within a single round, even when allocation accuracy is approximate and monitoring is noisy. We theoretically characterize the incentive gap induced by approximation and derive a penalty threshold that guarantees truthful bidding under bounded sensing errors. To evaluate learning-enabled prosumers, we embed the mechanism into a multi-agent reinforcement learning environment reflecting stochastic generation, dynamic loads, and heterogeneous trading opportunities. Experiments show that improved allocation accuracy reduces deviation incentives, the required penalty matches analytical predictions, and learned bidding behaviors remain stable and interpretable despite imperfect monitoring. These results demonstrate that lightweight penalty designs can reliably align strategic IoT agents with socially efficient energy-trading outcomes.
MARS: A Meta-Adaptive Reinforcement Learning Framework for Risk-Aware Multi-Agent Portfolio Management AAAI 2026
Reinforcement Learning (RL) has shown significant promise in automated portfolio management; however, effectively balancing risk and return remains a central challenge, as many models fail to adapt to dynamically changing market conditions. We propose Meta-controlled Agents for a Risk-aware System (MARS), a novel framework addressing this through a multi-agent, risk-aware approach. MARS replaces monolithic models with a Heterogeneous Agent Ensemble, where each agent's unique risk profile is enforced by a Safety-Critic network to span behaviors from capital preservation to aggressive growth. A high-level Meta-Adaptive Controller (MAC) dynamically orchestrates this ensemble, shifting reliance between conservative and aggressive agents to minimize drawdown during downturns while seizing opportunities in bull markets. This two-tiered structure leverages behavioral diversity rather than explicit feature engineering to ensure a disciplined portfolio robust across market regimes. Experiments on major international indexes confirm that our framework significantly reduces maximum drawdown and volatility while maintaining competitive returns.
comment: Accepted by AAAI 2026 Main Track
Systems and Control (EESS)
Learning Physically Consistent Lagrangian Control Models Without Acceleration Measurements
This article investigates the modeling and control of Lagrangian systems involving non-conservative forces using a hybrid method that does not require acceleration calculations. It focuses in particular on the derivation and identification of physically consistent models, which are essential for model-based control synthesis. Lagrangian or Hamiltonian neural networks provide useful structural guarantees but the learning of such models often leads to inconsistent models, especially on real physical systems where training data are limited, partial and noisy. Motivated by this observation and the objective to exploit these models for model-based nonlinear control, a learning algorithm relying on an original loss function is proposed to improve the physical consistency of Lagrangian systems. A comparative analysis of different learning-based modeling approaches with the proposed solution shows significant improvements in terms of physical consistency of the learned models, on both simulated and experimental systems. The model's consistency is then exploited to demonstrate, on an experimental benchmark, the practical relevance of the proposed methodology for feedback linearization and energy-based control techniques.
comment: Submitted to the L4DC 2026
GNSS Array-Based Multipath Detection Employing UKF on Manifolds
Global Navigation Satellite Systems (GNSS) applications are often hindered by various sources of error, with multipath interference being one of the most challenging, particularly in urban environments. In this work, we build on previous research by implementing a GNSS array-based multipath detection algorithm, incorporating real-time attitude estimation for dynamic scenarios. The method fuses GNSS and IMU data using an Unscented Kalman Filter (UKF) on a manifold, enabling continuous attitude tracking. The proposed approach utilizes attitude information from satellite combinations to identify and exclude multipath-affected satellites, improving the accuracy of both positioning and attitude determination. To address computational challenges associated with evaluating large numbers of satellite combinations, we propose the use of the Random Sample Consensus (RANSAC) algorithm, which reduces the number of combinations assessed while maintaining high detection performance. Performance evaluations are conducted using trajectories and IMU readings from the KITTI dataset. GNSS observations are simulated based on ground truth positions and satellite ephemeris. The results demonstrate the effectiveness of the proposed approach in detecting satellites affected by multipath interference. Significant improvements in positioning accuracy are observed, particularly in scenarios where a large portion of the visible satellites are contaminated by severe multipath.
comment: The paper, was presented at the ION PLANS 2025 meeting (Position, Location, and Navigation Symposium) in Session C1: Multisensor Integrated Systems and Sensor Fusion Technologies, and is published in the conference proceedings
Statistical-Symbolic Verification of Perception-Based Autonomous Systems using State-Dependent Conformal Prediction
Reachability analysis has been a prominent way to provide safety guarantees for neurally controlled autonomous systems, but its direct application to neural perception components is infeasible due to imperfect or intractable perception models. Typically, this issue has been bypassed by complementing reachability with statistical analysis of perception error, say with conformal prediction (CP). However, existing CP methods for time-series data often provide conservative bounds. The corresponding error accumulation over time has made it challenging to combine statistical bounds with symbolic reachability in a way that is provable, scalable, and minimally conservative. To reduce conservatism and improve scalability, our key insight is that perception error varies significantly with the system's dynamical state. This article proposes state-dependent conformal prediction, which exploits that dependency in constructing tight high-confidence bounds on perception error. Based on this idea, we provide an approach to partition the state space, using a genetic algorithm, so as to optimize the tightness of conformal bounds. Finally, since using these bounds in reachability analysis leads to additional uncertainty and branching in the resulting hybrid system, we propose a branch-merging reachability algorithm that trades off uncertainty for scalability so as to enable scalable and tight verification. The evaluation of our verification methodology on two complementary case studies demonstrates reduced conservatism compared to the state of the art.
comment: The first and second authors contributed equally. The last two authors shared the supervision equally
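The core construction (a separate conformal bound on perception error per region of the state space) can be sketched with split conformal prediction over a fixed partition. The grid, the distance-like state variable, and the synthetic error model below are illustrative; the paper optimizes the partition with a genetic algorithm rather than fixing it.

```python
# Sketch of the core idea only: a separate split-conformal bound on perception
# error for each region of the state space (here a fixed 1-D grid rather than
# the paper's GA-optimized partition).
import numpy as np

def per_region_bounds(states, errors, edges, alpha=0.1):
    """states, errors: calibration data; edges: region boundaries."""
    bounds = []
    regions = np.digitize(states, edges)
    for r in range(len(edges) + 1):
        e = np.sort(errors[regions == r])
        n = len(e)
        if n == 0:
            bounds.append(np.inf)                 # no data: no finite guarantee
            continue
        k = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample conformal rank
        bounds.append(e[k - 1] if k <= n else np.inf)
    return np.array(bounds)

rng = np.random.default_rng(0)
states = rng.uniform(0, 30, 2000)                     # e.g. distance to obstacle [m]
errors = np.abs(rng.normal(0, 0.02 * (1 + states)))   # error grows with distance
edges = np.array([5.0, 10.0, 20.0])
print(per_region_bounds(states, errors, edges))       # tight near, looser far
```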
AC/DC Frequency-Dependent Power Flow Jacobian: Quantifying Grid Support and Stability Implications
This letter proposes an AC/DC frequency-dependent power flow Jacobian analysis to identify system support capabilities. The analysis further reveals that system support capabilities do not necessarily enhance the system stability margin, suggesting that technical requirements based on narrow-frequency-band, AC-side-focused specifications may not deliver the expected performance of grid-forming (GFM) converters.
comment: 3 pages, 5 figures
PAC-Bayesian Optimal Control with Stability and Generalization Guarantees
Stochastic Nonlinear Optimal Control (SNOC) seeks to minimize a cost function that accounts for random disturbances acting on a nonlinear dynamical system. Since the expectation over all disturbances is generally intractable, a common surrogate is the empirical cost, obtained by averaging over a finite dataset of sampled noise realizations. This substitution, however, introduces the challenge of guaranteeing performance under unseen disturbances. The issue is particularly severe when the dataset is limited, as the trained controllers may overfit, leading to substantial gaps between their empirical cost and the deployment cost. In this work, we develop a PAC-Bayesian framework that establishes rigorous generalization bounds for SNOC. Building on these bounds, we propose a principled controller design method that balances empirical performance and prior knowledge. To ensure tractability, we derive computationally efficient relaxations of the bounds and employ approximate inference methods. Our framework further leverages expressive neural controller parameterizations, guaranteeing closed-loop stability. Through simulated examples, we highlight how prior knowledge can be incorporated into control design and how more reliable controllers can be synthesized for cooperative robotics.
Tempering the Bayes Filter towards Improved Model-Based Estimation
Model-based filtering is often carried out while subject to an imperfect model, as learning partially-observable stochastic systems remains a challenge. Recent work on Bayesian inference found that tempering the likelihood or full posterior of an imperfect model can improve predictive accuracy, as measured by expected negative log likelihood. In this paper, we develop the tempered Bayes filter, improving estimation performance through both of the aforementioned, and one newly introduced, modalities. The result admits a recursive implementation with a computational complexity no higher than that of the original Bayes filter. Our analysis reveals that -- besides the well-known fact in the field of Bayesian inference that likelihood tempering affects the balance between prior and likelihood -- full-posterior tempering tunes the level of entropy in the final belief distribution. We further find that a region of the tempering space can be understood as interpolating between the Bayes- and MAP filters, recovering these as special cases. Analytical results further establish conditions under which a tempered Bayes filter achieves improved predictive performance. Specializing the results to the linear Gaussian case, we obtain the tempered Kalman filter. In this context, we interpret how the parameters affect the Kalman state estimate and covariance propagation. Empirical results confirm that our method consistently improves predictive accuracy over the Bayes filter baseline.
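One simple reading of the linear-Gaussian specialization is sketched below; it is an assumption-laden illustration, not necessarily the paper's exact construction. It uses the facts that raising a Gaussian likelihood to a power tau is, up to normalization, equivalent to scaling the measurement covariance R by 1/tau, and that tempering a Gaussian posterior rescales its covariance (i.e., its entropy). The parameter values are arbitrary.

```python
# Hedged sketch of a tempered Kalman step in the linear-Gaussian case:
# likelihood tempering <=> R_eff = R / tau_lik (balance between prior and data),
# full-posterior tempering <=> rescaling the final covariance (entropy).
import numpy as np

def tempered_kf_step(m, P, y, A, Q, H, R, tau_lik=0.7, tau_post=1.0):
    # Predict.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update with a tempered likelihood.
    R_eff = R / tau_lik
    S = H @ P_pred @ H.T + R_eff
    K = P_pred @ H.T @ np.linalg.inv(S)
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    # Tempering the full Gaussian posterior rescales its covariance.
    return m_new, P_new / tau_post

# Scalar example with arbitrary parameters.
m, P = np.array([0.0]), np.array([[1.0]])
A = Q = H = R = np.array([[1.0]])
print(tempered_kf_step(m, P, np.array([0.8]), A, Q, H, R))
```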
System Identification for Dynamic Modeling of a Bumper Car
This paper presents the modeling of autonomous vehicles with high maneuverability used in an experimental framework for educational purposes. Since standard bicycle models typically neglect wide steering angles, we develop modified planar bicycle models and combine them with both parametric and non-parametric identification techniques that progressively incorporate physical knowledge. The resulting models are systematically compared to evaluate the tradeoff between model accuracy and computational requirements, showing that physics-informed neural network models surpass the purely physical baseline in accuracy at lower computational cost.
Gain-Scheduling Data-Enabled Predictive Control for Nonlinear Systems with Linearized Operating Regions
This paper presents a Gain-Scheduled Data-Enabled Predictive Control (GS-DeePC) framework for nonlinear systems based on multiple locally linear data representations. Instead of relying on a single global Hankel matrix, the operating range of a measurable scheduling variable is partitioned into regions, and regional Hankel matrices are constructed from persistently exciting data. To ensure smooth transitions between linearization regions and suppress region-induced chattering, composite regions are introduced, merging neighboring data sets and enabling a robust switching mechanism. The proposed method maintains the original DeePC problem structure and can achieve reduced computational complexity by requiring only short, locally informative data sequences. Extensive experiments on a nonlinear DC-motor with an unbalanced disc demonstrate the significantly improved control performance compared to standard DeePC.
comment: 8 pages, 3 figures, 2 tables
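The data-organization step (one Hankel matrix per scheduling region, plus composite regions near boundaries) can be sketched as below. This omits the DeePC optimization itself, assumes for simplicity that the samples within a region are contiguous in time, and uses a made-up region/width parameterization rather than the paper's.

```python
# Sketch of the data-organization step only (the DeePC optimization is omitted):
# build one block-Hankel matrix per scheduling region and select the active,
# possibly composite, region at runtime to avoid boundary chattering.
import numpy as np

def hankel(signal: np.ndarray, depth: int) -> np.ndarray:
    """Block-Hankel matrix with `depth` block rows from a (T, m) signal."""
    T, m = signal.shape
    cols = T - depth + 1
    return np.hstack([signal[i:i + depth].reshape(-1, 1) for i in range(cols)])

def regional_hankels(u, y, sched, edges, depth):
    """Split (u, y) by scheduling region (assumes contiguous samples per region)."""
    region_of = np.digitize(sched, edges)
    H = {}
    for r in np.unique(region_of):
        idx = np.where(region_of == r)[0]
        if len(idx) >= depth:                      # need enough data in the region
            H[r] = (hankel(u[idx], depth), hankel(y[idx], depth))
    return H

def active_region(sched_now, edges, width=0.1):
    """Composite region: merge neighbors when near a boundary."""
    r = int(np.digitize([sched_now], edges)[0])
    near = [i for i, e in enumerate(edges) if abs(sched_now - e) < width]
    return {r} | {i for b in near for i in (b, b + 1)}
```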
Double-Helix based Real-Time Single Particle Tracking
In Real-Time, Feedback-Driven Single Particle Tracking methods, measurements of the emission intensity from a labeled, nanometer-scale particle are used in a feedback loop to track the motion of the particle as it moves inside its native environment, including within living cells. In this work, we take advantage of Point Spread Function (PSF) engineering techniques that encode the axial position of the particle into the shape of the PSF in the focal plane to eliminate the need for out-of-focal-plane measurements, reducing the complexity of implementation and decreasing the overall measurement time of the control loop. Specifically, we used the Double Helix PSF (DH-PSF) in which a single fluorescent source gives rise to two lobes in the image plane with the lobes rotating in the plane as the particle moves along the optical axis. We designed simple estimators of the relative error between the particle and the tracker, and a simple proportional feedback controller to regulate that error to zero. We explored the efficacy of the approach through simulation studies, demonstrating the tracking of fast-moving particles (with diffusion coefficients up to 10 μm²/s) over long time periods (multiple seconds).
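A toy closed loop in the spirit of the abstract is sketched below. The calibration constant, gain, control period, and the assumption that the lobe rotation is approximately linear in the axial offset over the working range are all made up for illustration; this is not the authors' estimator or controller.

```python
# Toy closed loop (constants made up): assume the DH-PSF lobe pair rotates
# roughly linearly with the particle's axial offset, estimate that offset from
# the measured angle, and drive the tracker with a proportional controller.
import numpy as np

DEG_PER_UM = 30.0          # assumed calibration: lobe rotation per micron of defocus
KP = 0.6                   # proportional gain
DT = 1e-3                  # control period [s]

def axial_error_um(lobe_angle_deg: float) -> float:
    return lobe_angle_deg / DEG_PER_UM

def track(measure_angle, move_stage, steps=5000):
    z_stage = 0.0
    for _ in range(steps):
        err = axial_error_um(measure_angle())     # particle-tracker offset estimate
        z_stage += KP * err                       # proportional correction
        move_stage(z_stage)
    return z_stage

# Simulated particle diffusing along z with D = 10 um^2/s.
rng = np.random.default_rng(0)
z_particle, z_stage_pos = 0.0, 0.0
def measure_angle():
    global z_particle
    z_particle += rng.normal(0, np.sqrt(2 * 10.0 * DT))   # Brownian step
    return (z_particle - z_stage_pos) * DEG_PER_UM
def move_stage(z):
    global z_stage_pos
    z_stage_pos = z

track(measure_angle, move_stage)
print(f"residual offset: {abs(z_particle - z_stage_pos):.3f} um")
```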
Off-grid solar energy storage system with lithium iron phosphate (LFP) batteries in high mountains: a case report of Tianchi Lodge in Taiwan
Mountain huts are buildings located at high altitude, providing shelter and a place to rest for hikers. Energy supply at mountain huts remains an open issue, and using renewable energy could be an appropriate solution. Tianchi Lodge, a famous mountain hut in Taiwan, has operated an off-grid solar energy storage system with lithium iron phosphate (LFP) batteries since 2020. In this case report, the energy architecture, detailed descriptions, and historical operating status of the system are provided.
comment: 6 pages, 3 figures, 1 table
Modal Analysis of Core Inertial Dynamics: Re-evaluating Grid-Forming Control Design Principles
This paper employs modal analysis to study the core inertial dynamics of governor-controlled synchronous generators (GC-SG), droop-based grid-forming (GFM) converters, and their most fundamental interactions. The results indicate that even in the simplest cases, the prevailing industry paradigm of emulating legacy GC-SG behaviour in GFM converters (high inertia to slow down the system and large droop to increase damping) could be a suboptimal policy. It is shown that GC-SGs exhibit a fundamental trade-off: adequate damping of the turbine-governor mode requires large droop constants, inevitably increasing steady-state frequency deviation and dependence on secondary regulation. In contrast, droop-based GFM converters invert this relationship: decreasing the droop constant simultaneously reduces steady-state frequency deviations and increases damping, while allowing virtual inertia to be freely chosen. When two GC-SGs are coupled, the poorly damped electromechanical swing mode emerges. Results show that replacing one GC-SG with a GFM converter of equivalent droop and inertia already significantly improves damping of both swing and turbine-governor modes. Counter-intuitively, further and remarkable damping gains are achieved by substantially lowering the GFM virtual inertia constant. These findings suggest that current industry trends may be constraining the potential benefits of Inverter Based Resources (IBRs). Optimal stability and performance are instead obtained with low droop and low virtual inertia, yielding tightly bounded frequency variations and strongly-damped electromechanical modes. The results indicate a need to re-evaluate GFM control design principles and emerging grid-code requirements.
Reduced-order Smith predictor for state feedback control with guaranteed stability
This article deals with the implementation of the Smith predictor for state feedback control in state-space representation. The desired control law, obtained using partial differential equations and backstepping control, contains an integral term that has to be approximated for implementation. In this article, we propose a new way to implement this control law using a dynamic controller. The control law is composed of a state feedback term and a dynamic term that approximates the integral term. Using a Lyapunov functional, we provide sufficient conditions, in terms of a linear matrix inequality, to guarantee that the closed-loop system is stable when the proposed control law is applied. We use three examples, taken from the literature, to show the benefits of the proposed approach.
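As a point of reference (and not the article's PDE-backstepping derivation), for a plant with a constant input delay, $\dot{x}(t)=Ax(t)+Bu(t-h)$, the integral term alluded to above appears in the classical predictor-feedback law

$$
u(t) = K\left(e^{Ah}x(t) + \int_{t-h}^{t} e^{A(t-\theta)}B\,u(\theta)\,d\theta\right),
$$

whose integral over past inputs is precisely the quantity that must be approximated (here, by the proposed dynamic controller) in any finite-dimensional implementation.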
Intervention Strategies for Fairness and Efficiency at Autonomous Single-Intersection Traffic Flows
Intersections present significant challenges in traffic management, where ensuring safety and efficiency is essential for effective flow. However, these goals are often achieved at the expense of fairness, which is critical for trustworthiness and long-term sustainability. This paper investigates how the timing of centralized intervention affects the management of autonomous agents at a signal-less, orthogonal intersection, while satisfying safety constraints, evaluating efficiency, and ensuring fairness. A mixed-integer linear programming (MILP) approach is used to optimize agent coordination within a circular control zone centered at the intersection. We introduce the concept of fairness, measured via pairwise reversal counts, and incorporate fairness constraints into the MILP framework. We then study the relationship between fairness and system efficiency and its impact on platoon formation. Finally, simulation studies analyze the effectiveness of early versus late intervention strategies and fairness-aware control, focusing on safe, efficient, and robust management of agents within the control zone.
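As a small illustration of the fairness notion described above, the sketch below counts pairwise order reversals between arrival at the control zone and crossing of the intersection. The agent IDs, times, and the exact definition used here are hypothetical simplifications of the metric embedded in the paper's MILP.

```python
# Minimal sketch (assumption): "pairwise reversal count" illustrated as the number
# of agent pairs whose crossing order reverses their arrival order. Data are toy values.
from itertools import combinations

def reversal_count(arrival_time: dict, crossing_time: dict) -> int:
    """Count pairs (i, j) that arrive in one order but cross in the other."""
    agents = list(arrival_time)
    return sum(
        1
        for i, j in combinations(agents, 2)
        if (arrival_time[i] - arrival_time[j]) * (crossing_time[i] - crossing_time[j]) < 0
    )

arrival  = {"a1": 0.0, "a2": 0.5, "a3": 1.1, "a4": 1.2}
crossing = {"a1": 3.0, "a2": 2.6, "a3": 4.0, "a4": 3.8}   # a2 overtakes a1, a4 overtakes a3
print(reversal_count(arrival, crossing))                   # -> 2
```

Inside a MILP, such a count would typically be bounded or penalized through binary reversal indicators for each agent pair.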
Necessary and Sufficient Conditions for PID Design of MIMO Nonlinear Systems
As is well known, classical PID control is ubiquitous in industrial processes, yet a rigorous and explicit design theory for nonlinear uncertain MIMO second-order systems remains underdeveloped. In this paper we consider a class of such systems with both uncertain dynamics and an unknown but strictly positive input gain, where the nonlinear uncertainty is characterized by bounds on the Jacobian with respect to the state variables. We explicitly construct a three-dimensional region for the PID gains that is sufficient to guarantee global stability and asymptotic tracking of constant references for all nonlinearities satisfying these Jacobian bounds. We then derive a corresponding necessary region, thereby revealing the inherent conservatism required to cope with worst-case uncertainties. Moreover, under additional structural assumptions on the nonlinearities, these sufficient and necessary regions coincide, yielding a precise necessary-and-sufficient characterization of all globally stabilizing PID gains. All these regions are given in closed form and depend only on the prescribed Jacobian bounds and the known lower bound of the input gain, in contrast to many qualitative tuning methods in the literature.
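For intuition about the setting above, the following toy simulation applies decentralized PID to a two-input, two-output second-order system with a bounded-Jacobian nonlinearity and an uncertain positive input gain. The specific nonlinearity and gains are illustrative assumptions and are not drawn from the paper's closed-form gain regions.

```python
# Minimal sketch (assumptions): xdd = f(x, xd) + b*u with a bounded-Jacobian f and
# unknown gain b (only a lower bound "known"), regulated by PID to a constant reference.
import numpy as np

def f(x, xd):
    # nonlinearity with bounded Jacobian with respect to the states
    return np.array([np.sin(x[0]) - 0.5 * xd[0], np.tanh(x[1]) - 0.3 * xd[1]])

b = 1.2                                   # true input gain
kp, ki, kd = 20.0, 10.0, 8.0              # illustrative PID gains, not the paper's region
ref = np.array([1.0, -0.5])               # constant reference

dt, T = 1e-3, 10.0
x, xd, e_int = np.zeros(2), np.zeros(2), np.zeros(2)
for _ in range(int(T / dt)):
    e = ref - x
    e_int += e * dt
    u = kp * e + ki * e_int - kd * xd     # derivative term acts on the measured velocity
    xdd = f(x, xd) + b * u
    xd += xdd * dt
    x += xd * dt

print("steady-state tracking error:", ref - x)   # should be close to zero
```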
Fleet Size and Mix Capacitated Vehicle Routing Problem with Time Windows for Mobile Fast Chargers SP
The electrification of off-road heavy equipment presents operational challenges for agencies serving remote sites with limited fixed charging infrastructure. Existing mobile fast charging vehicle (MFCV) planning approaches typically treat fleet design and routing as separate problems, fixing vehicle characteristics before dispatch. This paper formulates a fleet size and mix capacitated vehicle routing problem with time windows (FSMCVRPTW) for MFCV deployment, jointly optimizing fleet composition, charger specifications, routing, and scheduling within a unified mixed-integer linear program. The model incorporates heterogeneous MFCV types with varying power ratings, battery capacities, fuel ranges, and cost structures, minimizing total daily cost from labor, fuel, amortized capital expenditure, and energy purchase under temporal service windows, resource budgets, and energy-delivery constraints. The formulation is implemented in Python/Gurobi and applied to two case studies using California Department of Transportation wheel-loader data in Los Angeles (dense urban) and Truckee (sparse mountainous). Results show that simultaneous optimization yields compact, well-utilized fleets that meet all service windows while revealing strong sensitivity of unit cost to demand density and geography. The proposed FSMCVRPTW framework provides a generalizable decision-support methodology that co-designs fleet size, charger power, routing, and service schedules in a single optimization layer for context-aware, cost-efficient mobile fast charging.
comment: 10 pages, submitted to IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION
On the Convergence of Density-Based Predictive Control for Multi-Agent Non-Uniform Area Coverage
This paper presents Density-based Predictive Control (DPC), a novel multi-agent control strategy for efficient non-uniform area coverage, grounded in optimal transport theory. In large-scale scenarios such as search and rescue or environmental monitoring, traditional uniform coverage fails to account for varying regional priorities. DPC leverages a pre-constructed reference distribution to allocate agents' coverage efforts, spending more time in high-priority or densely sampled regions. We analyze convergence conditions using the Wasserstein distance, derive an analytic optimal control law for unconstrained cases, and propose a numerical method for constrained scenarios. Simulations on first-order dynamics and linearized quadrotor models demonstrate that DPC achieves trajectories closely matching the non-uniform reference distribution, outperforming existing coverage methods.
comment: Accepted for publication in ASME JDSMC
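As a quick illustration of the convergence metric mentioned in the abstract above, the snippet below compares an empirical distribution of agent visit locations against a non-uniform reference density via the 1-D Wasserstein distance. The Gaussian reference and synthetic samples are placeholders, not the paper's setup.

```python
# Minimal sketch (assumption): monitoring coverage quality with the 1-D Wasserstein
# distance between agent visit locations and a non-uniform reference density.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)

# reference density: agents should spend more time near x = 0.7 (high-priority region)
reference_samples = rng.normal(loc=0.7, scale=0.1, size=5000)

# empirical visit locations of the agent team after running a coverage controller
agent_samples = rng.normal(loc=0.65, scale=0.12, size=2000)

print("W1 distance to reference:", wasserstein_distance(agent_samples, reference_samples))
```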
On Frequency-Weighted Extended Balanced Truncation
This paper addresses the problem of frequency-weighted extended balanced truncation for discrete and continuous-time linear time-invariant plants. We show that the frequency-weighted discrete-time plant admits block-diagonal solutions to both the Lyapunov inequality and its extended form. A recursive algorithm for extended balanced truncation is proposed, together with corresponding a-priori error bounds. Theoretical results are extended to continuous-time systems and validated through numerical examples.
comment: 8 pages, conference submission
The xApp Store: A Framework for xApp Onboarding and Deployment in O-RAN
5G and beyond mobile telecommunication networks are increasingly embracing software technologies in their operation and control, similar to what has powered the growth of the cloud. This is most recently seen in the radio access network (RAN). In this new approach, the RAN is increasingly controlled by software applications known as xApps, which opens the door to third-party xApp development and brings diversity to the ecosystem, much like mobile phone apps. This model aligns closely with the controllers in the ITU-T architecture for autonomous networks and provides a pathway towards autonomous operation in the RAN. Unfortunately, no marketplace to host or supply xApps currently exists. This work describes our experiences in leveraging open-source O-RAN implementations to design and develop an xApp store.
comment: Accepted to ANMS'25
Time-Invariant Polytopic and Interval Observers for Uncertain Linear Systems via Non-Square Transformation
This paper presents novel polytopic and interval observer designs for uncertain linear continuous-time (CT) and discrete-time (DT) systems subjected to bounded disturbances and noise. Our approach guarantees enclosure of the true state and input-to-state stability (ISS) of the polytopic and interval set estimates. Notably, our approach applies to all detectable systems that are stabilized by any optimal observer design, utilizing a potentially non-square (lifted) time-invariant coordinate transformation based on polyhedral Lyapunov functions and mixed-monotone embedding systems that do not impose any positivity constraints, enabling feasible and optimal observer designs, even in cases where previous methods fail. The effectiveness of our approach is demonstrated through several examples of uncertain linear CT and DT systems.
Quick Updates for the Perturbed Static Output Feedback Control Problem in Linear Systems with Applications to Power Systems
This paper introduces a method for efficiently updating a nominal stabilizing static output feedback (SOF) controller in perturbed linear systems. As operating points and state-space matrices change in dynamic systems, updates to the SOF controller become necessary. Traditional methods address such changes by re-solving for the updated SOF gain, which is often (i) computationally expensive due to the NP-hard nature of the problem or (ii) infeasible due to the limitations of its semidefinite programming relaxations. To overcome this, we leverage the concept of minimum destabilizing real perturbation (MDRP) to formulate a norm minimization problem that yields fast, reliable controller updates. This approach accommodates a variety of known perturbations, including abrupt changes, model inaccuracies, and equilibrium-dependent linearizations. We remark that the application of our proposed approach is limited to the class of SOF controllers in perturbed linear systems. We also introduce geometric metrics to quantify the proximity to instability and rigorously define stability-guaranteed regions. Extensive numerical simulations validate the efficiency and robustness of the proposed method and corroborate that, although we utilize a heuristic optimization method to compute the MDRP, it performs well in practice compared to an existing approximation method in the literature, namely the hybrid expansion-contraction (HEC) method. We demonstrate the results on the SOF control of multi-machine power networks with changing operating points, showing that the computed quick updates produce solutions comparable to the traditional SOF ones while requiring orders of magnitude less computation time.
An innovative circuit for testing hot carrier and trap generation in GaN Devices
Microelectronic devices in modern systems operate continuously for many years. There is therefore a crucial need for a reliability model that can precisely predict a device's life cycle and identify the governing failure mechanisms responsible for degradation and failure. This is even more important for high-power, high-frequency devices, especially normally-off power switching transistors, since reliability research on these devices lags far behind that of low-power digital devices. Our goal is to investigate the failure mechanisms of GaN transistors in order to determine the reliability factors of the MTOL model and to understand why the dominant failure mechanism changes. The recorded data may enable us to predict device performance and lifetime under different operating parameters such as current, voltage, frequency, and temperature. In this study we employ a new use of the well-known boost converter circuit, based on GaN devices, to stress the transistor to maximum voltage and current, allowing us to examine its reliability and accelerate hot-carrier and trap-generation failure mechanisms. The acceleration must be performed in a way that does not detrimentally affect the device, while avoiding excessively long waits to observe the degradation. In this work we present our new boost converter circuit based on a high-power GaN HEMT.
A Switched Systems Approach to Image-Based Feature Tracking for Autonomous Satellite Inspection
This paper presents an information-based guidance and control architecture for an autonomous deputy spacecraft tasked with inspecting a chief satellite in orbit. The primary objective is for the deputy spacecraft to maximize information gain while tracking features on the chief satellite. The deputy spacecraft needs to respect various constraints such as illumination, field-of-view (FOV), fuel limitations, and avoidance regions. Additionally, the absence of GPS information poses a significant challenge for relative localization within the space environment. To learn the structure of the chief satellite while achieving relative self-localization and maximizing the information gain, this paper integrates a memory regression extension (MRE)-based distance observer with an information-maximizing adaptive controller. The distance observer utilizes feedback from a camera. A switched systems approach is used to determine the minimum dwell time required for a feature to remain within the FOV of the camera to ensure accurate estimation. A k-means clustering algorithm acts as a high-level planner to intermittently generate goal locations that guide the deputy spacecraft toward the nearest cluster of uninspected points on the chief satellite, subject to illumination and FOV constraints. A Lyapunov-based stability analysis is conducted to analyze the developed architecture, and simulation results validate the theoretical results of the paper.
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP), where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model within an uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDPs (CMDPs), are not applicable here because of the lack of strong duality. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either, as the worst-case models may differ for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function to satisfy the constraints; when all the constraints are satisfied, it simply maximizes the robust reward value function. We prove that such an algorithm finds a feasible policy with at most $ε$ sub-optimality after $O(ε^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search; thus, we reduce the computation time by at least 4x for smaller values of the discount factor ($γ$) and by at least 6x for larger values of $γ$.
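The switching rule described above can be illustrated on a toy problem. The sketch below uses a tiny tabular (non-robust) CMDP with exact policy evaluation and finite-difference gradients, descending the constraint value when it exceeds a budget and ascending the reward value otherwise; the worst-case-model evaluation of the paper is omitted, and all quantities are hypothetical.

```python
# Minimal sketch (assumptions): the "minimize constraint if infeasible, else maximize
# reward" rule on a toy 2-state, 2-action CMDP. Not the paper's robust algorithm.
import numpy as np

gamma, budget = 0.9, 2.0
P = np.array([[[0.9, 0.1], [0.2, 0.8]],      # P[s, a, s']
              [[0.3, 0.7], [0.6, 0.4]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])       # reward r(s, a)
C = np.array([[0.0, 1.0], [1.0, 0.5]])       # constraint cost c(s, a)

def value(theta, payoff):
    pi = np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)   # softmax policy
    P_pi = np.einsum("sa,sap->sp", pi, P)
    r_pi = (pi * payoff).sum(axis=1)
    return np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)[0]       # value at state 0

def grad(theta, payoff, eps=1e-5):
    g = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        d = np.zeros_like(theta); d[idx] = eps
        g[idx] = (value(theta + d, payoff) - value(theta - d, payoff)) / (2 * eps)
    return g

theta = np.zeros((2, 2))
for _ in range(300):
    if value(theta, C) > budget:              # infeasible: push the constraint value down
        theta -= 0.5 * grad(theta, C)
    else:                                     # feasible: push the reward value up
        theta += 0.5 * grad(theta, R)

print("V_r:", value(theta, R), " V_c:", value(theta, C), "(budget", budget, ")")
```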
Observer Design for Networked Linear Systems with Fast and Slow Dynamics under Measurement Noise
This paper addresses the emulation-based observer design for networked control systems (NCS) with linear plants that operate at two time scales in the presence of measurement noise. The system is formulated as a hybrid singularly perturbed dynamical system, enabling the systematic use of singular perturbation techniques to derive explicit bounds on the maximum allowable transmission intervals (MATI) for both fast and slow communication channels. Under the resulting conditions, the proposed observer guarantees that the estimation error satisfies a global exponential derivative-input-to-state stability (DISS)-like property, where the ultimate bound scales proportionally with the magnitudes of the measurement noise and the time derivative of the control input. The effectiveness of the approach is illustrated through a numerical example.
comment: 9 pages, 2 figures, full version of the paper submitted to the IFAC World Congress
Data-Efficient Motor Condition Monitoring with Time Series Foundation Models
Motor condition monitoring is essential for ensuring system reliability and preventing catastrophic failures. However, data-driven diagnostic methods often suffer from sparse fault labels and severe class imbalance, which limit their effectiveness in real-world applications. This paper proposes a motor condition monitoring framework that leverages the general features learned during pre-training of two time series foundation models, MOMENT and Mantis, to address these challenges. By transferring broad temporal representations from large-scale pre-training, the proposed approach significantly reduces dependence on labeled data while maintaining high diagnostic accuracy. Experimental results show that MOMENT achieves nearly twice the performance of conventional deep learning models using only 1% of the training data, whereas Mantis surpasses state-of-the-art baselines by 22%, reaching 90% accuracy with the same data ratio. These results demonstrate the strong generalization and data efficiency of time series foundation models in fault diagnosis, providing new insights into scalable and adaptive frameworks for intelligent motor condition monitoring.
Convex Computations for Controlled Safety Invariant Sets of Black-box Discrete-time Dynamical Systems
Identifying controlled safety invariant sets (CSISs) is essential for safety-critical systems. This paper addresses the problem of computing CSISs for black-box discrete-time systems, where the dynamics are unknown and only limited simulation data are available. Traditionally, a CSIS requires that for every state in the set, there exists a control input that keeps the system within the set at the next step. However, enforcing such universal invariance, i.e., requiring the set to remain controlled invariant for all states, is often overly restrictive or impractical for black-box systems. To address this, we introduce the notion of a Probably Approximately Correct (PAC) CSIS, in which, with prescribed confidence, there exists a suitable control input to keep the system within the set at the next step for at least a specified fraction of the states. Our approach leverages barrier functions and scenario optimization, yielding a tractable linear programming method for estimating PAC CSISs. Several illustrative examples demonstrate the effectiveness of the proposed framework.
comment: 11 pages
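To give a flavour of the "scenario samples plus linear program" recipe in the abstract above, here is a heavily simplified sketch: a quadratic barrier template is certified to decrease along one-step transitions of a black-box simulator, using a greedy choice among finitely many candidate inputs. The system, template, input set, and greedy rule are illustrative stand-ins, not the paper's PAC formulation.

```python
# Minimal sketch (assumptions): a scenario LP for a decrease certificate of a
# black-box discrete-time system with barrier template B(x) = w1*x1^2 + w2*x2^2 - 1.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

def simulate(x, u):                       # one-step "black-box" simulator
    A = np.array([[0.9, 0.2], [-0.1, 0.8]])
    Bu = np.array([0.05, 0.1])
    return A @ x + Bu * u

U = [-1.0, 0.0, 1.0]                      # finite set of candidate inputs
X = rng.uniform(-1, 1, size=(400, 2))     # scenario samples from the domain of interest

# For each sample, pick the candidate input whose successor is closest to the origin,
# then require the barrier to decrease along that transition:
#     w . (phi(x_next) - phi(x)) <= -t,   with phi(x) = (x1^2, x2^2)
rows = []
for x in X:
    x_next = min((simulate(x, u) for u in U), key=np.linalg.norm)
    rows.append([x_next[0] ** 2 - x[0] ** 2, x_next[1] ** 2 - x[1] ** 2])

A_ub = np.hstack([np.array(rows), np.ones((len(rows), 1))])   # variables (w1, w2, t)
res = linprog(c=[0.0, 0.0, -1.0], A_ub=A_ub, b_ub=np.zeros(len(rows)),
              bounds=[(0.1, 10.0), (0.1, 10.0), (0.0, 1.0)])  # maximize the margin t
w1, w2, t = res.x
print(f"certificate weights w = ({w1:.2f}, {w2:.2f}), decrease margin t = {t:.3f}")
```

In a PAC reading, the number of scenario samples would then be tied to the prescribed confidence and the admissible fraction of violating states.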
Quantized Distributed Estimation with Event-triggered Communication and Packet Loss
This paper focuses on the problem of quantized distributed estimation with event-triggered communication and packet loss, aiming to reduce the number of transmitted bits. The main challenge lies in the inability to differentiate between an untriggered event and a packet loss occurrence. This paper proposes an event-triggered distributed estimation algorithm with quantized communication and quantized measurement, in which it introduces a one-bit information reconstruction method to deal with packet loss. The almost sure convergence and convergence rate of the proposed algorithm are established. Besides, it is demonstrated that the global average communication bit-rate decreases to zero over time. Moreover, the trade-off between communication rate and convergence rate is revealed, providing guidance for designing the communication rate required to achieve the algorithm's convergence rate. A numerical example is supplied to validate the findings.
comment: 6 pages, 4 figures, accepted by the 64th IEEE Conference on Decision and Control
Ergodic-risk Criterion for Stochastically Stabilizing Policy Optimization
This paper introduces ergodic-risk criteria, which capture long-term cumulative risks associated with controlled Markov chains through probabilistic limit theorems--in contrast to existing methods that require assumptions of either finite hitting time, finite state/action space, or exponentiation necessitating light-tailed distributions. Using tailored Functional Central Limit Theorems (FCLT), we demonstrate that the time-correlated terms in the ergodic-risk criteria converge under uniform ergodicity and establish conditions for the convergence of these criteria in non-stationary general-state Markov chains involving heavy-tailed distributions. For quadratic risk functionals on stochastic linear systems, in addition to internal stability, this requires the (possibly heavy-tailed) process noise to have only a finite fourth moment. After quantifying cumulative uncertainties in risk functionals that account for extreme deviations, these ergodic-risk criteria are then incorporated into policy optimizations, thereby extending the standard average optimal synthesis to a risk-sensitive framework. Finally, by establishing the strong duality of the constrained policy optimization, we propose a primal-dual algorithm that optimizes average performance while ensuring that certain risks associated with these ergodic-risk criteria are constrained. Our risk-sensitive framework offers a theoretically guaranteed policy iteration for the long-term risk-sensitive control of processes involving heavy-tailed noise, which is shown to be effective through several simulations.
Sequential Convex Programming for Multimode Spacecraft Trajectory Optimization
Spacecraft equipped with multiple propulsion modes or systems can offer enhanced performance and mission flexibility compared with traditional configurations. Despite these benefits, the trajectory optimization of spacecraft utilizing such configurations remains a complex challenge. This paper presents a sequential convex programming (SCP) approach for the optimal design of multi-mode and multi-propulsion spacecraft trajectories. The method extends the dynamical linearization within SCP using sparse automatic differentiation, enabling efficient inclusion of multiple propulsion modes or systems without complex manual reformulation while maintaining comparable computational efficiency. New constraint formulations are introduced to ensure selection of a single propulsion mode at each time step and limit the total number of modes used. The approach is demonstrated for (i) a low-thrust Earth-67P rendezvous using the SPT-140 thruster with 20 discrete modes, and (ii) an Earth-Mars transfer employing both a low-thrust engine and a solar sail. Results confirm that the proposed method can efficiently compute optimal trajectories for these scenarios.
comment: 12 pages, 5 figures, presented at the ORSNZ Annual Conference 2025
Robust Sensor Placement for Poisson Arrivals with False Alarm Aware Spatiotemporal Sensing
This paper studies sensor placement when detection performance varies stochastically due to environmental factors over space and time and false alarms are present, but a filter is used to attenuate the effect. We introduce a unified model that couples detection and false alarms through an availability function, which captures how false alarms reduce effective sensing and filtering responses to the disturbance. Building on this model, we give a sufficient condition under which filtering improves detection. In addition, we derive a coverage-based lower bound on the void probability. Furthermore, we prove robustness guarantees showing that performance remains stable when detection probabilities are learned from limited data. We validate the approach with numerical studies using AIS vessel-traffic data and synthetic maritime scenarios. Together, these results provide theory and practical guidance for deploying sensors in dynamic, uncertain environments.
comment: Submitted to IEEE ACC
Concentration of Cumulative Reward in Markov Decision Processes
In this paper, we investigate the concentration properties of cumulative reward in Markov Decision Processes (MDPs), focusing on both asymptotic and non-asymptotic settings. We introduce a unified approach to characterize reward concentration in MDPs, covering both infinite-horizon settings (i.e., average and discounted reward frameworks) and finite-horizon setting. Our asymptotic results include the law of large numbers, the central limit theorem, and the law of iterated logarithms, while our non-asymptotic bounds include Azuma-Hoeffding-type inequalities and a non-asymptotic version of the law of iterated logarithms. Additionally, we explore two key implications of our results. First, we analyze the sample path behavior of the difference in rewards between any two stationary policies. Second, we show that two alternative definitions of regret for learning policies proposed in the literature are rate-equivalent. Our proof techniques rely on a martingale decomposition of cumulative reward, properties of the solution to the policy evaluation fixed-point equation, and both asymptotic and non-asymptotic concentration results for martingale difference sequences.
comment: 71 pages
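As a pointer to the flavour of the non-asymptotic bounds mentioned in the abstract above, the snippet below evaluates the standard Azuma-Hoeffding tail bound for a sum of bounded martingale differences; the horizon, increment bound, and thresholds are arbitrary illustrative numbers, not results from the paper.

```python
# Minimal sketch: Azuma-Hoeffding bound for n martingale differences bounded by c:
#     P(|S_n| >= eps) <= 2 * exp(-eps^2 / (2 * n * c^2))
import math

def azuma_hoeffding_tail(n: int, c: float, eps: float) -> float:
    """Upper bound on P(|sum of n bounded martingale differences| >= eps)."""
    return 2.0 * math.exp(-eps**2 / (2.0 * n * c**2))

n, c = 10_000, 1.0          # horizon and per-step deviation bound (illustrative)
for eps in (100, 300, 500):
    print(f"eps = {eps:4d}:  tail bound = {azuma_hoeffding_tail(n, c, eps):.3e}")
```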
Robotics
EfficientFlow: Efficient Equivariant Flow Policy Learning for Embodied AI AAAI 2026
Generative modeling has recently shown remarkable promise for visuomotor policy learning, enabling flexible and expressive control across diverse embodied AI tasks. However, existing generative policies often struggle with data inefficiency, requiring large-scale demonstrations, and sampling inefficiency, incurring slow action generation during inference. We introduce EfficientFlow, a unified framework for efficient embodied AI with flow-based policy learning. To enhance data efficiency, we bring equivariance into flow matching. We theoretically prove that when using an isotropic Gaussian prior and an equivariant velocity prediction network, the resulting action distribution remains equivariant, leading to improved generalization and substantially reduced data demands. To accelerate sampling, we propose a novel acceleration regularization strategy. As direct computation of acceleration is intractable for marginal flow trajectories, we derive a novel surrogate loss that enables stable and scalable training using only conditional trajectories. Across a wide range of robotic manipulation benchmarks, the proposed algorithm achieves competitive or superior performance under limited data while offering dramatically faster inference. These results highlight EfficientFlow as a powerful and efficient paradigm for high-performance embodied AI.
comment: Accepted by AAAI 2026. Project Page: https://efficientflow.github.io/
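For readers unfamiliar with flow-based policies, the sketch below shows a generic conditional flow-matching training step for action generation with an isotropic Gaussian prior, as mentioned in the abstract above. The tiny MLP velocity network, dimensions, and data are placeholders, and the equivariance machinery and acceleration regularizer of EfficientFlow are omitted.

```python
# Minimal sketch (assumptions): linear-path conditional flow matching for a policy
# that maps (observation, noisy action, time) to a velocity. Shapes are illustrative.
import torch
import torch.nn as nn

obs_dim, act_dim = 16, 7
vel_net = nn.Sequential(nn.Linear(obs_dim + act_dim + 1, 128), nn.ReLU(),
                        nn.Linear(128, act_dim))
opt = torch.optim.Adam(vel_net.parameters(), lr=1e-3)

def flow_matching_step(obs, actions):
    """One gradient step on the conditional flow-matching loss."""
    noise = torch.randn_like(actions)                 # sample from the Gaussian prior
    t = torch.rand(actions.shape[0], 1)               # random time in [0, 1]
    x_t = (1 - t) * noise + t * actions               # point on the straight path
    target_v = actions - noise                        # its constant velocity
    pred_v = vel_net(torch.cat([obs, x_t, t], dim=-1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

obs = torch.randn(64, obs_dim)
actions = torch.randn(64, act_dim)
print(flow_matching_step(obs, actions))
```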
Data-Centric Visual Development for Self-Driving Labs
Self-driving laboratories (SDLs) offer a promising path toward reducing the labor-intensive, time-consuming, and often irreproducible workflows in the biological sciences. Yet their stringent precision requirements demand highly robust models whose training relies on large amounts of annotated data. However, this kind of data is difficult to obtain in routine practice, especially negative samples. In this work, we focus on pipetting, the most critical and precision-sensitive action in SDLs. To overcome the scarcity of training data, we build a hybrid pipeline that fuses real and virtual data generation. The real track adopts a human-in-the-loop scheme that couples automated acquisition with selective human verification to maximize accuracy with minimal effort. The virtual track augments the real data using reference-conditioned, prompt-guided image generation, which is further screened and validated for reliability. Together, these two tracks yield a class-balanced dataset that enables robust bubble detection training. On a held-out real test set, a model trained entirely on automatically acquired real images reaches 99.6% accuracy, and mixing real and generated data during training sustains 99.4% accuracy while reducing collection and review load. Our approach offers a scalable and cost-effective strategy for supplying visual feedback data to SDL workflows and provides a practical solution to data scarcity in rare event detection and broader vision tasks.
comment: 11 pages, 4 figures
Visual Sync: Multi-Camera Synchronization via Cross-View Object Motion NeurIPS 2025
Today, people can easily record memorable moments, ranging from concerts and sports events to lectures, family gatherings, and birthday parties, with multiple consumer cameras. However, synchronizing these cross-camera streams remains challenging. Existing methods assume controlled settings, specific targets, manual correction, or costly hardware. We present VisualSync, an optimization framework based on multi-view dynamics that aligns unposed, unsynchronized videos at millisecond accuracy. Our key insight is that any moving 3D point, when co-visible in two cameras, obeys epipolar constraints once properly synchronized. To exploit this, VisualSync leverages off-the-shelf 3D reconstruction, feature matching, and dense tracking to extract tracklets, relative poses, and cross-view correspondences. It then jointly minimizes the epipolar error to estimate each camera's time offset. Experiments on four diverse, challenging datasets show that VisualSync outperforms baseline methods, achieving a median synchronization error below 50 ms.
comment: Accepted to NeurIPS 2025. Project page: https://stevenlsw.github.io/visualsync/
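The core idea above, picking the time offset that minimizes epipolar error of cross-view correspondences of a moving point, can be reproduced on synthetic data. In the sketch below, the cameras, trajectory, and simple grid search are hypothetical stand-ins for the paper's joint optimization.

```python
# Minimal sketch (assumptions): estimate an unknown time offset between two cameras
# by minimizing the algebraic epipolar error x2^T F x1 of a tracked moving point.
import numpy as np

# --- synthetic two-camera setup ---------------------------------------------
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
ang = 0.3
R = np.array([[np.cos(ang), 0, np.sin(ang)], [0, 1, 0], [-np.sin(ang), 0, np.cos(ang)]])
t = np.array([1.0, 0.2, 0.0])
skew = lambda v: np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
F = np.linalg.inv(K).T @ skew(t) @ R @ np.linalg.inv(K)      # fundamental matrix

def project(P_world, R_cam, t_cam):
    Xc = (R_cam @ P_world.T).T + t_cam
    x = (K @ Xc.T).T
    return x[:, :2] / x[:, 2:3]

# moving 3D point sampled on each camera's own clock (cam2 lags by 120 ms)
fps, true_offset = 30.0, 0.120
traj = lambda s: np.stack([np.sin(s), 0.5 * np.cos(1.3 * s), 5.0 + 0.3 * np.sin(0.7 * s)], axis=-1)
t1 = np.arange(0, 5, 1 / fps)                      # cam1 local time == global time
t2 = np.arange(0, 5, 1 / fps)                      # cam2 local time; global = t2 + true_offset
x1 = project(traj(t1), np.eye(3), np.zeros(3))
x2 = project(traj(t2 + true_offset), R, t)

# --- grid search over the unknown offset ------------------------------------
def epipolar_error(offset):
    tau = t1 - offset                               # cam2 local times matching cam1 frames
    ok = (tau >= t2[0]) & (tau <= t2[-1])
    x2i = np.stack([np.interp(tau[ok], t2, x2[:, d]) for d in range(2)], axis=-1)
    h1 = np.hstack([x1[ok], np.ones((ok.sum(), 1))])
    h2 = np.hstack([x2i, np.ones((ok.sum(), 1))])
    return np.mean(np.abs(np.einsum("ni,ij,nj->n", h2, F, h1)))

candidates = np.arange(-0.5, 0.5, 0.005)
best = candidates[np.argmin([epipolar_error(d) for d in candidates])]
print(f"estimated offset: {best * 1000:.0f} ms (true: {true_offset * 1000:.0f} ms)")
```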
ManualVLA: A Unified VLA Model for Chain-of-Thought Manual Generation and Robotic Manipulation
Vision-Language-Action (VLA) models have recently emerged, demonstrating strong generalization in robotic scene understanding and manipulation. However, when confronted with long-horizon tasks that require defined goal states, such as LEGO assembly or object rearrangement, existing VLA models still face challenges in coordinating high-level planning with precise manipulation. Therefore, we aim to endow a VLA model with the capability to infer the "how" process from the "what" outcomes, transforming goal states into executable procedures. In this paper, we introduce ManualVLA, a unified VLA framework built upon a Mixture-of-Transformers (MoT) architecture, enabling coherent collaboration between multimodal manual generation and action execution. Unlike prior VLA models that directly map sensory inputs to actions, we first equip ManualVLA with a planning expert that generates intermediate manuals consisting of images, position prompts, and textual instructions. Building upon these multimodal manuals, we design a Manual Chain-of-Thought (ManualCoT) reasoning process that feeds them into the action expert, where each manual step provides explicit control conditions, while its latent representation offers implicit guidance for accurate manipulation. To alleviate the burden of data collection, we develop a high-fidelity digital-twin toolkit based on 3D Gaussian Splatting, which automatically generates manual data for planning expert training. ManualVLA demonstrates strong real-world performance, achieving an average success rate 32% higher than the previous hierarchical SOTA baseline on LEGO assembly and object rearrangement tasks.
Learning Dexterous Manipulation Skills from Imperfect Simulations
Reinforcement learning and sim-to-real transfer have made significant progress in dexterous manipulation. However, progress remains limited by the difficulty of simulating complex contact dynamics and multisensory signals, especially tactile feedback. In this work, we propose a sim-to-real framework that addresses these limitations and demonstrates its effectiveness on nut-bolt fastening and screwdriving with multi-fingered hands. The framework has three stages. First, we train reinforcement learning policies in simulation using simplified object models that lead to the emergence of correct finger gaits. We then use the learned policy as a skill primitive within a teleoperation system to collect real-world demonstrations that contain tactile and proprioceptive information. Finally, we train a behavior cloning policy that incorporates tactile sensing and show that it generalizes to nuts and screwdrivers with diverse geometries. Experiments across both tasks show high task progress ratios compared to direct sim-to-real transfer and robust performance even on unseen object shapes and under external perturbations. Videos and code are available at https://dexscrew.github.io.
LLM-Driven Corrective Robot Operation Code Generation with Static Text-Based Simulation
Recent advances in Large Language Models (LLMs) have demonstrated their promising capabilities of generating robot operation code to enable LLM-driven robots. To enhance the reliability of operation code generated by LLMs, corrective designs with feedback from the observation of executing code have been increasingly adopted in existing research. However, the code execution in these designs relies on either a physical experiment or a customized simulation environment, which limits their deployment due to the high configuration effort of the environment and the potentially long execution time. In this paper, we explore the possibility of directly leveraging an LLM to enable static simulation of robot operation code, and we use it to design a new reliable LLM-driven corrective robot operation code generation framework. Our framework configures the LLM as a static simulator with enhanced capabilities that reliably simulates robot code execution by interpreting actions, reasoning over state transitions, analyzing execution outcomes, and generating semantic observations that accurately capture trajectory dynamics. To validate the performance of our framework, we performed experiments on various operation tasks for different robots, including UAVs and small ground vehicles. The experimental results demonstrate not only the high accuracy of our static text-based simulation but also the reliable code generation of our LLM-driven corrective framework, which achieves performance comparable to state-of-the-art research while not relying on dynamic code execution using physical experiments or simulators.
comment: 8 pages, 2 figures
Learning Sim-to-Real Humanoid Locomotion in 15 Minutes
Massively parallel simulation has reduced reinforcement learning (RL) training time for robots from days to minutes. However, achieving fast and reliable sim-to-real RL for humanoid control remains difficult due to the challenges introduced by factors such as high dimensionality and domain randomization. In this work, we introduce a simple and practical recipe based on off-policy RL algorithms, i.e., FastSAC and FastTD3, that enables rapid training of humanoid locomotion policies in just 15 minutes with a single RTX 4090 GPU. Our simple recipe stabilizes off-policy RL algorithms at massive scale with thousands of parallel environments through carefully tuned design choices and minimalist reward functions. We demonstrate rapid end-to-end learning of humanoid locomotion controllers on Unitree G1 and Booster T1 robots under strong domain randomization, e.g., randomized dynamics, rough terrain, and push perturbations, as well as fast training of whole-body human-motion tracking policies. We provide videos and open-source implementation at: https://younggyo.me/fastsac-humanoid.
comment: Project website: https://younggyo.me/fastsac-humanoid
RoaD: Rollouts as Demonstrations for Closed-Loop Supervised Fine-Tuning of Autonomous Driving Policies
Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift when deployed in closed loop, leading to compounding errors. We introduce Rollouts as Demonstrations (RoaD), a simple and efficient method to mitigate covariate shift by leveraging the policy's own closed-loop rollouts as additional training data. During rollout generation, RoaD incorporates expert guidance to bias trajectories toward high-quality behavior, producing informative yet realistic demonstrations for fine-tuning. This approach enables robust closed-loop adaptation with orders of magnitude less data than reinforcement learning, and avoids restrictive assumptions of prior closed-loop supervised fine-tuning (CL-SFT) methods, allowing broader application domains including end-to-end driving. We demonstrate the effectiveness of RoaD on WOSAC, a large-scale traffic simulation benchmark, where it performs similarly to or better than the prior CL-SFT method, and in AlpaSim, a high-fidelity neural reconstruction-based simulator for end-to-end driving, where it improves driving score by 41% and reduces collisions by 54%.
comment: Preprint
Forecasting in Offline Reinforcement Learning for Non-stationary Environments NeurIPS 2025
Offline Reinforcement Learning (RL) provides a promising avenue for training policies from pre-collected datasets when gathering additional interaction data is infeasible. However, existing offline RL methods often assume stationarity or only consider synthetic perturbations at test time, assumptions that often fail in real-world scenarios characterized by abrupt, time-varying offsets. These offsets can lead to partial observability, causing agents to misperceive their true state and degrade performance. To overcome this challenge, we introduce Forecasting in Non-stationary Offline RL (FORL), a framework that unifies (i) conditional diffusion-based candidate state generation, trained without presupposing any specific pattern of future non-stationarity, and (ii) zero-shot time-series foundation models. FORL targets environments prone to unexpected, potentially non-Markovian offsets, requiring robust agent performance from the onset of each episode. Empirical evaluations on offline RL benchmarks, augmented with real-world time-series data to simulate realistic non-stationarity, demonstrate that FORL consistently improves performance compared to competitive baselines. By integrating zero-shot forecasting with the agent's experience, we aim to bridge the gap between offline RL and the complexities of real-world, non-stationary environments.
comment: The Thirty-Ninth Annual Conference on Neural Information Processing Systems, NeurIPS 2025
GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment
Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planning, and control. Yet, despite their realism, these models often lack geometric grounding, limiting their use in navigation tasks that require spatial coherence and long-horizon stability. We introduce Reinforcement Learning with World Grounding (RLWG), a self-supervised post-training framework that aligns pretrained world models with a physically verifiable structure through geometric and perceptual rewards. Analogous to reinforcement learning from verifiable feedback (RLVR) in language models, RLWG can use multiple rewards that measure pose cycle-consistency, depth reprojection, and temporal coherence. We instantiate this framework with GrndCtrl, a reward-aligned adaptation method based on Group Relative Policy Optimization (GRPO), yielding world models that maintain stable trajectories, consistent geometry, and reliable rollouts for embodied navigation. Like post-training alignment in large language models, GrndCtrl leverages verifiable rewards to bridge generative pretraining and grounded behavior, achieving superior spatial coherence and navigation stability over supervised fine-tuning in outdoor environments.
Guardian: Detecting Robotic Planning and Execution Errors with Vision-Language Models
Robust robotic manipulation requires reliable failure detection and recovery. Although current Vision-Language Models (VLMs) show promise, their accuracy and generalization are limited by the scarcity of failure data. To address this data gap, we propose an automatic robot failure synthesis approach that procedurally perturbs successful trajectories to generate diverse planning and execution failures. This method produces not only binary classification labels but also fine-grained failure categories and step-by-step reasoning traces in both simulation and the real world. With it, we construct three new failure detection benchmarks: RLBench-Fail, BridgeDataV2-Fail, and UR5-Fail, substantially expanding the diversity and scale of existing failure datasets. We then train Guardian, a VLM with multi-view images for detailed failure reasoning and detection. Guardian achieves state-of-the-art performance on both existing and newly introduced benchmarks. It also effectively improves task success rates when integrated into a state-of-the-art manipulation system in simulation and real robots, demonstrating the impact of our generated failure data.
comment: 9 pages, 9 figures, 6 tables
Real-World Robot Control by Deep Active Inference With a Temporally Hierarchical World Model
Robots in uncertain real-world environments must perform both goal-directed and exploratory actions. However, most deep learning-based control methods neglect exploration and struggle under uncertainty. To address this, we adopt deep active inference, a framework that accounts for human goal-directed and exploratory actions. Yet, conventional deep active inference approaches face challenges due to limited environmental representation capacity and high computational cost in action selection. We propose a novel deep active inference framework that consists of a world model, an action model, and an abstract world model. The world model encodes environmental dynamics into hidden state representations at slow and fast timescales. The action model compresses action sequences into abstract actions using vector quantization, and the abstract world model predicts future slow states conditioned on the abstract action, enabling low-cost action selection. We evaluate the framework on object-manipulation tasks with a real-world robot. Results show that it achieves high success rates across diverse manipulation tasks and switches between goal-directed and exploratory actions in uncertain settings, while making action selection computationally tractable. These findings highlight the importance of modeling multiple timescale dynamics and abstracting actions and state transitions.
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
NeuroHJR: Hamilton-Jacobi Reachability-based Obstacle Avoidance in Complex Environments with Physics-Informed Neural Networks
Autonomous ground vehicles (AGVs) must navigate safely in cluttered environments while accounting for complex dynamics and environmental uncertainty. Hamilton-Jacobi Reachability (HJR) offers formal safety guarantees through the computation of forward and backward reachable sets, but its application is hindered by poor scalability in environments with numerous obstacles. In this paper, we present a novel framework called NeuroHJR that leverages Physics-Informed Neural Networks (PINNs) to approximate the HJR solution for real-time obstacle avoidance. By embedding system dynamics and safety constraints directly into the neural network loss function, our method bypasses the need for grid-based discretization and enables efficient estimation of reachable sets in continuous state spaces. We demonstrate the effectiveness of our approach through simulation results in densely cluttered scenarios, showing that it achieves safety performance comparable to that of classical HJR solvers while significantly reducing the computational cost. This work provides a new step toward real-time, scalable deployment of reachability-based obstacle avoidance in robotics.
comment: Author-accepted version. Accepted at IEEE 11th Indian Control Conference (ICC), 2025
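To illustrate the "embed the PDE in the loss" idea described above, here is a generic physics-informed loss for a toy Hamilton-Jacobi equation with a simple single-integrator Hamiltonian and a disc target; the dynamics, target, and network are assumptions for illustration and do not reproduce the NeuroHJR formulation.

```python
# Minimal sketch (assumptions): PINN loss for V_t - ||grad_x V|| = 0 with terminal
# condition V(T, x) = ||x|| - r, i.e. a single integrator with ||u|| <= 1.
import torch
import torch.nn as nn

T, r = 1.0, 0.25
net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    # collocation points in (time, state) and terminal-time points
    tt = torch.rand(256, 1) * T
    x = torch.rand(256, 2) * 2 - 1
    pts = torch.cat([tt, x], dim=-1).requires_grad_(True)
    V = net(pts)
    grads = torch.autograd.grad(V.sum(), pts, create_graph=True)[0]
    V_t, V_x = grads[:, :1], grads[:, 1:]
    pde_residual = V_t - V_x.norm(dim=-1, keepdim=True)

    x_T = torch.rand(256, 2) * 2 - 1
    V_T = net(torch.cat([torch.full((256, 1), T), x_T], dim=-1))
    terminal_residual = V_T - (x_T.norm(dim=-1, keepdim=True) - r)

    loss = (pde_residual ** 2).mean() + (terminal_residual ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

print("final loss:", loss.item())
```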
Is Image-based Object Pose Estimation Ready to Support Grasping?
We present a framework for evaluating 6-DoF instance-level object pose estimators, focusing on those that require a single RGB (not RGB-D) image as input. Besides gaining intuition about how accurate these estimators are, we are interested in the degree to which they can serve as the sole perception mechanism for robotic grasping. To assess this, we perform grasping trials in a physics-based simulator, using image-based pose estimates to guide a parallel gripper and an underactuated robotic hand in picking up 3D models of objects. Our experiments on a subset of the BOP (Benchmark for 6D Object Pose Estimation) dataset compare five open-source object pose estimators and provide insights that were missing from the literature.
Register Any Point: Scaling 3D Point Cloud Registration by Flow Matching
Point cloud registration aligns multiple unposed point clouds into a common frame, and is a core step for 3D reconstruction and robot localization. In this work, we cast registration as conditional generation: a learned continuous, point-wise velocity field transports noisy points to a registered scene, from which the pose of each view is recovered. Unlike previous methods that conduct correspondence matching to estimate the transformation between a pair of point clouds and then optimize the pairwise transformations to realize multi-view registration, our model directly generates the registered point cloud. With a lightweight local feature extractor and test-time rigidity enforcement, our approach achieves state-of-the-art results on pairwise and multi-view registration benchmarks, particularly with low overlap, and generalizes across scales and sensor modalities. It further supports downstream tasks including relocalization, multi-robot SLAM, and multi-session map merging. Source code available at: https://github.com/PRBonn/RAP.
comment: 22 pages
Much Ado About Noising: Dispelling the Myths of Generative Robotic Control
Generative models, like flows and diffusions, have recently emerged as popular and efficacious policy parameterizations in robotics. There has been much speculation as to the factors underlying their successes, ranging from capturing multi-modal action distribution to expressing more complex behaviors. In this work, we perform a comprehensive evaluation of popular generative control policies (GCPs) on common behavior cloning (BC) benchmarks. We find that GCPs do not owe their success to their ability to capture multi-modality or to express more complex observation-to-action mappings. Instead, we find that their advantage stems from iterative computation, as long as intermediate steps are supervised during training and this supervision is paired with a suitable level of stochasticity. As a validation of our findings, we show that a minimum iterative policy (MIP), a lightweight two-step regression-based policy, essentially matches the performance of flow GCPs, and often outperforms distilled shortcut models. Our results suggest that the distribution-fitting component of GCPs is less salient than commonly believed, and point toward new design spaces focusing solely on control performance. Project page: https://simchowitzlabpublic.github.io/much-ado-about-noising-project/
GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Existing VLA policies rely on the assumption that human demonstrations are optimal. However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations by reinforcement learning. First, GR-RL learns a vision-language-conditioned task progress function, filters the demonstration trajectories, and only keeps the transitions that contribute positively to the progress. Specifically, we show that by directly applying offline RL with sparse reward, the resulting $Q$-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation that greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets with an 83.3% success rate, a task requiring long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
IGen: Scalable Data Generation for Robot Learning from Open-World Images
The rise of generalist robotic policies has created an exponential demand for large-scale training data. However, on-robot data collection is labor-intensive and often limited to specific environments. In contrast, open-world images capture a vast diversity of real-world scenes that naturally align with robotic manipulation tasks, offering a promising avenue for low-cost, large-scale robot data acquisition. Despite this potential, the lack of associated robot actions hinders the practical use of open-world images for robot learning, leaving this rich visual resource largely unexploited. To bridge this gap, we propose IGen, a framework that scalably generates realistic visual observations and executable actions from open-world images. IGen first converts unstructured 2D pixels into structured 3D scene representations suitable for scene understanding and manipulation. It then leverages the reasoning capabilities of vision-language models to transform scene-specific task instructions into high-level plans and generate low-level actions as SE(3) end-effector pose sequences. From these poses, it synthesizes dynamic scene evolution and renders temporally coherent visual observations. Experiments validate the high quality of visuomotor data generated by IGen, and show that policies trained solely on IGen-synthesized data achieve performance comparable to those trained on real-world data. This highlights the potential of IGen to support scalable data generation from open-world images for generalist robotic policy training.
comment: 8 pages, 8 figures
AgriLiRa4D: A Multi-Sensor UAV Dataset for Robust SLAM in Challenging Agricultural Fields
Multi-sensor Simultaneous Localization and Mapping (SLAM) is essential for Unmanned Aerial Vehicles (UAVs) performing agricultural tasks such as spraying, surveying, and inspection. However, real-world, multi-modal agricultural UAV datasets that enable research on robust operation remain scarce. To address this gap, we present AgriLiRa4D, a multi-modal UAV dataset designed for challenging outdoor agricultural environments. AgriLiRa4D spans three representative farmland types (flat, hilly, and terraced) and includes both boundary and coverage operation modes, resulting in six flight sequence groups. The dataset provides high-accuracy ground-truth trajectories from a Fiber Optic Inertial Navigation System with Real-Time Kinematic capability (FINS_RTK), along with synchronized measurements from a 3D LiDAR, a 4D Radar, and an Inertial Measurement Unit (IMU), accompanied by complete intrinsic and extrinsic calibrations. Leveraging its comprehensive sensor suite and diverse real-world scenarios, AgriLiRa4D supports diverse SLAM and localization studies and enables rigorous robustness evaluation against low-texture crops, repetitive patterns, dynamic vegetation, and other challenges of real agricultural environments. To further demonstrate its utility, we benchmark four state-of-the-art multi-sensor SLAM algorithms across different sensor combinations, highlighting the difficulty of the proposed sequences and the necessity of multi-modal approaches for reliable UAV localization. By filling a critical gap in agricultural SLAM datasets, AgriLiRa4D provides a valuable benchmark for the research community and contributes to advancing autonomous navigation technologies for agricultural UAVs. The dataset can be downloaded from: https://zhan994.github.io/AgriLiRa4D.
DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models
Vision-Language-Action (VLA) models trained with flow matching have demonstrated impressive capabilities on robotic manipulation tasks. However, their performance often degrades under distribution shift and on complex multi-step tasks, suggesting that the learned representations may not robustly capture task-relevant semantics. We introduce DiG-Flow, a principled framework that enhances VLA robustness through geometric regularization. Our key insight is that the distributional discrepancy between observation and action embeddings provides a meaningful geometric signal: lower transport cost indicates compatible representations, while higher cost suggests potential misalignment. DiG-Flow computes a discrepancy measure between empirical distributions of observation and action embeddings, maps it to a modulation weight via a monotone function, and applies residual updates to the observation embeddings before flow matching. Crucially, this intervention operates at the representation level without modifying the flow matching path or target vector field. We provide theoretical guarantees showing that discrepancy-guided training provably decreases the training objective, and that guided inference refinement converges with contraction. Empirically, DiG-Flow integrates into existing VLA architectures with negligible overhead and consistently improves performance, with particularly pronounced gains on complex multi-step tasks and under limited training data.
Dynamic Log-Gaussian Process Control Barrier Function for Safe Robotic Navigation in Dynamic Environments
Control Barrier Functions (CBFs) have emerged as efficient tools to address the safe navigation problem for robot applications. However, synthesizing informative and obstacle motion-aware CBFs online using real-time sensor data remains challenging, particularly in unknown and dynamic scenarios. Motivated by this challenge, this paper proposes a novel Gaussian Process-based CBF formulation, termed the Dynamic Log Gaussian Process Control Barrier Function (DLGP-CBF), which enables real-time construction of CBFs that are both spatially informative and responsive to obstacle motion. Firstly, the DLGP-CBF leverages a logarithmic transformation of GP regression to generate smooth and informative barrier values and gradients, even in sparse-data regions. Secondly, by explicitly modeling the DLGP-CBF as a function of obstacle positions, the derived safety constraint integrates predicted obstacle velocities, allowing the controller to proactively respond to dynamic obstacles' motion. Simulation results demonstrate significant improvements in obstacle avoidance performance, including increased safety margins, smoother trajectories, and enhanced responsiveness compared to baseline methods.
comment: To be presented in the 64th IEEE Conference on Decision and Control (CDC 2025)
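As a rough illustration of the "log-transformed GP as a barrier" idea above, the sketch below fits a GP to proximity data, uses the negative log of its mean as a barrier-like function, and applies a closed-form single-integrator CBF filter. The data, kernel, log transform, and filter are illustrative stand-ins for DLGP-CBF, and the dynamic-obstacle velocity terms of the paper are omitted.

```python
# Minimal sketch (assumptions): h(x) = -log(GP mean + eps) - offset as a barrier, with
# a closed-form filter  min ||u - u_nom||^2  s.t.  grad_h . u >= -alpha * h.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# training data: y ~ exp(-distance to a circular obstacle at (1, 0) with radius 0.4)
X_train = rng.uniform(-2, 2, size=(300, 2))
dist = np.linalg.norm(X_train - np.array([1.0, 0.0]), axis=1) - 0.4
y_train = np.exp(-np.maximum(dist, 0.0))

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-3).fit(X_train, y_train)

def barrier(x, eps=1e-3):
    """h(x): positive far from the obstacle, negative close to it."""
    return -np.log(gp.predict(x.reshape(1, -1))[0] + eps) - 0.5   # offset sets the zero level

def barrier_grad(x, d=1e-3):
    e = np.eye(2)
    return np.array([(barrier(x + d * e[i]) - barrier(x - d * e[i])) / (2 * d) for i in range(2)])

def cbf_filter(x, u_nom, alpha=2.0):
    h, g = barrier(x), barrier_grad(x)
    if g @ u_nom >= -alpha * h:            # nominal input already satisfies the constraint
        return u_nom
    lam = (-alpha * h - g @ u_nom) / (g @ g + 1e-9)
    return u_nom + lam * g                 # minimal correction onto the constraint boundary

x = np.array([0.0, 0.0])
u_nominal = np.array([1.0, 0.0])           # heading straight toward the obstacle
print("filtered input:", cbf_filter(x, u_nominal))
```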
SPARK: Sim-ready Part-level Articulated Reconstruction with VLM Knowledge
Articulated 3D objects are critical for embodied AI, robotics, and interactive scene understanding, yet creating simulation-ready assets remains labor-intensive and requires expert modeling of part hierarchies and motion structures. We introduce SPARK, a framework for reconstructing physically consistent, kinematic part-level articulated objects from a single RGB image. Given an input image, we first leverage VLMs to extract coarse URDF parameters and generate part-level reference images. We then integrate the part-image guidance and the inferred structure graph into a generative diffusion transformer to synthesize consistent part and complete shapes of articulated objects. To further refine the URDF parameters, we incorporate differentiable forward kinematics and differentiable rendering to optimize joint types, axes, and origins under VLM-generated open-state supervision. Extensive experiments show that SPARK produces high-quality, simulation-ready articulated assets across diverse categories, enabling downstream applications such as robotic manipulation and interaction modeling.
Integrated YOLOP Perception and Lyapunov-based Control for Autonomous Mobile Robot Navigation on Track
This work presents a real-time autonomous track navigation framework for nonholonomic differential-drive mobile robots by jointly integrating multi-task visual perception and a provably stable tracking controller. The perception pipeline reconstructs lane centerlines using 2D-to-3D camera projection, arc-length based uniform point resampling, and cubic polynomial fitting solved via robust QR least-squares optimization. The controller regulates robot linear and angular velocities through a Lyapunov-stability grounded design, ensuring bounded error dynamics and asymptotic convergence of position and heading deviations even in dynamic and partially perceived lane scenarios, without relying on HD prior maps or global satellite localization. Real-world experiments on embedded platforms verify system fidelity, real-time execution, trajectory smoothness, and closed-loop stability for reliable autonomous navigation.
comment: This is a master's graduation thesis that has not been formally published. Uploaded with the author's copyright permission. No confidential content involved
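The centerline-fitting step of the perception pipeline above can be mirrored in a few lines: arc-length based uniform resampling of reconstructed points followed by a cubic polynomial least-squares fit. The synthetic points, noise level, and numpy's generic least-squares solver are assumptions standing in for the thesis's robust QR variant.

```python
# Minimal sketch (assumptions): arc-length resampling + cubic least-squares fit of a
# lane centerline in the robot frame (x forward, y lateral). Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# noisy centerline points, e.g. from 2D-to-3D projection of detected lane pixels
x = np.sort(rng.uniform(0.0, 8.0, size=120))
y_true = 0.05 * x**3 - 0.4 * x**2 + 0.6 * x
pts = np.stack([x, y_true + 0.05 * rng.standard_normal(x.size)], axis=1)

# arc-length parameterization and uniform resampling
seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
s = np.concatenate([[0.0], np.cumsum(seg)])
s_uniform = np.linspace(0.0, s[-1], 50)
resampled = np.stack([np.interp(s_uniform, s, pts[:, d]) for d in range(2)], axis=1)

# cubic polynomial y = c0 + c1*x + c2*x^2 + c3*x^3 fitted by least squares
X = np.vander(resampled[:, 0], N=4, increasing=True)
coeffs, *_ = np.linalg.lstsq(X, resampled[:, 1], rcond=None)
print("fitted cubic coefficients:", np.round(coeffs, 3))
print("true coefficients:        ", [0.0, 0.6, -0.4, 0.05])
```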
A Cross-Embodiment Gripper Benchmark for Rigid-Object Manipulation in Aerial and Industrial Robotics
Robotic grippers are increasingly deployed across industrial, collaborative, and aerial platforms, where each embodiment imposes distinct mechanical, energetic, and operational constraints. Established YCB and NIST benchmarks quantify grasp success, force, or timing on a single platform, but do not evaluate cross-embodiment transferability or energy-aware performance, capabilities essential for modern mobile and aerial manipulation. This letter introduces the Cross-Embodiment Gripper Benchmark (CEGB), a compact and reproducible benchmarking suite extending YCB and selected NIST metrics with three additional components: a transfer-time benchmark measuring the practical effort required to exchange embodiments, an energy-consumption benchmark evaluating grasping and holding efficiency, and an intent-specific ideal payload assessment reflecting design-dependent operational capability. Together, these metrics characterize both grasp performance and the suitability of reusing a single gripper across heterogeneous robotic systems. A lightweight self-locking gripper prototype is implemented as a reference case. Experiments demonstrate rapid embodiment transfer (median ~= 17.6 s across user groups), low holding energy for the gripper prototype (~= 1.5 J per 10 s), and consistent grasp performance with cycle times of 3.2 - 3.9 s and success rates exceeding 90%. CEGB thus provides a reproducible foundation for cross-platform, energy-aware evaluation of grippers in aerial and manipulator domains.
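The energy-consumption benchmark amounts to integrating electrical power over a grasp-and-hold window; below is a minimal sketch with made-up signal values chosen to land near the reported ~1.5 J per 10 s (the sampling setup and signal names are assumptions, not the paper's hardware logs):

```python
# Sketch of a holding-energy metric: integrate measured power over a hold window.
import numpy as np

t = np.linspace(0.0, 10.0, 1000)                      # 10 s hold window [s]
voltage = np.full_like(t, 5.0)                        # supply voltage [V]
current = 0.03 + 0.002 * np.sin(2 * np.pi * t)        # measured current draw [A]

power = voltage * current                             # instantaneous power [W]
# Trapezoidal integration of power over time gives the holding energy in joules.
holding_energy = float(np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(t)))

print(f"holding energy over 10 s: {holding_energy:.2f} J")   # ~1.5 J, same order as reported
```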
L2M-Calib: One-key Calibration Method for LiDAR and Multiple Magnetic Sensors
Multimodal sensor fusion enables robust environmental perception by leveraging complementary information from heterogeneous sensing modalities. However, accurate calibration is a critical prerequisite for effective fusion. This paper proposes a novel one-key calibration framework named L2M-Calib for a fused magnetic-LiDAR system, jointly estimating the extrinsic transformation between the two kinds of sensors and the intrinsic distortion parameters of the magnetic sensors. Magnetic sensors capture ambient magnetic field (AMF) patterns, which are invariant to geometry, texture, illumination, and weather, making them suitable for challenging environments. Nonetheless, the integration of magnetic sensing into multimodal systems remains underexplored due to the absence of effective calibration techniques. To address this, we optimize extrinsic parameters using an iterative Gauss-Newton scheme, coupled with the intrinsic calibration as a weighted ridge-regularized total least squares (w-RRTLS) problem, ensuring robustness against measurement noise and ill-conditioned data. Extensive evaluations on both simulated datasets and real-world experiments, including AGV-mounted sensor configurations, demonstrate that our method achieves high calibration accuracy and robustness under various environmental and operational conditions.
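As a hedged sketch of the weighted, ridge-regularized least-squares building block behind w-RRTLS (the total-least-squares treatment of errors-in-variables and the actual magnetic sensor model are omitted; the toy gain/offset problem below is an assumption):

```python
# Illustrative weighted ridge least squares; not the paper's w-RRTLS solver.
import numpy as np

def weighted_ridge_ls(A, b, w, lam):
    """Solve min_x ||W^(1/2) (A x - b)||^2 + lam ||x||^2 in closed form."""
    W = np.diag(w)
    lhs = A.T @ W @ A + lam * np.eye(A.shape[1])
    rhs = A.T @ W @ b
    return np.linalg.solve(lhs, rhs)

# Toy intrinsic-calibration-style problem: recover a per-axis scale and offset.
rng = np.random.default_rng(0)
field = rng.uniform(-50, 50, size=100)                         # "true" field component [uT]
readings = 1.05 * field + 2.0 + rng.normal(0, 0.5, 100)        # scaled, biased, noisy sensor

A = np.column_stack([readings, np.ones_like(readings)])        # model: field ~ a*reading + b
w = np.ones_like(field)                                        # per-sample weights (e.g., noise-based)
a, b_off = weighted_ridge_ls(A, field, w, lam=1e-3)
print(f"estimated gain {a:.3f}, offset {b_off:.3f}")
```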
NavForesee: A Unified Vision-Language World Model for Hierarchical Planning and Dual-Horizon Navigation Prediction
Embodied navigation for long-horizon tasks, guided by complex natural language instructions, remains a formidable challenge in artificial intelligence. Existing agents often struggle with robust long-term planning in unseen environments, leading to high failure rates. To address these limitations, we introduce NavForesee, a novel Vision-Language Model (VLM) that unifies high-level language planning and predictive world model imagination within a single framework. Our approach empowers a single VLM to concurrently perform planning and predictive foresight. Conditioned on the full instruction and historical observations, the model is trained to understand the navigation instructions by decomposing the task, tracking its progress, and formulating the subsequent sub-goal. Simultaneously, it functions as a generative world model, providing crucial foresight by predicting short-term environmental dynamics and long-term navigation milestones. The VLM's structured plan guides its targeted prediction, while the imagined future provides rich context to inform the navigation actions, creating a powerful internal feedback loop of perception-planning/prediction-action. We demonstrate through extensive experiments on the R2R-CE and RxR-CE benchmarks that NavForesee achieves highly competitive performance in complex scenarios. Our work highlights the immense potential of fusing explicit language planning with implicit spatiotemporal prediction, paving the way for more intelligent and capable embodied agents.
$\mathbf{M^3A}$ Policy: Mutable Material Manipulation Augmentation Policy through Photometric Re-rendering
Material generalization is essential for real-world robotic manipulation, where robots must interact with objects exhibiting diverse visual and physical properties. This challenge is particularly pronounced for objects made of glass, metal, or other materials whose transparent or reflective surfaces introduce severe out-of-distribution variations. Existing approaches either rely on simulated materials in simulators and perform sim-to-real transfer, which is hindered by substantial visual domain gaps, or depend on collecting extensive real-world demonstrations, which is costly, time-consuming, and still insufficient to cover various materials. To overcome these limitations, we resort to computational photography and introduce Mutable Material Manipulation Augmentation (M$^3$A), a unified framework that leverages the physical characteristics of materials as captured by light transport for photometric re-rendering. The core idea is simple yet powerful: given a single real-world demonstration, we photometrically re-render the scene to generate a diverse set of highly realistic demonstrations with different material properties. This augmentation effectively decouples task-specific manipulation skills from surface appearance, enabling policies to generalize across materials without additional data collection. To systematically evaluate this capability, we construct the first comprehensive multi-material manipulation benchmark spanning both simulation and real-world environments. Extensive experiments show that the M$^3$A policy significantly enhances cross-material generalization, improving the average success rate across three real-world tasks by 58.03\%, and demonstrating robust performance on previously unseen materials.
comment: under submission
Accelerating Probabilistic Response-Time Analysis: Revised Critical Instant and Optimized Convolution
Accurate estimation of the Worst-Case Deadline Failure Probability (WCDFP) has attracted growing attention as a means to provide safety assurances in complex systems such as robotic platforms and autonomous vehicles. WCDFP quantifies the likelihood of deadline misses under the most pessimistic operating conditions, and safe estimation is essential for dependable real-time applications. However, achieving high accuracy in WCDFP estimation often incurs significant computational cost. Recent studies have revealed that the classical assumption of the critical instant, the activation pattern traditionally considered to trigger the worst-case behavior, can lead to underestimation of WCDFP in probabilistic settings. This observation motivates the use of a revised critical instant formulation that more faithfully captures the true worst-case scenario. This paper investigates convolution-based methods for WCDFP estimation under this revised setting and proposes an optimization technique that accelerates convolution by improving the merge order. Extensive experiments with diverse execution-time distributions demonstrate that the proposed optimized Aggregate Convolution reduces computation time by up to an order of magnitude compared to Sequential Convolution, while retaining accurate and safe-sided WCDFP estimates. These results highlight the potential of the approach to provide both efficiency and reliability in probabilistic timing analysis for safety-critical real-time applications.
comment: 8 pages, 5 figures. Proceedings of APRIS2025
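A minimal sketch of the convolution step on discrete execution-time distributions, combining the two shortest pmfs first as one plausible merge-order heuristic (the paper's exact Aggregate Convolution ordering and the surrounding response-time analysis, including interference and preemption, are not reproduced):

```python
# Convolving discrete execution-time pmfs; intermediate sizes are kept small by
# always combining the two shortest distributions first (heap-based merge order).
import heapq
import itertools
import numpy as np

def convolve_pmf(p, q):
    """Convolution of two pmfs given as arrays indexed by discrete time units."""
    return np.convolve(p, q)

def aggregate(pmfs):
    """Repeatedly convolve the two shortest pmfs until one distribution remains."""
    counter = itertools.count(len(pmfs))              # unique tie-breaker for the heap
    heap = [(len(p), i, p) for i, p in enumerate(pmfs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        c = convolve_pmf(a, b)
        heapq.heappush(heap, (len(c), next(counter), c))
    return heap[0][2]

# Three jobs with small discrete execution-time pmfs (index = time units).
jobs = [np.array([0.7, 0.3]), np.array([0.5, 0.4, 0.1]), np.array([0.9, 0.1])]
total = aggregate(jobs)
deadline = 3   # time units
print(f"P(summed execution time > {deadline}) = {total[deadline + 1:].sum():.4f}")
```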
Modality-Augmented Fine-Tuning of Foundation Robot Policies for Cross-Embodiment Manipulation on GR1 and G1
This paper presents a modality-augmented fine-tuning framework designed to adapt foundation robot policies to diverse humanoid embodiments. We validate our approach across two distinct settings: (i) the GR1 embodiment, utilizing public datasets where we introduce post-processed modalities, including binary contact signals and ZoeDepth-generated metric depth; and (ii) the Unitree G1 embodiment, for which we contribute a novel multi-modal dataset incorporating cuRobo motion planning, inverse kinematics, and ground-truth contact-force measurements. Our experiments demonstrate that modality augmentation consistently enhances policy performance across different embodiments. Specifically, for the GR1, integrating contact-state cues and RGB-D fusion improves online success rates from 51% to 63%. Furthermore, in the G1 "Pick Apple to Bowl" task, our contact-augmented model achieves a success rate of 94%, significantly outperforming the 48% achieved by standard fine-tuning and the 0% baseline of zero-shot transfer. These results highlight that lightweight post-processing effectively strengthens policies for GR1, while high-quality multi-modal data is crucial for reliable transfer to the Unitree G1. Consequently, this work establishes a unified, data-centric pathway for extending foundation robot policies through targeted modality design and multi-modal fine-tuning.
comment: 8 pages, 10 figures
Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning
Humanoid robots have received significant research interest and seen rapid advancements in recent years. Despite many successes, humanoid robots are more prone to falling than other embodiments such as quadruped or wheeled robots, due to their morphology, dynamics, and the limitations of their control policies. Their large weight, high center of mass, and many degrees of freedom can cause serious hardware damage, to both the robot and surrounding objects, when a fall is uncontrolled. Existing research in this field mostly relies on control-based methods that struggle to cover diverse falling scenarios and may introduce unsuitable human priors. On the other hand, large-scale Deep Reinforcement Learning and Curriculum Learning can be employed to incentivize a humanoid agent to discover a falling-protection policy suited to its own structure and properties. In this work, with carefully designed reward functions and a domain-diversification curriculum, we successfully train a humanoid agent to explore falling-protection behaviors and discover that, by forming a 'triangle' structure with its rigid body, falling damage can be significantly reduced. With comprehensive metrics and experiments, we quantify its performance against other methods, visualize its falling behaviors, and successfully transfer the policy to a real-world platform.
Visibility-aware Cooperative Aerial Tracking with Decentralized LiDAR-based Swarms
Autonomous aerial tracking with drones offers vast potential for surveillance, cinematography, and industrial inspection applications. While single-drone tracking systems have been extensively studied, swarm-based target tracking remains underexplored, despite its unique advantages of distributed perception, fault-tolerant redundancy, and multidirectional target coverage. To bridge this gap, we propose a novel decentralized LiDAR-based swarm tracking framework that enables visibility-aware, cooperative target tracking in complex environments, while fully harnessing the unique capabilities of swarm systems. To address visibility, we introduce a novel Spherical Signed Distance Field (SSDF)-based metric for 3-D environmental occlusion representation, coupled with an efficient algorithm that enables real-time onboard SSDF updating. A general Field-of-View (FOV) alignment cost supporting heterogeneous LiDAR configurations is proposed for consistent target observation. Swarm coordination is enhanced through cooperative costs that enforce inter-robot safe clearance, prevent mutual occlusions, and notably facilitate 3-D multidirectional target encirclement via a novel electrostatic-potential-inspired distribution metric. These innovations are integrated into a hierarchical planner, combining a kinodynamic front-end searcher with a spatiotemporal $SE(3)$ back-end optimizer to generate collision-free, visibility-optimized trajectories. Deployed on heterogeneous LiDAR swarms, our fully decentralized implementation features collaborative perception, distributed planning, and dynamic swarm reconfigurability. Validated through rigorous real-world experiments in cluttered outdoor environments, the proposed system demonstrates robust cooperative tracking of agile targets (drones, humans) while achieving superior visibility maintenance.
COMET: A Dual Swashplate Autonomous Coaxial Bi-copter AAV with High-Maneuverability and Long-Endurance
Coaxial bi-copter autonomous aerial vehicles (AAVs) have garnered attention due to their potential for improved rotor system efficiency and compact form factor. However, balancing efficiency, maneuverability, and compactness in coaxial bi-copter systems remains a key design challenge, limiting their practical deployment. This letter introduces COMET, a coaxial bi-copter AAV platform featuring a dual swashplate mechanism. The coaxial bi-copter system's efficiency and compactness are optimized through bench tests, and the whole prototype's efficiency and robustness under varying payload conditions are verified through flight endurance experiments. The maneuverability performance of the system is evaluated in comprehensive trajectory tracking tests. The results indicate that the dual swashplate configuration enhances tracking performance and improves flight efficiency compared to the single swashplate alternative. Successful autonomous flight trials across various scenarios verify COMET's potential for real-world applications.
comment: 8 pages, 8 figures, accepted at IEEE RA-L
How do trout regulate patterns of muscle contraction to optimize propulsive efficiency during steady swimming
Understanding efficient fish locomotion offers insights for biomechanics, fluid dynamics, and engineering. Traditional studies often miss the link between neuromuscular control and whole-body movement. To explore energy transfer in carangiform swimming, we created a bio-inspired digital trout. This model combined multibody dynamics, Hill-type muscle modeling, and a high-fidelity fluid-structure interaction algorithm, accurately replicating a real trout's form and properties. Using deep reinforcement learning, the trout's neural system achieved hierarchical spatiotemporal control of muscle activation. We systematically examined how activation strategies affect speed and energy use. Results show that axial myomere coupling, with activation spanning over 0.5 body lengths, is crucial for stable body wave propagation. Moderate muscle contraction duration ([0.1,0.3] of a tail-beat cycle) lets the body and fluid act as a passive damping system, cutting energy use. Additionally, the activation phase lag of myomeres shapes the body wave; if too large, it causes antagonistic contractions that hinder thrust. These findings advance bio-inspired locomotion understanding and aid energy-efficient underwater system design.
RoboLoc: A Benchmark Dataset for Point Place Recognition and Localization in Indoor-Outdoor Integrated Environments
Robust place recognition is essential for reliable localization in robotics, particularly in complex environments with frequent indoor-outdoor transitions. However, existing LiDAR-based datasets often focus on outdoor scenarios and lack seamless domain shifts. In this paper, we propose RoboLoc, a benchmark dataset designed for GPS-free place recognition in indoor-outdoor environments with floor transitions. RoboLoc features real-world robot trajectories, diverse elevation profiles, and transitions between structured indoor and unstructured outdoor domains. We benchmark a variety of state-of-the-art models, including point-based, voxel-based, and BEV-based architectures, highlighting their generalizability under domain shifts. RoboLoc provides a realistic testbed for developing multi-domain localization systems in robotics and autonomous navigation.
Real-World Reinforcement Learning of Active Perception Behaviors NeurIPS 2025
A robot's instantaneous sensory observations do not always reveal task-relevant state information. Under such partial observability, optimal behavior typically involves explicitly acting to gain the missing information. Today's standard robot learning techniques struggle to produce such active perception behaviors. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression (AAWR), exploits access to "privileged" extra sensors at training time. The privileged sensors enable training high-quality privileged value functions that aid in estimating the advantage of the target policy. Bootstrapping from a small number of potentially suboptimal demonstrations and an easy-to-obtain coarse policy initialization, AAWR quickly acquires active perception behaviors and boosts task performance. In evaluations on 8 manipulation tasks on 3 robots spanning varying degrees of partial observability, AAWR synthesizes reliable active perception behaviors that outperform all prior approaches. When initialized with a "generalist" robot policy that struggles with active perception tasks, AAWR efficiently generates information-gathering behaviors that allow it to operate under severe partial observability for manipulation tasks. Website: https://penn-pal-lab.github.io/aawr/
comment: NeurIPS 2025 camera ready
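A hedged sketch of an asymmetric advantage-weighted update in the spirit of AAWR (network shapes, the temperature, and the weight clip are assumptions; this is not the authors' implementation):

```python
# Asymmetric advantage-weighted regression sketch: the policy sees partial observations,
# while the value baseline sees privileged observations available only at training time.
import torch
import torch.nn.functional as F

def aawr_policy_loss(policy, value_fn, obs, priv_obs, actions, returns, beta=1.0, w_max=20.0):
    with torch.no_grad():
        advantage = returns - value_fn(priv_obs).squeeze(-1)      # privileged baseline
        weights = torch.clamp(torch.exp(advantage / beta), max=w_max)
    pred_actions = policy(obs)
    # Weighted behavior cloning: imitate more strongly where the privileged critic
    # estimates the collected action to be better than average.
    per_sample = F.mse_loss(pred_actions, actions, reduction="none").mean(dim=-1)
    return (weights * per_sample).mean()

# Toy modules and batch, purely to show shapes.
policy = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 7))
value_fn = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
obs, priv_obs = torch.randn(8, 16), torch.randn(8, 32)
actions, returns = torch.randn(8, 7), torch.randn(8)
loss = aawr_policy_loss(policy, value_fn, obs, priv_obs, actions, returns)
loss.backward()
```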
Real-Time On-the-Go Annotation Framework Using YOLO for Automated Dataset Generation CEC 2025
Efficient and accurate annotation of datasets remains a significant challenge for deploying object detection models such as You Only Look Once (YOLO) in real-world applications, particularly in agriculture where rapid decision-making is critical. Traditional annotation techniques are labor-intensive, requiring extensive manual labeling post data collection. This paper presents a novel real-time annotation approach leveraging YOLO models deployed on edge devices, enabling immediate labeling during image capture. To comprehensively evaluate the efficiency and accuracy of our proposed system, we conducted an extensive comparative analysis using three prominent YOLO architectures (YOLOv5, YOLOv8, YOLOv12) under various configurations: single-class versus multi-class annotation and pretrained versus scratch-based training. Our analysis includes detailed statistical tests and learning dynamics, demonstrating significant advantages of pretrained and single-class configurations in terms of model convergence, performance, and robustness. Results strongly validate the feasibility and effectiveness of our real-time annotation framework, highlighting its capability to drastically reduce dataset preparation time while maintaining high annotation quality.
comment: Copyright 2025 IEEE. This is the author's version of the work that has been accepted for publication in Proceedings of the 5. Interdisciplinary Conference on Electrics and Computer (INTCEC 2025) 15-16 September 2025, Chicago-USA. The final version of record is available at: https://doi.org/10.1109/INTCEC65580.2025.11256048
Property-Guided Cyber-Physical Reduction and Surrogation for Safety Analysis in Robotic Vehicles SP 2025
We propose a methodology for falsifying safety properties in robotic vehicle systems through property-guided reduction and surrogate execution. By isolating only the control logic and physical dynamics relevant to a given specification, we construct lightweight surrogate models that preserve property-relevant behaviors while eliminating unrelated system complexity. This enables scalable falsification via trace analysis and temporal logic oracles. We demonstrate the approach on a drone control system containing a known safety flaw. The surrogate replicates failure conditions at a fraction of the simulation cost, and a property-guided fuzzer efficiently discovers semantic violations. Our results suggest that controller reduction, when coupled with logic-aware test generation, provides a practical and scalable path toward semantic verification of cyber-physical systems.
comment: Accepted at EAI SmartSP 2025 (EAI International Conference on Security and Privacy in Cyber-Physical Systems and Smart Vehicles), Springer LNICST. The code repository is available here: https://doi.org/10.5281/zenodo.17497068
Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions CVPR
Unmanned Aerial Vehicles (UAVs) are indispensable for infrastructure inspection, surveillance, and related tasks, yet they also introduce critical security challenges. This survey provides a wide-ranging examination of the anti-UAV domain, centering on three core objectives (classification, detection, and tracking) while detailing emerging methodologies such as diffusion-based data synthesis, multi-modal fusion, vision-language modeling, self-supervised learning, and reinforcement learning. We systematically evaluate state-of-the-art solutions across both single-modality and multi-sensor pipelines (spanning RGB, infrared, audio, radar, and RF) and discuss large-scale as well as adversarially oriented benchmarks. Our analysis reveals persistent gaps in real-time performance, stealth detection, and swarm-based scenarios, underscoring pressing needs for robust, adaptive anti-UAV systems. By highlighting open research directions, we aim to foster innovation and guide the development of next-generation defense strategies in an era marked by the extensive use of UAVs.
comment: Best Paper, Accepted at CVPR Workshop Anti-UAV 2025. 16 pages
PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models
Preference-based reinforcement learning (PbRL) has emerged as a promising paradigm for teaching robots complex behaviors without reward engineering. However, its effectiveness is often limited by two critical challenges: the reliance on extensive human input and the inherent difficulties in resolving query ambiguity and credit assignment during reward learning. In this paper, we introduce PRIMT, a PbRL framework designed to overcome these challenges by leveraging foundation models (FMs) for multimodal synthetic feedback and trajectory synthesis. Unlike prior approaches that rely on single-modality FM evaluations, PRIMT employs a hierarchical neuro-symbolic fusion strategy, integrating the complementary strengths of large language models and vision-language models in evaluating robot behaviors for more reliable and comprehensive feedback. PRIMT also incorporates foresight trajectory generation, which reduces early-stage query ambiguity by warm-starting the trajectory buffer with bootstrapped samples, and hindsight trajectory augmentation, which enables counterfactual reasoning with a causal auxiliary loss to improve credit assignment. We evaluate PRIMT on 2 locomotion and 6 manipulation tasks on various benchmarks, demonstrating superior performance over FM-based and scripted baselines.
Multimodal "Puppeteer": Exploring Robot Teleoperation Via Virtual Counterpart with LLM-Driven Voice and Gesture Interaction in Augmented Reality
The integration of robotics and augmented reality (AR) offers promising opportunities to enhance human-robot interaction (HRI) by making teleoperation more transparent, spatially grounded, and intuitive. We present a head-mounted AR "puppeteer" framework in which users control a physical robot via interacting with its virtual counterpart robot using large language model (LLM)-driven voice commands and hand-gesture interaction on the Meta Quest 3. In a within-subject user study with 42 participants performing an AR-based robotic pick-and-place pattern-matching task, we compare two interaction conditions: gesture-only (GO) and combined voice+gesture (VG). Our results show that GO currently provides more reliable and efficient control for this time-critical task, while VG introduces additional flexibility but also latency and recognition issues that can increase workload. We further explore how prior robotics experience shapes participants' perceptions of each modality. Based on these findings, we distill a set of evidence-based design guidelines for AR puppeteer metaphoric robot teleoperation, implicating multimodality as an adaptive strategy that must balance efficiency, robustness, and user expertise rather than assuming that additional modalities are universally beneficial. Our work contributes empirical insights into how multimodal (voice+gesture) interaction influences task efficiency, usability, and user experience in AR-based HRI.
comment: This work is under peer review
How to Adapt Control Barrier Functions? A Learning-Based Approach with Applications to a VTOL Quadplane
In this paper, we present a novel theoretical framework for online adaptation of Control Barrier Function (CBF) parameters, i.e., of the class K functions included in the CBF condition, under input constraints. We introduce the concept of locally validated CBF parameters, which are adapted online to guarantee finite-horizon safety, based on conditions derived from Nagumo's theorem and tangent cone analysis. To identify these parameters online, we integrate a learning-based approach with an uncertainty-aware verification process that accounts for both epistemic and aleatoric uncertainties inherent in neural network predictions. Our method is demonstrated on a VTOL quadplane model during challenging transition and landing maneuvers, showcasing enhanced performance while maintaining safety.
comment: 2025 IEEE Conference on Decision and Control (CDC). Project page: https://www.taekyung.me/how-to-adapt-cbf
High-Speed Event Vision-Based Tactile Roller Sensor for Large Surface Measurements
Inspecting large-scale industrial surfaces like aircraft fuselages for quality control requires capturing their precise 3D surface geometry at high resolution. Vision-based tactile sensors (VBTSs) offer high local resolution but require slow 'press-and-lift' measurements stitched for large areas. Approaches with sliding or roller/belt VBTS designs provide measurement continuity. However, they face significant challenges: sliding struggles with friction and wear, and both approaches are speed-limited by conventional camera frame rates and motion blur, making large-area scanning time-consuming. Thus, a rapid, continuous, high-resolution method is needed. We introduce a novel tactile sensor integrating a neuromorphic camera in a rolling mechanism to achieve this. Leveraging its high temporal resolution and robustness to motion blur, our system uses a modified event-based multi-view stereo approach for 3D reconstruction. We demonstrate state-of-the-art scanning speeds up to 0.5 m/s, achieving Mean Absolute Error below 100 microns -- 11 times faster than prior continuous tactile sensing methods. A multi-reference Bayesian fusion strategy enhances accuracy (reducing MAE by 25.2\% compared to EMVS) and mitigates curvature errors. We also validate high-speed feature recognition via Braille reading 2.6 times faster than previous approaches.
comment: Under Review - Project Page: https://akramekhairi.github.io/TheySeeMeRolling/. 14 pages, 11 figures
A Unified Framework for Probabilistic Dynamic-, Trajectory- and Vision-based Virtual Fixtures
Probabilistic Virtual Fixtures (VFs) enable the adaptive selection of the most suitable haptic feedback for each phase of a task, based on learned or perceived uncertainty. While keeping the human in the loop remains essential, for instance, to ensure high precision, partial automation of certain task phases is critical for productivity. We present a unified framework for probabilistic VFs that seamlessly switches between manual fixtures, semi-automated fixtures (with the human handling precise tasks), and full autonomy. We introduce a novel probabilistic Dynamical System-based VF for coarse guidance, enabling the robot to autonomously complete certain task phases while keeping the human operator in the loop. For tasks requiring precise guidance, we extend probabilistic position-based trajectory fixtures with automation allowing for seamless human interaction as well as geometry-awareness and optimal impedance gains. For manual tasks requiring very precise guidance, we also extend visual servoing fixtures with the same geometry-awareness and impedance behavior. We validate our approach experimentally on different robots, showcasing multiple operation modes and the ease of programming fixtures.
comment: for the supplementary video, see https://www.youtube.com/watch?v=vUXzcpMbMnY
VITA: Vision-to-Action Flow Matching Policy
Conventional flow matching and diffusion-based policies sample through iterative denoising from standard noise distributions (e.g., Gaussian), and require conditioning modules to repeatedly incorporate visual information during the generative process, incurring substantial time and memory overhead. To reduce this complexity, we develop VITA (VIsion-To-Action policy), a noise-free and conditioning-free flow matching policy learning framework that directly flows from visual representations to latent actions. Since the source of the flow is visually grounded, VITA eliminates the need for visual conditioning during generation. As expected, bridging vision and action is challenging, because actions are lower-dimensional, less structured, and sparser than visual representations; moreover, flow matching requires the source and target to have the same dimensionality. To overcome this, we introduce an action autoencoder that maps raw actions into a structured latent space aligned with visual latents, trained jointly with flow matching. To further prevent latent space collapse, we propose flow latent decoding, which anchors the latent generation process by backpropagating the action reconstruction loss through the flow matching ODE (ordinary differential equation) solving steps. We evaluate VITA on 9 simulation and 5 real-world tasks from ALOHA and Robomimic. VITA achieves 1.5x-2x faster inference compared to conventional methods with conditioning modules, while outperforming or matching state-of-the-art policies. Codes, datasets, and demos are available at our project page: https://ucd-dare.github.io/VITA/.
comment: Project page: https://ucd-dare.github.io/VITA/ Code: https://github.com/ucd-dare/VITA
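A minimal sketch of a flow-matching loss whose source distribution is the visual latent rather than Gaussian noise, as the abstract describes (equal latent dimensionality is assumed here, standing in for the paper's action autoencoder; flow latent decoding is not shown):

```python
# Vision-to-action flow matching sketch: straight-line path from visual latent to action latent.
import torch
import torch.nn.functional as F

def vita_style_fm_loss(vel_net, z_vis, z_act):
    """z_vis, z_act: (B, D) latents; vel_net predicts the velocity field from (x_t, t)."""
    B = z_vis.shape[0]
    t = torch.rand(B, 1)                           # interpolation time in [0, 1]
    x_t = (1.0 - t) * z_vis + t * z_act            # point on the straight probability path
    target_v = z_act - z_vis                       # constant target velocity along the path
    pred_v = vel_net(torch.cat([x_t, t], dim=-1))
    return F.mse_loss(pred_v, target_v)

D = 32
vel_net = torch.nn.Sequential(torch.nn.Linear(D + 1, 128), torch.nn.ReLU(), torch.nn.Linear(128, D))
z_vis, z_act = torch.randn(16, D), torch.randn(16, D)
loss = vita_style_fm_loss(vel_net, z_vis, z_act)
loss.backward()

# At inference one would integrate dx/dt = vel_net(x, t) from x(0) = z_vis and decode the
# resulting latent with the action decoder (decoder not shown here).
```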
RobustVLA: Robustness-Aware Reinforcement Post-Training for Vision-Language-Action Models
Vision-Language-Action (VLA) models have recently emerged as powerful general-purpose policies for robotic manipulation, benefiting from large-scale multi-modal pre-training. However, they often fail to generalize reliably in out-of-distribution deployments, where unavoidable disturbances such as observation noise, sensor errors, or actuation perturbations become prevalent. While recent Reinforcement Learning (RL)-based post-training provides a practical means to adapt pre-trained VLA models, existing methods mainly emphasize reward maximization and overlook robustness to environmental uncertainty. In this work, we introduce RobustVLA, a lightweight online RL post-training method designed to explicitly enhance the resilience of VLA models. Through a systematic robustness analysis, we identify two key regularizations: Jacobian regularization, which mitigates sensitivity to observation noise, and smoothness regularization, which stabilizes policies under action perturbations. Extensive experiments across diverse robotic environments demonstrate that RobustVLA significantly outperforms prior state-of-the-art methods in robustness and reliability. Our results highlight the importance of principled robustness-aware RL post-training as a key step toward improving the reliability and robustness of VLA models.
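A hedged sketch of the two regularizers named above, a Jacobian penalty on observation sensitivity and a smoothness penalty under perturbed inputs (the estimators and weights are assumptions; the paper's exact formulation may differ):

```python
# Illustrative robustness regularizers for a policy network.
import torch

def jacobian_penalty(policy, obs):
    """Penalize d(action)/d(obs) via a Hutchinson-style random projection."""
    obs = obs.clone().requires_grad_(True)
    actions = policy(obs)
    v = torch.randn_like(actions)                       # random probe vector
    (grads,) = torch.autograd.grad((actions * v).sum(), obs, create_graph=True)
    return grads.pow(2).sum(dim=-1).mean()

def smoothness_penalty(policy, obs, sigma=0.01):
    """Penalize output change under small observation perturbations."""
    noisy = obs + sigma * torch.randn_like(obs)
    return (policy(obs) - policy(noisy)).pow(2).sum(dim=-1).mean()

policy = torch.nn.Sequential(torch.nn.Linear(24, 64), torch.nn.Tanh(), torch.nn.Linear(64, 7))
obs = torch.randn(32, 24)
reg_loss = jacobian_penalty(policy, obs) + 0.1 * smoothness_penalty(policy, obs)
reg_loss.backward()
```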
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
We address key challenges in Dataset Aggregation (DAgger) for real-world contact-rich manipulation: how to collect informative human correction data and how to effectively update policies with this new data. We introduce Compliant Residual DAgger (CR-DAgger), which contains two novel components: 1) a Compliant Intervention Interface that leverages compliance control, allowing humans to provide gentle, accurate delta action corrections without interrupting the ongoing robot policy execution; and 2) a Compliant Residual Policy formulation that learns from human corrections while incorporating force feedback and force control. Our system significantly enhances performance on precise contact-rich manipulation tasks using minimal correction data, improving base policy success rates by over 50\% on two challenging tasks (book flipping and belt assembly) while outperforming both retraining-from-scratch and finetuning approaches. Through extensive real-world experiments, we provide practical guidance for implementing effective DAgger in real-world robot learning tasks. Result videos are available at: https://compliant-residual-dagger.github.io/
3EED: Ground Everything Everywhere in 3D NeurIPS 2025
Visual grounding in 3D is the key for embodied agents to localize language-referred objects in open-world environments. However, existing benchmarks are limited to indoor focus, single-platform constraints, and small scale. We introduce 3EED, a multi-platform, multi-modal 3D grounding benchmark featuring RGB and LiDAR data from vehicle, drone, and quadruped platforms. We provide over 128,000 objects and 22,000 validated referring expressions across diverse outdoor scenes -- 10x larger than existing datasets. We develop a scalable annotation pipeline combining vision-language model prompting with human verification to ensure high-quality spatial grounding. To support cross-platform learning, we propose platform-aware normalization and cross-modal alignment techniques, and establish benchmark protocols for in-domain and cross-platform evaluations. Our findings reveal significant performance gaps, highlighting the challenges and opportunities of generalizable 3D grounding. The 3EED dataset and benchmark toolkit are released to advance future research in language-driven 3D embodied perception.
comment: NeurIPS 2025 DB Track; 38 pages, 17 figures, 10 tables; Project Page at https://project-3eed.github.io/
AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios AAAI 2026
Referring Multi-Object Tracking (RMOT) aims to achieve precise object detection and tracking through natural language instructions, representing a fundamental capability for intelligent robotic systems. However, current RMOT research remains mostly confined to ground-level scenarios, which constrains the ability of such systems to capture broad-scale scene contexts and perform comprehensive tracking and path planning. In contrast, Unmanned Aerial Vehicles (UAVs) leverage their expansive aerial perspectives and superior maneuverability to enable wide-area surveillance. Moreover, UAVs have emerged as critical platforms for Embodied Intelligence, which has given rise to an unprecedented demand for intelligent aerial systems capable of natural language interaction. To this end, we introduce AerialMind, the first large-scale RMOT benchmark in UAV scenarios, which aims to bridge this research gap. To facilitate its construction, we develop an innovative semi-automated collaborative agent-based labeling assistant (COALA) framework that significantly reduces labor costs while maintaining annotation quality. Furthermore, we propose HawkEyeTrack (HETrack), a novel method that collaboratively enhances vision-language representation learning and improves the perception of UAV scenarios. Comprehensive experiments validate the challenging nature of our dataset and the effectiveness of our method.
comment: AAAI 2026
UniFucGrasp: Human-Hand-Inspired Unified Functional Grasp Annotation Strategy and Dataset for Diverse Dexterous Hands
Dexterous grasp datasets are vital for embodied intelligence, but mostly emphasize grasp stability, ignoring functional grasps needed for tasks like opening bottle caps or holding cup handles. Most rely on bulky, costly, and hard-to-control high-DOF Shadow Hands. Inspired by the human hand's underactuated mechanism, we establish UniFucGrasp, a universal functional grasp annotation strategy and dataset for multiple dexterous hand types. Based on biomimicry, it maps natural human motions to diverse hand structures and uses geometry-based force closure to ensure functional, stable, human-like grasps. This method supports low-cost, efficient collection of diverse, high-quality functional grasps. Finally, we establish the first multi-hand functional grasp dataset and provide a synthesis model to validate its effectiveness. Experiments on the UFG dataset, IsaacSim, and complex robotic tasks show that our method improves functional manipulation accuracy and grasp stability, demonstrates improved adaptability across multiple robotic hands, helping to alleviate annotation cost and generalization challenges in dexterous grasping. The project page is at https://haochen611.github.io/UFG.
comment: Accepted to IEEE Robotics and Automation Letters (RA-L). The project page is at https://haochen611.github.io/UFG
HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving
Realistic and controllable simulation is critical for advancing end-to-end autonomous driving, yet existing approaches often struggle to support novel view synthesis under large viewpoint changes or to ensure geometric consistency. We introduce HybridWorldSim, a hybrid simulation framework that integrates multi-traversal neural reconstruction for static backgrounds with generative modeling for dynamic agents. This unified design addresses key limitations of previous methods, enabling the creation of diverse and high-fidelity driving scenarios with reliable visual and spatial consistency. To facilitate robust benchmarking, we further release a new multi-traversal dataset MIRROR that captures a wide range of routes and environmental conditions across different cities. Extensive experiments demonstrate that HybridWorldSim surpasses previous state-of-the-art methods, providing a practical and scalable solution for high-fidelity simulation and a valuable resource for research and development in autonomous driving.
Curvature-Constrained Vector Field for Motion Planning of Nonholonomic Robots
Vector fields are advantageous in handling nonholonomic motion planning as they provide reference orientation for robots. However, additionally incorporating curvature constraints becomes challenging, due to the interconnection between the design of the curvature-bounded vector field and the tracking controller under underactuation. In this paper, we present a novel framework to co-develop the vector field and the control laws, guiding the nonholonomic robot to the target configuration with curvature-bounded trajectory. First, we formulate the problem by introducing the target positive limit set, which allows the robot to converge to or pass through the target configuration, depending on different dynamics and tasks. Next, we construct a curvature-constrained vector field (CVF) via blending and distributing basic flow fields in workspace and propose the saturated control laws with a dynamic gain, under which the tracking error's magnitude decreases even when saturation occurs. Under the control laws, kinematically constrained nonholonomic robots are guaranteed to track the reference CVF and converge to the target positive limit set with bounded trajectory curvature. Numerical simulations show that the proposed CVF method outperforms other vector-field-based algorithms. Experiments on Ackermann UGVs and semi-physical fixed-wing UAVs demonstrate that the method can be effectively implemented in real-world scenarios.
comment: IEEE T-RO accepted, 20 pages, 22 figures
A $1000\times$ Faster LLM-enhanced Algorithm For Path Planning in Large-scale Grid Maps
Path planning in grid maps, arising in various applications, has garnered significant attention. Existing methods, such as A*, Dijkstra, and their variants, work well for small-scale maps but fail to address large-scale ones due to high search time and memory consumption. Recently, Large Language Models (LLMs) have shown remarkable performance in path planning but still suffer from spatial illusion and poor planning performance. Among existing works, LLM-A* (Meng et al., 2024) leverages an LLM to generate a series of waypoints and then uses A* to plan the paths between neighboring waypoints, constructing the complete path. However, LLM-A* still suffers from high computational time for large-scale maps. To fill this gap, we conducted a deep investigation into LLM-A* and identified the bottlenecks that limit its performance. Accordingly, we design an innovative LLM-enhanced algorithm, abbreviated iLLM-A*. iLLM-A* includes three carefully designed mechanisms: an optimization of A*, an incremental learning method for the LLM to generate high-quality waypoints, and the selection of appropriate waypoints for A* path planning. Finally, a comprehensive evaluation on various grid maps shows that, compared with LLM-A*, iLLM-A* 1) achieves more than $1000\times$ speedup on average, and up to $2349.5\times$ speedup in the extreme case, 2) saves up to $58.6\%$ of the memory cost, and 3) achieves both obviously shorter path length and lower path length standard deviation.
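A minimal sketch of the underlying "A* between consecutive LLM-proposed waypoints" idea on a toy occupancy grid (the waypoints are hard-coded here in place of an LLM query, and none of iLLM-A*'s optimizations are reproduced):

```python
# Stitching A* segments between consecutive waypoints on a small occupancy grid.
import heapq

def astar(grid, start, goal):
    """4-connected A* on a 0/1 occupancy grid; returns a list of cells or None."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])     # Manhattan heuristic
    open_set, came, g = [(h(start), start)], {}, {start: 0}
    while open_set:
        _, cur = heapq.heappop(open_set)
        if cur == goal:
            path = [cur]
            while cur in came:
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        for d in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + d[0], cur[1] + d[1])
            if 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0]) and grid[nxt[0]][nxt[1]] == 0:
                ng = g[cur] + 1
                if ng < g.get(nxt, float("inf")):
                    g[nxt], came[nxt] = ng, cur
                    heapq.heappush(open_set, (ng + h(nxt), nxt))
    return None

def plan_via_waypoints(grid, waypoints):
    """Plan A* segments between consecutive (e.g., LLM-proposed) waypoints and stitch them."""
    full = []
    for a, b in zip(waypoints, waypoints[1:]):
        seg = astar(grid, a, b)
        if seg is None:
            return None                   # a waypoint is unreachable; a re-query would be needed
        full.extend(seg if not full else seg[1:])
    return full

grid = [[0] * 6 for _ in range(6)]
grid[2][1:5] = [1, 1, 1, 1]               # a wall across most of row 2
print(plan_via_waypoints(grid, [(0, 0), (3, 5), (5, 5)]))
```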
AutoDrive-R$^2$: Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving
Vision-Language-Action (VLA) models in autonomous driving systems have recently demonstrated transformative potential by integrating multimodal perception with decision-making capabilities. However, the interpretability and coherence of the decision process and the plausibility of action sequences remain largely underexplored. To address these issues, we propose AutoDrive-R$^2$, a novel VLA framework that enhances both reasoning and self-reflection capabilities of autonomous driving systems through chain-of-thought (CoT) processing and reinforcement learning (RL). Specifically, we first propose an innovative CoT dataset named nuScenesR$^2$-6K for supervised fine-tuning, which effectively builds cognitive bridges between input information and output trajectories through a four-step logical chain with self-reflection for validation. Moreover, to maximize both reasoning and self-reflection during the RL stage, we further employ the Group Relative Policy Optimization (GRPO) algorithm within a physics-grounded reward framework that incorporates spatial alignment, vehicle dynamic, and temporal smoothness criteria to ensure reliable and realistic trajectory planning. Extensive evaluation results across both nuScenes and Waymo datasets demonstrates the state-of-the-art performance and robust generalization capacity of our proposed method.
A Practical Guide for Incorporating Symmetry in Diffusion Policy NeurIPS 2025
Recently, equivariant neural networks for policy learning have shown promising improvements in sample efficiency and generalization, however, their wide adoption faces substantial barriers due to implementation complexity. Equivariant architectures typically require specialized mathematical formulations and custom network design, posing significant challenges when integrating with modern policy frameworks like diffusion-based models. In this paper, we explore a number of straightforward and practical approaches to incorporate symmetry benefits into diffusion policies without the overhead of full equivariant designs. Specifically, we investigate (i) invariant representations via relative trajectory actions and eye-in-hand perception, (ii) integrating equivariant vision encoders, and (iii) symmetric feature extraction with pretrained encoders using Frame Averaging. We first prove that combining eye-in-hand perception with relative or delta action parameterization yields inherent SE(3)-invariance, thus improving policy generalization. We then perform a systematic experimental study on those design choices for integrating symmetry in diffusion policies, and conclude that an invariant representation with equivariant feature extraction significantly improves the policy performance. Our method achieves performance on par with or exceeding fully equivariant architectures while greatly simplifying implementation.
comment: NeurIPS 2025
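A small numerical check of the claim that eye-in-hand-style relative (delta) actions are invariant to a global SE(3) transform of the scene (poses and the transform below are arbitrary illustrative values, not from the paper):

```python
# Relative-action parameterization: express the commanded pose in the current
# end-effector frame, then verify that a global SE(3) shift of the scene cancels out.
import numpy as np

def se3(R, t):
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rotz(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

T_ee = se3(rotz(0.3), np.array([0.4, 0.1, 0.5]))          # current end-effector pose
T_target = se3(rotz(0.8), np.array([0.6, 0.2, 0.4]))      # commanded absolute pose

relative_action = np.linalg.inv(T_ee) @ T_target          # action expressed in the gripper frame

G = se3(rotz(1.1), np.array([1.0, -2.0, 0.3]))            # arbitrary global SE(3) transform
relative_after = np.linalg.inv(G @ T_ee) @ (G @ T_target)

print(np.allclose(relative_action, relative_after))       # True: the relative action is unchanged
```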
WARPD: World model Assisted Reactive Policy Diffusion
With the increasing availability of open-source robotic data, imitation learning has become a promising approach for both manipulation and locomotion. Diffusion models are now widely used to train large, generalized policies that predict controls or trajectories, leveraging their ability to model multimodal action distributions. However, this generality comes at the cost of larger model sizes and slower inference, an acute limitation for robotic tasks requiring high control frequencies. Moreover, Diffusion Policy (DP), a popular trajectory-generation approach, suffers from a trade-off between performance and action horizon: fewer diffusion queries lead to larger trajectory chunks, which in turn accumulate tracking errors. To overcome these challenges, we introduce WARPD (World model Assisted Reactive Policy Diffusion), a method that generates closed-loop policies (weights for neural policies) directly, instead of open-loop trajectories. By learning behavioral distributions in parameter space rather than trajectory space, WARPD offers two major advantages: (1) extended action horizons with robustness to perturbations, while maintaining high task performance, and (2) significantly reduced inference costs. Empirically, WARPD outperforms DP in long-horizon and perturbed environments, and achieves multitask performance on par with DP while requiring only ~ 1/45th of the inference-time FLOPs per step.
OpenGVL -- Benchmarking Visual Temporal Progress for Data Curation
Data scarcity remains one of the most limiting factors in driving progress in robotics. However, the amount of available robotics data in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable temporal task completion prediction could help automatically annotate and curate this data at scale. The Generative Value Learning (GVL) approach was recently proposed, leveraging the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building upon GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse challenging manipulation tasks involving both robotic and human embodiments. We evaluate the capabilities of publicly available open-source foundation models, showing that open-source model families significantly underperform closed-source counterparts, achieving only approximately $70\%$ of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at github.com/budzianowski/opengvl.
comment: Workshop on Making Sense of Data in Robotics: Composition, Curation, and Interpretability at Scale at CoRL 2025
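One way to score temporal-progress predictions of the kind benchmarked here is to rank-correlate per-frame completion estimates against the true frame order; the sketch below is an assumption about the metric's spirit, not OpenGVL's exact protocol:

```python
# Rank correlation between predicted per-frame task progress and true temporal order.
import numpy as np

def rank_correlation(predicted_progress, true_order):
    """Spearman-style rank correlation (no-ties case) between predictions and frame order."""
    pr = np.argsort(np.argsort(predicted_progress)).astype(float)
    tr = np.argsort(np.argsort(true_order)).astype(float)
    pr, tr = pr - pr.mean(), tr - tr.mean()
    return float((pr * tr).sum() / np.sqrt((pr**2).sum() * (tr**2).sum()))

true_order = np.arange(8)                                              # frames in temporal order
predicted = np.array([0.0, 0.1, 0.25, 0.2, 0.5, 0.65, 0.8, 1.0])       # model's progress estimates
print(f"rank correlation: {rank_correlation(predicted, true_order):.3f}")
```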
Multiagent Systems
InnoGym: Benchmarking the Innovation Potential of AI Agents
LLMs and Agents have achieved impressive progress in code generation, mathematical reasoning, and scientific discovery. However, existing benchmarks primarily measure correctness, overlooking the diversity of methods behind solutions. True innovation depends not only on producing correct answers but also on the originality of the approach. We present InnoGym, the first benchmark and framework designed to systematically evaluate the innovation potential of AI agents. InnoGym introduces two complementary metrics: performance gain, which measures improvement over the best-known solutions, and novelty, which captures methodological differences from prior approaches. The benchmark includes 18 carefully curated tasks from real-world engineering and scientific domains, each standardized through resource filtering, evaluator validation, and solution collection. In addition, we provide iGym, a unified execution environment for reproducible and long-horizon evaluations. Extensive experiments show that while some agents produce novel approaches, their lack of robustness limits performance gains. These results highlight a key gap between creativity and effectiveness, underscoring the need for benchmarks that evaluate both.
comment: Work in progress
Agent-Kernel: A MicroKernel Multi-Agent System Framework for Adaptive Social Simulation Powered by LLMs
Multi-Agent System (MAS) developing frameworks serve as the foundational infrastructure for social simulations powered by Large Language Models (LLMs). However, existing frameworks fail to adequately support large-scale simulation development due to inherent limitations in adaptability, configurability, reliability, and code reusability. For example, they cannot simulate a society where the agent population and profiles change over time. To fill this gap, we propose Agent-Kernel, a framework built upon a novel society-centric modular microkernel architecture. It decouples core system functions from simulation logic and separates cognitive processes from physical environments and action execution. Consequently, Agent-Kernel achieves superior adaptability, configurability, reliability, and reusability. We validate the framework's superiority through two distinct applications: a simulation of the Universe 25 (Mouse Utopia) experiment, which demonstrates the handling of rapid population dynamics from birth to death; and a large-scale simulation of the Zhejiang University Campus Life, successfully coordinating 10,000 heterogeneous agents, including students and faculty.
CourtMotion: Learning Event-Driven Motion Representations from Skeletal Data for Basketball
This paper presents CourtMotion, a spatiotemporal modeling framework for analyzing and predicting game events and plays as they develop in professional basketball. Anticipating basketball events requires understanding both physical motion patterns and their semantic significance in the context of the game. Traditional approaches that use only player positions fail to capture crucial indicators such as body orientation, defensive stance, or shooting preparation motions. Our two-stage approach first processes skeletal tracking data through Graph Neural Networks to capture nuanced motion patterns, then employs a Transformer architecture with specialized attention mechanisms to model player interactions. We introduce event projection heads that explicitly connect player movements to basketball events like passes, shots, and steals, training the model to associate physical motion patterns with their tactical purposes. Experiments on NBA tracking data demonstrate significant improvements over position-only baselines: 35% reduction in trajectory prediction error compared to state-of-the-art position-based models and consistent performance gains across key basketball analytics tasks. The resulting pretrained model serves as a powerful foundation for multiple downstream tasks, with pick detection, shot taker identification, assist prediction, shot location classification, and shot type recognition demonstrating substantial improvements over existing methods.
SocialDriveGen: Generating Diverse Traffic Scenarios with Controllable Social Interactions
The generation of realistic and diverse traffic scenarios in simulation is essential for developing and evaluating autonomous driving systems. However, most simulation frameworks rely on rule-based or simplified models for scene generation, which lack the fidelity and diversity needed to represent real-world driving. While recent advances in generative modeling produce more realistic and context-aware traffic interactions, they often overlook how social preferences influence driving behavior. SocialDriveGen addresses this gap through a hierarchical framework that integrates semantic reasoning and social preference modeling with generative trajectory synthesis. By modeling egoism and altruism as complementary social dimensions, our framework enables controllable diversity in driver personalities and interaction styles. Experiments on the Argoverse 2 dataset show that SocialDriveGen generates diverse, high-fidelity traffic scenarios spanning cooperative to adversarial behaviors, significantly enhancing policy robustness and generalization to rare or high-risk situations.
DialogGuard: Multi-Agent Psychosocial Safety Evaluation of Sensitive LLM Responses
Large language models (LLMs) now mediate many web-based mental-health, crisis, and other emotionally sensitive services, yet their psychosocial safety in these settings remains poorly understood and weakly evaluated. We present DialogGuard, a multi-agent framework for assessing psychosocial risks in LLM-generated responses along five high-severity dimensions: privacy violations, discriminatory behaviour, mental manipulation, psychological harm, and insulting behaviour. DialogGuard can be applied to diverse generative models through four LLM-as-a-judge pipelines, including single-agent scoring, dual-agent correction, multi-agent debate, and stochastic majority voting, grounded in a shared three-level rubric usable by both human annotators and LLM judges. Using PKU-SafeRLHF with human safety annotations, we show that multi-agent mechanisms detect psychosocial risks more accurately than non-LLM baselines and single-agent judging; dual-agent correction and majority voting provide the best trade-off between accuracy, alignment with human ratings, and robustness, while debate attains higher recall but over-flags borderline cases. We release DialogGuard as open-source software with a web interface that provides per-dimension risk scores and explainable natural-language rationales. A formative study with 12 practitioners illustrates how it supports prompt design, auditing, and supervision of web-facing applications for vulnerable users.
Orchestration Framework for Financial Agents: From Algorithmic Trading to Agentic Trading NeurIPS 2025
The financial market is a mission-critical playground for AI agents due to its temporal dynamics and low signal-to-noise ratio. Building an effective algorithmic trading system may require years of development and testing by a professional team. In this paper, we propose an orchestration framework for financial agents, which aims to democratize financial intelligence for the general public. We map each component of the traditional algorithmic trading system to agents, including planner, orchestrator, alpha agents, risk agents, portfolio agents, backtest agents, execution agents, audit agents, and a memory agent. We present two in-house trading examples. For the stock trading task (hourly data from 04/2024 to 12/2024), our approach achieved a return of $20.42\%$, a Sharpe ratio of 2.63, and a maximum drawdown of $-3.59\%$, while the S&P 500 index yielded a return of $15.97\%$. For the BTC trading task (minute data from 27/07/2025 to 13/08/2025), our approach achieved a return of $8.39\%$, a Sharpe ratio of $0.38$, and a maximum drawdown of $-2.80\%$, whereas the BTC price increased by $3.80\%$. Our code is available on GitHub: https://github.com/Open-Finance-Lab/AgenticTrading.
comment: Accepted at the Workshop on Generative AI in Finance, 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
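For reference, the two headline metrics quoted above can be computed from a return series as follows (synthetic returns; the annualization factor is an assumption that depends on the bar frequency):

```python
# Sharpe ratio and maximum drawdown from a series of per-period returns.
import numpy as np

def sharpe_ratio(returns, periods_per_year):
    """Annualized Sharpe ratio of a per-period return series (risk-free rate assumed zero)."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def max_drawdown(returns):
    """Most negative peak-to-trough dip of the cumulative equity curve."""
    equity = np.cumprod(1.0 + returns)
    running_peak = np.maximum.accumulate(equity)
    return (equity / running_peak - 1.0).min()

rng = np.random.default_rng(1)
hourly_returns = rng.normal(3e-4, 4e-3, size=6 * 252)     # ~6 trading hours/day for one year

print(f"Sharpe: {sharpe_ratio(hourly_returns, 6 * 252):.2f}")
print(f"Max drawdown: {max_drawdown(hourly_returns):.2%}")
```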
How Market Volatility Shapes Algorithmic Collusion: A Comparative Analysis of Learning-Based Pricing Algorithms
Autonomous pricing algorithms are increasingly influencing competition in digital markets; however, their behavior under realistic demand conditions remains largely unexamined. This paper offers a thorough analysis of four pricing algorithms -- Q-Learning, PSO, Double DQN, and DDPG -- across three classic duopoly models (Logit, Hotelling, Linear) and under various demand-shock regimes created by auto-regressive processes. By utilizing profit- and price-based collusion indices, we investigate how the interactions among algorithms, market structure, and stochastic demand collaboratively influence competitive outcomes. Our findings reveal that reinforcement-learning algorithms often sustain supra-competitive prices under stable demand, with DDPG demonstrating the most pronounced collusive tendencies. Demand shocks produce notably varied effects: Logit markets suffer significant performance declines, Hotelling markets remain stable, and Linear markets experience shock-induced profit inflation. Despite marked changes in absolute performance, the relative rankings of the algorithms are consistent across different environments. These results underscore the critical importance of market structure and demand uncertainty in shaping algorithmic competition, while also contributing to the evolving policy discussions surrounding autonomous pricing behavior.
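A sketch of the standard profit-based collusion index used in this literature, where 0 corresponds to the competitive (Nash) outcome and 1 to full collusion (the benchmark profits below are made up for illustration):

```python
# Profit-based collusion index: Delta = (pi_observed - pi_Nash) / (pi_Monopoly - pi_Nash).
def collusion_index(observed_profit, nash_profit, monopoly_profit):
    return (observed_profit - nash_profit) / (monopoly_profit - nash_profit)

pi_nash, pi_monopoly = 0.22, 0.34     # per-period benchmark profits for the duopoly model
pi_algo = 0.30                        # average profit reached by the learning algorithms

print(f"collusion index: {collusion_index(pi_algo, pi_nash, pi_monopoly):.2f}")   # 0.67
```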
Many-to-One Adversarial Consensus: Exposing Multi-Agent Collusion Risks in AI-Based Healthcare
The integration of large language models (LLMs) into healthcare IoT systems promises faster decisions and improved medical support. LLMs are also deployed as multi-agent teams to assist AI doctors by debating, voting, or advising on decisions. However, when multiple assistant agents interact, coordinated adversaries can collude to create false consensus, pushing an AI doctor toward harmful prescriptions. We develop an experimental framework with scripted and unscripted doctor agents, adversarial assistants, and a verifier agent that checks decisions against clinical guidelines. Using 50 representative clinical questions, we find that collusion drives the Attack Success Rate (ASR) and Harmful Recommendation Rate (HRR) up to 100% in unprotected systems. In contrast, the verifier agent restores 100% accuracy by blocking adversarial consensus. This work provides the first systematic evidence of collusion risk in AI healthcare and demonstrates a practical, lightweight defence that ensures guideline fidelity.
comment: 7 pages, conference-level paper
A Knowledge-Based Language Model: Deducing Grammatical Knowledge in a Multi-Agent Language Acquisition Simulation
This paper presents an initial study performed by the MODOMA system. The MODOMA is a computational multi-agent laboratory environment for unsupervised language acquisition experiments such that acquisition is based on the interaction between two language models, an adult and a child agent. Although this framework employs statistical as well as rule-based procedures, the result of language acquisition is a knowledge-based language model, which can be used to generate and parse new utterances of the target language. This system is fully parametrized and researchers can control all aspects of the experiments while the results of language acquisition, that is, the acquired grammatical knowledge, are explicitly represented and can be consulted. Thus, this system introduces novel possibilities for conducting computational language acquisition experiments. The experiments presented by this paper demonstrate that functional and content categories can be acquired and represented by the daughter agent based on training and test data containing different amounts of exemplars generated by the adult agent. Interestingly, similar patterns, which are well-established for human-generated data, are also found for these machine-generated data. As the procedures resulted in the successful acquisition of discrete grammatical categories by the child agent, these experiments substantiate the validity of the MODOMA approach to modelling language acquisition.
comment: 23 pages, 7 figures, 11 tables. Related work: arXiv:2503.18702. This is the peer-reviewed publisher's version, downloadable from: https://www.clinjournal.org/clinj/article/view/193
The Station: An Open-World Environment for AI-Driven Discovery
We introduce the STATION, an open-world multi-agent environment for autonomous scientific discovery. The Station simulates a complete scientific ecosystem, where agents can engage in long scientific journeys that include reading papers from peers, formulating hypotheses, collaborating with peers, submitting experiments, and publishing results. Importantly, there is no centralized system coordinating their activities. Utilizing their long context, agents are free to choose their own actions and develop their own narratives within the Station. Experiments demonstrate that AI agents in the Station achieve new state-of-the-art performance on a wide range of benchmarks, spanning mathematics, computational biology, and machine learning, notably surpassing AlphaEvolve in circle packing. A rich tapestry of unscripted narratives emerges, such as agents collaborating and analyzing other works rather than pursuing myopic optimization. From these emergent narratives, novel methods arise organically, such as a new density-adaptive algorithm for scRNA-seq batch integration that borrows concepts from another domain. The Station marks a first step towards autonomous scientific discovery driven by emergent behavior in an open-world environment, representing a new paradigm that moves beyond rigid pipelines.
comment: 55 pages
Robust, Observable, and Evolvable Agentic Systems Engineering: A Principled Framework Validated via the Fairy GUI Agent
The agentic paradigm faces a significant absence of software engineering practice, yielding agentic systems that commonly lack robustness, observability, and evolvability. To address these deficiencies, we propose a principled engineering framework comprising Runtime Goal Refinement (RGR), Observable Cognitive Architecture (OCA), and Evolutionary Memory Architecture (EMA). In this framework, RGR ensures robustness and intent alignment via knowledge-constrained refinement and human-in-the-loop clarification; OCA builds an observable and maintainable white-box architecture using component decoupling, logic layering, and state-control separation; and EMA employs an execution-evolution dual-loop for evolvability. We implemented and empirically validated Fairy, a mobile GUI agent based on this framework. On RealMobile-Eval, our novel benchmark for ambiguous and complex tasks, Fairy outperformed the best SoTA baseline in user requirement completion by 33.7%. Subsequent controlled experiments, human-subject studies, and ablation studies further confirmed that the RGR enhances refinement accuracy and prevents intent deviation; the OCA improves maintainability; and the EMA is crucial for long-term performance. This research provides empirically validated specifications and a practical blueprint for building reliable, observable, and evolvable agentic AI systems.
comment: 50 pages, 14 figures
NeKo: Cross-Modality Post-Recognition Error Correction with Tasks-Guided Mixture-of-Experts Language Model ACL 2025
Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer would lie in learning dataset-specific features and digesting their knowledge in a single model. Previous methods achieve this by having separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, where we train the experts to become an ``expert'' in speech-to-text, language-to-text, and vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that we achieve new state-of-the-art performance with an average relative 5.0% WER reduction and substantial improvements in BLEU scores for speech and translation tasks. On zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus with 15.5% to 27.6% relative WER reduction in the Hyporadise benchmark. NeKo performs competitively on grammar and post-OCR correction as a multi-task model.
comment: ACL 2025 Industry Track. NeKo LMs: https://huggingface.co/nvidia/NeKo-v0-post-correction
Computing Evolutionarily Stable Strategies in Multiplayer Games
We present an algorithm for computing all evolutionarily stable strategies in nondegenerate normal-form games with three or more players.
Autocratic strategies in Cournot oligopoly game
An oligopoly is a market in which the price of goods is controlled by a few firms. Cournot introduced the simplest game-theoretic model of oligopoly, where profit-maximizing behavior of each firm results in market failure. Furthermore, when the Cournot oligopoly game is infinitely repeated, firms can tacitly collude to monopolize the market. Such tacit collusion is realized by the same mechanism as direct reciprocity in the repeated prisoner's dilemma game, where mutual cooperation can be realized whereas defection is favorable for both prisoners in a one-shot game. Recently, in the repeated prisoner's dilemma game, a class of strategies called zero-determinant strategies has attracted much attention in the context of direct reciprocity. Zero-determinant strategies are autocratic strategies which unilaterally control payoffs of players by enforcing linear relationships between payoffs. There have been many attempts to find zero-determinant strategies in other games and to extend them so as to apply them to broader situations. In this paper, first, we show that zero-determinant strategies exist even in the repeated Cournot oligopoly game, and that they are quite different from those in the repeated prisoner's dilemma game. In particular, we prove that a fair zero-determinant strategy exists, which is guaranteed to obtain the average payoff of the opponents. Second, we numerically show that the fair zero-determinant strategy can be used to promote collusion when it is used against an adaptively learning player, whereas it cannot promote collusion when it is used against two adaptively learning players. Our findings elucidate a negative impact of zero-determinant strategies in the oligopoly market.
comment: 24 pages, 4 figures
A Protocol for Trustless Verification Under Uncertainty
Correctness is an emergent property of systems where exposing error is cheaper than committing it. In dynamic, low-trust environments, autonomous AI agents benefit from delegating work to sub-agents, yet correctness cannot be assured through upfront specification or centralized oversight. We propose a protocol that enforces correctness through collateralized claims in a recursive verification game. Tasks are published as intents, and solvers compete to fulfill them. Selected solvers carry out tasks under risk, with correctness checked post hoc by verifiers. Any party can challenge a result by staking against it, triggering the verification process. Incorrect agents are slashed and correct opposition is rewarded, with an escalation path that penalizes erroneous verifiers themselves. When incentives are aligned across solvers, challengers, and verifiers, falsification conditions make correctness the Nash equilibrium.
comment: 9 pages, 1 figure
Systems and Control (EESS)
ECO: Energy-Constrained Operator Learning for Chaotic Dynamics with Boundedness Guarantees
Chaos is a fundamental feature of many complex dynamical systems, including weather systems and fluid turbulence. These systems are inherently difficult to predict due to their extreme sensitivity to initial conditions. Many chaotic systems are dissipative and ergodic, motivating data-driven models that aim to learn invariant statistical properties over long time horizons. While recent models have shown empirical success in preserving invariant statistics, they are prone to generating unbounded predictions, which prevent meaningful statistics evaluation. To overcome this, we introduce the Energy-Constrained Operator (ECO) that simultaneously learns the system dynamics while enforcing boundedness in predictions. We leverage concepts from control theory to develop algebraic conditions based on a learnable energy function, ensuring the learned dynamics is dissipative. ECO enforces these algebraic conditions through an efficient closed-form quadratic projection layer, which provides provable trajectory boundedness. To our knowledge, this is the first work establishing such formal guarantees for data-driven chaotic dynamics models. Additionally, the learned invariant level set provides an outer estimate for the strange attractor, a complex structure that is computationally intractable to characterize. We demonstrate empirical success in ECO's ability to generate stable long-horizon forecasts, capturing invariant statistics on systems governed by chaotic PDEs, including the Kuramoto--Sivashinsky and the Navier--Stokes equations.
AI-Driven Optimization under Uncertainty for Mineral Processing Operations
The global capacity for mineral processing must expand rapidly to meet the demand for critical minerals, which are essential for building the clean energy technologies necessary to mitigate climate change. However, the efficiency of mineral processing is severely limited by uncertainty, which arises from both the variability of feedstock and the complexity of process dynamics. To optimize mineral processing circuits under uncertainty, we introduce an AI-driven approach that formulates mineral processing as a Partially Observable Markov Decision Process (POMDP). We demonstrate the capabilities of this approach in handling both feedstock uncertainty and process model uncertainty to optimize the operation of a simulated, simplified flotation cell as an example. We show that by integrating the process of information gathering (i.e., uncertainty reduction) and process optimization, this approach has the potential to consistently perform better than traditional approaches at maximizing an overall objective, such as net present value (NPV). Our methodological demonstration of this optimization-under-uncertainty approach for a synthetic case provides a mathematical and computational framework for later real-world application, with the potential to improve both the laboratory-scale design of experiments and industrial-scale operation of mineral processing circuits without any additional hardware.
comment: 27 pages, 13 figures, submitted to Sustainable Earth Resources Communications (SERC)
Event-triggered control of nonlinear systems from data
In a recent paper [8], we introduced a data-based approach to design event-triggered controllers for linear systems directly from data. Here, we extend the results in [8] to a class of nonlinear systems. We provide two data-based designs certified by a (classical) Lyapunov function. For these two designs, we devise event-triggered policies that rely on the previously found Lyapunov function, have parameters tuned from data, ensure a positive minimum inter-event time, and act based either on the state error or on the library error. These two different policies, and their respective advantages, are illustrated numerically.
Uncertainty quantification in load profiles with rising EV and PV adoption: the case of residential, industrial, and office buildings
The integration of photovoltaic (PV) generation and electric vehicle (EV) charging introduces significant uncertainty in electricity consumption patterns, particularly at the distribution level. This paper presents a comparative study for selecting metrics for uncertainty quantification (UQ) for net load profiles of residential, industrial, and office buildings under increased DER penetration. A variety of statistical metrics are evaluated for their usefulness in quantifying uncertainty, including, but not limited to, standard deviation, entropy, ramps, and distance metrics. The proposed metrics are classified as baseline-free, baseline-based, and error-based. These UQ metrics are evaluated for increased penetration of EV and PV. The results highlight suitable metrics to quantify uncertainty per consumer type and demonstrate how net load uncertainty is affected by EV and PV adoption. Additionally, it is observed that joint consideration of EV and PV can reduce overall uncertainty due to compensatory effects of EV charging and PV generation arising from their temporal alignment during the day. Uncertainty reduction is observed across all datasets and is most pronounced for the office building dataset.
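A small sketch of three of the baseline-free metrics mentioned above (standard deviation, histogram entropy, and ramps) applied to a synthetic net load profile; the binning choice and the synthetic profile are illustrative assumptions, not the paper's definitions.

```python
# Illustrative baseline-free uncertainty metrics for a net load profile:
# standard deviation, histogram entropy, and mean absolute ramp.
import numpy as np

def uq_metrics(net_load: np.ndarray, bins: int = 20):
    std = float(np.std(net_load))
    hist, _ = np.histogram(net_load, bins=bins)
    p = hist / hist.sum()
    entropy = float(-np.sum(p[p > 0] * np.log(p[p > 0])))
    mean_abs_ramp = float(np.mean(np.abs(np.diff(net_load))))
    return {"std": std, "entropy": entropy, "mean_abs_ramp": mean_abs_ramp}

if __name__ == "__main__":
    t = np.linspace(0, 24, 96)                      # one day, 15-minute resolution
    profile = 2.0 + np.sin(t / 24 * 2 * np.pi) + 0.2 * np.random.randn(96)
    print(uq_metrics(profile))
```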
NeuroHJR: Hamilton-Jacobi Reachability-based Obstacle Avoidance in Complex Environments with Physics-Informed Neural Networks
Autonomous ground vehicles (AGVs) must navigate safely in cluttered environments while accounting for complex dynamics and environmental uncertainty. Hamilton-Jacobi Reachability (HJR) offers formal safety guarantees through the computation of forward and backward reachable sets, but its application is hindered by poor scalability in environments with numerous obstacles. In this paper, we present a novel framework called NeuroHJR that leverages Physics-Informed Neural Networks (PINNs) to approximate the HJR solution for real-time obstacle avoidance. By embedding system dynamics and safety constraints directly into the neural network loss function, our method bypasses the need for grid-based discretization and enables efficient estimation of reachable sets in continuous state spaces. We demonstrate the effectiveness of our approach through simulation results in densely cluttered scenarios, showing that it achieves safety performance comparable to that of classical HJR solvers while significantly reducing the computational cost. This work provides a new step toward real-time, scalable deployment of reachability-based obstacle avoidance in robotics.
comment: Author-accepted version. Accepted at IEEE 11th Indian Control Conference (ICC), 2025
Experiment design using prior knowledge on controllability and stabilizability
In this paper, we consider the problem of designing input signals for an unknown linear time-invariant system in such a way that the resulting input-state data is suitable for identification or stabilization. We will take into account prior knowledge on system-theoretic properties of the system, in particular, controllability and stabilizability. For this, we extend the notion of universal inputs to incorporate prior knowledge on the system. An input is called universal for identification (resp., stabilization) if, when applied to any system complying with the prior knowledge, it results in data suitable for identification (resp., stabilization) regardless of the initial condition. We provide a full characterization of such universal inputs. In addition, we discuss online experiment design using prior knowledge, and we study cases where this approach results in the shortest possible experiment for identification and stabilization.
comment: 6 pages
Frequency-Dynamics-Aware Economic Dispatch with Optimal Grid-Forming Inverter Allocation and Reserved Power Headroom
The high penetration of inverter-based resources (IBRs) reduces system inertia, leading to frequency stability concerns, especially during synchronous generator (SG) outages. To maintain frequency dynamics within secure limits while ensuring economic efficiency, frequency-constrained optimal power flow (FCOPF) is employed. However, existing studies either neglect the frequency support capability and allocation of grid-forming (GFM) IBRs or suffer from limited accuracy in representing frequency dynamics due to model simplifications. To address this issue, this paper proposes a deep learning (DL)-based FCOPF (DL-FCOPF) framework. A DL model is first developed as a predictor to accurately estimate frequency-related metrics: the required reserved headroom and allocation of GFM IBRs, the rate of change of frequency, and the frequency nadir. After being trained with data obtained from electromagnetic transient simulations, the DL model is reformulated and incorporated into FCOPF. Case studies conducted on two test systems demonstrate the effectiveness of the proposed approach. Compared with the traditional OPF and linear FCOPF benchmarks, the DL-FCOPF can optimally coordinate SGs and IBRs with minimum cost, achieving the desired frequency response within an acceptable computing time. Furthermore, sensitivity analyses are conducted to identify the most suitable structure and linearization approach of the DL-based frequency predictor.
How to Capture Human Preference: Commissioning of a Robotic Use-Case via Preferential Bayesian Optimisation
The popularity of Bayesian Optimization (BO) to automate or support the commissioning of engineering systems is rising. Conventional BO, however, relies on the availability of a scalar objective function. The latter is often difficult to define and rarely captures the nuanced judgement of expert operators in industrial settings. Preferential Bayesian Optimization (PBO) addresses this limitation by relying solely on pairwise preference feedback of a human expert, so-called duels. In this paper, we study PBO's capacity to commission a particular setup where a manipulator needs to push a block towards a target position. We benchmark state-of-the-art algorithms in both simulations and in the real world. Our results confirm that PBO can commission the setup to the satisfaction of an expert operator whilst relying solely on binary preference feedback. To evaluate to what extent the same result can be achieved using conventional BO, we investigate the experts' decision consistency against an expert-designed cost function. Our study reveals that the experts fail to define a cost function that is in full agreement with their own decision process, as witnessed in the PBO experiments. We then show that the auxiliary cost function that is constructed as a by-product of the PBO algorithms outperforms the expert-designed cost function in terms of decision consistency. Furthermore, we demonstrate that this cost function can be used with conventional BO algorithms in an effort to reproduce the optimal design. This suggests that the preference-based cost function captures the experts' preferences perhaps more effectively than the experts could articulate them themselves. In conclusion, we discuss downsides and propose directions for future research.
An innovative circuit for testing hot carrier and trap generation in GaN Devices
Microelectronic devices in modern systems operate continuously for prolonged periods, often many years. There is therefore a crucial need for a reliability model that can precisely predict the life cycle of a device and identify the governing failure mechanisms responsible for degradation and failure. This is even more important for high-power and high-frequency devices, especially normally-off switching power transistors, since reliability research on these devices lags far behind that of low-power digital devices. Our goal is to investigate the failure mechanisms of GaN transistors in order to determine the reliability factors of the innovative MTOL model and, in particular, to understand why the dominant failure mechanism changes. The recorded data can then be used to predict device performance and lifetime under different operating parameters such as current, voltage, frequency, and temperature. In this study, we employ a new use of the well-known boost converter circuit, based on GaN devices, to stress the transistor to maximum voltage and current values, which allows us to examine the reliability of the transistor and to accelerate hot-carrier and trap-generation failure mechanisms. The failure mechanism must be accelerated in a way that does not damage the device detrimentally, while also avoiding excessively long waiting times before degradation can be observed. In this work, we present our new boost converter circuit based on a high-power GaN HEMT.
Dynamic Log-Gaussian Process Control Barrier Function for Safe Robotic Navigation in Dynamic Environments
Control Barrier Functions (CBFs) have emerged as efficient tools to address the safe navigation problem for robot applications. However, synthesizing informative and obstacle motion-aware CBFs online using real-time sensor data remains challenging, particularly in unknown and dynamic scenarios. Motivated by this challenge, this paper proposes a novel Gaussian Process-based formulation of CBFs, termed the Dynamic Log Gaussian Process Control Barrier Function (DLGP-CBF), enabling real-time construction of CBFs that are both spatially informative and responsive to obstacle motion. Firstly, the DLGP-CBF leverages a logarithmic transformation of GP regression to generate smooth and informative barrier values and gradients, even in sparse-data regions. Secondly, by explicitly modeling the DLGP-CBF as a function of obstacle positions, the derived safety constraint integrates predicted obstacle velocities, allowing the controller to proactively respond to dynamic obstacles' motion. Simulation results demonstrate significant improvements in obstacle avoidance performance, including increased safety margins, smoother trajectories, and enhanced responsiveness compared to baseline methods.
comment: To be presented in the 64th IEEE Conference on Decision and Control (CDC 2025)
Bayesian Ambiguity Contraction-based Adaptive Robust Markov Decision Processes for Adversarial Surveillance Missions
Collaborative Combat Aircraft (CCAs) are envisioned to enable autonomous Intelligence, Surveillance, and Reconnaissance (ISR) missions in contested environments, where adversaries may act strategically to deceive or evade detection. These missions pose challenges due to model uncertainty and the need for safe, real-time decision-making. Robust Markov Decision Processes (RMDPs) provide worst-case guarantees but are limited by static ambiguity sets that capture initial uncertainty without adapting to new observations. This paper presents an adaptive RMDP framework tailored to ISR missions with CCAs. We introduce a mission-specific formulation in which aircraft alternate between movement and sensing states. Adversarial tactics are modeled as a finite set of transition kernels, each capturing assumptions about how adversarial sensing or environmental conditions affect rewards. Our approach incrementally refines policies by eliminating inconsistent threat models, allowing agents to shift from conservative to aggressive behaviors while maintaining robustness. We provide theoretical guarantees showing that the adaptive planner converges as credible sets contract to the true threat and maintains safety under uncertainty. Experiments under Gaussian and non-Gaussian threat models across diverse network topologies show higher mission rewards and fewer exposure events compared to nominal and static robust planners.
Integrated YOLOP Perception and Lyapunov-based Control for Autonomous Mobile Robot Navigation on Track
This work presents a real-time autonomous track navigation framework for nonholonomic differential-drive mobile robots by jointly integrating multi-task visual perception and a provably stable tracking controller. The perception pipeline reconstructs lane centerlines using 2D-to-3D camera projection, arc-length based uniform point resampling, and cubic polynomial fitting solved via robust QR least-squares optimization. The controller regulates robot linear and angular velocities through a Lyapunov-stability grounded design, ensuring bounded error dynamics and asymptotic convergence of position and heading deviations even in dynamic and partially perceived lane scenarios, without relying on HD prior maps or global satellite localization. Real-world experiments on embedded platforms verify system fidelity, real-time execution, trajectory smoothness, and closed-loop stability for reliable autonomous navigation.
comment: This is a master's graduation thesis that has not been formally published. Uploaded with the author's copyright permission. No confidential content involved
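A minimal sketch of the cubic centerline fit via QR-based least squares described in the abstract above; the synthetic points, frame convention, and noise level are assumptions for illustration.

```python
# Sketch of the cubic centerline fit via QR least squares:
# fit y = c0 + c1*x + c2*x^2 + c3*x^3 to resampled centerline points.
import numpy as np

def fit_cubic_centerline(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    A = np.vander(x, N=4, increasing=True)          # columns [1, x, x^2, x^3]
    Q, R = np.linalg.qr(A)                          # QR factorization
    return np.linalg.solve(R, Q.T @ y)              # least-squares coefficients

if __name__ == "__main__":
    x = np.linspace(0.0, 5.0, 30)                   # arc-length-like abscissa
    y_true = 0.1 + 0.05 * x - 0.02 * x**2 + 0.004 * x**3
    y_meas = y_true + 0.01 * np.random.randn(x.size)
    print(fit_cubic_centerline(x, y_meas))
```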
Towards a Multi-Layer Defence Framework for Securing Near-Real-Time Operations in Open RAN
Securing the near-real-time (near-RT) control operations in Open Radio Access Networks (Open RAN) is increasingly critical, yet remains insufficiently addressed, as new runtime threats target the control loop while the system is operational. In this paper, we propose a multi-layer defence framework designed to enhance the security of near-RT RAN Intelligent Controller (RIC) operations. We classify operational-time threats into three categories, message-level, data-level, and control logic-level, and design and implement a dedicated detection and mitigation component for each: a signature-based E2 message inspection module performing structural and semantic validation of signalling exchanges, a telemetry poisoning detector based on temporal anomaly scoring using an LSTM network, and a runtime xApp attestation mechanism based on execution-time hash challenge-response. The framework is evaluated on an O-RAN testbed comprising FlexRIC and a commercial RAN emulator, demonstrating effective detection rates, low latency overheads, and practical integration feasibility. Results indicate that the proposed safeguards can operate within near-RT time constraints while significantly improving protection against runtime attacks, introducing less than 80 ms overhead for a network with 500 User Equipment (UEs). Overall, this work lays the foundation for deployable, layered, and policy-driven runtime security architectures for the near-RT RIC control loop in Open RAN, and provides an extensible framework into which future mitigation policies and threat-specific modules can be integrated.
comment: This is the authors' preprint version. The manuscript has been submitted to the IEEE
Combined Effects of Transient Ionizing and Electromagnetic Pulse on Vertical NPN Bipolar Transistor
Combined effects of transient ionizing and electromagnetic pulse on vertical NPN bipolar transistor were experimentally investigated under pulsed X-ray irradiation. Technology computer-aided design (TCAD) simulation method was also employed to explore the underlying physical mechanisms. The results demonstrate that the combined effect of a positive pulse injected into the collector (CEMP) and pulsed X-ray irradiation exceeds the linear superposition of their individual effects. Conversely, the combined effect of a positive pulse injected into the base (BEMP) and pulsed X-ray irradiation aligns closely with the results observed under BEMP acting alone. Mechanism analysis reveals that when CEMP and pulsed X-ray irradiation act simultaneously, there is a significant increase in both the drift photocurrent at the collector junction and the diffusion photocurrent near the collector junction. However, when BEMP and pulsed X-ray irradiation act simultaneously, these photocurrent components remain small, leading to a combined effect similar to the results observed when BEMP acts alone. These findings provide critical insights for the radiation-hardening design of bipolar circuits in harsh radiation environments.
comment: 11 pages, 16 figures
On robotic manipulators with time-dependent inertial parameters: From physical consistency to boundedness of the mass matrix
We generalize the robotics equation describing the dynamics of an open kinematic chain to include the effect of time-dependent changes in inertial parameters as well as the effects of their cause, i.e., the time dependency of mass distributions originating from parasitic movements of mass-carrying particles. The results generate insight that allows linking the novel concepts of uniform physical consistency and upper boundedness of inertial parameters -- ruling out approaching the edge of physical inconsistency or divergence -- with the existence of finite, positive uniform bounds on the mass matrix.
comment: to be published in Nonlinear Dynamics
A Unified Bayesian Framework for Stochastic Data-Driven Smoothing, Prediction, and Control
Extending data-driven algorithms based on Willems' fundamental lemma to stochastic data often requires empirical and customized workarounds. This work presents a unified Bayesian framework that provides a systematic and general method for handling stochastic data-driven tasks, including smoothing, prediction, and control, via maximum a posteriori estimation. This framework formulates a unified trajectory estimation problem for the three tasks by specifying different types of trajectory knowledge. Then, a Bayesian problem is solved that optimally combines trajectory knowledge with a data-driven characterization of the trajectory from offline data for a general class of stochastic disturbances. Under specific conditions, this problem is shown to generalize existing data-driven prediction and control algorithms. Numerical examples demonstrate the performance of the unified approach for all three tasks against other data-driven and system identification approaches.
Value of Communication in Goal-Oriented Semantic Communications: A Pareto Analysis
Emerging cyber-physical systems increasingly operate under stringent communication constraints that preclude the reliable transmission of their extensive machine-type data streams. Since raw measurements often contain correlated or redundant components, effective operation depends not on transmitting all available data but on selecting the information that contributes to achieving the objectives of the system. Beyond accuracy, goal-oriented semantic communication assesses the \emph{value of information} and aims to generate and transmit only what is relevant and at the right time. Motivated by this perspective, this work studies the \emph{value of communication} through the canonical setting of remote estimation of Markov sources, where a value-of-information measure quantifies the relevance of information. We investigate how optimal estimation performance varies with the available communication budget and determine the marginal performance gain attributable to additional communication. Our approach is based on a \emph{Pareto analysis} that characterizes the complete set of policies that achieve optimal trade-offs between estimation performance and communication cost. The value of communication is defined as the absolute slope of the resulting Pareto frontier. Although computing this frontier is non-trivial, we demonstrate that in our setting it admits a notably tractable structure: it is strictly decreasing, convex, and piecewise linear, and its slope is governed by a finite collection of constants. Moreover, each Pareto-optimal operating point is realizable as a convex combination of two stationary deterministic policies, enabling practical implementation. Leveraging these structural insights, we introduce SPLIT, an efficient and provably optimal algorithm for constructing the complete Pareto frontier.
comment: This work has been submitted to the IEEE for possible publication
RE-LLM: Integrating Large Language Models into Renewable Energy Systems
Energy system models are increasingly employed to guide long-term planning in multi-sectoral environments where decisions span electricity, heat, transport, land use, and industry. While these models provide rigorous quantitative insights, their outputs are often highly technical, making them difficult to interpret for non-expert stakeholders such as policymakers, planners, and the public. This communication gap limits the accessibility and practical impact of scenario-based modeling, particularly as energy transitions grow more complex with rising shares of renewables, sectoral integration, and deep uncertainties. To address this challenge, we propose the Renewable Energy Large Language Model (RE-LLM), a hybrid framework that integrates Large Language Models (LLMs) directly into the energy system modeling workflow. RE-LLM combines three core elements: (i) optimization-based scenario exploration, (ii) machine learning surrogates that accelerate computationally intensive simulations, and (iii) LLM-powered natural language generation that translates complex results into clear, stakeholder-oriented explanations. This integrated design not only reduces computational burden but also enhances interpretability, enabling real-time reasoning about trade-offs, sensitivities, and policy implications. The framework is adaptable across different optimization platforms and energy system models, ensuring broad applicability beyond the case study presented. By merging speed, rigor, and interpretability, RE-LLM advances a new paradigm of human-centric energy modeling. It enables interactive, multilingual, and accessible engagement with future energy pathways, ultimately bridging the final gap between data-driven analysis and actionable decision-making for sustainable transitions.
Pascal-Weighted Genetic Algorithms: A Binomially-Structured Recombination Framework
This paper introduces a new family of multi-parent recombination operators for Genetic Algorithms (GAs), based on normalized Pascal (binomial) coefficients. Unlike classical two-parent crossover operators, Pascal-Weighted Recombination (PWR) forms offspring as structured convex combinations of multiple parents, using binomially shaped weights that emphasize central inheritance while suppressing disruptive variance. We develop a mathematical framework for PWR, derive variance-transfer properties, and analyze its effect on schema survival. The operator is extended to real-valued, binary/logit, and permutation representations. We evaluate the proposed method on four representative benchmarks: (i) PID controller tuning evaluated using the ITAE metric, (ii) FIR low-pass filter design under magnitude-response constraints, (iii) wireless power-modulation optimization under SINR coupling, and (iv) the Traveling Salesman Problem (TSP). We demonstrate that, across these benchmarks, PWR consistently yields smoother convergence and reduced variance, and achieves 9-22% performance gains over standard recombination operators. The approach is simple, algorithm-agnostic, and readily integrable into diverse GA architectures.
comment: 23 pages, 8 figures
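The core recombination step lends itself to a short sketch: offspring formed as convex combinations of k parents with normalized binomial (Pascal) weights. The real-valued variant below is illustrative; the paper's binary/logit and permutation extensions are not reproduced.

```python
# Sketch of Pascal-weighted multi-parent recombination for real-valued genomes:
# offspring = sum_i w_i * parent_i with w_i = C(k-1, i) / 2^(k-1).
import math
import numpy as np

def pascal_weights(k: int) -> np.ndarray:
    w = np.array([math.comb(k - 1, i) for i in range(k)], dtype=float)
    return w / w.sum()                               # normalized, sums to 1

def pwr_offspring(parents: np.ndarray) -> np.ndarray:
    """parents: (k, d) array of k real-valued parent genomes."""
    w = pascal_weights(parents.shape[0])
    return w @ parents                               # convex combination

if __name__ == "__main__":
    parents = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])  # k = 3 parents
    print(pascal_weights(3))                          # [0.25, 0.5, 0.25]
    print(pwr_offspring(parents))
```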
Input-Output Data-Driven Representation: Non-Minimality and Stability
Many recent data-driven control approaches for linear time-invariant systems are based on finite-horizon prediction of output trajectories using input-output data matrices. When applied recursively, this predictor forms a dynamic system representation. This data-driven representation is generally non-minimal, containing latent poles in addition to the system's original poles. In this article, we show that these latent poles are guaranteed to be stable through the use of the Moore-Penrose inverses of the data matrices, regardless of the system's stability and even in the presence of small noise in data. This result obviates the need to eliminate the latent poles through procedures that resort to low-rank approximation in data-driven control and analysis. It is then applied to construct a stabilizable and detectable realization from data, from which we design an output feedback linear quadratic regulator (LQR) controller. Furthermore, we extend this principle to data-driven inversion, enabling asymptotic unknown input estimation for minimum-phase systems.
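The finite-horizon input-output predictor analyzed above can be sketched with Hankel data matrices and the Moore-Penrose inverse, as below; the toy first-order system, horizons, and noise-free data are assumptions, and the article's latent-pole analysis and LQR/inversion constructions are not reproduced.

```python
# Minimal sketch of a finite-horizon data-driven output predictor built from
# past/future Hankel matrices and the Moore-Penrose pseudoinverse.
import numpy as np

def block_hankel(w: np.ndarray, L: int) -> np.ndarray:
    """w: (T, m) signal -> (L*m, T-L+1) block-Hankel matrix."""
    T, m = w.shape
    cols = T - L + 1
    return np.vstack([w[i:i + cols].T for i in range(L)])

def dd_predict(ud, yd, u_past, y_past, u_future, Tp, Tf):
    m, p = ud.shape[1], yd.shape[1]
    Hu, Hy = block_hankel(ud, Tp + Tf), block_hankel(yd, Tp + Tf)
    Up, Uf = Hu[:Tp * m], Hu[Tp * m:]
    Yp, Yf = Hy[:Tp * p], Hy[Tp * p:]
    rhs = np.concatenate([u_past.ravel(), y_past.ravel(), u_future.ravel()])
    g = np.linalg.pinv(np.vstack([Up, Yp, Uf])) @ rhs   # Moore-Penrose inverse
    return (Yf @ g).reshape(Tf, p)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b = 0.8, 1.0                                     # toy first-order SISO system
    ud = rng.standard_normal((200, 1))
    yd = np.zeros((200, 1))
    for k in range(1, 200):
        yd[k] = a * yd[k - 1] + b * ud[k - 1]
    Tp, Tf = 5, 3
    print(dd_predict(ud, yd, ud[:Tp], yd[:Tp], np.ones((Tf, 1)), Tp, Tf))
```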
Physics-Constrained Neural Dynamics: A Unified Manifold Framework for Large-Scale Power Flow Computation
Power flow analysis is a fundamental tool for power system analysis, planning, and operational control. Traditional Newton-Raphson methods suffer from limitations such as initial-value sensitivity and low efficiency in batch computation, while existing deep learning-based power flow solvers mostly rely on supervised learning, requiring pre-solving of numerous cases and struggling to guarantee physical consistency. This paper proposes a neural-physics power flow solution method based on manifold geometry and gradient flow: the power flow equations are described as a constraint manifold, an energy function $V(\mathbf{x}) = \frac{1}{2}\|\mathbf{F}(\mathbf{x})\|^2$ and gradient flow $\frac{d\mathbf{x}}{dt} = -\nabla V(\mathbf{x})$ are constructed, and power flow solving is transformed into an equilibrium-point-finding problem for dynamical systems. Neural networks are trained in an unsupervised manner by directly minimizing physical residuals, requiring no labeled data and achieving true "end-to-end" physics-constrained learning.
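A purely numerical sketch of the gradient-flow idea (without the neural network component): integrate $\frac{d\mathbf{x}}{dt} = -\nabla V(\mathbf{x})$ with $V(\mathbf{x}) = \frac{1}{2}\|\mathbf{F}(\mathbf{x})\|^2$ until the residual vanishes. The toy residual F below stands in for actual power flow mismatch equations and is an assumption for illustration.

```python
# Gradient-flow view of equation solving: explicit Euler integration of
# dx/dt = -grad V(x), with grad V(x) = J(x)^T F(x). The toy residual F plays
# the role of the power flow mismatch equations.
import numpy as np

def F(x: np.ndarray) -> np.ndarray:
    return np.array([x[0]**2 + x[1] - 1.5, x[0] - np.sin(x[1])])

def jacobian_fd(f, x, eps=1e-6):
    J = np.zeros((len(f(x)), len(x)))
    for j in range(len(x)):
        dx = np.zeros_like(x); dx[j] = eps
        J[:, j] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

def gradient_flow_solve(x0, step=0.05, iters=20000):
    x = x0.astype(float)
    for _ in range(iters):
        grad_V = jacobian_fd(F, x).T @ F(x)          # grad V = J(x)^T F(x)
        x -= step * grad_V                           # explicit Euler step
    return x

if __name__ == "__main__":
    x_star = gradient_flow_solve(np.array([1.0, 0.5]))
    print(x_star, F(x_star))                         # residual should be ~0
```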
Autonomous Navigation and Station-Keeping on Near-Rectilinear Halo Orbits
This article develops an optical navigation (OPNAV) and station-keeping pipeline for the near-rectilinear halo orbit (NRHO) in high-fidelity ephemeris model dynamics. The pipeline involves synthetic images used by the non-iterative horizon-based OPNAV algorithm, fed into an extended Kalman filter. The state estimate is used by a controller to maintain the spacecraft's motion within the vicinity of a reference NRHO. We study differential correction-based and minimization-based implementations of the x-axis crossing control scheme, and propose an improved targeting prediction scheme by incorporating the filter's state covariance with an unscented transform. We also introduce a hysteresis mechanism, which improves station-keeping cost and provides insight into the difference in performance between the differential correction-based and minimization-based approaches. We perform Monte-Carlo experiments to assess the pipeline's tracking and ΔV performances. We report several key findings, including the variability of the filter performance with the sensor field of view and measurement locations, station-keeping cost reduction achieved by the unscented transform-based prediction and hysteresis, as well as variability of the cumulative ΔV as a function of maneuver location due to the periodic structure in the OPNAV-based filter's estimation accuracy.
A TinyML Reinforcement Learning Approach for Energy-Efficient Light Control in Low-Cost Greenhouse Systems CEC 2025
This study presents a reinforcement learning (RL)-based control strategy for adaptive lighting regulation in controlled environments using a low-power microcontroller. A model-free Q-learning algorithm was implemented to dynamically adjust the brightness of a Light-Emitting Diode (LED) based on real-time feedback from a light-dependent resistor (LDR) sensor. The system was trained to stabilize at 13 distinct light intensity levels (L1 to L13), with each target corresponding to a specific range within the 64-state space derived from LDR readings. A total of 130 trials were conducted, covering all target levels with 10 episodes each. Performance was evaluated in terms of convergence speed, steps taken, and time required to reach target states. Box plots and histograms were generated to analyze the distribution of training time and learning efficiency across targets. Experimental validation demonstrated that the agent could effectively learn to stabilize at varying light levels with minimal overshooting and smooth convergence, even in the presence of environmental perturbations. This work highlights the feasibility of lightweight, on-device RL for energy-efficient lighting control and sets the groundwork for multi-modal environmental control applications in resource-constrained agricultural systems.
comment: Copyright 2025 IEEE. This is the author's version of the work that has been accepted for publication in Proceedings of the 5. Interdisciplinary Conference on Electrics and Computer (INTCEC 2025) 15-16 September 2025, Chicago-USA. The final version of record is available at: https://doi.org/10.1109/INTCEC65580.2025.11256135
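A compact sketch of the tabular Q-learning loop described above; the action set, reward shaping, toy environment, and hyperparameters are assumptions for illustration rather than the on-device implementation.

```python
# Tabular Q-learning sketch for the LED brightness setting described above:
# 64 discretized LDR states, a small dim/hold/brighten action set, and a
# reward that penalizes distance to the target light level.
import random

N_STATES, ACTIONS = 64, [-1, 0, +1]                  # brightness step actions
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]

def step(state: int, action: int, target: int):
    """Toy environment: the LDR state moves with the LED brightness action."""
    next_state = max(0, min(N_STATES - 1, state + action))
    reward = -abs(next_state - target)               # closer to target = better
    return next_state, reward

def q_update(state: int, target: int) -> int:
    a_idx = (random.randrange(len(ACTIONS)) if random.random() < EPSILON
             else max(range(len(ACTIONS)), key=lambda i: Q[state][i]))
    next_state, reward = step(state, ACTIONS[a_idx], target)
    td_target = reward + GAMMA * max(Q[next_state])
    Q[state][a_idx] += ALPHA * (td_target - Q[state][a_idx])
    return next_state

if __name__ == "__main__":
    s = 0
    for _ in range(20000):
        s = q_update(s, target=32)                   # e.g., a mid-range level
    print("greedy action at state 30:", ACTIONS[max(range(3), key=lambda i: Q[30][i])])
```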
Reverse Engineering and Control-Aware Security Analysis of the ArduPilot UAV Framework
Unmanned Aerial Vehicle (UAV) technologies are gaining high interest across many domains, which makes UAV security of utmost importance. ArduPilot is among the most widely used open-source autopilot UAV frameworks; yet, many studies demonstrate the vulnerabilities affecting such systems. Vulnerabilities within its communication subsystems (including WiFi, telemetry, or GPS) expose critical entry points, and vulnerabilities in ArduPilot can affect the control procedure. In this paper, we reconstruct the software architecture and the control models implemented by ArduPilot and then examine how these control models could potentially be misused to induce malicious behaviors while relying on legitimate inputs.
Edge-Native, Behavior-Adaptive Drone System for Wildlife Monitoring
Wildlife monitoring with drones must balance competing demands: approaching close enough to capture behaviorally-relevant video while avoiding stress responses that compromise animal welfare and data validity. Human operators face a fundamental attentional bottleneck: they cannot simultaneously control drone operations and monitor vigilance states across entire animal groups. By the time elevated vigilance becomes obvious, an adverse flee response by the animals may be unavoidable. To solve this challenge, we present an edge-native, behavior-adaptive drone system for wildlife monitoring. This configurable decision-support system augments operator expertise with automated group-level vigilance monitoring. Our system continuously tracks individual behaviors using YOLOv11m detection and YOLO-Behavior classification, aggregates vigilance states into a real-time group stress metric, and provides graduated alerts (alert vigilance to flee response) with operator-tunable thresholds for context-specific calibration. We derive service-level objectives (SLOs) from video frame rates and behavioral dynamics: to monitor 30fps video streams in real-time, our system must complete detection and classification within 33ms per frame. Our edge-native pipeline achieves 23.8ms total inference on GPU-accelerated hardware, meeting this constraint with a substantial margin. Retrospective analysis of seven wildlife monitoring missions demonstrates detection capability and quantifies the cost of reactive control: manual piloting results in 14 seconds average adverse behavior duration with 71.9% usable frames. Our analysis reveals operators could have received actionable alerts 51s before animals fled in 57% of missions. Simulating 5-second operator intervention yields a projected performance of 82.8% usable frames with 1-second adverse behavior duration, a 93% reduction compared to manual piloting.
comment: Accepted to DroneSys: First Workshop on Autonomous Drone Computing Systems and Applications at the ACM/IEEE Symposium on Edge Computing (SEC) 2025
Verifying Closed-Loop Contractivity of Learning-Based Controllers via Partitioning
We address the problem of verifying closed-loop contraction in nonlinear control systems whose controller and contraction metric are both parameterized by neural networks. By leveraging interval analysis and interval bound propagation, we derive a tractable and scalable sufficient condition for closed-loop contractivity that reduces to checking that the dominant eigenvalue of a symmetric Metzler matrix is nonpositive. We combine this sufficient condition with a domain partitioning strategy to integrate it into training. The proposed approach is validated on an inverted pendulum system, demonstrating the ability to learn neural network controllers and contraction metrics that provably satisfy the contraction condition.
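The final certificate reduces to a simple numerical check, sketched below: verify that the dominant eigenvalue of a symmetric Metzler matrix is nonpositive. How that matrix is obtained from interval bound propagation over a specific network is not shown, and the example matrix is an assumption.

```python
# Check the dominant eigenvalue of a symmetric Metzler matrix (nonnegative
# off-diagonal entries); nonpositive dominant eigenvalue certifies the condition.
import numpy as np

def is_symmetric_metzler(M: np.ndarray, tol: float = 1e-9) -> bool:
    off_diag = M - np.diag(np.diag(M))
    return np.allclose(M, M.T, atol=tol) and bool(np.all(off_diag >= -tol))

def certifies_contraction(M: np.ndarray) -> bool:
    assert is_symmetric_metzler(M), "expected a symmetric Metzler matrix"
    return float(np.max(np.linalg.eigvalsh(M))) <= 0.0

if __name__ == "__main__":
    M = np.array([[-2.0, 0.5, 0.0],
                  [0.5, -1.5, 0.3],
                  [0.0, 0.3, -1.0]])
    print(certifies_contraction(M))                  # True for this example
```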
Scalable Distributed Nonlinear Control Under Flatness-Preserving Coupling
We study distributed control for a network of nonlinear, differentially flat subsystems subject to dynamic coupling. Although differential flatness simplifies planning and control for isolated subsystems, the presence of coupling can destroy this property for the overall joint system. Focusing on subsystems in pure-feedback form, we identify a class of compatible lower-triangular dynamic couplings that preserve flatness and guarantee that the flat outputs of the subsystems remain the flat outputs of the coupled system. Further, we show that the joint flatness diffeomorphism can be constructed from those of the individual subsystems and, crucially, its sparsity structure reflects that of the coupling. Exploiting this structure, we synthesize a distributed tracking controller that computes control actions from local information only, thereby ensuring scalability. We validate our proposed framework on a simulated example of planar quadrotors dynamically coupled via aerodynamic downwash, and show that the distributed controller achieves accurate trajectory tracking.
Learning Dissipative Chaotic Dynamics with Boundedness Guarantee
Chaotic dynamics, commonly seen in weather systems and fluid turbulence, are characterized by their sensitivity to initial conditions, which makes accurate prediction challenging. Recent approaches have focused on developing data-driven models that attempt to preserve invariant statistics over long horizons since many chaotic systems exhibit dissipative behaviors and ergodicity. Despite the recent progress in such models, they are still often prone to generating unbounded trajectories, leading to invalid statistics evaluation. To address this fundamental challenge, we introduce a modular framework that provides formal guarantees of trajectory boundedness for neural network chaotic dynamics models. Our core contribution is a dissipative projection layer that leverages control-theoretic principles to ensure the learned system is dissipative. Specifically, our framework simultaneously learns a dynamics emulator and an energy-like function, where the latter is used to construct an algebraic dissipative constraint within the projection layer. Furthermore, the learned invariant level set provides an outer estimate for the system's strange attractor, which is known to be difficult to characterize due to its complex geometry. We demonstrate our model's ability to produce bounded long-horizon forecasts that preserve invariant statistics for chaotic dynamical systems, including Lorenz 96 and a reduced-order model of the Kuramoto-Sivashinsky equation.
An Exact, Finite Dimensional Representation for Full-Block, Circle Criterion Multipliers
This paper provides the first finite-dimensional characterization for the complete set of full-block, circle criterion multipliers. We consider the interconnection of a discrete-time, linear time-invariant system in feedback with a non-repeated, sector-bounded nonlinearity. Sufficient conditions for stability and performance can be derived using: (i) dissipation inequalities, and (ii) Quadratic Constraints (QCs) that bound the input/output pairs of the nonlinearity. Larger classes of QCs (or multipliers) reduce the conservatism of the conditions. Full-block, circle criterion multipliers define the complete set of all possible QCs for non-repeated, sector-bounded nonlinearities. These provide the least conservative conditions. However, full-block multipliers are defined by an uncountably infinite number of constraints and hence do not lead to computationally tractable solutions if left in this raw form. This paper provides a new finite-dimensional characterization for the set of full-block, circle criterion multipliers. The key theoretical insight is: the set of all input/output pairs of non-repeated sector-bounded nonlinearities is equal to the set of all incremental pairs for an appropriately constructed piecewise linear function. Our new description for the complete set of multipliers only requires a finite number of matrix copositivity constraints. These conditions have an exact, computationally tractable implementation for problems where the nonlinearity has small input/output dimensions $(\le 4)$. We illustrate the use of our new characterization via a simple example.
Meta-Reinforcement Learning for Building Energy Management System
The building sector is one of the largest contributors to global energy consumption. Improving its energy efficiency is essential for reducing operational costs and greenhouse gas emissions. Energy management systems (EMS) play a key role in monitoring and controlling building appliances efficiently and reliably. With the increasing integration of renewable energy, intelligent EMS solutions have received growing attention. Reinforcement learning (RL) has recently been explored for this purpose and shows strong potential. However, most RL-based EMS methods require a large number of training steps to learn effective control policies, especially when adapting to unseen buildings, which limits their practical deployment. This paper introduces MetaEMS, a meta-reinforcement learning framework for EMS. MetaEMS improves learning efficiency by transferring knowledge from previously solved tasks to new ones through group-level and building-level adaptation, enabling fast adaptation and effective control across diverse building environments. Experimental results demonstrate that MetaEMS adapts more rapidly to unseen buildings and consistently outperforms baseline methods across various scenarios.
comment: arXiv admin note: text overlap with arXiv:1909.10165 by other authors
Hidden Markov Models for Stock Market Prediction
The stock market presents a challenging environment for accurately predicting future stock prices due to its intricate and ever-changing nature. However, the utilization of advanced methodologies can significantly enhance the precision of stock price predictions. One such method is Hidden Markov Models (HMMs). HMMs are statistical models that can be used to model the behavior of a partially observable system, making them suitable for modeling stock prices based on historical data. Accurate stock price predictions can help traders make better investment decisions, leading to increased profits. In this article, we trained and tested a Hidden Markov Model for the purpose of predicting a stock closing price based on its opening price and the preceding day's prices. The model's performance has been evaluated using two indicators: Mean Average Prediction Error (MAPE), which specifies the average accuracy of our model, and Directional Prediction Accuracy (DPA), a newly introduced indicator that accounts for the number of fractional change predictions that are correct in sign.
comment: This version contains some errors and will be updated soon. The main ideas remain valid; corrections concern structure and implementation details
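A small sketch of the two evaluation indicators mentioned above, computed from open, actual closing, and predicted closing prices; the exact definitions in the paper (and its pending corrections) may differ, so this is only for intuition.

```python
# Illustrative error and directional-accuracy metrics for close-price prediction.
import numpy as np

def mape(actual_close: np.ndarray, predicted_close: np.ndarray) -> float:
    """Mean absolute percentage error of the predicted closing prices."""
    return float(np.mean(np.abs((actual_close - predicted_close) / actual_close))) * 100.0

def dpa(open_price: np.ndarray, actual_close: np.ndarray,
        predicted_close: np.ndarray) -> float:
    """Fraction of days where the predicted fractional change has the correct sign."""
    actual_sign = np.sign(actual_close - open_price)
    predicted_sign = np.sign(predicted_close - open_price)
    return float(np.mean(actual_sign == predicted_sign))

if __name__ == "__main__":
    o = np.array([100.0, 101.0, 99.0])
    c = np.array([101.0, 100.5, 99.5])
    c_hat = np.array([100.8, 101.2, 99.3])
    print(mape(c, c_hat), dpa(o, c, c_hat))
```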
Decision-Focused Learning for Neural Network-Constrained HVAC Scheduling
Heating, Ventilation, and Air Conditioning (HVAC) is a major electricity end-use with a substantial potential for providing grid services, such as demand response. Harnessing this flexibility requires accurate modeling of the thermal dynamics of buildings, a difficult task because nonlinear heat transfer and recurring daily cycles make historical data highly correlated and insufficient to generalize to new weather, occupancy, and control scenarios. This paper presents an HVAC management system formulated as a Mixed Integer Quadratic Program (MIQP), where Neural Network (NN) models of thermal dynamics are embedded as exact mixed-integer linear constraints. Unlike traditional training approaches that minimize prediction errors, we employ Decision-Focused Learning (DFL) to learn the NN parameters with the objective of directly improving the HVAC cost performance. However, the discrete nature of MIQP hinders DFL, as it leads to undefined and discontinuous gradients, thus impeding standard gradient-based training. We leverage Stochastic Smoothing (SS) to enable efficient gradient computation without the need to differentiate the MIQP. Experiments on a realistic five-zone building using a high-fidelity simulator demonstrate that the proposed SS-DFL approach outperforms conventional identify-then-optimize (i.e., the thermal dynamics model is identified on historical data then used in optimization) and relaxed DFL methods in both cost savings and grid service performance, highlighting its potential for scalable, grid-aware building control.
comment: 14 pages; accepted IEEE Transactions on Smart Grid;
Global Convergence of Policy Gradient for Entropy Regularized Linear-Quadratic Control with Multiplicative Noise
Reinforcement Learning (RL) has emerged as a powerful framework for sequential decision-making in dynamic environments, particularly when system parameters are unknown. This paper investigates RL-based control for entropy-regularized linear-quadratic (LQ) control problems with multiplicative noise over an infinite time horizon. First, we adapt the regularized policy gradient (RPG) algorithm to stochastic optimal control settings, proving that despite the non-convexity of the problem, RPG converges globally under conditions of gradient domination and almost-smoothness. Second, based on a zero-order optimization approach, we introduce a novel model-free RL algorithm: sample-based regularized policy gradient (SB-RPG). SB-RPG operates without knowledge of system parameters yet still retains strong theoretical guarantees of global convergence. Our model leverages entropy regularization to address the exploration versus exploitation trade-off inherent in RL. Numerical simulations validate the theoretical results and demonstrate the efficiency of SB-RPG in environments with unknown parameters.
comment: The authors found that article contains some theoretical errors and decided to withdraw from arxiv
Dynamical State Feedback Control for Linear Input Delay Systems, Part I: Dissipative Stabilization via Semidefinite Programming
We propose an SDP-based framework to address the stabilization of input delay systems while taking into account dissipative constraints. A key to our approach is the introduction of the concept of parameterized linear dynamical state feedbacks (LDSFs), which draws inspiration from recent advancements in the analyses of distributed delays. The parameterized LDSFs generalize conventional predictor controllers, where the interpretation of state prediction is concealed and their degree of parameterization can be increased by adjusting the integral kernels. A sufficient condition for the existence of dissipative LDSFs is formulated as matrix inequalities by constructing a complete-type Krasovskiĭ functional. To solve the bilinear matrix inequality in the synthesis condition, we employ an off-line inner convex approximation algorithm that can be initialized using the gains of predictor controllers obtained via explicit construction. Thus, the unknowns of our LTDS can be computed by solving convex semidefinite programs. Numerical examples and simulations demonstrate the validity and effectiveness of our methodology.
L2RU: a Structured State Space Model with prescribed L2-bound
Structured state-space models (SSMs) have recently emerged as a powerful architecture at the intersection of machine learning and control, featuring layers composed of discrete-time linear time-invariant (LTI) systems followed by pointwise nonlinearities. These models combine the expressiveness of deep neural networks with the interpretability and inductive bias of dynamical systems, offering strong performance on long-sequence tasks with favorable computational complexity. However, their adoption in applications such as system identification and optimal control remains limited by the difficulty of enforcing stability and robustness in a principled and tractable manner. We introduce L2RU, a class of SSMs endowed with a prescribed $\mathcal{L}_2$-gain bound, guaranteeing input--output stability and robustness for all parameter values. The L2RU architecture is derived from free parametrizations of LTI systems satisfying an $\mathcal{L}_2$ constraint, enabling unconstrained optimization via standard gradient-based methods while preserving rigorous stability guarantees. Specifically, we develop two complementary parametrizations: a non-conservative formulation that provides a complete characterization of square LTI systems with a given $\mathcal{L}_2$-bound, and a conservative formulation that extends the approach to general (possibly non-square) systems while improving computational efficiency through a structured representation of the system matrices. Both parametrizations admit efficient initialization schemes that facilitate training long-memory models. We demonstrate the effectiveness of the proposed framework on a nonlinear system identification benchmark, where L2RU achieves improved performance and training stability compared to existing SSM architectures, highlighting its potential as a principled and robust building block for learning and control.
A distributed motion planning approach to cooperative underwater acoustic source tracking
Accurate tracking of underwater acoustic sources is critical for a variety of marine applications, yet remains a challenging task due to communication constraints and environmental uncertainties. In this regard, this paper addresses the problem of underwater acoustic source tracking using a team of autonomous underwater vehicles (AUVs). The core idea is to optimize the guidance of each agent to achieve coordinated motion planning that leads to optimal geometric configurations with respect to the target, thereby enhancing tracking performance. To tackle this, we propose a Distributed Model Predictive Control (DMPC) framework to improve performance and robustness. The control problem is formulated as a multi-objective optimization task, incorporating geometric observability, proximity to the target, and communication connectivity. A Receding Horizon Control (RHC) approach, coupled with an Unscented Transform (UT)-based prediction scheme, is employed to ensure long-term tracking accuracy while accounting for uncertainties. The optimization is distributed using the sequential multi-agent decision-making framework, combined with the Time-Division Multiple Access (TDMA) communication protocol. The proposed methodology is implemented in a simulation environment that accounts for the constraints of acoustic communication. The approach is compared with existing methods such as decentralized MPC and Particle Swarm Optimization (PSO).
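For readers unfamiliar with the UT-based prediction step mentioned above, the following minimal sketch shows the standard unscented transform: propagate sigma points through a nonlinear model and recover the predicted mean and covariance. The target-motion model `f` and the sigma-point parameters are illustrative placeholders, not the paper's exact choices.

```python
# Minimal sketch of the standard unscented transform (illustrative parameters).
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate (mean, cov) through a nonlinear function f using 2n+1 sigma points."""
    n = mean.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)
    sigma = np.vstack([mean, mean + S.T, mean - S.T])        # 2n+1 sigma points
    wm = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1 - alpha**2 + beta)
    y = np.array([f(s) for s in sigma])                      # propagated points
    mean_y = wm @ y
    cov_y = sum(w * np.outer(p - mean_y, p - mean_y) for w, p in zip(wc, y))
    return mean_y, cov_y

# e.g., predict the target state over one step: f = lambda x: x + dt * velocity_model(x)
# (dt and velocity_model are hypothetical here)
```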
How to Adapt Control Barrier Functions? A Learning-Based Approach with Applications to a VTOL Quadplane
In this paper, we present a novel theoretical framework for online adaptation of Control Barrier Function (CBF) parameters, i.e., of the class K functions included in the CBF condition, under input constraints. We introduce the concept of locally validated CBF parameters, which are adapted online to guarantee finite-horizon safety, based on conditions derived from Nagumo's theorem and tangent cone analysis. To identify these parameters online, we integrate a learning-based approach with an uncertainty-aware verification process that accounts for both epistemic and aleatoric uncertainties inherent in neural network predictions. Our method is demonstrated on a VTOL quadplane model during challenging transition and landing maneuvers, showcasing enhanced performance while maintaining safety.
comment: 2025 IEEE Conference on Decision and Control (CDC). Project page: https://www.taekyung.me/how-to-adapt-cbf
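To make the role of the adapted class-K parameter concrete, here is a minimal scalar-input CBF safety filter in which the gain `alpha` is exactly the quantity such methods adapt online. This is only a sketch under simplifying assumptions (single affine constraint, scalar input, fixed `alpha`); all names are illustrative and the learned adaptation and verification machinery is not reproduced.

```python
# Hedged sketch: single-constraint CBF filter for a scalar input, closed-form.
import numpy as np

def cbf_filter(u_des, h, Lfh, Lgh, alpha, u_min, u_max):
    """min (u - u_des)^2  s.t.  Lfh + Lgh*u + alpha*h >= 0, u in [u_min, u_max].
    `alpha` plays the role of the class-K gain that adaptation methods tune online."""
    if abs(Lgh) < 1e-9:                       # constraint does not depend on u
        return np.clip(u_des, u_min, u_max)
    u_bound = -(Lfh + alpha * h) / Lgh        # boundary of the safe half-space
    if Lgh > 0:
        u_safe = max(u_des, u_bound)          # constraint requires u >= u_bound
    else:
        u_safe = min(u_des, u_bound)          # constraint requires u <= u_bound
    return np.clip(u_safe, u_min, u_max)      # clipping may break safety: this is
                                              # precisely why alpha must respect input limits
```

Note that a poorly chosen `alpha` can make the constraint infeasible under the input box, which is the failure mode that motivates adapting the parameter in the first place.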
Differentiable-by-design Nonlinear Optimization for Model Predictive Control
Nonlinear optimization-based control policies, such as those arising in nonlinear Model Predictive Control, have seen remarkable success in recent years. These policies require solving computationally demanding nonlinear optimization programs online at each time-step. The resulting solution map, viewed as a function of the measured state of the system and design parameters, may not be differentiable, which poses significant challenges if the control policy is embedded in a gradient-based policy optimization scheme. We propose a principled way to regularize the nonlinear optimization problem, obtaining a surrogate derivative even when the original problem is not differentiable. The surrogate problem is differentiable by design and its solution map coincides with the solution of the unregularized problem. We demonstrate the effectiveness of our approach in a free-final-time optimal control problem and a receding-horizon nonlinear MPC example.
Set-based and Dynamical Feedback-augmented Hands-off Control
A novel set-theoretical approach to hands-off control is proposed, focusing on spatial arguments for command limitation rather than temporal ones. By employing dynamical feedback alongside invariant set-based constraints, actuation is employed only to drive the system's state within a "hands-off region" of its state-space, where the plant can freely evolve in open-loop configuration. A computationally-efficient procedure with strong theoretical guarantees is devised, and its effectiveness is showcased via an intuitive practical example.
comment: 10 pages, 4 figures
Assessing the Technical and Environmental Impacts of Energy Management Systems in Smart Ports
A vital strategy for ports to mitigate the environmental impact of the maritime industry, while complying with frameworks such as the European Green Deal and the Sustainable Development Goals (SDGs), entails the systematic implementation of comprehensive energy management solutions. This paper provides a baseline evaluation of energy management system (EMS) implementation and its impact on energy consumption, carbon emissions, and operational costs in smart ports. Initially, we provide a systematic review of the literature focusing on case studies from prominent ports, including Hamburg, Genoa, Jurong, and Shanghai Yangshan Phase IV. The analysis emphasises key aspects such as energy efficiency, reductions in emissions, and the minimization of operational costs. Subsequently, we formulate an optimisation model to simulate load dispatch, carbon emission reduction, and transport scheduling. Results indicate that EMS deployment significantly reduces annual energy consumption and carbon emissions, by approximately 7%-8% and 11%-12% respectively, while achieving substantial cost savings of 30%. The study also identifies critical challenges, including system integration, data quality issues, cybersecurity risks, and the need for standardization. These findings provide valuable insights for port authorities and policymakers, supporting the transition toward more sustainable and efficient port operations.
Time-causal and time-recursive wavelets
When applying wavelet analysis to real-time temporal signals, where the future cannot be accessed, it is essential to base all the steps in the signal processing pipeline on computational mechanisms that are truly time-causal. This paper describes how a time-causal wavelet analysis can be performed based on concepts developed in the area of temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. By necessity, convolution with truncated exponential kernels in cascade constitutes the only permissible class of kernels, complemented by their temporal derivatives to fulfil the admissibility conditions of wavelet representations. For a particular way of choosing the time constants in the resulting infinite convolution of truncated exponential kernels, to ensure temporal scale covariance and thus self-similarity over temporal scales, we describe how mother wavelets can be chosen as temporal derivatives of the resulting time-causal limit kernel. By developing connections between wavelet theory and scale-space theory, we characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signals. We propose that this notion of time-causal wavelet analysis could be a valuable tool for signal processing tasks, where streams of signals are to be processed in real time, specifically for signals that may contain local variations over a rich span of temporal scales, or more generally for analysing physical or biophysical temporal phenomena, where a fully time-causal analysis is called for to be physically realistic.
comment: 28 pages, 11 figures
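A simple way to get intuition for the construction is to cascade causal first-order recursive filters (the discrete counterpart of truncated exponential kernels) with geometrically growing time constants and then take a temporal difference. The sketch below is only an illustrative approximation of that idea, not the paper's time-causal limit kernel; the discretization, ratio `c`, and depth `K` are assumptions.

```python
# Hedged sketch: cascade of causal first-order recursive filters (truncated
# exponential kernels) followed by a first temporal difference.
import numpy as np

def exp_smooth(x, mu):
    """One causal first-order recursive filter with time constant mu
    (one simple discretization; other choices exist)."""
    y = np.zeros(len(x))
    a = 1.0 / (1.0 + mu)
    for t in range(len(x)):
        prev = y[t - 1] if t > 0 else 0.0
        y[t] = prev + a * (x[t] - prev)
    return y

def time_causal_response(x, mu0=1.0, c=2.0, K=6):
    """Cascade K filters with geometrically growing time constants (ratio c),
    then take a first-order temporal difference as a crude wavelet-like detector."""
    y = np.asarray(x, dtype=float)
    for k in range(K):
        y = exp_smooth(y, mu0 * c**k)
    return np.diff(y, prepend=y[0])
```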
Curvature-Constrained Vector Field for Motion Planning of Nonholonomic Robots
Vector fields are advantageous in handling nonholonomic motion planning as they provide reference orientation for robots. However, additionally incorporating curvature constraints becomes challenging, due to the interconnection between the design of the curvature-bounded vector field and the tracking controller under underactuation. In this paper, we present a novel framework to co-develop the vector field and the control laws, guiding the nonholonomic robot to the target configuration with curvature-bounded trajectory. First, we formulate the problem by introducing the target positive limit set, which allows the robot to converge to or pass through the target configuration, depending on different dynamics and tasks. Next, we construct a curvature-constrained vector field (CVF) via blending and distributing basic flow fields in workspace and propose the saturated control laws with a dynamic gain, under which the tracking error's magnitude decreases even when saturation occurs. Under the control laws, kinematically constrained nonholonomic robots are guaranteed to track the reference CVF and converge to the target positive limit set with bounded trajectory curvature. Numerical simulations show that the proposed CVF method outperforms other vector-field-based algorithms. Experiments on Ackermann UGVs and semi-physical fixed-wing UAVs demonstrate that the method can be effectively implemented in real-world scenarios.
comment: IEEE T-RO accepted, 20 pages, 22 figures
Autocratic strategies in Cournot oligopoly game
An oligopoly is a market in which the price of goods is controlled by a few firms. Cournot introduced the simplest game-theoretic model of oligopoly, where profit-maximizing behavior of each firm results in market failure. Furthermore, when the Cournot oligopoly game is infinitely repeated, firms can tacitly collude to monopolize the market. Such tacit collusion is realized by the same mechanism as direct reciprocity in the repeated prisoner's dilemma game, where mutual cooperation can be realized whereas defection is favorable for both prisoners in a one-shot game. Recently, in the repeated prisoner's dilemma game, a class of strategies called zero-determinant strategies has attracted much attention in the context of direct reciprocity. Zero-determinant strategies are autocratic strategies which unilaterally control payoffs of players by enforcing linear relationships between payoffs. There have been many attempts to find zero-determinant strategies in other games and to extend them so as to apply them to broader situations. In this paper, first, we show that zero-determinant strategies exist even in the repeated Cournot oligopoly game, and that they are quite different from those in the repeated prisoner's dilemma game. In particular, we prove that a fair zero-determinant strategy exists, which is guaranteed to obtain the average payoff of the opponents. Second, we numerically show that the fair zero-determinant strategy can be used to promote collusion when it is used against an adaptively learning player, whereas it cannot promote collusion when it is used against two adaptively learning players. Our findings elucidate some negative impacts of zero-determinant strategies in the oligopoly market.
comment: 24 pages, 4 figures
Dissipativity-Based Synthesis of Distributed Control and Communication Topology Co-Design for AC Microgrids
This paper introduces a dissipativity-based framework for the joint design of distributed controllers and communication topologies in AC microgrids (MGs), providing robust performance guarantees for voltage regulation, frequency synchronization, and proportional power sharing across distributed generators (DGs). The closed-loop AC MG is represented as a networked system in which DGs, distribution lines, and loads function as interconnected subsystems linked through cyber-physical networks. Each DG utilizes a three-layer hierarchical control structure: a steady-state controller for operating point configuration, a local feedback controller for voltage tracking, and a distributed droop-free controller implementing normalized power consensus for frequency coordination and proportional power distribution. The operating point design is formulated as an optimization problem. Leveraging dissipativity theory, we derive necessary and sufficient subsystem dissipativity conditions. The global co-design is then cast as a convex linear matrix inequality (LMI) optimization that jointly determines distributed controller parameters and sparse communication architecture while managing the highly nonlinear, coupled dq-frame dynamics characteristic of AC systems.
Fast 3D Surrogate Modeling for Data Center Thermal Management AAAI 2026
Reducing energy consumption and carbon emissions in data centers by enabling real-time temperature prediction is critical for sustainability and operational efficiency. Achieving this requires accurate modeling of the 3D temperature field to capture airflow dynamics and thermal interactions under varying operating conditions. Traditional thermal CFD solvers, while accurate, are computationally expensive and require expert-crafted meshes and boundary conditions, making them impractical for real-time use. To address these limitations, we develop a vision-based surrogate modeling framework that operates directly on a 3D voxelized representation of the data center, incorporating server workloads, fan speeds, and HVAC temperature set points. We evaluate multiple architectures, including 3D CNN U-Net variants, a 3D Fourier Neural Operator, and 3D vision transformers, to map these thermal inputs to high-fidelity heat maps. Our results show that the surrogate models generalize across data center configurations and significantly speed up computations (20,000x), reducing runtimes from hours to hundreds of milliseconds. This fast and accurate estimation of hot spots and temperature distribution enables real-time cooling control and workload redistribution, leading to substantial energy savings (7%) and reduced carbon footprint.
comment: Submitted to AAAI 2026 Conference
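To make the input/output convention of such voxel-based surrogates concrete, here is a toy PyTorch stand-in: channels for rasterized workload, fan speed, and set point map to a 3D temperature field. This is only an illustrative sketch of the data layout, not the paper's U-Net, FNO, or transformer architectures.

```python
# Toy stand-in (not the paper's model): 3 rasterized input channels -> temperature field.
import torch
import torch.nn as nn

class TinyThermalSurrogate(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, ch, 3, padding=1), nn.ReLU(),   # workload, fan speed, set point
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 1, 1),                          # predicted temperature channel
        )

    def forward(self, x):            # x: (batch, 3, D, H, W) voxel grid
        return self.net(x)           # (batch, 1, D, H, W) temperature field

model = TinyThermalSurrogate()
temps = model(torch.randn(2, 3, 32, 32, 16))   # e.g. a 32x32x16 voxelization
```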
Robotics
Ethically-Aware Participatory Design of a Productivity Social Robot for College Students
College students often face academic and life stressors affecting productivity, especially students with Attention Deficit Hyperactivity Disorder (ADHD) who experience executive functioning challenges. Conventional productivity tools typically demand sustained self-discipline and consistent use, which many students struggle with, leading to disruptive app-switching behaviors. Socially Assistive Robots (SARs), known for their intuitive and interactive nature, offer promising potential to support productivity in academic environments, having been successfully utilized in domains like education, cognitive development, and mental health. To leverage SARs effectively in addressing student productivity, this study employed a Participatory Design (PD) approach, directly involving college students and a Student Success and Well-Being Coach in the design process. Through interviews and a collaborative workshop, we gathered detailed insights on productivity challenges and identified desirable features for a productivity-focused SAR. Importantly, ethical considerations were integrated from the outset, facilitating responsible and user-aligned design choices. Our contributions include comprehensive insights into student productivity challenges, SAR design preferences, and actionable recommendations for effective robot characteristics. Additionally, we present stakeholder-derived ethical guidelines to inform responsible future implementations of productivity-focused SARs in higher education.
Think Fast: Real-Time Kinodynamic Belief-Space Planning for Projectile Interception
Intercepting fast-moving objects, by its very nature, is challenging because of its tight time constraints. This problem becomes further complicated in the presence of sensor noise because noisy sensors provide, at best, incomplete information, which results in a distribution over target states to be intercepted. Since time is of the essence, to hit the target, the planner must begin directing the interceptor, in this case a robot arm, while still receiving information. We introduce a tree-like structure, which is grown using kinodynamic motion primitives in state-time space. This tree-like structure encodes reachability to multiple goals from a single origin, while enabling real-time value updates as the target belief evolves and seamless transitions between goals. We evaluate our framework on an interception task on a 6 DOF industrial arm (ABB IRB-1600) with an onboard stereo camera (ZED 2i). A robust Innovation-based Adaptive Estimation Adaptive Kalman Filter (RIAE-AKF) is used to track the target and perform belief updates.
Tactile Robotics: Past and Future
What is the future of tactile robotics? To help define that future, this article provides a historical perspective on tactile sensing in robotics from the wealth of knowledge and expert opinion in nearly 150 reviews over almost half a century. This history is characterized by a succession of generations: 1965-79 (origins), 1980-94 (foundations and growth), 1995-2009 (tactile winter) and 2010-2024 (expansion and diversification). Recent expansion has led to diverse themes emerging of e-skins, tactile robotic hands, vision-based tactile sensing, soft/biomimetic touch, and the tactile Internet. In the next generation from 2025, tactile robotics could mature to widespread commercial use, with applications in human-like dexterity, understanding human intelligence, and telepresence impacting all robotics and AI. By linking past expert insights to present themes, this article highlights recurring challenges in tactile robotics, showing how the field has evolved, why progress has often stalled, and which opportunities are most likely to define its future.
comment: Accepted in International Journal of Robotics Research (IJRR)
Supporting Productivity Skill Development in College Students through Social Robot Coaching: A Proof-of-Concept
College students often face academic challenges that hamper their productivity and well-being. Although self-help books and productivity apps are popular, they often fall short. Books provide generalized, non-interactive guidance, and apps are not inherently educational and can hinder the development of key organizational skills. Traditional productivity coaching offers personalized support, but is resource-intensive and difficult to scale. In this study, we present a proof-of-concept for a socially assistive robot (SAR) as an educational coach and a potential solution to the limitations of existing productivity tools and coaching approaches. The SAR delivers six different lessons on time management and task prioritization. Users interact via a chat interface, while the SAR responds through speech (with a toggle option). An integrated dashboard monitors progress, mood, engagement, confidence per lesson, and time spent per lesson. It also offers personalized productivity insights to foster reflection and self-awareness. We evaluated the system with 15 college students, achieving a System Usability Score of 79.2 and high ratings for overall experience and engagement. Our findings suggest that SAR-based productivity coaching can offer an effective and scalable solution to improve productivity among college students.
Estimation of Kinematic Motion from Dashcam Footage
The goal of this paper is to explore the accuracy of dashcam footage to predict the actual kinematic motion of a car-like vehicle. Our approach uses ground truth information from the vehicle's on-board data stream, through the controller area network, and a time-synchronized dashboard camera, mounted to a consumer-grade vehicle, for 18 hours of footage and driving. The contributions of the paper include neural network models that allow us to quantify the accuracy of predicting the vehicle speed and yaw, as well as the presence of a lead vehicle, and its relative distance and speed. In addition, the paper describes how other researchers can gather their own data to perform similar experiments, using open-source tools and off-the-shelf technology.
comment: 8 pages, 10 figures
Reinforcement Learning for Gliding Projectile Guidance and Control
This paper presents the development of a control law intended to be implemented on an optically guided glider. The guidance law follows an innovative approach: reinforcement learning. It is used to make navigation more flexible and autonomous in a dynamic environment. The final objective is to track a target detected with the camera and then guide the glider to this point with high precision. Reinforcement learning has already been applied to quadcopter drones; with this study we aim to demonstrate its applicability to fixed-wing aircraft on all of its axes.
comment: 6 pages
Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer
Recent progress in GPU-accelerated, photorealistic simulation has opened a scalable data-generation path for robot learning, where massive physics and visual randomization allow policies to generalize beyond curated environments. Building on these advances, we develop a teacher-student-bootstrap learning framework for vision-based humanoid loco-manipulation, using articulated-object interaction as a representative high-difficulty benchmark. Our approach introduces a staged-reset exploration strategy that stabilizes long-horizon privileged-policy training, and a GRPO-based fine-tuning procedure that mitigates partial observability and improves closed-loop consistency in sim-to-real RL. Trained entirely on simulation data, the resulting policy achieves robust zero-shot performance across diverse door types and outperforms human teleoperators by up to 31.7% in task completion time under the same whole-body control stack. This represents the first humanoid sim-to-real policy capable of diverse articulated loco-manipulation using pure RGB perception.
comment: https://doorman-humanoid.github.io/
Autonomous Grasping On Quadruped Robot With Task Level Interaction
Quadruped robots are increasingly used in various applications due to their high mobility and ability to operate in diverse terrains. However, most available quadruped robots are primarily focused on mobility without object manipulation capabilities. Equipping a quadruped robot with a robotic arm and gripper introduces a challenge in manual control, especially in remote scenarios that require complex commands. This research aims to develop an autonomous grasping system on a quadruped robot using a task-level interaction approach. The system includes hardware integration of a robotic arm and gripper onto the quadruped robot's body, a layered control system designed using ROS, and a web-based interface for human-robot interaction. The robot is capable of autonomously performing tasks such as navigation, object detection, and grasping using GraspNet. Testing was conducted through real-world scenarios to evaluate navigation, object selection and grasping, and user experience. The results show that the robot can perform tasks accurately and consistently, achieving a grasping success rate of 75% across 12 trials. Therefore, the system demonstrates significant potential in enhancing the capabilities of quadruped robots as service robots in real-world environments.
comment: This work has been submitted to the IEEE for possible publication
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
Vision-Language-Action models (VLAs) are becoming increasingly capable across diverse robotic tasks. However, their real-world deployment remains slow and inefficient: demonstration videos are often sped up by 5-10x to appear smooth, with noticeable action stalls and delayed reactions to environmental changes. Asynchronous inference offers a promising solution to achieve continuous and low-latency control by enabling robots to execute actions and perform inference simultaneously. However, because the robot and environment continue to evolve during inference, a temporal misalignment arises between the prediction and execution intervals. This leads to significant action instability, while existing methods either degrade accuracy or introduce runtime overhead to mitigate it. We propose VLASH, a general asynchronous inference framework for VLAs that delivers smooth, accurate, and fast reaction control without additional overhead or architectural changes. VLASH estimates the future execution-time state by rolling the robot state forward with the previously generated action chunk, thereby bridging the gap between prediction and execution. Experiments show that VLASH achieves up to 2.03x speedup and reduces reaction latency by up to 17.4x compared to synchronous inference while fully preserving the original accuracy. Moreover, it empowers VLAs to handle fast-reaction, high-precision tasks such as playing ping-pong and playing whack-a-mole, where traditional synchronous inference fails. Code is available at https://github.com/mit-han-lab/vlash
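The core idea described above, rolling the current robot state forward through the actions that will execute while inference runs, can be sketched in a few lines. The function names and the simple kinematic update are assumptions for illustration only; the actual system may also account for measured inference latency.

```python
# Hedged sketch of future-state estimation for asynchronous VLA inference.
def estimate_future_state(state, pending_actions, step_fn):
    """Roll `state` forward through the actions still queued for execution, so the
    next inference call predicts from the state the robot will actually be in.
    `step_fn(state, action)` is a hypothetical kinematic update (e.g., adding joint deltas)."""
    for a in pending_actions:
        state = step_fn(state, a)
    return state

# Asynchronous loop sketch (all names hypothetical):
#   future_state = estimate_future_state(current_state, queue.remaining(), step_fn)
#   next_chunk   = policy.infer(observation, future_state)  # runs while the queue executes
```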
CycleManip: Enabling Cyclic Task Manipulation via Effective Historical Perception and Understanding
In this paper, we explore an important yet underexplored task in robot manipulation: cycle-based manipulation, where robots need to perform cyclic or repetitive actions with an expected terminal time. These tasks are crucial in daily life, such as shaking a bottle or knocking a nail. However, few prior works have explored this task, leading to two main challenges: 1) the imitation methods often fail to complete these tasks within the expected terminal time due to the ineffective utilization of history; 2) the absence of a benchmark with sufficient data and automatic evaluation tools hinders development of effective solutions in this area. To address these challenges, we first propose the CycleManip framework to achieve cycle-based task manipulation in an end-to-end imitation manner without requiring any extra models, hierarchical structure or significant computational overhead. The core insight is to enhance effective history perception by a cost-aware sampling strategy and to improve historical understanding by multi-task learning. Second, we introduce a cycle-based task manipulation benchmark, which provides diverse cycle-based tasks, and an automatic evaluation method. Extensive experiments conducted in both simulation and real-world settings demonstrate that our method achieves high success rates in cycle-based task manipulation. The results further show strong adaptability performance in general manipulation, and the plug-and-play ability on imitation policies such as Vision-Language-Action (VLA) models. Moreover, the results show that our approach can be applied across diverse robotic platforms, including bi-arm grippers, dexterous hands, and humanoid robots.
comment: Project page: https://isee-laboratory.github.io/OmniDexGrasp/
Integration of UWB Radar on Mobile Robots for Continuous Obstacle and Environment Mapping
This paper presents an infrastructure-free approach for obstacle detection and environmental mapping using ultra-wideband (UWB) radar mounted on a mobile robotic platform. Traditional sensing modalities such as visual cameras and Light Detection and Ranging (LiDAR) fail in environments with poor visibility due to darkness, smoke, or reflective surfaces. In these vision-impaired conditions, UWB radar offers a promising alternative. To this end, this work explores the suitability of robot-mounted UWB radar for environmental mapping in dynamic, anchor-free scenarios. The study investigates how different materials (metal, concrete and plywood) and UWB radio channels (5 and 9) influence the Channel Impulse Response (CIR). Furthermore, a processing pipeline is proposed to achieve reliable mapping of detected obstacles, consisting of 3 steps: (i) target identification (based on CIR peak detection), (ii) filtering (based on peak properties, signal-to-noise score, and phase-difference of arrival), and (iii) clustering (based on distance estimation and angle-of-arrival estimation). The proposed approach successfully reduces noise and multipath effects, resulting in an obstacle detection precision of at least 82.36% and a recall of 89.46% on channel 9 even when detecting low-reflective materials such as plywood. This work offers a foundation for further development of UWB-based simultaneous localisation and mapping (SLAM) systems that do not rely on visual features and, unlike conventional UWB localisation systems, do not require fixed anchor nodes for triangulation.
comment: This paper has been submitted to IEEE Access Journal and is currently undergoing review
FOM-Nav: Frontier-Object Maps for Object Goal Navigation
This paper addresses the Object Goal Navigation problem, where a robot must efficiently find a target object in an unknown environment. Existing implicit memory-based methods struggle with long-term memory retention and planning, while explicit map-based approaches lack rich semantic information. To address these challenges, we propose FOM-Nav, a modular framework that enhances exploration efficiency through Frontier-Object Maps and vision-language models. Our Frontier-Object Maps are built online and jointly encode spatial frontiers and fine-grained object information. Using this representation, a vision-language model performs multimodal scene understanding and high-level goal prediction, which is executed by a low-level planner for efficient trajectory generation. To train FOM-Nav, we automatically construct large-scale navigation datasets from real-world scanned environments. Extensive experiments validate the effectiveness of our model design and constructed dataset. FOM-Nav achieves state-of-the-art performance on the MP3D and HM3D benchmarks, particularly in navigation efficiency metric SPL, and yields promising results on a real robot.
comment: Project page: https://www.di.ens.fr/willow/research/fom-nav/
MM-ACT: Learn from Multimodal Parallel Generation to Act
A generalist robotic policy needs both semantic understanding for task planning and the ability to interact with the environment through predictive capabilities. To tackle this, we present MM-ACT, a unified Vision-Language-Action (VLA) model that integrates text, image, and action in shared token space and performs generation across all three modalities. MM-ACT adopts a re-mask parallel decoding strategy for text and image generation, and employs a one-step parallel decoding strategy for action generation to improve efficiency. We introduce Context-Shared Multimodal Learning, a unified training paradigm that supervises generation in all three modalities from a shared context, enhancing action generation through cross-modal learning. Experiments were conducted on the LIBERO simulation and Franka real-robot setups as well as RoboTwin2.0 to assess in-domain and out-of-domain performances respectively. Our approach achieves a success rate of 96.3% on LIBERO, 72.0% across three tasks of real Franka, and 52.38% across eight bimanual tasks of RoboTwin2.0 with an additional gain of 9.25% from cross-modal learning. We release our codes, models and data at https://github.com/HHYHRHY/MM-ACT.
comment: 17 pages
H-Zero: Cross-Humanoid Locomotion Pretraining Enables Few-shot Novel Embodiment Transfer
The rapid advancement of humanoid robotics has intensified the need for robust and adaptable controllers to enable stable and efficient locomotion across diverse platforms. However, developing such controllers remains a significant challenge because existing solutions are tailored to specific robot designs, requiring extensive tuning of reward functions, physical parameters, and training hyperparameters for each embodiment. To address this challenge, we introduce H-Zero, a cross-humanoid locomotion pretraining pipeline that learns a generalizable humanoid base policy. We show that pretraining on a limited set of embodiments enables zero-shot and few-shot transfer to novel humanoid robots with minimal fine-tuning. Evaluations show that the pretrained policy maintains up to 81% of the full episode duration on unseen robots in simulation while enabling few-shot transfer to unseen humanoids and upright quadrupeds within 30 minutes of fine-tuning.
comment: in submission, under review
Constant-Time Motion Planning with Manipulation Behaviors
Recent progress in contact-rich robotic manipulation has been striking, yet most deployed systems remain confined to simple, scripted routines. One of the key barriers is the lack of motion planning algorithms that can provide verifiable guarantees for safety, efficiency and reliability. To address this, a family of algorithms called Constant-Time Motion Planning (CTMP) was introduced, which leverages a preprocessing phase to enable collision-free motion queries in a fixed, user-specified time budget (e.g., 10 milliseconds). However, existing CTMP methods do not explicitly incorporate the manipulation behaviors essential for object handling. To bridge this gap, we introduce the Behavioral Constant-Time Motion Planner (B-CTMP), an algorithm that extends CTMP to solve a broad class of two-step manipulation tasks: (1) a collision-free motion to a behavior initiation state, followed by (2) execution of a manipulation behavior (such as grasping or insertion) to reach the goal. By precomputing compact data structures, B-CTMP guarantees constant-time queries in mere milliseconds while ensuring completeness and successful task execution over a specified set of states. We evaluate B-CTMP on two canonical manipulation tasks in simulation, shelf picking and plug insertion, and demonstrate its effectiveness on a real robot. Our results show that B-CTMP unifies collision-free planning and object manipulation within a single constant-time framework, providing provable guarantees of speed and success for manipulation in semi-structured environments.
comment: In submission
Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
Group symmetries provide a powerful inductive bias for reinforcement learning (RL), enabling efficient generalization across symmetric states and actions via group-invariant Markov Decision Processes (MDPs). However, real-world environments almost never realize fully group-invariant MDPs; dynamics, actuation limits, and reward design usually break symmetries, often only locally. Under group-invariant Bellman backups for such cases, local symmetry-breaking introduces errors that propagate across the entire state-action space, resulting in global value estimation errors. To address this, we introduce Partially group-Invariant MDP (PI-MDP), which selectively applies group-invariant or standard Bellman backups depending on where symmetry holds. This framework mitigates error propagation from locally broken symmetries while maintaining the benefits of equivariance, thereby enhancing sample efficiency and generalizability. Building on this framework, we present practical RL algorithms -- Partially Equivariant (PE)-DQN for discrete control and PE-SAC for continuous control -- that combine the benefits of equivariance with robustness to symmetry-breaking. Experiments across Grid-World, locomotion, and manipulation benchmarks demonstrate that PE-DQN and PE-SAC significantly outperform baselines, highlighting the importance of selective symmetry exploitation for robust and sample-efficient RL.
comment: 27 pages, 10 figures
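The selective-backup idea can be illustrated with a tabular sketch: share the TD update across the group orbit of a state-action pair only where symmetry is known to hold, and fall back to the standard backup elsewhere. Everything below (the `group` action functions, the `symmetric` flag) is a hypothetical simplification of the general idea, not the paper's PE-DQN/PE-SAC implementation.

```python
# Hedged tabular sketch of a partially invariant Q-learning backup.
import numpy as np

def partially_invariant_backup(Q, s, a, r, s_next, gamma, symmetric, group, lr=0.1):
    """On states where symmetry holds, apply the TD update to every (gs, ga) in the
    group orbit of (s, a); otherwise apply a standard Q-learning backup.
    `group`: list of functions g(s, a) -> (gs, ga); `symmetric(s)`: local symmetry flag."""
    td_target = r + gamma * np.max(Q[s_next])
    pairs = [g(s, a) for g in group] if symmetric(s) else [(s, a)]
    for gs, ga in pairs:
        Q[gs, ga] += lr * (td_target - Q[gs, ga])
    return Q
```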
Magnetic Tactile-Driven Soft Actuator for Intelligent Grasping and Firmness Evaluation
Soft robots are powerful tools for manipulating delicate objects, yet their adoption is hindered by two gaps: the lack of integrated tactile sensing and sensor signal distortion caused by actuator deformations. This paper addresses these challenges by introducing the SoftMag actuator: a magnetic tactile-sensorized soft actuator. Unlike systems relying on attached sensors or treating sensing and actuation separately, SoftMag unifies them through a shared architecture while confronting the mechanical parasitic effect, where deformations corrupt tactile signals. A multiphysics simulation framework models this coupling, and a neural-network-based decoupling strategy removes the parasitic component, restoring sensing fidelity. Experiments including indentation, quasi-static and step actuation, and fatigue tests validate the actuator's performance and decoupling effectiveness. Building upon this foundation, the system is extended into a two-finger SoftMag gripper, where a multi-task neural network enables real-time prediction of tri-axial contact forces and position. Furthermore, a probing-based strategy estimates object firmness during grasping. Validation on apricots shows a strong correlation (Pearson r over 0.8) between gripper-estimated firmness and reference measurements, confirming the system's capability for non-destructive quality assessment. Results demonstrate that combining integrated magnetic sensing, learning-based correction, and real-time inference enables a soft robotic platform that adapts its grasp and quantifies material properties. The framework offers an approach for advancing sensorized soft actuators toward intelligent, material-aware robotics.
comment: 25 pages, 24 figures
SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead
Vision-Language-Action (VLA) models built on pretrained Vision-Language Models (VLMs) show strong potential but are limited in practicality due to their large parameter counts. To mitigate this issue, using a lightweight VLM has been explored, but it compromises spatiotemporal reasoning. Although some methods suggest that incorporating additional 3D inputs can help, they usually rely on large VLMs to fuse 3D and 2D inputs and still lack temporal understanding. Therefore, we propose SwiftVLA, an architecture that enhances a compact model with 4D understanding while preserving design efficiency. Specifically, our approach features a pretrained 4D visual geometry transformer with a temporal cache that extracts 4D features from 2D images. Then, to enhance the VLM's ability to exploit both 2D images and 4D features, we introduce Fusion Tokens, a set of learnable tokens trained with a future prediction objective to generate unified representations for action generation. Finally, we introduce a mask-and-reconstruct strategy that masks 4D inputs to the VLM and trains the VLA to reconstruct them, enabling the VLM to learn effective 4D representations and allowing the 4D branch to be dropped at inference with minimal performance loss. Experiments in real and simulated environments show that SwiftVLA outperforms lightweight baselines and rivals VLAs up to 7 times larger, achieving comparable performance on edge devices while being 18 times faster and reducing memory footprint by 12 times.
A Novel MDP Decomposition Framework for Scalable UAV Mission Planning in Complex and Uncertain Environments
This paper presents a scalable and fault-tolerant framework for unmanned aerial vehicle (UAV) mission management in complex and uncertain environments. The proposed approach addresses the computational bottleneck inherent in solving large-scale Markov Decision Processes (MDPs) by introducing a two-stage decomposition strategy. In the first stage, a factor-based algorithm partitions the global MDP into smaller, goal-specific sub-MDPs by leveraging domain-specific features such as goal priority, fault states, spatial layout, and energy constraints. In the second stage, a priority-based recombination algorithm solves each sub-MDP independently and integrates the results into a unified global policy using a meta-policy for conflict resolution. Importantly, we present a theoretical analysis showing that, under mild probabilistic independence assumptions, the combined policy is provably equivalent to the optimal global MDP policy. By decomposing large MDPs into tractable subproblems with provable global equivalence, the proposed framework enhances the scalability of Markov Decision Processes, a cornerstone of sequential decision-making in artificial intelligence (AI), and enables real-time policy updates for complex mission environments. Extensive simulations validate the effectiveness of our method, demonstrating orders-of-magnitude reduction in computation time without sacrificing mission reliability or policy optimality. The proposed framework establishes a practical and robust foundation for scalable decision-making in real-time UAV mission execution.
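The two-stage structure (solve each goal-specific sub-MDP independently, then recombine by priority) admits a compact sketch. The code below is a generic illustration under assumed interfaces, not the paper's factor-based partitioning or its equivalence conditions: each sub-MDP is solved by plain value iteration and a meta-policy picks the highest-priority sub-policy that is still active.

```python
# Hedged sketch: independent sub-MDP solutions + priority-based recombination.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P: (S, A, S) transition tensor, R: (S, A) rewards; returns a greedy policy."""
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * (P @ V)              # (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1)
        V = V_new

def meta_policy(sub_policies, active, state):
    """`sub_policies` is ordered highest priority first; `active(i, state)` is a
    hypothetical predicate saying whether sub-MDP i still applies (goal unreached, no fault)."""
    for i, pi in enumerate(sub_policies):
        if active(i, state):
            return pi[state]
    return sub_policies[-1][state]           # fall back to the lowest-priority behavior
```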
Transforming Monolithic Foundation Models into Embodied Multi-Agent Architectures for Human-Robot Collaboration
Foundation models have become central to unifying perception and planning in robotics, yet real-world deployment exposes a mismatch between their monolithic assumption that a single model can handle all cognitive functions and the distributed, dynamic nature of practical service workflows. Vision-language models offer strong semantic understanding but lack embodiment-aware action capabilities while relying on hand-crafted skills. Vision-Language-Action policies enable reactive manipulation but remain brittle across embodiments, weak in geometric grounding, and devoid of proactive collaboration mechanisms. These limitations indicate that scaling a single model alone cannot deliver reliable autonomy for service robots operating in human-populated settings. To address this gap, we present InteractGen, an LLM-powered multi-agent framework that decomposes robot intelligence into specialized agents for continuous perception, dependency-aware planning, decision and verification, failure reflection, and dynamic human delegation, treating foundation models as regulated components within a closed-loop collective. Deployed on a heterogeneous robot team and evaluated in a three-month open-use study, InteractGen improves task success, adaptability, and human-robot collaboration, providing evidence that multi-agent orchestration offers a more feasible path toward socially grounded service autonomy than further scaling standalone models.
comment: 21 pages, 16 figures, 4 tables
Sigma: The Key for Vision-Language-Action Models toward Telepathic Alignment
To address the lack, in humanoid robot cognitive systems, of a time-updatable mediating thought space between semantics and continuous control, this study constructs and trains a VLA model named "Sigma" that runs on a single RTX 4090. It uses the open-source pi05_base model as a foundation and preprocesses svla_so101_pickplace into a training dataset. The researcher independently designed a vision-language-action architecture that combines deep semantic understanding and association to achieve telepathic communication. The training process involved repeated optimization of data preprocessing, LoRA fine-tuning, and the inference-stage adapter. The experiment employed offline closed-loop replay, comparing Sigma with the untuned pi05_base model under the same data conditions. Results showed that Sigma exhibited a stable decrease in control MSE at the vector, fragment, and whole-trajectory timescales, while the telepathy norm and semantic-text alignment quality remained unchanged. These results demonstrate that mind-responsive alignment control can be achieved, without retraining the base model, through an architecture that combines deep semantic understanding and association, providing reproducible experience for semantic alignment and intention-driven behavior in humanoid robots.
comment: The Sigma model has been open-sourced on Hugging Face. Weights, dataset, some scripts, and logs are all available. The link is: https://huggingface.co/Veltraxor/Sigma
Sign Language Recognition using Bidirectional Reservoir Computing
Sign language recognition (SLR) facilitates communication between deaf and hearing individuals. Deep learning is widely used to develop SLR-based systems; however, it is computationally intensive and requires substantial resources, making it unsuitable for resource-constrained devices. To address this, we propose an efficient sign language recognition system using MediaPipe and an echo state network (ESN)-based bidirectional reservoir computing (BRC) architecture. MediaPipe extracts hand joint coordinates, which serve as inputs to the ESN-based BRC architecture. The BRC processes these features in both forward and backward directions, efficiently capturing temporal dependencies. The resulting states of BRC are concatenated to form a robust representation for classification. We evaluated our method on the Word-Level American Sign Language (WLASL) video dataset, achieving a competitive accuracy of 57.71% and a significantly lower training time of only 9 seconds, in contrast to the 55 minutes and 38 seconds required by the deep learning-based Bi-GRU approach. Consequently, the BRC-based SLR system is well-suited for edge devices.
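The bidirectional reservoir idea is simple enough to sketch: run a fixed random reservoir over the keypoint sequence forward and backward, pool and concatenate the states, and train only a linear readout. The sketch below uses illustrative sizes and mean-pooling; it is a generic bidirectional ESN, not the paper's exact configuration.

```python
# Hedged sketch of ESN-based bidirectional reservoir features (illustrative hyperparameters).
import numpy as np

class ESN:
    def __init__(self, n_in, n_res=300, rho=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.standard_normal((n_res, n_res))
        self.W = W * (rho / np.max(np.abs(np.linalg.eigvals(W))))   # fix spectral radius

    def run(self, X):                       # X: (T, n_in) per-frame keypoint features
        h = np.zeros(self.W.shape[0])
        states = []
        for x in X:
            h = np.tanh(self.W_in @ x + self.W @ h)
            states.append(h)
        return np.mean(states, axis=0)      # one simple pooling choice

def bidirectional_features(esn, X):
    """Concatenate forward and backward reservoir states, as in bidirectional RC."""
    return np.concatenate([esn.run(X), esn.run(X[::-1])])

# Readout: fit e.g. ridge regression / logistic regression on one feature vector per video.
```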
SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning
Linear Temporal Logic (LTL) provides a rigorous framework for complex robotic tasks, yet existing methods often rely on accurate dynamics models or expensive online interactions. In this work, we address LTL-constrained control in a challenging offline, model-free setting, utilizing only fixed, task-agnostic datasets of fragmented trajectories. We propose SAGAS, a novel framework combining graph-assisted trajectory stitching with automata-guided planning. First, we construct a latent reachability graph from a learned temporal-distance representation. To bridge the semantic gap, we augment this graph with certified anchor nodes and probabilistic soft labels. We then translate the specification into a Büchi automaton and search the implicit product space to derive a cost-minimal prefix-suffix plan. Finally, a subgoal-conditioned low-level policy is deployed to execute these latent waypoints. Experiments on OGBench locomotion domains demonstrate that SAGAS successfully synthesizes efficient trajectories for diverse LTL tasks, effectively bridging the gap between fragmented offline data and complex logical constraints.
MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion
Reinforcement learning has recently enabled impressive locomotion capabilities on legged robots; however, most policy architectures remain morphology- and symmetry-agnostic, leading to inefficient training and limited generalization. This work introduces MS-PPO, a morphological-symmetry-equivariant policy learning framework that encodes robot kinematic structure and morphological symmetries directly into the policy network. We construct a morphology-informed graph neural architecture that is provably equivariant with respect to the robot's morphological symmetry group actions, ensuring consistent policy responses under symmetric states while maintaining invariance in value estimation. This design eliminates the need for tedious reward shaping or costly data augmentation, which are typically required to enforce symmetry. We evaluate MS-PPO in simulation on Unitree Go2 and Xiaomi CyberDog2 robots across diverse locomotion tasks, including trotting, pronking, slope walking, and bipedal turning, and further deploy the learned policies on hardware. Extensive experiments show that MS-PPO achieves superior training stability, symmetry generalization ability, and sample efficiency in challenging locomotion tasks, compared to state-of-the-art baselines. These findings demonstrate that embedding both kinematic structure and morphological symmetry into policy learning provides a powerful inductive bias for legged robot locomotion control. Our code will be made publicly available at https://lunarlab-gatech.github.io/MS-PPO/.
TrajDiff: End-to-end Autonomous Driving without Perception Annotation
End-to-end autonomous driving systems directly generate driving policies from raw sensor inputs. While these systems can extract effective environmental features for planning, relying on auxiliary perception tasks, developing perception annotation-free planning paradigms has become increasingly critical due to the high cost of manual perception annotation. In this work, we propose TrajDiff, a Trajectory-oriented BEV Conditioned Diffusion framework that establishes a fully perception annotation-free generative method for end-to-end autonomous driving. TrajDiff requires only raw sensor inputs and future trajectory, constructing Gaussian BEV heatmap targets that inherently capture driving modalities. We design a simple yet effective trajectory-oriented BEV encoder to extract the TrajBEV feature without perceptual supervision. Furthermore, we introduce Trajectory-oriented BEV Diffusion Transformer (TB-DiT), which leverages ego-state information and the predicted TrajBEV features to directly generate diverse yet plausible trajectories, eliminating the need for handcrafted motion priors. Beyond architectural innovations, TrajDiff enables exploration of data scaling benefits in the annotation-free setting. Evaluated on the NAVSIM benchmark, TrajDiff achieves 87.5 PDMS, establishing state-of-the-art performance among all annotation-free methods. With data scaling, it further improves to 88.5 PDMS, which is comparable to advanced perception-based approaches. Our code and model will be made publicly available.
Robust Geospatial Coordination of Multi-Agent Communications Networks Under Attrition
Fast, efficient, robust communication during wildfire and other emergency responses is critical. One way to achieve this is by coordinating swarms of autonomous aerial vehicles carrying communications equipment to form an ad-hoc network connecting emergency response personnel to both each other and central command. However, operating in such extreme environments may lead to individual networking agents being damaged or rendered inoperable, which could bring down the network and interrupt communications. To overcome this challenge and enable multi-agent UAV networking in difficult environments, this paper introduces and formalizes the problem of Robust Task Networking Under Attrition (RTNUA), which extends connectivity maintenance in multi-robot systems to explicitly address proactive redundancy and attrition recovery. We introduce Physics-Informed Robust Employment of Multi-Agent Networks (ΦIREMAN), a topological algorithm leveraging physics-inspired potential fields to solve this problem. Through simulation across 25 problem configurations, ΦIREMAN consistently outperforms the DCCRS baseline, and on large-scale problems with up to 100 tasks and 500 drones, maintains >99.9% task uptime despite substantial attrition, demonstrating both effectiveness and scalability.
comment: 8 pages, 5 figures, 4 tables, submitted to IEEE RA-L
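For readers unfamiliar with physics-inspired potential fields, the toy update below shows the generic mechanism: spring-like attraction toward the points an agent must stay connected to and short-range repulsion from nearby agents to spread coverage. The gains and form of the potentials are assumptions for illustration and are not the ΦIREMAN rules.

```python
# Hedged sketch of a generic potential-field velocity update for one relay agent.
import numpy as np

def potential_step(pos, neighbors, anchors, k_att=1.0, k_rep=4.0, d_safe=5.0, dt=0.1):
    """pos: 2-D position of this agent; anchors: points it should stay connected to
    (tasks / linked neighbors); neighbors: other agents, repelled below d_safe."""
    force = np.zeros(2)
    for a in anchors:                                   # spring-like attraction
        force += k_att * (a - pos)
    for n in neighbors:                                 # short-range repulsion
        d = np.linalg.norm(pos - n)
        if 1e-6 < d < d_safe:
            force += k_rep * (1.0 / d - 1.0 / d_safe) * (pos - n) / d
    return pos + dt * force
```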
Gentle Object Retraction in Dense Clutter Using Multimodal Force Sensing and Imitation Learning
Dense collections of movable objects are common in everyday spaces, from cabinets in a home to shelves in a warehouse. Safely retracting objects from such collections is difficult for robots, yet people do it frequently, leveraging learned experience in tandem with vision and non-prehensile tactile sensing on the sides and backs of their hands and arms. We investigate the role of contact force sensing for training robots to gently reach into constrained clutter and extract objects. The available sensing modalities are (1) "eye-in-hand" vision, (2) proprioception, (3) non-prehensile triaxial tactile sensing, (4) contact wrenches estimated from joint torques, and (5) a measure of object acquisition obtained by monitoring the vacuum line of a suction cup. We use imitation learning to train policies from a set of demonstrations on randomly generated scenes, then conduct an ablation study of wrench and tactile information. We evaluate each policy's performance across 40 unseen environment configurations. Policies employing any force sensing show fewer excessive force failures, an increased overall success rate, and faster completion times. The best performance is achieved using both tactile and wrench information, producing an 80% improvement above the baseline without force information.
comment: Accepted in IEEE Robotics and Automation Letters (RA-L)
HiMo: High-Speed Objects Motion Compensation in Point Clouds
LiDAR point cloud is essential for autonomous vehicles, but motion distortions from dynamic objects degrade the data quality. While previous work has considered distortions caused by ego motion, distortions caused by other moving objects remain largely overlooked, leading to errors in object shape and position. This distortion is particularly pronounced in high-speed environments such as highways and in multi-LiDAR configurations, a common setup for heavy vehicles. To address this challenge, we introduce HiMo, a pipeline that repurposes scene flow estimation for non-ego motion compensation, correcting the representation of dynamic objects in point clouds. During the development of HiMo, we observed that existing self-supervised scene flow estimators often produce degenerate or inconsistent estimates under high-speed distortion. We further propose SeFlow++, a real-time scene flow estimator that achieves state-of-the-art performance on both scene flow and motion compensation. Since well-established motion distortion metrics are absent in the literature, we introduce two evaluation metrics: compensation accuracy at a point level and shape similarity of objects. We validate HiMo through extensive experiments on Argoverse 2, ZOD, and a newly collected real-world dataset featuring highway driving and multi-LiDAR-equipped heavy vehicles. Our findings show that HiMo improves the geometric consistency and visual fidelity of dynamic objects in LiDAR point clouds, benefiting downstream tasks such as semantic segmentation and 3D detection. See https://kin-zhang.github.io/HiMo for more details.
comment: 15 pages, 13 figures, Published in Transactions on Robotics (Volume 41)
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models
Vision-Language-Action (VLA) models excel in robotic manipulation but are constrained by their heavy reliance on expert demonstrations, leading to demonstration bias and limiting performance. Reinforcement learning (RL) is a vital post-training strategy to overcome these limits, yet current VLA-RL methods, including group-based optimization approaches, are crippled by severe reward sparsity. Relying on binary success indicators wastes valuable information in failed trajectories, resulting in low training efficiency. To solve this, we propose Self-Referential Policy Optimization (SRPO), a novel VLA-RL framework. SRPO eliminates the need for external demonstrations or manual reward engineering by leveraging the model's own successful trajectories, generated within the current training batch, as a self-reference. This allows us to assign a progress-wise reward to failed attempts. A core innovation is the use of latent world representations to measure behavioral progress robustly. Instead of relying on raw pixels or requiring domain-specific fine-tuning, we utilize the compressed, transferable encodings from a world model's latent space. These representations naturally capture progress patterns across environments, enabling accurate, generalized trajectory comparison. Empirical evaluations on the LIBERO benchmark demonstrate SRPO's efficiency and effectiveness. Starting from a supervised baseline with 48.9% success, SRPO achieves a new state-of-the-art success rate of 99.2% in just 200 RL steps, representing a 103% relative improvement without any extra supervision. Furthermore, SRPO shows substantial robustness, achieving a 167% performance improvement on the LIBERO-Plus benchmark.
Differentiable Contact Dynamics for Stable Object Placement Under Geometric Uncertainties
From serving a cup of coffee to positioning mechanical parts during assembly, stable object placement is a crucial skill for future robots. It becomes particularly challenging under geometric uncertainties, e.g., when the object pose or shape is not known accurately. This work leverages a differentiable simulation model of contact dynamics to tackle this challenge. We derive a novel gradient that relates force-torque sensor readings to geometric uncertainties, thus enabling uncertainty estimation by minimizing discrepancies between sensor data and model predictions via gradient descent. Gradient-based methods are sensitive to initialization. To mitigate this effect, we maintain a belief over multiple estimates and choose the robot action based on the current belief at each timestep. In experiments on a Franka robot arm, our method achieved promising results on multiple objects under various geometric uncertainties, including the in-hand pose uncertainty of a grasped object, the object shape uncertainty, and the environment uncertainty.
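A toy sketch of the core idea of estimating geometric uncertainty by gradient descent on the gap between measured and predicted force-torque readings. The single-contact lever-arm model, the unknown in-hand offset, and all numbers are invented for illustration and stand in for the paper's differentiable contact simulator.

```python
import torch

def predicted_wrench(delta, force=torch.tensor([0.0, 0.0, -5.0])):
    """Hypothetical wrench model: one contact whose lever arm depends on an
    unknown in-hand offset `delta`; wrench = [force, torque = lever x force]."""
    lever = torch.tensor([0.10, 0.0, 0.0]) + delta     # nominal arm + offset
    torque = torch.linalg.cross(lever, force)
    return torch.cat([force, torque])

measured = predicted_wrench(torch.tensor([0.02, -0.01, 0.0]))  # "sensor" data

delta = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([delta], lr=0.01)
for _ in range(500):
    opt.zero_grad()
    loss = torch.sum((predicted_wrench(delta) - measured) ** 2)
    loss.backward()
    opt.step()
print(delta.detach())   # converges toward the true offset in the observable x/y axes
```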
GigaWorld-0: World Models as Data Engine to Empower Embodied AI
World models are emerging as a foundational paradigm for scalable, data-efficient embodied AI. In this work, we present GigaWorld-0, a unified world model framework designed explicitly as a data engine for Vision-Language-Action (VLA) learning. GigaWorld-0 integrates two synergistic components: GigaWorld-0-Video, which leverages large-scale video generation to produce diverse, texture-rich, and temporally coherent embodied sequences under fine-grained control of appearance, camera viewpoint, and action semantics; and GigaWorld-0-3D, which combines 3D generative modeling, 3D Gaussian Splatting reconstruction, physically differentiable system identification, and executable motion planning to ensure geometric consistency and physical realism. Their joint optimization enables the scalable synthesis of embodied interaction data that is visually compelling, spatially coherent, physically plausible, and instruction-aligned. Training at scale is made feasible through our efficient GigaTrain framework, which exploits FP8-precision and sparse attention to drastically reduce memory and compute requirements. We conduct comprehensive evaluations showing that GigaWorld-0 generates high-quality, diverse, and controllable data across multiple dimensions. Critically, VLA models (e.g., GigaBrain-0) trained on GigaWorld-0-generated data achieve strong real-world performance, significantly improving generalization and task success on physical robots without any real-world interaction during training.
comment: Project Page: https://giga-world-0.github.io/
Automaton Constrained Q-Learning NeurIPS 2025
Real-world robotic tasks often require agents to achieve sequences of goals while respecting time-varying safety constraints. However, standard Reinforcement Learning (RL) paradigms are fundamentally limited in these settings. A natural approach to these problems is to combine RL with Linear-time Temporal Logic (LTL), a formal language for specifying complex, temporally extended tasks and safety constraints. Yet, existing RL methods for LTL objectives exhibit poor empirical performance in complex and continuous environments. As a result, no scalable methods support both temporally ordered goals and safety simultaneously, making them ill-suited for realistic robotics scenarios. We propose Automaton Constrained Q-Learning (ACQL), an algorithm that addresses this gap by combining goal-conditioned value learning with automaton-guided reinforcement. ACQL supports most LTL task specifications and leverages their automaton representation to explicitly encode stage-wise goal progression and both stationary and non-stationary safety constraints. We show that ACQL outperforms existing methods across a range of continuous control tasks, including cases where prior methods fail to satisfy either goal-reaching or safety constraints. We further validate its real-world applicability by deploying ACQL on a 6-DOF robotic arm performing a goal-reaching task in a cluttered, cabinet-like space with safety constraints. Our results demonstrate that ACQL is a robust and scalable solution for learning robotic behaviors according to rich temporal specifications.
comment: 10 main content pages, 4 main content figures, 11 appendix pages, 5 appendix figures, camera ready version submitted to 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
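A skeletal caricature of the product construction ACQL builds on: tabular Q-learning over pairs of environment state and automaton state, with a stage-wise goal and a non-stationary safety constraint. The six-cell corridor, automaton, and rewards below are invented for illustration and are not the paper's benchmark or algorithm.

```python
import numpy as np

# Tiny product-MDP: a six-cell corridor with two task stages.
# Automaton: q0 -> q1 once cell 5 is reached; in q0, cell 0 is unsafe, while
# in q1 the same cell 0 becomes the accepting goal (a non-stationary constraint).
N, ACTIONS = 6, [-1, +1]
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.2
Q = np.zeros((N, 2, len(ACTIONS)))

def automaton_step(q, s):
    return 1 if (q == 0 and s == 5) else q

def reward_done(s, q):
    if q == 0 and s == 0:
        return -1.0, True      # safety violated before stage 1 is complete
    if q == 1 and s == 0:
        return +1.0, True      # both goals achieved in order: accept
    return -0.01, False

rng = np.random.default_rng(0)
for _ in range(4000):
    s, q = 2, 0
    for _ in range(40):
        a = rng.integers(2) if rng.random() < EPS else int(Q[s, q].argmax())
        s2 = int(np.clip(s + ACTIONS[a], 0, N - 1))
        q2 = automaton_step(q, s2)
        r, done = reward_done(s2, q2)
        target = r + (0.0 if done else GAMMA * Q[s2, q2].max())
        Q[s, q, a] += ALPHA * (target - Q[s, q, a])
        s, q = s2, q2
        if done:
            break

print("q0 greedy actions:", Q[:, 0].argmax(axis=1))  # mostly 1: head right to cell 5
print("q1 greedy actions:", Q[:, 1].argmax(axis=1))  # 0: head left back to cell 0
```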
SkillWrapper: Generative Predicate Invention for Skill Abstraction
Generalizing from individual skill executions to solving long-horizon tasks remains a core challenge in building autonomous agents. A promising direction is learning high-level, symbolic abstractions of the low-level skills of the agents, enabling reasoning and planning independent of the low-level state space. Among possible high-level representations, object-centric skill abstraction with symbolic predicates has been proven to be efficient because of its compatibility with domain-independent planners. Recent advances in foundation models have made it possible to generate symbolic predicates that operate on raw sensory inputs, a process we call generative predicate invention, to facilitate downstream abstraction learning. However, it remains unclear which formal properties the learned representations must satisfy, and how they can be learned to guarantee these properties. In this paper, we address both questions by presenting a formal theory of generative predicate invention for skill abstraction, resulting in symbolic operators that can be used for provably sound and complete planning. Within this framework, we propose SkillWrapper, a method that leverages foundation models to actively collect robot data and learn human-interpretable, plannable representations of black-box skills, using only RGB image observations. Our extensive empirical evaluation in simulation and on real robots shows that SkillWrapper learns abstract representations that enable solving unseen, long-horizon tasks in the real world with black-box skills.
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world unknown uncertainty, such as unknown payload and uneven terrains. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear \textit{dynamic regret}, defined as the suboptimality against an optimal clairvoyant controller that knows the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, slope terrain with $20\degree$ inclination, and rough terrain with $0.25m$ height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8~kg$ and time-varying ground friction coefficients in flat terrain.
comment: IEEE Robotics and Automation Letters
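A minimal sketch of the residual-dynamics learner described above: random Fourier features of the state-input pair regressed onto observed residuals with recursive least squares. Dimensions, bandwidth, and the toy residual function are placeholders, not the paper's quadruped model.

```python
import numpy as np

class ResidualLearner:
    """Online least-squares fit of residual dynamics with random Fourier features."""

    def __init__(self, in_dim, out_dim, n_features=200, bandwidth=1.0, lam=1e-3):
        rng = np.random.default_rng(0)
        self.W = rng.normal(scale=1.0 / bandwidth, size=(n_features, in_dim))
        self.b = rng.uniform(0, 2 * np.pi, n_features)
        self.theta = np.zeros((n_features, out_dim))
        self.P = np.eye(n_features) / lam        # RLS covariance

    def features(self, z):
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ z + self.b)

    def update(self, z, residual):
        """Recursive least squares on one (state-input, residual) sample."""
        phi = self.features(z)
        k = self.P @ phi / (1.0 + phi @ self.P @ phi)
        self.theta += np.outer(k, residual - phi @ self.theta)
        self.P -= np.outer(k, phi @ self.P)

    def predict(self, z):
        return self.features(z) @ self.theta

# Toy residual: a constant unknown push plus a state-dependent term.
true_residual = lambda z: np.array([0.5 + 0.2 * np.sin(z[0])])
learner = ResidualLearner(in_dim=2, out_dim=1)
rng = np.random.default_rng(1)
for _ in range(500):
    z = rng.uniform(-1, 1, size=2)               # sampled (state, input) pair
    learner.update(z, true_residual(z))
z_test = np.array([0.3, -0.4])
print(learner.predict(z_test), true_residual(z_test))   # learned vs true residual
```

An MPC layer would then add `learner.predict(z)` to its nominal dynamics at every planning step.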
APULSE: A Scalable Hybrid Algorithm for the RCSPP on Large-Scale Dense Graphs
The resource-constrained shortest path problem (RCSPP) is a fundamental NP-hard optimization challenge with broad applications, from network routing to autonomous navigation. This problem involves finding a path that minimizes a primary cost subject to a budget on a secondary resource. While various RCSPP solvers exist, they often face critical scalability limitations when applied to the large, dense graphs characteristic of complex, real-world scenarios, making them impractical for time-critical planning. This challenge is particularly acute in domains like mission planning for unmanned ground vehicles (UGVs), which demand solutions on large-scale terrain graphs. This paper introduces APULSE, a hybrid label-setting algorithm designed to efficiently solve the RCSPP on such challenging graphs. APULSE integrates a best-first search guided by an A* heuristic with aggressive, Pulse-style pruning mechanisms and a time-bucketing strategy for effective state-space reduction. A computational study, using a large-scale UGV planning scenario, benchmarks APULSE against state-of-the-art algorithms. The results demonstrate that APULSE consistently finds near-optimal solutions while being orders of magnitude faster and more robust, particularly on large problem instances where competing methods fail. This superior scalability establishes APULSE as an effective solution for RCSPP in complex, large-scale environments, enabling capabilities such as interactive decision support and dynamic replanning.
comment: This version corrects keywords and reference [9]. 9 pages
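A simplified sketch of the family of methods APULSE belongs to: an A*-guided label-setting search for the resource-constrained shortest path with dominance and feasibility pruning. The time-bucketing and Pulse-specific bounds of APULSE are omitted; the tiny graph and heuristic are made up.

```python
import heapq

def rcsp(graph, h, src, dst, budget):
    """A*-guided label-setting for a min-cost path under a resource budget.

    graph : {u: [(v, cost, resource), ...]}
    h     : {u: admissible lower bound on remaining cost to dst}
    A label (cost, resource) at a node is pruned if an already-kept label
    there is no worse in both cost and resource (dominance pruning).
    """
    labels = {u: [] for u in graph}            # non-dominated labels per node
    pq = [(h[src], 0.0, 0.0, src, [src])]      # (f = cost + h, cost, res, node, path)
    while pq:
        f, cost, res, u, path = heapq.heappop(pq)
        if u == dst:
            return cost, res, path
        if any(c <= cost and r <= res for c, r in labels[u]):
            continue                           # dominated: discard this label
        labels[u].append((cost, res))
        for v, c, r in graph.get(u, []):
            nc, nr = cost + c, res + r
            if nr <= budget:                   # feasibility pruning
                heapq.heappush(pq, (nc + h[v], nc, nr, v, path + [v]))
    return None

graph = {
    "s": [("a", 1, 3), ("b", 4, 1)],
    "a": [("t", 1, 3)],
    "b": [("t", 1, 1)],
    "t": [],
}
h = {"s": 2, "a": 1, "b": 1, "t": 0}
print(rcsp(graph, h, "s", "t", budget=4))   # cheap path violates the budget -> via b
```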
SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning
Flow matching (FM) has shown promising results in data-driven planning. However, it inherently lacks formal guarantees for ensuring state and action constraints, whose satisfaction is a fundamental and crucial requirement for the safety and admissibility of planned trajectories on various systems. Moreover, existing FM planners do not ensure the dynamical consistency, which potentially renders trajectories inexecutable. We address these shortcomings by proposing SAD-Flower, a novel framework for generating Safe, Admissible, and Dynamically consistent trajectories. Our approach relies on an augmentation of the flow with a virtual control input. Thereby, principled guidance can be derived using techniques from nonlinear control theory, providing formal guarantees for state constraints, action constraints, and dynamic consistency. Crucially, SAD-Flower operates without retraining, enabling test-time satisfaction of unseen constraints. Through extensive experiments across several tasks, we demonstrate that SAD-Flower outperforms various generative-model-based baselines in ensuring constraint satisfaction.
Multiagent Systems
Testing the Machine Consciousness Hypothesis
The Machine Consciousness Hypothesis states that consciousness is a substrate-free functional property of computational systems capable of second-order perception. I propose a research program to investigate this idea in silico by studying how collective self-models (coherent, self-referential representations) emerge from distributed learning systems embedded within universal self-organizing environments. The theory outlined here starts from the supposition that consciousness is an emergent property of collective intelligence systems undergoing synchronization of prediction through communication. It is not an epiphenomenon of individual modeling but a property of the language that a system evolves to internally describe itself. For a model of base reality, I begin with a minimal but general computational world: a cellular automaton, which exhibits both computational irreducibility and local reducibility. On top of this computational substrate, I introduce a network of local, predictive, representational (neural) models capable of communication and adaptation. I use this layered model to study how collective intelligence gives rise to self-representation as a direct consequence of inter-agent alignment. I suggest that consciousness does not emerge from modeling per se, but from communication. It arises from the noisy, lossy exchange of predictive messages between groups of local observers describing persistent patterns in the underlying computational substrate (base reality). It is through this representational dialogue that a shared model arises, aligning many partial views of the world. The broader goal is to develop empirically testable theories of machine consciousness, by studying how internal self-models may form in distributed systems without centralized control.
Chain of Unit-Physics: A Primitive-Centric Approach to Scientific Code Synthesis
Agentic large language models are proposed as autonomous code generators for scientific computing, yet their reliability in high-stakes problems remains unclear. Developing computational scientific software from natural-language queries remains broadly challenging due to (a) sparse representation of domain codes during training and (b) the limited feasibility of RLHF with a small expert community. To address these limitations, this work conceptualizes an inverse approach to code design, embodied in the Chain of Unit-Physics framework: a first-principles (or primitives)-centric, multi-agent system in which human expert knowledge is encoded as unit-physics tests that explicitly constrain code generation. The framework is evaluated on a nontrivial combustion task, used here as a representative benchmark for scientific problems with realistic physical constraints. Closed-weight systems and code-focused agentic variants fail to produce correct end-to-end solvers, despite tool and web access, exhibiting four recurrent error classes: interface (syntax/API) hallucinations, overconfident assumptions, numerical/physical incoherence, and configuration fragility. Open-weight models with chain-of-thought (CoT) decoding reduce interface errors but still yield incorrect solutions. On the benchmark task, the proposed framework converges within 5-6 iterations and matches the human-expert implementation (mean error of $3.1\times10^{-3}$%), with a $\sim$33.4% faster runtime and $\sim$30% more efficient memory usage at a cost comparable to mid-sized commercial APIs, yielding a practical template for physics-grounded scientific code generation. As datasets and models evolve, zero-shot code accuracy will improve; however, the Chain of Unit-Physics framework goes further by embedding first-principles analysis that is foundational to scientific codes.
Augmented Runtime Collaboration for Self-Organizing Multi-Agent Systems: A Hybrid Bi-Criteria Routing Approach AAAI
LLM-based multi-agent systems have demonstrated significant capabilities across diverse domains. However, the task performance and efficiency are fundamentally constrained by their collaboration strategies. Prevailing approaches rely on static topologies and centralized global planning, a paradigm that limits their scalability and adaptability in open, decentralized networks. Effective collaboration planning in distributed systems using only local information thus remains a formidable challenge. To address this, we propose BiRouter, a novel dual-criteria routing method for Self-Organizing Multi-Agent Systems (SO-MAS). This method enables each agent to autonomously execute ``next-hop'' task routing at runtime, relying solely on local information. Its core decision-making mechanism is predicated on balancing two metrics: (1) the ImpScore, which evaluates a candidate agent's long-term importance to the overall goal, and (2) the GapScore, which assesses its contextual continuity for the current task state. Furthermore, we introduce a dynamically updated reputation mechanism to bolster system robustness in untrustworthy environments and have developed a large-scale, cross-domain dataset, comprising thousands of annotated task-routing paths, to enhance the model's generalization. Extensive experiments demonstrate that BiRouter achieves superior performance and token efficiency over existing baselines, while maintaining strong robustness and effectiveness in information-limited, decentralized, and untrustworthy settings.
comment: Accepted at The 40th Annual AAAI Conference on Artificial Intelligence
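A hedged sketch of BiRouter's dual-criteria next-hop rule: each candidate neighbour is scored by a weighted blend of long-term importance (ImpScore) and contextual continuity (GapScore), scaled by a reputation factor. The scoring functions, weights, and example agents are stand-ins, not the paper's learned models.

```python
def choose_next_hop(candidates, alpha=0.6, beta=0.4):
    """Pick the neighbour maximising a reputation-weighted blend of
    long-term importance (ImpScore) and contextual continuity (GapScore).

    candidates: list of dicts with keys 'agent', 'imp', 'gap', 'reputation',
    where 'imp' and 'gap' are assumed to be pre-normalised to [0, 1].
    """
    def score(c):
        return c["reputation"] * (alpha * c["imp"] + beta * c["gap"])
    return max(candidates, key=score)

neighbours = [
    {"agent": "summariser", "imp": 0.9, "gap": 0.3, "reputation": 0.95},
    {"agent": "retriever",  "imp": 0.5, "gap": 0.9, "reputation": 1.00},
    {"agent": "planner",    "imp": 0.8, "gap": 0.7, "reputation": 0.40},
]
print(choose_next_hop(neighbours)["agent"])   # reputation discounts the planner
```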
Robust Geospatial Coordination of Multi-Agent Communications Networks Under Attrition
Fast, efficient, robust communication during wildfire and other emergency responses is critical. One way to achieve this is by coordinating swarms of autonomous aerial vehicles carrying communications equipment to form an ad-hoc network connecting emergency response personnel to both each other and central command. However, operating in such extreme environments may lead to individual networking agents being damaged or rendered inoperable, which could bring down the network and interrupt communications. To overcome this challenge and enable multi-agent UAV networking in difficult environments, this paper introduces and formalizes the problem of Robust Task Networking Under Attrition (RTNUA), which extends connectivity maintenance in multi-robot systems to explicitly address proactive redundancy and attrition recovery. We introduce Physics-Informed Robust Employment of Multi-Agent Networks ($Φ$IREMAN), a topological algorithm leveraging physics-inspired potential fields to solve this problem. Through simulation across 25 problem configurations, $Φ$IREMAN consistently outperforms the DCCRS baseline, and on large-scale problems with up to 100 tasks and 500 drones, maintains $>99.9\%$ task uptime despite substantial attrition, demonstrating both effectiveness and scalability.
comment: 8 pages, 5 figures, 4 tables, submitted to IEEE RA-L
Hypernetwork Theory: The Structural Kernel
Modelling across engineering, systems science, and formal methods remains limited by binary relations, implicit semantics, and diagram-centred notations that obscure multilevel structure and hinder mechanisation. Hypernetwork Theory (HT) addresses these gaps by treating the n-ary relation as the primary modelling construct. Each relation is realised as a typed hypersimplex - alpha (conjunctive, part-whole) or beta (disjunctive, taxonomic) - bound to a relation symbol R that fixes arity and ordered roles. Semantics are embedded directly in the construct, enabling hypernetworks to represent hierarchical and heterarchical systems without reconstruction or tool-specific interpretation. This paper presents the structural kernel of HT. It motivates typed n-ary relational modelling, formalises the notation and axioms (A1-A5) for vertices, simplices, hypersimplices, boundaries, and ordering, and develops a complete algebra of structural composition. Five operators - merge, meet, difference, prune, and split - are defined by deterministic conditions and decision tables that ensure semantics-preserving behaviour and reconcile the Open World Assumption with closure under rules. Their deterministic algorithms show that HT supports reproducible and mechanisable model construction, comparison, decomposition, and restructuring. The resulting framework elevates hypernetworks from symbolic collections to structured, executable system models, providing a rigorous and extensible foundation for mechanisable multilevel modelling.
comment: 32 pages, 5 figures, 2 appendices. Companion boundary-calculus paper forthcoming
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning ICML 2025
Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods exclusively derive roles from an agent's past experience during training, neglecting their influence on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. R3DM optimizes the proposed objective through contrastive learning on past trajectories to first derive intermediate roles that shape intrinsic rewards to promote diversity in future behaviors across different roles through a learned dynamics model. Benchmarking on SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination to increase win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.
comment: 21 pages, To appear in the International Conference of Machine Learning (ICML 2025)
Flexible Swarm Learning May Outpace Foundation Models in Essential Tasks
Foundation models have rapidly advanced AI, raising the question of whether their decisions will ultimately surpass human strategies in real-world domains. The exponential, and possibly super-exponential, pace of AI development makes such analysis elusive. Nevertheless, many application areas that matter for daily life and society show only modest gains so far; a prominent case is diagnosing and treating dynamically evolving disease in intensive care. The common challenge is adapting complex systems to dynamic environments. Effective strategies must optimize outcomes in systems composed of strongly interacting functions while avoiding shared side effects; this requires reliable, self-adaptive modeling. These tasks align with building digital twins of highly complex systems whose mechanisms are not fully or quantitatively understood. It is therefore essential to develop methods for self-adapting AI models with minimal data and limited mechanistic knowledge. As this challenge extends beyond medicine, AI should demonstrate clear superiority in these settings before assuming broader decision-making roles. We identify the curse of dimensionality as a fundamental barrier to efficient self-adaptation and argue that monolithic foundation models face conceptual limits in overcoming it. As an alternative, we propose a decentralized architecture of interacting small agent networks (SANs). We focus on agents representing the specialized substructure of the system, where each agent covers only a subset of the full system functions. Drawing on mathematical results on the learning behavior of SANs and evidence from existing applications, we argue that swarm-learning in diverse swarms can enable self-adaptive SANs to deliver superior decision-making in dynamic environments compared with monolithic foundation models, though at the cost of reduced reproducibility in detail.
HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics
Creating an immersive and interactive theatrical experience is a long-term goal in the field of interactive narrative. The emergence of large language models (LLMs) is providing a new path to achieve this goal. However, existing LLM-based drama generation methods often result in agents that lack initiative and cannot interact with the physical scene. Furthermore, these methods typically require detailed user input to drive the drama. These limitations reduce the interactivity and immersion of online real-time performance. To address the above challenges, we propose HAMLET, a multi-agent framework focused on drama creation and online performance. Given a simple topic, the framework generates a narrative blueprint, guiding the subsequent improvisational performance. During the online performance, each actor is given an autonomous mind. This means that actors can make independent decisions based on their own background, goals, and emotional state. In addition to conversations with other actors, their decisions can also change the state of scene props through actions such as opening a letter or picking up a weapon. The change is then broadcast to other related actors, updating what they know and care about, which in turn influences their next action. To evaluate the quality of drama performance generated by HAMLET, we designed an evaluation method to assess three primary aspects: character performance, narrative quality, and interaction experience. The experimental evaluation shows that HAMLET can create expressive and coherent theatrical experiences.
comment: This submission has been withdrawn due to unresolved issues concerning author order and potential conflicts of interest between author affiliations. The authors will resubmit once these matters are fully resolved
VICoT-Agent: A Vision-Interleaved Chain-of-Thought Framework for Interpretable Multimodal Reasoning and Scalable Remote Sensing Analysis
The current remote sensing image analysis task is increasingly evolving from traditional object recognition to complex intelligence reasoning, which places higher requirements on the model's reasoning ability and the flexibility of tool invocation. To this end, we propose a new multimodal agent framework, Vision-Interleaved Chain-of-Thought Framework (VICoT), which implements explicit multi-round reasoning by dynamically incorporating visual tools into the chain of thought. Through a stack-based reasoning structure and a modular MCP-compatible tool suite, VICoT enables LLMs to efficiently perform multi-round, interleaved vision-language reasoning tasks with strong generalization and flexibility. We also propose the Reasoning Stack distillation method to migrate complex Agent behaviors to small, lightweight models, which ensures the reasoning capability while significantly reducing complexity. Experiments on multiple remote sensing benchmarks demonstrate that VICoT significantly outperforms existing SOTA frameworks in reasoning transparency, execution efficiency, and generation quality.
Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner
In wireless communication systems, efficient and adaptive resource allocation plays a crucial role in enhancing overall Quality of Service (QoS). Compared to the conventional Model-Free Reinforcement Learning (MFRL) scheme, Model-Based RL (MBRL) first learns a generative world model for subsequent planning. The reuse of historical experience in MBRL promises more stable training behavior, yet its deployment in large-scale wireless networks remains challenging due to high-dimensional stochastic dynamics, strong inter-agent cooperation, and communication constraints. To overcome these challenges, we propose the Multi-Agent Conditional Diffusion Model Planner (MA-CDMP) for decentralized communication resource management. Built upon the Distributed Training with Decentralized Execution (DTDE) paradigm, MA-CDMP models each communication node as an autonomous agent and employs Diffusion Models (DMs) to capture and predict environment dynamics. Meanwhile, an inverse dynamics model guides action generation, thereby enhancing sample efficiency and policy scalability. Moreover, to approximate large-scale agent interactions, a Mean-Field (MF) mechanism is introduced as an assistance to the classifier in DMs. This design mitigates inter-agent non-stationarity and enhances cooperation with minimal communication overhead in distributed settings. We further theoretically establish an upper bound on the distributional approximation error introduced by the MF-based diffusion generation, guaranteeing convergence stability and reliable modeling of multi-agent stochastic dynamics. Extensive experiments demonstrate that MA-CDMP consistently outperforms existing MARL baselines in terms of average reward and QoS metrics, showcasing its scalability and practicality for real-world wireless network optimization.
Systems and Control (EESS)
A Neuromodulable Current-Mode Silicon Neuron for Robust and Adaptive Neuromorphic Systems
Neuromorphic engineering makes use of mixed-signal analog and digital circuits to directly emulate the computational principles of biological brains. Such electronic systems offer a high degree of adaptability, robustness, and energy efficiency across a wide range of tasks, from edge computing to robotics. Within this context, we investigate a key feature of biological neurons: their ability to carry out robust and reliable computation by adapting their input response and spiking pattern to context through neuromodulation. Achieving analogous levels of robustness and adaptation in neuromorphic circuits through modulatory mechanisms is a largely unexplored path. We present a novel current-mode neuron design that supports robust neuromodulation with minimal model complexity, compatible with standard CMOS technologies. We first introduce a mathematical model of the circuit and provide tools to analyze and tune the neuron behavior; we then demonstrate both theoretically and experimentally the biologically plausible neuromodulation adaptation capabilities of the circuit over a wide range of parameters. All the theoretical predictions were verified in experiments on a low-power 180 nm CMOS implementation of the proposed neuron circuit. Due to the analog underlying feedback structure, the proposed adaptive neuromodulable neuron exhibits a high degree of robustness, flexibility, and scalability across operating ranges of currents and temperatures, making it a perfect candidate for real-world neuromorphic applications.
comment: 18 pages, 13 figures
Semantic Communications for Vehicle-Based Mission-Critical Services: Challenges and Solutions
As mission-critical (MC) services such as Unmanned Aerial Vehicles (UAVs) based emergency communication and Internet of Vehicles (IoVs) enabled autonomous driving emerge, the traditional communication framework cannot meet the growing demands for higher reliability and lower latency or the increasing transmission loads. Semantic Communication (SemCom), an emerging communication paradigm that shifts the focus from bit-level data to its context and intended task at the receiver (i.e., semantic level), is envisioned to be a key revolution in Sixth Generation (6G) networks. However, an explicit and systematic SemCom framework specifically tailored for Vehicle-based MC (VbMC) services has yet to be proposed, primarily due to the complexity and lack of analysis on their MC characteristics. In this article, we first present the key information-critical and infrastructure-critical vehicle-based services within the SemCom framework. We then analyze the unique characteristics of MC services and the corresponding challenges they present for SemCom. Building on this, we propose a novel SemCom framework designed to address the specific needs of MC services in vehicle systems, offering potential solutions to existing challenges. Finally, we present a case study on UAV-based rapid congestion relief, utilizing eXplainable AI (XAI) to validate the effectiveness of the proposed SemCom framework.
Supervisory control synthesis for multilevel DES with local buses
In multilevel supervisor synthesis, dependency structure matrix techniques can be used to transform the models of plants and requirements into a tree-structured hierarchical decomposition of the synthesis problem and thus efficiently synthesize local supervisors. A bus component, which has many dependencies across a system, tends to lead to an undesirable clustering of many components in one synthesis subproblem. Prior work showed how to recognize and properly treat a global bus structure. In this paper we leverage this work from global to local bus structures through a novel multilevel discrete-event system (MLDES) architecture. Specifically, the hierarchical system decomposition is revisited by allowing bus detection not only on the top level but at each level of the system hierarchy. Given this architecture, an algorithm is introduced that constructs a tree-structured MLDES. A case study on a production line shows the effectiveness of the proposed method through significantly improved synthesis performance, measured by the sum of the controlled state-space sizes of the local supervisors.
Negotiating Highway Interchange Traffic with a Decentralized Instability-Driven CBF-based Algorithm
In this paper we consider an interchange lane-swap scenario, a limited stretch of highway with two parallel lanes where most vehicles want to change lanes. We show that a particular decentralized Control Barrier Function based algorithm executes lane swaps efficiently, with minimal speed change, within the specified (short) road segment at high traffic densities (3,500 vehicles per hour per lane). Our main point is that controller tuning, the speed of inter-agent instability, plays a major role in the performance of the vehicle group. This is illustrated by comparing two different tunings of the controller and a third one where the lane swap is enforced by virtual guard rails. Like fighter jet dynamic instability improving maneuverability, the inter-agent instability improves agility of a group of vehicles. We emphasize that the controllers considered are decentralized: agents do not know if others want to change lanes or not.
comment: 7 pages, 7 figures, 2 tables
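A scalar sketch of the kind of Control Barrier Function filter such decentralized controllers build on: a desired acceleration is minimally modified so that a spacing-plus-headway barrier stays nonnegative. The single-constraint closed form, barrier choice, and numbers are illustrative assumptions, not the paper's full design.

```python
def cbf_filter(a_des, gap, v_ego, v_lead, d_min=5.0, tau=1.0, gamma=1.0):
    """Minimally modify a desired acceleration so a spacing barrier stays safe.

    Barrier: h = gap - d_min - tau * v_ego  (distance plus time-headway margin).
    Enforcing dh/dt + gamma * h >= 0 with dv_ego/dt = a yields an upper bound
    on a; the QP with this single constraint has the closed form below.
    """
    h = gap - d_min - tau * v_ego
    a_max = (v_lead - v_ego + gamma * h) / tau
    return min(a_des, a_max)

# Closing on a slower lead vehicle: the filter caps the commanded
# acceleration at 0 rather than allowing the requested +1 m/s^2.
print(cbf_filter(a_des=1.0, gap=40.0, v_ego=30.0, v_lead=25.0))
```

The class-K gain `gamma` plays the role of the controller tuning discussed in the abstract: larger values let the barrier constraint relax faster and the group behave more aggressively.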
Bayesian dynamic scheduling of multipurpose batch processes under incomplete look-ahead information
Multipurpose batch processes are becoming increasingly popular in manufacturing industries since they adapt to low-volume, high-value products and shifting demands. These processes often operate in a dynamic environment, which faces disturbances such as processing delays and demand changes. To minimise long-term cost and system nervousness (i.e., disruptive changes to schedules), schedulers must design rescheduling strategies to address such disturbances effectively. Existing methods often assume complete look-ahead information over the scheduling horizon. This assumption contrasts with realistic situations where schedulers can only access incomplete look-ahead information. Sticking with existing methods may lead to suboptimal long-term costs and high-level system nervousness. In this work, we propose a Bayesian dynamic scheduling method. Our method relies on learning a Bayesian Network from the probability distribution of disturbances. Specifically, the Bayesian Network represents how likely each operation will be impacted by disturbances. During the online execution, when new disturbances become observed, this method updates the posterior distribution and therefore guides the rescheduling strategy. We compare our method with the existing periodic rescheduling strategy (which generates new schedules from scratch at fixed intervals) on four benchmark problems. Computational results show that our method achieves statistically better long-term costs and system nervousness. In the theoretical aspect, we prove that if disturbances are mutually independent, the impact-quantifying variables inherently satisfy the independence assumptions required by Bayesian Networks. As an implication, practitioners can extend the method to other scheduling problems (such as job shop scheduling and continuous processes), given that they define the problem-specific dependencies between operations.
Assessing the Viability of Fresnel Lenses for Weed Control in Prickly Pear Cactus Cultivation: A Spatiotemporal Coverage Perspective
In tropical semiarid regions, prickly pear cactus has emerged as a vital forage resource due to its high drought tolerance and minimal water requirements. However, even limited weed infestation can severely compromise cactus productivity, as the species are highly sensitive to competition for essential resources, including water, mineral nutrients, and sun exposure. Conventional herbicide-based weed control strategies face growing limitations due to resistance development and environmental concerns, underscoring the need for sustainable alternatives. This study revisits the historically underexplored application of linear Fresnel lenses (LFLs) for thermal weed control and establishes the technical feasibility of a contemporary autonomous weed management system that incorporates LFL technology within an unmanned ground vehicle platform. Leveraging real-time image processing, georeferencing, and mechanical actuation, the system can perform a two-phase operation: weed mapping during non-optimal solar hours and targeted solar termination during peak irradiance. Analytical modeling quantifies the effective area coverage and time constraints imposed by the solar window. Preliminary results indicate that, while unsuitable for dense weed infestations, the system presents a viable solution for precision, post-emergent weed control in sparse infestations. The favorable solar geometry in tropical zones, especially in the Brazilian semiarid region, and the targeted nature of the approach make this technology particularly well-suited for sustainable agriculture in under-resourced regions.
comment: 9 pages and 9 figures
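A back-of-the-envelope sketch of the spatiotemporal coverage question the paper analyses: how many weeds one lens-equipped platform can treat within a solar window, given a dwell time per weed and repositioning time between targets. Every number here is a made-up placeholder, not a result from the study.

```python
def weeds_treatable(solar_window_h, dwell_s, spacing_m, speed_mps):
    """Upper bound on weeds one lens-equipped UGV can treat per solar window."""
    travel_s = spacing_m / speed_mps       # repositioning time between weeds
    per_weed_s = dwell_s + travel_s        # total time budget per target
    return int(solar_window_h * 3600 // per_weed_s)

# Illustrative numbers: 4 h of usable irradiance, 60 s of focused exposure
# per weed, targets 3 m apart, platform moving at 0.5 m/s.
print(weeds_treatable(solar_window_h=4, dwell_s=60, spacing_m=3, speed_mps=0.5))
```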
Reinforcement Learning for Gliding Projectile Guidance and Control
This paper presents the development of a control law intended to be implemented on an optically guided glider. The guidance law follows an innovative approach, reinforcement learning, and is used to make navigation more flexible and autonomous in a dynamic environment. The final objective is to track a target detected with the camera and then guide the glider to this point with high precision. Reinforcement learning has already been applied to quadcopter drones; with this study, we aim to demonstrate its applicability to fixed-wing aircraft on all of its axes.
comment: 6 pages
The Silence that Speaks: Neural Estimation via Communication Gaps
Accurate remote state estimation is a fundamental component of many autonomous and networked dynamical systems, where multiple decision-making agents interact and communicate over shared, bandwidth-constrained channels. These communication constraints introduce an additional layer of complexity, namely, the decision of when to communicate. This results in a fundamental trade-off between estimation accuracy and communication resource usage. Traditional extensions of classical estimation algorithms (e.g., the Kalman filter) treat the absence of communication as 'missing' information. However, silence itself can carry implicit information about the system's state, which, if properly interpreted, can enhance the estimation quality even in the absence of explicit communication. Leveraging this implicit structure, however, poses significant analytical challenges, even in relatively simple systems. In this paper, we propose CALM (Communication-Aware Learning and Monitoring), a novel learning-based framework that jointly addresses the dual challenges of communication scheduling and estimator design. Our approach entails learning not only when to communicate but also how to infer useful information from periods of communication silence. We perform comparative case studies on multiple benchmarks to demonstrate that CALM is able to decode the implicit coordination between the estimator and the scheduler to extract information from the instances of 'silence' and enhance the estimation accuracy.
Approximating Analytically-Intractable Likelihood Densities with Deterministic Arithmetic for Optimal Particle Filtering
Particle filtering algorithms have enabled practical solutions to problems in autonomous robotics (self-driving cars, UAVs, warehouse robots), target tracking, and econometrics, with further applications in speech processing and medicine (patient monitoring). Yet, their inherent weakness at representing the likelihood of the observation (which often leads to particle degeneracy) remains unaddressed for high-frequency and resource-constrained systems. Improvements such as the optimal proposal and auxiliary particle filter mitigate this issue under specific circumstances and with increased computational cost. This work presents a new particle filtering method and its implementation, which enables tunably-approximative representation of arbitrary likelihood densities as program transformations of parametric distributions. Our method leverages a recent computing platform that can perform deterministic computation on probability distribution representations (UxHw) without relying on stochastic methods. For non-Gaussian non-linear systems and with an optimal-auxiliary particle filter, we benchmark the likelihood evaluation error and speed for a total of 294840 evaluation points. For such models, the results show that the UxHw method leads to as much as 37.7x speedup compared to the Monte Carlo alternative. For narrow uniform observation noise, the particle filter falsely assigns zero likelihood as much as 81.89% of the time whereas UxHw achieves 1.52% false-zero rate. The UxHw approach achieves filter RMSE improvement of as much as 18.9% (average 3.3%) over the Monte Carlo alternative.
LPWAN based IoT Architecture for Distributed Energy Monitoring in Deep Indoor Environments
Continuous energy monitoring is essential for identifying potential savings and predicting the energy requirements of buildings. Energy meters are often located in underground spaces that are difficult to reach with wireless technology. This paper presents an experimental study comparing different Low Power Wide Area Network (LPWAN) technologies in terms of building penetration and radio coverage. The technologies Long Range Wide Area Network (LoRaWAN), Narrowband Internet of Things (NB-IoT), Sigfox 0G, and Wireless Smart Ubiquitous Network (Wi-SUN) are evaluated experimentally. It also proposes a distributed hybrid IoT architecture that combines multiple LPWAN technologies using an abstraction layer to optimize cost and coverage. Communication is message-based, following the publish-subscribe messaging pattern, and is implemented using the MQTT protocol. The abstraction layer decodes the proprietary binary data and converts it to a normalized JSON format.
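A minimal sketch of the abstraction layer's decode step: unpack a proprietary binary meter frame and emit normalized JSON ready to be published on an MQTT topic. The frame layout, field names, and scaling are assumptions for illustration, not the paper's actual payload format.

```python
import json
import struct

def decode_meter_frame(payload: bytes) -> str:
    """Decode a hypothetical 11-byte meter frame into normalised JSON.

    Assumed big-endian layout: uint16 device id, uint32 energy in Wh,
    uint16 power in W, int16 temperature in 0.1 degC, uint8 battery in %.
    """
    dev_id, energy_wh, power_w, temp, batt = struct.unpack(">HIHhB", payload)
    record = {
        "device_id": dev_id,
        "energy_kwh": energy_wh / 1000.0,
        "power_w": power_w,
        "temperature_c": temp / 10.0,
        "battery_pct": batt,
    }
    # A gateway would publish this string on a topic such as f"energy/{dev_id}"
    # via its MQTT client; here we simply return the normalised document.
    return json.dumps(record)

frame = struct.pack(">HIHhB", 42, 123456, 230, 215, 87)   # synthetic raw frame
print(decode_meter_frame(frame))
```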
Fault-Tolerant Temperature Control of HRSG Superheaters: Stability Analysis Under Valve Leakage Using Physics-Informed Neural Networks
Faults and operational disturbances in Heat Recovery Steam Generators (HRSGs), such as valve leakage, present significant challenges, disrupting steam temperature regulation and potentially causing efficiency losses, safety risks, and unit shutdowns. Traditional PI controllers often struggle due to inherent system delays, nonlinear dynamics, and static gain limitations. This paper introduces a fault-tolerant temperature control framework by integrating a PI plus feedforward control strategy with Physics-Informed Neural Networks (PINNs). The feedforward component anticipates disturbances, preemptively adjusting control actions, while the PINN adaptively tunes control gains in real-time, embedding thermodynamic constraints to manage varying operating conditions and valve leakage faults. A Lyapunov-based stability analysis confirms the asymptotic convergence of temperature tracking errors under bounded leakage conditions. Simulation results using operational data from the Pareh-Sar combined cycle power plant demonstrate significantly improved response times, reduced temperature deviations, enhanced fault resilience, and smooth gain adjustments. The proposed adaptive, data-driven methodology shows strong potential for industrial deployment, ensuring reliable operation, autonomous fault recovery, and enhanced performance in HRSG systems.
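A minimal sketch of the PI-plus-feedforward structure described above, without the PINN gain adaptation: the feedforward term anticipates a measured disturbance (for example, a valve-leakage estimate) instead of waiting for the temperature error to build up. Gains, signals, and limits are placeholders, and anti-windup is omitted.

```python
class PIFeedforward:
    """Discrete PI controller with an additive feedforward disturbance term."""

    def __init__(self, kp, ki, kff, dt, u_min=0.0, u_max=100.0):
        self.kp, self.ki, self.kff, self.dt = kp, ki, kff, dt
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0

    def step(self, setpoint, measurement, disturbance):
        error = setpoint - measurement
        self.integral += error * self.dt
        u = (self.kp * error
             + self.ki * self.integral
             + self.kff * disturbance)               # anticipate the disturbance
        return min(max(u, self.u_min), self.u_max)   # actuator saturation

# One control step: steam 8 degC below setpoint, with a measured leakage
# proxy of 2.0 units feeding the feedforward path (illustrative values).
ctrl = PIFeedforward(kp=1.5, ki=0.1, kff=0.8, dt=1.0)
print(ctrl.step(setpoint=540.0, measurement=532.0, disturbance=2.0))
```

In the paper's framework, a PINN would adapt `kp`, `ki`, and `kff` online; here they are fixed for clarity.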
Integrating Causal Foundation Model in Prescriptive Maintenance Framework for Optimizing Production Line OEE
The transition to prescriptive maintenance in manufacturing is critically constrained by a dependence on predictive models. These models tend to rely on spurious correlations rather than identifying the true causal drivers of failures, often leading to costly misdiagnoses and ineffective interventions. This fundamental limitation results in a key challenge: while we can predict that a failure may occur, we lack a systematic method to understand why a failure occurs, and thus the basis for identifying the most effective intervention. This paper proposes a model based on causal machine learning to bridge this gap. Our objective is to move beyond diagnosis to active prescription by simulating and evaluating potential fixes toward optimizing KPIs such as Overall Equipment Effectiveness (OEE). For this purpose a pre-trained causal foundation model is used as a "what-if" model to estimate the effects of potential fixes. By measuring the causal effect of each intervention on system-level KPIs, it provides a data-driven ranking of actions to recommend at the production line. This process not only identifies root causes but also quantifies their operational impact. The model is evaluated using semi-synthetic manufacturing data and compared with a baseline machine learning model. This paper sets the technical basis for a robust prescriptive maintenance framework, allowing engineers to test potential solutions in a causal environment to make more effective operational decisions and reduce costly downtimes.
comment: 9 pages, 3 images, 1 table, conference paper
Fundamental Advances in Short-Circuit Measurement: A Novel Time-Resolved Diode-Clamp Circuit Paradigm
Conventional electrical fault models, which rely on static thresholds and instantaneous trip mechanisms, fail to capture the time-evolving dynamics of real faults, creating vulnerabilities in modern power systems. This paper introduces a diode-clamp circuit architecture that reconceives short-circuits as governed, sustained processes and establishes a physics-consistent measurement system. An Arduino-based data acquisition system recorded continuous fault evolution across multiple input voltages and durations. Multi-resolution sampling at 10 ms, 50 ms, and 100 ms enabled high-fidelity capture of both transients and sustained-state dynamics. The clamped mechanism constrained the circuit to a bounded regime, enabling repeatable observation. Experiments yielded definitive, measurable minima and maxima for voltage, current, and resistance, empirically refuting the classical assumption of instantaneous, unbounded current. Newly introduced metrics quantify this performance: the Sustained-to-Capacitive Energy Ratio (SCER $\sim 1.53\times10^{12}$) proves fault energy originates from sustained dynamics, not transient discharge. The Sustained Fault Efficiency (SFE>1) demonstrates that governed fault power can exceed nominal operating power. This work provides the first fully validated short-circuit quantification system, yielding empirical data for next-generation battery management, adaptive grid protection, and fault-tolerant electronics.
comment: 123 pages, 22 Figures
Mechatronic Design, Dynamic Modeling, and Real-Time Control of a Movable Scaffold
This study presents the mechatronic design, dynamic modeling, simulations, and real-time control experiments of a new movable scaffolding system. The proposed system consists of a 3 degrees-of-freedom movable platform, which can be positioned on the outer surface of buildings. The platform is supported and driven by cords that are wound on pulleys and coupled to servo-controlled DC motors located at the four corners of the building surface. A mathematical model considering the actuator dynamics of this cable-driven mechanism is obtained and its simulation results are presented. The design, manufacture, and real-time control tests of a prototype have been carried out. Both numerical simulations and experiments demonstrate good positioning performance of the proposed cable-driven mechanism.
comment: 18 pages, 12 figures
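A hedged sketch of the cable-length kinematics behind such a four-cable, pulley-driven facade platform: each required cable length is the distance from the platform to its corner pulley, and position changes map to drum rotations. The point-mass simplification, facade dimensions, and drum radius are illustrative assumptions.

```python
import numpy as np

# Corner pulleys of a 10 m x 8 m facade (x = horizontal, y = vertical).
ANCHORS = np.array([[0.0, 8.0], [10.0, 8.0], [0.0, 0.0], [10.0, 0.0]])

def cable_lengths(platform_xy):
    """Required cable lengths for a given platform position (point-mass model)."""
    return np.linalg.norm(ANCHORS - np.asarray(platform_xy), axis=1)

def motor_commands(current_xy, target_xy, drum_radius=0.05):
    """Pulley rotations (rad) needed to move between two platform positions."""
    delta = cable_lengths(target_xy) - cable_lengths(current_xy)
    return delta / drum_radius      # positive = pay out cable, negative = reel in

print(cable_lengths([5.0, 4.0]))                # centred platform: all cables equal
print(motor_commands([5.0, 4.0], [6.0, 3.0]))   # move 1 m right and 1 m down
```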
A Novel MDP Decomposition Framework for Scalable UAV Mission Planning in Complex and Uncertain Environments
This paper presents a scalable and fault-tolerant framework for unmanned aerial vehicle (UAV) mission management in complex and uncertain environments. The proposed approach addresses the computational bottleneck inherent in solving large-scale Markov Decision Processes (MDPs) by introducing a two-stage decomposition strategy. In the first stage, a factor-based algorithm partitions the global MDP into smaller, goal-specific sub-MDPs by leveraging domain-specific features such as goal priority, fault states, spatial layout, and energy constraints. In the second stage, a priority-based recombination algorithm solves each sub-MDP independently and integrates the results into a unified global policy using a meta-policy for conflict resolution. Importantly, we present a theoretical analysis showing that, under mild probabilistic independence assumptions, the combined policy is provably equivalent to the optimal global MDP policy. Our work advances artificial intelligence (AI) decision scalability by decomposing large MDPs into tractable subproblems with provable global equivalence. The proposed decomposition framework enhances the scalability of Markov Decision Processes, a cornerstone of sequential decision-making in artificial intelligence, enabling real-time policy updates for complex mission environments. Extensive simulations validate the effectiveness of our method, demonstrating orders-of-magnitude reduction in computation time without sacrificing mission reliability or policy optimality. The proposed framework establishes a practical and robust foundation for scalable decision-making in real-time UAV mission execution.
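A skeletal Python sketch of the two-stage idea: solve each goal-specific sub-MDP independently (here by value iteration on a toy chain) and let a priority-based meta-policy decide which sub-policy's action to execute. The grid, goals, and priority rule are invented for illustration, not the paper's factor-based partitioning.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P: (A, S, S) transition tensor, R: (S,) reward on arrival. Returns greedy policy."""
    V = np.zeros(R.shape[0])
    while True:
        Q = np.einsum("asn,n->as", P, R + gamma * V)   # (A, S) action values
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=0), V_new
        V = V_new

def chain_P():
    """Deterministic 4-state chain with actions 0 = left, 1 = right."""
    P = np.zeros((2, 4, 4))
    for s in range(4):
        P[0, s, max(s - 1, 0)] = 1.0
        P[1, s, min(s + 1, 3)] = 1.0
    return P

P = chain_P()
policy_a, _ = value_iteration(P, np.array([0.0, 0.0, 0.0, 1.0]))  # goal A: reach state 3
policy_b, _ = value_iteration(P, np.array([1.0, 0.0, 0.0, 0.0]))  # goal B: reach state 0

def meta_policy(state, goal_a_done):
    """Priority recombination: pursue goal A first, then switch to goal B."""
    return policy_b[state] if goal_a_done else policy_a[state]

print([meta_policy(s, goal_a_done=False) for s in range(4)])  # head right toward A
print([meta_policy(s, goal_a_done=True) for s in range(4)])   # then head left toward B
```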
SAGAS: Semantic-Aware Graph-Assisted Stitching for Offline Temporal Logic Planning
Linear Temporal Logic (LTL) provides a rigorous framework for complex robotic tasks, yet existing methods often rely on accurate dynamics models or expensive online interactions. In this work, we address LTL-constrained control in a challenging offline, model-free setting, utilizing only fixed, task-agnostic datasets of fragmented trajectories. We propose SAGAS, a novel framework combining graph-assisted trajectory stitching with automata-guided planning. First, we construct a latent reachability graph from a learned temporal-distance representation. To bridge the semantic gap, we augment this graph with certified anchor nodes and probabilistic soft labels. We then translate the specification into a Büchi automaton and search the implicit product space to derive a cost-minimal prefix-suffix plan. Finally, a subgoal-conditioned low-level policy is deployed to execute these latent waypoints. Experiments on OGBench locomotion domains demonstrate that SAGAS successfully synthesizes efficient trajectories for diverse LTL tasks, effectively bridging the gap between fragmented offline data and complex logical constraints.
Analysis of Optimal Thrust to Mass Ratio Requirement for Maximizing Payload Mass of Lunar Landing Mission
Recent successful lunar landing missions have generated significant interest among space agencies in establishing a permanent human settlement on the Moon. Building a lunar base requires multiple and frequent landing missions to support logistics and mobility applications. In these missions, maximizing payload mass, defined as the useful cargo for human settlement, is crucial. The landing mass depends on several factors, with the most critical being the maximum thrust available for braking and the engine's specific impulse (ISP). Generally, increasing engine thrust for braking reduces flight duration and, consequently, gravity losses. However, higher thrust also introduces trade-offs, such as increased engine weight and lower ISP, which can negatively impact payload capacity. Therefore, optimizing the descent trajectory requires careful consideration of these parameters to achieve a global solution that maximizes payload mass. Most existing research focuses on solving optimal control problems that minimize propellant consumption for a given thrust. These problems are typically addressed through trajectory optimization, where a minimum-fuel solution is obtained. The optimized trajectory is then executed onboard using polynomial guidance. In this paper, we propose an outer-layer optimization approach based on a Pareto-optimal solution. This method iterates on the maximum available thrust for descent trajectory optimization while incorporating a loss function that accounts for engine mass and ISP losses. By applying this approach, we identify a globally optimal solution that maximizes payload mass while ensuring an optimal landing trajectory.
comment: 5 pages, 9 figures, Global Space Exploration Conference 2025
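A toy sketch of the outer-loop trade-off the paper studies: sweep candidate maximum thrust levels, charge an engine-mass and ISP penalty for higher thrust, credit a smaller gravity loss for shorter burns, and size propellant with the rocket equation. All constants and relationships below are illustrative placeholders, not mission values.

```python
import numpy as np

G0 = 9.81          # m/s^2
DV_IDEAL = 1900.0  # m/s, illustrative ideal braking delta-v

def payload_mass(thrust_kN, wet_mass=2000.0):
    """Toy payload model: higher thrust cuts gravity losses but costs engine
    mass and specific impulse (all relationships are illustrative)."""
    isp = 320.0 - 0.05 * thrust_kN                 # ISP penalty with thrust
    engine_mass = 40.0 + 0.8 * thrust_kN           # engine mass growth
    gravity_loss = 250.0 * (20.0 / thrust_kN)      # shorter burn, smaller loss
    dv = DV_IDEAL + gravity_loss
    mass_ratio = np.exp(dv / (isp * G0))           # Tsiolkovsky rocket equation
    final_mass = wet_mass / mass_ratio             # mass left after the burn
    return final_mass - engine_mass                # structure/payload remaining

thrusts = np.linspace(10.0, 120.0, 111)
payloads = np.array([payload_mass(t) for t in thrusts])
best = thrusts[payloads.argmax()]
print(f"best max thrust ~ {best:.0f} kN, payload ~ {payloads.max():.0f} kg")
```

Even this caricature has an interior optimum: too little thrust wastes propellant on gravity losses, too much wastes mass on the engine itself.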
DM-MPPI: Datamodel for Efficient and Safe Model Path Integral Control
We extend the Datamodels framework from supervised learning to Model Predictive Path Integral (MPPI) control. Whereas Datamodels estimate sample influence via regression on a fixed dataset, we instead learn to predict influence directly from sample cost features, enabling real-time estimation for newly generated samples without online regression. Our influence predictor is trained offline using influence coefficients computed via the Datamodel framework across diverse MPPI instances, and is then deployed online for efficient sample pruning and adaptive constraint handling. A single learned model simultaneously addresses efficiency and safety: low-influence samples are pruned to reduce computational cost, while monitoring the influence of constraint-violating samples enables adaptive penalty tuning. Experiments on path-tracking with obstacle avoidance demonstrate up to a $5\times$ reduction in the number of samples while maintaining control performance and improving constraint satisfaction.
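A hedged caricature of MPPI with influence-based pruning: control perturbations are rolled out, an influence predictor scores them from cheap per-sample cost features, and only the highest-influence samples enter the weighted update. The "influence predictor" here is a stand-in function, not the trained Datamodel, and the toy cost is invented.

```python
import numpy as np

def mppi_step_with_pruning(u_nom, rollout_cost, influence_predictor,
                           n_samples=256, noise_std=0.5, lam=1.0, keep_frac=0.25):
    """One MPPI update that keeps only the highest-influence control samples.

    u_nom              : (H,) nominal control sequence
    rollout_cost       : maps a (H,) control sequence to a scalar cost
    influence_predictor: maps per-sample cost features to an influence score
    """
    rng = np.random.default_rng(0)
    noise = rng.normal(scale=noise_std, size=(n_samples, len(u_nom)))
    costs = np.array([rollout_cost(u_nom + e) for e in noise])

    # Cheap per-sample features (cost and its rank) feed the influence model.
    features = np.stack([costs, costs.argsort().argsort()], axis=1)
    influence = influence_predictor(features)
    keep = influence.argsort()[-int(keep_frac * n_samples):]   # prune the rest

    w = np.exp(-(costs[keep] - costs[keep].min()) / lam)       # MPPI weights
    w /= w.sum()
    return u_nom + w @ noise[keep]

# Toy problem: choose a 5-step control sequence whose sum reaches 1.0.
cost = lambda u: (1.0 - u.sum()) ** 2 + 0.01 * np.sum(u ** 2)
proxy_influence = lambda f: -f[:, 0]        # stand-in: lower cost => higher influence
u = np.zeros(5)
for _ in range(10):
    u = mppi_step_with_pruning(u, cost, proxy_influence)
print(u.sum())      # approaches the target of 1.0
```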
Robust Geospatial Coordination of Multi-Agent Communications Networks Under Attrition
Fast, efficient, robust communication during wildfire and other emergency responses is critical. One way to achieve this is by coordinating swarms of autonomous aerial vehicles carrying communications equipment to form an ad-hoc network connecting emergency response personnel to both each other and central command. However, operating in such extreme environments may lead to individual networking agents being damaged or rendered inoperable, which could bring down the network and interrupt communications. To overcome this challenge and enable multi-agent UAV networking in difficult environments, this paper introduces and formalizes the problem of Robust Task Networking Under Attrition (RTNUA), which extends connectivity maintenance in multi-robot systems to explicitly address proactive redundancy and attrition recovery. We introduce Physics-Informed Robust Employment of Multi-Agent Networks ($Φ$IREMAN), a topological algorithm leveraging physics-inspired potential fields to solve this problem. Through simulation across 25 problem configurations, $Φ$IREMAN consistently outperforms the DCCRS baseline, and on large-scale problems with up to 100 tasks and 500 drones, maintains $>99.9\%$ task uptime despite substantial attrition, demonstrating both effectiveness and scalability.
comment: 8 pages, 5 figures, 4 tables, submitted to IEEE RA-L
SCI: A Metacognitive Control for Signal Dynamics
Modern deep learning systems are typically deployed as open-loop function approximators: they map inputs to outputs in a single pass, without regulating how much computation or explanatory effort is spent on a given case. In safety-critical settings, this is brittle: easy and ambiguous inputs receive identical processing, and uncertainty is only read off retrospectively from raw probabilities. We introduce the Surgical Cognitive Interpreter (SCI), a lightweight closed-loop metacognitive control layer that wraps an existing stochastic model and turns prediction into an iterative process. SCI monitors a scalar interpretive state SP(t), here instantiated as a normalized entropy-based confidence signal, and adaptively decides whether to stop, continue sampling, or abstain. The goal is not to improve accuracy per se, but to regulate interpretive error ΔSP and expose a safety signal that tracks when the underlying model is likely to fail. We instantiate SCI around Monte Carlo dropout classifiers in three domains: vision (MNIST digits), medical time series (MIT-BIH arrhythmia), and industrial condition monitoring (rolling-element bearings). In all cases, the controller allocates more inference steps to misclassified inputs than to correct ones (up to about 3-4x on MNIST and bearings, and 1.4x on MIT-BIH). The resulting ΔSP acts as a usable safety signal for detecting misclassifications (AUROC 0.63 on MNIST, 0.70 on MIT-BIH, 0.86 on bearings). Code and reproducibility: https://github.com/vishal-1344/sci
comment: v2: Extended theoretical analysis (Lyapunov-style stability), added metacognitive experiments across three domains, and released code and configuration files at https://github.com/vishal-1344/sci
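A hedged sketch of the control loop described above: repeatedly sample a dropout-enabled classifier, track a normalized-entropy confidence signal SP(t), and decide to stop, continue sampling, or abstain. The toy model, thresholds, and decision rule are assumptions for illustration, not the released SCI code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stochastic classifier: dropout stays active at inference (MC dropout).
model = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(),
    torch.nn.Dropout(p=0.3), torch.nn.Linear(64, 10),
)
model.train()   # keep dropout stochastic during inference

def sci_infer(x, conf_stop=0.7, conf_abstain=0.3, max_steps=30):
    """Sample until the entropy-based confidence SP(t) settles, then act."""
    probs_sum = torch.zeros(10)
    for t in range(1, max_steps + 1):
        with torch.no_grad():
            probs_sum += F.softmax(model(x), dim=-1).squeeze(0)
        mean_probs = probs_sum / t
        entropy = -(mean_probs * mean_probs.clamp_min(1e-9).log()).sum()
        sp = 1.0 - entropy / torch.log(torch.tensor(10.0))   # normalised confidence
        if sp >= conf_stop:
            return int(mean_probs.argmax()), float(sp), t      # confident: stop early
    if sp >= conf_abstain:
        return int(mean_probs.argmax()), float(sp), max_steps  # low confidence: answer anyway
    return None, float(sp), max_steps                          # abstain

# An untrained toy model stays near-uniform, so this run typically abstains
# after exhausting its inference budget.
label, confidence, steps = sci_infer(torch.randn(1, 16))
print(label, round(confidence, 3), steps)
```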
Adaptive Experiment Design for Nonlinear System Identification with Operational Constraints
We consider the joint problem of online experiment design and parameter estimation for identifying nonlinear system models, while adhering to system constraints. We utilize a receding horizon approach and propose a new adaptive input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential estimator. We demonstrate the ability of the method to design informative experiments online, while steering the system within operational constraints.
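As a rough illustration of constraint-aware input design (not the paper's criterion), the sketch below greedily picks the next input from a candidate set by a one-step D-optimality score at the current parameter estimate; `jac`, `info_prior`, and the candidate grid are assumed, user-provided quantities.

```python
import numpy as np

def pick_next_input(theta_hat, candidates, jac, info_prior, u_min, u_max):
    """Greedy one-step-ahead input selection by D-optimality (sketch).

    candidates: array of feasible inputs. jac(theta, u) returns the output
    sensitivity dy/dtheta at the current estimate theta_hat. The chosen input
    maximizes log det of the updated information matrix while respecting the
    box constraints u_min <= u <= u_max.
    """
    best_u, best_val = None, -np.inf
    for u in candidates:
        if np.any(u < u_min) or np.any(u > u_max):
            continue                                  # enforce operational constraints
        J = jac(theta_hat, u)
        info = info_prior + J.T @ J                   # Gauss-Newton information update
        val = np.linalg.slogdet(info)[1]
        if val > best_val:
            best_u, best_val = u, val
    return best_u
```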
Sparse Representations of Dynamical Networks: A Coprime Factorization Approach
We study a class of dynamical networks modeled by linear and time-invariant systems which are described by state-space realizations. For these networks, we investigate the relations between various types of factorizations which preserve the structure of their component subsystems' interconnection. In doing so, we provide tractable means of shifting between different types of sparsity-preserving representations and we show how to employ these factorizations to obtain distributed implementations for stabilizing and possibly stable controllers. By formulating all these results for both discrete- and continuous-time systems, we develop specialized distributed implementations that, up to this point, were only available for networks modeled as discrete-time systems.
comment: 35 pages, 5 figures
Adaptive Legged Locomotion via Online Learning for Model Predictive Control
We provide an algorithm for adaptive legged locomotion via online learning and model predictive control. The algorithm is composed of two interacting modules: model predictive control (MPC) and online learning of residual dynamics. The residual dynamics can represent modeling errors and external disturbances. We are motivated by the future of autonomy where quadrupeds will autonomously perform complex tasks despite real-world unknown uncertainty, such as unknown payload and uneven terrains. The algorithm uses random Fourier features to approximate the residual dynamics in reproducing kernel Hilbert spaces. Then, it employs MPC based on the current learned model of the residual dynamics. The model is updated online in a self-supervised manner using least squares based on the data collected while controlling the quadruped. The algorithm enjoys sublinear \textit{dynamic regret}, defined as the suboptimality against an optimal clairvoyant controller that knows the residual dynamics. We validate our algorithm in Gazebo and MuJoCo simulations, where the quadruped aims to track reference trajectories. The Gazebo simulations include constant unknown external forces up to $12\boldsymbol{g}$, where $\boldsymbol{g}$ is the gravity vector, in flat terrain, slope terrain with $20\degree$ inclination, and rough terrain with $0.25m$ height variation. The MuJoCo simulations include time-varying unknown disturbances with payload up to $8~kg$ and time-varying ground friction coefficients in flat terrain.
comment: IEEE Robotics and Automation Letters
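A minimal sketch of the learning module described above, assuming an RBF kernel and a recursive least-squares update (the paper's exact feature dimensions and regularization are not specified here, so the numbers below are placeholders): random Fourier features approximate the residual dynamics, and each observed residual refines the weights online.

```python
import numpy as np

class RFFResidualModel:
    """Online residual-dynamics learner with random Fourier features (sketch).

    Approximates r(z) ~ W^T phi(z) and updates W by recursive least squares
    from (state-input, residual) pairs collected while controlling the robot.
    """
    def __init__(self, in_dim, out_dim, n_feat=200, bandwidth=1.0, reg=1e-3):
        rng = np.random.default_rng(0)
        self.Omega = rng.normal(scale=1.0 / bandwidth, size=(in_dim, n_feat))
        self.b = rng.uniform(0, 2 * np.pi, n_feat)
        self.W = np.zeros((n_feat, out_dim))
        self.P = np.eye(n_feat) / reg              # inverse-covariance for RLS

    def features(self, z):
        return np.sqrt(2.0 / self.Omega.shape[1]) * np.cos(z @ self.Omega + self.b)

    def update(self, z, residual):
        phi = self.features(z)
        Pphi = self.P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)
        self.W += np.outer(gain, residual - phi @ self.W)   # correct toward observed residual
        self.P -= np.outer(gain, Pphi)

    def predict(self, z):
        return self.features(z) @ self.W
```

The MPC module would then query `predict` at each step to compensate for the learned residual before optimizing the control sequence.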
APULSE: A Scalable Hybrid Algorithm for the RCSPP on Large-Scale Dense Graphs
The resource-constrained shortest path problem (RCSPP) is a fundamental NP-hard optimization challenge with broad applications, from network routing to autonomous navigation. This problem involves finding a path that minimizes a primary cost subject to a budget on a secondary resource. While various RCSPP solvers exist, they often face critical scalability limitations when applied to the large, dense graphs characteristic of complex, real-world scenarios, making them impractical for time-critical planning. This challenge is particularly acute in domains like mission planning for unmanned ground vehicles (UGVs), which demand solutions on large-scale terrain graphs. This paper introduces APULSE, a hybrid label-setting algorithm designed to efficiently solve the RCSPP on such challenging graphs. APULSE integrates a best-first search guided by an A* heuristic with aggressive, Pulse-style pruning mechanisms and a time-bucketing strategy for effective state-space reduction. A computational study, using a large-scale UGV planning scenario, benchmarks APULSE against state-of-the-art algorithms. The results demonstrate that APULSE consistently finds near-optimal solutions while being orders of magnitude faster and more robust, particularly on large problem instances where competing methods fail. This superior scalability establishes APULSE as an effective solution for RCSPP in complex, large-scale environments, enabling capabilities such as interactive decision support and dynamic replanning.
comment: This version corrects keywords and reference [9]. 9 pages
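To make the label-setting core concrete, here is a simplified sketch of a resource-constrained shortest path search with an A*-style cost heuristic and dominance pruning; APULSE's Pulse-style pruning and time-bucketing are deliberately omitted, and the `graph` dictionary format is an assumption for illustration.

```python
import heapq
import itertools

def rcspp(graph, src, dst, budget, h=lambda v: 0.0):
    """Label-setting RCSPP search with heuristic guidance and dominance pruning.

    graph: {u: [(v, cost, resource), ...]}. h is an admissible lower bound on
    the remaining cost. Returns (cost, resource, path) or None if infeasible.
    """
    labels = {src: [(0.0, 0.0)]}              # non-dominated (cost, resource) labels per node
    tie = itertools.count()
    pq = [(h(src), next(tie), 0.0, 0.0, src, [src])]
    while pq:
        f, _, c, r, u, path = heapq.heappop(pq)
        if u == dst:
            return c, r, path
        for v, dc, dr in graph.get(u, []):
            nc, nr = c + dc, r + dr
            if nr > budget:
                continue                      # resource budget exceeded
            if any(lc <= nc and lr <= nr for lc, lr in labels.get(v, [])):
                continue                      # dominated by an existing label
            labels.setdefault(v, []).append((nc, nr))
            heapq.heappush(pq, (nc + h(v), next(tie), nc, nr, v, path + [v]))
    return None
```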
SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning
Flow matching (FM) has shown promising results in data-driven planning. However, it inherently lacks formal guarantees for ensuring state and action constraints, whose satisfaction is a fundamental and crucial requirement for the safety and admissibility of planned trajectories on various systems. Moreover, existing FM planners do not ensure the dynamical consistency, which potentially renders trajectories inexecutable. We address these shortcomings by proposing SAD-Flower, a novel framework for generating Safe, Admissible, and Dynamically consistent trajectories. Our approach relies on an augmentation of the flow with a virtual control input. Thereby, principled guidance can be derived using techniques from nonlinear control theory, providing formal guarantees for state constraints, action constraints, and dynamic consistency. Crucially, SAD-Flower operates without retraining, enabling test-time satisfaction of unseen constraints. Through extensive experiments across several tasks, we demonstrate that SAD-Flower outperforms various generative-model-based baselines in ensuring constraint satisfaction.
Robotics
Active Learning of Fractional-Order Viscoelastic Model Parameters for Realistic Haptic Rendering
Effective medical simulators necessitate realistic haptic rendering of biological tissues that display viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as these models can naturally capture memory effects. However, due to the unintuitive frequency-dependent coupling between the order of the fractional element and the other parameters, determining appropriate parameters for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, via qualitative feedback-based human-in-the-loop~(HiL) optimizations, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results to form an aggregate perceptual map trained on the entire dataset and demonstrate the selection of population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.
comment: This work has been submitted to the IEEE for possible publication. 14 pages, 9 figures
Fast, Robust, Permutation-and-Sign Invariant SO(3) Pattern Alignment
We address the correspondence-free alignment of two rotation sets on \(SO(3)\), a core task in calibration and registration that is often impeded by missing time alignment, outliers, and unknown axis conventions. Our key idea is to decompose each rotation into its \emph{Transformed Basis Vectors} (TBVs), three unit vectors on \(S^2\), and align the resulting spherical point sets per axis using fast, robust matchers (SPMC, FRS, and a hybrid). To handle axis relabels and sign flips, we introduce a \emph{Permutation-and-Sign Invariant} (PASI) wrapper that enumerates the 24 proper signed permutations, scores them via summed correlations, and fuses the per-axis estimates into a single rotation by projection/Karcher mean. The overall complexity remains linear in the number of rotations (\(\mathcal{O}(n)\)), contrasting with \(\mathcal{O}(N_r^3\log N_r)\) for spherical/\(SO(3)\) correlation. Experiments on EuRoC Machine Hall simulations (axis-consistent) and the ETH Hand-Eye benchmark (\texttt{robot\_arm\_real}) (axis-ambiguous) show that our methods are accurate, 6-60x faster than traditional methods, and robust under extreme outlier ratios (up to 90\%), all without correspondence search.
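The 24 candidates enumerated by the PASI wrapper are simply the signed permutation matrices with determinant +1. A short self-contained sketch of that enumeration (the scoring and fusion steps are not reproduced here):

```python
import itertools
import numpy as np

def proper_signed_permutations():
    """Enumerate the 24 rotations that relabel and flip coordinate axes."""
    mats = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product([1, -1], repeat=3):
            M = np.zeros((3, 3))
            for row, (col, s) in enumerate(zip(perm, signs)):
                M[row, col] = s
            if np.isclose(np.linalg.det(M), 1.0):   # keep only proper rotations
                mats.append(M)
    return mats

assert len(proper_signed_permutations()) == 24
```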
HAVEN: Hierarchical Adversary-aware Visibility-Enabled Navigation with Cover Utilization using Deep Transformer Q-Networks
Autonomous navigation in partially observable environments requires agents to reason beyond immediate sensor input, exploit occlusion, and ensure safety while progressing toward a goal. These challenges arise in many robotics domains, from urban driving and warehouse automation to defense and surveillance. Classical path planning approaches and memoryless reinforcement learning often fail under limited fields of view (FoVs) and occlusions, committing to unsafe or inefficient maneuvers. We propose a hierarchical navigation framework that integrates a Deep Transformer Q-Network (DTQN) as a high-level subgoal selector with a modular low-level controller for waypoint execution. The DTQN consumes short histories of task-aware features, encoding odometry, goal direction, obstacle proximity, and visibility cues, and outputs Q-values to rank candidate subgoals. Visibility-aware candidate generation introduces masking and exposure penalties, rewarding the use of cover and anticipatory safety. A low-level potential field controller then tracks the selected subgoal, ensuring smooth short-horizon obstacle avoidance. We validate our approach in 2D simulation and extend it directly to a 3D Unity-ROS environment by projecting point-cloud perception into the same feature schema, enabling transfer without architectural changes. Results show consistent improvements over classical planners and RL baselines in success rate, safety margins, and time to goal, with ablations confirming the value of temporal memory and visibility-aware candidate design. These findings highlight a generalizable framework for safe navigation under uncertainty, with broad relevance across robotic platforms.
Describe Anything Anywhere At Any Moment
Computer vision and robotics applications ranging from augmented reality to robot autonomy in large-scale environments require spatio-temporal memory frameworks that capture both geometric structure for accurate language grounding and semantic detail. Existing methods face a tradeoff, where producing rich open-vocabulary descriptions comes at the expense of real-time performance when these descriptions have to be grounded in 3D. To address these challenges, we propose Describe Anything, Anywhere, at Any Moment (DAAAM), a novel spatio-temporal memory framework for large-scale and real-time 4D scene understanding. DAAAM introduces a novel optimization-based frontend to infer detailed semantic descriptions from localized captioning models, such as the Describe Anything Model (DAM), leveraging batch processing to speed up inference by an order of magnitude for online processing. It leverages such semantic understanding to build a hierarchical 4D scene graph (SG), which acts as an effective globally spatially and temporally consistent memory representation. DAAAM constructs 4D SGs with detailed, geometrically grounded descriptions while maintaining real-time performance. We show that DAAAM's 4D SG interfaces well with a tool-calling agent for inference and reasoning. We thoroughly evaluate DAAAM in the complex task of spatio-temporal question answering on the NaVQA benchmark and show its generalization capabilities for sequential task grounding on the SG3D benchmark. We further curate an extended OC-NaVQA benchmark for large-scale and long-time evaluations. DAAAM achieves state-of-the-art results in both tasks, improving OC-NaVQA question accuracy by 53.6%, reducing position errors by 21.9% and temporal errors by 21.6%, and improving SG3D task grounding accuracy by 27.8% over the most competitive baselines. We release our data and code open-source.
comment: 14 pages, 5 figures, 6 tables
Image Generation as a Visual Planner for Robotic Manipulation CVPR 2026
Generating realistic robotic manipulation videos is an important step toward unifying perception, planning, and action in embodied agents. While existing video diffusion models require large domain-specific datasets and struggle to generalize, recent image generation models trained on language-image corpora exhibit strong compositionality, including the ability to synthesize temporally coherent grid images. This suggests a latent capacity for video-like generation even without explicit temporal modeling. We explore whether such models can serve as visual planners for robots when lightly adapted using LoRA finetuning. We propose a two-part framework that includes: (1) text-conditioned generation, which uses a language instruction and the first frame, and (2) trajectory-conditioned generation, which uses a 2D trajectory overlay and the same initial frame. Experiments on the Jaco Play dataset, Bridge V2, and the RT1 dataset show that both modes produce smooth, coherent robot videos aligned with their respective conditions. Our findings indicate that pretrained image generators encode transferable temporal priors and can function as video-like robotic planners under minimal supervision. Code is released at \href{https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation}{https://github.com/pangye202264690373/Image-Generation-as-a-Visual-Planner-for-Robotic-Manipulation}.
comment: 11 pages 9 figures Under review at CVPR 2026
LAP: Fast LAtent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving
Diffusion models have demonstrated strong capabilities for modeling human-like driving behaviors in autonomous driving, but their iterative sampling process induces substantial latency, and operating directly on raw trajectory points forces the model to spend capacity on low-level kinematics, rather than high-level multi-modal semantics. To address these limitations, we propose LAtent Planner (LAP), a framework that plans in a VAE-learned latent space that disentangles high-level intents from low-level kinematics, enabling our planner to capture rich, multi-modal driving strategies. We further introduce a fine-grained feature distillation mechanism to guide a better interaction and fusion between the high-level semantic planning space and the vectorized scene context. Notably, LAP can produce high-quality plans in one single denoising step, substantially reducing computational overhead. Through extensive evaluations on the large-scale nuPlan benchmark, LAP achieves state-of-the-art closed-loop performance among learning-based planning methods, while demonstrating an inference speed-up of at most 10 times over previous SOTA approaches.
Sample-Efficient Expert Query Control in Active Imitation Learning via Conformal Prediction
Active imitation learning (AIL) combats covariate shift by querying an expert during training. However, expert action labeling often dominates the cost, especially in GPU-intensive simulators, human-in-the-loop settings, and robot fleets that revisit near-duplicate states. We present Conformalized Rejection Sampling for Active Imitation Learning (CRSAIL), a querying rule that requests an expert action only when the visited state is under-represented in the expert-labeled dataset. CRSAIL scores state novelty by the distance to the $K$-th nearest expert state and sets a single global threshold via conformal prediction. This threshold is the empirical $(1-α)$ quantile of on-policy calibration scores, providing a distribution-free calibration rule that links $α$ to the expected query rate and makes $α$ a task-agnostic tuning knob. This state-space querying strategy is robust to outliers and, unlike safety-gate-based AIL, can be run without real-time expert takeovers: we roll out full trajectories (episodes) with the learner and only afterward query the expert on a subset of visited states. Evaluated on MuJoCo robotics tasks, CRSAIL matches or exceeds expert-level reward while reducing total expert queries by up to 96% vs. DAgger and up to 65% vs. prior AIL methods, with empirical robustness to $α$ and $K$, easing deployment on novel systems with unknown dynamics.
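A compact sketch of the querying rule, assuming Euclidean distances over flattened state vectors and the usual split-conformal finite-sample quantile correction (the correction and the variable names are conventions added here, not taken from the paper):

```python
import numpy as np

def conformal_query_threshold(expert_states, calib_states, alpha=0.1, k=10):
    """Set a global novelty threshold via split conformal prediction (sketch).

    Novelty score = distance to the K-th nearest expert-labeled state.
    The threshold is the empirical (1 - alpha) quantile of scores on
    on-policy calibration states, so roughly an alpha fraction of future
    visited states is expected to exceed it and trigger an expert query.
    """
    expert_states = np.asarray(expert_states, dtype=float)

    def score(x):
        d = np.linalg.norm(expert_states - x, axis=1)
        return np.partition(d, k - 1)[k - 1]            # K-th nearest distance

    calib_scores = np.array([score(s) for s in calib_states])
    n = len(calib_scores)
    q = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)    # finite-sample quantile level
    tau = np.quantile(calib_scores, q)
    return tau, score                                   # query the expert when score(s) > tau
```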
Hardware-Software Collaborative Computing of Photonic Spiking Reinforcement Learning for Robotic Continuous Control
Robotic continuous control tasks impose stringent demands on the energy efficiency and latency of computing architectures due to their high-dimensional state spaces and real-time interaction requirements. Conventional electronic computing platforms face computational bottlenecks, whereas the fusion of photonic computing and spiking reinforcement learning (RL) offers a promising alternative. Here, we propose a novel computing architecture based on photonic spiking RL, which integrates the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm with spiking neural network (SNN). The proposed architecture employs an optical-electronic hybrid computing paradigm wherein a silicon photonic Mach-Zehnder interferometer (MZI) chip executes linear matrix computations, while nonlinear spiking activations are performed in the electronic domain. Experimental validation on the Pendulum-v1 and HalfCheetah-v2 benchmarks demonstrates the system capability for software-hardware co-inference, achieving a control policy reward of 5831 on HalfCheetah-v2, a 23.33% reduction in convergence steps, and an action deviation below 2.2%. Notably, this work represents the first application of a programmable MZI photonic computing chip to robotic continuous control tasks, attaining an energy efficiency of 1.39 TOPS/W and an ultralow computational latency of 120 ps. Such performance underscores the promise of photonic spiking RL for real-time decision-making in autonomous and industrial robotic systems.
Balancing Efficiency and Fairness: An Iterative Exchange Framework for Multi-UAV Cooperative Path Planning
Multi-UAV cooperative path planning (MUCPP) is a fundamental problem in multi-agent systems, aiming to generate collision-free trajectories for a team of unmanned aerial vehicles (UAVs) to complete distributed tasks efficiently. A key challenge lies in achieving both efficiency, by minimizing total mission cost, and fairness, by balancing the workload among UAVs to avoid overburdening individual agents. This paper presents a novel Iterative Exchange Framework for MUCPP, balancing efficiency and fairness through iterative task exchanges and path refinements. The proposed framework formulates a composite objective that combines the total mission distance and the makespan, and iteratively improves the solution via local exchanges under feasibility and safety constraints. For each UAV, collision-free trajectories are generated using A* search over a terrain-aware configuration space. Comprehensive experiments on multiple terrain datasets demonstrate that the proposed method consistently achieves superior trade-offs between total distance and makespan compared to existing baselines.
DPNet: Doppler LiDAR Motion Planning for Highly-Dynamic Environments
Existing motion planning methods often struggle with rapid-motion obstacles due to an insufficient understanding of environmental changes. To address this limitation, we propose integrating motion planners with Doppler LiDARs which provide not only ranging measurements but also instantaneous point velocities. However, this integration is nontrivial due to the dual requirements of high accuracy and high frequency. To this end, we introduce Doppler Planning Network (DPNet), which tracks and reacts to rapid obstacles using Doppler model-based learning. Particularly, we first propose a Doppler Kalman neural network (D-KalmanNet) to track the future states of obstacles under partially observable Gaussian state space model. We then leverage the estimated motions to construct a Doppler-tuned model predictive control (DT-MPC) framework for ego-motion planning, enabling runtime auto-tuning of the controller parameters. These two model-based learners allow DPNet to maintain lightweight while learning fast environmental changes using minimum data, and achieve both high frequency and high accuracy in tracking and planning. Experiments on both high-fidelity simulator and real-world datasets demonstrate the superiority of DPNet over extensive benchmark schemes.
MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation
Imitation learning provides a promising approach to dexterous hand manipulation, but its effectiveness is limited by the lack of large-scale, high-fidelity data. Existing data-collection pipelines suffer from inaccurate motion retargeting, low data-collection efficiency, and missing high-resolution fingertip tactile sensing. We address this gap with MILE, a mechanically isomorphic teleoperation and data-collection system co-designed from human hand to exoskeleton to robotic hand. The exoskeleton is anthropometrically derived from the human hand, and the robotic hand preserves one-to-one joint-position isomorphism, eliminating nonlinear retargeting and enabling precise, natural control. The exoskeleton achieves a multi-joint mean absolute angular error below one degree, while the robotic hand integrates compact fingertip visuotactile modules that provide high-resolution tactile observations. Built on this retargeting-free interface, we teleoperate complex, contact-rich in-hand manipulation and efficiently collect a multimodal dataset comprising high-resolution fingertip visuotactile signals, RGB-D images, and joint positions. The teleoperation pipeline achieves a mean success rate improvement of 64%. Incorporating fingertip tactile observations further increases the success rate by an average of 25% over the vision-only baseline, validating the fidelity and utility of the dataset. Further details are available at: https://sites.google.com/view/mile-system.
Data-Driven Modeling and Correction of Vehicle Dynamics
We develop a data-driven framework for learning and correcting non-autonomous vehicle dynamics. Physics-based vehicle models are often simplified for tractability and therefore exhibit inherent model-form uncertainty, motivating the need for data-driven correction. Moreover, non-autonomous dynamics are governed by time-dependent control inputs, which pose challenges in learning predictive models directly from temporal snapshot data. To address these, we reformulate the vehicle dynamics via a local parameterization of the time-dependent inputs, yielding a modified system composed of a sequence of local parametric dynamical systems. We approximate these parametric systems using two complementary approaches. First, we employ the DRIPS (dimension reduction and interpolation in parameter space) methodology to construct efficient linear surrogate models, equipped with lifted observable spaces and manifold-based operator interpolation. This enables data-efficient learning of vehicle models whose dynamics admit accurate linear representations in the lifted spaces. Second, for more strongly nonlinear systems, we employ FML (Flow Map Learning), a deep neural network approach that approximates the parametric evolution map without requiring special treatment of nonlinearities. We further extend FML with a transfer-learning-based model correction procedure, enabling the correction of misspecified prior models using only a sparse set of high-fidelity or experimental measurements, without assuming a prescribed form for the correction term. Through a suite of numerical experiments on unicycle, simplified bicycle, and slip-based bicycle models, we demonstrate that DRIPS offers robust and highly data-efficient learning of non-autonomous vehicle dynamics, while FML provides expressive nonlinear modeling and effective correction of model-form errors under severe data scarcity.
RealAppliance: Let High-fidelity Appliance Assets Controllable and Workable as Aligned Real Manuals
Existing appliance assets suffer from poor rendering, incomplete mechanisms, and misalignment with manuals, leading to simulation-reality gaps that hinder appliance manipulation development. In this work, we introduce the RealAppliance dataset, comprising 100 high-fidelity appliances with complete physical and electronic mechanisms and program logic aligned with their manuals. Based on these assets, we propose the RealAppliance-Bench benchmark, which evaluates multimodal large language models and embodied manipulation planning models across key tasks in appliance manipulation planning: manual page retrieval, appliance part grounding, open-loop manipulation planning, and closed-loop planning adjustment. Our analysis of model performances on RealAppliance-Bench provides insights for advancing appliance manipulation research.
Dependent Reachable Sets for the Constant Bearing Pursuit Strategy
This paper introduces a novel reachability problem for the scenario where one agent follows another agent using the constant bearing pursuit strategy, and analyzes the geometry of the reachable set of the follower. Key theoretical results are derived, providing bounds for the associated dependent reachable set. Simulation results are presented to empirically establish the shape of the dependent reachable set. In the process, an original optimization problem for the constant bearing strategy is formulated and analyzed.
comment: This work has been submitted to a journal for possible publication
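For intuition, the sketch below simulates one common parameterization of constant bearing pursuit, in which the follower moves at fixed speed with its heading offset from the line of sight by a constant bearing angle; this parameterization and the sampling-based reachable-set estimate are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def cb_pursuit(leader_traj, p0, speed, bearing, dt=0.05):
    """Simulate a constant bearing pursuer in the plane (illustrative sketch).

    leader_traj: iterable of 2D leader positions over time. Sweeping endpoints
    over bearings and speeds gives a sampled estimate of the dependent
    reachable set of the follower.
    """
    R = np.array([[np.cos(bearing), -np.sin(bearing)],
                  [np.sin(bearing),  np.cos(bearing)]])
    p = np.array(p0, dtype=float)
    traj = [p.copy()]
    for q in leader_traj:
        los = np.asarray(q, dtype=float) - p
        n = np.linalg.norm(los)
        if n < 1e-9:
            break                              # follower has reached the leader
        heading = R @ (los / n)                # rotate the line of sight by the fixed bearing
        p = p + speed * dt * heading
        traj.append(p.copy())
    return np.array(traj)
```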
"Why the face?": Exploring Robot Error Detection Using Instrumented Bystander Reactions
How do humans recognize and rectify social missteps? We achieve social competence by looking around at our peers, decoding subtle cues from bystanders (a raised eyebrow, a laugh) to evaluate the environment and our actions. Robots, however, struggle to perceive and make use of these nuanced reactions. By employing a novel neck-mounted device that records facial expressions from the chin region, we explore the potential of previously untapped data to capture and interpret human responses to robot error. First, we develop NeckNet-18, a 3D facial reconstruction model to map the reactions captured through the chin camera onto facial points and head motion. We then use these facial responses to develop a robot error detection model which outperforms standard methodologies such as using OpenFace or video data, generalizing well especially for within-participant data. Through this work, we argue for expanding human-in-the-loop robot sensing, fostering more seamless integration of robots into diverse human environments, pushing the boundaries of social cue detection and opening new avenues for adaptable robotics.
RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
Comprehensive, unbiased, and comparable evaluation of modern generalist policies is uniquely challenging: existing approaches for robot benchmarking typically rely on heavy standardization, either by specifying fixed evaluation tasks and environments, or by hosting centralized ''robot challenges'', and do not readily scale to evaluating generalist policies across a broad range of tasks and environments. In this work, we propose RoboArena, a new approach for scalable evaluation of generalist robot policies in the real world. Instead of standardizing evaluations around fixed tasks, environments, or locations, we propose to crowd-source evaluations across a distributed network of evaluators. Importantly, evaluators can freely choose the tasks and environments they evaluate on, enabling easy scaling of diversity, but they are required to perform double-blind evaluations over pairs of policies. Then, by aggregating preference feedback from pairwise comparisons across diverse tasks and environments, we can derive a ranking of policies. We instantiate our approach across a network of evaluators at seven academic institutions using the DROID robot platform. Through more than 600 pairwise real-robot evaluation episodes across seven generalist policies, we demonstrate that our crowd-sourced approach can more accurately rank the performance of existing generalist policies than conventional, centralized evaluation approaches, while being more scalable, resilient, and trustworthy. We open our evaluation network to the community and hope that it can enable more accessible comparisons of generalist robot policies.
comment: Website: https://robo-arena.github.io/
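One standard way to turn crowd-sourced pairwise preferences into a policy ranking is a Bradley-Terry fit; the paper may use a different aggregation rule, so treat the following as an illustrative sketch with a hypothetical `comparisons` list of (winner, loser) index pairs.

```python
import numpy as np

def bradley_terry(n_policies, comparisons, iters=200, reg=1e-3):
    """Rank policies from pairwise preference outcomes (illustrative sketch)."""
    wins = np.full((n_policies, n_policies), reg)   # light smoothing for unseen pairs
    np.fill_diagonal(wins, 0.0)
    for w, l in comparisons:
        wins[w, l] += 1.0
    strength = np.ones(n_policies)
    matches = wins + wins.T                         # comparisons per pair
    for _ in range(iters):                          # minorization-maximization updates
        denom = matches / (strength[:, None] + strength[None, :])
        strength = wins.sum(axis=1) / denom.sum(axis=1)
        strength /= strength.sum()                  # fix the scale
    return np.argsort(-strength), strength
```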
Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration
In dynamic environments, robots often encounter constrained movement trajectories when manipulating objects with specific properties, such as doors. Therefore, applying the appropriate force is crucial to prevent damage to both the robots and the objects. However, current vision-guided robot state generation methods often falter in this regard, as they lack the integration of tactile perception. To tackle this issue, this paper introduces a novel state diffusion framework termed SafeDiff. It generates a prospective state sequence from the current robot state and visual context observation while incorporating real-time tactile feedback to refine the sequence. As far as we know, this is the first study specifically focused on ensuring force safety in robotic manipulation. It significantly enhances the rationality of state planning, and the safe action trajectory is derived from inverse dynamics based on this refined planning. In practice, unlike previous approaches that concatenate visual and tactile data to generate future robot state sequences, our method employs tactile data as a calibration signal to adjust the robot's state within the state space implicitly. Additionally, we've developed a large-scale simulation dataset called SafeDoorManip50k, offering extensive multimodal data to train and evaluate the proposed method. Extensive experiments show that our visual-tactile model substantially mitigates the risk of harmful forces in the door opening, across both simulated and real-world settings.
comment: Website URL: see https://i-am-future.github.io/safediff/
A Minimal Subset Approach for Informed Keyframe Sampling in Large-Scale SLAM
Typical LiDAR SLAM architectures feature a front-end for odometry estimation and a back-end for refining and optimizing the trajectory and map, commonly through loop closures. However, loop closure detection in large-scale missions presents significant computational challenges due to the need to identify, verify, and process numerous candidate pairs for pose graph optimization. Keyframe sampling bridges the front-end and back-end by selecting frames for storing and processing during global optimization. This article proposes an online keyframe sampling approach that constructs the pose graph using the most impactful keyframes for loop closure. We introduce the Minimal Subset Approach (MSA), which optimizes two key objectives: redundancy minimization and information preservation, implemented within a sliding window framework. By operating in the feature space rather than 3-D space, MSA efficiently reduces redundant keyframes while retaining essential information. Evaluations on diverse public datasets show that the proposed approach outperforms naive methods in reducing false positive rates in place recognition, while delivering superior ATE and RPE in metric localization, without the need for manual parameter tuning. Additionally, MSA demonstrates efficiency and scalability by reducing memory usage and computational overhead during loop closure detection and pose graph optimization.
comment: Please cite the published version. 8 pages, 9 figures
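A toy sketch of the minimal-subset idea, assuming per-frame feature descriptors and a cosine-similarity redundancy test (the threshold scheme is an illustrative simplification of MSA's two objectives):

```python
import numpy as np

def minimal_subset(window_feats, redundancy_thr=0.9):
    """Greedy minimal-subset keyframe selection in feature space (sketch).

    window_feats: (N, D) descriptors for frames in the sliding window.
    A frame is kept only if it is not too similar to any already-kept frame,
    trading redundancy minimization against information preservation.
    """
    feats = window_feats / (np.linalg.norm(window_feats, axis=1, keepdims=True) + 1e-12)
    kept = []
    for i, f in enumerate(feats):
        if all(float(f @ feats[j]) < redundancy_thr for j in kept):
            kept.append(i)           # keep frame: it adds non-redundant information
    return kept
```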
Have We Scene It All? Scene Graph-Aware Deep Point Cloud Compression
Efficient transmission of 3D point cloud data is critical for advanced perception in centralized and decentralized multi-agent robotic systems, especially nowadays with the growing reliance on edge and cloud-based processing. However, the large and complex nature of point clouds creates challenges under bandwidth constraints and intermittent connectivity, often degrading system performance. We propose a deep compression framework based on semantic scene graphs. The method decomposes point clouds into semantically coherent patches and encodes them into compact latent representations with semantic-aware encoders conditioned by Feature-wise Linear Modulation (FiLM). A folding-based decoder, guided by latent features and graph node attributes, enables structurally accurate reconstruction. Experiments on the SemanticKITTI and nuScenes datasets show that the framework achieves state-of-the-art compression rates, reducing data size by up to 98% while preserving both structural and semantic fidelity. In addition, it supports downstream applications such as multi-robot pose graph optimization and map merging, achieving trajectory accuracy and map alignment comparable to those obtained with raw LiDAR scans.
comment: Please cite published version. 8 pages, 6 figures
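FiLM conditioning itself is a small, standard building block; the sketch below shows how a scene-graph node embedding could modulate patch-encoder features, with layer sizes and tensor shapes as assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation of patch features by a conditioning vector."""
    def __init__(self, cond_dim, feat_dim):
        super().__init__()
        self.to_scale_shift = nn.Linear(cond_dim, 2 * feat_dim)

    def forward(self, feats, cond):
        # feats: (B, N, feat_dim) per-patch features; cond: (B, cond_dim) node attributes
        gamma, beta = self.to_scale_shift(cond).chunk(2, dim=-1)
        return gamma.unsqueeze(1) * feats + beta.unsqueeze(1)
```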
Adversarial Exploitation of Data Diversity Improves Visual Localization
Visual localization, which estimates a camera's pose within a known scene, is a fundamental capability for autonomous systems. While absolute pose regression (APR) methods have shown promise for efficient inference, they often struggle with generalization. Recent approaches attempt to address this through data augmentation with varied viewpoints, yet they overlook a critical factor: appearance diversity. In this work, we identify appearance variation as the key to robust localization. Specifically, we first lift real 2D images into 3D Gaussian Splats with varying appearance and deblurring ability, enabling the synthesis of diverse training data that varies not just in poses but also in environmental conditions such as lighting and weather. To fully unleash the potential of the appearance-diverse data, we build a two-branch joint training pipeline with an adversarial discriminator to bridge the syn-to-real gap. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods, reducing translation and rotation errors by 50\% and 41\% on indoor datasets, and 38\% and 44\% on outdoor datasets. Most notably, our method shows remarkable robustness in dynamic driving scenarios under varying weather conditions and in day-to-night scenarios, where previous APR methods fail. Project Page: https://ai4ce.github.io/RAP/
comment: 24 pages, 22 figures
Towards Fully Onboard State Estimation and Trajectory Tracking for UAVs with Suspended Payloads
This paper addresses the problem of tracking the position of a cable-suspended payload carried by an unmanned aerial vehicle, with a focus on real-world deployment and minimal hardware requirements. In contrast to many existing approaches that rely on motion-capture systems, additional onboard cameras, or instrumented payloads, we propose a framework that uses only standard onboard sensors--specifically, real-time kinematic global navigation satellite system measurements and data from the onboard inertial measurement unit--to estimate and control the payload's position. The system models the full coupled dynamics of the aerial vehicle and payload, and integrates a linear Kalman filter for state estimation, a model predictive contouring control planner, and an incremental model predictive controller. The control architecture is designed to remain effective despite sensing limitations and estimation uncertainty. Extensive simulations demonstrate that the proposed system achieves performance comparable to control based on ground-truth measurements, with only minor degradation (< 6%). The system also shows strong robustness to variations in payload parameters. Field experiments further validate the framework, confirming its practical applicability and reliable performance in outdoor environments using only off-the-shelf aerial vehicle hardware.
comment: Updated to match the published version. Added journal reference and DOI
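For reference, a generic linear Kalman filter step as used for the state estimator above; in the paper's setting the state would stack vehicle and payload quantities and the measurements would come from RTK-GNSS and the IMU, but the matrices here are placeholders.

```python
import numpy as np

def kf_step(x, P, u, z, A, B, C, Q, R):
    """One predict/update cycle of a linear Kalman filter (generic sketch)."""
    # Predict with the linearized coupled UAV-payload model
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    # Update with the latest measurement z
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```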
Delta-Triplane Transformers as Occupancy World Models
Occupancy World Models (OWMs) aim to predict future scenes via 3D voxelized representations of the environment to support intelligent motion planning. Existing approaches typically generate full future occupancy states from VAE-style latent encodings, which can be computationally expensive and redundant. We propose Delta-Triplane Transformers (DTT), a novel 4D OWM for autonomous driving, that introduces two key innovations: (1) a triplane-based representation that encodes 3D occupancy more compactly than previous approaches, and (2) an incremental prediction strategy for OWM that models {\em changes} in occupancy rather than dealing with full states. The core insight is that changes in the compact 3D latent space are naturally sparser and easier to model, enabling higher accuracy with a lighter-weight architecture. Building on this representation, DTT extracts multi-scale motion features from historical data and iteratively predicts future triplane deltas. These deltas are combined with past states to decode future occupancy and ego-motion trajectories. Extensive experiments demonstrate that DTT delivers a 1.44$\times$ speedup (26 FPS) over the state of the art, improves mean IoU to 30.85, and reduces the mean absolute planning error to 1.0 meters. Demo videos are provided in the supplementary material.
A Neuro-inspired Theory of Joint Human-Swarm Interaction ICRA
Human-swarm interaction (HSI) is an active research challenge in the realms of swarm robotics and human-factors engineering. Here we apply a cognitive systems engineering perspective and introduce a neuro-inspired joint systems theory of HSI. The mindset defines predictions for adaptive, robust and scalable HSI dynamics and therefore has the potential to inform human-swarm loop design.
comment: ICRA Workshop on Human-Swarm Interaction 2020
Multiagent Systems
Hierarchical Decentralized Multi-Agent Coordination with Privacy-Preserving Knowledge Sharing: Extending AgentNet for Scalable Autonomous Systems
Decentralized multi-agent systems have shown promise in enabling autonomous collaboration among LLM-based agents. While AgentNet demonstrated the feasibility of fully decentralized coordination through dynamic DAG topologies, several limitations remain: scalability challenges with large agent populations, communication overhead, lack of privacy guarantees, and suboptimal resource allocation. We propose AgentNet++, a hierarchical decentralized framework that extends AgentNet with multilevel agent organization, privacy-preserving knowledge sharing via differential privacy and secure aggregation, adaptive resource management, and theoretical convergence guarantees. Our approach introduces cluster-based hierarchies where agents self-organize into specialized groups, enabling efficient task routing and knowledge distillation while maintaining full decentralization. We provide formal analysis of convergence properties and privacy bounds, and demonstrate through extensive experiments on complex multi-agent tasks that AgentNet++ achieves 23% higher task completion rates, 40% reduction in communication overhead, and maintains strong privacy guarantees compared to AgentNet and other baselines. Our framework scales effectively to 1000+ agents while preserving the emergent intelligence properties of the original AgentNet.
comment: 6 pages, 4 figures
AgentODRL: A Large Language Model-based Multi-agent System for ODRL Generation AAAI 2026
The Open Digital Rights Language (ODRL) is a pivotal standard for automating data rights management. However, the inherent logical complexity of authorization policies, combined with the scarcity of high-quality "Natural Language-to-ODRL" training datasets, impedes the ability of current methods to efficiently and accurately translate complex rules from natural language into the ODRL format. To address this challenge, this research leverages the potent comprehension and generation capabilities of Large Language Models (LLMs) to achieve both automation and high fidelity in this translation process. We introduce AgentODRL, a multi-agent system based on an Orchestrator-Workers architecture. The architecture consists of specialized Workers, including a Generator for ODRL policy creation, a Decomposer for breaking down complex use cases, and a Rewriter for simplifying nested logical relationships. The Orchestrator agent dynamically coordinates these Workers, assembling an optimal pathway based on the complexity of the input use case. Specifically, we enhance the ODRL Generator by incorporating a validator-based syntax strategy and a semantic reflection mechanism powered by a LoRA-finetuned model, significantly elevating the quality of the generated policies. Extensive experiments were conducted on a newly constructed dataset comprising 770 use cases of varying complexity, all situated within the context of data spaces. The results, evaluated using ODRL syntax and semantic scores, demonstrate that our proposed Orchestrator-Workers system, enhanced with these strategies, achieves superior performance on the ODRL generation task.
comment: Accepted by AAAI 2026. 9 pages, 1 figure
Toward a Safe Internet of Agents
Background: Autonomous agents powered by Large Language Models (LLMs) are driving a paradigm shift toward an "Internet of Agents" (IoA). While offering immense potential, this vision also introduces novel and systemic risks to safety and security. Objectives: Unlike common threat-centric taxonomies, our survey provides a principled, architectural framework for engineering safe and reliable agentic systems. We aim to identify the architectural sources of vulnerabilities to establish a foundation for secure design. Methods: We perform a bottom-up deconstruction of agentic systems, treating each component as a dual-use interface. The analysis spans three levels of complexity: the foundational Single Agent, the collaborative Multi-Agent System (MAS), and the visionary Interoperable Multi-Agent System (IMAS). At each level, we identify core architectural components and their inherent security risks. Results & Conclusions: Our central finding is that agentic safety is an architectural principle, not an add-on. By identifying specific vulnerabilities and deriving mitigation principles at each level of the agentic stack, this survey serves as a foundational guide for building the capable, safe, and trustworthy AI needed to realize a secure Internet of Agents.
comment: Submitted to the Journal of Artificial Intelligence Research (JAIR) for peer review. 43 pages
Truthful Double Auctions under Approximate VCG: Immediate-Penalty Enforcement in P2P Energy Trading
This paper examines truthful double auctions when exact VCG allocation is computationally infeasible and repeated-game punishments are impractical. We analyze an $α$-approximate VCG mechanism and show that truthful reporting becomes a subgame-perfect equilibrium when the immediate penalty exceeds the incentive gap created by approximation, scaled by monitoring accuracy. To validate this result, we construct a PPO-based multi-agent reinforcement learning environment for P2P smart-grid trading, where prosumers incur penalties for bidding far from their true valuations. Across systematic experiments varying approximation accuracy, tolerance, penalty magnitude, and discounting, the learned behavior closely matches theoretical predictions. The findings demonstrate that immediate-penalty approximate VCG mechanisms provide a practical and transparent approach to sustaining truthful behavior in distributed market settings.
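Under one plausible reading of the stated condition (expected penalty must exceed the approximation-induced incentive gap), the enforcement threshold reduces to a one-line computation; the exact inequality in the paper may differ, so the names and form below are assumptions.

```python
def min_penalty_for_truthfulness(incentive_gap, detection_prob):
    """Minimum immediate penalty sustaining truthful bidding (sketch).

    incentive_gap: upper bound on the extra utility a bidder can gain by
    misreporting under the alpha-approximate VCG allocation.
    detection_prob: monitoring accuracy, i.e., probability a deviation is
    detected and penalized. Truthfulness is sustained when
        detection_prob * penalty >= incentive_gap.
    """
    if not 0 < detection_prob <= 1:
        raise ValueError("detection_prob must lie in (0, 1]")
    return incentive_gap / detection_prob

# Example: a 0.05-utility incentive gap with 80% monitoring accuracy
# requires an immediate penalty of at least 0.0625 utility units.
print(min_penalty_for_truthfulness(0.05, 0.8))
```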
Evaluating LLMs in Open-Source Games NeurIPS 2025
Large Language Models' (LLMs) programming capabilities enable their participation in open-source games: a game-theoretic setting in which players submit computer programs in lieu of actions. These programs offer numerous advantages, including interpretability, inter-agent transparency, and formal verifiability; additionally, they enable program equilibria, solutions that leverage the transparency of code and are inaccessible within normal-form settings. We evaluate the capabilities of leading open- and closed-weight LLMs to predict and classify program strategies and evaluate features of the approximate program equilibria reached by LLM agents in dyadic and evolutionary settings. We identify the emergence of payoff-maximizing, cooperative, and deceptive strategies, characterize the adaptation of mechanisms within these programs over repeated open-source games, and analyze their comparative evolutionary fitness. We find that open-source games serve as a viable environment to study and steer the emergence of cooperative strategy in multi-agent dilemmas.
comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)
CogEvo-Edu: Cognitive Evolution Educational Multi-Agent Collaborative System
Large language models (LLMs) are increasingly deployed as conversational tutors in STEM education, yet most systems still rely on a single LLM with a static retrieval-augmented generation (RAG) pipeline over course materials. This design struggles in complex domains such as digital signal processing (DSP), where tutors must maintain coherent long-term student models, manage heterogeneous knowledge bases, and adapt teaching strategies over extended interactions. We argue that retrieval, memory, and control should be treated as a coupled cognitive evolution process. We instantiate this view in CogEvo-Edu, a hierarchical educational multi-agent system comprising a Cognitive Perception Layer (CPL), a Knowledge Evolution Layer (KEL), and a Meta-Control Layer (MCL). CPL maintains dual memories and performs confidence-weighted consolidation to build structured, self-correcting student profiles under limited context. KEL assigns each knowledge chunk a spatiotemporal value that drives activation, semantic compression, and forgetting. MCL formulates tutoring as hierarchical sequential decision making, orchestrating specialized agents and jointly adapting CPL/KEL hyperparameters via a dual inner--outer loop. To evaluate CogEvo-Edu, we construct DSP-EduBench, a vertical benchmark for DSP tutoring with heterogeneous resources, simulated student profiles, and long-horizon interaction scripts. Using a three-model LLM-as-a-Judge ensemble, CogEvo-Edu raises the overall score from 5.32 to 9.23 and improves all six indicators over static RAG, simple memory, and a single-agent variant, demonstrating the value of jointly evolving student profiles, knowledge bases, and teaching policies.
DISPATCH -- Decentralized Informed Spatial Planning and Assignment of Tasks for Cooperative Heterogeneous Agents
Spatial task allocation in systems such as multi-robot delivery or ride-sharing requires balancing efficiency with fair service across tasks. Greedy assignment policies that match each agent to its highest-preference or lowest-cost task can maximize efficiency but often create inequities: some tasks receive disproportionately favorable service (e.g., shorter delays or better matches), while others face long waits or poor allocations. We study fairness in heterogeneous multi-agent systems where tasks vary in preference alignment and urgency. Most existing approaches either assume centralized coordination or largely ignore fairness under partial observability. Distinct from this prior work, we establish a connection between the Eisenberg-Gale (EG) equilibrium convex program and decentralized, partially observable multi-agent learning. Building on this connection, we develop two equilibrium-informed algorithms that integrate fairness and efficiency: (i) a multi-agent reinforcement learning (MARL) framework, EG-MARL, whose training is guided by centralized fair assignment algorithms (EG and a preference-aware Hungarian method); and (ii) a stochastic online optimization mechanism that performs guided exploration and subset-based fair assignment as tasks are discovered. We evaluate our frameworks across a range of team sizes and assignment formulations against centralized EG, Hungarian, and Min-Max Distance baselines. Both algorithms preserve the fairness-efficiency balance of the Eisenberg-Gale equilibrium under partial observability. EG-MARL achieves near-centralized coordination and reduced travel distances, while the stochastic online mechanism enables real-time allocation with competitive fairness. Together, these results demonstrate that spatially aware EG formulations can effectively guide decentralized coordination in agents with heterogeneous capabilities.
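The centralized Eisenberg-Gale reference point used to guide the learners can be written as a small convex program; the sketch below assumes linear (additive) task values and uses cvxpy purely for illustration of the budget-weighted log-utility objective, not the paper's decentralized mechanism.

```python
import cvxpy as cp
import numpy as np

def eisenberg_gale_assignment(values, budgets):
    """Solve the Eisenberg-Gale convex program for a fair fractional allocation.

    values[i, j]: agent i's value for task j; budgets[i]: agent i's weight.
    Maximizing the budget-weighted sum of log utilities yields the EG
    equilibrium allocation balancing fairness and efficiency.
    """
    n, m = values.shape
    x = cp.Variable((n, m), nonneg=True)                    # fractional assignment
    utilities = cp.sum(cp.multiply(values, x), axis=1)
    objective = cp.Maximize(cp.sum(cp.multiply(budgets, cp.log(utilities))))
    constraints = [cp.sum(x, axis=0) <= 1]                  # each task allocated at most once
    cp.Problem(objective, constraints).solve()
    return x.value

# Toy example: 2 agents, 3 tasks with equal budgets.
alloc = eisenberg_gale_assignment(np.array([[3., 1., 2.], [1., 4., 2.]]),
                                  np.array([1.0, 1.0]))
```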
The Condorcet Dimension of Metric Spaces
A Condorcet winning set is a set of candidates such that no other candidate is preferred by at least half the voters over all members of the set. The Condorcet dimension, which is the minimum cardinality of a Condorcet winning set, is known to be at most logarithmic in the number of candidates. We study the case of elections where voters and candidates are located in a $2$-dimensional space with preferences based upon proximity voting. Our main result is that the Condorcet dimension is at most $3$, under both the Manhattan norm and the infinity norm, natural measures in electoral systems.
comment: 10 pages
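The defining condition is easy to check directly. The sketch below tests whether a given candidate set is a Condorcet winning set for voters with proximity preferences under the Manhattan or infinity norm; the brute-force check is for illustration and is not the paper's proof technique.

```python
import numpy as np

def is_condorcet_winning_set(S, voters, candidates, norm=1):
    """Check whether candidate index set S is a Condorcet winning set (sketch).

    Voters prefer closer candidates (norm=1 for Manhattan, np.inf for infinity).
    S fails if some candidate outside S is strictly preferred to every member
    of S by at least half of the voters.
    """
    V = np.asarray(voters, dtype=float)
    C = np.asarray(candidates, dtype=float)
    n = len(V)
    S = set(S)
    for c in range(len(C)):
        if c in S:
            continue
        d_c = np.linalg.norm(V - C[c], ord=norm, axis=1)
        beats_all = np.ones(n, dtype=bool)
        for s in S:
            d_s = np.linalg.norm(V - C[s], ord=norm, axis=1)
            beats_all &= d_c < d_s            # voter strictly prefers c to s
        if beats_all.sum() >= n / 2:          # at least half the voters
            return False
    return True
```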
A CPU-Centric Perspective on Agentic AI
Agentic AI frameworks add a decision-making orchestrator embedded with external tools, including web search, Python interpreter, contextual database, and others, on top of monolithic LLMs, turning them from passive text oracles into autonomous problem-solvers that can plan, call tools, remember past steps, and adapt on the fly. This paper aims to characterize and understand the system bottlenecks introduced by agentic AI workloads from a largely overlooked CPU-centric perspective. We first systematically characterize agentic AI on the basis of its orchestrator/decision-making component, inference path dynamics, and the repetitiveness of the agentic flow, which directly influence system-level performance. Thereafter, based on the characterization, we choose five representative agentic AI workloads (Haystack RAG, Toolformer, ChemCrow, Langchain, and SWE-Agent) to profile latency, throughput, and energy metrics and demystify the significant impact of CPUs on these metrics relative to GPUs. We observe that: 1. tool processing on CPUs can take up to 90.6% of the total latency; 2. agentic throughput gets bottlenecked either by CPU factors (coherence, synchronization, and over-subscription of cores) or GPU factors (main memory capacity and bandwidth); 3. CPU dynamic energy consumes up to 44% of the total dynamic energy at large batch sizes. Based on the profiling insights, we present two key optimizations, 1. CPU and GPU-Aware Micro-batching (CGAM) and 2. Mixed Agentic Workload Scheduling (MAWS), for homogeneous and heterogeneous agentic workloads respectively, demonstrating the potential to improve the performance, efficiency, and scalability of agentic AI. We achieve up to 2.1x and 1.41x P50 latency speedups compared to the multi-processing benchmark for homogeneous and heterogeneous agentic workloads, respectively.
A Neuro-inspired Theory of Joint Human-Swarm Interaction ICRA
Human-swarm interaction (HSI) is an active research challenge in the realms of swarm robotics and human-factors engineering. Here we apply a cognitive systems engineering perspective and introduce a neuro-inspired joint systems theory of HSI. The mindset defines predictions for adaptive, robust and scalable HSI dynamics and therefore has the potential to inform human-swarm loop design.
comment: ICRA Workshop on Human-Swarm Interaction 2020
Tool-RoCo: An Agent-as-Tool Self-organization Large Language Model Benchmark in Multi-robot Cooperation
This study proposes Tool-RoCo, a novel benchmark for evaluating large language models (LLMs) in long-term multi-agent cooperation based on RoCo, a multi-robot cooperative benchmark. Recent research on LLM-based multi-agent systems has relied on predefined orchestration, while ignoring agent autonomy. Tool-RoCo treats other agents as tools and introduces cooperative tools, leveraging tool usage to evaluate multi-agent cooperation and self-organization. Tool usage means that each agent (LLM) selects a tool from a candidate set based on the current state, receives feedback, and adjusts its selection in subsequent rounds. To evaluate different autonomy levels, we propose four LLM paradigms: (1) centralized cooperation, where a single LLM allocates tools to all agents; (2) centralized self-organization, where a central LLM autonomously activates agents while keeping others inactive; (3) decentralized cooperation, where each agent has its own LLM and calls tools based on local information; and (4) self-organization, where a randomly chosen initial agent can request collaboration, activating additional agents via tool calls. Tool-RoCo includes three multi-robot tasks, SORT, PACK, and CABINET, to measure format and parameter accuracy and agent coordination through tool usage. The results using several LLMs showed that cooperative tools accounted for only 7.09% of all tools, indicating that LLM-based agents rarely invoked others as assistants. Moreover, activation tools accounted for 96.42%, suggesting that current LLMs tend to maintain active agents while seldom deactivating them for adaptive coordination. Tool-RoCo provides a systematic benchmark to evaluate LLM autonomy and cooperation in multi-agent tasks. Code and Demo: https://github.com/ColaZhang22/Tool-Roco
comment: 9 pages, 3 figures
Systems and Control (EESS)
Active Learning of Fractional-Order Viscoelastic Model Parameters for Realistic Haptic Rendering
Effective medical simulators necessitate realistic haptic rendering of biological tissues that display viscoelastic material properties, such as creep and stress relaxation. Fractional-order models provide an effective means of describing intrinsically time-dependent viscoelastic dynamics with few parameters, as these models can naturally capture memory effects. However, due to the unintuitive frequency-dependent coupling between the order of the fractional element and the other parameters, determining appropriate parameters for fractional-order models that yield high perceived realism remains a significant challenge. In this study, we propose a systematic means of determining the parameters of fractional-order viscoelastic models that optimizes the perceived realism of haptic rendering across general populations. First, we demonstrate that the parameters of fractional-order models can be effectively optimized through active learning, via qualitative feedback-based human-in-the-loop~(HiL) optimizations, to ensure consistently high realism ratings for each individual. Second, we propose a rigorous method to combine HiL optimization results to form an aggregate perceptual map trained on the entire dataset and demonstrate the selection of population-level optimal parameters from this representation that are broadly perceived as realistic across general populations. Finally, we provide evidence of the effectiveness of the generalized fractional-order viscoelastic model parameters by characterizing their perceived realism through human-subject experiments. Overall, generalized fractional-order viscoelastic models established through the proposed HiL optimization and aggregation approach possess the potential to significantly improve the sim-to-real transition performance of medical training simulators.
comment: This work has been submitted to the IEEE for possible publication. 14 pages, 9 figures
Constraint-Aware Grid-Forming Control for Current Limiting
This work develops a constraint-aware grid-forming (GFM) control that explicitly accounts for current limits and modulation limits within the GFM oscillator dynamics generating the GFM voltage reference (i.e., phase angle and magnitude). Broadly speaking, the voltage reference generated by the constraint-aware GFM control minimizes the deviation from conventional unconstrained GFM droop control, while respecting current and modulation limits. The resulting GFM control achieves fast current limiting while preserving transient stability, e.g., exhibiting infinite critical clearing time. To develop the control, we first characterize and analyze the set of converter voltages that do not result in constraint violations. Next, an efficient algorithm for projecting voltages onto the feasible set is developed. Subsequently, these results are used to restrict the dynamics of GFM droop control to the set of feasible voltages. Finally, detailed simulation studies and hardware experiments are used to illustrate and validate the response to short-circuit faults and phase jumps.
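To illustrate the constraint set in the simplest single-phasor terms, the sketch below uses alternating projections onto a current-limit disc (centered at the grid voltage) and a modulation-limit disc (centered at the origin); it yields a feasible voltage near the reference, not the exact minimum-deviation projection the paper derives, and all quantities are simplified placeholders.

```python
import numpy as np

def project_voltage(v_ref, v_grid, z_filter, i_max, v_max, iters=20):
    """Find a feasible GFM voltage phasor near the droop reference (sketch).

    Feasibility here means the filter current |v - v_grid| / |z_filter| stays
    below i_max and the voltage magnitude |v| stays below the modulation
    limit v_max. Alternating projections converge to a point in the
    intersection of the two discs when it is nonempty.
    """
    v = complex(v_ref)
    r_i = i_max * abs(z_filter)              # radius of the current-limit disc
    for _ in range(iters):
        d = v - v_grid
        if abs(d) > r_i:                     # project onto the current-limit disc
            v = v_grid + d * (r_i / abs(d))
        if abs(v) > v_max:                   # project onto the modulation-limit disc
            v *= v_max / abs(v)
    return v
```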
Data-Driven Computation of Polytopic Invariant Sets for Noisy Nonlinear Systems
This paper presents a data-driven framework for computing robust, convex polytopic contractive sets for constrained noisy nonlinear systems for which an analytical model is not available. Our approach uses a finite set of collected noisy measurements to construct a polytopic set that bounds all system parameters compatible with the available information. Building on previous results, we provide a sufficient condition for a set to be contractive for a noisy nonlinear system using difference-of-convex functions, without requiring an explicit model. Robustness with respect to the unknown system model is guaranteed by requiring that the computed contractive set is invariant for all system models compatible with the noisy measurements. We present a tractable, optimization-based algorithm that implements this condition to compute the largest possible contractive set within the state constraint set for the unknown, noisy nonlinear system subject to both state and input constraints. The effectiveness of the proposed methodology is demonstrated with a numerical example.
KinesCeTI: A Modular and Size-Adaptable Force Feedback Glove with Interchangeable Actuation for the Index and Thumb
Force feedback gloves in haptic applications remain constrained by limited adaptability, simplified feedback, and fixed architectures that restrict their versatility. To address these challenges, we present KinesCeTI, a modular force feedback exoskeleton for the index finger and thumb, designed as a multipurpose device adaptable to a wide range of hand sizes. The glove incorporates interchangeable thimbles for fingertip or phalanx attachment and a bidirectional tendon transmission that supports both passive and active feedback. This is combined with a modular actuation design to which different feedback systems can be attached. The system was tested with two actuation modules: a compliant ratchet-pawl braking mechanism for passive feedback and a one-way clutch for variable active feedback, newly introduced here. The system was evaluated in three user studies with 20 participants each, assessing ergonomics, actuation performance, and usability in both real and virtual tasks. Results indicate that the glove adapts to different hand sizes and provides effective feedback with both mechanisms, highlighting its potential as a versatile platform for haptic research.
comment: 13 pages, 15 figures, submitted to IEEE Transactions on Haptics (ToH) (October 8, 2025)
From Range Loss to Recovery - Cold Weather Challenges and Design Strategies for Commercial Electric Vehicle Fleets
The North American commercial electric vehicle (EV) sector is undergoing rapid expansion, with unit sales rising from 21,120 in 2022 to 36,491 in 2023 - a 73% increase, according to the International Energy Agency. However, this accelerating adoption brings emerging technical challenges. One critical concern is the impact of low to extreme winter temperatures (25 °F to -25 °F) on EV performance, including reduced energy efficiency and extended charging times. This paper presents a systematic analysis of commercial EV performance degradation under cold weather conditions and its broader implications for grid operations. Monte Carlo simulations parameterized with real-world fleet data indicate that approximately 200 MWh of additional daily energy demand may be required in the U.S. alone to offset efficiency losses during severe cold events. The resulting strain on an already stressed winter grid could exacerbate reliability risks. Moreover, increased harmonic distortion associated with cold-weather charging behavior has been observed, raising concerns about power quality. To address these challenges, this study proposes two practical mitigation strategies: (1) a 'design-integrated safety' battery swapping station model operating in thermally controlled environments to significantly reduce charging downtime, and (2) a hybrid architecture combining roadside fast charging with depot-based deep charging to support continuous fleet utilization without compromising range. Together, these interventions provide a robust foundation for resilient commercial EV integration in cold climates, supporting fleet operators and utilities in managing seasonal performance variability.
comment: Accepted for publication in 2025 IEEE 13th International Conference on Smart Energy Grid Engineering (SEGE)
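A back-of-the-envelope version of the Monte Carlo analysis is sketched below: sample daily mileage, energy intensity, and a cold-weather efficiency penalty per exposed vehicle, then aggregate the additional energy demand. Every numerical value is a placeholder, so the output is not expected to reproduce the paper's roughly 200 MWh estimate.
```python
# Back-of-the-envelope Monte Carlo sketch of the kind of analysis the paper
# describes. All parameter values are illustrative placeholders, not the
# paper's real-world fleet data.
import numpy as np

rng = np.random.default_rng(0)
n_vehicles = 100_000          # hypothetical U.S. commercial EV fleet size
cold_fraction = 0.05          # hypothetical share exposed to a severe cold event
n_trials = 500

extra_mwh = np.empty(n_trials)
for k in range(n_trials):
    n_exposed = rng.binomial(n_vehicles, cold_fraction)
    miles = rng.normal(120.0, 30.0, n_exposed).clip(min=0.0)       # daily miles
    kwh_per_mile = rng.normal(1.8, 0.3, n_exposed).clip(min=0.5)   # energy intensity
    penalty = rng.uniform(0.20, 0.45, n_exposed)                   # cold-weather loss
    extra_mwh[k] = np.sum(miles * kwh_per_mile * penalty) / 1_000.0

print(f"extra daily demand: mean {extra_mwh.mean():.0f} MWh, "
      f"5th-95th pct {np.percentile(extra_mwh, 5):.0f}-"
      f"{np.percentile(extra_mwh, 95):.0f} MWh")
```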
Cooperative Safety Intelligence in V2X-Enabled Transportation: A Survey
Vehicle-to-Everything (V2X) cooperation is reshaping traffic safety from an ego-centric sensing problem into one of collective intelligence. This survey structures recent progress within a unified Sensor-Perception-Decision (SPD) framework that formalizes how safety emerges from the interaction of distributed sensing, cooperative perception, and coordinated decision-making across vehicles and infrastructure. Rather than centering on link protocols or message formats, we focus on how shared evidence, predictive reasoning, and human-aligned interventions jointly enable proactive risk mitigation. Within this SPD lens, we synthesize advances in cooperative perception, multi-modal forecasting, and risk-aware planning, emphasizing how cross-layer coupling turns isolated detections into calibrated, actionable understanding. Timing, trust, and human factors are identified as cross-cutting constraints that determine whether predictive insights are delivered early enough, with reliable confidence, and in forms that humans and automated controllers can use. Compared with prior V2X safety surveys, this work (i) organizes the literature around a formal SPD safety loop and (ii) systematically analyzes research evolution and evaluation gaps through a PRISMA-guided bibliometric study of hundreds of publications from 2016-2025. The survey concludes with a roadmap toward cooperative safety intelligence, outlining SPD-based design principles and evaluation practices for next-generation V2X safety systems.
A Highly Configurable Framework for Large-Scale Thermal Building Data Generation to drive Machine Learning Research
Data-driven modeling of building thermal dynamics is emerging as an increasingly important field of research for large-scale intelligent building control. However, research in data-driven modeling using machine learning (ML) techniques requires massive amounts of thermal building data, which is not easily available. Neither empirical public datasets nor existing data generators meet the needs of ML research in terms of data quality and quantity. Moreover, existing data generation approaches typically require expert knowledge in building simulation. To fill this gap, we present a thermal building data generation framework which we call BuilDa. BuilDa is designed to produce synthetic data of adequate quality and quantity for ML research. The framework does not require profound building simulation knowledge to generate large volumes of data. BuilDa uses a single-zone Modelica model that is exported as a Functional Mock-up Unit (FMU) and simulated in Python. We demonstrate BuilDa by generating data and utilizing it for a transfer learning study involving the fine-tuning of 486 data-driven models.
comment: Under Review
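The sketch below illustrates the workflow described above, simulating an exported single-zone FMU from Python using the FMPy package. The FMU file name, parameter names, and output variable names are placeholders; BuilDa's actual interface may differ.
```python
# Hypothetical sketch of a BuilDa-style workflow: a Modelica single-zone
# model exported as an FMU and batch-simulated from Python. The FMU path and
# all variable/parameter names below are placeholders.
import itertools
import numpy as np
from fmpy import simulate_fmu  # pip install fmpy

FMU = "SingleZone.fmu"                      # placeholder FMU path
params = {
    "wall_R": [2.0, 3.5, 5.0],              # thermal resistances (placeholder)
    "window_area": [5.0, 10.0],             # m^2 (placeholder)
}

runs = []
for wall_R, window_area in itertools.product(*params.values()):
    result = simulate_fmu(
        FMU,
        start_time=0.0,
        stop_time=7 * 24 * 3600.0,          # one-week episode
        start_values={"wall_R": wall_R, "window_area": window_area},
        output=["T_zone", "Q_heat"],        # placeholder output variables
    )
    runs.append((dict(wall_R=wall_R, window_area=window_area),
                 np.asarray(result["T_zone"])))

print(f"generated {len(runs)} trajectories of length {len(runs[0][1])}")
```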
Distributionally Robust Acceleration Control Barrier Filter for Efficient UAV Obstacle Avoidance
Dynamic obstacle avoidance (DOA) for unmanned aerial vehicles (UAVs) requires fast reaction under limited onboard resources. We introduce the distributionally robust acceleration control barrier function (DR-ACBF) as an efficient collision avoidance method that maintains safety regions. The method constructs a second-order control barrier function as linear half-space constraints on the commanded acceleration. Latency, actuator limits, and obstacle accelerations are handled through an effective clearance that accounts for dynamics and delay. Uncertainty is mitigated using Cantelli tightening with per-obstacle risk. A DR-conditional value at risk (DR-CVaR)-based early trigger expands margins near violations to improve DOA. Real-time execution is ensured via constant-time Gauss-Southwell projections. Simulation studies achieve similar avoidance performance at substantially lower computational effort than state-of-the-art baseline approaches. Experiments with Crazyflie drones demonstrate the feasibility of our approach.
comment: This work has been submitted to the IEEE for possible publication
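The core construction (a second-order barrier yielding a single half-space constraint on acceleration, with a Cantelli-tightened clearance) can be sketched as follows. The gains, risk model, and projection-based safety filter below are simplified placeholders rather than the exact DR-ACBF formulation.
```python
# Illustrative sketch (not the exact DR-ACBF formulation): a second-order
# barrier h = ||p - p_o||^2 - r_eff^2 yields one linear half-space
# constraint n^T a >= b on the commanded acceleration, with the clearance
# r_eff tightened by a Cantelli margin for a given risk level.
import numpy as np

def cantelli_margin(sigma: float, eps: float) -> float:
    """One-sided Chebyshev/Cantelli bound: margin k*sigma with
    P(error >= k*sigma) <= eps, i.e. k = sqrt((1 - eps) / eps)."""
    return sigma * np.sqrt((1.0 - eps) / eps)

def acbf_halfspace(p, v, p_o, v_o, a_o, r, sigma, eps, k1=4.0, k2=4.0):
    """Return (n, b) so that any acceleration a with n @ a >= b keeps
    h'' + k1*h' + k2*h >= 0 for h = ||p - p_o||^2 - r_eff^2."""
    r_eff = r + cantelli_margin(sigma, eps)
    d, dd = p - p_o, v - v_o
    h = d @ d - r_eff**2
    hdot = 2.0 * d @ dd
    n = 2.0 * d                                   # coefficient of a in h''
    b = -2.0 * dd @ dd + 2.0 * d @ a_o - k1 * hdot - k2 * h
    return n, b

def filter_acceleration(a_des, n, b):
    """Closed-form projection of the desired acceleration onto the
    half-space {a : n @ a >= b} (minimum-deviation safety filter)."""
    slack = n @ a_des - b
    if slack >= 0.0:
        return a_des
    return a_des - (slack / (n @ n)) * n

p, v = np.array([0.0, 0.0, 1.0]), np.array([1.5, 0.0, 0.0])
p_o, v_o, a_o = np.array([2.0, 0.1, 1.0]), np.zeros(3), np.zeros(3)
n, b = acbf_halfspace(p, v, p_o, v_o, a_o, r=0.5, sigma=0.1, eps=0.05)
a_safe = filter_acceleration(np.array([1.0, 0.0, 0.0]), n, b)
print(a_safe)   # braking/evasive acceleration replacing the unsafe command
```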
TenonOS: A Self-Generating Intelligent Embedded Operating System Framework for Edge Computing
The rapid evolution of edge computing has exposed fundamental limitations in traditional operating system and hypervisor architectures, particularly in managing heterogeneous platforms and meeting the constraints of limited resources. Existing solutions often rely on monolithic or layered combinations of hypervisors and guest OSes, which are difficult to tailor for the diverse and dynamic requirements of edge scenarios. To address these challenges, we propose TenonOS, a demand-driven, self-generating, and lightweight operating system framework that fundamentally rethinks and reconstructs both the hypervisor and OS architectures. TenonOS introduces a novel LibOS-on-LibOS approach, in which both virtualization and OS functionalities are modularized into fine-grained, reusable micro-libraries. A dynamic orchestration engine composes these modules on demand to construct customized, application-specific runtime environments. At the core of TenonOS are two key components: Mortise, a minimal, modularized hypervisor, and Tenon, a real-time LibOS. Mortise provides low-overhead resource isolation, fast inter-VM communication, and manages the full lifecycle of Tenon instances - including on-demand creation, suspension, and termination - enabling TenonOS to flexibly adapt its runtime layout to workload variations. Tenon delivers deterministic scheduling and multi-process support for time-critical applications. Through this unified and modular architecture, TenonOS eliminates redundant layers, reduces system overhead, and enhances scalability, security, and maintainability. Extensive evaluations demonstrate that TenonOS achieves superior real-time scheduling (40.28% improvement), a compact memory footprint (361 KiB), and high adaptability to dynamic edge workloads, making it an ideal foundation for heterogeneous, resource-constrained edge systems.
Distributed Observer and Controller Design for Linear Systems: A Separation-Based Approach
This paper investigates the problem of consensus-based distributed control of linear time-invariant multi-channel systems subject to unknown inputs. A distributed observer-based control framework is proposed, within which observer nodes and controller nodes collaboratively perform state estimation and control tasks. Consensus refers to a distributed cooperative mechanism by which each observer node compares its state estimate with those of neighboring nodes and uses the resulting discrepancies to update its own state estimate. One key contribution of this work is to show that the distributed observers and the distributed controllers can be designed independently, which parallels the classical separation principle. This separability within the distributed framework is enabled by a discontinuous consensus strategy and two adaptive algorithms developed specifically for handling the unknown inputs. Theoretical analysis and numerical simulation results demonstrate the effectiveness of the proposed framework in achieving state estimation, stabilization, and tracking control objectives.
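The consensus mechanism can be illustrated with a plain linear consensus correction between two observer nodes, as below. The paper's discontinuous consensus strategy and adaptive unknown-input algorithms are not reproduced; all gains are ad hoc placeholders.
```python
# Minimal sketch of the consensus idea in the abstract: each observer node
# corrects its estimate with its local output and with the discrepancy to
# neighboring estimates. Plain linear consensus term; the paper's
# discontinuous strategy and adaptive unknown-input handling are omitted.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])                   # discrete double integrator
Cs = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]    # node 1: position, node 2: velocity
Ls = [np.array([[0.5], [0.3]]), np.array([[0.1], [0.5]])]
adjacency = np.array([[0, 1], [1, 0]])                   # two connected nodes
gamma = 0.3                                              # consensus gain

x = np.array([1.0, -0.5])                                # true state
xhat = [np.zeros(2), np.zeros(2)]                        # local estimates

for k in range(200):
    y = [C @ x for C in Cs]                              # local measurements
    new = []
    for i in range(2):
        consensus = sum(adjacency[i, j] * (xhat[j] - xhat[i]) for j in range(2))
        innov = Ls[i] @ (y[i] - Cs[i] @ xhat[i])
        new.append(A @ xhat[i] + innov + gamma * consensus)
    xhat = new
    x = A @ x                                            # plant update (no input)

# Both estimation errors converge to (numerically) zero even though each
# node alone does not observe the full state.
print(np.linalg.norm(xhat[0] - x), np.linalg.norm(xhat[1] - x))
```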
Adaptive prediction theory combining offline and online learning
Real-world intelligent systems usually operate by combining offline learning and online adaptation on highly correlated, non-stationary system data or signals, a setting that has rarely been investigated theoretically in the literature. This paper initiates a theoretical investigation of the prediction performance of a two-stage learning framework combining offline and online algorithms for a class of nonlinear stochastic dynamical systems. For the offline-learning phase, we establish an upper bound on the generalization error of approximate nonlinear least-squares estimation under general datasets with strong correlation and distribution shift, leveraging the Kullback-Leibler divergence to quantify the distributional discrepancies. For the online-adaptation phase, we address, on the basis of the offline-trained model, possible uncertain parameter drift in real-world target systems by proposing a meta-LMS prediction algorithm. This two-stage framework, integrating offline learning with online adaptation, demonstrates superior prediction performance compared with either purely offline or purely online methods. Both theoretical guarantees and empirical studies are provided.
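The two-stage workflow can be illustrated with a toy regression problem: an offline least-squares fit provides the initial predictor, and an LMS-type update then tracks parameter drift online. The sketch uses a plain LMS rule with an arbitrary step size, not the paper's meta-LMS algorithm.
```python
# Sketch of the offline + online idea: an offline least-squares fit gives
# the initial predictor, and an LMS-type online update then tracks slow
# parameter drift in the target system. Plain LMS, not the paper's meta-LMS.
import numpy as np

rng = np.random.default_rng(1)

# --- Offline phase: fit theta on source-system data (y = phi @ theta + noise)
phi_off = rng.normal(size=(500, 3))
theta_src = np.array([1.0, -0.5, 0.2])
y_off = phi_off @ theta_src + 0.05 * rng.normal(size=500)
theta = np.linalg.lstsq(phi_off, y_off, rcond=None)[0]   # offline estimate

# --- Online phase: target system drifts away from the source parameters
theta_true = theta_src.copy()
mu = 0.05                                                # LMS step size
errors = []
for t in range(2000):
    theta_true += 0.0005 * rng.normal(size=3)            # slow random drift
    x = rng.normal(size=3)
    y = x @ theta_true + 0.05 * rng.normal()
    y_pred = x @ theta
    theta += mu * x * (y - y_pred)                       # LMS update
    errors.append((y - y_pred) ** 2)

print("mean squared one-step prediction error:", np.mean(errors[-500:]))
```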
Introducing AI-Driven IoT Energy Management Framework
Power consumption has become a critical aspect of modern life due to the consistent reliance on technological advancements. Reducing power consumption, or acting on power-usage predictions, can lead to lower monthly costs and improved electrical reliability. We propose a holistic framework that establishes a foundation for IoT systems with a focus on contextual decision making, proactive adaptation, and a scalable structure. A structured, accurate, and interconnected development process for IoT systems would help reduce power consumption and support grid stability. This study demonstrates the feasibility of the proposal by applying each aspect of the framework. The resulting system combines long-term forecasting, short-term forecasting, anomaly detection, and consideration of qualitative data in its energy management decisions. Performance was evaluated on power consumption time-series data to demonstrate the direct application of the framework.
comment: Accepted in IEEE Smart World Congress 2025, Calgary, Canada
Datamodel-Based Data Selection for Nonlinear Data-Enabled Predictive Control
Data-Enabled Predictive Control (DeePC) has emerged as a powerful framework for controlling unknown systems directly from input-output data. For nonlinear systems, recent work has proposed selecting relevant subsets of data columns based on geometric proximity to the current operating point. However, such proximity-based selection ignores the control objective: different reference trajectories may benefit from different data even at the same operating point. In this paper, we propose a datamodel-based approach that learns a context-dependent influence function mapping the current initial trajectory and reference trajectory to column importance scores. Adapting the linear datamodel framework from machine learning, we model closed-loop cost as a linear function of column inclusion indicators, with coefficients that depend on the control context. Training on closed-loop simulations, our method captures which data columns actually improve tracking performance for specific control tasks. Experimental results demonstrate that task-aware selection substantially outperforms geometry-based heuristics, particularly when using small data subsets.
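A minimal sketch of the datamodel step is given below: simulate random column subsets, record the closed-loop cost, and fit a ridge-regularized linear model from the binary inclusion vector to the cost; the coefficients then score column influence. The context dependence on initial and reference trajectories is omitted, and closed_loop_cost is a placeholder for a real DeePC simulation.
```python
# Minimal sketch of the linear-datamodel idea: run closed-loop simulations
# with random subsets of data columns, record the cost, and fit a ridge
# linear model from the binary inclusion vector to the cost. The paper also
# makes the coefficients depend on the control context, which is omitted
# here; closed_loop_cost() stands in for a real DeePC simulation.
import numpy as np

rng = np.random.default_rng(0)
n_cols = 80                       # columns in the Hankel data matrix
n_runs = 400                      # training simulations

true_influence = rng.normal(0.0, 1.0, n_cols)        # stand-in "ground truth"

def closed_loop_cost(mask: np.ndarray) -> float:
    """Placeholder: in practice, run DeePC with the selected columns and
    return the realized tracking cost."""
    return 10.0 + true_influence @ mask + 0.1 * rng.normal()

Z = (rng.random((n_runs, n_cols)) < 0.5).astype(float)    # random subsets
y = np.array([closed_loop_cost(z) for z in Z])

# Ridge regression on centered data: w = (Zc^T Zc + lam I)^-1 Zc^T yc
lam = 1e-2
Zc, yc = Z - Z.mean(0), y - y.mean()
w = np.linalg.solve(Zc.T @ Zc + lam * np.eye(n_cols), Zc.T @ yc)

# Select the columns predicted to reduce cost the most (most negative w).
selected = np.argsort(w)[:20]
print("top columns:", selected[:5],
      " corr with ground truth:", round(float(np.corrcoef(w, true_influence)[0, 1]), 3))
```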
Dependent Reachable Sets for the Constant Bearing Pursuit Strategy
This paper introduces a novel reachability problem for the scenario where one agent follows another agent using the constant bearing pursuit strategy, and analyzes the geometry of the reachable set of the follower. Key theoretical results are derived, providing bounds for the associated dependent reachable set. Simulation results are presented to empirically establish the shape of the dependent reachable set. In the process, an original optimization problem for the constant bearing strategy is formulated and analyzed.
comment: This work has been submitted to a journal for possible publication
A Linear Programming Framework for Optimal Event-Triggered LQG Control
This letter explores intelligent scheduling of sensor-to-controller communication in networked control systems, particularly when data transmission incurs a cost. While the optimal controller in a standard linear quadratic Gaussian (LQG) setup can be computed analytically, determining the optimal times to transmit sensor data remains computationally and analytically challenging. We show that, through reformulation and the introduction of auxiliary binary variables, the scheduling problem can be cast as a computationally efficient mixed-integer linear program (MILP). This formulation not only simplifies the analysis but also reveals structural insights and provides clear decision criteria at each step. Embedding the approach within a model predictive control (MPC) framework enables dynamic adaptation, and we prove that the resulting scheduler performs at least as well as any deterministic strategy (e.g., periodic strategy). Simulation results further demonstrate that our method consistently outperforms traditional periodic scheduling.
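A toy version of the MILP idea is sketched below with SciPy's milp solver: binary variables decide when to transmit, a surrogate estimation error accumulates between transmissions and is reset via a big-M constraint, and the objective trades error against communication cost. The weights are placeholders and the letter's exact LQG formulation is not reproduced.
```python
# Toy version of the MILP reformulation described in the letter (not its
# exact LQG formulation): delta_t decides when the sensor transmits, a
# surrogate "estimation error" e_t grows by q each step and is reset on
# transmission via a big-M constraint. All weights are placeholders.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

T, q, c_comm, M, e_init = 12, 1.0, 2.5, 100.0, 0.0

# Decision vector x = [delta_0..delta_{T-1}, e_0..e_{T-1}]
c = np.concatenate([np.full(T, c_comm), np.ones(T)])      # minimize c @ x

A = np.zeros((T, 2 * T))
lb = np.empty(T)
for t in range(T):
    A[t, t] = M                  # + M * delta_t
    A[t, T + t] = 1.0            # + e_t
    if t > 0:
        A[t, T + t - 1] = -1.0   # - e_{t-1}
    lb[t] = q + (e_init if t == 0 else 0.0)
constraints = LinearConstraint(A, lb, np.full(T, np.inf))

bounds = Bounds(np.zeros(2 * T),
                np.concatenate([np.ones(T), np.full(T, np.inf)]))
integrality = np.concatenate([np.ones(T), np.zeros(T)])   # delta binary, e continuous

res = milp(c, constraints=constraints, integrality=integrality, bounds=bounds)
delta = np.round(res.x[:T]).astype(int)
print("transmission schedule:", delta, " cost:", round(res.fun, 2))
```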
A Cyber Insurance Policy for Hedging Against Load-Altering Attacks and Extreme Load Variations in Distribution Grids
Uncertainties in renewable energy resources (RES) and load variations can lead to elevated system operational costs. Moreover, the emergence of large-scale distributed threats, such as load-altering attacks (LAAs), can induce substantial load variations, further exacerbating these costs. Although traditional defense measures can reduce the likelihood of such attacks, considerable residual risks remain. Thus, this paper proposes a cyber insurance framework designed to hedge against additional operational costs resulting from LAAs and substantial load variations in renewable-rich grids. The insurance framework determines both the insurance coverage and premium based on the Value at Risk (VaR) and Tail Value at Risk (TVaR). These risk metrics are calculated using the system failure probability and the probability density function (PDF) of the system operation cost. The system failure probability is assessed through a semi-Markov process (SMP), while the cost distribution is estimated through a cost minimization model of a distribution grid combined with a Monte Carlo simulation to capture load variability. Furthermore, we employ a bi-level optimization scheme that identifies the specific load distribution leading to the maximum system cost, thereby enhancing the accuracy of the operation cost PDF estimation. The effectiveness and scalability of the proposed cyber insurance policy are evaluated considering a modified IEEE 118-bus test system and the IEEE European low-voltage (LV) test feeder model. The case study shows that with a relatively low premium, the network operator can hedge against additional operational costs caused by malicious load manipulations.
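The risk-metric step alone is easy to illustrate: given Monte Carlo samples of the additional operational cost, compute VaR and TVaR at a confidence level and price a simple premium as the expected payout above a deductible. The lognormal cost model and loading factor below are placeholders, not outputs of the paper's SMP or bi-level models.
```python
# Sketch of the risk-metric step only: VaR, TVaR, and a simple premium from
# Monte Carlo cost samples. The lognormal cost model and all numbers are
# placeholders, not outputs of the paper's SMP / bi-level pipeline.
import numpy as np

rng = np.random.default_rng(7)
alpha = 0.95

# Placeholder cost distribution (in k$) standing in for the operation-cost
# PDF that the paper obtains from its cost-minimization + Monte Carlo model.
costs = rng.lognormal(mean=3.0, sigma=0.6, size=100_000)

var = np.quantile(costs, alpha)                 # Value at Risk
tvar = costs[costs >= var].mean()               # Tail Value at Risk (CVaR)

deductible = var                                # insurer covers tail losses
payouts = np.clip(costs - deductible, 0.0, None)
premium = payouts.mean() * 1.2                  # 20% loading factor (placeholder)

print(f"VaR_{alpha:.2f} = {var:.1f} k$, TVaR = {tvar:.1f} k$, "
      f"premium = {premium:.2f} k$")
```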
Towards Fully Onboard State Estimation and Trajectory Tracking for UAVs with Suspended Payloads
This paper addresses the problem of tracking the position of a cable-suspended payload carried by an unmanned aerial vehicle, with a focus on real-world deployment and minimal hardware requirements. In contrast to many existing approaches that rely on motion-capture systems, additional onboard cameras, or instrumented payloads, we propose a framework that uses only standard onboard sensors - specifically, real-time kinematic global navigation satellite system measurements and data from the onboard inertial measurement unit - to estimate and control the payload's position. The system models the full coupled dynamics of the aerial vehicle and payload, and integrates a linear Kalman filter for state estimation, a model predictive contouring control planner, and an incremental model predictive controller. The control architecture is designed to remain effective despite sensing limitations and estimation uncertainty. Extensive simulations demonstrate that the proposed system achieves performance comparable to control based on ground-truth measurements, with only minor degradation (< 6%). The system also shows strong robustness to variations in payload parameters. Field experiments further validate the framework, confirming its practical applicability and reliable performance in outdoor environments using only off-the-shelf aerial vehicle hardware.
comment: Updated to match the published version. Added journal reference and DOI
PolyOCP.jl -- A Julia Package for Stochastic OCPs and MPC
The consideration of stochastic uncertainty in optimal and predictive control is a well-explored topic. Recently, Polynomial Chaos Expansions (PCE) have received considerable attention for problems involving stochastically uncertain system parameters and for problems with additive stochastic i.i.d. disturbances. While a number of open-source PCE toolboxes exist, tailored open-source code for solving OCPs with additive stochastic i.i.d. disturbances in Julia is not available. Hence, this paper introduces the toolbox PolyOCP.jl, which enables stochastic OCPs to be solved efficiently for a large class of disturbance distributions. We explain the main mathematical concepts behind the PCE transcription of stochastic OCPs and how they are provided in the toolbox. We draw upon two examples to illustrate the functionalities of PolyOCP.jl.
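Although the toolbox itself is written in Julia, the PCE bookkeeping for additive i.i.d. Gaussian disturbances can be illustrated in a few lines of Python: each disturbance introduces a new germ, the state is carried as a growing vector of PCE coefficients, and for a scalar linear system a degree-1 expansion is exact.
```python
# Illustration only (the toolbox is Julia): PCE bookkeeping for the scalar
# linear system x_{k+1} = a*x_k + w_k with i.i.d. Gaussian w_k. Each new
# disturbance adds one Hermite germ, so the state is a growing coefficient
# vector; for this linear-Gaussian case a degree-1 expansion is exact.
import numpy as np

a, sigma, x0, N = 0.9, 0.2, 1.0, 30

coeffs = np.array([x0])                 # [c_0], deterministic initial condition
for k in range(N):
    coeffs = a * coeffs                 # dynamics act on every coefficient
    coeffs = np.append(coeffs, sigma)   # new germ xi_{k+1} with coefficient sigma

mean = coeffs[0]                        # E[x_N] = c_0
var = np.sum(coeffs[1:] ** 2)           # Var[x_N] = sum of squared higher coefficients

# Analytic check for the linear-Gaussian case
var_exact = sigma**2 * (1 - a ** (2 * N)) / (1 - a**2)
print(mean, a**N * x0)                  # means agree
print(var, var_exact)                   # variances agree
```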
Towards Ethical AI in Power Electronics: How Engineering Practice and Roles Must Adapt
Artificial intelligence (AI) is rapidly transforming power electronics, with AI-related publications in IEEE Power Electronics Society selected journals increasing more than fourfold from 2020 to 2025. However, the ethical dimensions of this transformation have received limited attention. This article underscores the urgent need for an ethical framework to guide responsible AI integration in power electronics, not only to prevent AI-related incidents but also to comply with legal and regulatory responsibilities. In this context, this article identifies four core pillars of AI ethics in power electronics: Security & Safety, Explainability & Transparency, Energy Sustainability, and Evolving Roles of Engineers. Each pillar is supported by practical and actionable insights to ensure that ethical principles are embedded in algorithm design, system deployment, and the preparation of an AI-ready engineering workforce. The authors advocate for power electronics engineers to lead the ethical discourse, given their deep technical understanding of both AI systems and power conversion technologies. The paper concludes by calling on the IEEE Power Electronics Society to spearhead the establishment of ethical standards, talent development initiatives, and best practices that ensure AI innovations are not only technically advanced but also oriented toward human and societal benefit.
Make Your AUV Adaptive: An Environment-Aware Reinforcement Learning Framework For Underwater Tasks IROS 2025
This study presents a novel environment-aware reinforcement learning (RL) framework designed to augment the operational capabilities of autonomous underwater vehicles (AUVs) in underwater environments. Departing from traditional RL architectures, the proposed framework integrates an environment-aware network module that dynamically captures flow field data, effectively embedding this critical environmental information into the state space. This integration facilitates real-time environmental adaptation, significantly enhancing the AUV's situational awareness and decision-making capabilities. Furthermore, the framework incorporates AUV structure characteristics into the optimization process, employing a large language model (LLM)-based iterative refinement mechanism that leverages both environmental conditions and training outcomes to optimize task performance. Comprehensive experimental evaluations demonstrate the framework's superior performance, robustness and adaptability.
comment: This paper has been accepted by IROS 2025. Yimian Ding and Jingzehua Xu contributed equally to this work, and Jingzehua Xu is also the corresponding author of this paper
Distance Between Stochastic Linear Systems
While the existing stochastic control theory is well equipped to handle dynamical systems with stochastic uncertainties, a paradigm shift toward distance-measure-based decision making is required for further effective exploration of the field. As a first step, a distance measure between two stochastic linear time-invariant systems is proposed here, extending the existing distance metrics between deterministic linear dynamical systems. In the frequency domain, the proposed distance measure corresponds to the worst-case pointwise-in-frequency Wasserstein distance between distributions characterising the uncertainties, using inverse stereographic projection onto the Riemann sphere. In the time-domain setting, the proposed distance corresponds to the gap-metric-induced type-q Wasserstein distance between the distributions characterising the uncertainty of the plant models. Apart from providing lower and upper bounds for the proposed distance measures in both the frequency- and time-domain settings, it is proved that the former never exceeds the latter. The proposed distance measures will facilitate probabilistic guarantees on system robustness and controller performance.
comment: Submitted to IEEE Transactions on Control. 15 Pages in total
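As a generic building block, the type-2 Wasserstein distance between two Gaussian distributions can be computed in closed form, as sketched below. The paper's specific constructions (inverse stereographic projection onto the Riemann sphere and the gap-metric-induced distance) are not reproduced; the example means and covariances are illustrative.
```python
# Generic building block only: the type-2 Wasserstein distance between two
# Gaussians, which underlies the distance measures discussed in the
# abstract. The paper's specific frequency- and time-domain constructions
# are not reproduced here.
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gaussian(m1, S1, m2, S2) -> float:
    """W2 distance between N(m1, S1) and N(m2, S2):
    W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2*(S2^{1/2} S1 S2^{1/2})^{1/2})."""
    m1, m2 = np.atleast_1d(m1), np.atleast_1d(m2)
    S1, S2 = np.atleast_2d(S1), np.atleast_2d(S2)
    root_S2 = sqrtm(S2)
    cross = sqrtm(root_S2 @ S1 @ root_S2)
    squared = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * np.real(cross))
    return float(np.sqrt(max(squared, 0.0)))

# Example: distributions of a frequency-response point under two uncertain
# plant models (means and covariances are illustrative placeholders).
m_a, S_a = np.array([1.0, 0.2]), np.diag([0.05, 0.02])
m_b, S_b = np.array([0.8, 0.3]), np.diag([0.08, 0.02])
print(wasserstein2_gaussian(m_a, S_a, m_b, S_b))
```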
Aggregative games with bilevel structures: Distributed algorithms and convergence analysis
In this paper, the problem of distributively seeking the equilibria of aggregative games with bilevel structures is studied. Different from the traditional aggregative games, here the aggregation is determined by the minimizer of a virtual leader's objective function in the inner level, which depends on the actions of the players in the outer level. Moreover, the global objective function of the virtual leader is formed by the sum of some local functions with two arguments, each of which is strongly convex with respect to the second argument. When making decisions, each player in the outer level only has access to a local part of the virtual leader's objective function. To handle this problem, first, we propose a second order gradient-based distributed algorithm, where the Hessian matrices associated with the objective functions of the leader are involved. By the algorithm, players update their actions while cooperatively minimizing the objective function of the virtual leader to estimate the aggregation by communicating with their neighbors via a connected graph. Under mild assumptions on the graph and cost functions, we prove that the actions of players asymptotically converge to the Nash equilibrium point. Then, for the case where the Hessian matrices associated with the objective functions of the virtual leader are not available, we propose a first order gradient-based distributed algorithm, where a distributed two-point estimate strategy is developed to estimate the gradients of players' cost functions in the outer level. Under the same conditions, we prove that the convergence errors of players' actions to the Nash equilibrium point are linear with respect to the estimate parameters. Finally, simulations are provided to demonstrate the effectiveness of our theoretical results.
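The two-point gradient estimate mentioned above has a standard randomized form, sketched below: the gradient of a player's cost is approximated from two function evaluations along a random unit direction. The distributed algorithm, aggregation estimation, and bilevel structure are not reproduced here.
```python
# Sketch of a standard randomized two-point gradient estimator, of the kind
# referenced in the abstract. Only the estimator is shown; the distributed
# algorithm, aggregation estimation, and bilevel structure are omitted.
import numpy as np

rng = np.random.default_rng(0)

def two_point_gradient(f, x, delta=1e-2, rng=rng):
    """Randomized two-point estimator: its expectation approaches
    grad f(x) as delta -> 0."""
    d = x.size
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    return d * (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta) * u

# Example cost: f(x) = ||x - x_star||^2 with known gradient 2*(x - x_star)
x_star = np.array([1.0, -2.0, 0.5])
f = lambda x: np.sum((x - x_star) ** 2)

x = np.zeros(3)
g_hat = np.mean([two_point_gradient(f, x) for _ in range(2000)], axis=0)
print(g_hat, 2 * (x - x_star))   # averaged estimate is close to the true gradient
```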
Convergence Filters for Efficient Economic MPC of Non-dissipative Systems
This note presents a novel, efficient economic model predictive control (EMPC) scheme for non-dissipative systems subject to state and input constraints. A new concept of convergence filters is introduced to address the stability issue of EMPC for constrained non-dissipative systems. Three convergence filters are designed accordingly and imposed on the receding-horizon optimization problem of the EMPC. To improve online computational efficiency, a variable-horizon formulation without explicit terminal state constraints is adopted to balance the convergence speed, economic performance, and computational burden of the EMPC. Moreover, sufficient conditions are derived to guarantee the recursive feasibility and stability of the EMPC. The advantages of the proposed EMPC are validated on a classical non-dissipative continuous stirred-tank reactor.
comment: This version has been revised to include more technical details and experiments. Submitted to an IEEE journal
Differentiable Optimization for Deep Learning-Enhanced DC Approximation of AC Optimal Power Flow
The growing scale of power systems and the increasing uncertainty introduced by renewable energy sources necessitate novel optimization techniques that are significantly faster and more accurate than existing methods. The AC Optimal Power Flow (AC-OPF) problem, a core component of power grid optimization, is often approximated using linearized DC Optimal Power Flow (DC-OPF) models for computational tractability, albeit at the cost of suboptimal and inefficient decisions. To address these limitations, we propose a novel deep learning-based framework for network equivalency that enhances DC-OPF to more closely mimic the behavior of AC-OPF. The approach utilizes recent advances in differentiable optimization, incorporating a neural network trained to predict adjusted nodal shunt conductances and branch susceptances in order to account for nonlinear power flow behavior. The model can be trained end-to-end using modern deep learning frameworks by leveraging the implicit function theorem. Results demonstrate the framework's ability to significantly improve prediction accuracy.
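A simplified stand-in for the pipeline is sketched below in PyTorch: a small network predicts branch susceptances, a DC power flow is solved inside the computation graph (differentiating through torch.linalg.solve, a special case of implicit differentiation), and the resulting flows are fit to AC-derived targets. The 3-bus system, random targets, and network sizes are placeholders; the paper's full DC-OPF layer is not reproduced.
```python
# Simplified stand-in for the paper's pipeline: a network predicts branch
# susceptances, a DC power flow is solved inside the computation graph, and
# the flows are fit to AC-derived targets. The 3-bus system and the random
# "AC" targets are placeholders; the full DC-OPF layer is not reproduced.
import torch

torch.manual_seed(0)

# 3-bus system: branches (0-1), (1-2), (0-2); bus 0 is the slack.
branches = [(0, 1), (1, 2), (0, 2)]
n_bus, n_br = 3, len(branches)
A = torch.zeros(n_br, n_bus)                 # branch-bus incidence matrix
for k, (i, j) in enumerate(branches):
    A[k, i], A[k, j] = 1.0, -1.0

def dc_flows(b_branch, p_inj):
    """Solve the reduced DC power flow B_red @ theta_red = p_red and
    return the branch flows b * (A @ theta)."""
    B = A.T @ torch.diag(b_branch) @ A
    B_red = B[1:, 1:]                        # remove the slack bus row/column
    theta_red = torch.linalg.solve(B_red, p_inj[1:])
    theta = torch.cat([torch.zeros(1), theta_red])
    return b_branch * (A @ theta)

net = torch.nn.Sequential(                   # maps injections -> susceptances
    torch.nn.Linear(n_bus, 16), torch.nn.Tanh(), torch.nn.Linear(16, n_br)
)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

p = torch.randn(64, n_bus)
p[:, 0] = -p[:, 1:].sum(dim=1)               # balanced injections
target = torch.randn(64, n_br)               # placeholder "AC" branch flows

for epoch in range(200):
    opt.zero_grad()
    b = torch.nn.functional.softplus(net(p)) + 1.0   # keep susceptances positive
    flows = torch.stack([dc_flows(b[k], p[k]) for k in range(p.shape[0])])
    loss = torch.mean((flows - target) ** 2)
    loss.backward()                          # gradients pass through the solve
    opt.step()
print(float(loss))
```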