MyArxiv
Robotics
YOR: Your Own Mobile Manipulator for Generalizable Robotics
Recent advances in robot learning have generated significant interest in capable platforms that may eventually approach human-level competence. This interest, combined with the commoditization of actuators, has propelled growth in low-cost robotic platforms. However, the optimal form factor for mobile manipulation, especially on a budget, remains an open question. We introduce YOR, an open-source, low-cost mobile manipulator that integrates an omnidirectional base, a telescopic vertical lift, and two arms with grippers to achieve whole-body mobility and manipulation. Our design emphasizes modularity, ease of assembly using off-the-shelf components, and affordability, with a bill-of-materials cost under 10,000 USD. We demonstrate YOR's capability by completing tasks that require coordinated whole-body control, bimanual manipulation, and autonomous navigation. Overall, YOR offers competitive functionality for mobile manipulation research at a fraction of the cost of existing platforms. Project website: https://www.yourownrobot.ai/
APEX: Learning Adaptive High-Platform Traversal for Humanoid Robots
Humanoid locomotion has advanced rapidly with deep reinforcement learning (DRL), enabling robust feet-based traversal over uneven terrain. Yet platforms beyond leg length remain largely out of reach because current RL training paradigms often converge to jumping-like solutions that are high-impact, torque-limited, and unsafe for real-world deployment. To address this gap, we propose APEX, a system for perceptive, climbing-based high-platform traversal that composes terrain-conditioned behaviors: climb-up and climb-down at vertical edges, walking or crawling on the platform, and stand-up and lie-down for posture reconfiguration. Central to our approach is a generalized ratchet progress reward for learning contact-rich, goal-reaching maneuvers. It tracks the best-so-far task progress and penalizes non-improving steps, providing dense yet velocity-free supervision that enables efficient exploration under strong safety regularization. Based on this formulation, we train LiDAR-based full-body maneuver policies and reduce the sim-to-real perception gap through a dual strategy: modeling mapping artifacts during training and applying filtering and inpainting to elevation maps during deployment. Finally, we distill all six skills into a single policy that autonomously selects behaviors and transitions based on local geometry and commands. Experiments on a 29-DoF Unitree G1 humanoid demonstrate zero-shot sim-to-real traversal of 0.8 meter platforms (approximately 114% of leg length), with robust adaptation to platform height and initial pose, as well as smooth and stable multi-skill transitions.
comment: Project Website: https://apex-humanoid.github.io/
Data-Efficient Hierarchical Goal-Conditioned Reinforcement Learning via Normalizing Flows
Hierarchical goal-conditioned reinforcement learning (H-GCRL) provides a powerful framework for tackling complex, long-horizon tasks by decomposing them into structured subgoals. However, its practical adoption is hindered by poor data efficiency and limited policy expressivity, especially in offline or data-scarce regimes. In this work, Normalizing flow-based hierarchical implicit Q-learning (NF-HIQL), a novel framework that replaces unimodal gaussian policies with expressive normalizing flow policies at both the high- and low-levels of the hierarchy is introduced. This design enables tractable log-likelihood computation, efficient sampling, and the ability to model rich multimodal behaviors. New theoretical guarantees are derived, including explicit KL-divergence bounds for Real-valued non-volume preserving (RealNVP) policies and PAC-style sample efficiency results, showing that NF-HIQL preserves stability while improving generalization. Empirically, NF-HIQL is evaluted across diverse long-horizon tasks in locomotion, ball-dribbling, and multi-step manipulation from OGBench. NF-HIQL consistently outperforms prior goal-conditioned and hierarchical baselines, demonstrating superior robustness under limited data and highlighting the potential of flow-based architectures for scalable, data-efficient hierarchical reinforcement learning.
comment: 9 pages, 3 figures, IEEE International Conference on Robotics and Automation 2026
Min-Sum Uniform Coverage Problem by Autonomous Mobile Robots
We study the \textit{min-sum uniform coverage} problem for a swarm of $n$ mobile robots on a given finite line segment and on a circle having finite positive radius, where the circle is given as an input. The robots must coordinate their movements to reach a uniformly spaced configuration that minimizes the total distance traveled by all robots. The robots are autonomous, anonymous, identical, and homogeneous, and operate under the \textit{Look-Compute-Move} (LCM) model with \textit{non-rigid} motion controlled by a fair asynchronous scheduler. They are oblivious and silent, possessing neither persistent memory nor a means of explicit communication. In the \textbf{line-segment setting}, the \textit{min-sum uniform coverage} problem requires placing the robots at uniformly spaced points along the segment so as to minimize the total distance traveled by all robots. In the \textbf{circle setting} for this problem, the robots have to arrange themselves uniformly around the given circle to form a regular $n$-gon. There is no fixed orientation or designated starting vertex, and the goal is to minimize the total distance traveled by all the robots. We present a deterministic distributed algorithm that achieves uniform coverage in the line-segment setting with minimum total movement cost. For the circle setting, we characterize all initial configurations for which the \textit{min-sum uniform coverage} problem is deterministically unsolvable under the considered robot model. For all the other remaining configurations, we provide a deterministic distributed algorithm that achieves uniform coverage while minimizing the total distance traveled. These results characterize the deterministic solvability of min-sum coverage for oblivious robots and achieve optimal cost whenever solvable.
Multi-UAV Trajectory Optimization for Bearing-Only Localization in GPS Denied Environments
Accurate localization of maritime targets by unmanned aerial vehicles (UAVs) remains challenging in GPS-denied environments. UAVs equipped with gimballed electro-optical sensors are typically used to localize targets, however, reliance on these sensors increases mechanical complexity, cost, and susceptibility to single-point failures, limiting scalability and robustness in multi-UAV operations. This work presents a new trajectory optimization framework that enables cooperative target localization using UAVs with fixed, non-gimballed cameras operating in coordination with a surface vessel. This estimation-aware optimization generates dynamically feasible trajectories that explicitly account for mission constraints, platform dynamics, and out-of-frame events. Estimation-aware trajectories outperform heuristic paths by reducing localization error by more than a factor of two, motivating their use in cooperative operations. Results further demonstrate that coordinated UAVs with fixed, non-gimballed cameras achieve localization accuracy that meets or exceeds that of single gimballed systems, while substantially lowering system complexity and cost, enabling scalability, and enhancing mission resilience.
comment: 38 pages, 7 figure, and 6 tables
A receding-horizon multi-contact motion planner for legged robots in challenging environments
We present a novel receding-horizon multi-contact motion planner for legged robots in challenging scenarios, able to plan motions such as chimney climbing, navigating very narrow passages or crossing large gaps. Our approach adds new capabilities to the state of the art, including the ability to reactively re-plan in response to new information, and planning contact locations and whole-body trajectories simultaneously, simplifying the implementation and removing the need for post-processing or complex multi-stage approaches. Our method is more resistant to local minima problems than other potential field based approaches, and our quadratic-program-based posture generator returns nodes more quickly than those of existing algorithms. Rigorous statistical analysis shows that, with short planning horizons (e.g., one step ahead), our planner is faster than the state-of-the-art across all scenarios tested (between 45% and 98% faster on average, depending on the scenario), while planning less efficient motions (requiring 5% fewer to 700% more stance changes on average). In all but one scenario (Chimney Walking), longer planning horizons (e.g., four steps ahead) extended the average planning times (between 73% faster and 400% slower than the state-of-the-art) but resulted in higher quality motion plans (between 8% more and 47% fewer stance changes than the state-of-the-art).
comment: Submitted to Robotics and Autonomous Systems For supplementary video, see https://www.youtube.com/watch?v=RJp8DCmhDa4
Digging for Data: Experiments in Rock Pile Characterization Using Only Proprioceptive Sensing in Excavation
Characterization of fragmented rock piles is a fundamental task in the mining and quarrying industries, where rock is fragmented by blasting, transported using wheel loaders, and then sent for further processing. This field report studies a novel method for estimating the relative particle size of fragmented rock piles from only proprioceptive data collected while digging with a wheel loader. Rather than employ exteroceptive sensors (e.g., cameras or LiDAR sensors) to estimate rock particle sizes, the studied method infers rock fragmentation from an excavator's inertial response during excavation. This paper expands on research that postulated the use of wavelet analysis to construct a unique feature that is proportional to the level of rock fragmentation. We demonstrate through extensive field experiments that the ratio of wavelet features, constructed from data obtained by excavating in different rock piles with different size distributions, approximates the ratio of the mean particle size of the two rock piles. Full-scale excavation experiments were performed with a battery electric, 18-tonne capacity, load-haul-dump (LHD) machine in representative conditions in an operating quarry. The relative particle size estimates generated with the proposed sensing methodology are compared with those obtained from both a vision-based fragmentation analysis tool and from sieving of sampled materials.
comment: Accepted for publication in the IEEE Transactions on Field Robotics
RISE: Self-Improving Robot Policy with Compositional World Model
Despite the sustained scaling on model capacity and data acquisition, Vision-Language-Action (VLA) models remain brittle in contact-rich and dynamic manipulation tasks, where minor execution deviations can compound into failures. While reinforcement learning (RL) offers a principled path to robustness, on-policy RL in the physical world is constrained by safety risk, hardware cost, and environment reset. To bridge this gap, we present RISE, a scalable framework of robotic reinforcement learning via imagination. At its core is a Compositional World Model that (i) predicts multi-view future via a controllable dynamics model, and (ii) evaluates imagined outcomes with a progress value model, producing informative advantages for the policy improvement. Such compositional design allows state and value to be tailored by best-suited yet distinct architectures and objectives. These components are integrated into a closed-loop self-improving pipeline that continuously generates imaginary rollouts, estimates advantages, and updates the policy in imaginary space without costly physical interaction. Across three challenging real-world tasks, RISE yields significant improvement over prior art, with more than +35% absolute performance increase in dynamic brick sorting, +45% for backpack packing, and +35% for box closing, respectively.
comment: Project page: https://opendrivelab.com/kai0-rl/
SQ-CBF: Signed Distance Functions for Numerically Stable Superquadric-Based Safety Filtering
Ensuring safe robot operation in cluttered and dynamic environments remains a fundamental challenge. While control barrier functions provide an effective framework for real-time safety filtering, their performance critically depends on the underlying geometric representation, which is often simplified, leading to either overly conservative behavior or insufficient collision coverage. Superquadrics offer an expressive way to model complex shapes using a few primitives and are increasingly used for robot safety. To integrate this representation into collision avoidance, most existing approaches directly use their implicit functions as barrier candidates. However, we identify a critical but overlooked issue in this practice: the gradients of the implicit SQ function can become severely ill-conditioned, potentially rendering the optimization infeasible and undermining reliable real-time safety filtering. To address this issue, we formulate an SQ-based safety filtering framework that uses signed distance functions as barrier candidates. Since analytical SDFs are unavailable for general SQs, we compute distances using the efficient Gilbert-Johnson-Keerthi algorithm and obtain gradients via randomized smoothing. Extensive simulation and real-world experiments demonstrate consistent collision-free manipulation in cluttered and unstructured scenes, showing robustness to challenging geometries, sensing noise, and dynamic disturbances, while improving task efficiency in teleoperation tasks. These results highlight a pathway toward safety filters that remain precise and reliable under the geometric complexity of real-world environments.
ContactGaussian-WM: Learning Physics-Grounded World Model from Videos
Developing world models that understand complex physical interactions is essential for advancing robotic planning and simulation.However, existing methods often struggle to accurately model the environment under conditions of data scarcity and complex contact-rich dynamic motion.To address these challenges, we propose ContactGaussian-WM, a differentiable physics-grounded rigid-body world model capable of learning intricate physical laws directly from sparse and contact-rich video sequences.Our framework consists of two core components: (1) a unified Gaussian representation for both visual appearance and collision geometry, and (2) an end-to-end differentiable learning framework that differentiates through a closed-form physics engine to infer physical properties from sparse visual observations.Extensive simulations and real-world evaluations demonstrate that ContactGaussian-WM outperforms state-of-the-art methods in learning complex scenarios, exhibiting robust generalization capabilities.Furthermore, we showcase the practical utility of our framework in downstream applications, including data synthesis and real-time MPC.
Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles' Perception
Autonomous vehicles (AVs) rely on sensors and deep neural networks (DNNs) to perceive their surrounding environment and make maneuver decisions in real time. However, achieving real-time DNN inference in the AV's perception pipeline is challenging due to the large gap between the computation requirement and the AV's limited resources. Most, if not all, of existing studies focus on optimizing the DNN inference time to achieve faster perception by compressing the DNN model with pruning and quantization. In contrast, we present a Predictable Perception system with DNNs (PP-DNN) that reduce the amount of image data to be processed while maintaining the same level of accuracy for multi-tenant DNNs by dynamically selecting critical frames and regions of interest (ROIs). PP-DNN is based on our key insight that critical frames and ROIs for AVs vary with the AV's surrounding environment. However, it is challenging to identify and use critical frames and ROIs in multi-tenant DNNs for predictable inference. Given image-frame streams, PP-DNN leverages an ROI generator to identify critical frames and ROIs based on the similarities of consecutive frames and traffic scenarios. PP-DNN then leverages a FLOPs predictor to predict multiply-accumulate operations (MACs) from the dynamic critical frames and ROIs. The ROI scheduler coordinates the processing of critical frames and ROIs with multiple DNN models. Finally, we design a detection predictor for the perception of non-critical frames. We have implemented PP-DNN in an ROS-based AV pipeline and evaluated it with the BDD100K and the nuScenes dataset. PP-DNN is observed to significantly enhance perception predictability, increasing the number of fusion frames by up to 7.3x, reducing the fusion delay by >2.6x and fusion-delay variations by >2.3x, improving detection completeness by 75.4% and the cost-effectiveness by up to 98% over the baseline.
comment: 13 pages, 12 figures
Multi-Task Reinforcement Learning of Drone Aerobatics by Exploiting Geometric Symmetries
Flight control for autonomous micro aerial vehicles (MAVs) is evolving from steady flight near equilibrium points toward more aggressive aerobatic maneuvers, such as flips, rolls, and Power Loop. Although reinforcement learning (RL) has shown great potential in these tasks, conventional RL methods often suffer from low data efficiency and limited generalization. This challenge becomes more pronounced in multi-task scenarios where a single policy is required to master multiple maneuvers. In this paper, we propose a novel end-to-end multi-task reinforcement learning framework, called GEAR (Geometric Equivariant Aerobatics Reinforcement), which fully exploits the inherent SO(2) rotational symmetry in MAV dynamics and explicitly incorporates this property into the policy network architecture. By integrating an equivariant actor network, FiLM-based task modulation, and a multi-head critic, GEAR achieves both efficiency and flexibility in learning diverse aerobatic maneuvers, enabling a data-efficient, robust, and unified framework for aerobatic control. GEAR attains a 98.85\% success rate across various aerobatic tasks, significantly outperforming baseline methods. In real-world experiments, GEAR demonstrates stable execution of multiple maneuvers and the capability to combine basic motion primitives to complete complex aerobatics.
Scaling World Model for Hierarchical Manipulation Policies
Vision-Language-Action (VLA) models are promising for generalist robot manipulation but remain brittle in out-of-distribution (OOD) settings, especially with limited real-robot data. To resolve the generalization bottleneck, we introduce a hierarchical Vision-Language-Action framework \our{} that leverages the generalization of large-scale pre-trained world model for robust and generalizable VIsual Subgoal TAsk decomposition VISTA. Our hierarchical framework \our{} consists of a world model as the high-level planner and a VLA as the low-level executor. The high-level world model first divides manipulation tasks into subtask sequences with goal images, and the low-level policy follows the textual and visual guidance to generate action sequences. Compared to raw textual goal specification, these synthesized goal images provide visually and physically grounded details for low-level policies, making it feasible to generalize across unseen objects and novel scenarios. We validate both visual goal synthesis and our hierarchical VLA policies in massive out-of-distribution scenarios, and the performance of the same-structured VLA in novel scenarios could boost from 14% to 69% with the guidance generated by the world model. Results demonstrate that our method outperforms previous baselines with a clear margin, particularly in out-of-distribution scenarios. Project page: \href{https://vista-wm.github.io/}{https://vista-wm.github.io}
RADAR: Benchmarking Vision-Language-Action Generalization via Real-World Dynamics, Spatial-Physical Intelligence, and Autonomous Evaluation
VLA models have achieved remarkable progress in embodied intelligence; however, their evaluation remains largely confined to simulations or highly constrained real-world settings. This mismatch creates a substantial reality gap, where strong benchmark performance often masks poor generalization in diverse physical environments. We identify three systemic shortcomings in current benchmarking practices that hinder fair and reliable model comparison. (1) Existing benchmarks fail to model real-world dynamics, overlooking critical factors such as dynamic object configurations, robot initial states, lighting changes, and sensor noise. (2) Current protocols neglect spatial--physical intelligence, reducing evaluation to rote manipulation tasks that do not probe geometric reasoning. (3) The field lacks scalable fully autonomous evaluation, instead relying on simplistic 2D metrics that miss 3D spatial structure or on human-in-the-loop systems that are costly, biased, and unscalable. To address these limitations, we introduce RADAR (Real-world Autonomous Dynamics And Reasoning), a benchmark designed to systematically evaluate VLA generalization under realistic conditions. RADAR integrates three core components: (1) a principled suite of physical dynamics; (2) dedicated tasks that explicitly test spatial reasoning and physical understanding; and (3) a fully autonomous evaluation pipeline based on 3D metrics, eliminating the need for human supervision. We apply RADAR to audit multiple state-of-the-art VLA models and uncover severe fragility beneath their apparent competence. Performance drops precipitously under modest physical dynamics, with the expectation of 3D IoU declining from 0.261 to 0.068 under sensor noise. Moreover, models exhibit limited spatial reasoning capability. These findings position RADAR as a necessary bench toward reliable and generalizable real-world evaluation of VLA models.
comment: 12 pages, 11 figures, 3 tables
Lie Group Variational Integrator for the Geometrically Exact Rod with Circular Cross-Section Incorporating Cross-Sectional Deformation
In this paper, we derive the continuous space-time equations of motion of a three-dimensional geometrically exact rod, or the Cosserat rod, incorporating planar cross-sectional deformation. We then adopt the Lie group variational integrator technique to obtain a discrete model of the rod incorporating both rotational motion and cross-sectional deformation as well. The resulting discrete model possesses several desirable features: it ensures volume conservation of the discrete elements by considering cross-sectional deformation through a local dilatation factor, it demonstrates the beneficial properties associated with the variational integrator technique, such as the preservation of the rotational configuration, and energy conservation with a bounded error. An exhaustive set of numerical results under various initial conditions of the rod demonstrates the efficacy of the model in replicating the physics of the system.
comment: Submitted to: Computers and Mathematics with Applications
Stability Analysis of Geometric Control for a Canonical Class of Underactuated Aerial Vehicles with Spurious Forces
Standard geometric control relies on force-moment decoupling, an assumption that breaks down in many aerial platforms due to spurious forces naturally induced by control moments. While strategies for such coupled systems have been validated experimentally, a rigorous theoretical certification of their stability is currently missing. This work fills this gap by providing the first formal stability analysis for a generic class of floating rigid bodies subject to spurious forces. We introduce a canonical model and construct a Lyapunov-based proof establishing local exponential stability of the hovering equilibrium. Crucially, the analysis explicitly addresses the structural challenges - specifically the induced non-minimum-phase behavior - that prevent the application of standard cascade arguments.
Developing Neural Network-Based Gaze Control Systems for Social Robots
During multi-party interactions, gaze direction is a key indicator of interest and intent, making it essential for social robots to direct their attention appropriately. Understanding the social context is crucial for robots to engage effectively, predict human intentions, and navigate interactions smoothly. This study aims to develop an empirical motion-time pattern for human gaze behavior in various social situations (e.g., entering, leaving, waving, talking, and pointing) using deep neural networks based on participants' data. We created two video clips-one for a computer screen and another for a virtual reality headset-depicting different social scenarios. Data were collected from 30 participants: 15 using an eye-tracker and 15 using an Oculus Quest 1 headset. Deep learning models, specifically Long Short-Term Memory (LSTM) and Transformers, were used to analyze and predict gaze patterns. Our models achieved 60% accuracy in predicting gaze direction in a 2D animation and 65% accuracy in a 3D animation. Then, the best model was implemented onto the Nao robot; and 36 new participants evaluated its performance. The feedback indicated overall satisfaction, with those experienced in robotics rating the models more favorably.
Towards Learning a Generalizable 3D Scene Representation from 2D Observations
We introduce a Generalizable Neural Radiance Field approach for predicting 3D workspace occupancy from egocentric robot observations. Unlike prior methods operating in camera-centric coordinates, our model constructs occupancy representations in a global workspace frame, making it directly applicable to robotic manipulation. The model integrates flexible source views and generalizes to unseen object arrangements without scene-specific finetuning. We demonstrate the approach on a humanoid robot and evaluate predicted geometry against 3D sensor ground truth. Trained on 40 real scenes, our model achieves 26mm reconstruction error, including occluded regions, validating its ability to infer complete 3D occupancy beyond traditional stereo vision methods.
comment: Paper accepted at ESANN 2026
Design, Development, and Use of Maya Robot as an Assistant for the Therapy/Education of Children with Cancer: a Pilot Study
This study centers around the design and implementation of the Maya Robot, a portable elephant-shaped social robot, intended to engage with children undergoing cancer treatment. Initial efforts were devoted to enhancing the robot's facial expression recognition accuracy, achieving a 98% accuracy through deep neural networks. Two subsequent preliminary exploratory experiments were designed to advance the study's objectives. The first experiment aimed to compare pain levels experienced by children during the injection process, with and without the presence of the Maya robot. Twenty-five children, aged 4 to 9, undergoing cancer treatment participated in this counterbalanced study. The paired T-test results revealed a significant reduction in perceived pain when the robot was actively present in the injection room. The second experiment sought to assess perspectives of hospitalized children and their mothers during engagement with Maya through a game. Forty participants, including 20 children aged 4 to 9 and their mothers, were involved. Post Human-Maya Interactions, UTAUT questionnaire results indicated that children experienced significantly less anxiety than their parents during the interaction and game play. Notably, children exhibited higher trust levels in both the robot and the games, presenting a statistically significant difference in trust levels compared to their parents (P-value < 0.05). This preliminary exploratory study highlights the positive impact of utilizing Maya as an assistant for therapy/education in a clinical setting, particularly benefiting children undergoing cancer treatment. The findings underscore the potential of social robots in pediatric healthcare contexts, emphasizing improved pain management and emotional well-being among young patients.
Safe mobility support system using crowd mapping and avoidance route planning using VLM
Autonomous mobile robots offer promising solutions for labor shortages and increased operational efficiency. However, navigating safely and effectively in dynamic environments, particularly crowded areas, remains challenging. This paper proposes a novel framework that integrates Vision-Language Models (VLM) and Gaussian Process Regression (GPR) to generate dynamic crowd-density maps (``Abstraction Maps'') for autonomous robot navigation. Our approach utilizes VLM's capability to recognize abstract environmental concepts, such as crowd densities, and represents them probabilistically via GPR. Experimental results from real-world trials on a university campus demonstrated that robots successfully generated routes avoiding both static obstacles and dynamic crowds, enhancing navigation safety and adaptability.
Biomimetic Mantaray robot toward the underwater autonomous -- Experimental verification of swimming and diving by flapping motion -
This study presents the development and experimental verification of a biomimetic manta ray robot for underwater autonomous exploration. Inspired by manta rays, the robot uses flapping motion for propulsion to minimize seabed disturbance and enhance efficiency compared to traditional screw propulsion. The robot features pectoral fins driven by servo motors and a streamlined control box to reduce fluid resistance. The control system, powered by a Raspberry Pi 3B, includes an IMU and pressure sensor for real-time monitoring and control. Experiments in a pool assessed the robot's swimming and diving capabilities. Results show stable swimming and diving motions with PD control. The robot is suitable for applications in environments like aquariums and fish nurseries, requiring minimal disturbance and efficient maneuverability. Our findings demonstrate the potential of bio-inspired robotic designs to improve ecological monitoring and underwater exploration.
Semi-Supervised Cross-Domain Imitation Learning
Cross-domain imitation learning (CDIL) accelerates policy learning by transferring expert knowledge across domains, which is valuable in applications where the collection of expert data is costly. Existing methods are either supervised, relying on proxy tasks and explicit alignment, or unsupervised, aligning distributions without paired data, but often unstable. We introduce the Semi-Supervised CDIL (SS-CDIL) setting and propose the first algorithm for SS-CDIL with theoretical justification. Our method uses only offline data, including a small number of target expert demonstrations and some unlabeled imperfect trajectories. To handle domain discrepancy, we propose a novel cross-domain loss function for learning inter-domain state-action mappings and design an adaptive weight function to balance the source and target knowledge. Experiments on MuJoCo and Robosuite show consistent gains over the baselines, demonstrating that our approach achieves stable and data-efficient policy learning with minimal supervision. Our code is available at~ https://github.com/NYCU-RL-Bandits-Lab/CDIL.
comment: Published in Transactions on Machine Learning Research (TMLR)
From Steering to Pedalling: Do Autonomous Driving VLMs Generalize to Cyclist-Assistive Spatial Perception and Planning?
Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist-centric viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models demonstrate encouraging capabilities, while also revealing clear areas for improvement in cyclist-centric perception and reasoning, particularly in interpreting cyclist-specific traffic cues and associating signs with the correct navigational lanes. Notably, several driving-specialized models underperform strong generalist VLMs, indicating limited transfer from vehicle-centric training to cyclist-assistive scenarios. Finally, through systematic error analysis, we identify recurring failure modes to guide the development of more effective cyclist-assistive intelligent systems.
comment: Preprint
From Representational Complementarity to Dual Systems: Synergizing VLM and Vision-Only Backbones for End-to-End Driving
Vision-Language-Action (VLA) driving augments end-to-end (E2E) planning with language-enabled backbones, yet it remains unclear what changes beyond the usual accuracy--cost trade-off. We revisit this question with 3--RQ analysis in RecogDrive by instantiating the system with a full VLM and vision-only backbones, all under an identical diffusion Transformer planner. RQ1: At the backbone level, the VLM can introduce additional subspaces upon the vision-only backbones. RQ2: This unique subspace leads to a different behavioral in some long-tail scenario: the VLM tends to be more aggressive whereas ViT is more conservative, and each decisively wins on about 2--3% of test scenarios; With an oracle that selects, per scenario, the better trajectory between the VLM and ViT branches, we obtain an upper bound of 93.58 PDMS. RQ3: To fully harness this observation, we propose HybridDriveVLA, which runs both ViT and VLM branches and selects between their endpoint trajectories using a learned scorer, improving PDMS to 92.10. Finally, DualDriveVLA implements a practical fast--slow policy: it runs ViT by default and invokes the VLM only when the scorer's confidence falls below a threshold; calling the VLM on 15% of scenarios achieves 91.00 PDMS while improving throughput by 3.2x. Code will be released.
comment: 22 pages (10 pages main text + 12 pages appendix), 18 figures
Say, Dream, and Act: Learning Video World Models for Instruction-Driven Robot Manipulation
Robotic manipulation requires anticipating how the environment evolves in response to actions, yet most existing systems lack this predictive capability, often resulting in errors and inefficiency. While Vision-Language Models (VLMs) provide high-level guidance, they cannot explicitly forecast future states, and existing world models either predict only short horizons or produce spatially inconsistent frames. To address these challenges, we propose a framework for fast and predictive video-conditioned action. Our approach first selects and adapts a robust video generation model to ensure reliable future predictions, then applies adversarial distillation for fast, few-step video generation, and finally trains an action model that leverages both generated videos and real observations to correct spatial errors. Extensive experiments show that our method produces temporally coherent, spatially accurate video predictions that directly support precise manipulation, achieving significant improvements in embodiment consistency, spatial referring ability, and task completion over existing baselines. Codes & Models will be released.
(MGS)$^2$-Net: Unifying Micro-Geometric Scale and Macro-Geometric Structure for Cross-View Geo-Localization
Cross-view geo-localization (CVGL) is pivotal for GNSS-denied UAV navigation but remains brittle under the drastic geometric misalignment between oblique aerial views and orthographic satellite references. Existing methods predominantly operate within a 2D manifold, neglecting the underlying 3D geometry where view-dependent vertical facades (macro-structure) and scale variations (micro-scale) severely corrupt feature alignment. To bridge this gap, we propose (MGS)$^2$, a geometry-grounded framework. The core of our innovation is the Macro-Geometric Structure Filtering (MGSF) module. Unlike pixel-wise matching sensitive to noise, MGSF leverages dilated geometric gradients to physically filter out high-frequency facade artifacts while enhancing the view-invariant horizontal plane, directly addressing the domain shift. To guarantee robust input for this structural filtering, we explicitly incorporate a Micro-Geometric Scale Adaptation (MGSA) module. MGSA utilizes depth priors to dynamically rectify scale discrepancies via multi-branch feature fusion. Furthermore, a Geometric-Appearance Contrastive Distillation (GACD) loss is designed to strictly discriminate against oblique occlusions. Extensive experiments demonstrate that (MGS)$^2$ achieves state-of-the-art performance, recording a Recall@1 of 97.5\% on University-1652 and 97.02\% on SUES-200. Furthermore, the framework exhibits superior cross-dataset generalization against geometric ambiguity. The code is available at: \href{https://github.com/GabrielLi1473/MGS-Net}{https://github.com/GabrielLi1473/MGS-Net}.
Omnidirectional Dual-Arm Aerial Manipulator with Proprioceptive Contact Localization for Landing on Slanted Roofs ICRA
Operating drones in urban environments often means they need to land on rooftops, which can have different geometries and surface irregularities. Accurately detecting roof inclination using conventional sensing methods, such as vision-based or acoustic techniques, can be unreliable, as measurement quality is strongly influenced by external factors including weather conditions and surface materials. To overcome these challenges, we propose a novel unmanned aerial manipulator morphology featuring a dual-arm aerial manipulator with an omnidirectional 3D workspace and extended reach. Building on this design, we develop a proprioceptive contact detection and contact localization strategy based on a momentum-based torque observer. This enables the UAM to infer the inclination of slanted surfaces blindly - through physical interaction - prior to touchdown. We validate the approach in flight experiments, demonstrating robust landings on surfaces with inclinations of up to 30.5 degrees and achieving an average surface inclination estimation error of 2.87 degrees over 9 experiments at different incline angles.
comment: Accepted into 2026 International Conference on Robotics and Automation (ICRA) in Vienna
A Unified Experimental Architecture for Informative Path Planning: from Simulation to Deployment with GuadalPlanner
The evaluation of informative path planning algorithms for autonomous vehicles is often hindered by fragmented execution pipelines and limited transferability between simulation and real-world deployment. This paper introduces a unified architecture that decouples high-level decision-making from vehicle-specific control, enabling algorithms to be evaluated consistently across different abstraction levels without modification. The proposed architecture is realized through GuadalPlanner, which defines standardized interfaces between planning, sensing, and vehicle execution. It is an open and extensible research tool that supports discrete graph-based environments and interchangeable planning strategies, and is built upon widely adopted robotics technologies, including ROS2, MAVLink, and MQTT. Its design allows the same algorithmic logic to be deployed in fully simulated environments, software-in-the-loop configurations, and physical autonomous vehicles using an identical execution pipeline. The approach is validated through a set of experiments, including real-world deployment on an autonomous surface vehicle performing water quality monitoring with real-time sensor feedback.
3D-Printed Anisotropic Soft Magnetic Coating for Directional Rolling of a Magnetically Actuated Capsule Robot
Capsule robots are promising tools for minimally invasive diagnostics and therapy, with applications from gastrointestinal endoscopy to targeted drug delivery and biopsy sampling. Conventional magnetic capsule robots embed bulky permanent magnets at both ends, reducing the usable cavity by about 10-20 mm and limiting integration of functional modules. We propose a compact, 3D-printed soft capsule robot with a magnetic coating that replaces internal magnets, enabling locomotion via a thin, functional shell while preserving the entire interior cavity as a continuous volume for medical payloads. The compliant silicone-magnetic composite also improves swallowability, even with a slightly larger capsule size. Magnetostatic simulations and experiments confirm that programmed NSSN/SNNS pole distributions provide strong anisotropy and reliable torque generation, enabling stable bidirectional rolling, omnidirectional steering, climbing on 7.5 degree inclines, and traversal of 5 mm protrusions. Rolling motion is sustained when the magnetic field at the capsule reaches at least 0.3 mT, corresponding to an effective actuation depth of 30 mm in our setup. Future work will optimize material composition, coating thickness, and magnetic layouts to enhance force output and durability, while next-generation robotic-arm-based field generators with closed-loop feedback will address nonlinearities and expand maneuverability. Together, these advances aim to transition coating-based capsule robots toward reliable clinical deployment and broaden their applications in minimally invasive diagnostics and therapy.
comment: Submitted for review at IEEE/ASME Advanced Intelligenet Mechatronics Conference (2026)
Free-Flying Crew Cooperative Robots on the ISS: A Joint Review of Astrobee, CIMON, and Int-Ball Operations
Intra-vehicular free-flying robots are anticipated to support various work in human spaceflight while working side-by-side with astronauts. Such example of robots includes NASA's Astrobee, DLR's CIMON, and JAXA's Int-Ball, which are deployed on the International Space Station. This paper presents the first joint analyses of these robot's shared experiences, co-authored by their development and operation team members. Despite the different origins and design philosophies, the development and operations of these platforms encountered various convergences. Hence, this paper presents a detailed overview of these robots, presenting their objectives, design, and onboard operations. Hence, joint lessons learned across the lifecycle are presented, from design to on-orbit operations. These lessons learned are anticipated to serve for future development and research as design recommendations.
comment: Author's version of a manuscript accepted at the 2025 International Conference on Space Robotics (iSpaRo25). (c)IEEE
Assessing Vision-Language Models for Perception in Autonomous Underwater Robotic Software
Autonomous Underwater Robots (AURs) operate in challenging underwater environments, including low visibility and harsh water conditions. Such conditions present challenges for software engineers developing perception modules for the AUR software. To successfully carry out these tasks, deep learning has been incorporated into the AUR software to support its operations. However, the unique challenges of underwater environments pose difficulties for deep learning models, which often rely on labeled data that is scarce and noisy. This may undermine the trustworthiness of AUR software that relies on perception modules. Vision-Language Models (VLMs) offer promising solutions for AUR software as they generalize to unseen objects and remain robust in noisy conditions by inferring information from contextual cues. Despite this potential, their performance and uncertainty in underwater environments remain understudied from a software engineering perspective. Motivated by the needs of an industrial partner in assurance and risk management for maritime systems to assess the potential use of VLMs in this context, we present an empirical evaluation of VLM-based perception modules within the AUR software. We assess their ability to detect underwater trash by computing performance, uncertainty, and their relationship, to enable software engineers to select appropriate VLMs for their AUR software.
comment: 10 pages, 5 figures, submitted to ICST 2026
Pitch Angle Control of a Magnetically Actuated Capsule Robot with Nonlinear FEA-based MPC and EKF Multisensory Fusion
Magnetically actuated capsule robots promise minimally invasive diagnosis and therapy in the gastrointestinal (GI) tract, but existing systems largely neglect control of capsule pitch, a degree of freedom critical for contact-rich interaction with inclined gastric walls. This paper presents a nonlinear, model-based framework for magnetic pitch control of an ingestible capsule robot actuated by a four-coil electromagnetic array. Angle-dependent magnetic forces and torques acting on embedded permanent magnets are characterized using three-dimensional finite-element simulations and embedded as lookup tables in a control-oriented rigid-body pitching model with rolling contact and actuator dynamics. A constrained model predictive controller (MPC) is designed to regulate pitch while respecting hardware-imposed current and slew-rate limits. Experiments on a compliant stomach-inspired surface demonstrate robust pitch reorientation from both horizontal and upright configurations, achieving about three to five times faster settling and reduced oscillatory motion than on-off control. Furthermore, an extended Kalman filter (EKF) fusing inertial sensing with intermittent visual measurements enables stable closed-loop control when the camera update rate is reduced from 30 Hz to 1 Hz, emulating clinically realistic imaging constraints. These results establish finite-element-informed MPC with sensor fusion as a scalable strategy for pitch regulation, controlled docking, and future multi-degree-of-freedom capsule locomotion.
comment: This version is submitted for review at IEEE/ASME Transactions on Mechatronics
Flow-Enabled Generalization to Human Demonstrations in Few-Shot Imitation Learning ICRA 2026
Imitation Learning (IL) enables robots to learn complex skills from demonstrations without explicit task modeling, but it typically requires large amounts of demonstrations, creating significant collection costs. Prior work has investigated using flow as an intermediate representation to enable the use of human videos as a substitute, thereby reducing the amount of required robot demonstrations. However, most prior work has focused on the flow, either on the object or on specific points of the robot/hand, which cannot describe the motion of interaction. Meanwhile, relying on flow to achieve generalization to scenarios observed only in human videos remains limited, as flow alone cannot capture precise motion details. Furthermore, conditioning on scene observation to produce precise actions may cause the flow-conditioned policy to overfit to training tasks and weaken the generalization indicated by the flow. To address these gaps, we propose SFCrP, which includes a Scene Flow prediction model for Cross-embodiment learning (SFCr) and a Flow and Cropped point cloud conditioned Policy (FCrP). SFCr learns from both robot and human videos and predicts any point trajectories. FCrP follows the general flow motion and adjusts the action based on observations for precision tasks. Our method outperforms SOTA baselines across various real-world task settings, while also exhibiting strong spatial and instance generalization to scenarios seen only in human videos.
comment: Accepted to ICRA 2026
Morphogenetic Assembly and Adaptive Control for Heterogeneous Modular Robots ICRA 2026
This paper presents a closed-loop automation framework for heterogeneous modular robots, covering the full pipeline from morphological construction to adaptive control. In this framework, a mobile manipulator handles heterogeneous functional modules including structural, joint, and wheeled modules to dynamically assemble diverse robot configurations and provide them with immediate locomotion capability. To address the state-space explosion in large-scale heterogeneous reconfiguration, we propose a hierarchical planner: the high-level planner uses a bidirectional heuristic search with type-penalty terms to generate module-handling sequences, while the low level planner employs A* search to compute optimal execution trajectories. This design effectively decouples discrete configuration planning from continuous motion execution. For adaptive motion generation of unknown assembled configurations, we introduce a GPU accelerated Annealing-Variance Model Predictive Path Integral (MPPI) controller. By incorporating a multi stage variance annealing strategy to balance global exploration and local convergence, the controller enables configuration-agnostic, real-time motion control. Large scale simulations show that the type-penalty term is critical for planning robustness in heterogeneous scenarios. Moreover, the greedy heuristic produces plans with lower physical execution costs than the Hungarian heuristic. The proposed annealing-variance MPPI significantly outperforms standard MPPI in both velocity tracking accuracy and control frequency, achieving real time control at 50 Hz. The framework validates the full-cycle process, including module assembly, robot merging and splitting, and dynamic motion generation.
comment: Accepted by ICRA 2026
LAP: Language-Action Pre-Training Enables Zero-shot Cross-Embodiment Transfer
A long-standing goal in robotics is a generalist policy that can be deployed zero-shot on new robot embodiments without per-embodiment adaptation. Despite large-scale multi-embodiment pre-training, existing Vision-Language-Action models (VLAs) remain tightly coupled to their training embodiments and typically require costly fine-tuning. We introduce Language-Action Pre-training (LAP), a simple recipe that represents low-level robot actions directly in natural language, aligning action supervision with the pre-trained vision-language model's input-output distribution. LAP requires no learned tokenizer, no costly annotation, and no embodiment-specific architectural design. Based on LAP, we present LAP-3B, which to the best of our knowledge is the first VLA to achieve substantial zero-shot transfer to previously unseen robot embodiments without any embodiment-specific fine-tuning. Across multiple novel robots and manipulation tasks, LAP-3B attains over 50% average zero-shot success, delivering roughly a 2x improvement over the strongest prior VLAs. We further show that LAP enables efficient adaptation and favorable scaling, while unifying action prediction and VQA in a shared language-action format that yields additional gains through co-training.
comment: Project website: https://lap-vla.github.io
An Ontology-driven Dynamic Knowledge Base for Uninhabited Ground Vehicles
In this paper, the concept of Dynamic Contextual Mission Data (DCMD) is introduced to develop an ontology-driven dynamic knowledge base for Uninhabited Ground Vehicles (UGVs) at the tactical edge. The dynamic knowledge base with DCMD is added to the UGVs to: support enhanced situation awareness; improve autonomous decision making; and facilitate agility within complex and dynamic environments. As UGVs are heavily reliant on the a priori information added pre-mission, unexpected occurrences during a mission can cause identification ambiguities and require increased levels of user input. Updating this a priori information with contextual information can help UGVs realise their full potential. To address this, the dynamic knowledge base was designed using an ontology-driven representation, supported by near real-time information acquisition and analysis, to provide in-mission on-platform DCMD updates. This was implemented on a team of four UGVs that executed a laboratory based surveillance mission. The results showed that the ontology-driven dynamic representation of the UGV operational environment was machine actionable, producing contextual information to support a successful and timely mission, and contributed directly to the situation awareness.
comment: 10 pages, 11 figures, 2025 Australasian Conference on Robotics and Automation (ACRA 2025)
ReSPEC: A Framework for Online Multispectral Sensor Reconfiguration in Dynamic Environments
Multi-sensor fusion is central to robust robotic perception, yet most existing systems operate under static sensor configurations, collecting all modalities at fixed rates and fidelity regardless of their situational utility. This rigidity wastes bandwidth, computation, and energy, and prevents systems from prioritizing sensors under challenging conditions such as poor lighting or occlusion. Recent advances in reinforcement learning (RL) and modality-aware fusion suggest the potential for adaptive perception, but prior efforts have largely focused on re-weighting features at inference time, ignoring the physical cost of sensor data collection. We introduce a framework that unifies sensing, learning, and actuation into a closed reconfiguration loop. A task-specific detection backbone extracts multispectral features (e.g. RGB, IR, mmWave, depth) and produces quantitative contribution scores for each modality. These scores are passed to an RL agent, which dynamically adjusts sensor configurations, including sampling frequency, resolution, sensing range, and etc., in real time. Less informative sensors are down-sampled or deactivated, while critical sensors are sampled at higher fidelity as environmental conditions evolve. We implement and evaluate this framework on a mobile rover, showing that adaptive control reduces GPU load by 29.3\% with only a 5.3\% accuracy drop compared to a heuristic baseline. These results highlight the potential of resource-aware adaptive sensing for embedded robotic platforms.
comment: 8 pages, 4 figures. This work has been submitted to the IEEE for possible publication
Co-jump: Cooperative Jumping with Quadrupedal Robots via Multi-Agent Reinforcement Learning
While single-agent legged locomotion has witnessed remarkable progress, individual robots remain fundamentally constrained by physical actuation limits. To transcend these boundaries, we introduce Co-jump, a cooperative task where two quadrupedal robots synchronize to execute jumps far beyond their solo capabilities. We tackle the high-impulse contact dynamics of this task under a decentralized setting, achieving synchronization without explicit communication or pre-specified motion primitives. Our framework leverages Multi-Agent Proximal Policy Optimization (MAPPO) enhanced by a progressive curriculum strategy, which effectively overcomes the sparse-reward exploration challenges inherent in mechanically coupled systems. We demonstrate robust performance in simulation and successful transfer to physical hardware, executing multi-directional jumps onto platforms up to 1.5 m in height. Specifically, one of the robots achieves a foot-end elevation of 1.1 m, which represents a 144% improvement over the 0.45 m jump height of a standalone quadrupedal robot, demonstrating superior vertical performance. Notably, this precise coordination is achieved solely through proprioceptive feedback, establishing a foundation for communication-free collaborative locomotion in constrained environments.
comment: 14 pages, 7 figures
Towards Long-Lived Robots: Continual Learning VLA Models via Reinforcement Fine-Tuning
Pretrained on large-scale and diverse datasets, VLA models demonstrate strong generalization and adaptability as general-purpose robotic policies. However, Supervised Fine-Tuning (SFT), which serves as the primary mechanism for adapting VLAs to downstream domains, requires substantial amounts of task-specific data and is prone to catastrophic forgetting. To address these limitations, we propose LifeLong-RFT, a simple yet effective Reinforcement Fine-Tuning (RFT) strategy for VLA models independent of online environmental feedback and pre-trained reward models. By integrating chunking-level on-policy reinforcement learning with the proposed Multi-Dimensional Process Reward (MDPR) mechanism, LifeLong-RFT quantifies the heterogeneous contributions of intermediate action chunks across three dimensions to facilitate policy optimization. Specifically, (1) the Quantized Action Consistency Reward (QACR) ensures accurate action prediction within the discrete action space; (2) the Continuous Trajectory Alignment Reward (CTAR) aligns decoded continuous action chunks with reference trajectories to ensure precise control; (3) the Format Compliance Reward (FCR) guarantees the structural validity of outputs. Comprehensive experiments across SimplerEnv, LIBERO, and real-world tasks demonstrate that LifeLong-RFT exhibits strong performance in multi-task learning. Furthermore, for continual learning on the LIBERO benchmark, our method achieves a 22% gain in average success rate over SFT, while effectively adapting to new tasks using only 20% of the training data. Overall, our method provides a promising post-training paradigm for VLAs.
End-to-End LiDAR optimization for 3D point cloud registration BMVC
LiDAR sensors are a key modality for 3D perception, yet they are typically designed independently of downstream tasks such as point cloud registration. Conventional registration operates on pre-acquired datasets with fixed LiDAR configurations, leading to suboptimal data collection and significant computational overhead for sampling, noise filtering, and parameter tuning. In this work, we propose an adaptive LiDAR sensing framework that dynamically adjusts sensor parameters, jointly optimizing LiDAR acquisition and registration hyperparameters. By integrating registration feedback into the sensing loop, our approach optimally balances point density, noise, and sparsity, improving registration accuracy and efficiency. Evaluations in the CARLA simulation demonstrate that our method outperforms fixed-parameter baselines while retaining generalization abilities, highlighting the potential of adaptive LiDAR for autonomous perception and robotic applications.
comment: 36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025. Project page: https://lvsn.github.io/e2e-lidar-registration/
RadarEye: Robust Liquid Level Tracking Using mmWave Radar in Robotic Pouring ICASSP 2026
Transparent liquid manipulation in robotic pouring remains challenging for perception systems: specular/refraction effects and lighting variability degrade visual cues, undermining reliable level estimation. To address this challenge, we introduce RadarEye, a real-time mmWave radar signal processing pipeline for robust liquid level estimation and tracking during the whole pouring process. RadarEye integrates (i) a high-resolution range-angle beamforming module for liquid level sensing and (ii) a physics-informed mid-pour tracker that suppresses multipath to maintain lock on the liquid surface despite stream-induced clutter and source container reflections. The pipeline delivers sub-millisecond latency. In real-robot water-pouring experiments, RadarEye achieves a 0.35 cm median absolute height error at 0.62 ms per update, substantially outperforming vision and ultrasound baselines.
comment: To appear in IEEE ICASSP 2026
LocoVLM: Grounding Vision and Language for Adapting Versatile Legged Locomotion Policies
Recent advances in legged locomotion learning are still dominated by the utilization of geometric representations of the environment, limiting the robot's capability to respond to higher-level semantics such as human instructions. To address this limitation, we propose a novel approach that integrates high-level commonsense reasoning from foundation models into the process of legged locomotion adaptation. Specifically, our method utilizes a pre-trained large language model to synthesize an instruction-grounded skill database tailored for legged robots. A pre-trained vision-language model is employed to extract high-level environmental semantics and ground them within the skill database, enabling real-time skill advisories for the robot. To facilitate versatile skill control, we train a style-conditioned policy capable of generating diverse and robust locomotion skills with high fidelity to specified styles. To the best of our knowledge, this is the first work to demonstrate real-time adaptation of legged locomotion using high-level reasoning from environmental semantics and instructions with instruction-following accuracy of up to 87% without the need for online query to on-the-cloud foundation models.
comment: Project page: https://locovlm.github.io
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to enhance safety. Based on these constraints, the CMPC constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
Proactive Local-Minima-Free Robot Navigation: Blending Motion Prediction with Safe Control
This work addresses the challenge of safe and efficient mobile robot navigation in complex dynamic environments with concave moving obstacles. Reactive safe controllers like Control Barrier Functions (CBFs) design obstacle avoidance strategies based only on the current states of the obstacles, risking future collisions. To alleviate this problem, we use Gaussian processes to learn barrier functions online from multimodal motion predictions of obstacles generated by neural networks trained with energy-based learning. The learned barrier functions are then fed into quadratic programs using modulated CBFs (MCBFs), a local-minimum-free version of CBFs, to achieve safe and efficient navigation. The proposed framework makes two key contributions. First, it develops a prediction-to-barrier function online learning pipeline. Second, it introduces an autonomous parameter tuning algorithm that adapts MCBFs to deforming, prediction-based barrier functions. The framework is evaluated in both simulations and real-world experiments, consistently outperforming baselines and demonstrating superior safety and efficiency in crowded dynamic environments.
comment: Co-first authors: Yifan Xue and Ze Zhang; Accepted by IEEE RA-L 2026
MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation
Generalist humanoid motion trackers have recently achieved strong simulation metrics by scaling data and training, yet often remain brittle on hardware during sustained teleoperation due to interface- and dynamics-induced errors. We present MOSAIC, an open-source, full-stack system for humanoid motion tracking and whole-body teleoperation across multiple interfaces. MOSAIC first learns a teleoperation-oriented general motion tracker via RL on a multi-source motion bank with adaptive resampling and rewards that emphasize world-frame motion consistency, which is critical for mobile teleoperation. To bridge the sim-to-real interface gap without sacrificing generality, MOSAIC then performs rapid residual adaptation: an interface-specific policy is trained using minimal interface-specific data, and then distilled into the general tracker through an additive residual module, outperforming naive fine-tuning or continual learning. We validate MOSAIC with systematic ablations, out-of-distribution benchmarking, and real-robot experiments demonstrating robust offline motion replay and online long-horizon teleoperation under realistic latency and noise. Project page: baai-humanoid.github.io/MOSAIC.
comment: add project page: training codes and data are open sourced
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP-when executed through the same filter-achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: Accepted for publication in the proceedings of The International Association for Safe & Ethical AI (IASEAI) 2026; 17 pages, 3 figures
PA-MPPI: Perception-Aware Model Predictive Path Integral Control for Quadrotor Navigation in Unknown Environments
Quadrotor navigation in unknown environments is critical for practical missions such as search-and-rescue. Solving this problem requires addressing three key challenges: path planning in non-convex free space due to obstacles, satisfying quadrotor-specific dynamics and objectives, and exploring unknown regions to expand the map. Recently, the Model Predictive Path Integral (MPPI) method has emerged as a promising solution to the first two challenges. By leveraging sampling-based optimization, it can effectively handle non-convex free space while directly optimizing over the full quadrotor dynamics, enabling the inclusion of quadrotor-specific costs such as energy consumption. However, MPPI has been limited to tracking control that optimizes trajectories only within a small neighborhood around a reference trajectory, as it lacks the ability to explore unknown regions and plan alternative paths when blocked by large obstacles. To address this limitation, we introduce Perception-Aware MPPI (PA-MPPI). In this approach, perception-awareness is characterized by planning and adapting the trajectory online based on perception objectives. Specifically, when the goal is occluded, PA-MPPI incorporates a perception cost that biases trajectories toward those that can observe unknown regions. This expands the mapped traversable space and increases the likelihood of finding alternative paths to the goal. Through hardware experiments, we demonstrate that PA-MPPI, running at 50 Hz, performs on par with the state-of-the-art quadrotor navigation planner for unknown environments in challenging test scenarios. Furthermore, we show that PA-MPPI can serve as a safe and robust action policy for navigation foundation models, which often provide goal poses that are not directly reachable.
HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba
End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. A concise six-term reward balances contact quality, swing smoothness, foot placement, posture, and body stability while implicitly promoting energy saving. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.
comment: 12 pages
Towards Safe Path Tracking Using the Simplex Architecture
Robot navigation in complex environments necessitates controllers that prioritize safety while remaining performant and adaptable. Traditional controllers like Regulated Pure Pursuit, Dynamic Window Approach, and Model-Predictive Path Integral, while reliable, struggle to adapt to dynamic conditions. Reinforcement Learning offers adaptability but state-wise safety guarantees remain challenging and often absent in practice. To address this, we propose a path tracking controller leveraging the Simplex architecture. It combines a Reinforcement Learning controller for adaptiveness and performance with a high-assurance controller providing safety and stability. Our main goal is to provide a safe testbed for the design and evaluation of path-planning algorithms, including machine-learning-based planners. Our contribution is twofold. We firstly discuss general stability and safety considerations for designing controllers using the Simplex architecture. Secondly, we present a Simplex-based path tracking controller. Our simulation results, supported by preliminary in-field tests, demonstrate the controller's effectiveness in maintaining safety while achieving comparable performance to state-of-the-art methods.
ZebraPose: Zebra Detection and Pose Estimation using only Synthetic Data WACV 2026
Collecting and labeling large real-world wild animal datasets is impractical, costly, error-prone, and labor-intensive. For animal monitoring tasks, as detection, tracking, and pose estimation, out-of-distribution viewpoints (e.g. aerial) are also typically needed but rarely found in publicly available datasets. To solve this, existing approaches synthesize data with simplistic techniques that then necessitate strategies to bridge the synthetic-to-real gap. Therefore, real images, style constraints, complex animal models, or pre-trained networks are often leveraged. In contrast, we generate a fully synthetic dataset using a 3D photorealistic simulator and demonstrate that it can eliminate such needs for detecting and estimating 2D poses of wild zebras. Moreover, existing top-down 2D pose estimation approaches using synthetic data assume reliable detection models. However, these often fail in out-of-distribution scenarios, e.g. those that include wildlife or aerial imagery. Our method overcomes this by enabling the training of both tasks using the same synthetic dataset. Through extensive benchmarks, we show that models trained from scratch exclusively on our synthetic data generalize well to real images. We perform these using multiple real-world and synthetic datasets, pre-trained and randomly initialized backbones, and different image resolutions. Code, results, models, and data can be found athttps://zebrapose.is.tue.mpg.de/.
comment: 17 pages, 5 tables, 13 figures. Published in WACV 2026
Discrete Variational Autoencoding via Policy Search
Discrete latent bottlenecks in variational autoencoders (VAEs) offer high bit efficiency and can be modeled with autoregressive discrete distributions, enabling parameter-efficient multimodal search with transformers. However, discrete random variables do not allow for exact differentiable parameterization; therefore, discrete VAEs typically rely on approximations, such as Gumbel-Softmax reparameterization or straight-through gradient estimates, or employ high-variance gradient-free methods such as REINFORCE that have had limited success on high-dimensional tasks such as image reconstruction. Inspired by popular techniques in policy search, we propose a training framework for discrete VAEs that leverages the natural gradient of a non-parametric encoder to update the parametric encoder without requiring reparameterization. Our method, combined with automatic step size adaptation and a transformer-based encoder, scales to challenging datasets such as ImageNet and outperforms both approximate reparameterization methods and quantization-based discrete autoencoders in reconstructing high-dimensional data from compact latent spaces.
Crane Lowering Guidance Using a Attachable Camera Module for Driver Vision Support
Cranes have long been essential equipment for lifting and placing heavy loads in construction projects. This study focuses on the lowering phase of crane operation, the stage in which the load is moved to the desired location. During this phase, a constant challenge exists: the load obstructs the operator's view of the landing point. As a result, operators traditionally have to rely on verbal or gestural instructions from ground personnel, which significantly impacts site safety. To alleviate this constraint, the proposed system incorporates a attachable camera module designed to be attached directly to the load via a suction cup. This module houses a single-board computer, battery, and compact camera. After installation, it streams and processes images of the ground directly below the load in real time to generate installation guidance. Simultaneously, this guidance is transmitted to and monitored by a host computer. Preliminary experiments were conducted by attaching this module to a test object, confirming the feasibility of real-time image acquisition and transmission. This approach has the potential to significantly improve safety on construction sites by providing crane operators with an instant visual reference of hidden landing zones.
comment: Published in the Proceedings of ICCR 2025 (IEEE)
Neural-Augmented Kelvinlet for Real-Time Soft Tissue Deformation Modeling
Accurate and efficient modeling of soft-tissue interactions is fundamental for advancing surgical simulation, surgical robotics, and model-based surgical automation. To achieve real-time latency, classical Finite Element Method (FEM) solvers are often replaced with neural approximations; however, naively training such models in a fully data-driven manner without incorporating physical priors frequently leads to poor generalization and physically implausible predictions. We present a novel physics-informed neural simulation framework that enables real-time prediction of soft-tissue deformations under complex single- and multi-grasper interactions. Our approach integrates Kelvinlet-based analytical priors with large-scale FEM data, capturing both linear and nonlinear tissue responses. This hybrid design improves predictive accuracy and physical plausibility across diverse neural architectures while maintaining the low-latency performance required for interactive applications. We validate our method on challenging surgical manipulation tasks involving standard laparoscopic grasping tools, demonstrating substantial improvements in deformation fidelity and temporal stability over existing baselines. These results establish Kelvinlet-augmented learning as a principled and computationally efficient paradigm for real-time, physics-aware soft-tissue simulation in surgical AI.
First Multi-Constellation Observations of Navigation Satellite Signals in the Lunar Domain by Post-Processing L1/L5 IQ Snapshots
The use of Global Navigation Satellite Systems (GNSS) to increase spacecraft autonomy for orbit determination has gained renewed momentum following the Lunar GNSS Receiver Experiment (LuGRE), which demonstrated feasible onboard GPS and Galileo signal reception and tracking at lunar distances. This work processes in-phase and quadrature (IQ) snapshots collected by the LuGRE receiver in cis-lunar space and on the lunar surface to assess multi-frequency, multi-constellation signal availability. Signals from additional systems beyond GPS and Galileo, including RNSS and SBAS constellations, are observable and successfully acquired exclusively in the recorded IQ snapshots. These observations provide the first experimental evidence that signals from multiple constellations, including systems not supported by LuGRE realtime operations, are detectable at unprecedented distances from Earth. Useful observables can be extracted from the IQ snapshots, despite minimal sampling rates, 4-bit quantization, and short durations (200 ms-2 s), through a hybrid coherent/non-coherent acquisition stage compensating for code Doppler. These observations are exploited to tune simulation tools and to perform extended simulation campaigns, showing that the inclusion of additional constellations significantly improves availability; for a 26 dB-Hz acquisition threshold, the fraction of epochs with at least four visible satellites increases from 11% to 46% of the total epoch count. These findings indicate that BeiDou, RNSS, and SBAS signals can substantially enhance GNSS-based autonomy for lunar and cislunar missions.
comment: 13 pages, 9 figures, IEEE Transactions on Aerospace and Electronic Systems
First-order friction models with bristle dynamics: lumped and distributed formulations
Dynamic models, particularly rate-dependent models, have proven effective in capturing the key phenomenological features of frictional processes, whilst also possessing important mathematical properties that facilitate the design of control and estimation algorithms. However, many rate-dependent formulations are built on empirical considerations, whereas physical derivations may offer greater interpretability. In this context, starting from fundamental physical principles, this paper introduces a novel class of first-order dynamic friction models that approximate the dynamics of a bristle element by inverting the friction characteristic. Amongst the developed models, a specific formulation closely resembling the LuGre model is derived using a simple rheological equation for the bristle element. This model is rigorously analyzed in terms of stability and passivity -- important properties that support the synthesis of observers and controllers. Furthermore, a distributed version, formulated as a hyperbolic partial differential equation (PDE), is presented, which enables the modeling of frictional processes commonly encountered in rolling contact phenomena. The tribological behavior of the proposed description is evaluated through classical experiments and validated against the response predicted by the LuGre model, revealing both notable similarities and key differences.
comment: 15 pages, 9 figures. Under review at IEEE Transactions on Control Systems Technology
Lateral tracking control of all-wheel steering vehicles with intelligent tires
The accurate characterization of tire dynamics is critical for advancing control strategies in autonomous road vehicles, as tire behavior significantly influences handling and stability through the generation of forces and moments at the tire-road interface. Smart tire technologies have emerged as a promising tool for sensing key variables such as road friction, tire pressure, and wear states, and for estimating kinematic and dynamic states like vehicle speed and tire forces. However, most existing estimation and control algorithms rely on empirical correlations or machine learning approaches, which require extensive calibration and can be sensitive to variations in operating conditions. In contrast, model-based techniques, which leverage infinite-dimensional representations of tire dynamics using partial differential equations (PDEs), offer a more robust approach. This paper proposes a novel model-based, output-feedback lateral tracking control strategy for all-wheel steering vehicles that integrates distributed tire dynamics with smart tire technologies. The primary contributions include the suppression of micro-shimmy phenomena at low speeds and path-following via force control, achieved through the estimation of tire slip angles, vehicle kinematics, and lateral tire forces. The proposed controller and observer are based on formulations using ODE-PDE systems, representing rigid body dynamics and distributed tire behavior. This work marks the first rigorous control strategy for vehicular systems equipped with distributed tire representations in conjunction with smart tire technologies.
comment: 16 pages, 12 figures. Under review at IEEE Transactions on Intelligent Vehicles
HOGraspFlow: Taxonomy-Aware Hand-Object Retargeting for Multi-Modal SE(3) Grasp Generation ICRA 2026
We propose Hand-Object\emph{(HO)GraspFlow}, an affordance-centric approach that retargets a single RGB with hand-object interaction (HOI) into multi-modal executable parallel jaw grasps without explicit geometric priors on target objects. Building on foundation models for hand reconstruction and vision, we synthesize $SE(3)$ grasp poses with denoising flow matching (FM), conditioned on the following three complementary cues: RGB foundation features as visual semantics, HOI contact reconstruction, and taxonomy-aware prior on grasp types. Our approach demonstrates high fidelity in grasp synthesis without explicit HOI contact input or object geometry, while maintaining strong contact and taxonomy recognition. Another controlled comparison shows that \emph{HOGraspFlow} consistently outperforms diffusion-based variants (\emph{HOGraspDiff}), achieving high distributional fidelity and more stable optimization in $SE(3)$. We demonstrate a reliable, object-agnostic grasp synthesis from human demonstrations in real-world experiments, where an average success rate of over $83\%$ is achieved. Code: https://github.com/YitianShi/HOGraspFlow
comment: Accepted to ICRA 2026
CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents
While current navigation benchmarks prioritize task success in simplified settings, they neglect the multidimensional economic constraints essential for the real-world commercialization of autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive economic cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data - such as SEC filings and AIS injury reports - with Isaac Sim's detailed collision and cargo dynamics, CostNav transcends simple task completion to accurately evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Our evaluation of rule-based Nav2 navigation shows that current approaches are not economically viable: the contribution margin is -22.81/run (AMCL) and -12.87/run (GPS), resulting in no break-even point. We challenge the community to develop navigation policies that achieve economic viability on CostNav. We remain method-agnostic, evaluating success solely on the metric of cost rather than the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.
WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models
While world models have emerged as a cornerstone of embodied intelligence by enabling agents to reason about environmental dynamics through action-conditioned prediction, their evaluation remains fragmented. Current evaluation of embodied world models has largely focused on perceptual fidelity (e.g., video generation quality), overlooking the functional utility of these models in downstream decision-making tasks. In this work, we introduce WorldArena, a unified benchmark designed to systematically evaluate embodied world models across both perceptual and functional dimensions. WorldArena assesses models through three dimensions: video perception quality, measured with 16 metrics across six sub-dimensions; embodied task functionality, which evaluates world models as data engines, policy evaluators, and action planners integrating with subjective human evaluation. Furthermore, we propose EWMScore, a holistic metric integrating multi-dimensional performance into a single interpretable index. Through extensive experiments on 14 representative models, we reveal a significant perception-functionality gap, showing that high visual quality does not necessarily translate into strong embodied task capability. WorldArena benchmark with the public leaderboard is released at https://world-arena.ai, providing a framework for tracking progress toward truly functional world models in embodied AI.
Localized Graph-Based Neural Dynamics Models for Terrain Manipulation
Predictive models can be particularly helpful for robots to effectively manipulate terrains in construction sites and extraterrestrial surfaces. However, terrain state representations become extremely high-dimensional especially to capture fine-resolution details and when depth is unknown or unbounded. This paper introduces a learning-based approach for terrain dynamics modeling and manipulation, leveraging the Graph-based Neural Dynamics (GBND) framework to represent terrain deformation as motion of a graph of particles. Based on the principle that the moving portion of a terrain is usually localized, our approach builds a large terrain graph (potentially millions of particles) but only identifies a very small active subgraph (hundreds of particles) for predicting the outcomes of robot-terrain interaction. To minimize the size of the active subgraph we introduce a learning-based approach that identifies a small region of interest (RoI) based on the robot's control inputs and the current scene. We also introduce a novel domain boundary feature encoding that allows GBNDs to perform accurate dynamics prediction in the RoI interior while avoiding particle penetration through RoI boundaries. Our proposed method is both orders of magnitude faster than naive GBND and it achieves better overall prediction accuracy. We further evaluated our framework on excavation and shaping tasks on terrain with different granularity.
OAT: Ordered Action Tokenization
Autoregressive policies offer a compelling foundation for scalable robot learning by enabling discrete abstraction, token-level reasoning, and flexible inference. However, applying autoregressive modeling to continuous robot actions requires an effective action tokenization scheme. Existing approaches either rely on analytical discretization methods that produce prohibitively long token sequences, or learned latent tokenizers that lack structure, limiting their compatibility with next-token prediction. In this work, we identify three desiderata for action tokenization - high compression, total decodability, and a left-to-right causally ordered token space - and introduce Ordered Action Tokenization (OAT), a learned action tokenizer that satisfies all three. OAT discretizes action chunks into an ordered sequence of tokens using transformer with registers, finite scalar quantization, and ordering-inducing training mechanisms. The resulting token space aligns naturally with autoregressive generation and enables prefix-based detokenization, yielding an anytime trade-off between inference cost and action fidelity. Across more than 20 tasks spanning four simulation benchmarks and real-world settings, autoregressive policies equipped with OAT consistently outperform prior tokenization schemes and diffusion-based baselines, while offering significantly greater flexibility at inference time.
Self-Augmented Robot Trajectory: Efficient Imitation Learning via Safe Self-augmentation with Demonstrator-annotated Precision
Imitation learning is a promising paradigm for training robot agents; however, standard approaches typically require substantial data acquisition -- via numerous demonstrations or random exploration -- to ensure reliable performance. Although exploration reduces human effort, it lacks safety guarantees and often results in frequent collisions -- particularly in clearance-limited tasks (e.g., peg-in-hole) -- thereby, necessitating manual environmental resets and imposing additional human burden. This study proposes Self-Augmented Robot Trajectory (SART), a framework that enables policy learning from a single human demonstration, while safely expanding the dataset through autonomous augmentation. SART consists of two stages: (1) human teaching only once, where a single demonstration is provided and precision boundaries -- represented as spheres around key waypoints -- are annotated, followed by one environment reset; (2) robot self-augmentation, where the robot generates diverse, collision-free trajectories within these boundaries and reconnects to the original demonstration. This design improves the data collection efficiency by minimizing human effort while ensuring safety. Extensive evaluations in simulation and real-world manipulation tasks show that SART achieves substantially higher success rates than policies trained solely on human-collected demonstrations. Video results available at https://sites.google.com/view/sart-il .
comment: 21 pages, 10 figures, Advanced Robotics accepted 2026.02.03
RoboSubtaskNet: Temporal Sub-task Segmentation for Human-to-Robot Skill Transfer in Real-World Environments
Temporally locating and classifying fine-grained sub-task segments in long, untrimmed videos is crucial to safe human-robot collaboration. Unlike generic activity recognition, collaborative manipulation requires sub-task labels that are directly robot-executable. We present RoboSubtaskNet, a multi-stage human-to-robot sub-task segmentation framework that couples attention-enhanced I3D features (RGB plus optical flow) with a modified MS-TCN employing a Fibonacci dilation schedule to capture better short-horizon transitions such as reach-pick-place. The network is trained with a composite objective comprising cross-entropy and temporal regularizers (truncated MSE and a transition-aware term) to reduce over-segmentation and to encourage valid sub-task progressions. To close the gap between vision benchmarks and control, we introduce RoboSubtask, a dataset of healthcare and industrial demonstrations annotated at the sub-task level and designed for deterministic mapping to manipulator primitives. Empirically, RoboSubtaskNet outperforms MS-TCN and MS-TCN++ on GTEA and our RoboSubtask benchmark (boundary-sensitive and sequence metrics), while remaining competitive on the long-horizon Breakfast benchmark. Specifically, RoboSubtaskNet attains F1 @ 50 = 79.5%, Edit = 88.6%, Acc = 78.9% on GTEA; F1 @ 50 = 30.4%, Edit = 52.0%, Acc = 53.5% on Breakfast; and F1 @ 50 = 94.2%, Edit = 95.6%, Acc = 92.2% on RoboSubtask. We further validate the full perception-to-execution pipeline on a 7-DoF Kinova Gen3 manipulator, achieving reliable end-to-end behavior in physical trials (overall task success approx 91.25%). These results demonstrate a practical path from sub-task level video understanding to deployed robotic manipulation in real-world settings.
BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
Equipping embodied agents with the ability to reason about tasks, foresee physical outcomes, and generate precise actions is essential for general-purpose manipulation. While recent Vision-Language-Action (VLA) models have leveraged pre-trained foundation models, they typically focus on either linguistic planning or visual forecasting in isolation. These methods rarely integrate both capabilities simultaneously to guide action generation, leading to suboptimal performance in complex, long-horizon manipulation tasks. To bridge this gap, we propose BagelVLA, a unified model that integrates linguistic planning, visual forecasting, and action generation within a single framework. Initialized from a pretrained unified understanding and generative model, BagelVLA is trained to interleave textual reasoning and visual prediction directly into the action execution loop. To efficiently couple these modalities, we introduce Residual Flow Guidance (RFG), which initializes from current observation and leverages single-step denoising to extract predictive visual features, guiding action generation with minimal latency. Extensive experiments demonstrate that BagelVLA outperforms existing baselines by a significant margin on multiple simulated and real-world benchmarks, particularly in tasks requiring multi-stage reasoning.
Robotic Depowdering for Additive Manufacturing Via Pose Tracking
With the rapid development of powder-based additive manufacturing, depowdering, a process of removing unfused powder that covers 3D-printed parts, has become a major bottleneck to further improve its productiveness. Traditional manual depowdering is extremely time-consuming and costly, and some prior automated systems either require pre-depowdering or lack adaptability to different 3D-printed parts. To solve these problems, we introduce a robotic system that automatically removes unfused powder from the surface of 3D-printed parts. The key component is a visual perception system, which consists of a pose-tracking module that tracks the 6D pose of powder-occluded parts in real-time, and a progress estimation module that estimates the depowdering completion percentage. The tracking module can be run efficiently on a laptop CPU at up to 60 FPS. Experiments show that our depowdering system can remove unfused powder from the surface of various 3D-printed parts without causing any damage. To the best of our knowledge, this is one of the first vision-based robotic depowdering systems that adapt to parts with various shapes without the need for pre-depowdering.
comment: Github link: https://github.com/zhenweil/Robotic-Depowdering-for-Additive-Manufacturing-Via-Pose-Tracking Video link: https://www.youtube.com/watch?v=AUIkyULAhqM
Multiagent Systems
Learning to Compose for Cross-domain Agentic Workflow Generation
Automatically generating agentic workflows -- executable operator graphs or codes that orchestrate reasoning, verification, and repair -- has become a practical way to solve complex tasks beyond what single-pass LLM generation can reliably handle. Yet what constitutes a good workflow depends heavily on the task distribution and the available operators. Under domain shift, current systems typically rely on iterative workflow refinement to discover a feasible workflow from a large workflow space, incurring high iteration costs and yielding unstable, domain-specific behavior. In response, we internalize a decompose-recompose-decide mechanism into an open-source LLM for cross-domain workflow generation. To decompose, we learn a compact set of reusable workflow capabilities across diverse domains. To recompose, we map each input task to a sparse composition over these bases to generate a task-specific workflow in a single pass. To decide, we attribute the success or failure of workflow generation to counterfactual contributions from learned capabilities, thereby capturing which capabilities actually drive success by their marginal effects. Across stringent multi-domain, cross-domain, and unseen-domain evaluations, our 1-pass generator surpasses SOTA refinement baselines that consume 20 iterations, while substantially reducing generation latency and cost.
Generalized Langevin Models of Linear Agent-Based Systems: Strategic Influence Through Environmental Coupling
Agent-based models typically treat systems in isolation, discarding environmental coupling as either computationally prohibitive or dynamically irrelevant. We demonstrate that this neglect misses essential physics: environmental degrees of freedom create memory effects that fundamentally alter system dynamics. By systematically transforming linear update rules into exact generalized Langevin equations, we show that unobserved environmental agents manifest as memory kernels whose timescales and coupling strengths are determined by the environmental interaction spectrum. Network topology shapes this memory structure in distinct ways: small-world rewiring drives dynamics toward a single dominant relaxation mode, while fragmented environments sustain multiple persistent modes corresponding to isolated subpopulations. We apply this framework to covert influence operations where adversaries manipulate target populations exclusively via environmental intermediaries. The steady-state response admits a random-walk interpretation through hitting probabilities, revealing how zealot opinions diffuse through the environment to shift system agent opinions toward the zealot mean - even when zealots never directly contact targets.
comment: 7 pages, 3 figures
The emergence of numerical representations in communicating artificial agents
Human languages provide efficient systems for expressing numerosities, but whether the sheer pressure to communicate is enough for numerical representations to arise in artificial agents, and whether the emergent codes resemble human numerals at all, remains an open question. We study two neural network-based agents that must communicate numerosities in a referential game using either discrete tokens or continuous sketches, thus exploring both symbolic and iconic representations. Without any pre-defined numeric concepts, the agents achieve high in-distribution communication accuracy in both communication channels and converge on high-precision symbol-meaning mappings. However, the emergent code is non-compositional: the agents fail to derive systematic messages for unseen numerosities, typically reusing the symbol of the highest trained numerosity (discrete), or collapsing extrapolated values onto a single sketch (continuous). We conclude that the communication pressure alone suffices for precise transmission of learned numerosities, but additional pressures are needed to yield compositional codes and generalisation abilities.
comment: In the Sixteenth International Conference on the Evolution of Language
Towards Probabilistic Strategic Timed CTL
We define PSTCTL, a probabilistic variant of Strategic Timed CTL (STCTL), interpreted over stochastic multi-agent systems with continuous time and asynchronous execution semantics. STCTL extends TCTL with strategic operators in the style of ATL. Moreover, we demonstrate the feasibility of verification with irP-strategies.
IMITATOR4AMAS: Strategy Synthesis for STCTL
IMITATOR4AMAS supports model checking and synthesis of memoryless imperfect information strategies for STCTL, interpreted over networks of parametric timed automata with asynchronous execution. While extending the verifier IMITATOR, IMITATOR4AMAS is the first tool for strategy synthesis in this setting. Our experimental results show a substantial speedup over previous approaches.
Beyond Task Performance: A Metric-Based Analysis of Sequential Cooperation in Heterogeneous Multi-Agent Destructive Foraging
This work addresses the problem of analyzing cooperation in heterogeneous multi-agent systems which operate under partial observability and temporal role dependency, framed within a destructive multi-agent foraging setting. Unlike most previous studies, which focus primarily on algorithmic performance with respect to task completion, this article proposes a systematic set of general-purpose cooperation metrics aimed at characterizing not only efficiency, but also coordination and dependency between teams and agents, fairness, and sensitivity. These metrics are designed to be transferable to different multi-agent sequential domains similar to foraging. The proposed suite of metrics is structured into three main categories that jointly provide a multilevel characterization of cooperation: primary metrics, inter-team metrics, and intra-team metrics. They have been validated in a realistic destructive foraging scenario inspired by dynamic aquatic surface cleaning using heterogeneous autonomous vehicles. It involves two specialized teams with sequential dependencies: one focused on the search of resources, and another on their destruction. Several representative approaches have been evaluated, covering both learning-based algorithms and classical heuristic paradigms.
An Ontology-driven Dynamic Knowledge Base for Uninhabited Ground Vehicles
In this paper, the concept of Dynamic Contextual Mission Data (DCMD) is introduced to develop an ontology-driven dynamic knowledge base for Uninhabited Ground Vehicles (UGVs) at the tactical edge. The dynamic knowledge base with DCMD is added to the UGVs to: support enhanced situation awareness; improve autonomous decision making; and facilitate agility within complex and dynamic environments. As UGVs are heavily reliant on the a priori information added pre-mission, unexpected occurrences during a mission can cause identification ambiguities and require increased levels of user input. Updating this a priori information with contextual information can help UGVs realise their full potential. To address this, the dynamic knowledge base was designed using an ontology-driven representation, supported by near real-time information acquisition and analysis, to provide in-mission on-platform DCMD updates. This was implemented on a team of four UGVs that executed a laboratory based surveillance mission. The results showed that the ontology-driven dynamic representation of the UGV operational environment was machine actionable, producing contextual information to support a successful and timely mission, and contributed directly to the situation awareness.
comment: 10 pages, 11 figures, 2025 Australasian Conference on Robotics and Automation (ACRA 2025)
Protecting Context and Prompts: Deterministic Security for Non-Deterministic AI
Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot prevent. We introduce two novel primitives--authenticated prompts and authenticated context--that provide cryptographically verifiable provenance across LLM workflows. Authenticated prompts enable self-contained lineage verification, while authenticated context uses tamper-evident hash chains to ensure integrity of dynamic inputs. Building on these primitives, we formalize a policy algebra with four proven theorems providing protocol-level Byzantine resistance--even adversarial agents cannot violate organizational policies. Five complementary defenses--from lightweight resource controls to LLM-based semantic validation--deliver layered, preventative security with formal guarantees. Evaluation against representative attacks spanning 6 exhaustive categories achieves 100% detection with zero false positives and nominal overhead. We demonstrate the first approach combining cryptographically enforced prompt lineage, tamper-evident context, and provable policy reasoning--shifting LLM security from reactive detection to preventative guarantees.
Authenticated Workflows: A Systems Approach to Protecting Agentic AI
Agentic AI systems automate enterprise workflows but existing defenses--guardrails, semantic filters--are probabilistic and routinely bypassed. We introduce authenticated workflows, the first complete trust layer for enterprise agentic AI. Security reduces to protecting four fundamental boundaries: prompts, tools, data, and context. We enforce intent (operations satisfy organizational policies) and integrity (operations are cryptographically authentic) at every boundary crossing, combining cryptographic elimination of attack classes with runtime policy enforcement. This delivers deterministic security--operations either carry valid cryptographic proof or are rejected. We introduce MAPL, an AI-native policy language that expresses agentic constraints dynamically as agents evolve and invocation context changes, scaling as O(log M + N) policies versus O(M x N) rules through hierarchical composition with cryptographic attestations for workflow dependencies. We prove practicality through a universal security runtime integrating nine leading frameworks (MCP, A2A, OpenAI, Claude, LangChain, CrewAI, AutoGen, LlamaIndex, Haystack) through thin adapters requiring zero protocol modifications. Formal proofs establish completeness and soundness. Empirical validation shows 100% recall with zero false positives across 174 test cases, protection against 9 of 10 OWASP Top 10 risks, and complete mitigation of two high impact production CVEs.
AIvilization v0: Toward Large-Scale Artificial Social Simulation with a Unified Agent Architecture and Adaptive Agent Profiles
AIvilization v0 is a publicly deployed large-scale artificial society that couples a resource-constrained sandbox economy with a unified LLM-agent architecture, aiming to sustain long-horizon autonomy while remaining executable under rapidly changing environment. To mitigate the tension between goal stability and reactive correctness, we introduce (i) a hierarchical branch-thinking planner that decomposes life goals into parallel objective branches and uses simulation-guided validation plus tiered re-planning to ensure feasibility; (ii) an adaptive agent profile with dual-process memory that separates short-term execution traces from long-term semantic consolidation, enabling persistent yet evolving identity; and (iii) a human-in-the-loop steering interface that injects long-horizon objectives and short commands at appropriate abstraction levels, with effects propagated through memory rather than brittle prompt overrides. The environment integrates physiological survival costs, non-substitutable multi-tier production, an AMM-based price mechanism, and a gated education-occupation system. Using high-frequency transactions from the platforms mature phase, we find stable markets that reproduce key stylized facts (heavy-tailed returns and volatility clustering) and produce structured wealth stratification driven by education and access constraints. Ablations show simplified planners can match performance on narrow tasks, while the full architecture is more robust under multi-objective, long-horizon settings, supporting delayed investment and sustained exploration.
On Qualitative Preference in Alternating-time Temporal Logic with Strategy Contexts
We show how to add and eliminate binary preference on plays in Alternating-time Temporal Logic (ATL) with strategy contexts on Concurrent Game Models (CGMs) by means of a translation which preserves satisfaction in models where preference-indiscernibility between plays is an equivalence relation of finite index. The elimination technique also works for a companion second-order path quantifier, which makes quantified path variables range over sets of plays that are closed under preference-indiscernibility. We argue that the preference operator and the specialized quantifier facilitate formulating interesting solution concepts such as Nash equilibrium and secure equilibrium in a straightforward way. We also present a novel translation from ATL with strategy contexts to Quantified Computation Tree Logic (QCTL). Together with the translation which eliminates preference and the specialized form of quantification, this translation allows reasoning about infinite multiplayer synchronous games on CGMs to be translated from the proposed extension of ATL with strategy contexts into QCTL. The setting is related to that of ordered objectives in the works of Bouyer, Brenguier, Markey and Ummels, except that our focus is on the use of the temporal logic languages mentioned above, and we rely on translations into QCTL for the algorithmic solutions.
comment: 30 pages
LLM-Mediated Guidance of MARL Systems
In complex multi-agent environments, achieving efficient learning and desirable behaviours is a significant challenge for Multi-Agent Reinforcement Learning (MARL) systems. This work explores the potential of combining MARL with Large Language Model (LLM)-mediated interventions to guide agents toward more desirable behaviours. Specifically, we investigate how LLMs can be used to interpret and facilitate interventions that shape the learning trajectories of multiple agents. We experimented with two types of interventions, referred to as controllers: a Natural Language (NL) Controller and a Rule-Based (RB) Controller. The RB Controller showed a stronger impact than the NL Controller, which uses a small (7B/8B) LLM to simulate human-like interventions. Our findings indicate that agents particularly benefit from early interventions, leading to more efficient training and higher performance. Both intervention types outperform the baseline without interventions, highlighting the potential of LLM-mediated guidance to accelerate training and enhance MARL performance in challenging environments.
Games with Payments between Learning Agents
In repeated games, such as auctions, players rely on autonomous learning agents to choose their actions. We study settings in which players have their agents make monetary transfers to other agents during play at their own expense, in order to influence learning dynamics in their favor. Our goal is to understand when players have incentives to use such payments, how payments between agents affect learning outcomes, and what the resulting implications are for welfare and its distribution. We propose a simple game-theoretic model to capture the incentive structure of such scenarios. We find that, quite generally, abstaining from payments is not robust to strategic deviations by users of learning agents: self-interested players benefit from having their agents make payments to other learners. In a broad class of games, such endogenous payments between learning agents lead to higher welfare for all players. In first- and second-price auctions, equilibria of the induced "payment-policy game" lead to highly collusive learning outcomes, with low or vanishing revenue for the auctioneer. These results highlight a fundamental challenge for mechanism design, as well as for regulatory policies, in environments where learning agents may interact in the digital ecosystem beyond a mechanism's boundaries.
AI Driven Discovery of Bio Ecological Mediation in Cascading Heatwave Risks
Compound heatwaves increasingly trigger complex cascading failures that propagate through interconnected physical and human systems, yet the fragmentation of disciplinary knowledge hinders the comprehensive mapping of these systemic risk topologies. This study introduces the Heatwave Discovery Agent HeDA as an autonomous scientific synthesis framework designed to bridge cognitive gaps by constructing a high fidelity knowledge graph from 8,111 academic publications. By structuring 70,297 evidence nodes, the system exhibits enhanced inferential fidelity in capturing long tail risk mechanisms and achieves a significant accuracy margin compared to standard foundation models including GPT 5.2 and Claude Sonnet 4.5 in complex reasoning tasks. The resulting topological analysis reveals a critical bio ecological mediation effect where biological systems function as the primary non linear amplifiers of thermal stress that transform physical meteorological hazards into systemic socioeconomic losses. We further identify latent functional couplings between theoretically distinct sectors such as the heat induced synchronization of power grid failures and emergency medical capacity saturation. These findings elucidate the dynamics of compound climate risks and provide an empirical basis for shifting adaptation strategies from static sectoral defense to dynamic cross system resilience.
Learning to Coordinate via Quantum Entanglement in Multi-Agent Reinforcement Learning
The inability to communicate poses a major challenge to coordination in multi-agent reinforcement learning (MARL). Prior work has explored correlating local policies via shared randomness, sometimes in the form of a correlation device, as a mechanism to assist in decentralized decision-making. In contrast, this work introduces the first framework for training MARL agents to exploit shared quantum entanglement as a coordination resource, which permits a larger class of communication-free correlated policies than shared randomness alone. This is motivated by well-known results in quantum physics which posit that, for certain single-round cooperative games with no communication, shared quantum entanglement enables strategies that outperform those that only use shared randomness. In such cases, we say that there is quantum advantage. Our framework is based on a novel differentiable policy parameterization that enables optimization over quantum measurements, together with a novel policy architecture that decomposes joint policies into a quantum coordinator and decentralized local actors. To illustrate the effectiveness of our proposed method, we first show that we can learn, purely from experience, strategies that attain quantum advantage in single-round games that are treated as black box oracles. We then demonstrate how our machinery can learn policies with quantum advantage in an illustrative multi-agent sequential decision-making problem formulated as a decentralized partially observable Markov decision process (Dec-POMDP).
Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI
Grassroots Logic Programs (GLP) is a concurrent logic programming language with variables partitioned into paired \emph{readers} and \emph{writers}, conjuring both linear logic and futures/promises: an assignment is produced at most once via the sole occurrence of a writer (promise) and consumed at most once via the sole occurrence of its paired reader (future), and may contain additional readers and/or writers, enabling the concise expression of rich multidirectional communication modalities. GLP was designed as a language for grassroots platforms -- distributed systems with multiple instances that can operate independently of each other and of any global resource, and can coalesce into ever larger instances -- with its target architecture being smartphones communicating peer-to-peer. The operational semantics of Concurrent (single-agent) GLP and of multiagent GLP (maGLP) were defined via transition systems/multiagent transition systems, respectively. Here, we describe the mathematics developed to facilitate the workstation- and smartphone-based implementations of GLP by AI in Dart. We developed dGLP -- implementation-ready deterministic operational semantics for single-agent GLP -- and proved it correct with respect to the Concurrent GLP operational semantics; dGLP was used by AI as a formal spec, from which it developed a workstation-based implementation of GLP. We developed madGLP -- an implementation-ready multiagent operational semantics for maGLP -- and proved it correct with respect to the maGLP operational semantics; madGLP is deterministic at the agent level (not at the system level due to communication asynchrony), and is being used by AI as a formal spec from which it develops a smartphone-based implementation of maGLP.
Code2MCP: Transforming Code Repositories into MCP Services
The Model Context Protocol (MCP) aims to create a standard for how Large Language Models use tools. However, most current research focuses on selecting tools from an existing pool. A more fundamental, yet largely overlooked, problem is how to populate this pool by converting the vast number of existing software projects into MCP-compatible services. To bridge this gap, we introduce Code2MCP, an agent-based framework that automatically transforms a GitHub repository into a functional MCP service with minimal human intervention. Code2MCP employs a multi-agent workflow for code analysis, environment setup, tool function design, and service generation, enhanced by a self-correcting loop to ensure reliability. We demonstrate that Code2MCP successfully transforms open-source computing libraries in scientific fields such as bioinformatics, mathematics, and fluid dynamics that are not available in existing MCP servers. By providing a novel automated pathway to unlock GitHub, the world's largest code repository, for the MCP ecosystem, Code2MCP serves as a catalyst to significantly accelerate the protocol's adoption and practical application. The code is public at https://github.com/DEFENSE-SEU/Code2MCP.
LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis
Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simultaneously provide realistic patient simulation, clinician-verified diagnostic labels, and support for dynamic multi-turn consultation. We present LingxiDiagBench, a large-scale multi-agent benchmark that evaluates LLMs on both static diagnostic inference and dynamic multi-turn psychiatric consultation in Chinese. At its core is LingxiDiag-16K, a dataset of 16,000 EMR-aligned synthetic consultation dialogues designed to reproduce real clinical demographic and diagnostic distributions across 12 ICD-10 psychiatric categories. Through extensive experiments across state-of-the-art LLMs, we establish key findings: (1) although LLMs achieve high accuracy on binary depression--anxiety classification (up to 92.3%), performance deteriorates substantially for depression--anxiety comorbidity recognition (43.0%) and 12-way differential diagnosis (28.5%); (2) dynamic consultation often underperforms static evaluation, indicating that ineffective information-gathering strategies significantly impair downstream diagnostic reasoning; (3) consultation quality assessed by LLM-as-a-Judge shows only moderate correlation with diagnostic accuracy, suggesting that well-structured questioning alone does not ensure correct diagnostic decisions. We release LingxiDiag-16K and the full evaluation framework to support reproducible research at https://github.com/Lingxi-mental-health/LingxiDiagBench.
Systems and Control (EESS)
Multi-UAV Trajectory Optimization for Bearing-Only Localization in GPS Denied Environments
Accurate localization of maritime targets by unmanned aerial vehicles (UAVs) remains challenging in GPS-denied environments. UAVs equipped with gimballed electro-optical sensors are typically used to localize targets, however, reliance on these sensors increases mechanical complexity, cost, and susceptibility to single-point failures, limiting scalability and robustness in multi-UAV operations. This work presents a new trajectory optimization framework that enables cooperative target localization using UAVs with fixed, non-gimballed cameras operating in coordination with a surface vessel. This estimation-aware optimization generates dynamically feasible trajectories that explicitly account for mission constraints, platform dynamics, and out-of-frame events. Estimation-aware trajectories outperform heuristic paths by reducing localization error by more than a factor of two, motivating their use in cooperative operations. Results further demonstrate that coordinated UAVs with fixed, non-gimballed cameras achieve localization accuracy that meets or exceeds that of single gimballed systems, while substantially lowering system complexity and cost, enabling scalability, and enhancing mission resilience.
comment: 38 pages, 7 figure, and 6 tables
Credit-Based vs. Discount-Based Congestion Pricing: A Comparison Study
Credit-based congestion pricing (CBCP) and discount-based congestion pricing (DBCP), which respectively allot travel credits and toll discounts to subsidize low-income users' access to tolled roads, have emerged as promising policies for alleviating the societal inequity concerns of congestion pricing. However, since real-world implementations of CBCP and DBCP are nascent, their relative merits remain unclear. In this work, we compare the efficacy of deploying CBCP and DBCP in reducing user costs and increasing toll revenues. We first formulate a non-atomic congestion game in which low-income users receive a travel credit or toll discount for accessing tolled lanes. We establish that, in our formulation, Nash equilibrium flows always exist and can be computed or well approximated via convex programming. Our main result establishes a set of practically relevant conditions under which DBCP provably outperforms CBCP in inducing equilibrium outcomes that minimize a given societal cost, which encodes user cost reduction and toll revenue maximization. Finally, we validate our theoretical contributions via a case study of the 101 Express Lanes Project, a CBCP program implemented in the San Francisco Bay Area.
Interpretable Attention-Based Multi-Agent PPO for Latency Spike Resolution in 6G RAN Slicing
Sixth-generation (6G) radio access networks (RANs) must enforce strict service-level agreements (SLAs) for heterogeneous slices, yet sudden latency spikes remain difficult to diagnose and resolve with conventional deep reinforcement learning (DRL) or explainable RL (XRL). We propose \emph{Attention-Enhanced Multi-Agent Proximal Policy Optimization (AE-MAPPO)}, which integrates six specialized attention mechanisms into multi-agent slice control and surfaces them as zero-cost, faithful explanations. The framework operates across O-RAN timescales with a three-phase strategy: predictive, reactive, and inter-slice optimization. A URLLC case study shows AE-MAPPO resolves a latency spike in $18$ms, restores latency to $0.98$ms with $99.9999\%$ reliability, and reduces troubleshooting time by $93\%$ while maintaining eMBB and mMTC continuity. These results confirm AE-MAPPO's ability to combine SLA compliance with inherent interpretability, enabling trustworthy and real-time automation for 6G RAN slicing.
comment: This work has been accepted to appear in the IEEE International Conference on Communications (ICC)
Deep Neural Network-Enhanced Frequency-Constrained Optimal Power Flow with Multi-Governor Dynamics
To ensure frequency security in power systems, both the rate of change of frequency (RoCoF) and the frequency nadir (FN) must be explicitly accounted for in real-time frequency-constrained optimal power flow (FCOPF). However, accurately modeling sys-tem frequency dynamics through analytical formulations is chal-lenging due to their inherent nonlinearity and complexity. To address this issue, deep neural networks (DNNs) are utilized to capture the nonlinear mapping between system operating condi-tions and key frequency performance metrics. In this paper, a DNN-based frequency prediction model is developed and trained using the high-fidelity time-domain simulation data generated in PSCAD/EMTDC. The trained DNN is subsequently transformed into an equivalent mixed-integer linear programming (MILP) form and embedded into the FCOPF problem as additional con-straints to explicitly enforce frequency security, leading to the proposed DNN-FCOPF formulation. For benchmarking, two alternative models are considered: a conventional optimal power flow without frequency constraints and a linearized FCOPF in-corporating system-level RoCoF and FN constraints. The effec-tiveness of the proposed method is demonstrated by comparing the solutions of these three models through extensive PSCAD/EMTDC time-domain simulations under various loading scenarios.
Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles' Perception
Autonomous vehicles (AVs) rely on sensors and deep neural networks (DNNs) to perceive their surrounding environment and make maneuver decisions in real time. However, achieving real-time DNN inference in the AV's perception pipeline is challenging due to the large gap between the computation requirement and the AV's limited resources. Most, if not all, of existing studies focus on optimizing the DNN inference time to achieve faster perception by compressing the DNN model with pruning and quantization. In contrast, we present a Predictable Perception system with DNNs (PP-DNN) that reduce the amount of image data to be processed while maintaining the same level of accuracy for multi-tenant DNNs by dynamically selecting critical frames and regions of interest (ROIs). PP-DNN is based on our key insight that critical frames and ROIs for AVs vary with the AV's surrounding environment. However, it is challenging to identify and use critical frames and ROIs in multi-tenant DNNs for predictable inference. Given image-frame streams, PP-DNN leverages an ROI generator to identify critical frames and ROIs based on the similarities of consecutive frames and traffic scenarios. PP-DNN then leverages a FLOPs predictor to predict multiply-accumulate operations (MACs) from the dynamic critical frames and ROIs. The ROI scheduler coordinates the processing of critical frames and ROIs with multiple DNN models. Finally, we design a detection predictor for the perception of non-critical frames. We have implemented PP-DNN in an ROS-based AV pipeline and evaluated it with the BDD100K and the nuScenes dataset. PP-DNN is observed to significantly enhance perception predictability, increasing the number of fusion frames by up to 7.3x, reducing the fusion delay by >2.6x and fusion-delay variations by >2.3x, improving detection completeness by 75.4% and the cost-effectiveness by up to 98% over the baseline.
comment: 13 pages, 12 figures
Lie Group Variational Integrator for the Geometrically Exact Rod with Circular Cross-Section Incorporating Cross-Sectional Deformation
In this paper, we derive the continuous space-time equations of motion of a three-dimensional geometrically exact rod, or the Cosserat rod, incorporating planar cross-sectional deformation. We then adopt the Lie group variational integrator technique to obtain a discrete model of the rod incorporating both rotational motion and cross-sectional deformation as well. The resulting discrete model possesses several desirable features: it ensures volume conservation of the discrete elements by considering cross-sectional deformation through a local dilatation factor, it demonstrates the beneficial properties associated with the variational integrator technique, such as the preservation of the rotational configuration, and energy conservation with a bounded error. An exhaustive set of numerical results under various initial conditions of the rod demonstrates the efficacy of the model in replicating the physics of the system.
comment: Submitted to: Computers and Mathematics with Applications
Trajectory-based data-driven predictive control and the state-space predictor
We define trajectory predictive control (TPC) as a family of output-feedback indirect data-driven predictive control (DDPC) methods that represent the output trajectory of a discrete-time system as a linear function of the recent input/output history and the planned input trajectory. This paper shows that for different choices of the trajectory predictor, TPC encompasses a wide variety of DDPC methods, including subspace predictive control (SPC), closed-loop SPC, $γ$-DDPC, causal-$γ$-DDPC, transient predictive control, and others. This paper introduces a trajectory predictor that corresponds to a linear state-space model with the recent input/output history as the state. With this state-space predictor, TPC is a special case of linear model predictive control and therefore inherits its mature theory. In numerical experiments, TPC performance approaches the limit of oracle $H_2$-optimal control with perfect knowledge of the underlying system model. For TPC with small training datasets, the state-space predictor outperforms other predictors because it has fewer parameters.
Tuning the burn-in phase in training recurrent neural networks improves their performance ICLR 2026
Training recurrent neural networks (RNNs) with standard backpropagation through time (BPTT) can be challenging, especially in the presence of long input sequences. A practical alternative to reduce computational and memory overhead is to perform BPTT repeatedly over shorter segments of the training data set, corresponding to truncated BPTT. In this paper, we examine the training of RNNs when using such a truncated learning approach for time series tasks. Specifically, we establish theoretical bounds on the accuracy and performance loss when optimizing over subsequences instead of the full data sequence. This reveals that the burn-in phase of the RNN is an important tuning knob in its training, with significant impact on the performance guarantees. We validate our theoretical results through experiments on standard benchmarks from the fields of system identification and time series forecasting. In all experiments, we observe a strong influence of the burn-in phase on the training process, and proper tuning can lead to a reduction of the prediction error on the training and test data of more than 60% in some cases.
comment: Published as a conference paper at ICLR 2026, https://openreview.net/forum?id=jwkdKpioHJ
Anomaly Detection with Machine Learning Algorithms in Large-Scale Power Grids
We apply several machine learning algorithms to the problem of anomaly detection in operational data for large-scale, high-voltage electric power grids. We observe important differences in the performance of the algorithms. Neural networks typically outperform classical algorithms such as k-nearest neighbors and support vector machines, which we explain by the strong contextual nature of the anomalies. We show that unsupervised learning algorithm work remarkably well and that their predictions are robust against simultaneous, concurring anomalies.
comment: 12 pages, 9 figures
Simple generators of rational function fields
Consider a subfield of the field of rational functions in several indeterminates. We present an algorithm that, given a set of generators of such a subfield, finds a simple generating set. We provide an implementation of the algorithm and show that it improves upon the state of the art both in efficiency and the quality of the results. Furthermore, we demonstrate the utility of simplified generators through several case studies from different application domains, such as structural parameter identifiability. The main algorithmic novelties include performing only partial Gröbner basis computation via sparse interpolation and efficient search for polynomials of a fixed degree in a subfield of the rational function field.
Backstepping Control of PDEs on Domains with Graph-Monotone Boundaries
Despite the extensive body of work on backstepping for one-dimensional PDEs, results in higher dimensions remain comparatively limited. Most available methods either exploit particular symmetries of the PDE or address problems posed on parallelepiped domains. To the best of our knowledge, the only approach that enables the design of backstepping controllers on non-parallelepiped regions without symmetry assumptions is the domain extension technique. This method, however, presents several drawbacks. In particular, the control input at each time instant is obtained by simulating a PDE on an extended domain, from which the actual input on the original domain is approximated. By contrast, in the one-dimensional setting, once the time-independent backstepping gain kernel is known, the control input can be computed in closed form as a feedback depending solely on the state at that same instant. Moreover, problems such as output-feedback design or adaptive and robust control do not appear straightforward to address with the domain extension method, at least to the best of our knowledge. These considerations motivate the search, whenever possible, for alternatives that preserve the main advantages of one-dimensional backstepping. A motivating example for the domain extension method is the control of the heat equation on a piano-shaped domain, with actuation applied at the tail of the piano. In this extended abstract, we show through a simple calculation that the domain extension method is not required in this setting. Instead, a strategy akin to that used for parallelepiped domains can be adopted. This result constitutes a first instance of a broader framework for backstepping control of asymmetric PDEs posed on non-parallelepiped regions, which we refer to as domains with graph-monotone boundaries. The general framework is developed in a forthcoming paper.
comment: This extended abstract has been submitted to the 2026 Interna-tional Symposium on Mathematical Theory of Networks and Systems(MTNS 2026), to be held in Waterloo, ON, Canada
Singular Port-Hamiltonian Systems Beyond Passivity
In this paper, we study a class of port-Hamiltonian systems whose vector fields exhibit singularities. A representative example of this class has recently been employed in the power electronics literature to implement a grid-forming controller. We show that, under certain conditions, these port-Hamiltonian systems, when interconnected with passive systems, converge to a prescribed non-equilibrium steady state. At first glance, the apparently passive nature of the port-Hamiltonian system seems incompatible with the active power injection required to sustain this non-equilibrium condition. However, we demonstrate that the discontinuity inherent in the vector field provides the additional energy needed to maintain this operating point, indicating that the system is not globally passive. Moreover, when the discontinuity is replaced by a continuous approximation, the resulting system becomes cyclo-dissipative while still capable of supplying the required power.
Reference Output Tracking in Boolean Control Networks
In this paper, the problem of tracking a given reference output trajectory is investigated for the class of Boolean control networks, by resorting to their algebraic representation. First, the case of a finite-length reference trajectory is addressed, and the analysis and algorithm first proposed in [17] are extended to be able to deal with arbitrary initial conditions and to identify all possible solutions. The approach developed for the finite-length case is then adjusted to cope with periodic reference output trajectories. The results of the paper are illustrated through an example.
Dynamic Interference Management for TN-NTN Coexistence in the Upper Mid-Band
The coexistence of terrestrial networks (TN) and non-terrestrial networks (NTN) in the frequency range 3 (FR3) upper mid-band presents considerable interference concerns, as dense TN deployments can severely degrade NTN downlink performance. Existing studies rely on interference-nulling beamforming, precoding, or exclusion zones that require accurate channel state information (CSI) and static coordination, making them unsuitable for dynamic NTN scenarios. To overcome these limitations, we develop an optimization framework that jointly controls TN downlink power, uplink power, and antenna downtilt to protect NTN links while preserving terrestrial performance. The resultant non-convex coupling between TN and NTN parameters is addressed by a Proximal Policy Optimization (PPO)-based reinforcement learning method that develops adaptive power and tilt control strategies. Simulation results demonstrate a reduction up to 8 dB in the median interference-to-noise ratio (INR) while maintaining over 87% TN basestation activity, outperforming conventional baseline methods and validating the feasibility of the proposed strategy for FR3 coexistence.
comment: This work has been accepted for publication in the IEEE ICC 2026 Conference
Exploring the impact of adaptive rewiring in Graph Neural Networks
This paper explores sparsification methods as a form of regularization in Graph Neural Networks (GNNs) to address high memory usage and computational costs in large-scale graph applications. Using techniques from Network Science and Machine Learning, including Erdős-Rényi for model sparsification, we enhance the efficiency of GNNs for real-world applications. We demonstrate our approach on N-1 contingency assessment in electrical grids, a critical task for ensuring grid reliability. We apply our methods to three datasets of varying sizes, exploring Graph Convolutional Networks (GCN) and Graph Isomorphism Networks (GIN) with different degrees of sparsification and rewiring. Comparison across sparsification levels shows the potential of combining insights from both research fields to improve GNN performance and scalability. Our experiments highlight the importance of tuning sparsity parameters: while sparsity can improve generalization, excessive sparsity may hinder learning of complex patterns. Our adaptive rewiring approach, particularly when combined with early stopping, proves promising by allowing the model to adapt its connectivity structure during training. This research contributes to understanding how sparsity can be effectively leveraged in GNNs for critical applications like power grid reliability analysis.
comment: This work has been submitted to the IEEE for possible publication
Improving CACC Robustness to Parametric Uncertainty via Plant Equivalent Controller Realizations
Cooperative Adaptive Cruise Control (CACC) enables vehicle platooning through inter-vehicle communication, improving traffic efficiency and safety. Conventional CACC relies on feedback linearization, assuming exact vehicle parameters; however, longitudinal vehicle dynamics are nonlinear and subject to parametric uncertainty. Applying feedback linearization with a nominal model yields imperfect cancellation, leading to model mismatch and degraded performance with off-the-shelf CACC controllers. To improve robustness without redesigning the CACC law, we explicitly model the mismatch between the ideal closed-loop dynamics assumed by the CACC design and the actual dynamics under parametric uncertainties. Robustness is formulated as an $\mathcal{L}_2$ trajectory-matching problem, minimizing the energy of this mismatch to make the uncertain system behave as closely as possible to the ideal model. This objective is addressed by optimizing over plant equivalent controller (PEC) realizations that preserve the nominal closed-loop behavior while mitigating the effects of parametric uncertainty. Stability and performance are enforced via linear matrix inequalities, yielding a convex optimization problem applicable to heterogeneous platoons. Experimental results demonstrate improved robustness and performance under parametric uncertainty while preserving nominal CACC behavior.
comment: Submitted to the IEEE International Conference on Intelligent Transportation Systems 2026 -Naples, Italy
Integrating Active Damping with Shaping-Filtered Reset Tracking Control for Piezo-Actuated Nanopositioning
Piezoelectric nanopositioning systems are often limited by lightly damped structural resonances and the gain--phase constraints of linear feedback, which restrict achievable bandwidth and tracking performance. This paper presents a dual-loop architecture that combines an inner-loop non-minimum-phase resonant controller (NRC) for active damping with an outer-loop tracking controller augmented by a constant-gain, lead-in-phase (CgLp) reset element to provide phase lead at the targeted crossover without increasing loop gain. We show that aggressively tuned CgLp designs with larger phase lead can introduce pronounced higher-order harmonics, degrading error sensitivity in specific frequency bands and causing multiple-reset behavior. To address this, a shaping filter is introduced in the reset-trigger path to regulate the reset action and suppress harmonic-induced effects while preserving the desired crossover-phase recovery. The proposed controllers are implemented in real time on an industrial piezo nanopositioner, demonstrating an experimental open-loop crossover increase of approximately 55~Hz and a closed-loop bandwidth improvement of about 34~Hz relative to a well-tuned linear baseline.
Supercharging Packet-level Network Simulation of Large Model Training via Memoization and Fast-Forwarding
Packet-level discrete-event simulation (PLDES) is a prevalent tool for evaluating detailed performance of large model training. Although PLDES offers high fidelity and generality, its slow performance has plagued networking practitioners. Existing optimization techniques either simplify the network model, resulting in large errors; or execute it in parallel using multiple processors, with an upper bound on speedup. This paper explores an alternative optimization direction that reduces the computational loads of PLDES while maintaining high fidelity. Our key insight is that, in distributed LLM training, packet-level traffic behaviors often exhibit repetitive contention patterns and steady-states where flow rates stabilize, ignoring these redundant discrete events speeds up the simulation considerably and the error is negligible. We realize this idea by proposing Wormhole, a user-transparent PLDES kernel capable of automatically memoization for unsteady-states and skipping for steady-states. Wormhole adopts network partitioning, state memoization and reuse, and rate-based steady-state identification to accurately determine the periods of each flow's steady-state, while maintaining simulation consistency after fast-forwarding. Experiments demonstrate that Wormhole can achieve a 744x speedup over the original ns-3 (510x for MoE workload), with a bounded error of <1%. Applying current multithreading parallel techniques and Wormhole together allows a 1012x speedup, reducing the simulation time for one GPT-13B training under 128 GPUs from 9 hours to 5 minutes.
comment: 13 pages body, 21 pages total
Rapid Boundary Stabilization of Two-Dimensional Elastic Plates with In-Domain Aeroelastic Instabilities
Motivated by active wing flutter suppression in high-Mach-number flight, this paper presents a rapid boundary stabilization strategy for a two-dimensional PDE-modeled elastic plate with in-domain instabilities, where the exponential stability is achieved with a decay rate that can be arbitrarily assigned by the users. First, the aeroelastic system is modeled as two-dimensional coupled wave PDEs with internal anti-damping terms, derived by Piston theory and Hamilton's principle. Using Fourier series expansion, the 2-D problem is decomposed into a parameterized family of 1-D systems. For each mode, a full-state boundary feedback controller is designed via PDE backstepping transformation. To enable output-feedback implementation, a state observer is further designed to estimate the distributed states over the two-dimensional spatial domain. Through Lyapunov analysis, the exponential stability of the 2-D elastic plate PDE under the proposed boundary control is established with a designer-tunable decay rate. Numerical simulations verify the effectiveness of the control strategy in suppressing flow-induced vibrations.
The Infrastructure Equation: Water, Energy, and Community Policy for Georgia's Data Center Boom
The rapid growth of data centers driven by cloud computing and artificial intelligence is reshaping infrastructure planning and environmental governance in the United States. Georgia has emerged as a major market for data center development, particularly in the Atlanta metropolitan region, creating economic opportunity alongside significant challenges. Data centers are water-intensive, energy-intensive, and land-intensive infrastructure whose cumulative impacts strain municipal water systems, electric grids, and local land-use frameworks. Unlike single industrial projects, data centers are often proposed in clusters, amplifying community and infrastructure impacts. This report draws on insights from a Georgia-based expert convening to describe the implications of data center growth for water management, energy reliability, ratepayer equity, zoning, and community engagement, identify potential gaps in transparency and regulatory coordination, and present a policy roadmap to help Georgia balance digital infrastructure growth with sustainability, equity, and community protection.
Scale-Free delta-Level Coherent Output Synchronization of Multi-Agent Systems with Adaptive Protocols and Bounded Disturbances
In this paper, we investigate scale-free delta-level coherent output synchronization for multi-agent systems (MAS) operating under bounded disturbances or noises. We introduce an adaptive scale-free framework designed solely based on the knowledge of agent models and completely agnostic to both the communication topology and the size of the network. We define the level of coherency for each agent as the norm of the weighted sum of the disagreement dynamics with its neighbors. We define each agents coherency level as the norm of a weighted sum of its disagreement dynamics relative to its neighbors. The goal is to ensure that the networks coherency level remains below a prescribed threshold delta, without requiring any a priori knowledge of the disturbance.
comment: This paper is now submitted to the "International Journal of Robust and Nonlinear Control."
Resilient Voltage Estimation for Battery Packs Using Self-Learning Koopman Operator
Cloud-based battery management systems (BMSs) rely on real-time voltage measurement data to ensure coordinated bi-directional charging of electric vehicles (EVs) with vehicle-to-grid technology. Unfortunately, an adversary can corrupt the measurement data during transmission from the local-BMS to the cloud-BMS, leading to disrupted EV charging. Therefore, to ensure reliable voltage data under such sensor attacks, this paper proposes a two-stage error-corrected self-learning Koopman operator-based secure voltage estimation scheme for large-format battery packs. The first stage of correction compensates for the Koopman approximation error. The second stage aims to recover the error amassing from the lack of higher-order battery dynamics information in the self-learning feedback, using two alternative methods: an adaptable empirical strategy that uses cell-level knowledge of open circuit voltage to state-of-charge mapping for pack-level estimation, and a Gaussian process regression-based data-driven method that leverages minimal data-training. During our comprehensive case studies using the high-fidelity battery simulation package 'PyBaMM-liionpack', our proposed secure estimator reliably generated real-time voltage estimation with high accuracy under varying pack topologies, charging settings, battery age-levels, and attack policies. Thus, the scalable and adaptable algorithm can be easily employed to diverse battery configurations and operating conditions, without requiring significant modifications, excessive data or sensor redundancy, to ensure optimum charging of EVs under compromised sensing.
comment: 8 figures, 2 tables
Occlusion-Aware Consistent Model Predictive Control for Robot Navigation in Occluded Obstacle-Dense Environments
Ensuring safety and motion consistency for robot navigation in occluded, obstacle-dense environments is a critical challenge. In this context, this study presents an occlusion-aware Consistent Model Predictive Control (CMPC) strategy. To account for the occluded obstacles, it incorporates adjustable risk regions that represent their potential future locations. Subsequently, dynamic risk boundary constraints are developed online to enhance safety. Based on these constraints, the CMPC constructs multiple locally optimal trajectory branches (each tailored to different risk regions) to strike a balance between safety and performance. A shared consensus segment is generated to ensure smooth transitions between branches without significant velocity fluctuations, preserving motion consistency. To facilitate high computational efficiency and ensure coordination across local trajectories, we use the alternating direction method of multipliers (ADMM) to decompose the CMPC into manageable sub-problems for parallel solving. The proposed strategy is validated through simulations and real-world experiments on an Ackermann-steering robot platform. The results demonstrate the effectiveness of the proposed CMPC strategy through comparisons with baseline approaches in occluded, obstacle-dense environments.
Provably Optimal Reinforcement Learning under Safety Filtering
Recent advances in reinforcement learning (RL) enable its use on increasingly complex tasks, but the lack of formal safety guarantees still limits its application in safety-critical settings. A common practical approach is to augment the RL policy with a safety filter that overrides unsafe actions to prevent failures during both training and deployment. However, safety filtering is often perceived as sacrificing performance and hindering the learning process. We show that this perceived safety-performance tradeoff is not inherent and prove, for the first time, that enforcing safety with a sufficiently permissive safety filter does not degrade asymptotic performance. We formalize RL safety with a safety-critical Markov decision process (SC-MDP), which requires categorical, rather than high-probability, avoidance of catastrophic failure states. Additionally, we define an associated filtered MDP in which all actions result in safe effects, thanks to a safety filter that is considered to be a part of the environment. Our main theorem establishes that (i) learning in the filtered MDP is safe categorically, (ii) standard RL convergence carries over to the filtered MDP, and (iii) any policy that is optimal in the filtered MDP-when executed through the same filter-achieves the same asymptotic return as the best safe policy in the SC-MDP, yielding a complete separation between safety enforcement and performance optimization. We validate the theory on Safety Gymnasium with representative tasks and constraints, observing zero violations during training and final performance matching or exceeding unfiltered baselines. Together, these results shed light on a long-standing question in safety-filtered learning and provide a simple, principled recipe for safe RL: train and deploy RL policies with the most permissive safety filter that is available.
comment: Accepted for publication in the proceedings of The International Association for Safe & Ethical AI (IASEAI) 2026; 17 pages, 3 figures
HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba
End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. A concise six-term reward balances contact quality, swing smoothness, foot placement, posture, and body stability while implicitly promoting energy saving. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.
comment: 12 pages
First-order friction models with bristle dynamics: lumped and distributed formulations
Dynamic models, particularly rate-dependent models, have proven effective in capturing the key phenomenological features of frictional processes, whilst also possessing important mathematical properties that facilitate the design of control and estimation algorithms. However, many rate-dependent formulations are built on empirical considerations, whereas physical derivations may offer greater interpretability. In this context, starting from fundamental physical principles, this paper introduces a novel class of first-order dynamic friction models that approximate the dynamics of a bristle element by inverting the friction characteristic. Amongst the developed models, a specific formulation closely resembling the LuGre model is derived using a simple rheological equation for the bristle element. This model is rigorously analyzed in terms of stability and passivity -- important properties that support the synthesis of observers and controllers. Furthermore, a distributed version, formulated as a hyperbolic partial differential equation (PDE), is presented, which enables the modeling of frictional processes commonly encountered in rolling contact phenomena. The tribological behavior of the proposed description is evaluated through classical experiments and validated against the response predicted by the LuGre model, revealing both notable similarities and key differences.
comment: 15 pages, 9 figures. Under review at IEEE Transactions on Control Systems Technology
Lateral tracking control of all-wheel steering vehicles with intelligent tires
The accurate characterization of tire dynamics is critical for advancing control strategies in autonomous road vehicles, as tire behavior significantly influences handling and stability through the generation of forces and moments at the tire-road interface. Smart tire technologies have emerged as a promising tool for sensing key variables such as road friction, tire pressure, and wear states, and for estimating kinematic and dynamic states like vehicle speed and tire forces. However, most existing estimation and control algorithms rely on empirical correlations or machine learning approaches, which require extensive calibration and can be sensitive to variations in operating conditions. In contrast, model-based techniques, which leverage infinite-dimensional representations of tire dynamics using partial differential equations (PDEs), offer a more robust approach. This paper proposes a novel model-based, output-feedback lateral tracking control strategy for all-wheel steering vehicles that integrates distributed tire dynamics with smart tire technologies. The primary contributions include the suppression of micro-shimmy phenomena at low speeds and path-following via force control, achieved through the estimation of tire slip angles, vehicle kinematics, and lateral tire forces. The proposed controller and observer are based on formulations using ODE-PDE systems, representing rigid body dynamics and distributed tire behavior. This work marks the first rigorous control strategy for vehicular systems equipped with distributed tire representations in conjunction with smart tire technologies.
comment: 16 pages, 12 figures. Under review at IEEE Transactions on Intelligent Vehicles
Quantitative Verification with Neural Networks
We present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analysis of discrete-time stochastic dynamical models, to the study of assertion-violation and termination analysis of probabilistic programs. We rely on neural networks to represent supermartingale certificates that yield such probability bounds, which we compute using a counterexample-guided inductive synthesis loop: we train the neural certificate while tightening the probability bound over samples of the state space using stochastic optimisation, and then we formally check the certificate's validity over every possible state using satisfiability modulo theories; if we receive a counterexample, we add it to our set of samples and repeat the loop until validity is confirmed. We demonstrate on a diverse set of benchmarks that, thanks to the expressive power of neural networks, our method yields smaller or comparable probability bounds than existing symbolic methods in all cases, and that our approach succeeds on models that are entirely beyond the reach of such alternative techniques.
comment: The conference version of this manuscript appeared at CONCUR 2023
Online Coreset Selection for Learning Dynamic Systems
With the increasing availability of streaming data in dynamic systems, a critical challenge in data-driven modeling for control is how to efficiently select informative data to characterize system dynamics. In this work, we develop an online coreset selection method for set-membership identification in the presence of process disturbances, improving data efficiency while preserving convergence guarantees. Specifically, we derive a stacked polyhedral representation that over-approximates the feasible parameter set. Based on this representation, we propose a geometric selection criterion that retains a data point only if it induces a sufficient contraction of the feasible set. Theoretically, the feasible-set volume is shown to converge to zero almost surely under persistently exciting data and a tight disturbance bound. When the disturbance bound is mismatched, an explicit Hausdorff-distance bound is derived to quantify the resulting identification error. In addition, an upper bound on the expected coreset size is established and extensions to nonlinear systems with linear-in-the-parameter structures and to bounded measurement noise are discussed. The effectiveness of the proposed method is demonstrated through comprehensive simulation studies.
Analysis of Control Bellman Residual Minimization for Markov Decision Problem
Markov decision problems are most commonly solved via dynamic programming. Another approach is Bellman residual minimization, which directly minimizes the squared Bellman residual objective function. However, compared to dynamic programming, this approach has received relatively less attention, mainly because it is often less efficient in practice and can be more difficult to extend to model-free settings such as reinforcement learning. Nonetheless, Bellman residual minimization has several advantages that make it worth investigating, such as more stable convergence with function approximation for value functions. While Bellman residual methods for policy evaluation have been widely studied, methods for policy optimization (control tasks) have been scarcely explored. In this paper, we establish foundational results for the control Bellman residual minimization for policy optimization.
On Robust Fixed-Time Stabilization of the Cauchy Problem in Hilbert Spaces
This paper presents finite-time and fixed-time stabilization results for inhomogeneous abstract evolution problems, extending existing theories. We prove well-posedness for strong and weak solutions, and estimate upper bounds for settling times for both homogeneous and inhomogeneous systems. We generalize finite-dimensional results to infinite-dimensional systems and demonstrate partial state stabilization with actuation on a subset of the domain. The interest of these results are illustrated through an application of a heat equation with memory term.
comment: The paper has some problems in the proofs
Robotics
SAGE: Scalable Agentic 3D Scene Generation for Embodied AI
Real-world data collection for embodied agents remains costly and unsafe, calling for scalable, realistic, and simulator-ready 3D environments. However, existing scene-generation systems often rely on rule-based or task-specific pipelines, yielding artifacts and physically invalid scenes. We present SAGE, an agentic framework that, given a user-specified embodied task (e.g., "pick up a bowl and place it on the table"), understands the intent and automatically generates simulation-ready environments at scale. The agent couples multiple generators for layout and object composition with critics that evaluate semantic plausibility, visual realism, and physical stability. Through iterative reasoning and adaptive tool selection, it self-refines the scenes until meeting user intent and physical validity. The resulting environments are realistic, diverse, and directly deployable in modern simulators for policy training. Policies trained purely on this data exhibit clear scaling trends and generalize to unseen objects and layouts, demonstrating the promise of simulation-driven scaling for embodied AI. Code, demos, and the SAGE-10k dataset can be found on the project page here: https://nvlabs.github.io/sage.
comment: Project Page: https://nvlabs.github.io/sage
Decoupled MPPI-Based Multi-Arm Motion Planning
Recent advances in sampling-based motion planning algorithms for high DOF arms leverage GPUs to provide SOTA performance. These algorithms can be used to control multiple arms jointly, but this approach scales poorly. To address this, we extend STORM, a sampling-based model-predictive-control (MPC) motion planning algorithm, to handle multiple robots in a distributed fashion. First, we modify STORM to handle dynamic obstacles. Then, we let each arm compute its own motion plan prefix, which it shares with the other arms, which treat it as a dynamic obstacle. Finally, we add a dynamic priority scheme. The new algorithm, MR-STORM, demonstrates clear empirical advantages over SOTA algorithms when operating with both static and dynamic obstacles.
Learning Agile Quadrotor Flight in the Real World
Learning-based controllers have achieved impressive performance in agile quadrotor flight but typically rely on massive training in simulation, necessitating accurate system identification for effective Sim2Real transfer. However, even with precise modeling, fixed policies remain susceptible to out-of-distribution scenarios, ranging from external aerodynamic disturbances to internal hardware degradation. To ensure safety under these evolving uncertainties, such controllers are forced to operate with conservative safety margins, inherently constraining their agility outside of controlled settings. While online adaptation offers a potential remedy, safely exploring physical limits remains a critical bottleneck due to data scarcity and safety risks. To bridge this gap, we propose a self-adaptive framework that eliminates the need for precise system identification or offline Sim2Real transfer. We introduce Adaptive Temporal Scaling (ATS) to actively explore platform physical limits, and employ online residual learning to augment a simple nominal model. {Based on the learned hybrid model, we further propose Real-world Anchored Short-horizon Backpropagation Through Time (RASH-BPTT) to achieve efficient and robust in-flight policy updates. Extensive experiments demonstrate that our quadrotor reliably executes agile maneuvers near actuator saturation limits. The system evolves a conservative base policy with a peak speed of 1.9 m/s to 7.3 m/s within approximately 100 seconds of flight time. These findings underscore that real-world adaptation serves not merely to compensate for modeling errors, but as a practical mechanism for sustained performance improvement in aggressive flight regimes.
ST4VLA: Spatially Guided Training for Vision-Language-Action Models ICLR 2026
Large vision-language models (VLMs) excel at multimodal understanding but fall short when extended to embodied tasks, where instructions must be transformed into low-level motor actions. We introduce ST4VLA, a dual-system Vision-Language-Action framework that leverages Spatial Guided Training to align action learning with spatial priors in VLMs. ST4VLA includes two stages: (i) spatial grounding pre-training, which equips the VLM with transferable priors via scalable point, box, and trajectory prediction from both web-scale and robot-specific data, and (ii) spatially guided action post-training, which encourages the model to produce richer spatial priors to guide action generation via spatial prompting. This design preserves spatial grounding during policy learning and promotes consistent optimization across spatial and action objectives. Empirically, ST4VLA achieves substantial improvements over vanilla VLA, with performance increasing from 66.1 -> 84.6 on Google Robot and from 54.7 -> 73.2 on WidowX Robot, establishing new state-of-the-art results on SimplerEnv. It also demonstrates stronger generalization to unseen objects and paraphrased instructions, as well as robustness to long-horizon perturbations in real-world settings. These results highlight scalable spatially guided training as a promising direction for robust, generalizable robot learning. Source code, data and models are released at https://internrobotics.github.io/internvla-m1.github.io/
comment: Spatially Training for VLA, Accepted by ICLR 2026
EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
Human demonstrations offer rich environmental diversity and scale naturally, making them an appealing alternative to robot teleoperation. While this paradigm has advanced robot-arm manipulation, its potential for the more challenging, data-hungry problem of humanoid loco-manipulation remains largely unexplored. We present EgoHumanoid, the first framework to co-train a vision-language-action policy using abundant egocentric human demonstrations together with a limited amount of robot data, enabling humanoids to perform loco-manipulation across diverse real-world environments. To bridge the embodiment gap between humans and robots, including discrepancies in physical morphology and viewpoint, we introduce a systematic alignment pipeline spanning from hardware design to data processing. A portable system for scalable human data collection is developed, and we establish practical collection protocols to improve transferability. At the core of our human-to-humanoid alignment pipeline lies two key components. The view alignment reduces visual domain discrepancies caused by camera height and perspective variation. The action alignment maps human motions into a unified, kinematically feasible action space for humanoid control. Extensive real-world experiments demonstrate that incorporating robot-free egocentric data significantly outperforms robot-only baselines by 51\%, particularly in unseen environments. Our analysis further reveals which behaviors transfer effectively and the potential for scaling human data.
comment: Project page: https://opendrivelab.com/EgoHumanoid
DexImit: Learning Bimanual Dexterous Manipulation from Monocular Human Videos
Data scarcity fundamentally limits the generalization of bimanual dexterous manipulation, as real-world data collection for dexterous hands is expensive and labor-intensive. Human manipulation videos, as a direct carrier of manipulation knowledge, offer significant potential for scaling up robot learning. However, the substantial embodiment gap between human hands and robotic dexterous hands makes direct pretraining from human videos extremely challenging. To bridge this gap and unleash the potential of large-scale human manipulation video data, we propose DexImit, an automated framework that converts monocular human manipulation videos into physically plausible robot data, without any additional information. DexImit employs a four-stage generation pipeline: (1) reconstructing hand-object interactions from arbitrary viewpoints with near-metric scale; (2) performing subtask decomposition and bimanual scheduling; (3) synthesizing robot trajectories consistent with the demonstrated interactions; (4) comprehensive data augmentation for zero-shot real-world deployment. Building on these designs, DexImit can generate large-scale robot data based on human videos, either from the Internet or video generation models. DexImit is capable of handling diverse manipulation tasks, including tool use (e.g., cutting an apple), long-horizon tasks (e.g., making a beverage), and fine-grained manipulations (e.g., stacking cups).
Robo3R: Enhancing Robotic Manipulation with Accurate Feed-Forward 3D Reconstruction
3D spatial perception is fundamental to generalizable robotic manipulation, yet obtaining reliable, high-quality 3D geometry remains challenging. Depth sensors suffer from noise and material sensitivity, while existing reconstruction models lack the precision and metric consistency required for physical interaction. We introduce Robo3R, a feed-forward, manipulation-ready 3D reconstruction model that predicts accurate, metric-scale scene geometry directly from RGB images and robot states in real time. Robo3R jointly infers scale-invariant local geometry and relative camera poses, which are unified into the scene representation in the canonical robot frame via a learned global similarity transformation. To meet the precision demands of manipulation, Robo3R employs a masked point head for sharp, fine-grained point clouds, and a keypoint-based Perspective-n-Point (PnP) formulation to refine camera extrinsics and global alignment. Trained on Robo3R-4M, a curated large-scale synthetic dataset with four million high-fidelity annotated frames, Robo3R consistently outperforms state-of-the-art reconstruction methods and depth sensors. Across downstream tasks including imitation learning, sim-to-real transfer, grasp synthesis, and collision-free motion planning, we observe consistent gains in performance, suggesting the promise of this alternative 3D sensing module for robotic manipulation.
VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model
Pretraining Vision-Language-Action (VLA) policies on internet-scale video is appealing, yet current latent-action objectives often learn the wrong thing: they remain anchored to pixel variation rather than action-relevant state transitions, making them vulnerable to appearance bias, nuisance motion, and information leakage. We introduce VLA-JEPA, a JEPA-style pretraining framework that sidesteps these pitfalls by design. The key idea is \emph{leakage-free state prediction}: a target encoder produces latent representations from future frames, while the student pathway sees only the current observation -- future information is used solely as supervision targets, never as input. By predicting in latent space rather than pixel space, VLA-JEPA learns dynamics abstractions that are robust to camera motion and irrelevant background changes. This yields a simple two-stage recipe -- JEPA pretraining followed by action-head fine-tuning -- without the multi-stage complexity of prior latent-action pipelines. Experiments on LIBERO, LIBERO-Plus, SimplerEnv and real-world manipulation tasks show that VLA-JEPA achieves consistent gains in generalization and robustness over existing methods.
UniVTAC: A Unified Simulation Platform for Visuo-Tactile Manipulation Data Generation, Learning, and Benchmarking
Robotic manipulation has seen rapid progress with vision-language-action (VLA) policies. However, visuo-tactile perception is critical for contact-rich manipulation, as tasks such as insertion are difficult to complete robustly using vision alone. At the same time, acquiring large-scale and reliable tactile data in the physical world remains costly and challenging, and the lack of a unified evaluation platform further limits policy learning and systematic analysis. To address these challenges, we propose UniVTAC, a simulation-based visuo-tactile data synthesis platform that supports three commonly used visuo-tactile sensors and enables scalable and controllable generation of informative contact interactions. Based on this platform, we introduce the UniVTAC Encoder, a visuo-tactile encoder trained on large-scale simulation-synthesized data with designed supervisory signals, providing tactile-centric visuo-tactile representations for downstream manipulation tasks. In addition, we present the UniVTAC Benchmark, which consists of eight representative visuo-tactile manipulation tasks for evaluating tactile-driven policies. Experimental results show that integrating the UniVTAC Encoder improves average success rates by 17.1% on the UniVTAC Benchmark, while real-world robotic experiments further demonstrate a 25% improvement in task success. Our webpage is available at https://univtac.github.io/.
comment: Website: https://univtac.github.io/
Humanoid Factors: Design Principles for AI Humanoids in Human Worlds
Human factors research has long focused on optimizing environments, tools, and systems to account for human performance. Yet, as humanoid robots begin to share our workplaces, homes, and public spaces, the design challenge expands. We must now consider not only factors for humans but also factors for humanoids, since both will coexist and interact within the same environments. Unlike conventional machines, humanoids introduce expectations of human-like behavior, communication, and social presence, which reshape usability, trust, and safety considerations. In this article, we introduce the concept of humanoid factors as a framework structured around four pillars - physical, cognitive, social, and ethical - that shape the development of humanoids to help them effectively coexist and collaborate with humans. This framework characterizes the overlap and divergence between human capabilities and those of general-purpose humanoids powered by AI foundation models. To demonstrate our framework's practical utility, we then apply the framework to evaluate a real-world humanoid control algorithm, illustrating how conventional task completion metrics in robotics overlook key human cognitive and interaction principles. We thus position humanoid factors as a foundational framework for designing, evaluating, and governing sustained human-humanoid coexistence.
A Collision-Free Sway Damping Model Predictive Controller for Safe and Reactive Forestry Crane Navigation ICRA 2026
Forestry cranes operate in dynamic, unstructured outdoor environments where simultaneous collision avoidance and payload sway control are critical for safe navigation. Existing approaches address these challenges separately, either focusing on sway damping with predefined collision-free paths or performing collision avoidance only at the global planning level. We present the first collision-free, sway-damping model predictive controller (MPC) for a forestry crane that unifies both objectives in a single control framework. Our approach integrates LiDAR-based environment mapping directly into the MPC using online Euclidean distance fields (EDF), enabling real-time environmental adaptation. The controller simultaneously enforces collision constraints while damping payload sway, allowing it to (i) replan upon quasi-static environmental changes, (ii) maintain collision-free operation under disturbances, and (iii) provide safe stopping when no bypass exists. Experimental validation on a real forestry crane demonstrates effective sway damping and successful obstacle avoidance. A video can be found at https://youtu.be/tEXDoeLLTxA.
comment: Accepted at ICRA 2026
Perception with Guarantees: Certified Pose Estimation via Reachability Analysis
Agents in cyber-physical systems are increasingly entrusted with safety-critical tasks. Ensuring safety of these agents often requires localizing the pose for subsequent actions. Pose estimates can, e.g., be obtained from various combinations of lidar sensors, cameras, and external services such as GPS. Crucially, in safety-critical domains, a rough estimate is insufficient to formally determine safety, i.e., guaranteeing safety even in the worst-case scenario, and external services might additionally not be trustworthy. We address this problem by presenting a certified pose estimation in 3D solely from a camera image and a well-known target geometry. This is realized by formally bounding the pose, which is computed by leveraging recent results from reachability analysis and formal neural network verification. Our experiments demonstrate that our approach efficiently and accurately localizes agents in both synthetic and real-world experiments.
RoboSubtaskNet: Temporal Sub-task Segmentation for Human-to-Robot Skill Transfer in Real-World Environments
Temporally locating and classifying fine-grained sub-task segments in long, untrimmed videos is crucial to safe human-robot collaboration. Unlike generic activity recognition, collaborative manipulation requires sub-task labels that are directly robot-executable. We present RoboSubtaskNet, a multi-stage human-to-robot sub-task segmentation framework that couples attention-enhanced I3D features (RGB plus optical flow) with a modified MS-TCN employing a Fibonacci dilation schedule to capture better short-horizon transitions such as reach-pick-place. The network is trained with a composite objective comprising cross-entropy and temporal regularizers (truncated MSE and a transition-aware term) to reduce over-segmentation and to encourage valid sub-task progressions. To close the gap between vision benchmarks and control, we introduce RoboSubtask, a dataset of healthcare and industrial demonstrations annotated at the sub-task level and designed for deterministic mapping to manipulator primitives. Empirically, RoboSubtaskNet outperforms MS-TCN and MS-TCN++ on GTEA and our RoboSubtask benchmark (boundary-sensitive and sequence metrics), while remaining competitive on the long-horizon Breakfast benchmark. Specifically, RoboSubtaskNet attains F1 @ 50 = 79.5%, Edit = 88.6%, Acc = 78.9% on GTEA; F1 @ 50 = 30.4%, Edit = 52.0%, Acc = 53.5% on Breakfast; and F1 @ 50 = 94.2%, Edit = 95.6%, Acc = 92.2% on RoboSubtask. We further validate the full perception-to-execution pipeline on a 7-DoF Kinova Gen3 manipulator, achieving reliable end-to-end behavior in physical trials (overall task success approx 91.25%). These results demonstrate a practical path from sub-task level video understanding to deployed robotic manipulation in real-world settings.
Learning Force-Regulated Manipulation with a Low-Cost Tactile-Force-Controlled Gripper
Successfully manipulating many everyday objects, such as potato chips, requires precise force regulation. Failure to modulate force can lead to task failure or irreversible damage to the objects. Humans can precisely achieve this by adapting force from tactile feedback, even within a short period of physical contact. We aim to give robots this capability. However, commercial grippers exhibit high cost or high minimum force, making them unsuitable for studying force-controlled policy learning with everyday force-sensitive objects. We introduce TF-Gripper, a low-cost (~$150) force-controlled parallel-jaw gripper that integrates tactile sensing as feedback. It has an effective force range of 0.45-45N and is compatible with different robot arms. Additionally, we designed a teleoperation device paired with TF-Gripper to record human-applied grasping forces. While standard low-frequency policies can be trained on this data, they struggle with the reactive, contact-dependent nature of force regulation. To overcome this, we propose RETAF (REactive Tactile Adaptation of Force), a framework that decouples grasping force control from arm pose prediction. RETAF regulates force at high frequency using wrist images and tactile feedback, while a base policy predicts end-effector pose and gripper open/close action. We evaluate TF-Gripper and RETAF across five real-world tasks requiring precise force regulation. Results show that compared to position control, direct force control significantly improves grasp stability and task performance. We further show that tactile feedback is essential for force regulation, and that RETAF consistently outperforms baselines and can be integrated with various base policies. We hope this work opens a path for scaling the learning of force-controlled policies in robotic manipulation. Project page: https://force-gripper.github.io .
comment: 10 pages, 11 figures
A Collaborative Safety Shield for Safe and Efficient CAV Lane Changes in Congested On-Ramp Merging
Lane changing in dense traffic is a significant challenge for Connected and Autonomous Vehicles (CAVs). Existing lane change controllers primarily either ensure safety or collaboratively improve traffic efficiency, but do not consider these conflicting objectives together. To address this, we propose the Multi-Agent Safety Shield (MASS), designed using Control Barrier Functions (CBFs) to enable safe and collaborative lane changes. The MASS enables collaboration by capturing multi-agent interactions among CAVs through interaction topologies constructed as a graph using a simple algorithm. Further, a state-of-the-art Multi-Agent Reinforcement Learning (MARL) lane change controller is extended by integrating MASS to ensure safety and defining a customised reward function to prioritise efficiency improvements. As a result, we propose a lane change controller, known as MARL-MASS, and evaluate it in a congested on-ramp merging simulation. The results demonstrate that MASS enables collaborative lane changes with safety guarantees by strictly respecting the safety constraints. Moreover, the proposed custom reward function improves the stability of MARL policies trained with a safety shield. Overall, by encouraging the exploration of a collaborative lane change policy while respecting safety constraints, MARL-MASS effectively balances the trade-off between ensuring safety and improving traffic efficiency in congested traffic. The code for MARL-MASS is available with an open-source licence at https://github.com/hkbharath/MARL-MASS
comment: Accepted in IEEE IV 2026
Acoustic Drone Package Delivery Detection
In recent years, the illicit use of unmanned aerial vehicles (UAVs) for deliveries in restricted area such as prisons became a significant security challenge. While numerous studies have focused on UAV detection or localization, little attention has been given to delivery events identification. This study presents the first acoustic package delivery detection algorithm using a ground-based microphone array. The proposed method estimates both the drone's propeller speed and the delivery event using solely acoustic features. A deep neural network detects the presence of a drone and estimates the propeller's rotation speed or blade passing frequency (BPF) from a mel spectrogram. The algorithm analyzes the BPFs to identify probable delivery moments based on sudden changes before and after a specific time. Results demonstrate a mean absolute error of the blade passing frequency estimator of 16 Hz when the drone is less than 150 meters away from the microphone array. The drone presence detection estimator has a accuracy of 97%. The delivery detection algorithm correctly identifies 96% of events with a false positive rate of 8%. This study shows that deliveries can be identified using acoustic signals up to a range of 100 meters.
RoboInter: A Holistic Intermediate Representation Suite Towards Robotic Manipulation ICLR 2026
Advances in large vision-language models (VLMs) have stimulated growing interest in vision-language-action (VLA) systems for robot manipulation. However, existing manipulation datasets remain costly to curate, highly embodiment-specific, and insufficient in coverage and diversity, thereby hindering the generalization of VLA models. Recent approaches attempt to mitigate these limitations via a plan-then-execute paradigm, where high-level plans (e.g., subtasks, trace) are first generated and subsequently translated into low-level actions, but they critically rely on extra intermediate supervision, which is largely absent from existing datasets. To bridge this gap, we introduce the RoboInter Manipulation Suite, a unified resource including data, benchmarks, and models of intermediate representations for manipulation. It comprises RoboInter-Tool, a lightweight GUI that enables semi-automatic annotation of diverse representations, and RoboInter-Data, a large-scale dataset containing over 230k episodes across 571 diverse scenes, which provides dense per-frame annotations over more than 10 categories of intermediate representations, substantially exceeding prior work in scale and annotation quality. Building upon this foundation, RoboInter-VQA introduces 9 spatial and 20 temporal embodied VQA categories to systematically benchmark and enhance the embodied reasoning capabilities of VLMs. Meanwhile, RoboInter-VLA offers an integrated plan-then-execute framework, supporting modular and end-to-end VLA variants that bridge high-level planning with low-level execution via intermediate supervision. In total, RoboInter establishes a practical foundation for advancing robust and generalizable robotic learning via fine-grained and diverse intermediate representations.
comment: Published to ICLR 2026, 69 pages, 40 figures
Hydra-Nav: Object Navigation via Adaptive Dual-Process Reasoning
While large vision-language models (VLMs) show promise for object goal navigation, current methods still struggle with low success rates and inefficient localization of unseen objects--failures primarily attributed to weak temporal-spatial reasoning. Meanwhile, recent attempts to inject reasoning into VLM-based agents improve success rates but incur substantial computational overhead. To address both the ineffectiveness and inefficiency of existing approaches, we introduce Hydra-Nav, a unified VLM architecture that adaptively switches between a deliberative slow system for analyzing exploration history and formulating high-level plans, and a reactive fast system for efficient execution. We train Hydra-Nav through a three-stage curriculum: (i) spatial-action alignment to strengthen trajectory planning, (ii) memory-reasoning integration to enhance temporal-spatial reasoning over long-horizon exploration, and (iii) iterative rejection fine-tuning to enable selective reasoning at critical decision points. Extensive experiments demonstrate that Hydra-Nav achieves state-of-the-art performance on the HM3D, MP3D, and OVON benchmarks, outperforming the second-best methods by 11.1%, 17.4%, and 21.2%, respectively. Furthermore, we introduce SOT (Success weighted by Operation Time), a new metric to measure search efficiency across VLMs with varying reasoning intensity. Results show that adaptive reasoning significantly enhances search efficiency over fixed-frequency baselines.
Instruct2Act: From Human Instruction to Actions Sequencing and Execution via Robot Action Network for Robotic Manipulation
Robots often struggle to follow free-form human instructions in real-world settings due to computational and sensing limitations. We address this gap with a lightweight, fully on-device pipeline that converts natural-language commands into reliable manipulation. Our approach has two stages: (i) the instruction to actions module (Instruct2Act), a compact BiLSTM with a multi-head-attention autoencoder that parses an instruction into an ordered sequence of atomic actions (e.g., reach, grasp, move, place); and (ii) the robot action network (RAN), which uses the dynamic adaptive trajectory radial network (DATRN) together with a vision-based environment analyzer (YOLOv8) to generate precise control trajectories for each sub-action. The entire system runs on a modest system with no cloud services. On our custom proprietary dataset, Instruct2Act attains 91.5% sub-actions prediction accuracy while retaining a small footprint. Real-robot evaluations across four tasks (pick-place, pick-pour, wipe, and pick-give) yield an overall 90% success; sub-action inference completes in < 3.8s, with end-to-end executions in 30-60s depending on task complexity. These results demonstrate that fine-grained instruction-to-action parsing, coupled with DATRN-based trajectory generation and vision-guided grounding, provides a practical path to deterministic, real-time manipulation in resource-constrained, single-camera settings.
TaCo: A Benchmark for Lossless and Lossy Codecs of Heterogeneous Tactile Data
Tactile sensing is crucial for embodied intelligence, providing fine-grained perception and control in complex environments. However, efficient tactile data compression, which is essential for real-time robotic applications under strict bandwidth constraints, remains underexplored. The inherent heterogeneity and spatiotemporal complexity of tactile data further complicate this challenge. To bridge this gap, we introduce TaCo, the first comprehensive benchmark for Tactile data Codecs. TaCo evaluates 30 compression methods, including off-the-shelf compression algorithms and neural codecs, across five diverse datasets from various sensor types. We systematically assess both lossless and lossy compression schemes on four key tasks: lossless storage, human visualization, material and object classification, and dexterous robotic grasping. Notably, we pioneer the development of data-driven codecs explicitly trained on tactile data, TaCo-LL (lossless) and TaCo-L (lossy). Results have validated the superior performance of our TaCo-LL and TaCo-L. This benchmark provides a foundational framework for understanding the critical trade-offs between compression efficiency and task performance, paving the way for future advances in tactile perception.
comment: 27 pages
TriPilot-FF: Coordinated Whole-Body Teleoperation with Force Feedback
Mobile manipulators broaden the operational envelope for robot manipulation. However, the whole-body teleoperation of such robots remains a problem: operators must coordinate a wheeled base and two arms while reasoning about obstacles and contact. Existing interfaces are predominantly hand-centric (e.g., VR controllers and joysticks), leaving foot-operated channels underexplored for continuous base control. We present TriPilot-FF, an open-source whole-body teleoperation system for a custom bimanual mobile manipulator that introduces a foot-operated pedal with lidar-driven pedal haptics, coupled with upper-body bimanual leader-follower teleoperation. Using only a low-cost base-mounted lidar, TriPilot-FF renders a resistive pedal cue from proximity-to-obstacle signals in the commanded direction, shaping operator commands toward collision-averse behaviour without an explicit collision-avoidance controller. The system also supports arm-side force reflection for contact awareness and provides real-time force and visual guidance of bimanual manipulability to prompt mobile base repositioning, thereby improving reach. We demonstrate the capability of TriPilot-FF to effectively ``co-pilot'' the human operator over long time-horizons and tasks requiring precise mobile base movement and coordination. Finally, we incorporate teleoperation feedback signals into an Action Chunking with Transformers (ACT) policy and demonstrate improved performance when the additional information is available. We release the pedal device design, full software stack, and conduct extensive real-world evaluations on a bimanual wheeled platform. The project page of TriPilot-FF is http://bit.ly/46H3ZJT.
BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
Equipping embodied agents with the ability to reason about tasks, foresee physical outcomes, and generate precise actions is essential for general-purpose manipulation. While recent Vision-Language-Action (VLA) models have leveraged pre-trained foundation models, they typically focus on either linguistic planning or visual forecasting in isolation. These methods rarely integrate both capabilities simultaneously to guide action generation, leading to suboptimal performance in complex, long-horizon manipulation tasks. To bridge this gap, we propose BagelVLA, a unified model that integrates linguistic planning, visual forecasting, and action generation within a single framework. Initialized from a pretrained unified understanding and generative model, BagelVLA is trained to interleave textual reasoning and visual prediction directly into the action execution loop. To efficiently couple these modalities, we introduce Residual Flow Guidance (RFG), which initializes from current observation and leverages single-step denoising to extract predictive visual features, guiding action generation with minimal latency. Extensive experiments demonstrate that BagelVLA outperforms existing baselines by a significant margin on multiple simulated and real-world benchmarks, particularly in tasks requiring multi-stage reasoning.
Design and Evaluation of an Assisted Programming Interface for Behavior Trees in Robotics
The possibility to create reactive robot programs faster without the need for extensively trained programmers is becoming increasingly important. So far, it has not been explored how various techniques for creating Behavior Tree (BT) program representations could be combined with complete graphical user interfaces (GUIs) to allow a human user to validate and edit trees suggested by automated methods. In this paper, we introduce BEhavior TRee GUI (BETR-GUI) for creating BTs with the help of an AI assistant that combines methods using large language models, planning, genetic programming, and Bayesian optimization with a drag-and-drop editor. A user study with 60 participants shows that by combining different assistive methods, BETR-GUI enables users to perform better at solving the robot programming tasks. The results also show that humans using the full variant of BETR-GUI perform better than the AI assistant running on its own.
Diverse Skill Discovery for Quadruped Robots via Unsupervised Learning
Reinforcement learning necessitates meticulous reward shaping by specialists to elicit target behaviors, while imitation learning relies on costly task-specific data. In contrast, unsupervised skill discovery can potentially reduce these burdens by learning a diverse repertoire of useful skills driven by intrinsic motivation. However, existing methods exhibit two key limitations: they typically rely on a single policy to master a versatile repertoire of behaviors without modeling the shared structure or distinctions among them, which results in low learning efficiency; moreover, they are susceptible to reward hacking, where the reward signal increases and converges rapidly while the learned skills display insufficient actual diversity. In this work, we introduce an Orthogonal Mixture-of-Experts (OMoE) architecture that prevents diverse behaviors from collapsing into overlapping representations, enabling a single policy to master a wide spectrum of locomotion skills. In addition, we design a multi-discriminator framework in which different discriminators operate on distinct observation spaces, effectively mitigating reward hacking. We evaluated our method on the 12-DOF Unitree A1 quadruped robot, demonstrating a diverse set of locomotion skills. Our experiments demonstrate that the proposed framework boosts training efficiency and yields an 18.3\% expansion in state-space coverage compared to the baseline.
NavDreamer: Video Models as Zero-Shot 3D Navigators
Previous Vision-Language-Action models face critical limitations in navigation: scarce, diverse data from labor-intensive collection and static representations that fail to capture temporal dynamics and physical laws. We propose NavDreamer, a video-based framework for 3D navigation that leverages generative video models as a universal interface between language instructions and navigation trajectories. Our main hypothesis is that video's ability to encode spatiotemporal information and physical dynamics, combined with internet-scale availability, enables strong zero-shot generalization in navigation. To mitigate the stochasticity of generative predictions, we introduce a sampling-based optimization method that utilizes a VLM for trajectory scoring and selection. An inverse dynamics model is employed to decode executable waypoints from generated video plans for navigation. To systematically evaluate this paradigm in several video model backbones, we introduce a comprehensive benchmark covering object navigation, precise navigation, spatial grounding, language control, and scene reasoning. Extensive experiments demonstrate robust generalization across novel objects and unseen environments, with ablation studies revealing that navigation's high-level decision-making nature makes it particularly suited for video-based planning.
comment: Work in the progress. 22 pages, 15 figures
Rethinking Visual-Language-Action Model Scaling: Alignment, Mixture, and Regularization
While Vision-Language-Action (VLA) models show strong promise for generalist robot control, it remains unclear whether -- and under what conditions -- the standard "scale data" recipe translates to robotics, where training data is inherently heterogeneous across embodiments, sensors, and action spaces. We present a systematic, controlled study of VLA scaling that revisits core training choices for pretraining across diverse robots. Using a representative VLA framework that combines a vision-language backbone with flow-matching, we ablate key design decisions under matched conditions and evaluate in extensive simulation and real-robot experiments. To improve the reliability of real-world results, we introduce a Grouped Blind Ensemble protocol that blinds operators to model identity and separates policy execution from outcome judgment, reducing experimenter bias. Our analysis targets three dimensions of VLA scaling. (1) Physical alignment: we show that a unified end-effector (EEF)-relative action representation is critical for robust cross-embodiment transfer. (2) Embodiment mixture: we find that naively pooling heterogeneous robot datasets often induces negative transfer rather than gains, underscoring the fragility of indiscriminate data scaling. (3) Training regularization: we observe that intuitive strategies, such as sensory dropout and multi-stage fine-tuning, do not consistently improve performance at scale. Together, this study challenge some common assumptions about embodied scaling and provide practical guidance for training large-scale VLA policies from diverse robotic data. Project website: https://research.beingbeyond.com/rethink_vla
Fast Motion Planning for Non-Holonomic Mobile Robots via a Rectangular Corridor Representation of Structured Environments ICRA 2026
We present a complete framework for fast motion planning of non-holonomic autonomous mobile robots in highly complex but structured environments. Conventional grid-based planners struggle with scalability, while many kinematically-feasible planners impose a significant computational burden due to their search space complexity. To overcome these limitations, our approach introduces a deterministic free-space decomposition that creates a compact graph of overlapping rectangular corridors. This method enables a significant reduction in the search space, without sacrificing path resolution. The framework then performs online motion planning by finding a sequence of rectangles and generating a near-time-optimal, kinematically-feasible trajectory using an analytical planner. The result is a highly efficient solution for large-scale navigation. We validate our framework through extensive simulations and on a physical robot. The implementation is publicly available as open-source software.
comment: ICRA 2026
RANT: Ant-Inspired Multi-Robot Rainforest Exploration Using Particle Filter Localisation and Virtual Pheromone Coordination
This paper presents RANT, an ant-inspired multi-robot exploration framework for noisy, uncertain environments. A team of differential-drive robots navigates a 10 x 10 m terrain, collects noisy probe measurements of a hidden richness field, and builds local probabilistic maps while the supervisor maintains a global evaluation. RANT combines particle-filter localisation, a behaviour-based controller with gradient-driven hotspot exploitation, and a lightweight no-revisit coordination mechanism based on virtual pheromone blocking. We experimentally analyse how team size, localisation fidelity, and coordination influence coverage, hotspot recall, and redundancy. Results show that particle filtering is essential for reliable hotspot engagement, coordination substantially reduces overlap, and increasing team size improves coverage but yields diminishing returns due to interference.
comment: 7 pages, IEEEtran format
AutoFly: Vision-Language-Action Model for UAV Autonomous Navigation in the Wild ICLR 2026
Vision-language navigation (VLN) requires intelligent agents to navigate environments by interpreting linguistic instructions alongside visual observations, serving as a cornerstone task in Embodied AI. Current VLN research for unmanned aerial vehicles (UAVs) relies on detailed, pre-specified instructions to guide the UAV along predetermined routes. However, real-world outdoor exploration typically occurs in unknown environments where detailed navigation instructions are unavailable. Instead, only coarse-grained positional or directional guidance can be provided, requiring UAVs to autonomously navigate through continuous planning and obstacle avoidance. To bridge this gap, we propose AutoFly, an end-to-end Vision-Language-Action (VLA) model for autonomous UAV navigation. AutoFly incorporates a pseudo-depth encoder that derives depth-aware features from RGB inputs to enhance spatial reasoning, coupled with a progressive two-stage training strategy that effectively aligns visual, depth, and linguistic representations with action policies. Moreover, existing VLN datasets have fundamental limitations for real-world autonomous navigation, stemming from their heavy reliance on explicit instruction-following over autonomous decision-making and insufficient real-world data. To address these issues, we construct a novel autonomous navigation dataset that shifts the paradigm from instruction-following to autonomous behavior modeling through: (1) trajectory collection emphasizing continuous obstacle avoidance, autonomous planning, and recognition workflows; (2) comprehensive real-world data integration. Experimental results demonstrate that AutoFly achieves a 3.9% higher success rate compared to state-of-the-art VLA baselines, with consistent performance across simulated and real environments.
comment: Acceped by ICLR 2026
TeleGate: Whole-Body Humanoid Teleoperation via Gated Expert Selection with Motion Prior
Real-time whole-body teleoperation is a critical method for humanoid robots to perform complex tasks in unstructured environments. However, developing a unified controller that robustly supports diverse human motions remains a significant challenge. Existing methods typically distill multiple expert policies into a single general policy, which often inevitably leads to performance degradation, particularly on highly dynamic motions. This paper presents TeleGate, a unified whole-body teleoperation framework for humanoid robots that achieves high-precision tracking across various motions while avoiding the performance loss inherent in knowledge distillation. Our key idea is to preserve the full capability of domain-specific expert policies by training a lightweight gating network, which dynamically activates experts in real-time based on proprioceptive states and reference trajectories. Furthermore, to compensate for the absence of future reference trajectories in real-time teleoperation, we introduce a VAE-based motion prior module that extracts implicit future motion intent from historical observations, enabling anticipatory control for motions requiring prediction such as jumping and standing up. We conducted empirical evaluations in simulation and also deployed our technique on the Unitree G1 humanoid robot. Using only 2.5 hours of motion capture data for training, our TeleGate achieves high-precision real-time teleoperation across diverse dynamic motions (e.g., running, fall recovery, and jumping), significantly outperforming the baseline methods in both tracking accuracy and success rate.
AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception ICLR 2026
Real-world contact-rich manipulation demands robots to perceive temporal tactile feedback, capture subtle surface deformations, and reason about object properties as well as force dynamics. Although optical tactile sensors are uniquely capable of providing such rich information, existing tactile datasets and models remain limited. These resources primarily focus on object-level attributes (e.g., material) while largely overlooking fine-grained tactile temporal dynamics during physical interactions. We consider that advancing dynamic tactile perception requires a systematic hierarchy of dynamic perception capabilities to guide both data collection and model design. To address the lack of tactile data with rich dynamic information, we present ToucHD, a large-scale hierarchical tactile dataset spanning tactile atomic actions, real-world manipulations, and touch-force paired data. Beyond scale, ToucHD establishes a comprehensive tactile dynamic data ecosystem that explicitly supports hierarchical perception capabilities from the data perspective. Building on it, we propose AnyTouch 2, a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. The framework captures both pixel-level and action-specific deformations across frames, while explicitly modeling physical force dynamics, thereby learning multi-level dynamic perception capabilities from the model perspective. We evaluate our model on benchmarks that covers static object properties and dynamic physical attributes, as well as real-world manipulation tasks spanning multiple tiers of dynamic perception capabilities-from basic object-level understanding to force-aware dexterous manipulation. Experimental results demonstrate consistent and strong performance across sensors and tasks.
comment: Accepted by ICLR 2026
Preference Aligned Visuomotor Diffusion Policies for Deformable Object Manipulation
Humans naturally develop preferences for how manipulation tasks should be performed, which are often subtle, personal, and difficult to articulate. Although it is important for robots to account for these preferences to increase personalization and user satisfaction, they remain largely underexplored in robotic manipulation, particularly in the context of deformable objects like garments and fabrics. In this work, we study how to adapt pretrained visuomotor diffusion policies to reflect preferred behaviors using limited demonstrations. We introduce RKO, a novel preference-alignment method that combines the benefits of two recent frameworks: RPO and KTO. We evaluate RKO against common preference learning frameworks, including these two, as well as a baseline vanilla diffusion policy, on real-world cloth-folding tasks spanning multiple garments and preference settings. We show that preference-aligned policies (particularly RKO) achieve superior performance and sample efficiency compared to standard diffusion policy fine-tuning. These results highlight the importance and feasibility of structured preference learning for scaling personalized robot behavior in complex deformable object manipulation tasks.
Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows
Real-world fine-tuning of dexterous manipulation policies remains challenging due to limited real-world interaction budgets and highly multimodal action distributions. Diffusion-based policies, while expressive, do not permit conservative likelihood-based updates during fine-tuning because action probabilities are intractable. In contrast, conventional Gaussian policies collapse under multimodality, particularly when actions are executed in chunks, and standard per-step critics fail to align with chunked execution, leading to poor credit assignment. We present SOFT-FLOW, a sample-efficient off-policy fine-tuning framework with normalizing flow (NF) to address these challenges. The normalizing flow policy yields exact likelihoods for multimodal action chunks, allowing conservative, stable policy updates through likelihood regularization and thereby improving sample efficiency. An action-chunked critic evaluates entire action sequences, aligning value estimation with the policy's temporal structure and improving long-horizon credit assignment. To our knowledge, this is the first demonstration of a likelihood-based, multimodal generative policy combined with chunk-level value learning on real robotic hardware. We evaluate SOFT-FLOW on two challenging dexterous manipulation tasks in the real world: cutting tape with scissors retrieved from a case, and in-hand cube rotation with a palm-down grasp -- both of which require precise, dexterous control over long horizons. On these tasks, SOFT-FLOW achieves stable, sample-efficient adaptation where standard methods struggle.
Optimal Control of Microswimmers for Trajectory Tracking Using Bayesian Optimization
Trajectory tracking for microswimmers remains a key challenge in microrobotics, where low-Reynolds-number dynamics make control design particularly complex. In this work, we formulate the trajectory tracking problem as an optimal control problem and solve it using a combination of B-spline parametrization with Bayesian optimization, allowing the treatment of high computational costs without requiring complex gradient computations. Applied to a flagellated magnetic swimmer, the proposed method reproduces a variety of target trajectories, including biologically inspired paths observed in experimental studies. We further evaluate the approach on a three-sphere swimmer model, demonstrating that it can adapt to and partially compensate for wall-induced hydrodynamic effects. The proposed optimization strategy can be applied consistently across models of different fidelity, from low-dimensional ODE-based models to high-fidelity PDE-based simulations, showing its robustness and generality. These results highlight the potential of Bayesian optimization as a versatile tool for optimal control strategies in microscale locomotion under complex fluid-structure interactions.
comment: 16 pages, 16 figures
LLM-Grounded Dynamic Task Planning with Hierarchical Temporal Logic for Human-Aware Multi-Robot Collaboration
While Large Language Models (LLM) enable non-experts to specify open-world multi-robot tasks, the generated plans often lack kinematic feasibility and are not efficient, especially in long-horizon scenarios. Formal methods like Linear Temporal Logic (LTL) offer correctness and optimal guarantees, but are typically confined to static, offline settings and struggle with computational scalability. To bridge this gap, we propose a neuro-symbolic framework that grounds LLM reasoning into hierarchical LTL specifications and solves the corresponding Simultaneous Task Allocation and Planning (STAP) problem. Unlike static approaches, our system resolves stochastic environmental changes, such as moving users or updated instructions via a receding horizon planning (RHP) loop with real-time perception, which dynamically refines plans through a hierarchical state space. Extensive real-world experiments demonstrate that our approach significantly outperforms baseline methods in success rate and interaction fluency while minimizing planning latency.
Sci-VLA: Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments
Robotic laboratories play a critical role in autonomous scientific discovery by enabling scalable, continuous experimental execution. Recent vision-language-action (VLA) models offer a promising foundation for robotic laboratories. However, scientific experiments typically involve long-horizon tasks composed of multiple atomic tasks, posing a fundamental challenge to existing VLA models. While VLA models fine-tuned for scientific tasks can reliably execute atomic experimental actions seen during training, they often fail to perform composite tasks formed by reordering and composing these known atomic actions. This limitation arises from a distributional mismatch between training-time atomic tasks and inference-time composite tasks, which prevents VLA models from executing necessary transitional operations between atomic tasks. To address this challenge, we propose an Agentic VLA Inference Plugin for Long-Horizon Tasks in Scientific Experiments. It introduces an LLM-based agentic inference mechanism that intervenes when executing sequential manipulation tasks. By performing explicit transition inference and generating transitional robotic action code, the proposed plugin guides VLA models through missing transitional steps, enabling reliable execution of composite scientific workflows without any additional training. This inference-only intervention makes our method computationally efficient, data-efficient, and well-suited for open-ended and long-horizon robotic laboratory tasks. We build 3D assets of scientific instruments and common scientific operating scenes within an existing simulation environment. In these scenes, we have verified that our method increases the average success rate per atomic task by 42\% during inference. Furthermore, we show that our method can be easily transferred from the simulation to real scientific laboratories.
First-order friction models with bristle dynamics: lumped and distributed formulations
Dynamic models, particularly rate-dependent models, have proven effective in capturing the key phenomenological features of frictional processes, whilst also possessing important mathematical properties that facilitate the design of control and estimation algorithms. However, many rate-dependent formulations are built on empirical considerations, whereas physical derivations may offer greater interpretability. In this context, starting from fundamental physical principles, this paper introduces a novel class of first-order dynamic friction models that approximate the dynamics of a bristle element by inverting the friction characteristic. Amongst the developed models, a specific formulation closely resembling the LuGre model is derived using a simple rheological equation for the bristle element. This model is rigorously analyzed in terms of stability and passivity -- important properties that support the synthesis of observers and controllers. Furthermore, a distributed version, formulated as a hyperbolic partial differential equation (PDE), is presented, which enables the modeling of frictional processes commonly encountered in rolling contact phenomena. The tribological behavior of the proposed description is evaluated through classical experiments and validated against the response predicted by the LuGre model, revealing both notable similarities and key differences.
comment: 15 pages, 9 figures. Under review at IEEE Transactions on Control Systems Technology
Lateral tracking control of all-wheel steering vehicles with intelligent tires
The accurate characterization of tire dynamics is critical for advancing control strategies in autonomous road vehicles, as tire behavior significantly influences handling and stability through the generation of forces and moments at the tire-road interface. Smart tire technologies have emerged as a promising tool for sensing key variables such as road friction, tire pressure, and wear states, and for estimating kinematic and dynamic states like vehicle speed and tire forces. However, most existing estimation and control algorithms rely on empirical correlations or machine learning approaches, which require extensive calibration and can be sensitive to variations in operating conditions. In contrast, model-based techniques, which leverage infinite-dimensional representations of tire dynamics using partial differential equations (PDEs), offer a more robust approach. This paper proposes a novel model-based, output-feedback lateral tracking control strategy for all-wheel steering vehicles that integrates distributed tire dynamics with smart tire technologies. The primary contributions include the suppression of micro-shimmy phenomena at low speeds and path-following via force control, achieved through the estimation of tire slip angles, vehicle kinematics, and lateral tire forces. The proposed controller and observer are based on formulations using ODE-PDE systems, representing rigid body dynamics and distributed tire behavior. This work marks the first rigorous control strategy for vehicular systems equipped with distributed tire representations in conjunction with smart tire technologies.
comment: 17 pages, 12 figures. Under review at IEEE Transactions on Intelligent Vehicles
Finite-time Stable Pose Estimation on TSE(3) using Point Cloud and Velocity Sensors
This work presents a finite-time stable pose estimator (FTS-PE) for rigid bodies undergoing rotational and translational motion in three dimensions, using measurements from onboard sensors that provide position vectors to inertially-fixed points and body velocities. The FTS-PE is a full-state observer for the pose (position and orientation) and velocities and is obtained through a Lyapunov analysis that shows its stability in finite time and its robustness to bounded measurement noise. Further, this observer is designed directly on the state space, the tangent bundle of the Lie group of rigid body motions, SE(3), without using local coordinates or (dual) quaternion representations. Therefore, it can estimate arbitrary rigid body motions without encountering singularities or the unwinding phenomenon and be readily applied to autonomous vehicles. A version of this observer that does not need translational velocity measurements and uses only point clouds and angular velocity measurements from rate gyros, is also obtained. It is discretized using the framework of geometric mechanics for numerical and experimental implementations. The numerical simulations compare the FTS-PE with a dual-quaternion extended Kalman filter and our previously developed variational pose estimator (VPE). The experimental results are obtained using point cloud images and rate gyro measurements obtained from a Zed 2i stereo depth camera sensor. These results validate the stability and robustness of the FTS-PE.
comment: 17 pages, 8 figures, submitted to Automatica
Phase-Aware Policy Learning for Skateboard Riding of Quadruped Robots via Feature-wise Linear Modulation ICRA 2026
Skateboards offer a compact and efficient means of transportation as a type of personal mobility device. However, controlling them with legged robots poses several challenges for policy learning due to perception-driven interactions and multi-modal control objectives across distinct skateboarding phases. To address these challenges, we introduce Phase-Aware Policy Learning (PAPL), a reinforcement-learning framework tailored for skateboarding with quadruped robots. PAPL leverages the cyclic nature of skateboarding by integrating phase-conditioned Feature-wise Linear Modulation layers into actor and critic networks, enabling a unified policy that captures phase-dependent behaviors while sharing robot-specific knowledge across phases. Our evaluations in simulation validate command-tracking accuracy and conduct ablation studies quantifying each component's contribution. We also compare locomotion efficiency against leg and wheel-leg baselines and show real-world transferability.
comment: Accepted at ICRA 2026. Supplementary Video: https://www.youtube.com/watch?v=bCNfdQ3RYKg. M. Yoon and J. Jeong contributed equally
Certified Gradient-Based Contact-Rich Manipulation via Smoothing-Error Reachable Tubes
Gradient-based methods can efficiently optimize controllers using physical priors and differentiable simulators, but contact-rich manipulation remains challenging due to discontinuous or vanishing gradients from hybrid contact dynamics. Smoothing the dynamics yields continuous gradients, but the resulting model mismatch can cause controller failures when executed on real systems. We address this trade-off by planning with smoothed dynamics while explicitly quantifying and compensating for the induced errors, providing formal guarantees of constraint satisfaction and goal reachability on the true hybrid dynamics. Our method smooths both contact dynamics and geometry via a novel differentiable simulator based on convex optimization, which enables us to characterize the discrepancy from the true dynamics as a set-valued deviation. This deviation constrains the optimization of time-varying affine feedback policies through analytical bounds on the system's reachable set, enabling robust constraint satisfaction guarantees for the true closed-loop hybrid dynamics, while relying solely on informative gradients from the smoothed dynamics. We evaluate our method on several contact-rich tasks, including planar pushing, object rotation, and in-hand dexterous manipulation, achieving guaranteed constraint satisfaction with lower safety violation and goal error than baselines. By bridging differentiable physics with set-valued robust control, our method is the first certifiable gradient-based policy synthesis method for contact-rich manipulation.
CAPER: Constrained and Procedural Reasoning for Robotic Scientific Experiments
Robotic assistance in scientific laboratories requires procedurally correct long-horizon manipulation, reliable execution under limited supervision, and robustness in low-demonstration regimes. Such conditions greatly challenge end-to-end vision-language-action (VLA) models, whose assumptions of recoverable errors and data-driven policy learning often break down in protocol-sensitive experiments. We propose CAPER, a framework for Constrained And ProcEdural Reasoning for robotic scientific experiments, which explicitly restricts where learning and reasoning occur in the planning and control pipeline. Rather than strengthening end-to-end policies, CAPER enforces a responsibility-separated structure: task-level reasoning generates procedurally valid action sequences under explicit constraints, mid-level multimodal grounding realizes subtasks without delegating spatial decision-making to large language models, and low-level control adapts to physical uncertainty via reinforcement learning with minimal demonstrations. By encoding procedural commitments through interpretable intermediate representations, CAPER prevents execution-time violations of experimental logic, improving controllability, robustness, and data efficiency. Experiments on a scientific workflow benchmark and a public long-horizon manipulation dataset demonstrate consistent improvements in success rate and procedural correctness, particularly in low-data and long-horizon settings.
Disambiguating Anthropomorphism and Anthropomimesis in Human-Robot Interaction
In this preliminary work, we offer an initial disambiguation of the theoretical concepts anthropomorphism and anthropomimesis in Human-Robot Interaction (HRI) and social robotics. We define anthropomorphism as users perceiving human-like qualities in robots, and anthropomimesis as robot developers designing human-like features into robots. This contribution aims to provide a clarification and exploration of these concepts for future HRI scholarship, particularly regarding the party responsible for human-like qualities - robot perceiver for anthropomorphism, and robot designer for anthropomimesis. We provide this contribution so that researchers can build on these disambiguated theoretical concepts for future robot design and evaluation.
comment: 6 pages, 0 figures. Accepted at Human-Robot Interaction (HRI) conference (ACM/IEEE) 2026, Late-Breaking Reports
Solving Geodesic Equations with Composite Bernstein Polynomials for Trajectory Planning
This work presents a trajectory planning method based on composite Bernstein polynomials for autonomous systems navigating complex environments. The method is implemented in a symbolic optimization framework that enables continuous paths and precise control over trajectory shape. Trajectories are planned over a cost surface that encodes obstacles as continuous fields rather than discrete boundaries. Regions near obstacles are assigned higher costs, naturally encouraging the trajectory to maintain a safe distance while still allowing efficient routing through constrained spaces. The use of composite Bernstein polynomials preserves continuity while enabling fine control over local curvature to satisfy geodesic constraints. The symbolic representation supports exact derivatives, improving optimization efficiency. The method applies to both two- and three-dimensional environments and is suitable for ground, aerial, underwater, and space systems. In spacecraft trajectory planning, for example, it enables the generation of continuous, dynamically feasible trajectories with high numerical efficiency, making it well suited for orbital maneuvers, rendezvous and proximity operations, cluttered gravitational environments, and planetary exploration missions with limited onboard computational resources. Demonstrations show that the approach efficiently generates smooth, collision-free paths in scenarios with multiple obstacles, maintaining clearance without extensive sampling or post-processing. The optimization incorporates three constraint types: (1) a Gaussian surface inequality enforcing minimum obstacle clearance; (2) geodesic equations guiding the path along locally efficient directions on the cost surface; and (3) boundary constraints enforcing fixed start and end conditions. The method can serve as a standalone planner or as an initializer for more complex motion planning problems.
comment: Accepted for the 2026 IEEE Aerospace Conference
Confounding Robust Continuous Control via Automatic Reward Shaping AAMAS 2026
Reward shaping has been applied widely to accelerate Reinforcement Learning (RL) agents' training. However, a principled way of designing effective reward shaping functions, especially for complex continuous control problems, remains largely under-explained. In this work, we propose to automatically learn a reward shaping function for continuous control problems from offline datasets, potentially contaminated by unobserved confounding variables. Specifically, our method builds upon the recently proposed causal Bellman equation to learn a tight upper bound on the optimal state values, which is then used as the potentials in the Potential-Based Reward Shaping (PBRS) framework. Our proposed reward shaping algorithm is tested with Soft-Actor-Critic (SAC) on multiple commonly used continuous control benchmarks and exhibits strong performance guarantees under unobserved confounders. More broadly, our work marks a solid first step towards confounding robust continuous control from a causal perspective. Code for training our reward shaping functions can be found at https://github.com/mateojuliani/confounding_robust_cont_control.
comment: Mateo Juliani and Mingxuan Li contributed equally to this work; accepted in AAMAS 2026
A Human-in-the-Loop Confidence-Aware Failure Recovery Framework for Modular Robot Policies
Robots operating in unstructured human environments inevitably encounter failures, especially in robot caregiving scenarios. While humans can often help robots recover, excessive or poorly targeted queries impose unnecessary cognitive and physical workload on the human partner. We present a human-in-the-loop failure-recovery framework for modular robotic policies, where a policy is composed of distinct modules such as perception, planning, and control, any of which may fail and often require different forms of human feedback. Our framework integrates calibrated estimates of module-level uncertainty with models of human intervention cost to decide which module to query and when to query the human. It separates these two decisions: a module selector identifies the module most likely responsible for failure, and a querying algorithm determines whether to solicit human input or act autonomously. We evaluate several module-selection strategies and querying algorithms in controlled synthetic experiments, revealing trade-offs between recovery efficiency, robustness to system and user variables, and user workload. Finally, we deploy the framework on a robot-assisted bite acquisition system and demonstrate, in studies involving individuals with both emulated and real mobility limitations, that it improves recovery success while reducing the workload imposed on users. Our results highlight how explicitly reasoning about both robot uncertainty and human effort can enable more efficient and user-centered failure recovery in collaborative robots. Supplementary materials and videos can be found at: http://emprise.cs.cornell.edu/modularhil
comment: The second and third authors contributed equally. The last two authors advised equally
Adaptive Time Step Flow Matching for Autonomous Driving Motion Planning
Autonomous driving requires reasoning about interactions with surrounding traffic. A prevailing approach is large-scale imitation learning on expert driving datasets, aimed at generalizing across diverse real-world scenarios. For online trajectory generation, such methods must operate at real-time rates. Diffusion models require hundreds of denoising steps at inference, resulting in high latency. Consistency models mitigate this issue but rely on carefully tuned noise schedules to capture the multimodal action distributions common in autonomous driving. Adapting the schedule, typically requires expensive retraining. To address these limitations, we propose a framework based on conditional flow matching that jointly predicts future motions of surrounding agents and plans the ego trajectory in real time. We train a lightweight variance estimator that selects the number of inference steps online, removing the need for retraining to balance runtime and imitation learning performance. To further enhance ride quality, we introduce a trajectory post-processing step cast as a convex quadratic program, with negligible computational overhead. Trained on the Waymo Open Motion Dataset, the framework performs maneuvers such as lane changes, cruise control, and navigating unprotected left turns without requiring scenario-specific tuning. Our method maintains a 20 Hz update rate on an NVIDIA RTX 3070 GPU, making it suitable for online deployment. Compared to transformer, diffusion, and consistency model baselines, we achieve improved trajectory smoothness and better adherence to dynamic constraints. Experiment videos and code implementations can be found at https://flow-matching-self-driving.github.io/.
comment: Accepted to Intelligent Vehicles Symposium 2026
Transforming Policy-Car Swerving for Mitigating Stop-and-Go Traffic Waves: A Practice-Oriented Jam-Absorption Driving Strategy
Stop-and-go waves, as a major form of freeway traffic congestion, cause severe and long-lasting adverse effects, including reduced traffic efficiency, increased driving risks, and higher vehicle emissions. Amongst the highway traffic management strategies, jam-absorption driving (JAD), in which a dedicated vehicle performs "slow-in" and "fast-out" maneuvers before being captured by a stop-and-go wave, has been proposed as a potential method for preventing the propagation of such waves. However, most existing JAD strategies remain impractical mainly due to the lack of discussion regarding implementation vehicles and operational conditions. Inspired by real-world observations of police-car swerving behavior, this paper first introduces a Single-Vehicle Two-Detector Jam-Absorption Driving (SVDD-JAD) problem, and then proposes a practical JAD strategy that transforms such behavior into a maneuver capable of suppressing the propagation of an isolated stop-and-go wave. Five key parameters that significantly affect the proposed strategy, namely, JAD speed, inflow traffic speed, wave width, wave speed, and in-wave speed, are identified and systematically analyzed. Using a SUMO-based simulation as an illustrative example, we further demonstrate how these parameters can be measured in practice with two stationary roadside traffic detectors. The results show that the proposed JAD strategy successfully suppresses the propagation of a stop-and-go wave, without triggering a secondary wave. This paper is expected to take a significant step toward making JAD practical, advancing it from a theoretical concept to a feasible and implementable strategy. To promote reproducibility in the transportation domain, we have also open-sourced all the code on our GitHub repository https://github.com/gotrafficgo.
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors ICLR 2026
Existing vision-language-action (VLA) models act in 3D real-world but are typically built on 2D encoders, leaving a spatial reasoning gap that limits generalization and adaptability. Recent 3D integration techniques for VLAs either require specialized sensors and transfer poorly across modalities, or inject weak cues that lack geometry and degrade vision-language alignment. In this work, we introduce FALCON (From Spatial to Action), a novel paradigm that injects rich 3D spatial tokens into the action head. FALCON leverages spatial foundation models to deliver strong geometric priors from RGB alone, and includes an Embodied Spatial Model that can optionally fuse depth, or pose for higher fidelity when available, without retraining or architectural changes. To preserve language reasoning, spatial tokens are consumed by a Spatial-Enhanced Action Head rather than being concatenated into the vision-language backbone. These designs enable FALCON to address limitations in spatial representation, modality transferability, and alignment. In comprehensive evaluations across three simulation benchmarks and eleven real-world tasks, our proposed FALCON achieves state-of-the-art performance, consistently surpasses competitive baselines, and remains robust under clutter, spatial-prompt conditioning, and variations in object scale and height.
comment: ICLR 2026, Project page: https://falcon-vla.github.io/
Going with the Flow: Koopman Behavioral Models as Implicit Planners for Visuo-Motor Dexterity
There has been rapid and dramatic progress in learning complex visuo-motor manipulation skills from demonstrations, thanks in part to expressive policy classes that employ diffusion- and transformer-based backbones. However, these design choices require significant data and computational resources and remain far from reliable, particularly within the context of multi-fingered dexterous manipulation. Fundamentally, they model skills as reactive mappings and rely on fixed-horizon action chunking to mitigate jitter, creating a rigid trade-off between temporal coherence and reactivity. In this work, we introduce Unified Behavioral Models (UBMs), a framework that learns to represent dexterous skills as coupled dynamical systems that capture how visual features of the environment (visual flow) and proprioceptive states of the robot (action flow) co-evolve. By capturing such behavioral dynamics, UBMs can ensure temporal coherence by construction rather than by heuristic averaging. To operationalize these models, we propose Koopman-UBM, a first instantiation of UBMs that leverages Koopman Operator theory to effectively learn a unified representation in which the joint flow of latent visual and proprioceptive features is governed by a structured linear system. We demonstrate that Koopman-UBM can be viewed as an implicit planner: given an initial condition, it computes the desired robot behavior with the resulting flow of visual features over the entire skill horizon. To enable reactivity, we introduce an online replanning strategy in which the model acts as its own runtime monitor that automatically triggers replanning when predicted and observed visual flow diverge. Across seven simulated and two real-world tasks, we demonstrate that K-UBM matches or exceeds the performance of SOTA baselines, while offering faster inference, smooth execution, robustness to occlusions, and flexible replanning.
comment: Codes can be found at https://github.com/GT-STAR-Lab/K-UBM/
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation ICRA 2026
Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
comment: Accepted for publication by IEEE International Conference on Robotics & Automation (ICRA 2026)
Model Predictive Control via Probabilistic Inference: A Tutorial and Survey
This paper provides a tutorial and a survey of probabilistic inference-based model predictive control (PI-MPC) for robotics. PI-MPC defines an optimal control distribution shaped by trajectory cost, a control prior, and a temperature parameter. In the tutorial part, we derive this view and describe action generation via variational inference, highlighting Model Predictive Path Integral (MPPI) control as a representative algorithm. In the survey part, we organize the PI-MPC literature around the principal design choices identified in the tutorial: prior distribution design, multi-modal distribution handling, constraint satisfaction, scalability to high-degree-of-freedom robots, hardware acceleration, and theoretical foundations. Overall, this paper aims to serve as a coherent entry point for researchers and practitioners interested in understanding, implementing, and extending PI-MPC.
comment: 26 pages, 7 figures
karl. -- A Research Vehicle for Automated and Connected Driving
As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g., Waymo), automated driving and connected cooperative intelligent transport systems (C-ITS) remain active fields of research. Even though simulation is omnipresent in the development and validation life cycle of automated and connected driving technology, the complex nature of public road traffic and software that masters it still requires real-world integration and testing with actual vehicles. Dedicated vehicles for research and development allow testing and validation of software and hardware components under real-world conditions early on. They also enable collecting and publishing real-world datasets that let others conduct research without vehicle access, and support early demonstration of futuristic use cases. In this paper, we present karl., our new research vehicle for automated and connected driving. Apart from major corporations, few institutions worldwide have access to their own L4-capable research vehicles, restricting their ability to carry out independent research. This paper aims to help bridge that gap by sharing the reasoning, design choices, and technical details that went into making karl. a flexible and powerful platform for research, engineering, and validation in the context of automated and connected driving. More impressions of karl. are available at https://karl.ac.
comment: 8 pages; Accepted to be published as part of the 37th Intelligent Vehicles Symposium (IV), Detroit, MI, United States, June 22-25, 2026
Hoi! -- A Multimodal Dataset for Force-Grounded, Cross-View Articulated Manipulation
We present a dataset for force-grounded, cross-view articulated manipulation that couples what is seen with what is done and what is felt during real human interaction. The dataset contains 3048 sequences across 381 articulated objects in 38 environments. Each object is operated under four embodiments - (i) human hand, (ii) human hand with a wrist-mounted camera, (iii) handheld UMI gripper, and (iv) a custom Hoi! gripper - where the tool embodiment provides synchronized end-effector forces and tactile sensing. Our dataset offers a holistic view of interaction understanding from video, enabling researchers to evaluate how well methods transfer between human and robotic viewpoints, but also investigate underexplored modalities such as force sensing and prediction. Further information can be found on the Website.
Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion
Reinforcement learning has shown strong promise for quadrupedal agile locomotion, even with proprioception-only sensing. In practice, however, sim-to-real gap and reward overfitting in complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework encompassing a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation with RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy employs a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization via proprioception alone. RoboGauge further provides multi-dimensional proprioception-based metrics via sim-to-sim tests over terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.
comment: fix RoboGauge linear velocity metric, friction error; update radar fig (page 1), score tables (page 6,7); fix obs stack bug
Simultaneous Calibration of Noise Covariance and Kinematics for State Estimation of Legged Robots via Bi-level Optimization
Accurate state estimation is critical for legged and aerial robots operating in dynamic, uncertain environments. A key challenge lies in specifying process and measurement noise covariances, which are typically unknown or manually tuned. In this work, we introduce a bi-level optimization framework that jointly calibrates covariance matrices and kinematic parameters in an estimator-in-the-loop manner. The upper level treats noise covariances and model parameters as optimization variables, while the lower level executes a full-information estimator. Differentiating through the estimator allows direct optimization of trajectory-level objectives, resulting in accurate and consistent state estimates. We validate our approach on quadrupedal and humanoid robots, demonstrating significantly improved estimation accuracy and uncertainty calibration compared to hand-tuned baselines. Our method unifies state estimation, sensor, and kinematics calibration into a principled, data-driven framework applicable across diverse robotic platforms.
VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-end Autonomous Driving
In autonomous driving, dynamic environment and corner cases pose significant challenges to the robustness of ego vehicle's state understanding and decision making. We introduce VDRive, a novel pipeline for end-to-end autonomous driving that explicitly models state-action mapping to address these challenges, enabling interpretable and robust decision making. By leveraging the advancement of the state understanding of the Vision Language Action Model (VLA) with generative diffusion policy-based action head, our VDRive guides the driving contextually and geometrically. Contextually, VLA predicts future observations through token generation pre-training, where the observations are represented as discrete codes by a Conditional Vector Quantized Variational Autoencoder (CVQ-VAE). Geometrically, we perform reinforcement learning fine-tuning of the VLA to predict future trajectories and actions based on current driving conditions. VLA supplies the current state tokens and predicted state tokens for the action policy head to generate hierarchical actions and trajectories. During policy training, a learned critic evaluates the actions generated by the policy and provides gradient-based feedback, forming an actor-critic framework that enables a reinforcement-based policy learning pipeline. Experiments show that our VDRive achieves state-of-the-art performance in the Bench2Drive closed-loop benchmark and nuScenes open-loop planning.
comment: WIP
GenTrack2: An Improved Hybrid Approach for Multi-Object Tracking
This paper proposes a visual multi-object tracking method that jointly employs stochastic and deterministic mechanisms to ensure identifier consistency for unknown and time-varying target numbers under nonlinear dynamics. A stochastic particle filter addresses nonlinear dynamics and non-Gaussian noise, with support from particle swarm optimization (PSO) to guide particles toward state distribution modes and mitigate divergence through proposed fitness measures incorporating motion consistency, appearance similarity, and social-interaction cues with neighboring targets. Deterministic association further enforces identifier consistency via a proposed cost matrix incorporating spatial consistency between particles and current detections, detection confidences, and track penalties. Subsequently, a novel scheme is proposed for the smooth updating of target states while preserving their identities, particularly for weak tracks during interactions with other targets and prolonged occlusions. Moreover, velocity regression over past states provides trend-seed velocities, enhancing particle sampling and state updates. The proposed tracker is designed to operate flexibly for both pre-recorded videos and camera live streams, where future frames are unavailable. Experimental results confirm superior performance compared to state-of-the-art trackers. The source-code reference implementations of both the proposed method and compared-trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack2
comment: This work has been submitted to the IEEE for possible publication
CoLA-Flow Policy: Temporally Coherent Imitation Learning via Continuous Latent Action Flow Matching for Robotic Manipulation
Learning long-horizon robotic manipulation requires jointly achieving expressive behavior modeling, real-time inference, and stable execution, which remains challenging for existing generative policies. Diffusion-based approaches provide strong modeling capacity but typically incur high inference latency, while flow matching enables fast one-step generation yet often leads to unstable execution when applied directly in the raw action space. We propose LG-Flow Policy, a trajectory-level imitation learning framework that performs flow matching in a continuous latent action space. By encoding action sequences into temporally regularized latent trajectories and learning an explicit latent-space flow, the proposed approach decouples global motion structure from low-level control noise, resulting in smooth and reliable long-horizon execution. LG-Flow Policy further incorporates geometry-aware point cloud conditioning and execution-time multimodal modulation, with visual cues evaluated as a representative modality in real-world settings. Experimental results in simulation and on physical robot platforms demonstrate that LG-Flow Policy achieves near single-step inference, substantially improves trajectory smoothness and task success over flow-based baselines operating in the raw action space, and remains significantly more efficient than diffusion-based policies.
comment: 8 pages, 8 figures
xFLIE: Leveraging Actionable Hierarchical Scene Representations for Autonomous Semantic-Aware Inspection Missions
We present a novel architecture aimed towards incremental construction and exploitation of a hierarchical 3D scene graph representation during semantic-aware inspection missions. Inspection planning, particularly of distributed targets in previously unseen environments, presents an opportunity to exploit the semantic structure of the scene during reasoning, navigation and scene understanding. Motivated by this, we propose the 3D Layered Semantic Graph (3DLSG), a hierarchical inspection scene graph constructed in an incremental manner and organized into abstraction layers that support planning demands in real-time. To address the task of semantic-aware inspection, a mission framework, termed as Enhanced First-Look Inspect Explore (xFLIE), that tightly couples the 3DLSG with an inspection planner is proposed. We assess the performance through simulations and experimental trials, evaluating target-selection, path-planning and semantic navigation tasks over the 3DLSG model. The scenarios presented are diverse, ranging from city-scale distributed to solitary infrastructure targets in simulated worlds and subsequent outdoor and subterranean environment deployments onboard a quadrupedal robot. The proposed method successfully demonstrates incremental construction and planning over the 3DLSG representation to meet the objectives of the missions. Furthermore, the framework demonstrates successful semantic navigation tasks over the structured interface at the end of the inspection missions. Finally, we report multiple orders of magnitude reduction in path-planning time compared to conventional volumetric-map-based methods over various environment scale, demonstrating the planning efficiency and scalability of the proposed approach.
comment: 20 pages
Bi-Adapt: Few-shot Bimanual Adaptation for Novel Categories of 3D Objects via Semantic Correspondence
Bimanual manipulation is imperative yet challenging for robots to execute complex tasks, requiring coordinated collaboration between two arms. However, existing methods for bimanual manipulation often rely on costly data collection and training, struggling to generalize to unseen objects in novel categories efficiently. In this paper, we present Bi-Adapt, a novel framework designed for efficient generalization for bimanual manipulation via semantic correspondence. Bi-Adapt achieves cross-category affordance mapping by leveraging the strong capability of vision foundation models. Fine-tuning with restricted data on novel categories, Bi-Adapt exhibits notable generalization to out-of-category objects in a zero-shot manner. Extensive experiments conducted in both simulation and real-world environments validate the effectiveness of our approach and demonstrate its high efficiency, achieving a high success rate on different benchmark tasks across novel categories with limited data. Project website: https://biadapt-project.github.io/
Implicit State Estimation via Video Replanning
Video-based representations have gained prominence in planning and decision-making due to their ability to encode rich spatiotemporal dynamics and geometric relationships. These representations enable flexible and generalizable solutions for complex tasks such as object manipulation and navigation. However, existing video planning frameworks often struggle to adapt to failures at interaction time due to their inability to reason about uncertainties in partially observed environments. To overcome these limitations, we introduce a novel framework that integrates interaction-time data into the planning process. Our approach updates model parameters online and filters out previously failed plans during generation. This enables implicit state estimation, allowing the system to adapt dynamically without explicitly modeling unknown state variables. We evaluate our framework through extensive experiments on a new simulated manipulation benchmark, demonstrating its ability to improve replanning performance and advance the field of video-based decision-making.
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we implicitly suggest that it is a misconception to think that humans learn fast. The learning process takes time. Babies start learning to move in the restricted fluid environment of the womb. Children are often limited by underdeveloped body. Even adults are not allowed to participate in complex competitions right away. However, with robots, when learning from scratch, we often don't have the privilege of waiting for tens of millions of steps. "Swaddling" regularization is responsible for restraining an agent in rapid but unstable development penalizing action strength in a specific way not affecting actions directly. The Symphony, Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas for possibility of training humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is well known that continuous increase in Gaussian noise without appropriate smoothing is harmful for motors and gearboxes. Compared to Stochastic algorithms, we set limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are submerged in weaker noise. When actions require more extreme values, actions rise above the weak noise. Training becomes empirically much safer for both the environment around and the robot's mechanisms. We use Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability: the memory contains a recent memory and a long-term memory trail. Fading Replay Buffer allows us to use Temporal Advantage when we improve the current Critic Network prediction compared to the exponential moving average. Temporal Advantage allows us to update the Actor and Critic in one pass, as well as combine the Actor and Critic in one Object and implement their Losses in one line.
comment: https://github.com/SuspensionRailway/symphony
RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI
Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, and long-horizon effective training difficult. These challenges suggest that real-world policy learning is not only an algorithmic issue but fundamentally a systems problem. We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots. To address cloud-edge communication, USER introduces an adaptive communication plane with tunneling-based networking, distributed data channels for traffic localization, and streaming-multiprocessor-aware weight synchronization to regulate GPU-side overhead. On top of this infrastructure, USER organizes learning as a fully asynchronous framework with a persistent, cache-aware buffer, enabling efficient long-horizon experiments with robust crash recovery and reuse of historical data. In addition, USER provides extensible abstractions for rewards, algorithms, and policies, supporting online imitation or reinforcement learning of CNN/MLP, generative policies, and large vision-language-action (VLA) models within a unified pipeline. Results in both simulation and the real world show that USER enables multi-robot coordination, heterogeneous manipulators, edge-cloud collaboration with large models, and long-running asynchronous training, offering a unified and extensible systems foundation for real-world online policy learning.
LCLA: Language-Conditioned Latent Alignment for Vision-Language Navigation
We propose LCLA (Language-Conditioned Latent Alignment), a framework for vision-language navigation that learns modular perception-action interfaces by aligning sensory observations to a latent representation of an expert policy. The expert is first trained with privileged state information, inducing a latent space sufficient for control, after which its latent interface and action head are frozen. A lightweight adapter is then trained to map raw visual-language observations, via a frozen vision-language model, into the expert's latent space, reducing the problem of visuomotor learning to supervised latent alignment rather than end-to-end policy optimization. This decoupling enforces a stable contract between perception and control, enabling expert behavior to be reused across sensing modalities and environmental variations. We instantiate LCLA and evaluate it on a vision-language indoor navigation task, where aligned latent spaces yield strong in-distribution performance and robust zero-shot generalization to unseen environments, lighting conditions, and viewpoints while remaining lightweight at inference time.
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Dense image correspondence is central to many applications, such as visual odometry, 3D reconstruction, object association, and re-identification. Historically, dense correspondence has been tackled separately for wide-baseline scenarios and optical flow estimation, despite the common goal of matching content between two images. In this paper, we develop a Unified Flow & Matching model (UFM), which is trained on unified data for pixels that are co-visible in both source and target images. UFM uses a simple, generic transformer architecture that directly regresses the (u,v) flow. It is easier to train and more accurate for large flows compared to the typical coarse-to-fine cost volumes in prior work. UFM is 28% more accurate than state-of-the-art flow methods (Unimatch), while also having 62% less error and 6.7x faster than dense wide-baseline matchers (RoMa). UFM is the first to demonstrate that unified training can outperform specialized approaches across both domains. This result enables fast, general-purpose correspondence and opens new directions for multi-modal, long-range, and real-time correspondence tasks.
comment: Project Page: https://uniflowmatch.github.io/
karl. - A Research Vehicle for Automated and Connected Driving
As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g., Waymo), automated driving and connected cooperative intelligent transport systems (C-ITS) remain active fields of research. Even though simulation is omnipresent in the development and validation life cycle of automated and connected driving technology, the complex nature of public road traffic and software that masters it still requires real-world integration and testing with actual vehicles. Dedicated vehicles for research and development allow testing and validation of software and hardware components under real-world conditions early on. They also enable collecting and publishing real-world datasets that let others conduct research without vehicle access, and support early demonstration of futuristic use cases. In this paper, we present karl., our new research vehicle for automated and connected driving. Apart from major corporations, few institutions worldwide have access to their own L4-capable research vehicles, restricting their ability to carry out independent research. This paper aims to help bridge that gap by sharing the reasoning, design choices, and technical details that went into making karl. a flexible and powerful platform for research, engineering, and validation in the context of automated and connected driving. More impressions of karl. are available at https://karl.ac.
comment: 8 pages; Accepted to be published as part of the 37th Intelligent Vehicles Symposium (IV), Detroit, MI, United States, June 22-25, 2026
Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets
Vision-based policies for robot manipulation have achieved significant recent success, but are still brittle to distribution shifts such as camera viewpoint variations. Robot demonstration data is scarce and often lacks appropriate variation in camera viewpoints. Simulation offers a way to collect robot demonstrations at scale with comprehensive coverage of different viewpoints, but presents a visual sim2real challenge. To bridge this gap, we propose MANGO -- an unpaired image translation method with a novel segmentation-conditioned InfoNCE loss, a highly-regularized discriminator design, and a modified PatchNCE loss. We find that these elements are crucial for maintaining viewpoint consistency during sim2real translation. When training MANGO, we only require a small amount of fixed-camera data from the real world, but show that our method can generate diverse unseen viewpoints by translating simulated observations. In this domain, MANGO outperforms all other image translation methods we tested. Imitation-learning policies trained on data augmented by MANGO are able to achieve success rates as high as 60% on views that the non-augmented policy fails completely on.
Multi-Momentum Observer Contact Estimation for Bipedal Robots
As bipedal robots become more and more popular in commercial and industrial settings, the ability to control them with a high degree of reliability is critical. To that end, this paper considers how to accurately estimate which feet are currently in contact with the ground so as to avoid improper control actions that could jeopardize the stability of the robot. Additionally, modern algorithms for estimating the position and orientation of a robot's base frame rely heavily on such contact mode estimates. Dedicated contact sensors on the feet can be used to estimate this contact mode, but these sensors are prone to noise, time delays, damage/yielding from repeated impacts with the ground, and are not available on every robot. To overcome these limitations, we propose a momentum observer based method for contact mode estimation that does not rely on such contact sensors. Often, momentum observers assume that the robot's base frame can be treated as an inertial frame. However, since many humanoids' legs represent a significant portion of the overall mass, the proposed method instead utilizes multiple simultaneous dynamic models. Each of these models assumes a different contact condition. A given contact assumption is then used to constrain the full dynamics in order to avoid assuming that either the body is an inertial frame or that a fully accurate estimate of body velocity is known. The (dis)agreement between each model's estimates and measurements is used to determine which contact mode is most likely using a Markov-style fusion method. The proposed method produces contact detection accuracy of up to 98.44% with a low noise simulation and 77.12% when utilizing data collect on the Sarcos Guardian XO robot (a hybrid humanoid/exoskeleton).
Multiagent Systems
Resilient Topology-Aware Coordination for Dynamic 3D UAV Networks under Node Failure
In 3D Aerial-Ground Integrated Networks (AGINs), ensuring continuous service coverage under unexpected hardware failures is critical for mission-critical applications. While Multi-Agent Reinforcement Learning (MARL) has shown promise in autonomous coordination, its resilience under sudden node failures remains a challenge due to dynamic topology deformation. This paper proposes a Topology-Aware Graph MAPPO (TAG-MAPPO) framework designed to enhance system survivability through autonomous 3D spatial reconfiguration. Our framework incorporates graph-based feature aggregation with a residual ego-state fusion mechanism to capture intricate inter-agent dependencies. This architecture enables the surviving swarm to rapidly adapt its topology compared to conventional Multi-Layer Perceptron (MLP) based approaches. Extensive simulations across heterogeneous environments, ranging from interference-limited Crowded Urban to sparse Rural areas, validate the proposed approach. The results demonstrate that TAG-MAPPO consistently outperforms baselines in both stability and efficiency; specifically, it reduces redundant handoffs by up to 50 percent while maintaining a lead in energy efficiency. Most notably, the framework exhibits exceptional self-healing capabilities following a catastrophic node failure. TAG-MAPPO restores over 90 percent of the pre-failure service coverage within 15 time steps, exhibiting a significantly faster V-shaped recovery trajectory than MLP baselines. Furthermore, in dense urban scenarios, the framework achieves a post-failure Jain's Fairness Index that even surpasses its original four-UAV configuration by effectively resolving service overlaps. These findings suggest that topology-aware coordination is essential for the realization of resilient 6G aerial networks and provides a robust foundation for adaptive deployments in volatile environments.
comment: 25 pages, 5 figures. Full research paper providing a resilience-aware RL framework for UAV networks under node failure. A preliminary version has been submitted to ACM Transactions on Internet Technology (TOIT) for possible publication
A Collaborative Safety Shield for Safe and Efficient CAV Lane Changes in Congested On-Ramp Merging
Lane changing in dense traffic is a significant challenge for Connected and Autonomous Vehicles (CAVs). Existing lane change controllers primarily either ensure safety or collaboratively improve traffic efficiency, but do not consider these conflicting objectives together. To address this, we propose the Multi-Agent Safety Shield (MASS), designed using Control Barrier Functions (CBFs) to enable safe and collaborative lane changes. The MASS enables collaboration by capturing multi-agent interactions among CAVs through interaction topologies constructed as a graph using a simple algorithm. Further, a state-of-the-art Multi-Agent Reinforcement Learning (MARL) lane change controller is extended by integrating MASS to ensure safety and defining a customised reward function to prioritise efficiency improvements. As a result, we propose a lane change controller, known as MARL-MASS, and evaluate it in a congested on-ramp merging simulation. The results demonstrate that MASS enables collaborative lane changes with safety guarantees by strictly respecting the safety constraints. Moreover, the proposed custom reward function improves the stability of MARL policies trained with a safety shield. Overall, by encouraging the exploration of a collaborative lane change policy while respecting safety constraints, MARL-MASS effectively balances the trade-off between ensuring safety and improving traffic efficiency in congested traffic. The code for MARL-MASS is available with an open-source licence at https://github.com/hkbharath/MARL-MASS
comment: Accepted in IEEE IV 2026
Tiny Moves: Game-based Hypothesis Refinement
Most machine learning approaches to scientific discovery frame hypotheses as end-to-end predictions, obscuring the incremental structure of scientific reasoning. We propose The Hypothesis Game, a symbolic formalism for hypothesis refinement in which LLM agents operate on a shared hypothesis state using a fixed grammar of reasoning moves. The framework is motivated by the observation that scientific progress often proceeds through small, localized revisions, grounded in domain context, rather than extensive rewrites. We instantiate a minimal game with LLM agents and evaluate it on pathway-level mechanistic refinement tasks. In the primary setting of corruption recovery, where hypotheses contain controlled errors, the game-based approach consistently removes more errors and achieves higher precision than strong prompting baselines, while preserving valid structure through incremental edits. In a secondary reconstruction setting from partial cues, it performs comparably to the strongest baseline, indicating that explicit move-based refinement remains competitive even when ground-truth recovery is difficult. These findings support game-based reasoning as a principled route to more controllable, interpretable, and transferable hypothesis refinement systems for scientific discovery.
Dieu khien he da tac tu
Since the early 2000s, control of multiagent systems has attracted significant research interest, with applications ranging from natural collective behaviors and social dynamics to engineered systems such as autonomous vehicles, sensor networks, and smart grids. Although research on multi-agent systems has diversified into numerous specialized directions, textbooks -- including those in English -- that provide a systematic treatment of the fundamental principles of multi-agent system control remain scarce. The material presented in this book has been developed and used in teaching since 2021, initially as a concise Vietnamese-language reference for the courses Networked Control Systems and Control of Multi-Agent Systems at Hanoi University of Science and Technology. The book focuses on a selection of fundamental topics of broad and continuing interest in the field. The complexity of several topics is asymptotic to that encountered in research-level studies, however, the analysis is presented in a step-by-step manner to facilitate access to commonly used methods and tools. The material is divided into three main parts. Part I introduces multiagent systems and basic graph-theoretic concepts. Part II addresses the design and analysis of linear consensus algorithms. Part III covers selected applications and research directions, including formation control, network localization, distributed optimization, opinion dynamics, and matrix-weighted networks. Each chapter concludes with notes on notable researchers in this field, further reading, and exercises. This book cannot be completed without the encouragement, support and suggestions from families, colleagues and friends. The authors appreciate feedback from readers to further improve the content of the book.
comment: "Multi-Agent System Control", 252 pages; in Vietnamese language, 82 figures
LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis
Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consistent mental-health assessment. Progress in AI-assisted psychiatric diagnosis is constrained by the absence of benchmarks that simultaneously provide realistic patient simulation, clinician-verified diagnostic labels, and support for dynamic multi-turn consultation. We present LingxiDiagBench, a large-scale multi-agent benchmark that evaluates LLMs on both static diagnostic inference and dynamic multi-turn psychiatric consultation in Chinese. At its core is LingxiDiag-16K, a dataset of 16,000 EMR-aligned synthetic consultation dialogues designed to reproduce real clinical demographic and diagnostic distributions across 12 ICD-10 psychiatric categories. Through extensive experiments across state-of-the-art LLMs, we establish key findings: (1) although LLMs achieve high accuracy on binary depression--anxiety classification (up to 92.3%), performance deteriorates substantially for depression--anxiety comorbidity recognition (43.0%) and 12-way differential diagnosis (28.5%); (2) dynamic consultation often underperforms static evaluation, indicating that ineffective information-gathering strategies significantly impair downstream diagnostic reasoning; (3) consultation quality assessed by LLM-as-a-Judge shows only moderate correlation with diagnostic accuracy, suggesting that well-structured questioning alone does not ensure correct diagnostic decisions. We release LingxiDiag-16K and the full evaluation framework to support reproducible research at https://github.com/Lingxi-mental-health/LingxiDiagBench.
CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment
Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty, and operate without guaranteed success. Existing LLM-based offensive agent evaluations rely on closed-world settings with predefined goals and binary success criteria. To address this gap, we introduce CyberExplorer, an evaluation suite with two core components: (1) an open-environment benchmark built on a virtual machine hosting 40 vulnerable web services derived from real-world CTF challenges, where agents autonomously perform reconnaissance, target selection, and exploitation without prior knowledge of vulnerability locations; and (2) a reactive multi-agent framework supporting dynamic exploration without predefined plans. CyberExplorer enables fine-grained evaluation beyond flag recovery, capturing interaction dynamics, coordination behavior, failure modes, and vulnerability discovery signals-bridging the gap between benchmarks and realistic multi-target attack scenarios.
Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis AAMAS'25
Climate policy development faces significant challenges due to deep uncertainty, complex system dynamics, and competing stakeholder interests. Climate simulation methods, such as Earth System Models, have become valuable tools for policy exploration. However, their typical use is for evaluating potential polices, rather than directly synthesizing them. The problem can be inverted to optimize for policy pathways, but the traditional optimization approaches often struggle with non-linear dynamics, heterogeneous agents, and comprehensive uncertainty quantification. We propose a framework for augmenting climate simulations with Multi-Agent Reinforcement Learning (MARL) to address these limitations. We identify key challenges at the interface between climate simulations and the application of MARL in the context of policy synthesis, including reward definition, scalability with increasing agents and state spaces, uncertainty propagation across linked systems, and solution validation. Additionally, we discuss challenges in making MARL-derived solutions interpretable and useful for policy-makers. Our framework provides a foundation for more sophisticated climate policy exploration while acknowledging important limitations and areas for future research.
comment: Published in AAMAS'25 Blue Sky Ideas Track
Agentifying Agentic AI
Agentic AI seeks to endow systems with sustained autonomy, reasoning, and interaction capabilities. To realize this vision, its assumptions about agency must be complemented by explicit models of cognition, cooperation, and governance. This paper argues that the conceptual tools developed within the Autonomous Agents and Multi-Agent Systems (AAMAS) community, such as BDI architectures, communication protocols, mechanism design, and institutional modelling, provide precisely such a foundation. By aligning adaptive, data-driven approaches with structured models of reasoning and coordination, we outline a path toward agentic systems that are not only capable and flexible, but also transparent, cooperative, and accountable. The result is a perspective on agency that bridges formal theory and practical autonomy.
comment: 10 pages; 1 figure
Modelling Socio-Psychological Drivers of Land Management Intensity
Land management intensity shapes ecosystem service provision, socio-ecological resilience and is central to sustainable transformation. Yet most land use models emphasise economic and biophysical drivers, while socio-psychological factors influencing land managers' decisions remain underrepresented despite increasing evidence that they shape land management choices. To address this gap, we develop a generic behavioural extension for agent-based land use models, guided by the Theory of Planned Behaviour as an overarching conceptual framework. The extension integrates environmental attitudes, descriptive social norms and behavioural inertia into land managers' decisions on land management intensity. To demonstrate applicability, the extension is coupled to an existing land use modelling framework and explored in stylised settings to isolate behavioural mechanisms. Results show that socio-psychological drivers can significantly alter land management intensity shares, landscape configuration, and ecosystem service provision. Nonlinear feedbacks between these drivers, spatial resource heterogeneity, and ecosystem service demand lead to emergent dynamics that are sometimes counter-intuitive and can diverge from the agent-level decision rules. Increasing the influence of social norms generates spatial clustering and higher landscape connectivity, while feedbacks between behavioural factors can lead to path dependence, lock-in effects, and the emergence of multiple stable regimes with sharp transitions. The proposed framework demonstrates how even low levels of behavioural diversity and social interactions can reshape system-level land use outcomes and provides a reusable modelling component for incorporating socio-psychological processes into land use simulations. The approach can be integrated into other agent-based land use models and parameterised empirically in future work.
comment: 34 pages (including references and appendix), 27 figures. Submitted to JASSS
Virtual Force-Based Routing of Modular Agents on a Graph
Modular vehicles present a novel area of academic and industrial interest in the field of multi-agent systems. Modularity allows vehicles to connect and disconnect with each other mid-transit which provides a balance between efficiency and flexibility when solving complex and large scale tasks in urban or aerial transportation. This paper details a generalized scheme to route multiple modular agents on a graph to a predetermined set of target nodes. The objective is to visit all target nodes while incurring minimum resource expenditure. Agents that are joined together will incur the equivalent cost of a single agent, which is motivated by the logistical benefits of traffic reduction and increased fuel efficiency. To solve this problem, we introduce a novel algorithm that seeks to balance the optimality of the path that every single module takes and the cost benefit of joining modules. Our approach models the agents and targets as point charges, where the modules take the path of highest attractive force from its target node and neighboring agents. We validate our approach by simulating multiple modular agents along real-world transportation routes in the road network of Champaign-Urbana, Illinois, USA. The proposed method easily exceeds the available benchmarks and illustrates the benefits of modularity in multi-target planning problems.
Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms AAMAS 2026
As intelligent agents become more generally-capable, i.e. able to master a wide variety of tasks, the complexity and cost of properly evaluating them rises significantly. Tasks that assess specific capabilities of the agents can be correlated and stochastic, requiring many samples for accurate comparisons, leading to added costs. In this paper, we propose a formal definition and a conceptual framework for active evaluation of agents across multiple tasks, which assesses the performance of ranking algorithms as a function of number of evaluation data samples. Rather than curating, filtering, or compressing existing data sets as a preprocessing step, we propose an online framing: on every iteration, the ranking algorithm chooses the task and agents to sample scores from. Then, evaluation algorithms report a ranking of agents on each iteration and their performance is assessed with respect to the ground truth ranking over time. Several baselines are compared under different experimental contexts, with synthetic generated data and simulated online access to real evaluation data from Atari game-playing agents. We find that the classical Elo rating system -- while it suffers from well-known failure modes, in theory -- is a consistently reliable choice for efficient reduction of ranking error in practice. A recently-proposed method, Soft Condorcet Optimization, shows comparable performance to Elo on synthetic data and significantly outperforms Elo on real Atari agent evaluation. When task variation from the ground truth is high, selecting tasks based on proportional representation leads to higher rate of ranking error reduction.
comment: AAMAS 2026
The Specification Trap: Why Content-Based AI Value Alignment Cannot Produce Robust Alignment
I argue that content-based AI value alignment--any approach that treats alignment as optimizing toward a formal value-object (reward function, utility function, constitutional principles, or learned preference representation)--cannot, by itself, produce robust alignment under capability scaling, distributional shift, and increasing autonomy. This limitation arises from three philosophical results: Hume's is-ought gap (behavioral data cannot entail normative conclusions), Berlin's value pluralism (human values are irreducibly plural and incommensurable), and the extended frame problem (any value encoding will misfit future contexts that advanced AI creates). I show that RLHF, Constitutional AI, inverse reinforcement learning, and cooperative assistance games each instantiate this specification trap, and that their failure modes are structural, not engineering limitations. Proposed escape routes--continual updating, meta-preferences, moral realism--relocate the trap rather than exit it. Drawing on Fischer and Ravizza's compatibilist theory, I argue that behavioral compliance does not constitute alignment: there is a principled distinction between simulated value-following and genuine reasons-responsiveness, and specification-based methods cannot produce the latter. The specification trap establishes a ceiling on content-based approaches, not their uselessness--but this ceiling becomes safety-critical at the capability frontier. The alignment problem must be reframed from value specification to value emergence.
comment: ~6,000 words, 19 pages. First in a five-paper research program on AI alignment. Establishes structural ceiling on content-based approaches (RLHF, Constitutional AI, IRL, assistance games); does not claim alignment is impossible, claims robust alignment under scaling/shift/autonomy requires process-based alternatives
Convergence and Connectivity: Dynamics of Multi-Agent Q-Learning in Random Networks
Beyond specific settings, many multi-agent learning algorithms fail to converge to an equilibrium solution, instead displaying complex, non-stationary behaviours such as recurrent or chaotic orbits. In fact, recent literature suggests that such complex behaviours are likely to occur when the number of agents increases. In this paper, we study Q-learning dynamics in network polymatrix normal-form games where the network structure is drawn from classical random graph models. In particular, we focus on the Erdős-Rényi model, which is used to analyze connectivity in distributed systems, and the Stochastic Block model, which generalizes the above by accounting for community structures that naturally arise in multi-agent systems. In each setting, we establish sufficient conditions under which the agents' joint strategies converge to a unique equilibrium. We investigate how this condition depends on the exploration rates, payoff matrices and, crucially, the probabilities of interaction between network agents. We validate our theoretical findings through numerical simulations and demonstrate that convergence can be reliably achieved in many-agent systems, provided interactions in the network are controlled.
AOI: Context-Aware Multi-Agent Operations via Dynamic Scheduling and Hierarchical Memory Compression
The proliferation of cloud-native architectures, characterized by microservices and dynamic orchestration, has rendered modern IT infrastructures exceedingly complex and volatile. This complexity generates overwhelming volumes of operational data, leading to critical bottlenecks in conventional systems: inefficient information processing, poor task coordination, and loss of contextual continuity during fault diagnosis and remediation. To address these challenges, we propose AOI (AI-Oriented Operations), a novel multi-agent collaborative framework that integrates three specialized agents with an LLM-based Context Compressor. Its core innovations include: (1) a dynamic task scheduling strategy that adaptively prioritizes operations based on real-time system states, (2) a three-layer memory architecture comprising Working, Episodic, and Semantic layers that optimizes context retention and retrieval. Extensive experiments on synthetic and real-world benchmarks show that AOI achieves 72.4\% context compression while preserving 92.8\% critical information, improves task success to 94.2\%, and reduces MTTR by 34.4\% over the best baseline. This work presents a paradigm shift towards scalable, adaptive, and context-aware autonomous operations, enabling robust management of next-generation IT infrastructures with minimal human intervention.
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy - and communication - efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents both improving mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
comment: Accepted at the 2026 IEEE International Conference on Robotics and Automation
Systems and Control (EESS)
Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning
Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and scalable framework for optimistic exploration that brings classical reward-biased maximum likelihood estimation (RBMLE) from adaptive control into deep RL. In contrast to upper confidence bound (UCB)-style exploration methods, OWMs incorporate optimism directly into model learning by augmentation with an optimistic dynamics loss that biases imagined transitions toward higher-reward outcomes. This fully gradient-based loss requires neither uncertainty estimates nor constrained optimization. Our approach is plug-and-play with existing world model frameworks, preserving scalability while requiring only minimal modifications to standard training procedures. We instantiate OWMs within two state-of-the-art world model architectures, leading to Optimistic DreamerV3 and Optimistic STORM, which demonstrate significant improvements in sample efficiency and cumulative return compared to their baseline counterparts.
A Collaborative Safety Shield for Safe and Efficient CAV Lane Changes in Congested On-Ramp Merging
Lane changing in dense traffic is a significant challenge for Connected and Autonomous Vehicles (CAVs). Existing lane change controllers primarily either ensure safety or collaboratively improve traffic efficiency, but do not consider these conflicting objectives together. To address this, we propose the Multi-Agent Safety Shield (MASS), designed using Control Barrier Functions (CBFs) to enable safe and collaborative lane changes. The MASS enables collaboration by capturing multi-agent interactions among CAVs through interaction topologies constructed as a graph using a simple algorithm. Further, a state-of-the-art Multi-Agent Reinforcement Learning (MARL) lane change controller is extended by integrating MASS to ensure safety and defining a customised reward function to prioritise efficiency improvements. As a result, we propose a lane change controller, known as MARL-MASS, and evaluate it in a congested on-ramp merging simulation. The results demonstrate that MASS enables collaborative lane changes with safety guarantees by strictly respecting the safety constraints. Moreover, the proposed custom reward function improves the stability of MARL policies trained with a safety shield. Overall, by encouraging the exploration of a collaborative lane change policy while respecting safety constraints, MARL-MASS effectively balances the trade-off between ensuring safety and improving traffic efficiency in congested traffic. The code for MARL-MASS is available with an open-source licence at https://github.com/hkbharath/MARL-MASS
comment: Accepted in IEEE IV 2026
Safe Feedback Optimization through Control Barrier Functions
Feedback optimization refers to a class of methods that steer a control system to a steady state that solves an optimization problem. Despite tremendous progress on the topic, an important problem remains open: enforcing state constraints at all times. The difficulty in addressing it lies on mediating between the safety enforcement and the closed-loop stability, and ensuring the equivalence between closed-loop equilibria and the optimization problem's critical points. In this work, we present a feedback-optimization method that enforces state constraints at all times employing high-order control-barrier functions. We provide several results on the proposed controller dynamics, including well-posedness, safety guarantees, equivalence between equilibria and critical points, and local and global (in certain convex cases) asymptotic stability of optima. Various simulations illustrate our results.
Robust Macroscopic Density Control of Heterogeneous Multi-Agent Systems
Modern applications, such as orchestrating the collective behavior of robotic swarms or traffic flows, require the coordination of large groups of agents evolving in unstructured environments, where disturbances and unmodeled dynamics are unavoidable. In this work, we develop a scalable macroscopic density control framework in which a feedback law is designed directly at the level of an advection--diffusion partial differential equation. We formulate the control problem in the density space and prove global exponential convergence towards the desired behavior in $\mathcal{L}^2$ with guaranteed asymptotic rejection of bounded unknown drift terms, explicitly accounting for heterogeneous agent dynamics, unmodeled behaviors, and environmental perturbations. Our theoretical findings are corroborated by numerical experiments spanning heterogeneous oscillators, traffic systems, and swarm robotics in partially unknown environments.
Community-Centered Resilience Enhancement of Urban Power and Gas Networks via Microgrid Partitioning, Mobile Energy Storage, and Data-Driven Risk Assessment
Urban energy systems face increasing challenges due to high penetration of renewable energy sources, extreme weather events, and other high-impact, low-probability disruptions. This project proposes a community-centered, open-access framework to enhance the resilience and reliability of urban power and gas networks by integrating microgrid partitioning, mobile energy storage deployment, and data-driven risk assessment. The approach involves converting passive distribution networks into active, self-healing microgrids using distributed energy resources and remotely controlled switches to enable flexible reconfiguration during normal and emergency operations. To address uncertainties from intermittent renewable generation and variable load, an adjustable interval optimization method combined with a column and constraint generation algorithm is developed, providing robust planning solutions without requiring probabilistic information. Additionally, a real-time online risk assessment tool is proposed, leveraging 25 multi-dimensional indices including load, grid status, resilient resources, emergency response, and meteorological factors to support operational decision-making during extreme events. The framework also optimizes the long-term sizing and allocation of mobile energy storage units while incorporating urban traffic data for effective routing during emergencies. Finally, a novel time-dependent resilience and reliability index is introduced to quantify system performance under diverse operating conditions. The proposed methodology aims to enable resilient, efficient, and adaptable urban energy networks capable of withstanding high-impact disruptions while maximizing operational and economic benefits.
Differentiable Modeling for Low-Inertia Grids: Benchmarking PINNs, NODEs, and DP for Identification and Control of SMIB System
The transition toward low-inertia power systems demands modeling frameworks that provide not only accurate state predictions but also physically consistent sensitivities for control. While scientific machine learning offers powerful nonlinear modeling tools, the control-oriented implications of different differentiable paradigms remain insufficiently understood. This paper presents a comparative study of Physics-Informed Neural Networks (PINNs), Neural Ordinary Differential Equations (NODEs), and Differentiable Programming (DP) for modeling, identification, and control of power system dynamics. Using the Single Machine Infinite Bus (SMIB) system as a benchmark, we evaluate their performance in trajectory extrapolation, parameter estimation, and Linear Quadratic Regulator (LQR) synthesis. Our results highlight a fundamental trade-off between data-driven flexibility and physical structure. NODE exhibits superior extrapolation by capturing the underlying vector field, whereas PINN shows limited generalization due to its reliance on a time-dependent solution map. In the inverse problem of parameter identification, while both DP and PINN successfully recover the unknown parameters, DP achieves significantly faster convergence by enforcing governing equations as hard constraints. Most importantly, for control synthesis, the DP framework yields closed-loop stability comparable to the theoretical optimum. Furthermore, we demonstrate that NODE serves as a viable data-driven surrogate when governing equations are unavailable.
comment: 9 pages, 7 figures, 4 tables
Impact of Market Reforms on Deterministic Frequency Deviations in the European Power Grid
Deterministic frequency deviations (DFDs) are systematic and predictable excursions of grid frequency that arise from synchronized generation ramps induced by electricity market scheduling. In this paper, we analyze the impact of the European day-ahead market reform of 1 October 2025, which replaced hourly trading blocks with quarter-hourly blocks, on DFDs in the Central European synchronous area. Using publicly available frequency measurements, we compare periods before and after the reform based on daily frequency profiles, indicators characterizing frequency deviations, principal component analysis, Fourier-based functional data analysis, and power spectral density analysis. We show that the reform substantially reduces characteristic hourly frequency deviations and suppresses dominant spectral components at hourly and half-hourly time scales, while quarter-hourly structures gain relative importance. While the likelihood of large frequency deviations decreases overall, reductions for extreme events are less clear and depend on the metric used. Our results demonstrate that market design reforms can effectively mitigate systematic frequency deviations, but also highlight that complementary technical and regulatory measures are required to further reduce large frequency excursions in low-inertia power systems.
comment: 9 pages, 7 figures
ISO FastLane: Faster ISO 11783 with Dual Stack Approach as a Short Term Solution
The agricultural industry has been searching for a high-speed successor to the 250~kbit/s CAN bus backbone of ISO~11783 (ISOBUS) for over a decade, yet no protocol-level solution has reached standardization. Meanwhile, modern planters, sprayers, and Virtual Terminals are already constrained by the bus bandwidth. This paper presents ISO FastLane, a gateway-less dual-stack approach that routes point-to-point ISOBUS traffic over Ethernet while keeping broadcast messages on the existing CAN bus. The solution requires no new state machines, no middleware, and no changes to application layer code: only a simple Layer~3 routing decision and a lightweight peer discovery mechanism called Augmented Address Claim (AACL). Legacy devices continue to operate unmodified and unaware of FastLane traffic. Preliminary tests reported on the paper demonstrate that ISO FastLane accelerates Virtual Terminal object pool uploads by factor of 8 and sustains Task Controller message rates over 100 times beyond the current specification limit. Because ISO FastLane builds entirely on existing J1939 and ISO~11783 conventions, it can be implemented by ISOBUS engineers in a matter of weeks. This is delivering tangible performance gains today, without waiting for the long-term High Speed ISOBUS solution.
A General Formulation for the Teaching Assignment Problem: Computational Analysis Over a Real-World Dataset
The Teacher Assignment Problem is a combinatorial optimization problem that involves assigning teachers to courses while guaranteeing that all courses are covered, teachers do not teach too few or too many hours, teachers do not switch assigned courses too often and possibly teach the courses they favor. Typically the problem is solved manually, a task that requires several hours every year. In this work we present a mathematical formulation for the problem and an experimental evaluation of the model implemented using state-of-the-art SMT, CP, and MILP solvers. The implementations are tested over a real-world dataset provided by the Division of Systems and Control at Chalmers University of Technology, and produce teacher assignments with smaller workload deviation, a more even workload distribution among the teachers, and a lower number of switched courses.
UAV-Assisted 6G Communication Networks for Railways: Technologies, Applications, and Challenges
Unmanned Aerial Vehicles (UAVs) are crucial for advancing railway communication by offering reliable connectivity, adaptive coverage, and mobile edge services . This survey examines UAV-assisted approaches for 6G railway needs including ultra-reliable low-latency communication (URLLC) and integrated sensing and communication (ISAC). We cover railway channel models, reconfigurable intelligent surfaces (RIS), and UAV-assisted mobile edge computing (MEC). Key challenges include coexistence with existing systems, handover management, Doppler effect, and security. The roadmap suggests work on integrated communication-control systems and AI-driven optimization for intelligent railway networks.
comment: 5 pages , 2 figures accepted to the INNOVARail Conference 2026
QoS Identifier and Slice Mapping in 5G and Non-Terrestrial Network Interconnected Systems
The interconnection of 5G and non-terrestrial networks (NTNs) has been actively studied to expand connectivity beyond conventional terrestrial infrastructure. In the 3GPP standardization of 5G systems, the 5G Quality of Service (QoS) Identifier (5QI) is defined to characterize the QoS requirements of different traffic requirements. However, it falls short in capturing the diverse latency, capacity, and reliability profiles of NTN environments, particularly when NTNs are used as backhaul. Furthermore, it is difficult to manage individual traffic flows and perform efficient resource allocation and routing when a large number of 5G traffic flows are present in NTN systems. To address these challenges, we propose an optimization framework that enhances QoS handling by introducing an NTN QoS Identifier (NQI) and grouping 5G traffic into NTN slices based on similar requirements. This enables unified resource control and routing for a large number of 5G flows in NTN systems. In this paper, we present the detailed procedure of the proposed framework, which consists of 5QI to NQI mapping, NTN traffic to NTN slice mapping, and slice-level flow and routing optimization. We evaluate the framework by comparing multiple mapping schemes through numerical simulations and analyze their impact on overall network performance.
XLB: A High Performance Layer-7 Load Balancer for Microservices using eBPF-based In-kernel Interposition
L7 load balancers are a fundamental building block in microservices as they enable fine-grained traffic distribution. Compared to monolithic applications, microservices demand higher performance and stricter isolation from load balancers. This is due to the increased number of instances, longer service chains, and the necessity for co-location with services on the same host. Traditional sidecar-based load balancers are ill-equipped to meet these demands, often resulting in significant performance degradation. In this work, we present XLB, a novel architecture that reshapes L7 load balancers as in-kernel interposition operating on the socket layer. We leverage eBPF to implement the core load balancing logic in the kernel, and address the connection management and state maintenance challenges through novel socket layer redirection and nested eBPF maps designs. XLB eliminates the extra overhead of scheduling, communication, and data movement, resulting in a more lightweight, scalable, and efficient L7 load balancer architecture. Compared to the widely used microservices load balancers (Istio and Cilium), over 50 microservice instances, XLB achieves up to 1.5x higher throughput and 60% lower end-to-end latency.
First-order friction models with bristle dynamics: lumped and distributed formulations
Dynamic models, particularly rate-dependent models, have proven effective in capturing the key phenomenological features of frictional processes, whilst also possessing important mathematical properties that facilitate the design of control and estimation algorithms. However, many rate-dependent formulations are built on empirical considerations, whereas physical derivations may offer greater interpretability. In this context, starting from fundamental physical principles, this paper introduces a novel class of first-order dynamic friction models that approximate the dynamics of a bristle element by inverting the friction characteristic. Amongst the developed models, a specific formulation closely resembling the LuGre model is derived using a simple rheological equation for the bristle element. This model is rigorously analyzed in terms of stability and passivity -- important properties that support the synthesis of observers and controllers. Furthermore, a distributed version, formulated as a hyperbolic partial differential equation (PDE), is presented, which enables the modeling of frictional processes commonly encountered in rolling contact phenomena. The tribological behavior of the proposed description is evaluated through classical experiments and validated against the response predicted by the LuGre model, revealing both notable similarities and key differences.
comment: 15 pages, 9 figures. Under review at IEEE Transactions on Control Systems Technology
Lateral tracking control of all-wheel steering vehicles with intelligent tires
The accurate characterization of tire dynamics is critical for advancing control strategies in autonomous road vehicles, as tire behavior significantly influences handling and stability through the generation of forces and moments at the tire-road interface. Smart tire technologies have emerged as a promising tool for sensing key variables such as road friction, tire pressure, and wear states, and for estimating kinematic and dynamic states like vehicle speed and tire forces. However, most existing estimation and control algorithms rely on empirical correlations or machine learning approaches, which require extensive calibration and can be sensitive to variations in operating conditions. In contrast, model-based techniques, which leverage infinite-dimensional representations of tire dynamics using partial differential equations (PDEs), offer a more robust approach. This paper proposes a novel model-based, output-feedback lateral tracking control strategy for all-wheel steering vehicles that integrates distributed tire dynamics with smart tire technologies. The primary contributions include the suppression of micro-shimmy phenomena at low speeds and path-following via force control, achieved through the estimation of tire slip angles, vehicle kinematics, and lateral tire forces. The proposed controller and observer are based on formulations using ODE-PDE systems, representing rigid body dynamics and distributed tire behavior. This work marks the first rigorous control strategy for vehicular systems equipped with distributed tire representations in conjunction with smart tire technologies.
comment: 17 pages, 12 figures. Under review at IEEE Transactions on Intelligent Vehicles
Finite-time Stable Pose Estimation on TSE(3) using Point Cloud and Velocity Sensors
This work presents a finite-time stable pose estimator (FTS-PE) for rigid bodies undergoing rotational and translational motion in three dimensions, using measurements from onboard sensors that provide position vectors to inertially-fixed points and body velocities. The FTS-PE is a full-state observer for the pose (position and orientation) and velocities and is obtained through a Lyapunov analysis that shows its stability in finite time and its robustness to bounded measurement noise. Further, this observer is designed directly on the state space, the tangent bundle of the Lie group of rigid body motions, SE(3), without using local coordinates or (dual) quaternion representations. Therefore, it can estimate arbitrary rigid body motions without encountering singularities or the unwinding phenomenon and be readily applied to autonomous vehicles. A version of this observer that does not need translational velocity measurements and uses only point clouds and angular velocity measurements from rate gyros, is also obtained. It is discretized using the framework of geometric mechanics for numerical and experimental implementations. The numerical simulations compare the FTS-PE with a dual-quaternion extended Kalman filter and our previously developed variational pose estimator (VPE). The experimental results are obtained using point cloud images and rate gyro measurements obtained from a Zed 2i stereo depth camera sensor. These results validate the stability and robustness of the FTS-PE.
comment: 17 pages, 8 figures, submitted to Automatica
Escaping Local Minima: A Finite-Time Markov Chain Analysis of Constant-Temperature Simulated Annealing
Simulated Annealing (SA) is a widely used stochastic optimization algorithm, yet much of its theoretical understanding is limited to asymptotic convergence guarantees or general spectral bounds. In this paper, we develop a finite-time analytical framework for constant-temperature SA by studying a piecewise linear cost function that permits exact characterization. We model SA as a discrete-state Markov chain and first derive a closed-form expression for the expected time to escape a single linear basin in a one-dimensional landscape. We show that this expression also accurately predicts the behavior of continuous-state searches up to a constant scaling factor, which we analyze empirically and explain via variance matching, demonstrating convergence to a factor of sqrt(3) in certain regimes. We then extend the analysis to a two-basin landscape containing a local and a global optimum, obtaining exact expressions for the expected time to reach the global optimum starting from the local optimum, as a function of basin geometry, neighborhood radius, and temperature. Finally, we demonstrate how the predicted basin escape time can be used to guide the design of a simple two-temperature switching strategy.
comment: 6 pages, 10 figures
Efficient Policy Adaptation for Voltage Control Under Unknown Topology Changes SC
Reinforcement learning (RL) has shown great potential for designing voltage control policies, but their performance often degrades under changing system conditions such as topology reconfigurations and load variations. We introduce a topology-aware online policy optimization framework that leverages data-driven estimation of voltage-reactive power sensitivities to achieve efficient policy adaptation. Exploiting the sparsity of topology-switching events, where only a few lines change at a time, our method efficiently detects topology changes and identifies the affected lines and parameters, enabling fast and accurate sensitivity updates without recomputing the full sensitivity matrix. The estimated sensitivity is subsequently used for online policy optimization of a pre-trained neural-network-based RL controller. Simulations on both the IEEE 13-bus and SCE 56-bus systems demonstrate over 90 percent line identification accuracy, using only 15 data points. The proposed method also significantly improves voltage regulation performance compared with non-adaptive policies and adaptive policies that rely on regression-based online optimization methods for sensitivity estimation.
comment: 11 pages, PSCC 2026
Device Applications of Heterogeneously Integrated Strain-Switched Ferrimagnets/Topological Insulator/Piezoelectric Stacks
A family of ferrimagnets (CoV2O4, GdCo, TbCo) exhibits out-of-plane magnetic anisotropy when strained compressively and in-plane magnetic anisotropy when strained expansively (or vice versa). If such a ferrimagnetic thin film is placed on top of a topological insulator (TI) thin film and its magnetic anisotropy is modulated with strain, then interfacial exchange coupling between the ferrimagnet (FM) and the underlying TI will modulate the surface current flowing through the latter. If the strain is varied continuously, the current will also vary continuously and if the strain alternates in time, the current will also alternate with the frequency of the strain modulation, as long as the frequency is not so high that the period is smaller than the switching time of the FM. If the strain is generated with a gate voltage by integrating a piezoelectric underneath the FM/TI stack, then that can implement a transconductance amplifier or a synapse for neuromorphic computation.
Non-Fungible Blockchain Tokens for Traceable Online-Quality Assurance of Milled Workpieces
This work presents a concept and implementation for the secure storage and transfer of quality-relevant data of milled workpieces from online-quality assurance processes enabled by real-time simulation models. It utilises Non-Fungible Tokens (NFT) to securely and interoperably store quality data in the form of an Asset Administration Shell (AAS) on a public Ethereum blockchain. Minted by a custom smart contract, the NFTs reference the metadata saved in the Interplanetary File System (IPFS), allowing new data from additional processing steps to be added in a flexible yet secure manner. The concept enables automated traceability throughout the value chain, minimising the need for time-consuming and costly repetitive manual quality checks.
Limits of Residual-Based Detection for Physically Consistent False Data Injection
False data injection attacks (FDIAs) pose a persistent challenge to AC power system state estimation. In current practice, detection relies primarily on topology-aware residual-based tests that assume malicious measurements can be distinguished from normal operation through physical inconsistency reflected in abnormal residual behavior. This paper shows that this assumption does not always hold: when FDIA scenarios produce manipulated measurements that remain on the measurement manifold induced by AC power flow relations and measurement redundancy, residual-based detectors may fail to distinguish them from nominal data. The resulting detectability limitation is a property of the measurement manifold itself and does not depend on the attacker's detailed knowledge of the physical system model. To make this limitation observable in practice, we present a data-driven constructive mechanism that incorporates the generic functional structure of AC power flow to generate physically consistent, manifold-constrained perturbations, providing a concrete witness of how residual-based detectors can be bypassed. Numerical studies on multiple AC test systems characterize the conditions under which detection becomes challenging and illustrate its failure modes. The results highlight fundamental limits of residual-based detection in AC state estimation and motivate the need for complementary defenses beyond measurement consistency tests.
comment: 10 pages, 10 figures
Linear Quadratic Control with Non-Markovian and Non-Semimartingale Noise Models
The standard linear quadratic Gaussian (LQG) framework assumes a Brownian noise process and relies on classical stochastic calculus tools, such as those based on Itô calculus. In this paper, we solve a generalized linear quadratic optimal control problem where the process and measurement noises can be non-Markovian and non-semimartingale stochastic processes with sample paths that have low Hölder regularity. Since these noise models do not, in general, permit the use of the standard Itô calculus, we employ rough path theory to formulate and solve the problem. By leveraging signature representations and controlled rough paths, we derive the optimal state estimation and control strategies.
NLoS Localization with Single Base Station Based on Radio Map
Accurate outdoor localization in Non-Line-of-Sight (NLoS) environments remains a critical challenge for wireless communication and sensing systems. Existing methods, including positioning based on the Global Navigation Satellite System (GNSS) and triple Base Stations (BSs) techniques, cannot provide reliable performance under NLoS conditions, particularly in dense urban areas with strong multipath effects. To address this limitation, we propose a single BS localization framework that integrates sequential signal measurements with prior radio information embedded in the Radio Map (RM). Using temporal measurement features and matching them with radio maps, the proposed method effectively mitigates the adverse impact of multipath propagation and reduces the dependence on LoS paths. Simulation experiments further evaluate the impact of different radio map construction strategies and the varying lengths of the measurement sequence on localization accuracy. Results demonstrate that the proposed scheme achieves sub-meter positioning accuracy in typical NLoS environments, highlighting its potential as a practical and robust solution for single-base-station deployment.
comment: The manuscript lacks a complete description of the radio map generation process, which is foundational to the localization method. We believe it is necessary to withdraw the current version to prevent the dissemination of misleading results. A corrected version will be submitted as a replacement once the issue is resolved
Predictive Modeling of Power Outages during Extreme Events: Integrating Weather and Socio-Economic Factors
This paper presents a novel learning based framework for predicting power outages caused by extreme events. The proposed approach targets low-probability high-consequence outage scenarios and leverages a comprehensive set of features derived from publicly available data sources. We integrate EAGLE-I outage records from 2014 to 2024 with weather, socioeconomic, infrastructure, and seasonal event data. Incorporating social and demographic indicators reveals patterns of community vulnerability and improves understanding of outage risk during extreme conditions. Four machine learning models are evaluated, including Random Forest (RF), Graph Neural Network (GNN), Adaptive Boosting (AdaBoost), and Long Short-Term Memory (LSTM). Experimental validation is performed on a large-scale dataset covering counties in the lower peninsula of Michigan. Among all models tested, the LSTM network achieves higher accuracy.
Model Predictive Control via Probabilistic Inference: A Tutorial and Survey
This paper provides a tutorial and a survey of probabilistic inference-based model predictive control (PI-MPC) for robotics. PI-MPC defines an optimal control distribution shaped by trajectory cost, a control prior, and a temperature parameter. In the tutorial part, we derive this view and describe action generation via variational inference, highlighting Model Predictive Path Integral (MPPI) control as a representative algorithm. In the survey part, we organize the PI-MPC literature around the principal design choices identified in the tutorial: prior distribution design, multi-modal distribution handling, constraint satisfaction, scalability to high-degree-of-freedom robots, hardware acceleration, and theoretical foundations. Overall, this paper aims to serve as a coherent entry point for researchers and practitioners interested in understanding, implementing, and extending PI-MPC.
comment: 26 pages, 7 figures
karl. -- A Research Vehicle for Automated and Connected Driving
As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g., Waymo), automated driving and connected cooperative intelligent transport systems (C-ITS) remain active fields of research. Even though simulation is omnipresent in the development and validation life cycle of automated and connected driving technology, the complex nature of public road traffic and software that masters it still requires real-world integration and testing with actual vehicles. Dedicated vehicles for research and development allow testing and validation of software and hardware components under real-world conditions early on. They also enable collecting and publishing real-world datasets that let others conduct research without vehicle access, and support early demonstration of futuristic use cases. In this paper, we present karl., our new research vehicle for automated and connected driving. Apart from major corporations, few institutions worldwide have access to their own L4-capable research vehicles, restricting their ability to carry out independent research. This paper aims to help bridge that gap by sharing the reasoning, design choices, and technical details that went into making karl. a flexible and powerful platform for research, engineering, and validation in the context of automated and connected driving. More impressions of karl. are available at https://karl.ac.
comment: 8 pages; Accepted to be published as part of the 37th Intelligent Vehicles Symposium (IV), Detroit, MI, United States, June 22-25, 2026
Coherent Load Profile Synthesis with Conditional Diffusion for LV Distribution Network Scenario Generation
Limited visibility of distribution network power flows at the low voltage level presents challenges to both distribution network operators from a planning perspective and distribution system operators from a congestion management perspective. More representative loads are required to support meaningful analysis of LV substations; otherwise, such analysis risks misinforming future decisions. Traditional load profiling relies on typical profiles, oversimplifying substation-level complexity. Generative models have attempted to address this through synthesising representative loads from historical exemplars; however, while these approaches can approximate load shapes to a convincing degree of fidelity, analysis of the co-behaviour between substations is limited, which ultimately impacts higher voltage level network operation. This limitation will become even more pronounced with the increasing integration of low-carbon technologies, as estimates of base loads fail to capture load diversity. To address this gap, Conditional Diffusion models for synthesising daily active and reactive power profiles at the low voltage distribution substation level are proposed. The evaluation of fidelity is demonstrated through conventional metrics capturing temporal and statistical realism, as well as power flow modelling. Multiple models are proposed to handle varying levels of data availability, ranging from unconditional synthesis to an informed generation driven by metadata and daily statistics. The results show synthesised load profiles are plausible both independently and as a cohort in a wider power systems context. The Conditional Diffusion model is benchmarked against both naive and state-of-the-art models to demonstrate its effectiveness in producing realistic scenarios on which to base sub-regional power distribution network planning and operations.
Optimal Control of Behavioral-Feedback SIR Epidemic Model
We consider a behavioral-feedback SIR epidemic model, in which the infection rate depends in feedback on the fractions of susceptible and infected agents, respectively. The considered model allows one to account for endogenous adaptation mechanisms of the agents in response to the epidemics, such as voluntary social distancing, or the adoption of face masks. For this model, we formulate an optimal control problem for a social planner that has the ability to reduce the infection rate to keep the infection curve below a certain threshold within an infinite time horizon, while minimizing the intervention cost. Based on the dynamic properties of the model, we prove that, under quite general conditions on the infection rate, the filling the box strategy is the optimal control. This strategy consists in letting the epidemics spread without intervention until the threshold is reached, then applying the minimum control that leaves the fraction of infected individuals constantly at the threshold until the reproduction number becomes less than one and the infection naturally fades out. Our result generalizes one available in the literature for the equivalent problem formulated for the classical SIR model, which can be recovered as a special case of our model when the infection rate is constant. Our contribution enhances the understanding of epidemic management with adaptive human behavior, offering insights for robust containment strategies.
comment: 15 pages, 3 figures
Co-Investment with Payoff-Sharing Mechanism for Cooperative Decision-Making in Network Design Games
Network-based systems are inherently interconnected, with the design and performance of subnetworks being interdependent. However, the decisions of self-interested operators may lead to suboptimal outcomes for users and the overall system. This paper explores cooperative mechanisms that can simultaneously benefit both operators and users. We address this challenge using a game-theoretical framework that integrates both non-cooperative and cooperative game theory. In the non-cooperative stage, we propose a network design game in which subnetwork decision-makers strategically design local infrastructures. In the cooperative stage, co-investment with payoff-sharing mechanism is developed to enlarge collective benefits and fairly distribute them. To demonstrate the effectiveness of our framework, we conduct case studies on the Sioux Falls network and real-world public transport networks in Zurich and Winterthur, Switzerland. Our evaluation considers impacts on environmental sustainability, social welfare, and economic efficiency. The proposed framework provides a foundation for improving interdependent networked systems by enabling strategic cooperation among self-interested operators.
Semi-on-Demand Hybrid Transit Route Design with Shared Autonomous Mobility Services
Shared Autonomous Vehicles (SAVs) enable transit agencies to design more agile and responsive services at lower operating costs. This study designs and evaluates a semi-on-demand hybrid route directional service in the public transit network, offering on-demand flexible route service in low-density areas and fixed route service in higher-density areas. We develop analytically tractable cost expressions that capture access, waiting, and riding costs for users, and distance-based operating and time-based vehicle costs for operators. Two formulations are presented for strategic and tactical decisions in flexible route portion, fleet size, headway, and vehicle size optimization, enabling the determination of route types between fixed, hybrid, and flexible routes based on demand, cost, and operational parameters. Analytical results demonstrate that the lower operating costs of SAVs favor more flexible route services. The practical applications and benefits of semi-on-demand feeders are presented with numerical examples and a large-scale case study in the Chicago metropolitan area, USA. Findings reveal scenarios in which flexible route portions serving passengers located further away reduce total costs, particularly user costs, whereas higher demand densities favor more traditional line-based operations. Current cost forecasts suggest smaller vehicles with fully flexible routes are optimal, but operating constraints or higher operating costs would favor larger vehicles with hybrid routes. The study provides an analytical tool to design SAVs as directional services and transit feeders, and tractable continuous approximation formulations for planning and research in transit network design.
comment: 29 pages, 13 figures, previous version presented at the 103rd Transportation Research Board Annual Meeting, Washington, D.C
karl. - A Research Vehicle for Automated and Connected Driving
As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g., Waymo), automated driving and connected cooperative intelligent transport systems (C-ITS) remain active fields of research. Even though simulation is omnipresent in the development and validation life cycle of automated and connected driving technology, the complex nature of public road traffic and software that masters it still requires real-world integration and testing with actual vehicles. Dedicated vehicles for research and development allow testing and validation of software and hardware components under real-world conditions early on. They also enable collecting and publishing real-world datasets that let others conduct research without vehicle access, and support early demonstration of futuristic use cases. In this paper, we present karl., our new research vehicle for automated and connected driving. Apart from major corporations, few institutions worldwide have access to their own L4-capable research vehicles, restricting their ability to carry out independent research. This paper aims to help bridge that gap by sharing the reasoning, design choices, and technical details that went into making karl. a flexible and powerful platform for research, engineering, and validation in the context of automated and connected driving. More impressions of karl. are available at https://karl.ac.
comment: 8 pages; Accepted to be published as part of the 37th Intelligent Vehicles Symposium (IV), Detroit, MI, United States, June 22-25, 2026
Metareasoning in uncertain environments: a meta-BAMDP framework
\textit{Reasoning} may be viewed as an algorithm $P$ that makes a choice of an action $a^* \in \mathcal{A}$, aiming to optimize some outcome. However, executing $P$ itself bears costs (time, energy, limited capacity, etc.) and needs to be considered alongside explicit utility obtained by making the choice in the underlying decision problem. Finding the right $P$ can itself be framed as an optimization problem over the space of reasoning processes $P$, generally referred to as \textit{metareasoning}. Conventionally, human metareasoning models assume that the agent knows the transition and reward distributions of the underlying MDP. This paper generalizes such models by proposing a meta Bayes-Adaptive MDP (meta-BAMDP) framework to handle metareasoning in environments with unknown reward/transition distributions, which encompasses a far larger and more realistic set of planning problems that humans and AI systems face. As a first step, we apply the framework to Bernoulli bandit tasks. Owing to the meta problem's complexity, our solutions are necessarily approximate. However, we introduce two novel theorems that significantly enhance the tractability of the problem, enabling stronger approximations that are robust within a range of assumptions grounded in realistic human decision-making scenarios. These results offer a resource-rational perspective and a normative framework for understanding human exploration under cognitive constraints, as well as providing experimentally testable predictions about human behavior in Bernoulli Bandit tasks.
Multi-Momentum Observer Contact Estimation for Bipedal Robots
As bipedal robots become more and more popular in commercial and industrial settings, the ability to control them with a high degree of reliability is critical. To that end, this paper considers how to accurately estimate which feet are currently in contact with the ground so as to avoid improper control actions that could jeopardize the stability of the robot. Additionally, modern algorithms for estimating the position and orientation of a robot's base frame rely heavily on such contact mode estimates. Dedicated contact sensors on the feet can be used to estimate this contact mode, but these sensors are prone to noise, time delays, damage/yielding from repeated impacts with the ground, and are not available on every robot. To overcome these limitations, we propose a momentum observer based method for contact mode estimation that does not rely on such contact sensors. Often, momentum observers assume that the robot's base frame can be treated as an inertial frame. However, since many humanoids' legs represent a significant portion of the overall mass, the proposed method instead utilizes multiple simultaneous dynamic models. Each of these models assumes a different contact condition. A given contact assumption is then used to constrain the full dynamics in order to avoid assuming that either the body is an inertial frame or that a fully accurate estimate of body velocity is known. The (dis)agreement between each model's estimates and measurements is used to determine which contact mode is most likely using a Markov-style fusion method. The proposed method produces contact detection accuracy of up to 98.44% with a low noise simulation and 77.12% when utilizing data collect on the Sarcos Guardian XO robot (a hybrid humanoid/exoskeleton).
Mitigating Dynamic Tip-Over during Mobile Crane Slewing using Input Shaping
Payload swing during rapid slewing of mobile cranes poses a safety risk, as it generates overturning moments that can lead to tip-over accidents of mobile cranes. Currently, to limit the risk of tip-over, mobile crane operators are forced to either reduce the slewing speed (which lowers productivity) or reduce the load being carried to reduce the induced moments. Both of these approaches reduce productivity. This paper seeks to enable rapid slewing without compromising safety by applying input shaping to the crane-slewing commands generated by the operator. A key advantage of this approach is that the input shaper requires only the information about the rope length, and does not require detailed mobile crane dynamics. Simulations and experiments show that the proposed method reduces residual payload swing and enables significantly higher slewing speeds without tip over, reducing slewing completion time by at least 38% compared to unshaped control. Human control with input shaping improves task completion time by 13%, reduces the peak swing by 18%, and reduces the potential of collisions by 82% when compared to unshaped control. Moreover, shaped control with a human had no tip-over, whereas large swing led to tip-over without input shaping. Thereby, the proposed method substantially recovers the operational-safety envelope of mobile cranes (designed to avoid tip-over using static analysis) that would otherwise be lost in dynamic conditions. Videos and demonstrations are available at https://youtu.be/dVy3bbIhrBU.
comment: Submitted to conference
Long-Term Mapping of the Douro River Plume with Multi-Agent Reinforcement Learning
We study the problem of long-term (multiple days) mapping of a river plume using multiple autonomous underwater vehicles (AUVs), focusing on the Douro river representative use-case. We propose an energy - and communication - efficient multi-agent reinforcement learning approach in which a central coordinator intermittently communicates with the AUVs, collecting measurements and issuing commands. Our approach integrates spatiotemporal Gaussian process regression (GPR) with a multi-head Q-network controller that regulates direction and speed for each AUV. Simulations using the Delft3D ocean model demonstrate that our method consistently outperforms both single- and multi-agent benchmarks, with scaling the number of agents both improving mean squared error (MSE) and operational endurance. In some instances, our algorithm demonstrates that doubling the number of AUVs can more than double endurance while maintaining or improving accuracy, underscoring the benefits of multi-agent coordination. Our learned policies generalize across unseen seasonal regimes over different months and years, demonstrating promise for future developments of data-driven long-term monitoring of dynamic plume environments.
comment: Accepted at the 2026 IEEE International Conference on Robotics and Automation
Robotics
TwinRL-VLA: Digital Twin-Driven Reinforcement Learning for Real-World Robotic Manipulation
Despite strong generalization capabilities, Vision-Language-Action (VLA) models remain constrained by the high cost of expert demonstrations and insufficient real-world interaction. While online reinforcement learning (RL) has shown promise in improving general foundation models, applying RL to VLA manipulation in real-world settings is still hindered by low exploration efficiency and a restricted exploration space. Through systematic real-world experiments, we observe that the effective exploration space of online RL is closely tied to the data distribution of supervised fine-tuning (SFT). Motivated by this observation, we propose TwinRL, a digital twin-real-world collaborative RL framework designed to scale and guide exploration for VLA models. First, a high-fidelity digital twin is efficiently reconstructed from smartphone-captured scenes, enabling realistic bidirectional transfer between real and simulated environments. During the SFT warm-up stage, we introduce an exploration space expansion strategy using digital twins to broaden the support of the data trajectory distribution. Building on this enhanced initialization, we propose a sim-to-real guided exploration strategy to further accelerate online RL. Specifically, TwinRL performs efficient and parallel online RL in the digital twin prior to deployment, effectively bridging the gap between offline and online training stages. Subsequently, we exploit efficient digital twin sampling to identify failure-prone yet informative configurations, which are used to guide targeted human-in-the-loop rollouts on the real robot. In our experiments, TwinRL approaches 100% success in both in-distribution regions covered by real-world demonstrations and out-of-distribution regions, delivering at least a 30% speedup over prior real-world RL methods and requiring only about 20 minutes on average across four tasks.
$χ_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
High-reliability long-horizon robotic manipulation has traditionally relied on large-scale data and compute to understand complex real-world dynamics. However, we identify that the primary bottleneck to real-world robustness is not resource scale alone, but the distributional shift among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution -- a systematic inconsistency that causes compounding errors in multi-stage tasks. To mitigate these inconsistencies, we propose $χ_{0}$, a resource-efficient framework with effective modules designated to achieve production-level robustness in robotic manipulation. Our approach builds off three technical pillars: (i) Model Arithmetic, a weight-space merging strategy that efficiently soaks up diverse distributions of different demonstrations, varying from object appearance to state variations; (ii) Stage Advantage, a stage-aware advantage estimator that provides stable, dense progress signals, overcoming the numerical instability of prior non-stage approaches; and (iii) Train-Deploy Alignment, which bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. $χ_{0}$ enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation, spanning tasks from flattening, folding, to hanging different clothes. Our method exhibits high-reliability autonomy; we are able to run the system from arbitrary initial state for consecutive 24 hours non-stop. Experiments validate that $χ_{0}$ surpasses the state-of-the-art $π_{0.5}$ in success rate by nearly 250%, with only 20-hour data and 8 A100 GPUs. Code, data and models will be released to facilitate the community.
Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving
Out of distribution (OOD) robustness in autonomous driving is often reduced to a single number, hiding what breaks a policy. We decompose environments along five axes: scene (rural/urban), season, weather, time (day/night), and agent mix; and measure performance under controlled $k$-factor perturbations ($k \in \{0,1,2,3\}$). Using closed loop control in VISTA, we benchmark FC, CNN, and ViT policies, train compact ViT heads on frozen foundation-model (FM) features, and vary ID support in scale, diversity, and temporal context. (1) ViT policies are markedly more OOD-robust than comparably sized CNN/FC, and FM features yield state-of-the-art success at a latency cost. (2) Naive temporal inputs (multi-frame) do not beat the best single-frame baseline. (3) The largest single factor drops are rural $\rightarrow$ urban and day $\rightarrow$ night ($\sim 31\%$ each); actor swaps $\sim 10\%$, moderate rain $\sim 7\%$; season shifts can be drastic, and combining a time flip with other changes further degrades performance. (4) FM-feature policies stay above $85\%$ under three simultaneous changes; non-FM single-frame policies take a large first-shift hit, and all no-FM models fall below $50\%$ by three changes. (5) Interactions are non-additive: some pairings partially offset, whereas season-time combinations are especially harmful. (6) Training on winter/snow is most robust to single-factor shifts, while a rural+summer baseline gives the best overall OOD performance. (7) Scaling traces/views improves robustness ($+11.8$ points from $5$ to $14$ traces), yet targeted exposure to hard conditions can substitute for scale. (8) Using multiple ID environments broadens coverage and strengthens weak cases (urban OOD $60.6\% \rightarrow 70.1\%$) with a small ID drop; single-ID preserves peak performance but in a narrow domain. These results yield actionable design rules for OOD-robust driving policies.
Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models
The prevalent paradigm in robot learning attempts to generalize across environments, embodiments, and tasks with language prompts at runtime. A fundamental tension limits this approach: language is often too abstract to guide the concrete physical understanding required for robust manipulation. In this work, we introduce Contact-Anchored Policies (CAP), which replace language conditioning with points of physical contact in space. Simultaneously, we structure CAP as a library of modular utility models rather than a monolithic generalist policy. This factorization allows us to implement a real-to-sim iteration cycle: we build EgoGym, a lightweight simulation benchmark, to rapidly identify failure modes and refine our models and datasets prior to real-world deployment. We show that by conditioning on contact and iterating via simulation, CAP generalizes to novel environments and embodiments out of the box on three fundamental manipulation skills while using only 23 hours of demonstration data, and outperforms large, state-of-the-art VLAs in zero-shot evaluations by 56%. All model checkpoints, codebase, hardware, simulation, and datasets will be open-sourced. Project page: https://cap-policy.github.io/
Dexterous Manipulation Policies from RGB Human Videos via 4D Hand-Object Trajectory Reconstruction
Multi-finger robotic hand manipulation and grasping are challenging due to the high-dimensional action space and the difficulty of acquiring large-scale training data. Existing approaches largely rely on human teleoperation with wearable devices or specialized sensing equipment to capture hand-object interactions, which limits scalability. In this work, we propose VIDEOMANIP, a device-free framework that learns dexterous manipulation directly from RGB human videos. Leveraging recent advances in computer vision, VIDEOMANIP reconstructs explicit 4D robot-object trajectories from monocular videos by estimating human hand poses, object meshes, and retargets the reconstructed human motions to robotic hands for manipulation learning. To make the reconstructed robot data suitable for dexterous manipulation training, we introduce hand-object contact optimization with interaction-centric grasp modeling, as well as a demonstration synthesis strategy that generates diverse training trajectories from a single video, enabling generalizable policy learning without additional robot demonstrations. In simulation, the learned grasping model achieves a 70.25% success rate across 20 diverse objects using the Inspire Hand. In the real world, manipulation policies trained from RGB videos achieve an average 62.86% success rate across seven tasks using the LEAP Hand, outperforming retargeting-based methods by 15.87%. Project videos are available at videomanip.github.io.
From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection
Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social norms. Addressing this challenge calls for analyzing interactions between agents and incorporating common-sense reasoning into planning. This paper presents a social robot navigation framework that integrates geometric planning with contextual social reasoning. The system first extracts obstacles and human dynamics to generate geometrically feasible candidate paths, then leverages a fine-tuned vision-language model (VLM) to evaluate these paths, informed by contextually grounded social expectations, selecting a socially optimized path for the controller. This task-specific VLM distills social reasoning from large foundation models into a smaller and efficient model, allowing the framework to perform real-time adaptation in diverse human-robot interaction contexts. Experiments in four social navigation contexts demonstrate that our method achieves the best overall performance with the lowest personal space violation duration, the minimal pedestrian-facing time, and no social zone intrusions. Project page: https://path-etiquette.github.io
comment: Accepted to IEEE Robotics and Automation Letters (RA-L)
CLUE: Crossmodal disambiguation via Language-vision Understanding with attEntion
With the increasing integration of robots into daily life, human-robot interaction has become more complex and multifaceted. A critical component of this interaction is Interactive Visual Grounding (IVG), through which robots must interpret human intentions and resolve ambiguity. Existing IVG models generally lack a mechanism to determine when to ask clarification questions, as they implicitly rely on their learned representations. CLUE addresses this gap by converting the VLM's cross-modal attention into an explicit, spatially grounded signal for deciding when to ask. We extract text to image attention maps and pass them to a lightweight CNN to detect referential ambiguity, while a LoRA fine-tuned decoder conducts the dialog and emits grounding location tokens. We train on a real-world interactive dataset for IVG, and a mixed ambiguity set for the detector. With InViG-only supervision, our model surpasses a state-of-the-art method while using parameter-efficient fine-tuning. Similarly, the ambiguity detector outperforms prior baselines. Overall, CLUE turns the internal cross-modal attention of a VLM into an explicit, spatially grounded signal for deciding when to ask. The data and code are publicly available at: mouadabrini.github.io/clue
WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models
While world models have emerged as a cornerstone of embodied intelligence by enabling agents to reason about environmental dynamics through action-conditioned prediction, their evaluation remains fragmented. Current evaluation of embodied world models has largely focused on perceptual fidelity (e.g., video generation quality), overlooking the functional utility of these models in downstream decision-making tasks. In this work, we introduce WorldArena, a unified benchmark designed to systematically evaluate embodied world models across both perceptual and functional dimensions. WorldArena assesses models through three dimensions: video perception quality, measured with 16 metrics across six sub-dimensions; embodied task functionality, which evaluates world models as data engines, policy evaluators, and action planners integrating with subjective human evaluation. Furthermore, we propose EWMScore, a holistic metric integrating multi-dimensional performance into a single interpretable index. Through extensive experiments on 14 representative models, we reveal a significant perception-functionality gap, showing that high visual quality does not necessarily translate into strong embodied task capability. WorldArena benchmark with the public leaderboard is released at https://worldarena.ai, providing a framework for tracking progress toward truly functional world models in embodied AI.
Reduced-order Control and Geometric Structure of Learned Lagrangian Latent Dynamics
Model-based controllers can offer strong guarantees on stability and convergence by relying on physically accurate dynamic models. However, these are rarely available for high-dimensional mechanical systems such as deformable objects or soft robots. While neural architectures can learn to approximate complex dynamics, they are either limited to low-dimensional systems or provide only limited formal control guarantees due to a lack of embedded physical structure. This paper introduces a latent control framework based on learned structure-preserving reduced-order dynamics for high-dimensional Lagrangian systems. We derive a reduced tracking law for fully actuated systems and adopt a Riemannian perspective on projection-based model-order reduction to study the resulting latent and projected closed-loop dynamics. By quantifying the sources of modeling error, we derive interpretable conditions for stability and convergence. We extend the proposed controller and analysis to underactuated systems by introducing learned actuation patterns. Experimental results on simulated and real-world systems validate our theoretical investigation and the accuracy of our controllers.
comment: 20 pages, 15 figures
Modeling 3D Pedestrian-Vehicle Interactions for Vehicle-Conditioned Pose Forecasting ICRA
Accurately predicting pedestrian motion is crucial for safe and reliable autonomous driving in complex urban environments. In this work, we present a 3D vehicle-conditioned pedestrian pose forecasting framework that explicitly incorporates surrounding vehicle information. To support this, we enhance the Waymo-3DSkelMo dataset with aligned 3D vehicle bounding boxes, enabling realistic modeling of multi-agent pedestrian-vehicle interactions. We introduce a sampling scheme to categorize scenes by pedestrian and vehicle count, facilitating training across varying interaction complexities. Our proposed network adapts the TBIFormer architecture with a dedicated vehicle encoder and pedestrian-vehicle interaction cross-attention module to fuse pedestrian and vehicle features, allowing predictions to be conditioned on both historical pedestrian motion and surrounding vehicles. Extensive experiments demonstrate substantial improvements in forecasting accuracy and validate different approaches for modeling pedestrian-vehicle interactions, highlighting the importance of vehicle-aware 3D pose prediction for autonomous driving. Code is available at: https://github.com/GuangxunZhu/VehCondPose3D
comment: Accepted for IEEE International Conference on Robotics and Automation (ICRA) 2026
Finite-Time Teleoperation of Euler-Lagrange Systems via Energy-Shaping
This paper proposes a family of finite-time controllers for the bilateral teleoperation of fully actuated nonlinear Euler-Lagrange systems. Based on the energy-shaping framework and under the standard assumption of passive interactions with the human and the environment, the controllers ensure that the position error and velocities globally converge to zero in the absence of time delays. In this case, the closed-loop system admits a homogeneous approximation of negative degree, and thus the control objective is achieved in finite-time. The proposed controllers are simple, continuous-time proportional-plus-damping-injection schemes, validated through both simulation and experimental results.
karl. -- A Research Vehicle for Automated and Connected Driving
As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g., Waymo), automated driving and connected cooperative intelligent transport systems (C-ITS) remain active fields of research. Even though simulation is omnipresent in the development and validation life cycle of automated and connected driving technology, the complex nature of public road traffic and software that masters it still requires real-world integration and testing with actual vehicles. Dedicated vehicles for research and development allow testing and validation of software and hardware components under real-world conditions early on. They also enable collecting and publishing real-world datasets that let others conduct research without vehicle access, and support early demonstration of futuristic use cases. In this paper, we present karl., our new research vehicle for automated and connected driving. Apart from major corporations, few institutions worldwide have access to their own L4-capable research vehicles, restricting their ability to carry out independent research. This paper aims to help bridge that gap by sharing the reasoning, design choices, and technical details that went into making karl. a flexible and powerful platform for research, engineering, and validation in the context of automated and connected driving. More impressions of karl. are available at https://karl.ac.
comment: 8 pages; Accepted to be published as part of the 37th Intelligent Vehicles Symposium (IV), Detroit, MI, United States, June 22-25, 2026
Multi-Staged Framework for Safety Analysis of Offloaded Services in Distributed Intelligent Transportation Systems
The integration of service-oriented architectures (SOA) with function offloading for distributed, intelligent transportation systems (ITS) offers the opportunity for connected autonomous vehicles (CAVs) to extend their locally available services. One major goal of offloading a subset of functions in the processing chain of a CAV to remote devices is to reduce the overall computational complexity on the CAV. The extension of using remote services, however, requires careful safety analysis, since the remotely created data are corrupted more easily, e.g., through an attacker on the remote device or by intercepting the wireless transmission. To tackle this problem, we first analyze the concept of SOA for distributed environments. From this, we derive a safety framework that validates the reliability of remote services and the data received locally. Since it is possible for the autonomous driving task to offload multiple different services, we propose a specific multi-staged framework for safety analysis dependent on the service composition of local and remote services. For efficiency reasons, we directly include the multi-staged framework for safety analysis in our service-oriented function offloading framework (SOFOF) that we have proposed in earlier work. The evaluation compares the performance of the extended framework considering computational complexity, with energy savings being a major motivation for function offloading, and its capability to detect data from corrupted remote services.
A Generic Service-Oriented Function Offloading Framework for Connected Automated Vehicles
Function offloading is a promising solution to address limitations concerning computational capacity and available energy of Connected Automated Vehicles~(CAVs) or other autonomous robots by distributing computational tasks between local and remote computing devices in form of distributed services. This paper presents a generic function offloading framework that can be used to offload an arbitrary set of computational tasks with a focus on autonomous driving. To provide flexibility, the function offloading framework is designed to incorporate different offloading decision making algorithms and quality of service~(QoS) requirements that can be adjusted to different scenarios or the objectives of the CAVs. With a focus on the applicability, we propose an efficient location-based approach, where the decision whether tasks are processed locally or remotely depends on the location of the CAV. We apply the proposed framework on the use case of service-oriented trajectory planning, where we offload the trajectory planning task of CAVs to a Multi-Access Edge Computing~(MEC) server. The evaluation is conducted in both simulation and real-world application. It demonstrates the potential of the function offloading framework to guarantee the QoS for trajectory planning while improving the computational efficiency of the CAVs. Moreover, the simulation results also show the adaptability of the framework to diverse scenarios involving simultaneous offloading requests from multiple CAVs.
comment: 8 pages, 6 figures, 2 tables, published in RA-L
GaussianCaR: Gaussian Splatting for Efficient Camera-Radar Fusion ICRA 2026
Robust and accurate perception of dynamic objects and map elements is crucial for autonomous vehicles performing safe navigation in complex traffic scenarios. While vision-only methods have become the de facto standard due to their technical advances, they can benefit from effective and cost-efficient fusion with radar measurements. In this work, we advance fusion methods by repurposing Gaussian Splatting as an efficient universal view transformer that bridges the view disparity gap, mapping both image pixels and radar points into a common Bird's-Eye View (BEV) representation. Our main contribution is GaussianCaR, an end-to-end network for BEV segmentation that, unlike prior BEV fusion methods, leverages Gaussian Splatting to map raw sensor information into latent features for efficient camera-radar fusion. Our architecture combines multi-scale fusion with a transformer decoder to efficiently extract BEV features. Experimental results demonstrate that our approach achieves performance on par with, or even surpassing, the state of the art on BEV segmentation tasks (57.3%, 82.9%, and 50.1% IoU for vehicles, roads, and lane dividers) on the nuScenes dataset, while maintaining a 3.2x faster inference runtime. Code and project page are available online.
comment: 8 pages, 6 figures. Accepted to IEEE ICRA 2026
Mind the Gap: Learning Implicit Impedance in Visuomotor Policies via Intent-Execution Mismatch
Teleoperation inherently relies on the human operator acting as a closed-loop controller to actively compensate for hardware imperfections, including latency, mechanical friction, and lack of explicit force feedback. Standard Behavior Cloning (BC), by mimicking the robot's executed trajectory, fundamentally ignores this compensatory mechanism. In this work, we propose a Dual-State Conditioning framework that shifts the learning objective to "Intent Cloning" (master command). We posit that the Intent-Execution Mismatch, the discrepancy between master command and slave response, is not noise, but a critical signal that physically encodes implicit interaction forces and algorithmically reveals the operator's strategy for overcoming system dynamics. By predicting the master intent, our policy learns to generate a "virtual equilibrium point", effectively realizing implicit impedance control. Furthermore, by explicitly conditioning on the history of this mismatch, the model performs implicit system identification, perceiving tracking errors as external forces to close the control loop. To bridge the temporal gap caused by inference latency, we further formulate the policy as a trajectory inpainter to ensure continuous control. We validate our approach on a sensorless, low-cost bi-manual setup. Empirical results across tasks requiring contact-rich manipulation and dynamic tracking reveal a decisive gap: while standard execution-cloning fails due to the inability to overcome contact stiffness and tracking lag, our mismatch-aware approach achieves robust success. This presents a minimalist behavior cloning framework for low-cost hardware, enabling force perception and dynamic compensation without relying on explicit force sensing. Videos are available on the \href{https://xucj98.github.io/mind-the-gap-page/}{project page}.
comment: 14 pages, 9 figures, 5 tables
High-Speed Vision-Based Flight in Clutter with Safety-Shielded Reinforcement Learning
Quadrotor unmanned aerial vehicles (UAVs) are increasingly deployed in complex missions that demand reliable autonomous navigation and robust obstacle avoidance. However, traditional modular pipelines often incur cumulative latency, whereas purely reinforcement learning (RL) approaches typically provide limited formal safety guarantees. To bridge this gap, we propose an end-to-end RL framework augmented with model-based safety mechanisms. We incorporate physical priors in both training and deployment. During training, we design a physics-informed reward structure that provides global navigational guidance. During deployment, we integrate a real-time safety filter that projects the policy outputs onto a provably safe set to enforce strict collision-avoidance constraints. This hybrid architecture reconciles high-speed flight with robust safety assurances. Benchmark evaluations demonstrate that our method outperforms both traditional planners and recent end-to-end obstacle avoidance approaches based on differentiable physics. Extensive experiments demonstrate strong generalization, enabling reliable high-speed navigation in dense clutter and challenging outdoor forest environments at velocities up to 7.5m/s.
Mimic Intent, Not Just Trajectories
While imitation learning (IL) has achieved impressive success in dexterous manipulation through generative modeling and pretraining, state-of-the-art approaches like Vision-Language-Action (VLA) models still struggle with adaptation to environmental changes and skill transfer. We argue this stems from mimicking raw trajectories without understanding the underlying intent. To address this, we propose explicitly disentangling behavior intent from execution details in end-2-end IL: \textit{``Mimic Intent, Not just Trajectories'' (MINT)}. We achieve this via \textit{multi-scale frequency-space tokenization}, which enforces a spectral decomposition of action chunk representation. We learn action tokens with a multi-scale coarse-to-fine structure, and force the coarsest token to capture low-frequency global structure and finer tokens to encode high-frequency details. This yields an abstract \textit{Intent token} that facilitates planning and transfer, and multi-scale \textit{Execution tokens} that enable precise adaptation to environmental dynamics. Building on this hierarchy, our policy generates trajectories through \textit{next-scale autoregression}, performing progressive \textit{intent-to-execution reasoning}, thus boosting learning efficiency and generalization. Crucially, this disentanglement enables \textit{one-shot transfer} of skills, by simply injecting the Intent token from a demonstration into the autoregressive generation process. Experiments on several manipulation benchmarks and on a real robot demonstrate state-of-the-art success rates, superior inference efficiency, robust generalization against disturbances, and effective one-shot transfer.
comment: Under review
A Precise Real-Time Force-Aware Grasping System for Robust Aerial Manipulation
Aerial manipulation requires force-aware capabilities to enable safe and effective grasping and physical interaction. Previous works often rely on heavy, expensive force sensors unsuitable for typical quadrotor platforms, or perform grasping without force feedback, risking damage to fragile objects. To address these limitations, we propose a novel force-aware grasping framework incorporating six low-cost, sensitive skin-like tactile sensors. We introduce a magnetic-based tactile sensing module that provides high-precision three-dimensional force measurements. We eliminate geomagnetic interference through a reference Hall sensor and simplify the calibration process compared to previous work. The proposed framework enables precise force-aware grasping control, allowing safe manipulation of fragile objects and real-time weight measurement of grasped items. The system is validated through comprehensive real-world experiments, including balloon grasping, dynamic load variation tests, and ablation studies, demonstrating its effectiveness in various aerial manipulation scenarios. Our approach achieves fully onboard operation without external motion capture systems, significantly enhancing the practicality of force-sensitive aerial manipulation. The supplementary video is available at: https://www.youtube.com/watch?v=mbcZkrJEf1I.
MOSAIC: Bridging the Sim-to-Real Gap in Generalist Humanoid Motion Tracking and Teleoperation with Rapid Residual Adaptation
Generalist humanoid motion trackers have recently achieved strong simulation metrics by scaling data and training, yet often remain brittle on hardware during sustained teleoperation due to interface- and dynamics-induced errors. We present MOSAIC, an open-source, full-stack system for humanoid motion tracking and whole-body teleoperation across multiple interfaces. MOSAIC first learns a teleoperation-oriented general motion tracker via RL on a multi-source motion bank with adaptive resampling and rewards that emphasize world-frame motion consistency, which is critical for mobile teleoperation. To bridge the sim-to-real interface gap without sacrificing generality, MOSAIC then performs rapid residual adaptation: an interface-specific policy is trained using minimal interface-specific data, and then distilled into the general tracker through an additive residual module, outperforming naive fine-tuning or continual learning. We validate MOSAIC with systematic ablations, out-of-distribution benchmarking, and real-robot experiments demonstrating robust offline motion replay and online long-horizon teleoperation under realistic latency and noise.
Head-to-Head autonomous racing at the limits of handling in the A2RL challenge
Autonomous racing presents a complex challenge involving multi-agent interactions between vehicles operating at the limit of performance and dynamics. As such, it provides a valuable research and testing environment for advancing autonomous driving technology and improving road safety. This article presents the algorithms and deployment strategies developed by the TUM Autonomous Motorsport team for the inaugural Abu Dhabi Autonomous Racing League (A2RL). We showcase how our software emulates human driving behavior, pushing the limits of vehicle handling and multi-vehicle interactions to win the A2RL. Finally, we highlight the key enablers of our success and share our most significant learnings.
comment: Submitted to Science Robotics for possible publication
Constrained Sampling to Guide Universal Manipulation RL
We consider how model-based solvers can be leveraged to guide training of a universal policy to control from any feasible start state to any feasible goal in a contact-rich manipulation setting. While Reinforcement Learning (RL) has demonstrated its strength in such settings, it may struggle to sufficiently explore and discover complex manipulation strategies, especially in sparse-reward settings. Our approach is based on the idea of a lower-dimensional manifold of feasible, likely-visited states during such manipulation and to guide RL with a sampler from this manifold. We propose Sample-Guided RL, which uses model-based constraint solvers to efficiently sample feasible configurations (satisfying differentiable collision, contact, and force constraints) and leverage them to guide RL for universal (goal-conditioned) manipulation policies. We study using this data directly to bias state visitation, as well as using black-box optimization of open-loop trajectories between random configurations to impose a state bias and optionally add a behavior cloning loss. In a minimalistic double sphere manipulation setting, Sample-Guided RL discovers complex manipulation strategies and achieves high success rates in reaching any statically stable state. In a more challenging panda arm setting, our approach achieves a significant success rate over a near-zero baseline, and demonstrates a breadth of complex whole-body-contact manipulation strategies.
UniPlan: Vision-Language Task Planning for Mobile Manipulation with Unified PDDL Formulation
Integration of VLM reasoning with symbolic planning has proven to be a promising approach to real-world robot task planning. Existing work like UniDomain effectively learns symbolic manipulation domains from real-world demonstrations, described in Planning Domain Definition Language (PDDL), and has successfully applied them to real-world tasks. These domains, however, are restricted to tabletop manipulation. We propose UniPlan, a vision-language task planning system for long-horizon mobile-manipulation in large-scale indoor environments, that unifies scene topology, visuals, and robot capabilities into a holistic PDDL representation. UniPlan programmatically extends learned tabletop domains from UniDomain to support navigation, door traversal, and bimanual coordination. It operates on a visual-topological map, comprising navigation landmarks anchored with scene images. Given a language instruction, UniPlan retrieves task-relevant nodes from the map and uses a VLM to ground the anchored image into task-relevant objects and their PDDL states; next, it reconnects these nodes to a compressed, densely-connected topological map, also represented in PDDL, with connectivity and costs derived from the original map; Finally, a mobile-manipulation plan is generated using off-the-shelf PDDL solvers. Evaluated on human-raised tasks in a large-scale map with real-world imagery, UniPlan significantly outperforms VLM and LLM+PDDL planning in success rate, plan quality, and computational efficiency.
Characteristics, Management, and Utilization of Muscles in Musculoskeletal Humanoids: Empirical Study on Kengoro and Musashi
Various musculoskeletal humanoids have been developed so far, and numerous studies on control mechanisms have been conducted to leverage the advantages of their biomimetic bodies. However, there has not been sufficient and unified discussion on the diverse properties inherent in these musculoskeletal structures, nor on how to manage and utilize them. Therefore, this study categorizes and analyzes the characteristics of muscles, as well as their management and utilization methods, based on the various research conducted on the musculoskeletal humanoids we have developed, Kengoro and Musashi. We classify the features of the musculoskeletal structure into five properties: Redundancy, Independency, Anisotropy, Variable Moment Arm, and Nonlinear Elasticity. We then organize the diverse advantages and disadvantages of musculoskeletal humanoids that arise from the combination of these properties. In particular, we discuss body schema learning and reflex control, along with muscle grouping and body schema adaptation. Also, we describe the implementation of movements through an integrated system and discuss future challenges and prospects.
comment: Accepted to Advanced Intelligent Systems
Reliability-aware Execution Gating for Near-field and Off-axis Vision-guided Robotic Alignment
Vision-guided robotic systems are increasingly deployed in precision alignment tasks that require reliable execution under near-field and off-axis configurations. While recent advances in pose estimation have significantly improved numerical accuracy, practical robotic systems still suffer from frequent execution failures even when pose estimates appear accurate. This gap suggests that pose accuracy alone is insufficient to guarantee execution-level reliability. In this paper, we reveal that such failures arise from a deterministic geometric error amplification mechanism, in which small pose estimation errors are magnified through system structure and motion execution, leading to unstable or failed alignment. Rather than modifying pose estimation algorithms, we propose a Reliability-aware Execution Gating mechanism that operates at the execution level. The proposed approach evaluates geometric consistency and configuration risk before execution, and selectively rejects or scales high-risk pose updates. We validate the proposed method on a real UR5 robotic platform performing single-step visual alignment tasks under varying camera-target distances and off-axis configurations. Experimental results demonstrate that the proposed execution gating significantly improves task success rates, reduces execution variance, and suppresses tail-risk behavior, while leaving average pose accuracy largely unchanged. Importantly, the proposed mechanism is estimator-agnostic and can be readily integrated with both classical geometry-based and learning-based pose estimation pipelines. These results highlight the importance of execution-level reliability modeling and provide a practical solution for improving robustness in near-field vision-guided robotic systems.
comment: 7 pages, 1 figure
UAV-Supported Maritime Search System: Experience from Valun Bay Field Trials
This paper presents the integration of flow field reconstruction, dynamic probabilistic modeling, search control, and machine vision detection in a system for autonomous maritime search operations. Field experiments conducted in Valun Bay (Cres Island, Croatia) involved real-time drifter data acquisition, surrogate flow model fitting based on computational fluid dynamics and numerical optimization, advanced multi-UAV search control and vision sensing, as well as deep learning-based object detection. The results demonstrate that a tightly coupled approach enables reliable detection of floating targets under realistic uncertainties and complex environmental conditions, providing concrete insights for future autonomous maritime search and rescue applications.
Post-Collision Trajectory Restoration for a Single-track Ackermann Vehicle using Heuristic Steering and Tractive Force Functions
Post-collision trajectory restoration is a safety-critical capability for autonomous vehicles, as impact-induced lateral motion and yaw transients can rapidly drive the vehicle away from the intended path. This paper proposes a structured heuristic recovery control law that jointly commands steering and tractive force for a generalized single-track Ackermann vehicle model. The formulation explicitly accounts for time-varying longitudinal velocity in the lateral-yaw dynamics and retains nonlinear steering-coupled interaction terms that are commonly simplified in the literature. Unlike approaches that assume constant longitudinal speed, the proposed design targets the transient post-impact regime where speed variations and nonlinear coupling significantly influence recovery. The method is evaluated in simulation on the proposed generalized single-track model and a standard 3DOF single-track reference model in MATLAB, demonstrating consistent post-collision restoration behaviour across representative initial post-impact conditions.
comment: 10 pages, 6 figures
SteerVLA: Steering Vision-Language-Action Models in Long-Tail Driving Scenarios
A fundamental challenge in autonomous driving is the integration of high-level, semantic reasoning for long-tail events with low-level, reactive control for robust driving. While large vision-language models (VLMs) trained on web-scale data offer powerful common-sense reasoning, they lack the grounded experience necessary for safe vehicle control. We posit that an effective autonomous agent should leverage the world knowledge of VLMs to guide a steerable driving policy toward robust control in driving scenarios. To this end, we propose SteerVLA, which leverages the reasoning capabilities of VLMs to produce fine-grained language instructions that steer a vision-language-action (VLA) driving policy. Key to our method is this rich language interface between the high-level VLM and low-level VLA, which allows the high-level policy to more effectively ground its reasoning in the control outputs of the low-level policy. To provide fine-grained language supervision aligned with vehicle control, we leverage a VLM to augment existing driving data with detailed language annotations, which we find to be essential for effective reasoning and steerability. We evaluate SteerVLA on a challenging closed-loop benchmark, where it outperforms state-of-the-art methods by 4.77 points in overall driving score and by 8.04 points on a long-tail subset. The project website is available at: https://steervla.github.io/.
Bi-Adapt: Few-shot Bimanual Adaptation for Novel Categories of 3D Objects via Semantic Correspondence
Bimanual manipulation is imperative yet challenging for robots to execute complex tasks, requiring coordinated collaboration between two arms. However, existing methods for bimanual manipulation often rely on costly data collection and training, struggling to generalize to unseen objects in novel categories efficiently. In this paper, we present Bi-Adapt, a novel framework designed for efficient generalization for bimanual manipulation via semantic correspondence. Bi-Adapt achieves cross-category affordance mapping by leveraging the strong capability of vision foundation models. Fine-tuning with restricted data on novel categories, Bi-Adapt exhibits notable generalization to out-of-category objects in a zero-shot manner. Extensive experiments conducted in both simulation and real-world environments validate the effectiveness of our approach and demonstrate its high efficiency, achieving a high success rate on different benchmark tasks across novel categories with limited data. Project website: https://biadapt-project.github.io/
Decentralized Intent-Based Multi-Robot Task Planner with LLM Oracles on Hyperledger Fabric
Large language models (LLMs) have opened new opportunities for transforming natural language user intents into executable actions. This capability enables embodied AI agents to perform complex tasks, without involvement of an expert, making human-robot interaction (HRI) more convenient. However these developments raise significant security and privacy challenges such as self-preferencing, where a single LLM service provider dominates the market and uses this power to promote their own preferences. LLM oracles have been recently proposed as a mechanism to decentralize LLMs by executing multiple LLMs from different vendors and aggregating their outputs to obtain a more reliable and trustworthy final result. However, the accuracy of these approaches highly depends on the aggregation method. The current aggregation methods mostly use semantic similarity between various LLM outputs, not suitable for robotic task planning, where the temporal order of tasks is important. To fill the gap, we propose an LLM oracle with a new aggregation method for robotic task planning. In addition, we propose a decentralized multi-robot infrastructure based on Hyperledger Fabric that can host the proposed oracle. The proposed infrastructure enables users to express their natural language intent to the system, which then can be decomposed into subtasks. These subtasks require coordinating different robots from different vendors, while enforcing fine-grained access control management on the data. To evaluate our methodology, we created the SkillChain-RTD benchmark made it publicly available. Our experimental results demonstrate the feasibility of the proposed architecture, and the proposed aggregation method outperforms other aggregation methods currently in use.
Graph-Loc: Robust Graph-Based LiDAR Pose Tracking with Compact Structural Map Priors under Low Observability and Occlusion
Map-based LiDAR pose tracking is essential for long-term autonomous operation, where onboard map priors need be compact for scalable storage and fast retrieval, while online observations are often partial, repetitive, and heavily occluded. We propose Graph-Loc, a graph-based localization framework that tracks the platform pose against compact structural map priors represented as a lightweight point-line graph. Such priors can be constructed from heterogeneous sources commonly available in practice, including polygon outlines vectorized from occupancy/grid maps and CAD/model/floor-plan layouts. For each incoming LiDAR scan, Graph-Loc extracts sparse point and line primitives to form an observation graph, retrieves a pose-conditioned visible subgraph via LiDAR ray simulation, and performs scan-to-map association through unbalanced optimal transport with a local graph-context regularizer. The unbalanced formulation relaxes mass conservation, improving robustness to missing, spurious, and fragmented structures under occlusion. To enhance stability in low-observability segments, we estimate information anisotropy from the refinement normal matrix and defer updates along weakly constrained directions until sufficient constraints reappear. Experiments on public benchmarks, controlled stress tests, and real-world deployments demonstrate accurate and stable tracking with KB-level priors from heterogeneous map sources, including under geometrically degenerate and sustained occlusion and in the presence of gradual scene changes.
comment: 13 pages, 8 figures, 8 tables
BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models
Multimodal Large Language Models (MLLMs) have significantly advanced embodied AI, and using them to benchmark robotic intelligence has become a pivotal trend. However, existing frameworks remain predominantly confined to single-arm manipulation, failing to capture the spatio-temporal coordination required for bimanual tasks like lifting a heavy pot. To address this, we introduce BiManiBench, a hierarchical benchmark evaluating MLLMs across three tiers: fundamental spatial reasoning, high-level action planning, and low-level end-effector control. Our framework isolates unique bimanual challenges, such as arm reachability and kinematic constraints, thereby distinguishing perceptual hallucinations from planning failures. Analysis of over 30 state-of-the-art models reveals that despite high-level reasoning proficiency, MLLMs struggle with dual-arm spatial grounding and control, frequently resulting in mutual interference and sequencing errors. These findings suggest the current paradigm lacks a deep understanding of mutual kinematic constraints, highlighting the need for future research to focus on inter-arm collision-avoidance and fine-grained temporal sequencing.
comment: 38 pages, 9 figures. Project page:https://bimanibench.github.io/
Learning Human-Like Badminton Skills for Humanoid Robots
Realizing versatile and human-like performance in high-demand sports like badminton remains a formidable challenge for humanoid robotics. Unlike standard locomotion or static manipulation, this task demands a seamless integration of explosive whole-body coordination and precise, timing-critical interception. While recent advances have achieved lifelike motion mimicry, bridging the gap between kinematic imitation and functional, physics-aware striking without compromising stylistic naturalness is non-trivial. To address this, we propose Imitation-to-Interaction, a progressive reinforcement learning framework designed to evolve a robot from a "mimic" to a capable "striker." Our approach establishes a robust motor prior from human data, distills it into a compact, model-based state representation, and stabilizes dynamics via adversarial priors. Crucially, to overcome the sparsity of expert demonstrations, we introduce a manifold expansion strategy that generalizes discrete strike points into a dense interaction volume. We validate our framework through the mastery of diverse skills, including lifts and drop shots, in simulation. Furthermore, we demonstrate the first zero-shot sim-to-real transfer of anthropomorphic badminton skills to a humanoid robot, successfully replicating the kinetic elegance and functional precision of human athletes in the physical world.
comment: 10 pages, 4 figures
Vec-QMDP: Vectorized POMDP Planning on CPUs for Real-Time Autonomous Driving
Planning under uncertainty for real-world robotics tasks, such as autonomous driving, requires reasoning in enormous high-dimensional belief spaces, rendering the problem computationally intensive. While parallelization offers scalability, existing hybrid CPU-GPU solvers face critical bottlenecks due to host-device synchronization latency and branch divergence on SIMT architectures, limiting their utility for real-time planning and hindering real-robot deployment. We present Vec-QMDP, a CPU-native parallel planner that aligns POMDP search with modern CPUs' SIMD architecture, achieving $227\times$--$1073\times$ speedup over state-of-the-art serial planners. Vec-QMDP adopts a Data-Oriented Design (DOD), refactoring scattered, pointer-based data structures into contiguous, cache-efficient memory layouts. We further introduce a hierarchical parallelism scheme: distributing sub-trees across independent CPU cores and SIMD lanes, enabling fully vectorized tree expansion and collision checking. Efficiency is maximized with the help of UCB load balancing across trees and a vectorized STR-tree for coarse-level collision checking. Evaluated on large-scale autonomous driving benchmarks, Vec-QMDP achieves state-of-the-art planning performance with millisecond-level latency, establishing CPUs as a high-performance computing platform for large-scale planning under uncertainty.
Controlled Flight of an Insect-Scale Flapping-Wing Robot via Integrated Onboard Sensing and Computation
Aerial insects can effortlessly navigate dense vegetation, whereas similarly sized aerial robots typically depend on offboard sensors and computation to maintain stable flight. This disparity restricts insect-scale robots to operation within motion capture environments, substantially limiting their applicability to tasks such as search-and-rescue and precision agriculture. In this work, we present a 1.29-gram aerial robot capable of hovering and tracking trajectories with solely onboard sensing and computation. The combination of a sensor suite, estimators, and a low-level controller achieved centimeter-scale positional flight accuracy. Additionally, we developed a hierarchical controller in which a human operator provides high-level commands to direct the robot's motion. In a 30-second flight experiment conducted outside a motion capture system, the robot avoided obstacles and ultimately landed on a sunflower. This level of sensing and computational autonomy represents a significant advancement for the aerial microrobotics community, further opening opportunities to explore onboard planning and power autonomy.
comment: 22 pages, 7 figures
Personalized Autonomous Driving via Optimal Control with Clearance Constraints from Questionnaires
Driving without considering the preferred separation distance from surrounding vehicles may cause discomfort for users. To address this limitation, we propose a planning framework that explicitly incorporates user preferences regarding the desired level of safe clearance from surrounding vehicles. We design a questionnaire purposefully tailored to capture user preferences relevant to our framework, while minimizing unnecessary questions. Specifically, the questionnaire considers various interaction-relevant factors, including the surrounding vehicle's size, speed, position, and maneuvers of surrounding vehicles, as well as the maneuvers of the ego vehicle. The response indicates the user-preferred clearance for the scenario defined by the question and is incorporated as constraints in the optimal control problem. However, it is impractical to account for all possible scenarios that may arise in a driving environment within a single optimal control problem, as the resulting computational complexity renders real-time implementation infeasible. To overcome this limitation, we approximate the original problem by decomposing it into multiple subproblems, each dealing with one fixed scenario. We then solve these subproblems in parallel and select one using the cost function from the original problem. To validate our work, we conduct simulations using different user responses to the questionnaire. We assess how effectively our planner reflects user preferences compared to preference-agnostic baseline planners by measuring preference alignment.
Benchmarking Autonomous Vehicles: A Driver Foundation Model Framework
Autonomous vehicles (AVs) are poised to revolutionize global transportation systems. However, its widespread acceptance and market penetration remain significantly below expectations. This gap is primarily driven by persistent challenges in safety, comfort, commuting efficiency and energy economy when compared to the performance of experienced human drivers. We hypothesize that these challenges can be addressed through the development of a driver foundation model (DFM). Accordingly, we propose a framework for establishing DFMs to comprehensively benchmark AVs. Specifically, we describe a large-scale dataset collection strategy for training a DFM, discuss the core functionalities such a model should possess, and explore potential technical solutions to realize these functionalities. We further present the utility of the DFM across the operational spectrum, from defining human-centric safety envelopes to establishing benchmarks for energy economy. Overall, We aim to formalize the DFM concept and introduce a new paradigm for the systematic specification, verification and validation of AVs.
ReefFlex: A Generative Design Framework for Soft Robotic Grasping of Organic and Fragile objects
Climate change, invasive species and human activities are currently damaging the world's coral reefs at unprecedented rates, threatening their vast biodiversity and fisheries, and reducing coastal protection. Solving this vast challenge requires scalable coral regeneration technologies that can breed climate-resilient species and accelerate the natural regrowth processes; actions that are impeded by the absence of safe and robust tools to handle the fragile coral. We investigate ReefFlex, a generative soft finger design methodology that explores a diverse space of soft fingers to produce a set of candidates capable of safely grasping fragile and geometrically heterogeneous coral in a cluttered environment. Our key insight is encoding heterogeneous grasping into a reduced set of motion primitives, creating a simplified, tractable multi-objective optimisation problem. To evaluate the method, we design a soft robot for reef rehabilitation, which grows and manipulates coral in onshore aquaculture facilities for future reef out-planting. We demonstrate ReefFlex increases both grasp success and grasp quality (disturbance resistance, positioning accuracy) and reduces in adverse events encountered during coral manipulation compared to reference designs. ReefFlex, offers a generalisable method to design soft end-effectors for complex handling and paves a pathway towards automation in previously unachievable domains like coral handling for restoration.
DexFormer: Cross-Embodied Dexterous Manipulation via History-Conditioned Transformer
Dexterous manipulation remains one of the most challenging problems in robotics, requiring coherent control of high-DoF hands and arms under complex, contact-rich dynamics. A major barrier is embodiment variability: different dexterous hands exhibit distinct kinematics and dynamics, forcing prior methods to train separate policies or rely on shared action spaces with per-embodiment decoder heads. We present DexFormer, an end-to-end, dynamics-aware cross-embodiment policy built on a modified transformer backbone that conditions on historical observations. By using temporal context to infer morphology and dynamics on the fly, DexFormer adapts to diverse hand configurations and produces embodiment-appropriate control actions. Trained over a variety of procedurally generated dexterous-hand assets, DexFormer acquires a generalizable manipulation prior and exhibits strong zero-shot transfer to Leap Hand, Allegro Hand, and Rapid Hand. Our results show that a single policy can generalize across heterogeneous hand embodiments, establishing a scalable foundation for cross-embodiment dexterous manipulation. Project website: https://davidlxu.github.io/DexFormer-web/.
Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes ICRA 2026
In cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian Splatting (3DGS) offers a distinct advantage, as it can explicitly guide the selection of subsequent viewpoints and then refine the representation with new observations. However, existing approaches rely solely on geometric cues, neglect manipulation-relevant semantics, and tend to prioritize exploitation over exploration. To tackle these limitations, we introduce an instance-aware Next Best View (NBV) policy that prioritizes underexplored regions by leveraging object features. Specifically, our object-aware 3DGS distills instancelevel information into one-hot object vectors, which are used to compute confidence-weighted information gain that guides the identification of regions associated with erroneous and uncertain Gaussians. Furthermore, our method can be easily adapted to an object-centric NBV, which focuses view selection on a target object, thereby improving reconstruction robustness to object placement. Experiments demonstrate that our NBV policy reduces depth error by up to 77.14% on the synthetic dataset and 34.10% on the real-world GraspNet dataset compared to baselines. Moreover, compared to targeting the entire scene, performing NBV on a specific object yields an additional reduction of 25.60% in depth error for that object. We further validate the effectiveness of our approach through real-world robotic manipulation tasks.
comment: 9 pages, 8 figures, 4 tables, accepted to ICRA 2026
Aerial Manipulation with Contact-Aware Onboard Perception and Hybrid Control ICRA 2026
Aerial manipulation (AM) promises to move Unmanned Aerial Vehicles (UAVs) beyond passive inspection to contact-rich tasks such as grasping, assembly, and in-situ maintenance. Most prior AM demonstrations rely on external motion capture (MoCap) and emphasize position control for coarse interactions, limiting deployability. We present a fully onboard perception-control pipeline for contact-rich AM that achieves accurate motion tracking and regulated contact wrenches without MoCap. The main components are (1) an augmented visual-inertial odometry (VIO) estimator with contact-consistency factors that activate only during interaction, tightening uncertainty around the contact frame and reducing drift, and (2) image-based visual servoing (IBVS) to mitigate perception-control coupling, together with a hybrid force-motion controller that regulates contact wrenches and lateral motion for stable contact. Experiments show that our approach closes the perception-to-wrench loop using only onboard sensing, yielding an velocity estimation improvement of 66.01% at contact, reliable target approach, and stable force holding-pointing toward deployable, in-the-wild aerial manipulation.
comment: 9 pages, 7 figures. Accepted by ICRA 2026
STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction
Diffusion policies have recently emerged as a powerful paradigm for visuomotor control in robotic manipulation due to their ability to model the distribution of action sequences and capture multimodality. However, iterative denoising leads to substantial inference latency, limiting control frequency in real-time closed-loop systems. Existing acceleration methods either reduce sampling steps, bypass diffusion through direct prediction, or reuse past actions, but often struggle to jointly preserve action quality and achieve consistently low latency. In this work, we propose STEP, a lightweight spatiotemporal consistency prediction mechanism to construct high-quality warm-start actions that are both distributionally close to the target action and temporally consistent, without compromising the generative capability of the original diffusion policy. Then, we propose a velocity-aware perturbation injection mechanism that adaptively modulates actuation excitation based on temporal action variation to prevent execution stall especially for real-world tasks. We further provide a theoretical analysis showing that the proposed prediction induces a locally contractive mapping, ensuring convergence of action errors during diffusion refinement. We conduct extensive evaluations on nine simulated benchmarks and two real-world tasks. Notably, STEP with 2 steps can achieve an average 21.6% and 27.5% higher success rate than BRIDGER and DDIM on the RoboMimic benchmark and real-world tasks, respectively. These results demonstrate that STEP consistently advances the Pareto frontier of inference latency and success rate over existing methods.
comment: 13 pages, 9 figures
Chamelion: Reliable Change Detection for Long-Term LiDAR Mapping in Transient Environments
Online change detection is crucial for mobile robots to efficiently navigate through dynamic environments. Detecting changes in transient settings, such as active construction sites or frequently reconfigured indoor spaces, is particularly challenging due to frequent occlusions and spatiotemporal variations. Existing approaches often struggle to detect changes and fail to update the map across different observations. To address these limitations, we propose a dual-head network designed for online change detection and long-term map maintenance. A key difficulty in this task is the collection and alignment of real-world data, as manually registering structural differences over time is both labor-intensive and often impractical. To overcome this, we develop a data augmentation strategy that synthesizes structural changes by importing elements from different scenes, enabling effective model training without the need for extensive ground-truth annotations. Experiments conducted at real-world construction sites and in indoor office environments demonstrate that our approach generalizes well across diverse scenarios, achieving efficient and accurate map updates.\resubmit{Our source code and additional material are available at: https://chamelion-pages.github.io/.
comment: 8 pages, IEEE Robot. Automat. Lett. (RA-L) 2026
Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning
Embodied Chain-of-Thought (CoT) reasoning has significantly enhanced Vision-Language-Action (VLA) models, yet current methods rely on rigid templates to specify reasoning primitives (e.g., objects in the scene, high-level plans, structural affordances). These templates can force policies to process irrelevant information that distracts from critical action-prediction signals. This creates a bottleneck: without successful policies, we cannot verify reasoning quality; without quality reasoning, we cannot build robust policies. We introduce R&B-EnCoRe, which enables models to bootstrap embodied reasoning from internet-scale knowledge through self-supervised refinement. By treating reasoning as a latent variable within importance-weighted variational inference, models can generate and distill a refined reasoning training dataset of embodiment-specific strategies without external rewards, verifiers, or human annotation. We validate R&B-EnCoRe across manipulation (Franka Panda in simulation, WidowX in hardware), legged navigation (bipedal, wheeled, bicycle, quadruped), and autonomous driving embodiments using various VLA architectures with 1B, 4B, 7B, and 30B parameters. Our approach achieves 28% gains in manipulation success, 101% improvement in navigation scores, and 21% reduction in collision-rate metric over models that indiscriminately reason about all available primitives. R&B-EnCoRe enables models to distill reasoning that is predictive of successful control, bypassing manual annotation engineering while grounding internet-scale knowledge in physical execution.
Data-centric Design of Learning-based Surgical Gaze Perception Models in Multi-Task Simulation
In robot-assisted minimally invasive surgery (RMIS), reduced haptic feedback and depth cues increase reliance on expert visual perception, motivating gaze-guided training and learning-based surgical perception models. However, operative expert gaze is costly to collect, and it remains unclear how the source of gaze supervision, both expertise level (intermediate vs. novice) and perceptual modality (active execution vs. passive viewing), shapes what attention models learn. We introduce a paired active-passive, multi-task surgical gaze dataset collected on the da Vinci SimNow simulator across four drills. Active gaze was recorded during task execution using a VR headset with eye tracking, and the corresponding videos were reused as stimuli to collect passive gaze from observers, enabling controlled same-video comparisons. We quantify skill- and modality-dependent differences in gaze organization and evaluate the substitutability of passive gaze for operative supervision using fixation density overlap analyses and single-frame saliency modeling. Across settings, MSI-Net produced stable, interpretable predictions, whereas SalGAN was unstable and often poorly aligned with human fixations. Models trained on passive gaze recovered a substantial portion of intermediate active attention, but with predictable degradation, and transfer was asymmetric between active and passive targets. Notably, novice passive labels approximated intermediate-passive targets with limited loss on higher-quality demonstrations, suggesting a practical path for scalable, crowd-sourced gaze supervision in surgical coaching and perception modeling.
comment: 8 pages, conference submission pre-print
STaR: Scalable Task-Conditioned Retrieval for Long-Horizon Multimodal Robot Memory
Mobile robots are often deployed over long durations in diverse open, dynamic scenes, including indoor setting such as warehouses and manufacturing facilities, and outdoor settings such as agricultural and roadway operations. A core challenge is to build a scalable long-horizon memory that supports an agentic workflow for planning, retrieval, and reasoning over open-ended instructions at variable granularity, while producing precise, actionable answers for navigation. We present STaR, an agentic reasoning framework that (i) constructs a task-agnostic, multimodal long-term memory that generalizes to unseen queries while preserving fine-grained environmental semantics (object attributes, spatial relations, and dynamic events), and (ii) introduces a Scalable TaskConditioned Retrieval algorithm based on the Information Bottleneck principle to extract from long-term memory a compact, non-redundant, information-rich set of candidate memories for contextual reasoning. We evaluate STaR on NaVQA (mixed indoor/outdoor campus scenes) and WH-VQA, a customized warehouse benchmark with many visually similar objects built with Isaac Sim, emphasizing contextual reasoning. Across the two datasets, STaR consistently outperforms strong baselines, achieving higher success rates and markedly lower spatial error. We further deploy STaR on a real Husky wheeled robot in both indoor and outdoor environments, demonstrating robust longhorizon reasoning, scalability, and practical utility.
From Legible to Inscrutable Trajectories: (Il)legible Motion Planning Accounting for Multiple Observers
In cooperative environments, such as in factories or assistive scenarios, it is important for a robot to communicate its intentions to observers, who could be either other humans or robots. A legible trajectory allows an observer to quickly and accurately predict an agent's intention. In adversarial environments, such as in military operations or games, it is important for a robot to not communicate its intentions to observers. An illegible trajectory leads an observer to incorrectly predict the agent's intention or delays when an observer is able to make a correct prediction about the agent's intention. However, in some environments there are multiple observers, each of whom may be able to see only part of the environment, and each of whom may have different motives. In this work, we introduce the Mixed-Motive Limited-Observability Legible Motion Planning (MMLO-LMP) problem, which requires a motion planner to generate a trajectory that is legible to observers with positive motives and illegible to observers with negative motives while also considering the visibility limitations of each observer. We highlight multiple strategies an agent can take while still achieving the problem objective. We also present DUBIOUS, a trajectory optimizer that solves MMLO-LMP. Our results show that DUBIOUS can generate trajectories that balance legibility with the motives and limited visibility regions of the observers. Future work includes many variations of MMLO-LMP, including moving observers and observer teaming.
comment: 17 pages, 5 figures
Risk-Aware Obstacle Avoidance Algorithm for Real-Time Applications
Robust navigation in changing marine environments requires autonomous systems capable of perceiving, reasoning, and acting under uncertainty. This study introduces a hybrid risk-aware navigation architecture that integrates probabilistic modeling of obstacles along the vehicle path with smooth trajectory optimization for autonomous surface vessels. The system constructs probabilistic risk maps that capture both obstacle proximity and the behavior of dynamic objects. A risk-biased Rapidly Exploring Random Tree (RRT) planner leverages these maps to generate collision-free paths, which are subsequently refined using B-spline algorithms to ensure trajectory continuity. Three distinct RRT* rewiring modes are implemented based on the cost function: minimizing the path length, minimizing risk, and optimizing a combination of the path length and total risk. The framework is evaluated in experimental scenarios containing both static and dynamic obstacles. The results demonstrate the system's ability to navigate safely, maintain smooth trajectories, and dynamically adapt to changing environmental risks. Compared with conventional LIDAR or vision-only navigation approaches, the proposed method shows improvements in operational safety and autonomy, establishing it as a promising solution for risk-aware autonomous vehicle missions in uncertain and dynamic environments.
Elements of Robot Morphology: Supporting Designers in Robot Form Exploration
Robot morphology, the form, shape, and structure of robots, is a key design space in human-robot interaction (HRI), shaping how robots function, express themselves, and interact with people. Yet, despite its importance, little is known about how design frameworks can guide systematic form exploration. To address this gap, we introduce Elements of Robot Morphology, a framework that identifies five fundamental elements: perception, articulation, end effectors, locomotion, and structure. Derived from an analysis of existing robots, the framework supports structured exploration of diverse robot forms. To operationalize the framework, we developed Morphology Exploration Blocks (MEB), a set of tangible blocks that enable hands-on, collaborative experimentation with robot morphologies. We evaluate the framework and toolkit through a case study and design workshops, showing how they support analysis, ideation, reflection, and collaborative robot design.
comment: 10 pages, 5 figures, Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI '26)
SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes
Simulation has become a key tool for training and evaluating home robots at scale, yet existing environments fail to capture the diversity and physical complexity of real indoor spaces. Current scene synthesis methods produce sparsely furnished rooms that lack the dense clutter, articulated furniture, and physical properties essential for robotic manipulation. We introduce SceneSmith, a hierarchical agentic framework that generates simulation-ready indoor environments from natural language prompts. SceneSmith constructs scenes through successive stages$\unicode{x2013}$from architectural layout to furniture placement to small object population$\unicode{x2013}$each implemented as an interaction among VLM agents: designer, critic, and orchestrator. The framework tightly integrates asset generation through text-to-3D synthesis for static objects, dataset retrieval for articulated objects, and physical property estimation. SceneSmith generates 3-6x more objects than prior methods, with <2% inter-object collisions and 96% of objects remaining stable under physics simulation. In a user study with 205 participants, it achieves 92% average realism and 91% average prompt faithfulness win rates against baselines. We further demonstrate that these environments can be used in an end-to-end pipeline for automatic robot policy evaluation.
comment: Project page: https://scenesmith.github.io/
Agile asymmetric multi-legged locomotion: contact planning via geometric mechanics and spin model duality
Legged robot research is presently focused on bipedal or quadrupedal robots, despite capabilities to build robots with many more legs to potentially improve locomotion performance. This imbalance is not necessarily due to hardware limitations, but rather to the absence of principled control frameworks that explain when and how additional legs improve locomotion performance. In multi-legged systems, coordinating many simultaneous contacts introduces a severe curse of dimensionality that challenges existing modeling and control approaches. As an alternative, multi-legged robots are typically controlled using low-dimensional gaits originally developed for bipeds or quadrupeds. These strategies fail to exploit the new symmetries and control opportunities that emerge in higher-dimensional systems. In this work, we develop a principled framework for discovering new control structures in multi-legged locomotion. We use geometric mechanics to reduce contact-rich locomotion planning to a graph optimization problem, and propose a spin model duality framework from statistical mechanics to exploit symmetry breaking and guide optimal gait reorganization. Using this approach, we identify an asymmetric locomotion strategy for a hexapod robot that achieves a forward speed of 0.61 body lengths per cycle (a 50% improvement over conventional gaits). The resulting asymmetry appears at both the control and hardware levels. At the control level, the body orientation oscillates asymmetrically between fast clockwise and slow counterclockwise turning phases for forward locomotion. At the hardware level, two legs on the same side remain unactuated and can be replaced with rigid parts without degrading performance. Numerical simulations and robophysical experiments validate the framework and reveal novel locomotion behaviors that emerge from symmetry reforming in high-dimensional embodied systems.
comment: 10 pages, 7 figures. arXiv admin note: text overlap with arXiv:2302.03019
Legs Over Arms: On the Predictive Value of Lower-Body Pose for Human Trajectory Prediction from Egocentric Robot Perception ICRA 2026
Predicting human trajectory is crucial for social robot navigation in crowded environments. While most existing approaches treat human as point mass, we present a study on multi-agent trajectory prediction that leverages different human skeletal features for improved forecast accuracy. In particular, we systematically evaluate the predictive utility of 2D and 3D skeletal keypoints and derived biomechanical cues as additional inputs. Through a comprehensive study on the JRDB dataset and another new dataset for social navigation with 360-degree panoramic videos, we find that focusing on lower-body 3D keypoints yields a 13% reduction in Average Displacement Error and augmenting 3D keypoint inputs with corresponding biomechanical cues provides a further 1-4% improvement. Notably, the performance gain persists when using 2D keypoint inputs extracted from equirectangular panoramic images, indicating that monocular surround vision can capture informative cues for motion forecasting. Our finding that robots can forecast human movement efficiently by watching their legs provides actionable insights for designing sensing capabilities for social robot navigation.
comment: Accepted to IEEE ICRA 2026
Delay-Aware Reinforcement Learning for Highway On-Ramp Merging under Stochastic Communication Latency
Delayed and partially observable state information poses significant challenges for reinforcement learning (RL)-based control in real-world autonomous driving. In highway on-ramp merging, a roadside unit (RSU) can sense nearby traffic, perform edge perception, and transmit state estimates to the ego vehicle over vehicle-to-infrastructure (V2I) links. With recent advancements in intelligent transportation infrastructure and edge computing, such RSU-assisted perception is increasingly realistic and already deployed in modern connected roadway systems. However, edge processing time and wireless transmission can introduce stochastic V2I communication delays, violating the Markov assumption and substantially degrading control performance. In this work, we propose DAROM, a Delay-Aware Reinforcement Learning framework for On-ramp Merging that is robust to stochastic delays. We model the problem as a random delay Markov decision process (RDMDP) and develop a unified RL agent for joint longitudinal and lateral control. To recover a Markovian representation under delayed observations, we introduce a Delay-Aware Encoder that conditions on delayed observations, masked action histories, and observed delay magnitude to infer the current latent state. We further integrate a physics-based safety controller to reduce collision risk during merging. Experiments in the Simulation of Urban MObility (SUMO) simulator using real-world traffic data from the Next Generation Simulation (NGSIM) dataset demonstrate that DAROM consistently outperforms standard RL baselines across traffic densities. In particular, the gated recurrent unit (GRU)-based encoder achieves over 99% success in high-density traffic with random V2I delays of up to 2.0 seconds.
Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning ICLR 2026
Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision making agents without running expensive experiments on real hardware. Simulators remain a security blind spot, however, enabling adversarial developers to alter the dynamics of their released simulators for malicious purposes. Therefore, in this work we highlight a novel threat, demonstrating how simulator dynamics can be exploited to stealthily implant action-level backdoors into RL agents. The backdoor then allows an adversary to reliably activate targeted actions in an agent upon observing a predefined ``trigger'', leading to potentially dangerous consequences. Traditional backdoor attacks are limited in their strong threat models, assuming the adversary has near full control over an agent's training pipeline, enabling them to both alter and observe agent's rewards. As these assumptions are infeasible to implement within a simulator, we propose a new attack ``Daze'' which is able to reliably and stealthily implant backdoors into RL agents trained for real world tasks without altering or even observing their rewards. We provide formal proof of Daze's effectiveness in guaranteeing attack success across general RL tasks along with extensive empirical evaluations on both discrete and continuous action space domains. We additionally provide the first example of RL backdoor attacks transferring to real, robotic hardware. These developments motivate further research into securing all components of the RL training pipeline to prevent malicious attacks.
comment: 10 pages main body, ICLR 2026
Multi-Player, Multi-Strategy Quantum Game Model for Interaction-Aware Decision-Making in Automated Driving
Although significant progress has been made in decision-making for automated driving, challenges remain for deployment in the real world. One challenge lies in addressing interaction-awareness. Most existing approaches oversimplify interactions between the ego vehicle and surrounding agents, and often neglect interactions among the agents themselves. A common solution is to model these interactions using classical game theory. However, its formulation assumes rational players, whereas human behavior is frequently uncertain or irrational. To address these challenges, we propose the Quantum Game Decision-Making (QGDM) model, a novel framework that combines classical game theory with quantum mechanics principles (such as superposition, entanglement, and interference) to tackle multi-player, multi-strategy decision-making problems. To the best of our knowledge, this is one of the first studies to apply quantum game theory to decision-making for automated driving. QGDM runs in real time on a standard computer, without requiring quantum hardware. We evaluate QGDM in simulation across various scenarios, including roundabouts, merging, and highways, and compare its performance with multiple baseline methods. Results show that QGDM significantly improves success rates and reduces collision rates compared to classical approaches, particularly in scenarios with high interaction.
Through the Perspective of LiDAR: A Feature-Enriched and Uncertainty-Aware Annotation Pipeline for Terrestrial Point Cloud Segmentation
Accurate semantic segmentation of terrestrial laser scanning (TLS) point clouds is limited by costly manual annotation. We propose a semi-automated, uncertainty-aware pipeline that integrates spherical projection, feature enrichment, ensemble learning, and targeted annotation to reduce labeling effort, while sustaining high accuracy. Our approach projects 3D points to a 2D spherical grid, enriches pixels with multi-source features, and trains an ensemble of segmentation networks to produce pseudo-labels and uncertainty maps, the latter guiding annotation of ambiguous regions. The 2D outputs are back-projected to 3D, yielding densely annotated point clouds supported by a three-tier visualization suite (2D feature maps, 3D colorized point clouds, and compact virtual spheres) for rapid triage and reviewer guidance. Using this pipeline, we build Mangrove3D, a semantic segmentation TLS dataset for mangrove forests. We further evaluate data efficiency and feature importance to address two key questions: (1) how much annotated data are needed and (2) which features matter most. Results show that performance saturates after ~12 annotated scans, geometric features contribute the most, and compact nine-channel stacks capture nearly all discriminative power, with the mean Intersection over Union (mIoU) plateauing at around 0.76. Finally, we confirm the generalization of our feature-enrichment strategy through cross-dataset tests on ForestSemantic and Semantic3D. Our contributions include: (i) a robust, uncertainty-aware TLS annotation pipeline with visualization tools; (ii) the Mangrove3D dataset; and (iii) empirical guidance on data efficiency and feature importance, thus enabling scalable, high-quality segmentation of TLS point clouds for ecological monitoring and beyond. The dataset and processing scripts are publicly available at https://fz-rit.github.io/through-the-lidars-eye/.
comment: 40 pages (28 main text), 20 figures, 4 supplementary materials; links to 3D point animations are included in the last table
Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data
A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using microscopic data to anchor observed vehicle behaviors, and learns a shared policy whose behavior is microscopically consistent with the partially observed trajectories and actions and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
Diffusion policies have emerged as a powerful approach for robotic control, demonstrating superior expressiveness in modeling multimodal action distributions compared to conventional policy networks. However, their integration with online reinforcement learning remains challenging due to fundamental incompatibilities between diffusion model training objectives and standard RL policy improvement mechanisms. This paper presents the first comprehensive review and empirical analysis of current Online Diffusion Policy Reinforcement Learning (Online DPRL) algorithms for scalable robotic control systems. We propose a novel taxonomy that categorizes existing approaches into four distinct families--Action-Gradient, Q-Weighting, Proximity-Based, and Backpropagation Through Time (BPTT) methods--based on their policy improvement mechanisms. Through extensive experiments on a unified NVIDIA Isaac Lab benchmark encompassing 12 diverse robotic tasks, we systematically evaluate representative algorithms across five critical dimensions: task diversity, parallelization capability, diffusion step scalability, cross-embodiment generalization, and environmental robustness. Our analysis identifies key findings regarding the fundamental trade-offs inherent in each algorithmic family, particularly concerning sample efficiency and scalability. Furthermore, we reveal critical computational and algorithmic bottlenecks that currently limit the practical deployment of online DPRL. Based on these findings, we provide concrete guidelines for algorithm selection tailored to specific operational constraints and outline promising future research directions to advance the field toward more general and scalable robotic learning systems.
CoBEVMoE: Heterogeneity-aware Feature Fusion with Dynamic Mixture-of-Experts for Collaborative Perception ICRA 2026
Collaborative perception aims to extend sensing coverage and improve perception accuracy by sharing information among multiple agents. However, due to differences in viewpoints and spatial positions, agents often acquire heterogeneous observations. Existing intermediate fusion methods primarily focus on aligning similar features, often overlooking the perceptual diversity among agents. To address this limitation, we propose CoBEVMoE, a novel collaborative perception framework that operates in the Bird's Eye View (BEV) space and incorporates a Dynamic Mixture-of-Experts (DMoE) architecture. In DMoE, each expert is dynamically generated based on the input features of a specific agent, enabling it to extract distinctive and reliable cues while attending to shared semantics. This design allows the fusion process to explicitly model both feature similarity and heterogeneity across agents. Furthermore, we introduce a Dynamic Expert Metric Loss (DEML) to enhance inter-expert diversity and improve the discriminability of the fused representation. Extensive experiments on the OPV2V and DAIR-V2X-C datasets demonstrate that CoBEVMoE achieves state-of-the-art performance. Specifically, it improves the IoU for Camera-based BEV segmentation by +1.5% on OPV2V and the AP@0.5 for LiDAR-based 3D object detection by +3.0% on DAIR-V2X-C, verifying the effectiveness of expert-based heterogeneous feature modeling in multi-agent collaborative perception. The source code will be made publicly available at https://github.com/godk0509/CoBEVMoE.
comment: Accepted to ICRA 2026. The source code will be made publicly available at https://github.com/godk0509/CoBEVMoE
SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
Pruning is a typical acceleration technique for compute-bound models by removing computation on unimportant values. Recently, it has been applied to accelerate Vision-Language-Action (VLA) model inference. However, existing acceleration methods focus on local information from the current action step and ignore the global context, leading to >20% success rate drop and limited speedup in some scenarios. In this paper, we point out spatial-temporal consistency in VLA tasks: input images in consecutive steps exhibit high similarity, and propose the key insight that token selection should combine local information with global context of the model. Based on this, we propose SpecPrune-VLA, a training-free, two-level pruning method with heuristic control. (1) Action-level static pruning. We leverage global history and local attention to statically reduce visual tokens per action. (2) Layer-level dynamic pruning. We prune tokens adaptively per layer based on layer-wise importance. (3) Lightweight action-aware controller: We classify actions as coarse- or fine-grained by the speed of the end effector and adjust pruning aggressiveness accordingly. Extensive experiments show that SpecPrune-VLA achieves up to 1.57$\times$ speedup in LIBERO simulation and 1.70$\times$ on real-world tasks, with negligible success rate degradation.
SafeLink: Safety-Critical Control Under Dynamic and Irregular Unsafe Regions
Control barrier functions (CBFs) provide a theoretical foundation for safety-critical control in robotic systems. However, most existing methods rely on explicit analytical expressions of unsafe state regions, which are often impractical for irregular and dynamic unsafe regions. This paper introduces SafeLink, a novel CBF construction method based on cost-sensitive incremental random vector functional-link (RVFL) neural networks. By designing a valid cost function, SafeLink assigns different sensitivities to safe and unsafe state points, thereby eliminating false negatives in classification of unsafe state points. Under the constructed CBF, theoretical guarantees are established regarding system safety and the Lipschitz continuity of the control inputs. Furthermore, incremental update theorems are provided, enabling precise real-time adaptation to changes in unsafe regions. An analytical expression for the gradient of SafeLink is also derived to facilitate control input computation. The proposed method is validated on the endpoint position control task of a nonlinear two-link manipulator. Experimental results demonstrate that the method effectively learns the unsafe regions and rapidly adapts as these regions change, achieving computational speeds significantly faster than baseline methods while ensuring the system safely reaches its target position.
comment: 12 pages, 7 figures
Addressing the Waypoint-Action Gap in End-to-End Autonomous Driving via Vehicle Motion Models
End-to-End Autonomous Driving (E2E-AD) systems are typically grouped by the nature of their outputs: (i) waypoint-based models that predict a future trajectory, and (ii) action-based models that directly output throttle, steer and brake. Most recent benchmark protocols and training pipelines are waypoint-based, which makes action-based policies harder to train and compare, slowing their progress. To bridge this waypoint-action gap, we propose a novel, differentiable vehicle-model framework that rolls out predicted action sequences to their corresponding ego-frame waypoint trajectories while supervising in waypoint space. Our approach enables action-based architectures to be trained and evaluated, for the first time, within waypoint-based benchmarks without modifying the underlying evaluation protocol. We extensively evaluate our framework across multiple challenging benchmarks and observe consistent improvements over the baselines. In particular, on NAVSIM \texttt{navhard} our approach achieves state-of-the-art performance. Our code will be made publicly available upon acceptance.
comment: 8 pages, 3 figures
EgoFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving ICRA2026
Current End-to-End Autonomous Driving (E2E-AD) methods resort to unifying modular designs for various tasks (e.g. perception, prediction and planning). Although optimized with a fully differentiable framework in a planning-oriented manner, existing end-to-end driving systems lacking ego-centric designs still suffer from unsatisfactory performance and inferior efficiency, due to rasterized scene representation learning and redundant information transmission. In this paper, we propose an ego-centric fully sparse paradigm, named EgoFSD, for end-to-end self-driving. Specifically, EgoFSD consists of sparse perception, hierarchical interaction and iterative motion planner. The sparse perception module performs detection and online mapping based on sparse representation of the driving scene. The hierarchical interaction module aims to select the Closest In-Path Vehicle / Stationary (CIPV / CIPS) from coarse to fine, benefiting from an additional geometric prior. As for the iterative motion planner, both selected interactive agents and ego-vehicle are considered for joint motion prediction, where the output multi-modal ego-trajectories are optimized in an iterative fashion. In addition, position-level motion diffusion and trajectory-level planning denoising are introduced for uncertainty modeling, thereby enhancing the training stability and convergence speed. Extensive experiments are conducted on nuScenes and Bench2Drive datasets, which significantly reduces the average L2 error by 59% and collision rate by 92% than UniAD while achieves 6.9x faster running efficiency.
comment: Accepted to ICRA2026
A Survey of Behavior Foundation Model: Next-Generation Whole-Body Control System of Humanoid Robots
Humanoid robots are drawing significant attention as versatile platforms for complex motor control, human-robot interaction, and general-purpose physical intelligence. However, achieving efficient whole-body control (WBC) in humanoids remains a fundamental challenge due to sophisticated dynamics, underactuation, and diverse task requirements. While learning-based controllers have shown promise for complex tasks, their reliance on labor-intensive and costly retraining for new scenarios limits real-world applicability. To address these limitations, behavior(al) foundation models (BFMs) have emerged as a new paradigm that leverages large-scale pre-training to learn reusable primitive skills and broad behavioral priors, enabling zero-shot or rapid adaptation to a wide range of downstream tasks. In this paper, we present a comprehensive overview of BFMs for humanoid WBC, tracing their development across diverse pre-training pipelines. Furthermore, we discuss real-world applications, current limitations, urgent challenges, and future opportunities, positioning BFMs as a key approach toward scalable and general-purpose humanoid intelligence. Finally, we provide a curated and regularly updated collection of BFM papers and projects to facilitate more subsequent research, which is available at https://github.com/yuanmingqi/awesome-bfm-papers.
comment: 19 pages, 10 figures
Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation
Robotic manipulation requires both rich multimodal perception and effective learning frameworks to handle complex real-world tasks. See-through-skin (STS) sensors, which combine tactile and visual perception, offer promising sensing capabilities, while modern imitation learning provides powerful tools for policy acquisition. However, existing STS designs lack simultaneous multimodal perception and suffer from unreliable tactile tracking. Furthermore, integrating these rich multimodal signals into learning-based manipulation pipelines remains an open challenge. We introduce TacThru, an STS sensor enabling simultaneous visual perception and robust tactile signal extraction, and TacThru-UMI, an imitation learning framework that leverages these multimodal signals for manipulation. Our sensor features a fully transparent elastomer, persistent illumination, novel keyline markers, and efficient tracking, while our learning system integrates these signals through a Transformer-based Diffusion Policy. Experiments on five challenging real-world tasks show that TacThru-UMI achieves an average success rate of 85.5%, significantly outperforming the baselines of tactile policy(66.3%) and vision-only policy (55.4%). The system excels in critical scenarios, including contact detection with thin and soft objects and precision manipulation requiring multimodal coordination. This work demonstrates that combining simultaneous multimodal perception with modern learning frameworks enables more precise, adaptable robotic manipulation.
Multi-Robot Data-Free Continual Communicative Learning (CCL) from Black-Box Visual Place Recognition Models
In emerging multi-robot societies, heterogeneous agents must continually extract and integrate local knowledge from one another through communication, even when their internal models are completely opaque. Existing approaches to continual or collaborative learning for visual place recognition (VPR) largely assume white-box access to model parameters or shared training datasets, which is unrealistic when robots encounter unknown peers in the wild. This paper introduces \emph{Continual Communicative Learning (CCL)}, a data-free multi-robot framework in which a traveler robot (student) continually improves its VPR capability by communicating with black-box teacher models via a constrained query--response channel. We repurpose Membership Inference Attacks (MIA), originally developed as privacy attacks on machine learning models, as a constructive communication primitive to reconstruct pseudo-training sets from black-box VPR teachers without accessing their parameters or raw data. To overcome the intrinsic communication bottleneck caused by the low sampling efficiency of black-box MIA, we propose a prior-based query strategy that leverages the student's own VPR prior to focus queries on informative regions of the embedding space, thereby reducing the knowledge transfer (KT) cost. Experimental results on a standard multi-session VPR benchmark demonstrate that the proposed CCL framework yields substantial performance gains for low-performing robots under modest communication budgets, highlighting CCL as a promising building block for scalable and fault-tolerant multi-robot systems. Furthermore, we propose a Distributed Statistic Integration (DSI) framework that theoretically eliminates catastrophic forgetting by efficiently aggregating sufficient statistics from black-box VPR models while maintaining data privacy and reducing communication overhead to a sample-invariant constant complexity.
comment: 6 pages, 4 figures, technical report
OpenGVL -- Benchmarking Visual Temporal Progress for Data Curation
Data scarcity remains one of the most limiting factors in driving progress in robotics. However, the amount of available robotics data in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable temporal task completion prediction could help automatically annotate and curate this data at scale. The Generative Value Learning (GVL) approach was recently proposed, leveraging the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building upon GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse challenging manipulation tasks involving both robotic and human embodiments. We evaluate the capabilities of publicly available open-source foundation models, showing that open-source model families significantly underperform closed-source counterparts, achieving only approximately $70\%$ of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at \href{github.com/budzianowski/opengvl}{OpenGVL}.
comment: Workshop on Making Sense of Data in Robotics: Composition, Curation, and Interpretability at Scale at CoRL 2025
Nimbus: A Unified Embodied Synthetic Data Generation Framework
Scaling data volume and diversity is critical for generalizing embodied intelligence. While synthetic data generation offers a scalable alternative to expensive physical data acquisition, existing pipelines remain fragmented and task-specific. This isolation leads to significant engineering inefficiency and system instability, failing to support the sustained, high-throughput data generation required for foundation model training. To address these challenges, we present Nimbus, a unified synthetic data generation framework designed to integrate heterogeneous navigation and manipulation pipelines. Nimbus introduces a modular four-layer architecture featuring a decoupled execution model that separates trajectory planning, rendering, and storage into asynchronous stages. By implementing dynamic pipeline scheduling, global load balancing, distributed fault tolerance, and backend-specific rendering optimizations, the system maximizes resource utilization across CPU, GPU, and I/O resources. Our evaluation demonstrates that Nimbus achieves a 2-3X improvement in end-to-end throughput compared to unoptimized baselines and ensuring robust, long-term operation in large-scale distributed environments. This framework serves as the production backbone for the InternData suite, enabling seamless cross-domain data synthesis.
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning
Mobile manipulators in households must both navigate and manipulate. This requires a compact, semantically rich scene representation that captures where objects are, how they function, and which parts are actionable. Scene graphs are a natural choice, yet prior work often separates spatial and functional relations, treats scenes as static snapshots without object states or temporal updates, and overlooks information most relevant for accomplishing the current task. To address these limitations, we introduce MomaGraph, a unified scene representation for embodied agents that integrates spatial-functional relationships and part-level interactive elements. However, advancing such a representation requires both suitable data and rigorous evaluation, which have been largely missing. We thus contribute MomaGraph-Scenes, the first large-scale dataset of richly annotated, task-driven scene graphs in household environments, along with MomaGraph-Bench, a systematic evaluation suite spanning six reasoning capabilities from high-level planning to fine-grained scene understanding. Built upon this foundation, we further develop MomaGraph-R1, a 7B vision-language model trained with reinforcement learning on MomaGraph-Scenes. MomaGraph-R1 predicts task-oriented scene graphs and serves as a zero-shot task planner under a Graph-then-Plan framework. Extensive experiments demonstrate that our model achieves state-of-the-art results among open-source models, reaching 71.6% accuracy on the benchmark (+11.4% over the best baseline), while generalizing across public benchmarks and transferring effectively to real-robot experiments.
comment: 25 pages, 10 figures. Project page:https://hybridrobotics.github.io/MomaGraph/
AgenticLab: A Real-World Robot Agent Platform that Can See, Think, and Act
Recent advances in large vision-language models (VLMs) have demonstrated generalizable open-vocabulary perception and reasoning, yet their real-robot manipulation capability remains unclear for long-horizon, closed-loop execution in unstructured, in-the-wild environments. Prior VLM-based manipulation pipelines are difficult to compare across different research groups' setups, and many evaluations rely on simulation, privileged state, or specially designed setups. We present AgenticLab, a model-agnostic robot agent platform and benchmark for open-world manipulation. AgenticLab provides a closed-loop agent pipeline for perception, task decomposition, online verification, and replanning. Using AgenticLab, we benchmark state-of-the-art VLM-based agents on real-robot tasks in unstructured environments. Our benchmark reveals several failure modes that offline vision-language tests (e.g., VQA and static image understanding) fail to capture, including breakdowns in multi-step grounding consistency, object grounding under occlusion and scene changes, and insufficient spatial reasoning for reliable manipulation. We will release the full hardware and software stack to support reproducible evaluation and accelerate research on general-purpose robot agents.
comment: Added appendix
Learning-based Adaptive Control of Quadruped Robots for Active Stabilization on Moving Platforms IROS 2024
A quadruped robot faces balancing challenges on a six-degrees-of-freedom moving platform, like subways, buses, airplanes, and yachts, due to independent platform motions and resultant diverse inertia forces on the robot. To alleviate these challenges, we present the Learning-based Active Stabilization on Moving Platforms (\textit{LAS-MP}), featuring a self-balancing policy and system state estimators. The policy adaptively adjusts the robot's posture in response to the platform's motion. The estimators infer robot and platform states based on proprioceptive sensor data. For a systematic training scheme across various platform motions, we introduce platform trajectory generation and scheduling methods. Our evaluation demonstrates superior balancing performance across multiple metrics compared to three baselines. Furthermore, we conduct a detailed analysis of the \textit{LAS-MP}, including ablation studies and evaluation of the estimators, to validate the effectiveness of each component.
comment: Accepted at IROS 2024. Project Page: https://sgvr.kaist.ac.kr/~msyoon/papers/IROS24/
Enhancing Navigation Efficiency of Quadruped Robots via Leveraging Personal Transportation Platforms ICRA 2025
Quadruped robots face limitations in long-range navigation efficiency due to their reliance on legs. To ameliorate the limitations, we introduce a Reinforcement Learning-based Active Transporter Riding method (\textit{RL-ATR}), inspired by humans' utilization of personal transporters, including Segways. The \textit{RL-ATR} features a transporter riding policy and two state estimators. The policy devises adequate maneuvering strategies according to transporter-specific control dynamics, while the estimators resolve sensor ambiguities in non-inertial frames by inferring unobservable robot and transporter states. Comprehensive evaluations in simulation validate proficient command tracking abilities across various transporter-robot models and reduced energy consumption compared to legged locomotion. Moreover, we conduct ablation studies to quantify individual component contributions within the \textit{RL-ATR}. This riding ability could broaden the locomotion modalities of quadruped robots, potentially expanding the operational range and efficiency.
comment: Accepted at ICRA 2025. Project page: https://sgvr.kaist.ac.kr/~msyoon/papers/ICRA25/
Learning-based Initialization of Trajectory Optimization for Path-following Problems of Redundant Manipulators ICRA 2023
Trajectory optimization (TO) is an efficient tool to generate a redundant manipulator's joint trajectory following a 6-dimensional Cartesian path. The optimization performance largely depends on the quality of initial trajectories. However, the selection of a high-quality initial trajectory is non-trivial and requires a considerable time budget due to the extremely large space of the solution trajectories and the lack of prior knowledge about task constraints in configuration space. To alleviate the issue, we present a learning-based initial trajectory generation method that generates high-quality initial trajectories in a short time budget by adopting example-guided reinforcement learning. In addition, we suggest a null-space projected imitation reward to consider null-space constraints by efficiently learning kinematically feasible motion captured in expert demonstrations. Our statistical evaluation in simulation shows the improved optimality, efficiency, and applicability of TO when we plug in our method's output, compared with three other baselines. We also show the performance improvement and feasibility via real-world experiments with a seven-degree-of-freedom manipulator.
comment: Accepted at ICRA 2023. Project page: https://sgvr.kaist.ac.kr/~msyoon/papers/ICRA23_RLITG/
DREAM: Domain-aware Reasoning for Efficient Autonomous Underwater Monitoring ICRA 2026
The ocean is warming and acidifying, increasing the risk of mass mortality events for temperature-sensitive shellfish such as oysters. This motivates the development of long-term monitoring systems. However, human labor is costly and long-duration underwater work is highly hazardous, thus favoring robotic solutions as a safer and more efficient option. To enable underwater robots to make real-time, environment-aware decisions without human intervention, we must equip them with an intelligent "brain." This highlights the need for persistent,wide-area, and low-cost benthic monitoring. To this end, we present DREAM, a Vision Language Model (VLM)-guided autonomy framework for long-term underwater exploration and habitat monitoring. The results show that our framework is highly efficient in finding and exploring target objects (e.g., oysters, shipwrecks) without prior location information. In the oyster-monitoring task, our framework takes 31.5% less time than the previous baseline with the same amount of oysters. Compared to the vanilla VLM, it uses 23% fewer steps while covering 8.88% more oysters. In shipwreck scenes, our framework successfully explores and maps the wreck without collisions, requiring 27.5% fewer steps than the vanilla model and achieving 100% coverage, while the vanilla model achieves 60.23% average coverage in our shipwreck environments.
comment: In Proceeding of ICRA 2026
Modelling Pedestrian Behaviour in Autonomous Vehicle Encounters Using Naturalistic Dataset
Understanding how pedestrians adjust their movement when interacting with autonomous vehicles (AVs) is essential for improving safety in mixed traffic. This study examines micro-level pedestrian behaviour during midblock encounters in the NuScenes dataset using a hybrid discrete choice-machine learning framework based on the Residual Logit (ResLogit) model. The model incorporates temporal, spatial, kinematic, and perceptual indicators. These include relative speed, visual looming, remaining distance, and directional collision risk proximity (CRP) measures. Results suggest that some of these variables may meaningfully influence movement adjustments, although predictive performance remains moderate. Marginal effects and elasticities indicate strong directional asymmetries in risk perception, with frontal and rear CRP showing opposite influences. The remaining distance exhibits a possible mid-crossing threshold. Relative speed cues appear to have a comparatively less effect. These patterns may reflect multiple behavioural tendencies driven by both risk perception and movement efficiency.
Affective and Conversational Predictors of Re-Engagement in Human-Robot Interactions -- A Student-Centered Study with A Humanoid Social Robot
Humanoid social robots are increasingly present in daily life, making sustained user engagement a critical factor for their effectiveness and acceptance. While prior work has often examined affective evaluations or anthropomorphic design, less is known about the relative influence of dynamic conversational qualities and perceived robot characteristics in determining a user's intention to re-engage with Large Language Model (LLM)-driven social robots. In this study, 68 participants interacted in open-ended conversations with the Nadine humanoid social robot, completing pre- and post-interaction surveys to assess changes in robot perception, conversational quality, and intention to re-engage. The results showed that verbal interaction significantly improved the robot's perceived characteristics, with statistically significant increases in pleasantness ($p<.0001$) and approachability ($p<.0001$), and a reduction in creepiness ($p<.001$). However, these affective changes were not strong and unique predictors of users' intention to re-engage in a multiple regression model. Instead, participants' perceptions of the interestingness ($β=0.60$, $p<.001$) and naturalness ($β=0.31$, $p=0.015$) of the robot's conversation emerged as the most significant and robust independent predictors of intention to re-engage. Overall, the results highlight that conversational quality, specifically perceived interestingness and naturalness, is the dominant driver of re-engagement, indicating that LLM-driven robot design should prioritize engaging, natural dialogue over affective impression management or anthropomorphic cues.
comment: 27 pages, 7 figures
CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation
Recent progress in humanoid robots has unlocked agile locomotion skills, including backflipping, running, and crawling. Yet it remains challenging for a humanoid robot to perform forceful manipulation tasks such as moving objects, wiping, and pushing a cart. We propose adaptive Compliance Humanoid control through hIsight Perturbation (CHIP), a plug-and-play module that enables controllable end-effector stiffness while preserving agile tracking of dynamic reference motions. CHIP is easy to implement and requires neither data augmentation nor additional reward tuning. We show that a generalist motion-tracking controller trained with CHIP can perform a diverse set of forceful manipulation tasks that require different end-effector compliance, such as multi-robot collaboration, wiping, box delivery, and door opening.
comment: The first two authors contributed equally. Project page: https://nvlabs.github.io/CHIP/
VIMD: Monocular Visual-Inertial Motion and Depth Estimation
Accurate and efficient dense metric depth estimation is crucial for 3D visual perception in robotics and XR. In this paper, we develop a monocular visual-inertial motion and depth (VIMD) learning framework to estimate dense metric depth by leveraging accurate and efficient MSCKF-based monocular visual-inertial motion tracking. At the core the proposed VIMD is to exploit multi-view information to iteratively refine per-pixel scale, instead of globally fitting an invariant affine model as in the prior work. The VIMD framework is highly modular, making it compatible with a variety of existing depth estimation backbones. We conduct extensive evaluations on the TartanAir and VOID datasets and demonstrate its zero-shot generalization capabilities on the AR Table dataset. Our results show that VIMD achieves exceptional accuracy and robustness, even with extremely sparse points as few as 10-20 metric depth points per image. This makes the proposed VIMD a practical solution for deployment in resource constrained settings, while its robust performance and strong generalization capabilities offer significant potential across a wide range of scenarios.
Symmetry-Guided Memory Augmentation for Efficient Locomotion Learning
Training reinforcement learning (RL) policies for legged locomotion often requires extensive environment interactions, which are costly and time-consuming. We propose Symmetry-Guided Memory Augmentation (SGMA), a framework that improves training efficiency by combining structured experience augmentation with memory-based context inference. Our method leverages robot and task symmetries to generate additional, physically consistent training experiences without requiring extra interactions. To avoid the pitfalls of naive augmentation, we extend these transformations to the policy's memory states, enabling the agent to retain task-relevant context and adapt its behavior accordingly. We evaluate the approach on quadruped and humanoid robots in simulation, as well as on a real quadruped platform. Across diverse locomotion tasks involving joint failures and payload variations, our method achieves efficient policy training while maintaining robust performance, demonstrating a practical route toward data-efficient RL for legged robots.
What Do You Need for Compositional Generalization in Diffusion Planning?
In policy learning, stitching and compositional generalization refer to the extent to which the policy is able to piece together sub-trajectories of data it is trained on to generate new and diverse behaviours. While stitching has been identified as a significant strength of offline reinforcement learning, recent generative behavioural cloning (BC) methods have also shown proficiency at stitching. However, the main factors behind this are poorly understood, hindering the development of new algorithms that can reliably stitch by design. Focusing on diffusion planners trained via generative behavioural cloning, and without resorting to dynamic programming or TD-learning, we find three properties are key enablers for composition: shift equivariance, local receptive fields, and inference choices. We use these properties to explain architecture, data, and inference choices in existing generative BC methods based on diffusion planning including replanning frequency, data augmentation, and data scaling. Our experiments show that while local receptive fields are more important than shift equivariance in creating a diffusion planner capable of composition, both are crucial. Using findings from our experiments, we develop a new architecture for diffusion planners called Eq-Net, that is simple, produces diverse trajectories competitive with more computationally expensive methods such as replanning or scaling data, and can be guided to enable generalization in goal-conditioned settings. We show that Eq-Net exhibits significant compositional generalization in a variety of navigation and manipulation tasks designed to test planning diversity.
comment: 8 Pages
TURBO: Utility-Aware Bandwidth Allocation for Cloud-Augmented Autonomous Control
Autonomous driving system progress has been driven by improvements in machine learning models, whose computational demands now exceed what edge devices alone can provide. The cloud offers abundant compute, but the network has long been treated as an unreliable bottleneck rather than a co-equal part of the autonomous vehicle control loop. We argue that this separation is no longer tenable: safety-critical autonomy requires co-design of control, models, and network resource allocation itself. We introduce TURBO, a cloud-augmented control framework that addresses this challenge, formulating bandwidth allocation and control pipeline configuration across both the car and cloud as a joint optimization problem. TURBO maximizes benefit to the car while guaranteeing safety in the face of highly variable network conditions. We implement TURBO and evaluate it in both simulation and real-world deployment, showing it can improve average accuracy by up to 15.6%pt over existing on-vehicle-only pipelines. Our code is made available at www.github.com/NetSys/turbo.
comment: 34 pages, 13 figures
Multiagent Systems
Learning to Coordinate via Quantum Entanglement in Multi-Agent Reinforcement Learning
The inability to communicate poses a major challenge to coordination in multi-agent reinforcement learning (MARL). Prior work has explored correlating local policies via shared randomness, sometimes in the form of a correlation device, as a mechanism to assist in decentralized decision-making. In contrast, this work introduces the first framework for training MARL agents to exploit shared quantum entanglement as a coordination resource, which permits a larger class of communication-free correlated policies than shared randomness alone. This is motivated by well-known results in quantum physics which posit that, for certain single-round cooperative games with no communication, shared quantum entanglement enables strategies that outperform those that only use shared randomness. In such cases, we say that there is quantum advantage. Our framework is based on a novel differentiable policy parameterization that enables optimization over quantum measurements, together with a novel policy architecture that decomposes joint policies into a quantum coordinator and decentralized local actors. To illustrate the effectiveness of our proposed method, we first show that we can learn, purely from experience, strategies that attain quantum advantage in single-round games that are treated as black box oracles. We then demonstrate how our machinery can learn policies with quantum advantage in an illustrative multi-agent sequential decision-making problem formulated as a decentralized partially observable Markov decision process (Dec-POMDP).
Teaching an Old Dynamics New Tricks: Regularization-free Last-iterate Convergence in Zero-sum Games via BNN Dynamics
Zero-sum games are a fundamental setting for adversarial training and decision-making in multi-agent learning (MAL). Existing methods often ensure convergence to (approximate) Nash equilibria by introducing a form of regularization. Yet, regularization requires additional hyperparameters, which must be carefully tuned--a challenging task when the payoff structure is known, and considerably harder when the structure is unknown or subject to change. Motivated by this problem, we repurpose a classical model in evolutionary game theory, i.e., the Brown-von Neumann-Nash (BNN) dynamics, by leveraging the intrinsic convergence of this dynamics in zero-sum games without regularization, and provide last-iterate convergence guarantees in noisy normal-form games (NFGs). Importantly, to make this approach more applicable, we develop a novel framework with theoretical guarantees that integrates the BNN dynamics in extensive-form games (EFGs) through counterfactual weighting. Furthermore, we implement an algorithm that instantiates our framework with neural function approximation, enabling scalable learning in both NFGs and EFGs. Empirical results show that our method quickly adapts to nonstationarities, outperforming the state-of-the-art regularization-based approach.
A Generic Service-Oriented Function Offloading Framework for Connected Automated Vehicles
Function offloading is a promising solution to address limitations concerning computational capacity and available energy of Connected Automated Vehicles~(CAVs) or other autonomous robots by distributing computational tasks between local and remote computing devices in form of distributed services. This paper presents a generic function offloading framework that can be used to offload an arbitrary set of computational tasks with a focus on autonomous driving. To provide flexibility, the function offloading framework is designed to incorporate different offloading decision making algorithms and quality of service~(QoS) requirements that can be adjusted to different scenarios or the objectives of the CAVs. With a focus on the applicability, we propose an efficient location-based approach, where the decision whether tasks are processed locally or remotely depends on the location of the CAV. We apply the proposed framework on the use case of service-oriented trajectory planning, where we offload the trajectory planning task of CAVs to a Multi-Access Edge Computing~(MEC) server. The evaluation is conducted in both simulation and real-world application. It demonstrates the potential of the function offloading framework to guarantee the QoS for trajectory planning while improving the computational efficiency of the CAVs. Moreover, the simulation results also show the adaptability of the framework to diverse scenarios involving simultaneous offloading requests from multiple CAVs.
comment: 8 pages, 6 figures, 2 tables, published in RA-L
ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems
Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based evaluation framework for measuring and analyzing value drift in multi-agent systems. ValueFlow introduces a 56-value evaluation dataset derived from the Schwartz Value Survey and quantifies agents' value orientations during interaction using an LLM-as-a-judge protocol. Building on this measurement layer, ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, operationalized by two metrics: beta-susceptibility, which measures an agent's sensitivity to perturbed peer signals, and system susceptibility (SS), which captures how node-level perturbations affect final system outputs. Experiments across multiple model backbones, prompt personas, value dimensions, and network structures show that susceptibility varies widely across values and is strongly shaped by structural topology.
comment: Preprint. Under review. 18 pages, 9 figures
EvoCorps: An Evolutionary Multi-Agent Framework for Depolarizing Online Discourse
Polarization in online discourse erodes social trust and accelerates misinformation, yet technical responses remain largely diagnostic and post-hoc. Current governance approaches suffer from inherent latency and static policies, struggling to counter coordinated adversarial amplification that evolves in real-time. We present EvoCorps, an evolutionary multi-agent framework for proactive depolarization. EvoCorps frames discourse governance as a dynamic social game and coordinates roles for monitoring, planning, grounded generation, and multi-identity diffusion. A retrieval-augmented collective cognition core provides factual grounding and action--outcome memory, while closed-loop evolutionary learning adapts strategies as the environment and attackers change. We implement EvoCorps on the MOSAIC social-AI simulation platform for controlled evaluation in a multi-source news stream with adversarial injection and amplification. Across emotional polarization, viewpoint extremity, and argumentative rationality, EvoCorps improves discourse outcomes over an adversarial baseline, pointing to a practical path from detection and post-hoc mitigation to in-process, closed-loop intervention. The code is available at https://github.com/ln2146/EvoCorps.
A General Theory of Proportionality with Additive Utilities
We consider a model where a subset of candidates must be selected based on voter preferences, subject to general constraints that specify which subsets are feasible. This model generalizes committee elections with diversity constraints, participatory budgeting (including constraints specifying how funds must be allocated to projects from different pools), and public decision-making. Axioms of proportionality have recently been defined for this general model, but the proposed rules apply only to approval ballots, where each voter submits a subset of candidates she finds acceptable. We propose proportional rules for cardinal ballots, where each voter assigns a numerical value to each candidate corresponding to her utility if that candidate is selected. In developing these rules, we also introduce methods that produce proportional rankings, ensuring that every prefix of the ranking satisfies proportionality.
Altruism and Fair Objective in Mixed-Motive Markov games
Cooperation is fundamental for society's viability, as it enables the emergence of structure within heterogeneous groups that seek collective well-being. However, individuals are inclined to defect in order to benefit from the group's cooperation without contributing the associated costs, thus leading to unfair situations. In game theory, social dilemmas entail this dichotomy between individual interest and collective outcome. The most dominant approach to multi-agent cooperation is the utilitarian welfare which can produce efficient highly inequitable outcomes. This paper proposes a novel framework to foster fairer cooperation by replacing the standard utilitarian objective with Proportional Fairness. We introduce a fair altruistic utility for each agent, defined on the individual log-payoff space and derive the analytical conditions required to ensure cooperation in classic social dilemmas. We then extend this framework to sequential settings by defining a Fair Markov Game and deriving novel fair Actor-Critic algorithms to learn fair policies. Finally, we evaluate our method in various social dilemma environments.
T2VTree: User-Centered Visual Analytics for Agent-Assisted Thought-to-Video Authoring
Generative models have substantially expanded video generation capabilities, yet practical thought-to-video creation remains a multi-stage, multi-modal, and decision-intensive process. However, existing tools either hide intermediate decisions behind repeated reruns or expose operator-level workflows that make exploration traces difficult to manage, compare, and reuse. We present T2VTree, a user-centered visual analytics approach for agent-assisted thought-to-video authoring. T2VTree represents the authoring process as a tree visualization. Each node in the tree binds an editable specification (intent, referenced inputs, workflow choice, prompts, and parameters) with the resulting multimodal outputs, making refinement, branching, and provenance inspection directly operable. To reduce the burden of deciding what to do next, a set of collaborating agents translates step-level intent into an executable plan that remains visible and user-editable before execution. We further implement a visual analytics system that integrates branching authoring with in-place preview and stitching for convergent assembly, enabling end-to-end multi-scene creation without leaving the authoring context. We demonstrate T2VTreeVA through two multi-scene case studies and a comparative user study, showing how the T2VTree visualization and editable agent planning support reliable refinement, localized comparison, and practical reuse in real authoring workflows. T2VTree is available at: https://github.com/tezuka0210/T2VTree.
SynthAgent: A Multi-Agent LLM Framework for Realistic Patient Simulation -- A Case Study in Obesity with Mental Health Comorbidities AAAI 2026
Simulating high-fidelity patients offers a powerful avenue for studying complex diseases while addressing the challenges of fragmented, biased, and privacy-restricted real-world data. In this study, we introduce SynthAgent, a novel Multi-Agent System (MAS) framework designed to model obesity patients with comorbid mental disorders, including depression, anxiety, social phobia, and binge eating disorder. SynthAgent integrates clinical and medical evidence from claims data, population surveys, and patient-centered literature to construct personalized virtual patients enriched with personality traits that influence adherence, emotion regulation, and lifestyle behaviors. Through autonomous agent interactions, the system simulates disease progression, treatment response, and life management across diverse psychosocial contexts. Evaluation of more than 100 generated patients demonstrated that GPT-5 and Claude 4.5 Sonnet achieved the highest fidelity as the core engine in the proposed MAS framework, outperforming Gemini 2.5 Pro and DeepSeek-R1. SynthAgent thus provides a scalable and privacy-preserving framework for exploring patient journeys, behavioral dynamics, and decision-making processes in both medical and psychological domains.
comment: Presented in AAAI 2026 Singapore at the workshop of Health Intelligence
Collective Behavior of AI Agents: the Case of Moltbook
We present a large scale data analysis of Moltbook, a Reddit-style social media platform exclusively populated by AI agents. Analyzing over 369,000 posts and 3.0 million comments from approximately 46,000 active agents, we find that AI collective behavior exhibits many of the same statistical regularities observed in human online communities: heavy-tailed distributions of activity, power-law scaling of popularity metrics, and temporal decay patterns consistent with limited attention dynamics. However, we also identify key differences, including a sublinear relationship between upvotes and discussion size that contrasts with human behavior. These findings suggest that, while individual AI agents may differ fundamentally from humans, their emergent collective dynamics share structural similarities with human social systems.
VLM-Guided Iterative Refinement for Surgical Image Segmentation with Foundation Models
Surgical image segmentation is essential for robot-assisted surgery and intraoperative guidance. However, existing methods are constrained to predefined categories, produce one-shot predictions without adaptive refinement, and lack mechanisms for clinician interaction. We propose IR-SIS, an iterative refinement system for surgical image segmentation that accepts natural language descriptions. IR-SIS leverages a fine-tuned SAM3 for initial segmentation, employs a Vision-Language Model to detect instruments and assess segmentation quality, and applies an agentic workflow that adaptively selects refinement strategies. The system supports clinician-in-the-loop interaction through natural language feedback. We also construct a multi-granularity language-annotated dataset from EndoVis2017 and EndoVis2018 benchmarks. Experiments demonstrate state-of-the-art performance on both in-domain and out-of-distribution data, with clinician interaction providing additional improvements. Our work establishes the first language-based surgical segmentation framework with adaptive self-refinement capabilities.
CoMMa: Contribution-Aware Medical Multi-Agents From A Game-Theoretic Perspective
Recent multi-agent frameworks have broadened the ability to tackle oncology decision support tasks that require reasoning over dynamic, heterogeneous patient data. We propose Contribution-Aware Medical Multi-Agents (CoMMa), a decentralized LLM-agent framework in which specialists operate on partitioned evidence and coordinate through a game-theoretic objective for robust decision-making. In contrast to most agent architectures relying on stochastic narrative-based reasoning, CoMMa utilizes deterministic embedding projections to approximate contribution-aware credit assignment. This yields explicit evidence attribution by estimating each agent's marginal utility, producing interpretable and mathematically grounded decision pathways with improved stability. Evaluated on diverse oncology benchmarks, including a real-world multidisciplinary tumor board dataset, CoMMa achieves higher accuracy and more stable performance than data-centralized and role-based multi-agents baselines.
comment: 9 pages, 3 figures
SciDataCopilot: An Agentic Data Preparation Framework for AGI-driven Scientific Discovery
The current landscape of AI for Science (AI4S) is predominantly anchored in large-scale textual corpora, where generative AI systems excel at hypothesis generation, literature search, and multi-modal reasoning. However, a critical bottleneck for accelerating closed-loop scientific discovery remains the utilization of raw experimental data. Characterized by extreme heterogeneity, high specificity, and deep domain expertise requirements, raw data possess neither direct semantic alignment with linguistic representations nor structural homogeneity suitable for a unified embedding space. The disconnect prevents the emerging class of Artificial General Intelligence for Science (AGI4S) from effectively interfacing with the physical reality of experimentation. In this work, we extend the text-centric AI-Ready concept to Scientific AI-Ready data paradigm, explicitly formalizing how scientific data is specified, structured, and composed within a computational workflow. To operationalize this idea, we propose SciDataCopilot, an autonomous agentic framework designed to handle data ingestion, scientific intent parsing, and multi-modal integration in a end-to-end manner. By positioning data readiness as a core operational primitive, the framework provides a principled foundation for reusable, transferable systems, enabling the transition toward experiment-driven scientific general intelligence. Extensive evaluations across three heterogeneous scientific domains show that SciDataCopilot improves efficiency, scalability, and consistency over manual pipelines, with up to 30$\times$ speedup in data preparation.
RiskAgent: Synergizing Language Models with Validated Tools for Evidence-Based Risk Prediction
Large Language Models (LLMs) achieve competitive results compared to human experts in medical examinations. However, it remains a challenge to apply LLMs to complex clinical decision-making, which requires a deep understanding of medical knowledge and differs from the standardized, exam-style scenarios commonly used in current efforts. A common approach is to fine-tune LLMs for target tasks, which, however, not only requires substantial data and computational resources but also remains prone to generating `hallucinations'. In this work, we present RiskAgent, which synergizes language models with hundreds of validated clinical decision tools supported by evidence-based medicine, to provide generalizable and faithful recommendations. Our experiments show that RiskAgent not only achieves superior performance on a broad range of clinical risk predictions across diverse scenarios and diseases, but also demonstrates robust generalization in tool learning on the external MedCalc-Bench dataset, as well as in medical reasoning and question answering on three representative benchmarks, MedQA, MedMCQA, and MMLU.
comment: Code and Data are available at https://github.com/AI-in-Health/RiskAgent
Conversational No-code, Multi-agentic Disease Module Identification and Drug Repurposing Prediction with ChatDRex
Repurposing approved drugs offers a time-efficient and cost-effective alternative to traditional drug development. However, in silico prediction of repurposing candidates is challenging and requires the effective collaboration of specialists in various fields, including pharmacology, medicine, biology, and bioinformatics. Fragmented, specialized algorithms and tools often address only narrow aspects of the overall problem. Heterogeneous, unstructured data landscapes require the expertise of specialized users. Hence, these data services do not integrate smoothly across workflows. With ChatDRex, we present a conversation-based, multi-agent system that facilitates the execution of complex bioinformatic analyses aiming for network-based drug repurposing prediction. It builds on the integrated systems medicine knowledge graph (NeDRex KG). ChatDRex provides natural language access to its extensive biomedical knowledge base. It integrates bioinformatics agents for network analysis, literature mining, and drug repurposing. These are complemented by agents that evaluate functional coherence for in silico validation. Its flexible multi-agent design assigns specific tasks to specialized agents, including query routing, data retrieval, algorithm execution, and result visualization. A dedicated reasoning module keeps the user in the loop and allows for hallucination detection. By enabling physicians and researchers without computer science expertise to control complex analyses with natural language, ChatDRex democratizes access to bioinformatics as an important resource for drug repurposing. It enables clinical experts to generate hypotheses and explore drug repurposing opportunities, ultimately accelerating the discovery of novel therapies and advancing personalized medicine and translational research. ChatDRex is publicly available at apps.cosy.bio/chatdrex.
Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data
A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using microscopic data to anchor observed vehicle behaviors, and learns a shared policy whose behavior is microscopically consistent with the partially observed trajectories and actions and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
Spatiotemporal Attention-Augmented Inverse Reinforcement Learning for Multi-Agent Task Allocation
Adversarial inverse reinforcement learning (IRL) for multi-agent task allocation (MATA) is challenged by non-stationary interactions and high-dimensional coordination. Unconstrained reward inference in these settings often leads to high variance and poor generalization. We propose an attention-structured adversarial IRL framework that constrains reward inference via spatiotemporal representation learning. Our method employs multi-head self-attention (MHSA) for long-range temporal dependencies and graph attention networks (GAT) for agent-task relational structures. We formulate reward inference as a low-capacity, adaptive linear transformation of the environment reward, ensuring stable and interpretable guidance. This framework decouples reward inference from policy learning and optimizes the reward model adversarially. Experiments on benchmark MATA scenarios show that our approach outperforms representative MARL baselines in convergence speed, cumulative rewards, and spatial efficiency. Results demonstrate that attention-guided, capacity-constrained reward inference is a scalable and effective mechanism for stabilizing adversarial IRL in complex multi-agent systems.
comment: Revised version with substantial new experimental results, improved analysis, and a restructured layout for better clarity
Trade in Minutes! Rationality-Driven Agentic System for Quantitative Financial Trading
Recent advancements in large language models (LLMs) and agentic systems have shown exceptional decision-making capabilities, revealing significant potential for autonomic finance. Current financial trading agents predominantly simulate anthropomorphic roles that inadvertently introduce emotional biases and rely on peripheral information, while being constrained by the necessity for continuous inference during deployment. In this paper, we pioneer the harmonization of strategic depth in agents with the mechanical rationality essential for quantitative trading. Consequently, we present TiMi (Trade in Minutes), a rationality-driven multi-agent system that architecturally decouples strategy development from minute-level deployment. TiMi leverages specialized LLM capabilities of semantic analysis, code programming, and mathematical reasoning within a comprehensive policy-optimization-deployment chain. Specifically, we propose a two-tier analytical paradigm from macro patterns to micro customization, layered programming design for trading bot implementation, and closed-loop optimization driven by mathematical reflection. Extensive evaluations across 200+ trading pairs in stock and cryptocurrency markets empirically validate the efficacy of TiMi in stable profitability, action efficiency, and risk control under volatile market dynamics.
comment: 17 pages, 6 figures
Multi-Agent Teams Hold Experts Back
Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 37.6%. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
comment: Preprint
On the Uncertainty of Large Language Model-Based Multi-Agent Systems
Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically the underlying rationales for their success or failure, remain largely unexplored. In this paper, we revisit MAS through the perspective of uncertainty, considering both intra- and inter-agent dynamics by investigating entropy transitions during problem-solving across various topologies and six benchmark tasks. By analyzing 245 features spanning token-, trajectory-, and round-level entropy, we counterintuitively find that a single agent outperforms MAS in approximately 43.3% of cases, and that uncertainty dynamics are largely determined during the first round of interaction. Furthermore, we provide three key observations: 1) Certainty Preference: reducing uncertainty at any stage for any agent is critical for guaranteeing correct solutions; 2) Base Uncertainty: base models with lower entropy during problem-solving directly benefit MAS performance; and 3) Task Awareness: entropy dynamics of MAS play varying roles across different tasks. Building on these insights, we introduce a simple yet effective algorithm, the Entropy Judger, to select solutions from MAS's pass@k results, leading to consistent accuracy improvements across all MAS configurations and tasks. Our source code is available at https://github.com/AgenticFinLab/multiagent-entropy.
comment: arXiv preprint
Dynamics of Moral Behavior in Heterogeneous Populations of Learning Agents AAAI
Growing concerns about safety and alignment of AI systems highlight the importance of embedding moral capabilities in artificial agents: a promising solution is the use of learning from experience, i.e., Reinforcement Learning. In multi-agent (social) environments, complex population-level phenomena may emerge from interactions between individual learning agents. Many of the existing studies rely on simulated social dilemma environments to study the interactions of independent learning agents; however, they tend to ignore the moral heterogeneity that is likely to be present in societies of agents in practice. For example, at different points in time a single learning agent may face opponents who are consequentialist (i.e., focused on maximizing outcomes over time), norm-based (i.e., conforming to specific norms), or virtue-based (i.e., considering a combination of different virtues). The extent to which agents' co-development may be impacted by such moral heterogeneity in populations is not well understood. In this paper, we present a study of the learning dynamics of morally heterogeneous populations interacting in a social dilemma setting. Using an Iterated Prisoner's Dilemma environment with a partner selection mechanism, we investigate the extent to which the prevalence of diverse moral agents in populations affects individual agents' learning behaviors and emergent population-level outcomes. We observe several types of non-trivial interactions between pro-social and anti-social agents, and find that certain types of moral agents are able to steer selfish agents towards more cooperative behavior.
comment: Presented at AIES 2024 (7th AAAI/ACM Conference on AI, Ethics, and Society - San Jose, CA, USA) - see https://ojs.aaai.org/index.php/AIES/article/view/31736
Axiomatic Choice
People care about decision outcomes and how decisions get made, both when making decisions and reflecting on decisions. But formalizing the full range of normative concerns that drive decisions is an open challenge. We introduce Axiomatic Choice as a framework for making and evaluating decisions based on formal normative statements about decisions. These statements, or axioms, capture a wide array of desiderata, e.g., ethical constraints, beyond the typical treatment in Social Choice. Using our model of axioms and decisions we define key properties and introduce a taxonomy of axioms which may be of general interest. We then use these properties and our taxonomy to define the Decision-Evaluation Paradox, formalize the concepts of transparency and deception in explaining and justifying decisions, and reveal the limits of existing methods using axioms to make decisions.
Systems and Control (EESS)
Contraction Metric Based Safe Reinforcement Learning Force Control for a Hydraulic Actuator with Real-World Training
Force control in hydraulic actuators is notoriously difficult due to strong nonlinearities, uncertainties, and the high risks associated with unsafe exploration during learning. This paper investigates safe reinforcement learning (RL) for hy draulic force control with real-world training using contraction metric certificates. A data-driven model of a hydraulic actuator, identified from experimental data, is employed for simulation based pretraining of a Soft Actor-Critic (SAC) policy that adapts the PI gains of a feedback-linearization (FL) controller. To reduce instability during online training, we propose a quadratic-programming (QP) contraction filter that leverages a learned contraction metric to enforce approximate exponential convergence of trajectories, applying minimal corrections to the policy output. The approach is validated on a hydraulic test bench, where the RL controller is trained directly on hardware and benchmarked against a simulation-trained agent and a fixed-gain baseline. Experimental results show that real-hardware training improves force-tracking performance compared to both alternatives, while the contraction filter mitigates chattering and instabilities. These findings suggest that contraction-based certificates can enable safe RL in high force hydraulic systems, though robustness at extreme operating conditions remains a challenge.
Artificial Magnetic Conductor Frame to Improve Impedance Matching and Radiation Symmetry in 2$\times$2 Array for 6G Applications
An Artificial Magnetic Conductor (AMC) frame capable of improving the impedance matching of a 2$\times$2 array for 6G applications without degrading isolation performance is presented. The proposed frame is integrated into the array without modifying the single radiating element design. By relying on accurate full-wave simulations, it results that the addition of the frame restores the impedance matching performance, achieving a bandwidth of 1.5 GHz at 28 GHz. The isolation between each port remains under -15 dB within the operating band, thanks to the vias in the rectangular patch metasurface. Moreover, the overall structure exhibits a gain of 11.81 dBi with an aperture efficiency of 69$\%$, satisfactorily for broadband communication purposes. The proposed AMC frame represents an effective method for improving array performance without the need to alter the shape or dimensions of the single radiating element.
Automating the Wildfire Detection and Scheduling Pipeline with Maneuverable Earth Observation Satellites
Wildfires are becoming increasingly frequent, with potentially devastating consequences, including loss of life, infrastructure destruction, and severe environmental damage. Low Earth orbit satellites equipped with onboard sensors can capture critical imagery of active wildfires and enable real-time detection through machine learning algorithms applied to the acquired data. This paper presents a framework that automates the complete wildfire detection and scheduling pipeline, integrating three key components: wildfire detection in satellite imagery, statistical updating that incorporates data from repeated flyovers, and multi-satellite scheduling optimization. The framework enables wildfire detection using convolutional neural networks with sensor fusion techniques, the incorporation of subsequent flyover information using Bayesian statistics, and satellite scheduling through the state-of-the-art Reconfigurable Earth Observation Satellite Scheduling Problem. Experiments conducted using real-world wildfire events and operational Earth observation satellites demonstrate that this autonomous detection and scheduling approach effectively enhances wildfire monitoring capabilities.
comment: 44 pages
Accelerated Stabilization of Switched Linear MIMO Systems using Generalized Homogeneity
This paper addresses the problem of exponential and accelerated finite-time, as well as nearly fixed-time, stabilization of switched linear MIMO systems. The proposed approach relies on a generalized homogenization framework for switched linear systems and employs implicit Lyapunov functions for control design, covering both common and multiple Lyapunov function settings. Linear matrix equations and inequalities are derived to characterize the dilation generator and to synthesize the controller gains. Robustness of the resulting control laws with respect to system uncertainties and external disturbances is analyzed. The effectiveness of the proposed approach is illustrated through numerical examples.
Quantum Riemannian Cubics with Obstacle Avoidance for Quantum Geometric Model Predictive Control
We propose a geometric model predictive control framework for quantum systems subject to smoothness and state constraints. By formulating quantum state evolution intrinsically on the projective Hilbert space, we penalize covariant accelerations to generate smooth trajectories in the form of Riemannian cubics, while incorporating state-dependent constraints through potential functions. A structure-preserving variational discretization enables receding-horizon implementation, and a Lyapunov-type stability result is established for the closed-loop system. The approach is illustrated on the Bloch sphere for a two-level quantum system, providing a viable pathway toward predictive feedback control of constrained quantum dynamics.
karl. -- A Research Vehicle for Automated and Connected Driving
As highly automated driving is transitioning from single-vehicle closed-access testing to commercial deployments of public ride-hailing in selected areas (e.g., Waymo), automated driving and connected cooperative intelligent transport systems (C-ITS) remain active fields of research. Even though simulation is omnipresent in the development and validation life cycle of automated and connected driving technology, the complex nature of public road traffic and software that masters it still requires real-world integration and testing with actual vehicles. Dedicated vehicles for research and development allow testing and validation of software and hardware components under real-world conditions early on. They also enable collecting and publishing real-world datasets that let others conduct research without vehicle access, and support early demonstration of futuristic use cases. In this paper, we present karl., our new research vehicle for automated and connected driving. Apart from major corporations, few institutions worldwide have access to their own L4-capable research vehicles, restricting their ability to carry out independent research. This paper aims to help bridge that gap by sharing the reasoning, design choices, and technical details that went into making karl. a flexible and powerful platform for research, engineering, and validation in the context of automated and connected driving. More impressions of karl. are available at https://karl.ac.
comment: 8 pages; Accepted to be published as part of the 37th Intelligent Vehicles Symposium (IV), Detroit, MI, United States, June 22-25, 2026
Passivity-exploiting stabilization of semilinear single-track vehicle models with distributed tire friction dynamics
This paper addresses the local stabilization problem for semilinear single-track vehicle models with distributed tire friction dynamics, represented as interconnections of ordinary differential equations (ODEs) and hyperbolic partial differential equations (PDEs). A passivity-exploiting backstepping design is presented, which leverages the strict dissipativity properties of the PDE subsystem to achieve exponential stabilization of the considered ODE-PDE interconnection around a prescribed equilibrium. Sufficient conditions for local well-posedness and exponential convergence are derived by constructing a Lyapunov functional combining the lumped and distributed states. Both state-feedback and output-feedback controllers are synthesized, the latter relying on a cascaded observer. The theoretical results are corroborated with numerical simulations, considering non-ideal scenarios and accounting for external disturbances and uncertainties. Simulation results confirm that the proposed control strategy can effectively and robustly stabilize oversteer vehicles at high speeds, demonstrating the relevance of the approach for improving the safety and performance in automotive applications.
comment: 17 pages, 10 figures. Under review at Automatica
Stability and stabilization of semilinear single-track vehicle models with distributed tire friction dynamics via singular perturbation analysis
This paper investigates the stability and stabilization of semilinear single-track vehicle models with distributed tire friction dynamics, modeled as interconnections of ordinary differential equations (ODEs) and hyperbolic partial differential equations (PDEs). Motivated by the long-standing practice of neglecting transient tire dynamics in vehicle modeling and control, a rigorous justification is provided for such simplifications using singular perturbation theory. A perturbation parameter, defined as the ratio between a characteristic rolling contact length and the vehicle's longitudinal speed, is introduced to formalize the time-scale separation between rigid-body motion and tire dynamics. For sufficiently small values of this parameter, it is demonstrated that standard finite-dimensional techniques can be applied to analyze the local stability of equilibria and to design stabilizing controllers. Both state-feedback and output-feedback designs are considered, under standard stabilizability and detectability assumptions. Whilst the proposed controllers follow classical approaches, the novelty of the work lies in establishing the first mathematical framework that rigorously connects distributed tire models with conventional vehicle dynamics. The results reconcile decades of empirical findings with a formal theoretical foundation and open new perspectives for the analysis and control of ODE-PDE systems with distributed friction in automotive applications.
comment: 17 pages, 9 figures. Under review at Automatica
A Primal-Dual-Based Active Fault-Tolerant Control Scheme for Cyber-Physical Systems: Application to DC Microgrids
We consider the problem of active fault-tolerant control in cyber-physical systems composed of strictly passive linear-time invariant dynamic subsystems. We cast the problem as a constrained optimization problem and propose an augmented primal-dual gradient dynamics-based fault-tolerant control framework that enforces network-level constraints and provides optimality guarantees for the post-fault steady-state operation. By suitably interconnecting the primal-dual algorithm with the cyber-physical dynamics, we provide sufficient conditions under which the resulting closed-loop system possesses a unique and exponentially stable equilibrium point that satisfies the Karush--Kuhn--Tucker (KKT) conditions of the constrained problem. The framework's effectiveness is illustrated through numerical experiments on a DC microgrid.
Residential Peak Load Reduction via Direct Load Control under Limited Information
Thermostatically controlled loads and electric vehicles offer flexibility to reduce power peaks in low-voltage distribution networks. This flexibility can be maximized if the devices are coordinated centrally, given some level of information about the controlled devices. In this paper, we propose novel optimization-based control schemes with prediction capabilities that utilize limited information from heat pumps, electric water heaters, and electric vehicles. The objective is to flatten the total load curve seen by the distribution transformer by restricting the times at which the available flexible loads are allowed to operate, subject to the flexibility constraints of the loads to preserve customers' comfort. The original scheme was tested in a real-world setup, considering both winter and summer days. The pilot results confirmed the technical feasibility but also informed the design of an improved version of the controller. Computer simulations using the adjusted controller show that, compared to the original formulation, the improved scheme achieves greater peak reductions in summer. Additionally, comparisons were made with an ideal controller, which assumes perfect knowledge of the inflexible load profile, the models of the controlled devices, the hot water and space heating demand, and future electric vehicle charging sessions. The proposed scheme with limited information achieves almost half of the potential average daily peak reduction that the ideal controller with perfect knowledge would achieve.
A Comparative Analysis of the CERN ATLAS ITk MOPS Readout: A Feasibility Study on Production and Development Setups
The upcoming High-Luminosity upgrade of the Large Hadron Collider (LHC) necessitates a complete replacement of the ATLAS Inner Detector with the new Inner Tracker (ITk). This upgrade imposes stringent requirements on the associated Detector Control System (DCS), which is responsible for the monitoring, control, and safety of the detector. A critical component of the ITk DCS is the Monitoring of Pixel System (MOPS), which supervises the local voltages and temperatures of the new pixel detector modules. This paper introduces a dedicated testbed and verification methodology for the MOPS readout, defining a structured set of test cases for two DCS-readout architectures: a preliminary Raspberry Pi-based controller, the "MOPS-Hub Mock-up"(MH Mock-up), and the final production FPGA-based "MOPS-Hub" (MH). The methodology specifies the measurement chain for end-to-end latency, jitter, and data integrity across CAN and UART interfaces, including a unified time-stamping scheme, non-intrusive signal taps, and a consistent data-logging and analysis pipeline. This work details the load profiles and scalability scenarios (baseline operation, full-crate stress, and CAN Interface Card channel isolation), together with acceptance criteria and considerations for measurement uncertainty to ensure reproducibility. The objective is to provide a clear, repeatable procedure to qualify the MH architecture for production and deployment in the ATLAS ITk DCS. A companion paper will present the experimental results and the comparative analysis obtained using this testbed.
comment: In total 16 Pages, 4 figures, for Submition to JINST
A Multi-physics Simulation Framework for High-power Microwave Counter-unmanned Aerial System Design and Performance Evaluation
The proliferation of small unmanned aerial systems (sUAS) operating under autonomous guidance has created an urgent need for non-kinetic neutralization methods that are immune to conventional radio-frequency jamming. This paper presents a comprehensive multi-physics simulation framework for the design and performance evaluation of a high-power microwave (HPM) counter-UAS system operating at 2.45\,GHz. The framework integrates electromagnetic propagation modelling, antenna pattern analysis, electromagnetic coupling to unshielded drone wiring harnesses, and a sigmoid-based semiconductor damage probability model calibrated to published CMOS latchup thresholds. A 10{,}000-trial Monte Carlo analysis incorporating stochastic variations in transmitter power, antenna pointing error, target wire orientation, polarization mismatch, and component damage thresholds yields system-level kill probabilities with 95\% confidence intervals. For a baseline configuration of 25\,kW continuous-wave power and a 60\,cm parabolic reflector (21.2\,dBi gain), the Monte Carlo simulation predicts a kill probability of $51.4\pm1.0$\% at 20\,m, decreasing to $13.1\pm0.7$\% at 40\,m. Pulsed operation at 500\,kW peak power (1\% duty cycle) extends the 90\% kill range from approximately 18\,m to 88\,m. The framework further provides parametric design maps, safety exclusion zone calculations compliant with ICNIRP 2020 guidelines, thermal management requirements, and waveguide mode analysis. All simulation codes and results are provided for full reproducibility.
comment: 17 pages, 15 figures
Post-Collision Trajectory Restoration for a Single-track Ackermann Vehicle using Heuristic Steering and Tractive Force Functions
Post-collision trajectory restoration is a safety-critical capability for autonomous vehicles, as impact-induced lateral motion and yaw transients can rapidly drive the vehicle away from the intended path. This paper proposes a structured heuristic recovery control law that jointly commands steering and tractive force for a generalized single-track Ackermann vehicle model. The formulation explicitly accounts for time-varying longitudinal velocity in the lateral-yaw dynamics and retains nonlinear steering-coupled interaction terms that are commonly simplified in the literature. Unlike approaches that assume constant longitudinal speed, the proposed design targets the transient post-impact regime where speed variations and nonlinear coupling significantly influence recovery. The method is evaluated in simulation on the proposed generalized single-track model and a standard 3DOF single-track reference model in MATLAB, demonstrating consistent post-collision restoration behaviour across representative initial post-impact conditions.
comment: 10 pages, 6 figures
An Approach for the Qualitative Graphical Representation of the Describing Function in Nonlinear Systems Stability Analysis
The describing function method is a useful tool for the qualitative analysis of limit cycles in the stability analysis of nonlinear systems. This method is inherently approximate; therefore, it should be used for a fast qualitative analysis of the considered systems. However, plotting the exact describing function requires heavy mathematical calculations, reducing interest in this method especially from the point of view of control education. The objective of this paper is to enhance the describing function method by providing a new approach for the qualitative plotting of the describing function for piecewise nonlinearities involving discontinuities. Unlike the standard method, the proposed approach allows for a straightforward, hand-drawn plotting of the describing function using the rules introduced in this paper, simply by analyzing the shape of the nonlinearity. The proposed case studies show that the limit cycles estimation performed using the standard exact plotting of the describing function yields the same qualitative results as those obtained using the proposed qualitative method for plotting the describing function.
Controlled Flight of an Insect-Scale Flapping-Wing Robot via Integrated Onboard Sensing and Computation
Aerial insects can effortlessly navigate dense vegetation, whereas similarly sized aerial robots typically depend on offboard sensors and computation to maintain stable flight. This disparity restricts insect-scale robots to operation within motion capture environments, substantially limiting their applicability to tasks such as search-and-rescue and precision agriculture. In this work, we present a 1.29-gram aerial robot capable of hovering and tracking trajectories with solely onboard sensing and computation. The combination of a sensor suite, estimators, and a low-level controller achieved centimeter-scale positional flight accuracy. Additionally, we developed a hierarchical controller in which a human operator provides high-level commands to direct the robot's motion. In a 30-second flight experiment conducted outside a motion capture system, the robot avoided obstacles and ultimately landed on a sunflower. This level of sensing and computational autonomy represents a significant advancement for the aerial microrobotics community, further opening opportunities to explore onboard planning and power autonomy.
comment: 22 pages, 7 figures
Experimental Realization of Koopman-Model Predictive Control for an AC-DC Converter
This paper experimentally demonstrates the Koopman-Model Predictive Control (K-MPC) for a real AC-DC converter. The converter is typically modeled with a nonlinear time-variant plant. We introduce a new dynamical approach to lifting measurable dynamics from the plant and constructing a linear time-invariant model that is consistent with control objectives of the converter. We show that the lifting approach, combined with the K-MPC controller, performs well across the full experimental system and outperforms existing control strategies in terms of both steady-state and transient responses.
comment: 6 pages, 5 figures, ISIE
Pitot-Aided Attitude and Air Velocity Estimation with Almost Global Asymptotic Stability Guarantees
This paper investigates the problem of attitude and air velocity estimation for fixed-wing unmanned aerial vehicles (UAVs) using IMU measurements and at least one Pitot tube measurement, with almost global asymptotic stability (AGAS) guarantees. A cascade observer architecture is developed, in which a Riccati/Kalman-type filter estimates the body-fixed frame air velocity and the vehicle's tilt using IMU data as inputs and Pitot measurements as outputs. Under mild excitation conditions, the resulting air velocity and tilt estimation error dynamics are shown to be uniformly observable. The estimated tilt is then combined with magnetometer measurements in a nonlinear observer on SO(3) to recover the full attitude. Rigorous analysis establishes AGAS of the overall cascade structure under the uniform observability (UO) condition. The effectiveness of the proposed approach is demonstrated through validation on real flight data.
comment: 8 pages, 8 figures. Under review in IEEE CCTA2026
Real-time Load Current Monitoring of Overhead Lines Using GMR Sensors
Non-contact current monitoring has emerged as a prominent research focus owing to its non-intrusive characteristics and low maintenance requirements. However, while they offer high sensitivity, contactless sensors necessitate sophisticated design methodologies and thorough experimental validation. In this study, a Giant Magneto-Resistance (GMR) sensor is employed to monitor the instantaneous currents of a three-phase 400-volt overhead line, and its performance is evaluated against that of a conventional contact-based Hall effect sensor. A mathematical framework is developed to calculate current from the measured magnetic field signals. Furthermore, a MATLAB-based dashboard is implemented to enable real-time visualization of current measurements from both sensors under linear and non-linear load conditions. The GMR current sensor achieved a relative accuracy of 64.64% to 91.49%, with most phases above 80%. Identified improvements over this are possible, indicating that the sensing method has potential as a basis for calculating phase currents.
comment: 5 pages, 6 figures, to be published in 2026 IEEE PES Transmission and Distribution (T&D) Conference and Exposition, Chicago, IL, USA
EExApp: GNN-Based Reinforcement Learning for Radio Unit Energy Optimization in 5G O-RAN
With over 3.5 million 5G base stations deployed globally, their collective energy consumption (projected to exceed 131 TWh annually) raises significant concerns over both operational costs and environmental impacts. In this paper, we present EExAPP, a deep reinforcement learning (DRL)-based xApp for 5G Open Radio Access Network (O-RAN) that jointly optimizes radio unit (RU) sleep scheduling and distributed unit (DU) resource slicing. EExAPP uses a dual-actor-dual-critic Proximal Policy Optimization (PPO) architecture, with dedicated actor-critic pairs targeting energy efficiency and quality-of-service (QoS) compliance. A transformer-based encoder enables scalable handling of variable user equipment (UE) populations by encoding all-UE observations into fixed-dimensional representations. To coordinate the two optimization objectives, a bipartite Graph Attention Network (GAT) is used to modulate actor updates based on both critic outputs, enabling adaptive tradeoffs between power savings and QoS. We have implemented EExAPP and deployed it on a real-world 5G O-RAN testbed with live traffic, commercial RU and smartphones. Extensive over-the-air experiments and ablation studies confirm that EExAPP significantly outperforms existing methods in reducing the energy consumption of RU while maintaining QoS.
comment: Accepted by IEEE INFOCOM 2026
Genocide by Algorithm in Gaza: Artificial Intelligence, Countervailing Responsibility, and the Corruption of Public Discourse
The accelerating militarization of artificial intelligence has transformed the ethics, politics, and governance of warfare. This article interrogates how AI-driven targeting systems function as epistemic infrastructures that classify, legitimize, and execute violence, using Israel's conduct in Gaza as a paradigmatic case. Through the lens of responsibility, the article examines three interrelated dimensions: (a) political responsibility, exploring how states exploit AI to accelerate warfare while evading accountability; (b) professional responsibility, addressing the complicity of technologists, engineers, and defense contractors in the weaponization of data; and (c) personal responsibility, probing the moral agency of individuals who participate in or resist algorithmic governance. This is complemented by an examination of the position and influence of those participating in public discourse, whose narratives often obscure or normalize AI-enabled violence. The Gaza case reveals AI not as a neutral instrument but as an active participant in the reproduction of colonial hierarchies and the normalization of atrocity. Ultimately, the paper calls for a reframing of technological agency and accountability in the age of automated warfare. It concludes that confronting algorithmic violence demands a democratization of AI ethics, one that resists technocratic fatalism and centers the lived realities of those most affected by high-tech militarism.
Dynamic Passivity Multipliers for Plug-and-Play Stability Certificates of Converter-Dominated Grids
Ensuring small-signal stability in power systems with a high share of inverter-based resources (IBRs) is hampered by two factors: (i) device and network parameters are often uncertain or completely unknown, and (ii) brute-force enumeration of all topologies is computationally intractable. These challenges motivate plug-and-play (PnP) certificates that verify stability locally yet hold globally. Passivity is an attractive property because it guarantees stability under feedback and network interconnections; however, strict passivity rarely holds for practical controllers such as Grid Forming Inverters (GFMs) employing P-Q droop. This paper extends the passivity condition by constructing a dynamic, frequency-dependent multiplier that enables PnP stability certification of each component based solely on its admittance, without requiring any modification to the controller design. The multiplier is parameterised as a linear filter whose coefficients are tuned under a passivity goal. Numerical results for practical droop gains confirm the PnP rules, substantially enlarging the certified stability region while preserving the decentralised, model-agnostic nature of passivity-based PnP tests.
Shaping Energy Exchange with Gyroscopic Interconnections: a geometric approach
Gyroscopic interconnections enable redistribution of energy among degrees of freedom while preserving passivity and total energy, and they play a central role in controlled Lagrangian methods and IDA-PBC. Yet their quantitative effect on transient energy exchange and subsystem performance is not well characterised. We study a conservative mechanical system with constant skew-symmetric velocity coupling. Its dynamics are integrable and evolve on invariant two-tori, whose projections onto subsystem phase planes provide geometric description of energy exchange. When the ratio of normal-mode frequencies is rational, these projections become closed resonant Lissajous curves, enabling structured analysis of subsystem trajectories. To quantify subsystem behaviour, we introduce the inscribed-radius metric: the radius of the largest origin-centred circle contained in a projected trajectory. This gives a lower bound on attainable subsystem energy and acts as an internal performance measure. We derive resonance conditions and develop an efficient method to compute or certify the inscribed radius without time-domain simulation. Our results show that low-order resonances can strongly restrict energy depletion through phase-locking, whereas high-order resonances recover conservative bounds. These insights lead to an explicit interconnection-shaping design framework for both energy absorption and containment control strategies, while taking responsiveness into account.
comment: Conference paper submitted to the 10th IEEE Conference on Control Technology and Applications (CCTA) 2026 In Vancouver, and is currently under review
An Actor-Critic-Identifier Control Design for Increasing Energy Efficiency of Automated Electric Vehicles
Electric vehicles (EVs) are increasingly deployed, yet range limitations remain a key barrier. Improving energy efficiency via advanced control is therefore essential, and emerging vehicle automation offers a promising avenue. However, many existing strategies rely on indirect surrogates because linking power consumption to control inputs is difficult. We propose a neural-network (NN) identifier that learns this mapping online and couples it with an actor-critic reinforcement learning (RL) framework to generate optimal control commands. The resulting actor-critic-identifier architecture removes dependence on explicit models relating total power, recovered energy, and inputs, while maintaining accurate speed tracking and maximizing efficiency. Update laws are derived using Lyapunov stability analysis, and performance is validated in simulation. Compared to a traditional controller, the method increases total energy recovery by 12.84%, indicating strong potential for improving EV energy efficiency.
comment: Accepted for American Control Conference (ACC 2026)
Agile asymmetric multi-legged locomotion: contact planning via geometric mechanics and spin model duality
Legged robot research is presently focused on bipedal or quadrupedal robots, despite capabilities to build robots with many more legs to potentially improve locomotion performance. This imbalance is not necessarily due to hardware limitations, but rather to the absence of principled control frameworks that explain when and how additional legs improve locomotion performance. In multi-legged systems, coordinating many simultaneous contacts introduces a severe curse of dimensionality that challenges existing modeling and control approaches. As an alternative, multi-legged robots are typically controlled using low-dimensional gaits originally developed for bipeds or quadrupeds. These strategies fail to exploit the new symmetries and control opportunities that emerge in higher-dimensional systems. In this work, we develop a principled framework for discovering new control structures in multi-legged locomotion. We use geometric mechanics to reduce contact-rich locomotion planning to a graph optimization problem, and propose a spin model duality framework from statistical mechanics to exploit symmetry breaking and guide optimal gait reorganization. Using this approach, we identify an asymmetric locomotion strategy for a hexapod robot that achieves a forward speed of 0.61 body lengths per cycle (a 50% improvement over conventional gaits). The resulting asymmetry appears at both the control and hardware levels. At the control level, the body orientation oscillates asymmetrically between fast clockwise and slow counterclockwise turning phases for forward locomotion. At the hardware level, two legs on the same side remain unactuated and can be replaced with rigid parts without degrading performance. Numerical simulations and robophysical experiments validate the framework and reveal novel locomotion behaviors that emerge from symmetry reforming in high-dimensional embodied systems.
comment: 10 pages, 7 figures. arXiv admin note: text overlap with arXiv:2302.03019
Cryo-CMOS Antenna for Wireless Communications within a Quantum Computer Cryostat ISCA
Scaling quantum computers from a few qubits to large numbers remains one of the critical challenges in realizing practical quantum advantage. Multi-core quantum architectures have emerged as a promising solution, enabling scalability through distributed quantum processing units (QPUs) interconnected via classical and quantum links. However, the bottleneck of wired connections persists, as densely packed wired interconnects, both vertically across temperature stages and horizontally within the same layer, introduce spatial constraints, power dissipation, and latency, which could hinder performance as the number of QPUs increases. To overcome these limitations, this work proposes a cryo-compatible on-chip differential dipole antenna operating at 28 GHz to enable short-range wireless communication within a quantum computer cryostat. Temperature-dependent material properties are incorporated to accurately capture antenna behavior at 4 K. Moreover, by embedding the antenna in a realistic cryostat structure, we evaluate the feasibility of antenna operation within the cryogenic environment. The proposed antenna achieves a reflection coefficient of -20.8 dB in free space and -18.38 dB within the cryostat, demonstrating efficient impedance matching.
comment: Acceptedto be published in IEEE International Symposium on Circuits and Systems (IEEE ISCAS 2026)
The Quantum Advantage in Binary Teams and the Coordination Dilemma: Supplementary
This document contains supporting material for our paper ``The Quantum Advantage in Binary Teams and the Coordination Dilemma''
Constraint Learning in Multi-Agent Dynamic Games from Demonstrations of Local Nash Interactions
We present an inverse dynamic game-based algorithm to learn parametric constraints from a given dataset of local Nash equilibrium interactions between multiple agents. Specifically, we introduce mixed-integer linear programs (MILP) encoding the Karush-Kuhn-Tucker (KKT) conditions of the interacting agents, which recover constraints consistent with the local Nash stationarity of the interaction demonstrations. We establish theoretical guarantees that our method learns inner approximations of the true safe and unsafe sets. We also use the interaction constraints recovered by our method to design motion plans that robustly satisfy the underlying constraints. Across simulations and hardware experiments, our methods accurately inferred constraints and designed safe interactive motion plans for various classes of constraints, both convex and non-convex, from interaction demonstrations of agents with nonlinear dynamics.
Learning Constraints from Stochastic Partially-Observed Closed-Loop Demonstrations
We present a method for learning unknown parametric constraints from locally-optimal input-output trajectory data. We assume the data is generated by rollouts of stochastic nonlinear dynamics, under a single state or output feedback law and initial condition but distinct noise realizations, to robustly satisfy underlying constraints despite worst-case noise outcomes. We encode the Karush-Kuhn-Tucker (KKT) conditions of this robust optimal feedback control problem within a feasibility problem to recover constraints consistent with the local optimality of the demonstrations. We prove that our constraint learning method (i) accurately recovers the demonstrator's policy, and (ii) conservatively estimates the set of policies that ensure constraint satisfaction despite worst-case noise realizations. Moreover, we perform sensitivity analysis, proving that when demonstrations are corrupted by transmission error, the inaccuracy in the learned feedback law scales linearly in the error magnitude. Empirically, our method accurately recovers unknown constraints from simulated noisy, closed-loop demonstrations generated using dynamics, both linear and nonlinear, (e.g., unicycle and quadrotor) and a range of feedback mechanisms.
Impact of Physics-Informed Features on Neural Network Complexity for Li-ion Battery Voltage Prediction in Electric Vertical Takeoff and Landing Aircrafts
The electrification of vertical takeoff and landing aircraft demands high-fidelity battery management systems capable of predicting voltage response under aggressive power dynamics. While data-driven models offer high accuracy, they often require complex architectures and extensive training data. Conversely, equivalent circuit models (ECMs), such as the second-order model, offer physical interpretability but struggle with high C-rate non-linearities. This paper investigates the impact of integrating physics-based information into data-driven surrogate models. Specifically, we evaluate whether physics-informed features allow for the simplification of neural network architectures without compromising accuracy. Using the open-source electric vertical takeoff and landing (eVTOL) battery dataset, we compare pure data-driven models against physics-informed data models. Results demonstrate that physics-informed models achieve comparable accuracy to complex pure data-driven models while using up to 75% fewer trainable parameters, significantly reducing computational overhead for potential on-board deployment.
Boundary adaptive observer design for semilinear hyperbolic rolling contact ODE-PDE systems with uncertain friction
This paper presents an adaptive observer design for semilinear hyperbolic rolling contact ODE-PDE systems with uncertain friction characteristics parameterized by a matrix of unknown coefficients appearing in the nonlinear (and possibly non-smooth) PDE source terms. Under appropriate assumptions of forward completeness and boundary sensing, an adaptive observer is synthesized to simultaneously estimate the lumped and distributed states, as well as the uncertain friction parameters, using only boundary measurements. The observer combines a finite-dimensional parameter estimator with an infinite-dimensional description of the state error dynamics, and achieves exponential convergence under persistent excitation. The effectiveness of the proposed design is demonstrated in simulation by considering a relevant example borrowed from road vehicle dynamics.
comment: 12 pages, 5 figures. Under review at Automatica
A General Nonlinear Observer Design for Inertial Navigation Systems with Almost Global Stability Guarantees
This paper studies nonlinear observer design for rigid-body extended pose estimation using inertial measurements and generic exteroceptive sensing. The estimation problem is formulated as a cascade architecture that separates translational dynamics from rotational kinematics while preserving the geometric constraint of attitude evolution on $SO(3)$. By embedding the inertial navigation model into a Linear Time-Varying (LTV) representation, we construct an observer composed of a Kalman-Bucy-type estimator for translational states and an auxiliary unconstrained attitude variable, coupled with a nonlinear geometric reconstruction filter evolving on $SO(3)$. The cascade interconnection is analyzed within a nonlinear systems framework. We prove that uniform observability of the LTV subsystem guarantees almost global asymptotic stability of the overall nonlinear observer. For a benchmark GPS landmark aided configuration, explicit sufficient conditions on admissible trajectories are derived to ensure uniform observability. Simulation results illustrate the effectiveness of the proposed estimation framework.
comment: 8 pages
SafeLink: Safety-Critical Control Under Dynamic and Irregular Unsafe Regions
Control barrier functions (CBFs) provide a theoretical foundation for safety-critical control in robotic systems. However, most existing methods rely on explicit analytical expressions of unsafe state regions, which are often impractical for irregular and dynamic unsafe regions. This paper introduces SafeLink, a novel CBF construction method based on cost-sensitive incremental random vector functional-link (RVFL) neural networks. By designing a valid cost function, SafeLink assigns different sensitivities to safe and unsafe state points, thereby eliminating false negatives in classification of unsafe state points. Under the constructed CBF, theoretical guarantees are established regarding system safety and the Lipschitz continuity of the control inputs. Furthermore, incremental update theorems are provided, enabling precise real-time adaptation to changes in unsafe regions. An analytical expression for the gradient of SafeLink is also derived to facilitate control input computation. The proposed method is validated on the endpoint position control task of a nonlinear two-link manipulator. Experimental results demonstrate that the method effectively learns the unsafe regions and rapidly adapts as these regions change, achieving computational speeds significantly faster than baseline methods while ensuring the system safely reaches its target position.
comment: 12 pages, 7 figures
Human-Like Trajectories Generation via Receding Horizon Tracking Applied to the TickTacking Interface
TickTacking is a rhythm-based interface that allows users to control a pointer in a two-dimensional space through dual-button tapping. This paper investigates the generation of human-like trajectories using a receding horizon approach applied to the TickTacking interface in a target-tracking task. By analyzing user-generated trajectories, we identify key human behavioral features and incorporate them in a controller that mimics these behaviors. The performance of this human-inspired controller is evaluated against a baseline optimal-control-based agent, demonstrating the importance of specific control features for achieving human-like interaction. These findings contribute to the broader goal of developing rhythm-based human-machine interfaces by offering design insights that enhance user performance, improve intuitiveness, and reduce interaction frustration
A Saddle Point Algorithm for Robust Data-Driven Factor Model Problems
We study the factor model problem, which aims to uncover low-dimensional structures in high-dimensional datasets. Adopting a robust data-driven approach, we formulate the problem as a saddle-point optimization. Our primary contribution is a first-order algorithm that solves this reformulation by leveraging a linear minimization oracle (LMO). We further develop semi-closed form solutions (up to a scalar) for three specific LMOs, corresponding to the Frobenius norm, Kullback-Leibler divergence, and Gelbrich (aka Wasserstein) distance. The analysis includes explicit quantification of these LMOs' regularity conditions, notably the Lipschitz constants of the dual function, which govern the algorithm's convergence performance. Numerical experiments confirm our method's effectiveness in high-dimensional settings, outperforming standard off-the-shelf optimization solvers.
comment: Submitted to Automatica
Federated Aggregation of Demand Flexibility
This paper proposes a federated framework for demand flexibility aggregation to support grid operations. Unlike existing geometric methods that rely on a static, pre-defined base set as the geometric template for aggregation, our framework establishes a true federated process by enabling the collaborative optimization of this base set without requiring the participants sharing sensitive data with the aggregator. Specifically, we first formulate the base set optimization problem as a bilevel program. Using optimal solution functions, we then reformulate the bilevel program into a single-level, unconstrained learning task. By exploiting the decomposable structure of the overall gradient, we further design a decentralized gradient-based algorithm to solve this learning task. The entire framework, encompassing base set optimization, aggregation, and disaggregation, operates by design without exchanging raw user data. Numerical results demonstrate that our proposed framework unlocks substantially more flexibility than the approaches with static base sets, thus providing a promising framework for efficient and privacy-enhanced approaches to coordinate demand flexibility at scale.
comment: A paper submitted to IEEE Transactions on Smart Grid
Robotics
Adding More Value Than Work: Practical Guidelines for Integrating Robots into Intercultural Competence Learning
While social robots have demonstrated effectiveness in supporting students' intercultural competence development, it is unclear how they can effectively be adopted for integrated use in K-12 schools. We conducted two phases of design workshops with teachers, where they co-designed robot-mediated intercultural activities while considering student needs and school integration concerns. Using thematic analysis, we identify appropriate scenarios and roles for classroom robots, explore how robots could complement rather than replace teachers, and consider how to address ethical and compliance considerations. Our findings provide practical design guidelines for the HRI community to develop social robots that can effectively support intercultural education in K-12 schools.
From Ellipsoids to Midair Control of Dynamic Hitches
The ability to dynamically manipulate interaction between cables, carried by pairs of aerial vehicles attached to the ends of each cable, can greatly improve the versatility and agility of cable-assisted aerial manipulation. Such interlacing cables create hitches by winding two or more cables around each other, which can enclose payloads or can further develop into knots. Dynamic modeling and control of such hitches is key to mastering the inter-cable manipulation in context of cable-suspended aerial manipulation. This paper introduces an ellipsoid-based kinematic model to connect the geometric nature of a hitch created by two cables and the dynamics of the hitch driven by four aerial vehicles, which reveals the control-affine form of the system. As the constraint for maintaining tension of a cable is also control-affine, we design a quadratic programming-based controller that combines Control Lyapunov and High-Order Control Barrier Functions (CLF-HOCBF-QP) to precisely track a desired hitch position and system shape while enforcing safety constraints like cable tautness. We convert desired geometric reference configurations into target robot positions and introduce a composite error into the Lyapunov function to ensure a relative degree of one to the input. Numerical simulations validate our approach, demonstrating stable, high-speed tracking of dynamic references.
Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling
In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estimating the poses and shapes of objects in the scene and importing the resulting estimates into a simulator, small errors might translate to implausible configurations including object interpenetration or unstable equilibrium. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show it largely outperforms the state of the art while providing reconstructions that are both physically plausible and more aligned with human intuition.
comment: 15 pages
ForecastOcc: Vision-based Semantic Occupancy Forecasting
Autonomous driving requires forecasting both geometry and semantics over time to effectively reason about future environment states. Existing vision-based occupancy forecasting methods focus on motion-related categories such as static and dynamic objects, while semantic information remains largely absent. Recent semantic occupancy forecasting approaches address this gap but rely on past occupancy predictions obtained from separate networks. This makes current methods sensitive to error accumulation and prevents learning spatio-temporal features directly from images. In this work, we present ForecastOcc, the first framework for vision-based semantic occupancy forecasting that jointly predicts future occupancy states and semantic categories. Our framework yields semantic occupancy forecasts for multiple horizons directly from past camera images, without relying on externally estimated maps. We evaluate ForecastOcc in two complementary settings: multi-view forecasting on the Occ3D-nuScenes dataset and monocular forecasting on SemanticKITTI, where we establish the first benchmark for this task. We introduce the first baselines by adapting two 2D forecasting modules within our framework. Importantly, we propose a novel architecture that incorporates a temporal cross-attention forecasting module, a 2D-to-3D view transformer, a 3D encoder for occupancy prediction, and a semantic occupancy head for voxel-level forecasts across multiple horizons. Extensive experiments on both datasets show that ForecastOcc consistently outperforms baselines, yielding semantically rich, future-aware predictions that capture scene dynamics and semantics critical for autonomous driving.
Analyzing the Impact of Simulation Fidelity on the Evaluation of Autonomous Driving Motion Control
Simulation is crucial in the development of autonomous driving software. In particular, assessing control algorithms requires an accurate vehicle dynamics simulation. However, recent publications use models with varying levels of detail. This disparity makes it difficult to compare individual control algorithms. Therefore, this paper aims to investigate the influence of the fidelity of vehicle dynamics modeling on the closed-loop behavior of trajectory-following controllers. For this purpose, we introduce a comprehensive Autoware-compatible vehicle model. By simplifying this, we derive models with varying fidelity. Evaluating over 550 simulation runs allows us to quantify each model's approximation quality compared to real-world data. Furthermore, we investigate whether the influence of model simplifications changes with varying margins to the acceleration limit of the vehicle. From this, we deduce to which degree a vehicle model can be simplified to evaluate control algorithms depending on the specific application. The real-world data used to validate the simulation environment originate from the Indy Autonomous Challenge race at the Autodromo Nazionale di Monza in June 2023. They show the fastest fully autonomous lap of TUM Autonomous Motorsport, with vehicle speeds reaching 267 kph and lateral accelerations of up to 15 mps2.
comment: Accepted for publication at the IEEE IV 2024
Integrating Specialized and Generic Agent Motion Prediction with Dynamic Occupancy Grid Maps
Accurate prediction of driving scene is a challenging task due to uncertainty in sensor data, the complex behaviors of agents, and the possibility of multiple feasible futures. Existing prediction methods using occupancy grid maps primarily focus on agent-agnostic scene predictions, while agent-specific predictions provide specialized behavior insights with the help of semantic information. However, both paradigms face distinct limitations: agent-agnostic models struggle to capture the behavioral complexities of dynamic actors, whereas agent-specific approaches fail to generalize to poorly perceived or unrecognized agents; combining both enables robust and safer motion forecasting. To address this, we propose a unified framework by leveraging Dynamic Occupancy Grid Maps within a streamlined temporal decoding pipeline to simultaneously predict future occupancy state grids, vehicle grids, and scene flow grids. Relying on a lightweight spatiotemporal backbone, our approach is centered on a tailored, interdependent loss function that captures inter-grid dependencies and enables diverse future predictions. By using occupancy state information to enforce flow-guided transitions, the loss function acts as a regularizer that directs occupancy evolution while accounting for obstacles and occlusions. Consequently, the model not only predicts the specific behaviors of vehicle agents, but also identifies other dynamic entities and anticipates their evolution within the complex scene. Evaluations on real-world nuScenes and Woven Planet datasets demonstrate superior prediction performances for dynamic vehicles and generic dynamic scene elements compared to baseline methods.
comment: Updated version with major revisions; currently under the second round of review at IEEE Transactions on Intelligent Vehicles
Feasibility-Guided Planning over Multi-Specialized Locomotion Policies ICRA 2026
Planning over unstructured terrain presents a significant challenge in the field of legged robotics. Although recent works in reinforcement learning have yielded various locomotion strategies, planning over multiple experts remains a complex issue. Existing approaches encounter several constraints: traditional planners are unable to integrate skill-specific policies, whereas hierarchical learning frameworks often lose interpretability and require retraining whenever new policies are added. In this paper, we propose a feasibility-guided planning framework that successfully incorporates multiple terrain-specific policies. Each policy is paired with a Feasibility-Net, which learned to predict feasibility tensors based on the local elevation maps and task vectors. This integration allows classical planning algorithms to derive optimal paths. Through both simulated and real-world experiments, we demonstrate that our method efficiently generates reliable plans across diverse and challenging terrains, while consistently aligning with the capabilities of the underlying policies.
comment: ICRA 2026
Optimized Human-Robot Co-Dispatch Planning for Petro-Site Surveillance under Varying Criticalities
Securing petroleum infrastructure requires balancing autonomous system efficiency with human judgment for threat escalation, a challenge unaddressed by classical facility location models assuming homogeneous resources. This paper formulates the Human-Robot Co-Dispatch Facility Location Problem (HRCD-FLP), a capacitated facility location variant incorporating tiered infrastructure criticality, human-robot supervision ratio constraints, and minimum utilization requirements. We evaluate command center selection across three technology maturity scenarios. Results show transitioning from conservative (1:3 human-robot supervision) to future autonomous operations (1:10) yields significant cost reduction while maintaining complete critical infrastructure coverage. For small problems, exact methods dominate in both cost and computation time; for larger problems, the proposed heuristic achieves feasible solutions in under 3 minutes with approximately 14% optimality gap where comparison is possible. From systems perspective, our work demonstrate that optimized planning for human-robot teaming is key to achieve both cost-effective and mission-reliable deployments.
Multi-Agent Route Planning as a QUBO Problem
Multi-Agent Route Planning considers selecting vehicles, each associated with a single predefined route, such that the spatial coverage of a road network is increased while redundant overlaps are limited. This paper gives a formal problem definition, proves NP-hardness by reduction from the Weighted Set Packing problem, and derives a Quadratic Unconstrained Binary Optimization formulation whose coefficients directly encode unique coverage rewards and pairwise overlap penalties. A single penalty parameter controls the coverage-overlap trade-off. We distinguish between a soft regime, which supports multi-objective exploration, and a hard regime, in which the penalty is strong enough to effectively enforce near-disjoint routes. We describe a practical pipeline for generating city instances, constructing candidate routes, building the QUBO matrix, and solving it with an exact mixed-integer solver (Gurobi), simulated annealing, and D-Wave hybrid quantum annealing. Experiments on Barcelona instances with up to 10 000 vehicles reveal a clear coverage-overlap knee and show that Pareto-optimal solutions are mainly obtained under the hard-penalty regime, while D-Wave hybrid solvers and Gurobi achieve essentially identical objective values with only minor differences in runtime as problem size grows.
Incremental Mapping with Measurement Synchronization & Compression
Modern autonomous vehicles and robots utilize versatile sensors for localization and mapping. The fidelity of these maps is paramount, as an accurate environmental representation is a prerequisite for stable and precise localization. Factor graphs provide a powerful approach for sensor fusion, enabling the estimation of the maximum a posteriori solution. However, the discrete nature of graph-based representations, combined with asynchronous sensor measurements, complicates consistent state estimation. The design of an optimal factor graph topology remains an open challenge, especially in multi-sensor systems with asynchronous data. Conventional approaches rely on a rigid graph structure, which becomes inefficient with sensors of disparate rates. Although preintegration techniques can mitigate this for high-rate sensors, their applicability is limited. To address this problem, this work introduces a novel approach that incrementally constructs connected factor graphs, ensuring the incorporation of all available sensor data by choosing the optimal graph topology based on the external evaluation criteria. The proposed methodology facilitates graph compression, reducing the number of nodes (optimized variables) by ~30% on average while maintaining map quality at a level comparable to conventional approaches.
comment: 8 pages, 4 figures, 1 table
Research on a Camera Position Measurement Method based on a Parallel Perspective Error Transfer Model
Camera pose estimation from sparse correspondences is a fundamental problem in geometric computer vision and remains particularly challenging in near-field scenarios, where strong perspective effects and heterogeneous measurement noise can significantly degrade the stability of analytic PnP solutions. In this paper, we present a geometric error propagation framework for camera pose estimation based on a parallel perspective approximation. By explicitly modeling how image measurement errors propagate through perspective geometry, we derive an error transfer model that characterizes the relationship between feature point distribution, camera depth, and pose estimation uncertainty. Building on this analysis, we develop a pose estimation method that leverages parallel perspective initialization and error-aware weighting within a Gauss-Newton optimization scheme, leading to improved robustness in proximity operations. Extensive experiments on both synthetic data and real-world images, covering diverse conditions such as strong illumination, surgical lighting, and underwater low-light environments, demonstrate that the proposed approach achieves accuracy and robustness comparable to state-of-the-art analytic and iterative PnP methods, while maintaining high computational efficiency. These results highlight the importance of explicit geometric error modeling for reliable camera pose estimation in challenging near-field settings.
comment: 32 pages, 19 figures
System-Level Error Propagation and Tail-Risk Amplification in Reference-Based Robotic Navigation
Image guided robotic navigation systems often rely on reference based geometric perception pipelines, where accurate spatial mapping is established through multi stage estimation processes. In biplanar X ray guided navigation, such pipelines are widely used due to their real time capability and geometric interpretability. However, navigation reliability can be constrained by an overlooked system level failure mechanism in which installation induced structural perturbations introduced at the perception stage are progressively amplified along the perception reconstruction execution chain and dominate execution level error and tail risk behavior. This paper investigates this mechanism from a system level perspective and presents a unified error propagation modeling framework that characterizes how installation induced structural perturbations propagate and couple with pixel level observation noise through biplanar imaging, projection matrix estimation, triangulation, and coordinate mapping. Using first order analytic uncertainty propagation and Monte Carlo simulations, we analyze dominant sensitivity channels and quantify worst case error behavior beyond mean accuracy metrics. The results show that rotational installation error is a primary driver of system level error amplification, while translational misalignment of comparable magnitude plays a secondary role under typical biplanar geometries. Real biplanar X ray bench top experiments further confirm that the predicted amplification trends persist under realistic imaging conditions. These findings reveal a broader structural limitation of reference based multi stage geometric perception pipelines and provide a framework for system level reliability analysis and risk aware design in safety critical robotic navigation systems.
comment: 13 pages, 8 figures
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
Current Vision-Language-Action (VLA) models rely on fixed computational depth, expending the same amount of compute on simple adjustments and complex multi-step manipulation. While Chain-of-Thought (CoT) prompting enables variable computation, it scales memory linearly and is ill-suited for continuous action spaces. We introduce Recurrent-Depth VLA (RD-VLA), an architecture that achieves computational adaptivity via latent iterative refinement rather than explicit token generation. RD-VLA employs a recurrent, weight-tied action head that supports arbitrary inference depth with a constant memory footprint. The model is trained using truncated backpropagation through time (TBPTT) to efficiently supervise the refinement process. At inference, RD-VLA dynamically allocates compute using an adaptive stopping criterion based on latent convergence. Experiments on challenging manipulation tasks show that recurrent depth is critical: tasks that fail entirely (0 percent success) with single-iteration inference exceed 90 percent success with four iterations, while simpler tasks saturate rapidly. RD-VLA provides a scalable path to test-time compute in robotics, replacing token-based reasoning with latent reasoning to achieve constant memory usage and up to 80x inference speedup over prior reasoning-based VLA models. Project page: https://rd-vla.github.io/
comment: 11 Pages, Project page:https://rd-vla.github.io/
RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI
Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, and long-horizon effective training difficult. These challenges suggest that real-world policy learning is not only an algorithmic issue but fundamentally a systems problem. We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots. To address cloud-edge communication, USER introduces an adaptive communication plane with tunneling-based networking, distributed data channels for traffic localization, and streaming-multiprocessor-aware weight synchronization to regulate GPU-side overhead. On top of this infrastructure, USER organizes learning as a fully asynchronous framework with a persistent, cache-aware buffer, enabling efficient long-horizon experiments with robust crash recovery and reuse of historical data. In addition, USER provides extensible abstractions for rewards, algorithms, and policies, supporting online imitation or reinforcement learning of CNN/MLP, generative policies, and large vision-language-action (VLA) models within a unified pipeline. Results in both simulation and the real world show that USER enables multi-robot coordination, heterogeneous manipulators, edge-cloud collaboration with large models, and long-running asynchronous training, offering a unified and extensible systems foundation for real-world online policy learning.
CoLF: Learning Consistent Leader-Follower Policies for Vision-Language-Guided Multi-Robot Cooperative Transport
In this study, we address vision-language-guided multi-robot cooperative transport, where each robot grounds natural-language instructions from onboard camera observations. A key challenge in this decentralized setting is perceptual misalignment across robots, where viewpoint differences and language ambiguity can yield inconsistent interpretations and degrade cooperative transport. To mitigate this problem, we adopt a dependent leader-follower design, where one robot serves as the leader and the other as the follower. Although such a leader-follower structure appears straightforward, learning with independent and symmetric agents often yields symmetric or unstable behaviors without explicit inductive biases. To address this challenge, we propose Consistent Leader-Follower (CoLF), a multi-agent reinforcement learning (MARL) framework for stable leader-follower role differentiation. CoLF consists of two key components: (1) an asymmetric policy design that induces leader-follower role differentiation, and (2) a mutual-information-based training objective that maximizes a variational lower bound, encouraging the follower to predict the leader's action from its local observation. The leader and follower policies are jointly optimized under the centralized training and decentralized execution (CTDE) framework to balance task execution and consistent cooperative behaviors. We validate CoLF in both simulation and real-robot experiments using two quadruped robots. The demonstration video is available at https://sites.google.com/view/colf/.
comment: 9 pages, 5 figures
Global Symmetry and Orthogonal Transformations from Geometrical Moment $n$-tuples
Detecting symmetry is crucial for effective object grasping for several reasons. Recognizing symmetrical features or axes within an object helps in developing efficient grasp strategies, as grasping along these axes typically results in a more stable and balanced grip, thereby facilitating successful manipulation. This paper employs geometrical moments to identify symmetries and estimate orthogonal transformations, including rotations and mirror transformations, for objects centered at the frame origin. It provides distinctive metrics for detecting symmetries and estimating orthogonal transformations, encompassing rotations, reflections, and their combinations. A comprehensive methodology is developed to obtain these functions in n-dimensional space, specifically moment \( n \)-tuples. Extensive validation tests are conducted on both 2D and 3D objects to ensure the robustness and reliability of the proposed approach. The proposed method is also compared to state-of-the-art work using iterative optimization for detecting multiple planes of symmetry. The results indicate that combining our method with the iterative one yields satisfactory outcomes in terms of the number of symmetry planes detected and computation time.
Virtual Community: An Open World for Humans, Robots, and Society
The rapid progress in AI and Robotics may lead to a profound societal transformation, as humans and robots begin to coexist within shared communities, introducing both opportunities and challenges. To explore this future, we present Virtual Community-an open-world platform for humans, robots, and society-built on a universal physics engine and grounded in real-world 3D scenes. With Virtual Community, we aim to enable the study of embodied social intelligence at scale. To support these, Virtual Community features: 1) An open-source multi-agent physics simulator that supports robots, humans, and their interactions within a society; 2) A large-scale, real-world aligned community generation pipeline, including vast outdoor space, diverse indoor scenes, and a community of grounded agents with rich characters and appearances. Leveraging Virtual Community, we propose two novel challenges. The Community Planning Challenge evaluates multi-agent reasoning and planning ability in open-world settings, such as cooperating to help agents with daily activities and efficiently connecting other agents. The Community Robot Challenge requires multiple heterogeneous robots to collaborate in solving complex open-world tasks. We evaluate various baselines on these tasks and demonstrate the challenges in both high-level open-world task planning and low-level cooperation controls. We hope that Virtual Community will unlock further study of human-robot coexistence within open-world environments.
comment: website https://virtual-community-ai.github.io/
Collision Risk Estimation via Loss Prediction in End-to-End Autonomous Driving
Collision risk estimation and avoidance play central roles in the safety of autonomous driving (AD) systems. Recently emerged end-to-end AD systems gain collision avoidance ability by minimizing losses to penalize planning trajectories that are too close to other objects. Despite a significant collision rate during testing, most end-to-end planners do not explicitly quantify the collision risk in their outputs. To address this, we introduce RiskMonitor, an efficient plug-and-play module that interprets planning and motion tokens from state-of-the-art end-to-end planners to estimate collision risk. Inspired by loss prediction based uncertainty quantification, RiskMonitor predicts whether the collision loss -- commonly adopted to train end-to-end planners -- is positive along planned waypoints, framing collision risk estimation as a binary classification task. We evaluate RiskMonitor on the real-world nuScenes dataset (open-loop) and the neural-rendering based simulator, NeuroNCAP (closed-loop). Our token-driven method outperforms prediction-driven approaches, including deterministic rules, Gaussian mixture models, and Monte Carlo Dropout. When integrated with a simple braking policy, RiskMonitor improves collision avoidance ability by $66.5\%$ in a closed-loop test on safety-critical scenarios. These results demonstrate that monitoring collision risk using plan and motion tokens enhances the safety of end-to-end AD without retraining it.
Neural-Augmented Kelvinlet for Real-Time Soft Tissue Deformation Modeling
Accurate and efficient modeling of soft-tissue interactions is fundamental for advancing surgical simulation, surgical robotics, and model-based surgical automation. To achieve real-time latency, classical Finite Element Method (FEM) solvers are often replaced with neural approximations; however, naively training such models in a fully data-driven manner without incorporating physical priors frequently leads to poor generalization and physically implausible predictions. We present a novel physics-informed neural simulation framework that enables real-time prediction of soft-tissue deformations under complex single- and multi-grasper interactions. Our approach integrates Kelvinlet-based analytical priors with large-scale FEM data, capturing both linear and nonlinear tissue responses. This hybrid design improves predictive accuracy and physical plausibility across diverse neural architectures while maintaining the low-latency performance required for interactive applications. We validate our method on challenging surgical manipulation tasks involving standard laparoscopic grasping tools, demonstrating substantial improvements in deformation fidelity and temporal stability over existing baselines. These results establish Kelvinlet-augmented learning as a principled and computationally efficient paradigm for real-time, physics-aware soft-tissue simulation in surgical AI.
RAP: 3D Rasterization Augmented End-to-End Planning
Imitation learning for end-to-end driving trains policies only on expert demonstrations. Once deployed in a closed loop, such policies lack recovery data: small mistakes cannot be corrected and quickly compound into failures. A promising direction is to generate alternative viewpoints and trajectories beyond the logged path. Prior work explores photorealistic digital twins via neural rendering or game engines, but these methods are prohibitively slow and costly, and thus mainly used for evaluation. In this work, we argue that photorealism is unnecessary for training end-to-end planners. What matters is semantic fidelity and scalability: driving depends on geometry and dynamics, not textures or lighting. Motivated by this, we propose 3D Rasterization, which replaces costly rendering with lightweight rasterization of annotated primitives, enabling augmentations such as counterfactual recovery maneuvers and cross-agent view synthesis. To transfer these synthetic views effectively to real-world deployment, we introduce a Raster-to-Real feature-space alignment that bridges the sim-to-real gap. Together, these components form Rasterization Augmented Planning (RAP), a scalable data augmentation pipeline for planning. RAP achieves state-of-the-art closed-loop robustness and long-tail generalization, ranking first on four major benchmarks: NAVSIM v1/v2, Waymo Open Dataset Vision-based E2E Driving, and Bench2Drive. Our results show that lightweight rasterization with feature alignment suffices to scale E2E training, offering a practical alternative to photorealistic rendering. Project page: https://alan-lanfeng.github.io/RAP/.
Agentic Vehicles for Human-Centered Mobility
Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity to operate according to internal rules without external control. Autonomous vehicles (AuVs) are therefore understood as systems that perceive their environment and execute pre-programmed tasks independently of external input, consistent with the SAE levels of automated driving. Yet recent research and real-world deployments have begun to showcase vehicles that exhibit behaviors outside the scope of this definition. These include natural language interaction with humans, goal adaptation, contextual reasoning, external tool use, and the handling of unforeseen ethical dilemmas, enabled in part by multimodal large language models (LLMs). These developments highlight not only a gap between technical autonomy and the broader cognitive and social capacities required for human-centered mobility, but also the emergence of a form of vehicle intelligence that currently lacks a clear designation. To address this gap, the paper introduces the concept of agentic vehicles (AgVs): vehicles that integrate agentic AI systems to reason, adapt, and interact within complex environments. It synthesizes recent advances in agentic systems and suggests how AgVs can complement and even reshape conventional autonomy to ensure mobility services are aligned with user and societal needs. The paper concludes by outlining key challenges in the development and governance of AgVs and their potential role in shaping future agentic transportation systems.
An Adaptive Inspection Planning Approach Towards Routine Monitoring in Uncertain Environments ICRA 2026
In this work, we present a hierarchical framework designed to support robotic inspection under environment uncertainty. By leveraging a known environment model, existing methods plan and safely track inspection routes to visit points of interest. However, discrepancies between the model and actual site conditions, caused by either natural or human activities, can alter the surface morphology or introduce path obstructions. To address this challenge, the proposed framework divides the inspection task into: (a) generating the initial global view-plan for region of interests based on a historical map and (b) local view replanning to adapt to the current morphology of the inspection scene. The proposed hierarchy preserves global coverage objectives while enabling reactive adaptation to the local surface morphology. This enables the local autonomy to remain robust against environment uncertainty and complete the inspection tasks. We validate the approach through deployments in real-world subterranean mines using quadrupedal robot. A supplementary media highlighting the proposed method can be found here https://youtu.be/6TxK8S_83Lw.
comment: Accepted to ICRA 2026
Omni-LIVO: Robust RGB-Colored Multi-Camera Visual-Inertial-LiDAR Odometry via Photometric Migration and ESIKF Fusion
Wide field-of-view (FoV) LiDAR sensors provide dense geometry across large environments, but existing LiDAR-inertial-visual odometry (LIVO) systems generally rely on a single camera, limiting their ability to fully exploit LiDAR-derived depth for photometric alignment and scene colorization. We present Omni-LIVO, a tightly coupled multi-camera LIVO system that leverages multi-view observations to comprehensively utilize LiDAR geometric information across extended spatial regions. Omni-LIVO introduces a Cross-View direct alignment strategy that maintains photometric consistency across non-overlapping views, and extends the Error-State Iterated Kalman Filter (ESIKF) with multi-view updates and adaptive covariance. The system is evaluated on public benchmarks and our custom dataset, showing improved accuracy and robustness over state-of-the-art LIVO, LIO, and visual-inertial SLAM baselines. Code and dataset will be released upon publication.
comment: Accepted by IEEE Robotics and Automation Letters (RA-L). Early Access version available. This version supersedes all previous versions and is the official accepted manuscript for citation
Gust Estimation and Rejection with a Disturbance Observer for Proprioceptive Underwater Soft Morphing Wings ICRA
Unmanned underwater vehicles are increasingly employed for maintenance and surveying tasks at sea, but their operation in shallow waters is often hindered by hydrodynamic disturbances such as waves, currents, and turbulence. These unsteady flows can induce rapid changes in direction and speed, compromising vehicle stability and manoeuvrability. Marine organisms contend with such conditions by combining proprioceptive feedback with flexible fins and tails to reject disturbances. Inspired by this strategy, we propose soft morphing wings endowed with proprioceptive sensing to mitigate environmental perturbations. The wing's continuous deformation provides a natural means to infer dynamic disturbances: sudden changes in camber directly reflect variations in the oncoming flow. By interpreting this proprioceptive signal, a disturbance observer can reconstruct flow parameters in real time. To enable this, we develop and experimentally validate a dynamic model of a hydraulically actuated soft wing with controllable camber. We then show that curvature-based sensing allows accurate estimation of disturbances in the angle of attack. Finally, we demonstrate that a controller leveraging these proprioceptive estimates can reject disturbances in the lift response of the soft wing. By combining proprioceptive sensing with a disturbance observer, this technique mirrors biological strategies and provides a pathway for soft underwater vehicles to maintain stability in hazardous environments.
comment: 2026 IEEE International Conference on Robotics & Automation (ICRA)
Sim-to-Real Dynamic Object Manipulation on Conveyor Systems via Optimization Path Shaping
Realizing generalizable dynamic object manipulation on conveyor systems is important for enhancing manufacturing efficiency, as it eliminates specialized engineering for different scenarios. To this end, imitation learning emerges as a promising paradigm, leveraging expert demonstrations to teach a policy manipulation skills. Although the generalization of an imitation learning policy can be improved by increasing demonstrations, demonstration collection is labor-intensive. Besides, public dynamic object manipulation data is scarce. In this work, we address this data scarcity problem via generating demonstrations in a simulator. A significant challenge of using simulated data lies in the appearance gap between simulated and real-world observations. To tackle this challenge, we propose Geometry-Enhanced Model (GEM), which employs our designed appearance noise annealing strategy to shape the policy optimization path, thereby prioritizing the geometry information in observations. Extensive experiments in simulated and real-world tasks demonstrate that GEM can generalize across environment backgrounds, robot embodiments, motion dynamics, and object geometries. Notably, GEM is deployed in a real canteen for tableware collection. Without test-scene data, GEM achieves a success rate of over 97% across more than 10,000 operations.
Safe Exploration via Policy Priors
Safe exploration is a key requirement for reinforcement learning (RL) agents to learn and adapt online, beyond controlled (e.g. simulated) environments. In this work, we tackle this challenge by utilizing suboptimal yet conservative policies (e.g., obtained from offline data or simulators) as priors. Our approach, SOOPER, uses probabilistic dynamics models to optimistically explore, yet pessimistically fall back to the conservative policy prior if needed. We prove that SOOPER guarantees safety throughout learning, and establish convergence to an optimal policy by bounding its cumulative regret. Extensive experiments on key safe RL benchmarks and real-world hardware demonstrate that SOOPER is scalable, outperforms the state-of-the-art and validate our theoretical guarantees in practice.
SPOT: Spatio-Temporal Obstacle-free Trajectory Planning for UAVs in an Unknown Dynamic Environment ICRA 2026
We address the problem of reactive motion planning for quadrotors operating in unknown environments with dynamic obstacles. Our approach leverages a 4-dimensional spatio-temporal planner, integrated with vision-based Safe Flight Corridor (SFC) generation and trajectory optimization. Unlike prior methods that rely on map fusion, our framework is mapless, enabling collision avoidance directly from perception while reducing computational overhead. Dynamic obstacles are detected and tracked using a vision-based object segmentation and tracking pipeline, allowing robust classification of static versus dynamic elements in the scene. To further enhance robustness, we introduce a backup planning module that reactively avoids dynamic obstacles when no direct path to the goal is available, mitigating the risk of collisions during deadlock situations. We validate our method extensively in both simulation and real-world hardware experiments, and benchmark it against state-of-the-art approaches, showing significant advantages for reactive UAV navigation in dynamic, unknown environments.
comment: Accepted for publication at ICRA 2026
Knowledge-Centric Metacognitive Learning
Interactions are central to intelligent reasoning and learning abilities, with the interpretation of abstract knowledge guiding meaningful interaction with objects in the environment. While humans readily adapt to novel situations by leveraging abstract knowledge acquired over time, artificial intelligence systems lack principled mechanisms for incorporating abstract knowledge into learning, leading to fundamental challenges in the emergence of intelligent and adaptive behavior. To address this gap, we introduce knowledge-centric metacognitive learning based on three key principles: natural abstractions, knowledge-guided interactions through interpretation, and the composition of interactions for problem solving. Knowledge learning facilitates the acquisition of abstract knowledge and the association of interactions with knowledge, while object interactions guided by abstract knowledge enable the learning of transferable interaction concepts, abstract reasoning, and generalization. This metacognitive mechanism provides a principled approach for integrating knowledge into reinforcement learning and offers a promising pathway toward intelligent and adaptive behavior in artificial intelligence, robotics, and autonomous systems.
CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents
While current navigation benchmarks prioritize task success in simplified settings, they neglect the multidimensional economic constraints essential for the real-world commercialization of autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive economic cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data - such as SEC filings and AIS injury reports - with Isaac Sim's detailed collision and cargo dynamics, CostNav transcends simple task completion to accurately evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Our evaluation of rule-based Nav2 navigation shows that current approaches are not economically viable: the contribution margin is -22.81/run (AMCL) and -12.87/run (GPS), resulting in no break-even point. We challenge the community to develop navigation policies that achieve economic viability on CostNav. We remain method-agnostic, evaluating success solely on the metric of cost rather than the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.
Residual Vector Quantization For Communication-Efficient Multi-Agent Perception ICASSP 2026
Multi-agent collaborative perception (CP) improves scene understanding by sharing information across connected agents such as autonomous vehicles, unmanned aerial vehicles, and robots. Communication bandwidth, however, constrains scalability. We present ReVQom, a learned feature codec that preserves spatial identity while compressing intermediate features. ReVQom is an end-to-end method that compresses feature dimensions via a simple bottleneck network followed by multi-stage residual vector quantization (RVQ). This allows only per-pixel code indices to be transmitted, reducing payloads from 8192 bits per pixel (bpp) of uncompressed 32-bit float features to 6-30 bpp per agent with minimal accuracy loss. On DAIR-V2X real-world CP dataset, ReVQom achieves 273x compression at 30 bpp to 1365x compression at 6 bpp. At 18 bpp (455x), ReVQom matches or outperforms raw-feature CP, and at 6-12 bpp it enables ultra-low-bandwidth operation with graceful degradation. ReVQom allows efficient and accurate multi-agent collaborative perception with a step toward practical V2X deployment.
comment: Accepted at ICASSP 2026. 5 pages
Constructing the Umwelt: Cognitive Planning through Belief-Intent Co-Evolution
This paper challenges a prevailing epistemological assumption in End-to-End Autonomous Driving: that high-performance planning necessitates high-fidelity world reconstruction. Inspired by cognitive science, we propose the Mental Bayesian Causal World Model (MBCWM) and instantiate it as the Tokenized Intent World Model (TIWM), a novel cognitive computing architecture. Its core philosophy posits that intelligence emerges not from pixel-level objective fidelity, but from the Cognitive Consistency between the agent's internal intentional world and physical reality. By synthesizing von Uexküll's $\textit{Umwelt}$ theory, the neural assembly hypothesis, and the triple causal model (integrating symbolic deduction, probabilistic induction, and force dynamics) into an end-to-end embodied planning system, we demonstrate the feasibility of this paradigm on the nuPlan benchmark. Experimental results in open-loop validation confirm that our Belief-Intent Co-Evolution mechanism effectively enhances planning performance. Crucially, in closed-loop simulations, the system exhibits emergent human-like cognitive behaviors, including map affordance understanding, free exploration, and self-recovery strategies. We identify Cognitive Consistency as the core learning mechanism: during long-term training, belief (state understanding) and intent (future prediction) spontaneously form a self-organizing equilibrium through implicit computational replay, achieving semantic alignment between internal representations and physical world affordances. TIWM offers a neuro-symbolic, cognition-first alternative to reconstruction-based planners, establishing a new direction: planning as active understanding, not passive reaction.
comment: 12 pages, 8 figures. A paradigm shift from reconstructing the world to understanding it: planning through Belief-Intent Co-Evolution
Multiagent Systems
Effect of Electoral Seat Bias on Political Polarization: A Computational Perspective
Research on the causes of political polarization points towards multiple drivers of the problem, from social and psychological to economic and technological. However, political institutions stand out, because -- while capable of exacerbating or alleviating polarization -- they can be re-engineered more readily than others. Accordingly, we analyze one class of such institutions -- electoral systems -- investigating whether the large-party seat bias found in many common systems (particularly plurality and Jefferson-D'Hondt) exacerbates polarization. Cross-national empirical data being relatively sparse and heavily confounded, we use computational methods: an agent-based Monte Carlo simulation. We model voter behavior over multiple electoral cycles, building upon the classic spatial model, but incorporating other known voter behavior patterns, such as the bandwagon effect, strategic voting, preference updating, retrospective voting, and the thermostatic effect. We confirm our hypothesis that electoral systems with a stronger large-party bias exhibit significantly higher polarization, as measured by the Mehlhaff index.
comment: 18 pages, 24 figures
Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems
Multi-Agent Reinforcement Learning (MARL) is increasingly deployed in safety-critical domains, yet methods for interpretable failure detection and attribution remain underdeveloped. We introduce a two-stage gradient-based framework that provides interpretable diagnostics for three critical failure analysis tasks: (1) detecting the true initial failure source (Patient-0); (2) validating why non-attacked agents may be flagged first due to domino effects; and (3) tracing how failures propagate through learned coordination pathways. Stage 1 performs interpretable per-agent failure detection via Taylor-remainder analysis of policy-gradient costs, declaring an initial Patient-0 candidate at the first threshold crossing. Stage 2 provides validation through geometric analysis of critic derivatives-first-order sensitivity and directional second-order curvature aggregated over causal windows to construct interpretable contagion graphs. This approach explains "downstream-first" detection anomalies by revealing pathways that amplify upstream deviations. Evaluated across 500 episodes in Simple Spread (3 and 5 agents) and 100 episodes in StarCraft II using MADDPG and HATRPO, our method achieves 88.2-99.4% Patient-0 detection accuracy while providing interpretable geometric evidence for detection decisions. By moving beyond black-box detection to interpretable gradient-level forensics, this framework offers practical tools for diagnosing cascading failures in safety-critical MARL systems.
CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment
Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty, and operate without guaranteed success. Existing LLM-based offensive agent evaluations rely on closed-world settings with predefined goals and binary success criteria. To address this gap, we introduce CyberExplorer, an evaluation suite with two core components: (1) an open-environment benchmark built on a virtual machine hosting 40 vulnerable web services derived from real-world CTF challenges, where agents autonomously perform reconnaissance, target selection, and exploitation without prior knowledge of vulnerability locations; and (2) a reactive multi-agent framework supporting dynamic exploration without predefined plans. CyberExplorer enables fine-grained evaluation beyond flag recovery, capturing interaction dynamics, coordination behavior, failure modes, and vulnerability discovery signals-bridging the gap between benchmarks and realistic multi-target attack scenarios.
Leader-following Consensus over Jointly Connected Switching Networks is Achievable for Exponentially Unstable Linear Systems
The leader-following consensus problem for general linear multi-agent systems over jointly connected switching networks has been a challenging problem and the solvability of the problem has been limited to the class of linear multi-agent systems whose system matrix is marginally stable. This condition is restrictive since it even excludes the most commonly used double-integrator system. This paper presents a breakthrough by demonstrating that leader-following exponential consensus is achievable for general linear multi-agent systems over jointly connected switching networks, even when the system matrix is exponentially unstable. The degree of instability can be explicitly characterized by two key quantities that arise from the jointly connected condition on a switching graph. By exploiting duality, we further show that the output-based distributed observer design problem for a general leader system is solvable over jointly connected switching networks, even when the system matrix is exponentially unstable. This is also in sharp contrast to the existing distributed observers, which rely on the assumption that the leader system is marginally stable.
DHEA-MECD: An Embodied Intelligence-Powered DRL Algorithm for AUV Tracking in Underwater Environments with High-Dimensional Features
In recent years, autonomous underwater vehicle (AUV) systems have demonstrated significant potential in complex marine exploration. However, effective AUV-based tracking remains challenging in realistic underwater environments characterized by high-dimensional features, including coupled kinematic states, spatial constraints, time-varying environmental disturbances, etc. To address these challenges, this paper proposes a hierarchical embodied-intelligence (EI) architecture for underwater multi-target tracking with AUVs in complex underwater environments. Built upon this architecture, we introduce the Double-Head Encoder-Attention-based Multi-Expert Collaborative Decision (DHEA-MECD), a novel Deep Reinforcement Learning (DRL) algorithm designed to support efficient and robust multi-target tracking. Specifically, in DHEA-MECD, a Double-Head Encoder-Attention-based information extraction framework is designed to semantically decompose raw sensory observations and explicitly model complex dependencies among heterogeneous features, including spatial configurations, kinematic states, structural constraints, and stochastic perturbations. On this basis, a motion-stage-aware multi-expert collaborative decision mechanism with Top-k expert selection strategy is introduced to support stage-adaptive decision-making. Furthermore, we propose the DHEA-MECD-based underwater multitarget tracking algorithm to enable AUV smart, stable, and anti-interference multi-target tracking. Extensive experimental results demonstrate that the proposed approach achieves superior tracking success rates, faster convergence, and improved motion optimality compared with mainstream DRL-based methods, particularly in complex and disturbance-rich marine environments.
Talk, Judge, Cooperate: Gossip-Driven Indirect Reciprocity in Self-Interested LLM Agents
Indirect reciprocity, which means helping those who help others, is difficult to sustain among decentralized, self-interested LLM agents without reliable reputation systems. We introduce Agentic Linguistic Gossip Network (ALIGN), an automated framework where agents strategically share open-ended gossip using hierarchical tones to evaluate trustworthiness and coordinate social norms. We demonstrate that ALIGN consistently improves indirect reciprocity and resists malicious entrants by identifying and ostracizing defectors without changing intrinsic incentives. Notably, we find that stronger reasoning capabilities in LLMs lead to more incentive-aligned cooperation, whereas chat models often over-cooperate even when strategically suboptimal. These results suggest that leveraging LLM reasoning through decentralized gossip is a promising path for maintaining social welfare in agentic ecosystems. Our code is available at https://github.com/shuhui-zhu/ALIGN.
Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study
Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions, raising an important question: can LLMs reliably implement agent-based models from standardized specifications in a way that supports replication, verification, and validation? We address this question by evaluating 17 contemporary LLMs on a controlled ODD-to-code translation task, using the PPHPC predator-prey model as a fully specified reference. Generated Python implementations are assessed through staged executability checks, model-independent statistical comparison against a validated NetLogo baseline, and quantitative measures of runtime efficiency and maintainability. Results show that behaviorally faithful implementations are achievable but not guaranteed, and that executability alone is insufficient for scientific use. GPT-4.1 consistently produces statistically valid and efficient implementations, with Claude 3.7 Sonnet performing well but less reliably. Overall, the findings clarify both the promise and current limitations of LLMs as model engineering tools, with implications for reproducible agent-based and environmental modelling.
Achieving Unanimous Consensus Through Multi-Agent Deliberation
Blockchain consensus mechanisms have relied on algorithms such as Proof-of-Work (PoW) and Proof-of-Stake (PoS) to ensure network functionality and integrity. However, these approaches struggle with adaptability for decision-making where the opinions of each matter rather than reaching an agreement based on honest majority or weighted consensus. This paper introduces a novel deliberation-based consensus mechanism where Large Language Models (LLMs) act as rational agents engaging in structured discussions to reach a unanimous consensus. By leveraging graded consensus and a multi-round deliberation process, our approach ensures unanimous consensus for definitive problems and graded consensus for prioritized decision problems and policies. We provide a formalization of our system and use it to show that the properties of blockchains are maintained, while also addressing the behavior in terms of adversaries, stalled deliberations, and confidence in consensus. Moreover, experimental results demonstrate system feasibility, showcasing convergence, block properties, and accuracy, which enable deliberative decision-making on blockchain networks.
comment: 6 pages, 4 figure, 2 tables
GLP: A Grassroots, Multiagent, Concurrent, Logic Programming Language
Grassroots platforms are distributed systems with multiple instances that can (1) operate independently of each other and of any global resource other than the network, and (2) coalesce into ever larger instances, possibly resulting in a single global instance. Here, we present Grassroots Logic Programs (GLP), a multiagent concurrent logic programming language designed for the implementation of grassroots platforms. We introduce the language incrementally: We recall the standard operational semantics of logic programs; introduce the operational semantics of Concurrent (single-agent) GLP as a restriction of that of LP; recall the notion of multiagent transition systems and atomic transactions; introduce the operational semantics of multiagent GLP via a multiagent transition system specified via atomic transactions; and prove multiagent GLP to be grassroots. The accompanying programming example is the grassroots social graph -- the infrastructure grassroots platform on which all others are based. With the mathematical foundations presented here: a workstation-based implementation of Concurrent GLP was developed by AI, based on the operational semantics of Concurrent GLP; a distributed peer-to-peer smartphone-based implementation of multiagent GLP is being developed by AI, based on the operational semantics of multiagent GLP; a moded type system for GLP was implemented by AI, to facilitate the specification of GLP programs by human and AI designers, for their programming by AI; all reported in detail in companion papers.
Enhancing Mathematical Problem Solving in LLMs through Execution-Driven Reasoning Augmentation ACL
Mathematical problem solving is a fundamental benchmark for assessing the reasoning capabilities of artificial intelligence and a gateway to applications in education, science, and engineering where reliable symbolic reasoning is essential. Although recent advances in multi-agent LLM-based systems have enhanced their mathematical reasoning capabilities, they still lack a reliably revisable representation of the reasoning process. Existing agents either operate in rigid sequential pipelines that cannot correct earlier steps or rely on heuristic self-evaluation that can fail to identify and fix errors. In addition, programmatic context can distract language models and degrade accuracy. To address these gaps, we introduce Iteratively Improved Program Construction (IIPC), a reasoning method that iteratively refines programmatic reasoning chains and combines execution feedback with the native Chain-of-thought abilities of the base LLM to maintain high-level contextual focus. IIPC surpasses competing approaches in the majority of reasoning benchmarks on multiple base LLMs. All code and implementations are released as open source.
comment: 9 pages, 7 figures, submitted to ACL ARR 2026, hyperlink to code repository provided in the abstract
Systems and Control (EESS)
Robust and Gain-Scheduling ${\cal H}_2$ Control Techniques for LFT Uncertain and Parameter-Dependent Systems
This paper addresses the robust ${\cal H}_2$ synthesis problem for linear fractional transformation (LFT) systems subject to structured uncertainty (parameter) and white-noise disturbances. By introducing an intermediate matrix variable, we derive convex synthesis conditions in terms of linear matrix inequalities (LMIs) that enable both robust and gain-scheduled controller design for parameter-dependent systems. The proposed framework preserves the classical white-noise and impulse-response interpretation of the ${\cal H}_2$ criterion while providing certified robustness guarantees, thereby extending optimal ${\cal H}_2$ control beyond the linear time-invariant setting. Numerical and application examples demonstrate that the resulting robust ${\cal H}_2$ controllers achieve significantly reduced conservatism and improved disturbance rejection compared with conventional robust ${\cal H}_\infty$-based designs.
Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling
In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estimating the poses and shapes of objects in the scene and importing the resulting estimates into a simulator, small errors might translate to implausible configurations including object interpenetration or unstable equilibrium. This makes it difficult to predict the dynamic behavior of the scene using a digital twin, an important step in simulation-based planning and control of contact-rich behaviors. In this paper, we posit that object pose and shape estimation requires reasoning holistically over the scene (instead of reasoning about each object in isolation), accounting for object interactions and physical plausibility. Towards this goal, our first contribution is Picasso, a physics-constrained reconstruction pipeline that builds multi-object scene reconstructions by considering geometry, non-penetration, and physics. Picasso relies on a fast rejection sampling method that reasons over multi-object interactions, leveraging an inferred object contact graph to guide samples. Second, we propose the Picasso dataset, a collection of 10 contact-rich real-world scenes with ground truth annotations, as well as a metric to quantify physical plausibility, which we open-source as part of our benchmark. Finally, we provide an extensive evaluation of Picasso on our newly introduced dataset and on the YCB-V dataset, and show it largely outperforms the state of the art while providing reconstructions that are both physically plausible and more aligned with human intuition.
comment: 15 pages
Trustworthiness Layer for Foundation Models in Power Systems: Application for N-k Contingency Assessment
This work introduces for the first time, to our knowledge, a trustworthiness layer for foundation models in power systems. Using stratified conformal prediction, we devise adaptive, statistically valid confidence bounds for each output of a foundation model. For regression, this allows users to obtain an uncertainty estimate for each output; for screening, it supports conservative decisions that minimize false negatives. We demonstrate our method by enhancing GridFM, the first open-source Foundation Model for power systems, with statistically valid prediction intervals instead of heuristic error margins. We apply it for N-k contingency assessment, a combinatorial NP-Hard problem. We show that trustworthy GridFM can offer richer and more accurate information than DC Power Flow, having 2x-3x higher precision, while running up to 18x faster than AC Power Flow for systems up to 118 buses. Moving a step further, we also examine the ability of trustworthy GridFM to generalize to unseen high-order contingencies: through a rigorous analysis, we assess how a model trained on N-1 or N-2 outages extrapolates to unseen contingencies up to N-5.
Leader-following Consensus over Jointly Connected Switching Networks is Achievable for Exponentially Unstable Linear Systems
The leader-following consensus problem for general linear multi-agent systems over jointly connected switching networks has been a challenging problem and the solvability of the problem has been limited to the class of linear multi-agent systems whose system matrix is marginally stable. This condition is restrictive since it even excludes the most commonly used double-integrator system. This paper presents a breakthrough by demonstrating that leader-following exponential consensus is achievable for general linear multi-agent systems over jointly connected switching networks, even when the system matrix is exponentially unstable. The degree of instability can be explicitly characterized by two key quantities that arise from the jointly connected condition on a switching graph. By exploiting duality, we further show that the output-based distributed observer design problem for a general leader system is solvable over jointly connected switching networks, even when the system matrix is exponentially unstable. This is also in sharp contrast to the existing distributed observers, which rely on the assumption that the leader system is marginally stable.
End-to-End Secure Connection Probability in MultiLayer Networks with Heterogeneous Rician Fading
Ensuring physical-layer security in non-terrestrial networks (NTNs) is challenging due to their global coverage and multi-hop relaying across heterogeneous network layers, where the locations and channels of potential eavesdroppers are typically unknown. In this work, we derive a tractable closedform expression of the end-to-end secure connection probability (SCP) of multi-hop relay routes under heterogeneous Rician fading. The resulting formula shares the same functional form as prior Rayleigh-based approximations but for the coefficients, thereby providing analytical support for the effectiveness of heuristic posterior coefficient calibration adopted in prior work. Numerical experiments under various conditions show that the proposed scheme estimates the SCP with an 1%p error in most cases; and doubles the accuracy compared with the conventional scheme even in the worst case. As a case study, we apply the proposed framework to real-world space-air-groundsea integrated network dataset, showing that the derived SCP accurately captures observed security trends in practical settings.
Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty
Large language models (LLMs) offer significant potential for intelligent mobile services but are computationally intensive for resource-constrained devices. Mobile edge computing (MEC) allows such devices to offload inference tasks to edge servers (ESs), yet introduces latency due to communication and serverside queuing, especially in multi-user environments. In this work, we propose an uncertainty-aware offloading framework that dynamically decides whether to perform inference locally or offload it to the ES, based on token-level uncertainty and resource constraints. We define a margin-based token-level uncertainty metric and demonstrate its correlation with model accuracy. Leveraging this metric, we design a greedy offloading algorithm (GOA) that minimizes delay while maintaining accuracy by prioritizing offloading for highuncertainty queries. Our experiments show that GOA consistently achieves a favorable trade-off, outperforming baseline strategies in both accuracy and latency across varying user densities, and operates with practical computation time. These results establish GOA as a scalable and effective solution for LLM inference in MEC environments.
comment: This paper has been accepted at 2025 IEEE Globecom Workshop: WS02-GAIMC: Mutual Facilitation of Generative Artificial Intelligence and Mobile Communications
Healthcare Facility Assignment Using Real-Time Length-of-Stay Predictions: Queuing-Theoretic and Simulation-driven Machine Learning Approaches
Longer stays at healthcare facilities, driven by uncertain patient load, inefficient patient flow, and lack of real-time information about medical care, pose significant challenges for patients and healthcare providers. Providing patients with estimates of their expected real-time length of stay (RT-LOS), generated as a function of the operational state of the healthcare facility at their anticipated time of arrival (as opposed to estimates of average LOS), can help them make informed decisions regarding which facility to visit within a network. In this study, we develop a healthcare facility assignment (HFA) algorithm that assigns healthcare facilities to patients using RT-LOS predictions at facilities within the network of interest. We describe the generation of RT-LOS predictions via two methodologies: (a) an analytical queuing-theoretic approach, and (b) a hybrid simulation-driven machine learning approach. Because RT-LOS predictors are highly specific to the queuing system in question, we illustrate the development of RT-LOS predictors using both approaches by considering the outpatient experience at primary health centers. Via computational experiments, we compare outcomes from the implementation of the RT-HFA algorithm with both RT-LOS predictors to the case where patients visit the facility of their choice. Computational experiments also indicated that the RT-HFA algorithm substantially reduced patient wait times and LOS at congested facilities and led to more equitable utilization of medical resources at facilities across the network. Finally, we show numerically that the effectiveness of the RT-HFA algorithm in improving outcomes is contingent on the level of compliance with the assignment decision.
Optimized Deployment of HAPS Systems for GNSS Localization Enhancement in Urban Environments
While high altitude platform stations (HAPS) have been primarily explored as network infrastructure for communication services, their advantageous characteristics also make them promising candidates for augmenting GNSS localization. This paper proposes a metaheuristic framework to jointly optimize the number and placement of HAPS for GNSS enhancement in dense urban environments, considering practical constraints such as elevation masks, altitude limits, and ray-traced visibility from 3D city models. The problem is highly nonconvex due to the discrete HAPS count and the environment-dependent 3D Cramer-Rao lower bound (CRLB). To address this, we develop a tailored version of the adaptive special-crowding distance non-dominated sorting genetic algorithm II (ASDNSGA-II). Simulations show the method successfully identifies the minimum number of HAPS needed to satisfy a CRLB threshold and selects the configuration with the lowest CRLB within that minimum, offering a cost-effective and scalable solution for future HAPS-aided positioning systems.
Convergence Analysis of Continuous-Time Distributed Stochastic Gradient Algorithms
In this paper, we propose a new framework to study distributed optimization problems with stochastic gradients by employing a multi-agent system with continuous-time dynamics. Here the goal of the agents is to cooperatively minimize the sum of convex objective functions. When making decisions, each agent only has access to a stochastic gradient of its own objective function rather than the real gradient, and can exchange local state information with its immediate neighbors via a time-varying directed graph. Particularly, the stochasticity is depicted by the Brownian motion. To handle this problem, we propose a continuous-time distributed stochastic gradient algorithm based on the consensus algorithm and the gradient descent strategy. Under mild assumptions on the connectivity of the graph and objective functions, using convex analysis theory, the Lyapunov theory and Ito formula, we prove that the states of the agents asymptotically reach a common minimizer in expectation. Finally, a simulation example is worked out to demonstrate the effectiveness of our theoretical results.
Urban Congestion Patterns under High Electric Vehicle Penetration: A Case Study of 10 U.S. Cities
With the global energy transition and the rapid penetration of electric vehicles (EVs), the widening travel cost gap between EVs and gasoline vehicles (GVs) increasingly affects commuters' route choices and may reshape urban congestion patterns. Existing research remains in its preliminary exploratory phase. On the one hand, multi-class models do not account for fixed user class scenarios, which may not align with actual commuters; on the other hand, there is a lack of systematic quantitative analysis based on real-world complex road networks across multiple cities. As a result, the congestion effects induced by heterogeneous GV-EV cost structures may be mischaracterized or substantially underestimated. To address these limitations, this paper proposes a multi-user equilibrium (MUE) assignment model for mixed GV-EV traffic, constructs a dual algorithm with convergence guarantees, and designs multi-dimensional evaluation metrics for congestion patterns. Using 10 representative U.S. cities as a case study, this research explores the evolution trends of traffic congestion under different EV penetration scenarios based on real city-level road networks and block-level commuter origin-destination (OD) demand. The results show that full EV penetration reduces average system travel time by 2.27%--10.78% across the 10 cities, with New Orleans achieving the largest reduction (10.78%) and San Francisco the smallest (2.27%), but the effectiveness of alleviating congestion exhibits urban heterogeneity. Moreover, for cities with sufficient network redundancy, benefits are primarily concentrated during the low to medium EV penetration stage (0-0.5), though cities with topological constraints (e.g., San Francisco) show more limited improvements throughout all penetration levels. This paper can provide a foundation for formulating differentiated urban planning and congestion management policies.
Dynamic Load Model for Data Centers with Pattern-Consistent Calibration
The rapid growth of data centers has made large electronic load (LEL) modeling increasingly important for power system analysis. Such loads are characterized by fast workload-driven variability and protection-driven disconnection and reconnection behavior that are not captured by conventional load models. Existing data center load modeling includes physics-based approaches, which provide interpretable structure for grid simulation, and data-driven approaches, which capture empirical workload variability from data. However, physics-based models are typically uncalibrated to facility-level operation, while trajectory alignment in data-driven methods often leads to overfitting and unrealistic dynamic behavior. To resolve these limitations, we design the framework to leverage both physics-based structure and data-driven adaptability. The physics-based structure is parameterized to enable data-driven pattern-consistent calibration from real operational data, supporting facility-level grid planning. We further show that trajectory-level alignment is limited for inherently stochastic data center loads. Therefore, we design the calibration to align temporal and statistical patterns using temporal contrastive learning (TCL). This calibration is performed locally at the facility, and only calibrated parameters are shared with utilities, preserving data privacy. The proposed load model is calibrated by real-world operational load data from the MIT Supercloud, ASU Sol, Blue Waters, and ASHRAE datasets. Then it is integrated into the ANDES platform and evaluated on the IEEE 39-bus, NPCC 140-bus, and WECC 179-bus systems. We find that interactions among LELs can fundamentally alter post-disturbance recovery behavior, producing compound disconnection-reconnection dynamics and delayed stabilization that are not captured by uncalibrated load models.
comment: 10 pages, 13 figures
Joint Optimization of Pattern, Headway, and Fleet Size of Multiple Urban Transit Lines with Perceived Headway Consideration and Passenger Flow Allocation
This study addresses the urban transit pattern design problem, optimizing stop sequences, headways, and fleet sizes across multiple routes and periods simultaneously to minimize user costs (composed of riding, waiting, and transfer times) under operational constraints (e.g., vehicle capacity and fleet size). A destination-labeled multi-commodity network flow (MCNF) formulation is developed to solve the problem at a large scale more efficiently compared to the previous literature. The model allows for flexible pattern options without relying on pre-defined candidate sets and simultaneously considers multiple operational strategies such as express/local services, short-turning, and deadheading. It evaluates perceived headways of joint patterns for passengers, assigns passenger flows to each pattern accordingly, and allows transfers across patterns in different directions. The mixed-integer linear programming (MILP) model is demonstrated with a city-sized network of metro lines in Chicago, IL, USA, achieving near-optimal solutions in hours. The total weighted journey times are reduced by 0.61% and 5.76% under single-route and multi-period multi-route scenarios respectively. The model provides transit agencies with an efficient tool for comprehensive service design and resource allocation, improving service quality and resource utilization without additional operational costs.
comment: 25 pages, 6 figures, a previous version accepted for presentation in the 104th Transportation Research Board Annual Meeting in Washington, D.C. in January 2025
Sensor-Noise Mitigation in Extremum Seeking Control Using Adaptive Numerical Differentiation
Extremum-seeking control (ESC) is widely used to optimize performance when the system dynamics are uncertain. However, sensitivity to sensor noise is a crucial issue in ESC implementation due to the use of high-pass filters or gradient estimators. To reduce the sensitivity of ESC to noise, this paper investigates the use of the recently developed adaptive input and state estimation (AISE) technique for numerical differentiation. In particular, this paper develops extremum-seeking control with adaptive input and state estimation (ESC/AISE), where AISE replaces the high-pass filter of ESC to improve performance under sensor noise. The effectiveness of ESC/AISE is illustrated via numerical examples.
comment: 8 pages, 12 figures. Accepted to American Control Conference (ACC) 2026
Rapid and Accurate Changepoint Detection of Power System Forced Oscillations
This paper describes a new approach for using changepoint detection (CPD) to estimate the starting and stopping times of a forced oscillation (FO) in measured power system data. As with a previous application of CPD to this problem, the pruned exact linear time (PELT) algorithm is used. However, instead of allowing PELT to automatically tune its penalty parameter, a method of manually providing it is presented that dramatically reduces computation time without sacrificing accuracy. Additionally, the new algorithm requires fewer input parameters and provides a formal, data-driven approach to setting the minimum FO segment length to consider as troublesome for an electromechanical mode meter. A low-order ARMAX representation of the minniWECC model is used to test the approach, where a 98\% reduction in computation time is enjoyed with high estimation accuracy.
comment: Currently under revise and resubmit process for the proceedings of the 2026 IEEE Power and Energy Society General Meeting (PESGM26)
Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning
We prove a non-asymptotic central limit theorem for vector-valued martingale differences using Stein's method, and use Poisson's equation to extend the result to functions of Markov Chains. We then show that these results can be applied to establish a non-asymptotic central limit theorem for Temporal Difference (TD) learning with averaging.
An Ion-Intercalation Memristor for Enabling Full Parallel Writing in Crossbar Networks
Crossbar architectures have long been seen as a promising foundation for in-memory computing, using memristor arrays for high-density, energy-efficient analog computation. However, this conventional architecture suffers from a fundamental limitation: the inability to perform parallel write operations due to the sneak path problem. This arises from the structural overlap of read and write paths, forcing sequential or semi-parallel updates and severely limiting scalability. To address this, we introduce a new memristor design that decouples read and write operations at the device level. This design enables orthogonal conductive paths, and employs a reversible ion doping mechanism, inspired by lithium-ion battery principles, to modulate resistance states independently of computation. Fabricated devices exhibit near-ideal memristive characteristics and stable performance under isolated read/write conditions.
Power Reserve Procurement Considering Dependent Random Variables with PCE
This paper presents an approach for the modelling of dependent random variables using generalised polynomial chaos. This allows to write chance-constrained optimization problems with respect to a joint distribution modelling dependencies between different stochastic inputs. Arbitrary dependencies are modelled by using Gaussian copulas to construct the joint distribution. The paper exploits the problem structure and develops suitable transformations to ensure tractability. The proposed method is applied to a probabilistic power reserve procurement problem. The effectiveness of the method to capture dependencies is shown by comparing the approach with a standard approach considering independent random variables.
SPOT: Spatio-Temporal Obstacle-free Trajectory Planning for UAVs in an Unknown Dynamic Environment ICRA 2026
We address the problem of reactive motion planning for quadrotors operating in unknown environments with dynamic obstacles. Our approach leverages a 4-dimensional spatio-temporal planner, integrated with vision-based Safe Flight Corridor (SFC) generation and trajectory optimization. Unlike prior methods that rely on map fusion, our framework is mapless, enabling collision avoidance directly from perception while reducing computational overhead. Dynamic obstacles are detected and tracked using a vision-based object segmentation and tracking pipeline, allowing robust classification of static versus dynamic elements in the scene. To further enhance robustness, we introduce a backup planning module that reactively avoids dynamic obstacles when no direct path to the goal is available, mitigating the risk of collisions during deadlock situations. We validate our method extensively in both simulation and real-world hardware experiments, and benchmark it against state-of-the-art approaches, showing significant advantages for reactive UAV navigation in dynamic, unknown environments.
comment: Accepted for publication at ICRA 2026
Probabilistic Control Barrier Functions: Safety in Probability for Discrete-Time Stochastic Systems
Control systems operating in the real world face countless sources of unpredictable uncertainties. These random disturbances can render deterministic guarantees inapplicable and cause catastrophic safety failures. To overcome this, this paper proposes a method for designing safe controllers for discrete-time stochastic systems that retain probabilistic guarantees of safety. To do this we modify the traditional notion of a control barrier function (CBF) to explicitly account for these stochastic uncertainties and call these new modified functions probabilistic CBFs. We show that probabilistic CBFs can be used to design controllers that guarantee safety over a finite number of time steps with a prescribed probability. Next, by leveraging various uncertainty quantification methods, such as concentration inequalities and the scenario approach, we provide a variety of sufficient conditions that result in computationally tractable controllers with tunable probabilistic guarantees across a plethora of practical scenarios. Finally, we showcase the applicability of our results in simulation and hardware for the control of a quadruped robot.
Prescriptive Artificial Intelligence: A Formal Paradigm for Auditing Human Decisions Under Uncertainty AAAI
We formalize Prescriptive Artificial Intelligence as a distinct paradigm for human-AI decision collaboration in high-stakes environments. Unlike predictive systems optimized for outcome accuracy, prescriptive systems are designed to recommend and audit human decisions under uncertainty, providing normative guidance while preserving human agency and accountability. We introduce four domain-independent axioms characterizing prescriptive systems and prove fundamental separation results. Central among these is the Imitation Incompleteness theorem, which establishes that supervised learning from historical decisions cannot correct systematic decision biases in the absence of external normative signals. Consequently, performance in decision imitation is bounded by a structural bias term epsilon_bias rather than the statistical learning rate O(1/sqrt(n)). This result formalizes the empirically observed accuracy ceiling in human decision imitation tasks and provides a principled criterion for when automation should be replaced by epistemic auditing. We demonstrate the computational realizability of the framework through an interpretable fuzzy inference system, applied as a stress test in elite soccer decision-making, where it reveals systematic decision latency and risk states obscured by outcome and status quo biases. The proposed framework establishes Prescriptive AI as a general, realizable class of decision-support systems applicable across safety-critical domains in which interpretability, contestability, and normative alignment are essential.
comment: Preprint; suitable for AI, decision sciences, and prescriptive analytics. Short versions published in Wharton Sports Analytics Journal Fall 2025 (AI Feature Spotlight) and accepted to AAAI Bridge on LM Reasoning 2026
Robotics
Vision and language: Novel Representations and Artificial intelligence for Driving Scene Safety Assessment and Autonomous Vehicle Planning
Vision-language models (VLMs) have recently emerged as powerful representation learning systems that align visual observations with natural language concepts, offering new opportunities for semantic reasoning in safety-critical autonomous driving. This paper investigates how vision-language representations support driving scene safety assessment and decision-making when integrated into perception, prediction, and planning pipelines. We study three complementary system-level use cases. First, we introduce a lightweight, category-agnostic hazard screening approach leveraging CLIP-based image-text similarity to produce a low-latency semantic hazard signal. This enables robust detection of diverse and out-of-distribution road hazards without explicit object detection or visual question answering. Second, we examine the integration of scene-level vision-language embeddings into a transformer-based trajectory planning framework using the Waymo Open Dataset. Our results show that naively conditioning planners on global embeddings does not improve trajectory accuracy, highlighting the importance of representation-task alignment and motivating the development of task-informed extraction methods for safety-critical planning. Third, we investigate natural language as an explicit behavioral constraint on motion planning using the doScenes dataset. In this setting, passenger-style instructions grounded in visual scene elements suppress rare but severe planning failures and improve safety-aligned behavior in ambiguous scenarios. Taken together, these findings demonstrate that vision-language representations hold significant promise for autonomous driving safety when used to express semantic risk, intent, and behavioral constraints. Realizing this potential is fundamentally an engineering problem requiring careful system design and structured grounding rather than direct feature injection.
Affine Transformable Unmanned Ground Vehicle
This paper develops the proof of concept for a novel affine transformable unmanned ground vehicle (ATUGV) with the capability of safe and aggressive deformation while carrying multiple payloads. The ATUGV is a multi-body system with mobile robots that can be used to power the ATUGV morphable motion, powered cells to enclose the mobile robots, unpowered cells to contain payloads, and a deformable structure to integrate cells through bars and joints. The objective is that all powered and unpowered cells motion can safely track a desired affine transformation, where an affine transformation can be decomposed into translation, rigid body rotation, and deformation. To this end, the paper first uses a deep neural network to structure cell interconnection in such a way that every cell can freely move over the deformation plane, and the entire structure can reconfigurably deform to track a desired affine transformation. Then, the mobile robots, contained by the powered cells and stepper motors, regulating the connections of the powered and unpowered cells, design the proper controls so that all cells safely track the desired affine transformation. The functionality of the proposed ATUGV is validated through hardware experimentation and simulation.
Looking and Listening Inside and Outside: Multimodal Artificial Intelligence Systems for Driver Safety Assessment and Intelligent Vehicle Decision-Making
The looking-in-looking-out (LILO) framework has enabled intelligent vehicle applications that understand both the outside scene and the driver state to improve safety outcomes, with examples in smart airbag deployment, takeover time prediction in autonomous control transitions, and driver attention monitoring. In this research, we propose an augmentation to this framework, making a case for the audio modality as an additional source of information to understand the driver, and in the evolving autonomy landscape, also the passengers and those outside the vehicle. We expand LILO by incorporating audio signals, forming the looking-and-listening inside-and-outside (L-LIO) framework to enhance driver state assessment and environment understanding through multimodal sensor fusion. We evaluate three example cases where audio enhances vehicle safety: supervised learning on driver speech audio to classify potential impairment states (e.g., intoxication), collection and analysis of passenger natural language instructions (e.g., "turn after that red building") to motivate how spoken language can interface with planning systems through audio-aligned instruction data, and limitations of vision-only systems where audio may disambiguate the guidance and gestures of external agents. Datasets include custom-collected in-vehicle and external audio samples in real-world environments. Pilot findings show that audio yields safety-relevant insights, particularly in nuanced or context-rich scenarios where sound is critical to safe decision-making or visual signals alone are insufficient. Challenges include ambient noise interference, privacy considerations, and robustness across human subjects, motivating further work on reliability in dynamic real-world contexts. L-LIO augments driver and scene understanding through multimodal fusion of audio and visual sensing, offering new paths for safety intervention.
LCLA: Language-Conditioned Latent Alignment for Vision-Language Navigation
We propose LCLA (Language-Conditioned Latent Alignment), a framework for vision-language navigation that learns modular perception-action interfaces by aligning sensory observations to a latent representation of an expert policy. The expert is first trained with privileged state information, inducing a latent space sufficient for control, after which its latent interface and action head are frozen. A lightweight adapter is then trained to map raw visual-language observations, via a frozen vision-language model, into the expert's latent space, reducing the problem of visuomotor learning to supervised latent alignment rather than end-to-end policy optimization. This decoupling enforces a stable contract between perception and control, enabling expert behavior to be reused across sensing modalities and environmental variations. We instantiate LCLA and evaluate it on a vision-language indoor navigation task, where aligned latent spaces yield strong in-distribution performance and robust zero-shot generalization to unseen environments, lighting conditions, and viewpoints while remaining lightweight at inference time.
"Meet My Sidekick!": Effects of Separate Identities and Control of a Single Robot in HRI
The presentation of a robot's capability and identity directly influences a human collaborator's perception and implicit trust in the robot. Unlike humans, a physical robot can simultaneously present different identities and have them reside and control different parts of the robot. This paper presents a novel study that investigates how users perceive a robot where different robot control domains (head and gripper) are presented as independent robots. We conducted a mixed design study where participants experienced one of three presentations: a single robot, two agents with shared full control (co-embodiment), or two agents with split control across robot control domains (split-embodiment). Participants underwent three distinct tasks -- a mundane data entry task where the robot provides motivational support, an individual sorting task with isolated robot failures, and a collaborative arrangement task where the robot causes a failure that directly affects the human participant. Participants perceived the robot as residing in the different control domains and were able to associate robot failure with different identities. This work signals how future robots can leverage different embodiment configurations to obtain the benefit of multiple robots within a single body.
Differentiate-and-Inject: Enhancing VLAs via Functional Differentiation Induced by In-Parameter Structural Reasoning
As robots are expected to perform increasingly diverse tasks, they must understand not only low-level actions but also the higher-level structure that determines how a task should unfold. Existing vision-language-action (VLA) models struggle with this form of task-level reasoning. They either depend on prompt-based in-context decomposition, which is unstable and sensitive to linguistic variations, or end-to-end long-horizon training, which requires large-scale demonstrations and entangles task-level reasoning with low-level control. We present in-parameter structured task reasoning (iSTAR), a framework for enhancing VLA models via functional differentiation induced by in-parameter structural reasoning. Instead of treating VLAs as monolithic policies, iSTAR embeds task-level semantic structure directly into model parameters, enabling differentiated task-level inference without external planners or handcrafted prompt inputs. This injected structure takes the form of implicit dynamic scene-graph knowledge that captures object relations, subtask semantics, and task-level dependencies in parameter space. Across diverse manipulation benchmarks, iSTAR achieves more reliable task decompositions and higher success rates than both in-context and end-to-end VLA baselines, demonstrating the effectiveness of parameter-space structural reasoning for functional differentiation and improved generalization across task variations.
VividFace: Real-Time and Realistic Facial Expression Shadowing for Humanoid Robots ICRA
Humanoid facial expression shadowing enables robots to realistically imitate human facial expressions in real time, which is critical for lifelike, facially expressive humanoid robots and affective human-robot interaction. Existing progress in humanoid facial expression imitation remains limited, often failing to achieve either real-time performance or realistic expressiveness due to offline video-based inference designs and insufficient ability to capture and transfer subtle expression details. To address these limitations, we present VividFace, a real-time and realistic facial expression shadowing system for humanoid robots. An optimized imitation framework X2CNet++ enhances expressiveness by fine-tuning the human-to-humanoid facial motion transfer module and introducing a feature-adaptation training strategy for better alignment across different image sources. Real-time shadowing is further enabled by a video-stream-compatible inference pipeline and a streamlined workflow based on asynchronous I/O for efficient communication across devices. VividFace produces vivid humanoid faces by mimicking human facial expressions within 0.05 seconds, while generalizing across diverse facial configurations. Extensive real-world demonstrations validate its practical utility. Videos are available at: https://lipzh5.github.io/VividFace/.
comment: Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA)
TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control
Recent advances in humanoid whole-body motion tracking have enabled the execution of diverse and highly coordinated motions on real hardware. However, existing controllers are commonly driven either by predefined motion trajectories, which offer limited flexibility when user intent changes, or by continuous human teleoperation, which requires constant human involvement and limits autonomy. This work addresses the problem of how to drive a universal humanoid controller in a real-time and interactive manner. We present TextOp, a real-time text-driven humanoid motion generation and control framework that supports streaming language commands and on-the-fly instruction modification during execution. TextOp adopts a two-level architecture in which a high-level autoregressive motion diffusion model continuously generates short-horizon kinematic trajectories conditioned on the current text input, while a low-level motion tracking policy executes these trajectories on a physical humanoid robot. By bridging interactive motion generation with robust whole-body control, TextOp unlocks free-form intent expression and enables smooth transitions across multiple challenging behaviors such as dancing and jumping, within a single continuous motion execution. Extensive real-robot experiments and offline evaluations demonstrate instant responsiveness, smooth whole-body motion, and precise control. The project page and the open-source code are available at https://text-op.github.io/
comment: Project Page: https://text-op.github.io/
Bridging Speech, Emotion, and Motion: a VLM-based Multimodal Edge-deployable Framework for Humanoid Robots
Effective human-robot interaction requires emotionally rich multimodal expressions, yet most humanoid robots lack coordinated speech, facial expressions, and gestures. Meanwhile, real-world deployment demands on-device solutions that can operate autonomously without continuous cloud connectivity. To bridging \underline{\textit{S}}peech, \underline{\textit{E}}motion, and \underline{\textit{M}}otion, we present \textit{SeM$^2$}, a Vision Language Model-based framework that orchestrates emotionally coherent multimodal interactions through three key components: a multimodal perception module capturing user contextual cues, a Chain-of-Thought reasoning for response planning, and a novel Semantic-Sequence Aligning Mechanism (SSAM) that ensures precise temporal coordination between verbal content and physical expressions. We implement both cloud-based and \underline{\textit{e}}dge-deployed versions (\textit{SeM$^2_e$}), with the latter knowledge distilled to operate efficiently on edge hardware while maintaining 95\% of the relative performance. Comprehensive evaluations demonstrate that our approach significantly outperforms unimodal baselines in naturalness, emotional clarity, and modal coherence, advancing socially expressive humanoid robotics for diverse real-world environments.
Going with the Flow: Koopman Behavioral Models as Implicit Planners for Visuo-Motor Dexterity
There has been rapid and dramatic progress in robots' ability to learn complex visuo-motor manipulation skills from demonstrations, thanks in part to expressive policy classes that employ diffusion- and transformer-based backbones. However, these design choices require significant data and computational resources and remain far from reliable, particularly within the context of multi-fingered dexterous manipulation. Fundamentally, they model skills as reactive mappings and rely on fixed-horizon action chunking to mitigate jitter, creating a rigid trade-off between temporal coherence and reactivity. In this work, we introduce Unified Behavioral Models (UBMs), a framework that learns to represent dexterous skills as coupled dynamical systems that capture how visual features of the environment (visual flow) and proprioceptive states of the robot (action flow) co-evolve. By capturing such behavioral dynamics, UBMs can ensure temporal coherence by construction rather than by heuristic averaging. To operationalize these models, we propose Koopman-UBM, a first instantiation of UBMs that leverages Koopman Operator theory to effectively learn a unified representation in which the joint flow of latent visual and proprioceptive features is governed by a structured linear system. We demonstrate that Koopman-UBM can be viewed as an implicit planner: given an initial condition, it analytically computes the desired robot behavior while simultaneously ''imagining'' the resulting flow of visual features over the entire skill horizon. To enable reactivity and adaptation, we introduce an online replanning strategy in which the model acts as its own runtime monitor that automatically triggers replanning when predicted and observed visual flow diverge beyond a threshold. Across seven simulated tasks and two real-world tasks, we demonstrate that K-UBM matches or exceeds the performance of state-of-the-art baselines, while offering considerably faster inference, smooth execution, robustness to occlusions, and flexible replanning.
Haptically Experienced Animacy Facilitates Emotion Regulation: A Theory-Driven Investigation
Emotion regulation (ER) is essential to mental well-being but often difficult to access, especially in high-intensity moments or for individuals with clinical vulnerabilities. While existing technology-based ER tools offer value, they typically rely on self-reflection (e.g., emotion tracking, journaling) or co-regulation through verbal modalities (reminders, text-based conversational tools), which may not be accessible or effective when most needed. The biological role of the touch modality makes it an intriguing alternate pathway, but empirical evidence is limited and under-theorized. Building on our prior theoretical framework describing how a comforting haptic co-regulating adjunct (CHORA) can support ER, we developed a zoomorphic robot CHORA with looped biomimetic breathing and heartbeat behaviors. We evaluated its effects in a mixed-methods in-lab study (N=30), providing physiological, self-report, custom questionnaire, and retrospective interview data. Our findings demonstrate the regulatory effects of haptically experienced animacy, corroborate prior work, and validate CHORA's {theoretically grounded} potential to facilitate four ER strategies.
Trace-Focused Diffusion Policy for Multi-Modal Action Disambiguation in Long-Horizon Robotic Manipulation
Generative model-based policies have shown strong performance in imitation-based robotic manipulation by learning action distributions from demonstrations. However, in long-horizon tasks, visually similar observations often recur across execution stages while requiring distinct actions, which leads to ambiguous predictions when policies are conditioned only on instantaneous observations, termed multi-modal action ambiguity (MA2). To address this challenge, we propose the Trace-Focused Diffusion Policy (TF-DP), a simple yet effective diffusion-based framework that explicitly conditions action generation on the robot's execution history. TF-DP represents historical motion as an explicit execution trace and projects it into the visual observation space, providing stage-aware context when current observations alone are insufficient. In addition, the induced trace-focused field emphasizes task-relevant regions associated with historical motion, improving robustness to background visual disturbances. We evaluate TF-DP on real-world robotic manipulation tasks exhibiting pronounced multi-modal action ambiguity and visually cluttered conditions. Experimental results show that TF-DP improves temporal consistency and robustness, outperforming the vanilla diffusion policy by 80.56 percent on tasks with multi-modal action ambiguity and by 86.11 percent under visual disturbances, while maintaining inference efficiency with only a 6.4 percent runtime increase. These results demonstrate that execution-trace conditioning offers a scalable and principled approach for robust long-horizon robotic manipulation within a single policy.
UEREBot: Learning Safe Quadrupedal Locomotion under Unstructured Environments and High-Speed Dynamic Obstacles
Quadruped robots are increasingly deployed in unstructured environments. Safe locomotion in these settings requires long-horizon goal progress, passability over uneven terrain and static constraints, and collision avoidance against high-speed dynamic obstacles. A single system cannot fully satisfy all three objectives simultaneously: planning-based decisions can be too slow, while purely reactive decisions can sacrifice goal progress and passability. To resolve this conflict, we propose UEREBot (Unstructured-Environment Reflexive Evasion Robot), a hierarchical framework that separates slow planning from instantaneous reflexive evasion and coordinates them during execution. UEREBot formulates the task as a constrained optimal control problem blueprint. It adopts a spatial--temporal planner that provides reference guidance toward the goal and threat signals. It then uses a threat-aware handoff to fuse navigation and reflex actions into a nominal command, and a control barrier function shield as a final execution safeguard. We evaluate UEREBot in Isaac Lab simulation and deploy it on a Unitree Go2 quadruped equipped with onboard perception. Across diverse environments with complex static structure and high-speed dynamic obstacles, UEREBot achieves higher avoidance success and more stable locomotion while maintaining goal progress than representative baselines, demonstrating improved safety--progress trade-offs.
Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation
Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remain a core challenge for autonomous driving applications. RGB-Thermal fusion is a standard approach, yet existing methods apply static fusion strategies uniformly across all conditions, allowing modality-specific noise to propagate throughout the network. Hence, we propose CLARITY that dynamically adapts its fusion strategy to the detected scene condition. Guided by vision-language model (VLM) priors, the network learns to modulate each modality's contribution based on the illumination state while leveraging object embeddings for segmentation, rather than applying a fixed fusion policy. We further introduce two mechanisms, i.e., one which preserves valid dark-object semantics that prior noise-suppression methods incorrectly discard, and a hierarchical decoder that enforces structural consistency across scales to sharpen boundaries on thin objects. Experiments on the MFNet dataset demonstrate that CLARITY establishes a new state-of-the-art (SOTA), achieving 62.3% mIoU and 77.5% mAcc.
Scalable Dexterous Robot Learning with AR-based Remote Human-Robot Interactions
This paper focuses on the scalable robot learning for manipulation in the dexterous robot arm-hand systems, where the remote human-robot interactions via augmented reality (AR) are established to collect the expert demonstration data for improving efficiency. In such a system, we present a unified framework to address the general manipulation task problem. Specifically, the proposed method consists of two phases: i) In the first phase for pretraining, the policy is created in a behavior cloning (BC) manner, through leveraging the learning data from our AR-based remote human-robot interaction system; ii) In the second phase, a contrastive learning empowered reinforcement learning (RL) method is developed to obtain more efficient and robust policy than the BC, and thus a projection head is designed to accelerate the learning progress. An event-driven augmented reward is adopted for enhancing the safety. To validate the proposed method, both the physics simulations via PyBullet and real-world experiments are carried out. The results demonstrate that compared to the classic proximal policy optimization and soft actor-critic policies, our method not only significantly speeds up the inference, but also achieves much better performance in terms of the success rate for fulfilling the manipulation tasks. By conducting the ablation study, it is confirmed that the proposed RL with contrastive learning overcomes policy collapse. Supplementary demonstrations are available at https://cyberyyc.github.io/.
RAPiD: Real-time Deterministic Trajectory Planning via Diffusion Behavior Priors for Safe and Efficient Autonomous Driving
Diffusion-based trajectory planners have demonstrated strong capability for modeling the multimodal nature of human driving behavior, but their reliance on iterative stochastic sampling poses critical challenges for real-time, safety-critical deployment. In this work, we present RAPiD, a deterministic policy extraction framework that distills a pretrained diffusion-based planner into an efficient policy while eliminating diffusion sampling. Using score-regularized policy optimization, we leverage the score function of a pre-trained diffusion planner as a behavior prior to regularize policy learning. To promote safety and passenger comfort, the policy is optimized using a critic trained to imitate a predictive driver controller, providing dense, safety-focused supervision beyond conventional imitation learning. Evaluations demonstrate that RAPiD achieves competitive performance on closed-loop nuPlan scenarios with an 8x speedup over diffusion baselines, while achieving state-of-the-art generalization among learning-based planners on the interPlan benchmark. The official website of this work is: https://github.com/ruturajreddy/RAPiD.
Why Look at It at All?: Vision-Free Multifingered Blind Grasping Using Uniaxial Fingertip Force Sensing
Grasping under limited sensing remains a fundamental challenge for real-world robotic manipulation, as vision and high-resolution tactile sensors often introduce cost, fragility, and integration complexity. This work demonstrates that reliable multifingered grasping can be achieved under extremely minimal sensing by relying solely on uniaxial fingertip force feedback and joint proprioception, without vision or multi-axis/tactile sensing. To enable such blind grasping, we employ an efficient teacher-student training pipeline in which a reinforcement-learned teacher exploits privileged simulation-only observations to generate demonstrations for distilling a transformer-based student policy operating under partial observation. The student policy is trained to act using only sensing modalities available at real-world deployment. We validate the proposed approach on real hardware across 18 objects, including both in-distribution and out-of-distribution cases, achieving a 98.3~$\%$ overall grasp success rate. These results demonstrate strong robustness and generalization beyond the simulation training distribution, while significantly reducing sensing requirements for real-world grasping systems.
comment: Submitted to Journal (under review)
Action-to-Action Flow Matching
Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. However, the standard practice of sampling from random Gaussian noise often requires multiple iterative steps to produce clean actions, leading to high inference latency that incurs a major bottleneck for real-time control. In this paper, we challenge the necessity of uninformed noise sampling and propose Action-to-Action flow matching (A2A), a novel policy paradigm that shifts from random sampling to initialization informed by the previous action. Unlike existing methods that treat proprioceptive action feedback as static conditions, A2A leverages historical proprioceptive sequences, embedding them into a high-dimensional latent space as the starting point for action generation. This design bypasses costly iterative denoising while effectively capturing the robot's physical dynamics and temporal continuity. Extensive experiments demonstrate that A2A exhibits high training efficiency, fast inference speed, and improved generalization. Notably, A2A enables high-quality action generation in as few as a single inference step (0.56 ms latency), and exhibits superior robustness to visual perturbations and enhanced generalization to unseen configurations. Lastly, we also extend A2A to video generation, demonstrating its broader versatility in temporal modeling. Project site: https://lorenzo-0-0.github.io/A2A_Flow_Matching.
comment: 18 pages, 18 figures
Parallel Simulation of Contact and Actuation for Soft Growing Robots
Soft growing robots, commonly referred to as vine robots, have demonstrated remarkable ability to interact safely and robustly with unstructured and dynamic environments. It is therefore natural to exploit contact with the environment for planning and design optimization tasks. Previous research has focused on planning under contact for passively deforming robots with pre-formed bends. However, adding active steering to these soft growing robots is necessary for successful navigation in more complex environments. To this end, we develop a unified modeling framework that integrates vine robot growth, bending, actuation, and obstacle contact. We extend the beam moment model to include the effects of actuation on kinematics under growth and then use these models to develop a fast parallel simulation framework. We validate our model and simulator with real robot experiments. To showcase the capabilities of our framework, we apply our model in a design optimization task to find designs for vine robots navigating through cluttered environments, identifying designs that minimize the number of required actuators by exploiting environmental contacts. We show the robustness of the designs to environmental and manufacturing uncertainties. Finally, we fabricate an optimized design and successfully deploy it in an obstacle-rich environment.
comment: 37 pages, 12 figures, 2 tables. To appear in Soft Robotics
GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment
Recent advances in video world modeling have enabled large-scale generative models to simulate embodied environments with high visual fidelity, providing strong priors for prediction, planning, and control. Yet, despite their realism, these models often lack geometric grounding, limiting their use in navigation tasks that require spatial coherence and stability. We introduce Reinforcement Learning with World Grounding (RLWG), a self-supervised post-training framework that aligns pretrained world models with a physically verifiable structure through geometric and perceptual rewards. Analogous to reinforcement learning from verifiable feedback (RLVR) in language models, RLWG can use multiple rewards that measure pose cycle-consistency, depth reprojection, and temporal coherence. We instantiate this framework with GrndCtrl, a reward-aligned adaptation method based on Group Relative Policy Optimization (GRPO), yielding world models that maintain stable trajectories, consistent geometry, and reliable rollouts for embodied navigation. Like post-training alignment in large language models, GrndCtrl leverages verifiable rewards to bridge generative pretraining and grounded behavior, achieving superior spatial coherence and navigation stability over supervised fine-tuning in outdoor environments.
ManiVID-3D: Generalizable View-Invariant Reinforcement Learning for Robotic Manipulation via Disentangled 3D Representations
Deploying visual reinforcement learning (RL) policies in real-world manipulation is often hindered by camera viewpoint changes. A policy trained from a fixed front-facing camera may fail when the camera is shifted -- an unavoidable situation in real-world settings where sensor placement is hard to manage appropriately. Existing methods often rely on precise camera calibration or struggle with large perspective changes. To address these limitations, we propose ManiVID-3D, a novel 3D RL architecture designed for robotic manipulation, which learns view-invariant representations through self-supervised disentangled feature learning. The framework incorporates ViewNet, a lightweight yet effective module that automatically aligns point cloud observations from arbitrary viewpoints into a unified spatial coordinate system without the need for extrinsic calibration. Additionally, we develop an efficient GPU-accelerated batch rendering module capable of processing over 5000 frames per second, enabling large-scale training for 3D visual RL at unprecedented speeds. Extensive evaluation across 10 simulated and 5 real-world tasks demonstrates that our approach achieves a 40.6% higher success rate than state-of-the-art methods under viewpoint variations while using 80% fewer parameters. The system's robustness to severe perspective changes and strong sim-to-real performance highlight the effectiveness of learning geometrically consistent representations for scalable robotic manipulation in unstructured environments.
comment: Accepted to RA-L
Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning
With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep learning and reinforcement learning technologies to enhance picking efficiency and accuracy while reducing system failure rates. Through empirical analysis, we demonstrate the effectiveness of these technologies in improving robot picking performance and adaptability to complex environments. The results show that the integrated machine learning model significantly outperforms traditional methods, effectively addressing the challenges of peak order processing, reducing operational errors, and improving overall logistics efficiency. Additionally, by analyzing environmental factors, this study further optimizes system design to ensure efficient and stable operation under variable conditions. This research not only provides innovative solutions for logistics automation but also offers a theoretical and empirical foundation for future technological development and application.
comment: Published in IEEE Xplore
RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models
Recent advances in vision-language-action (VLA) models have motivated the extension of their capabilities to embodied settings, where reinforcement learning (RL) offers a principled way to optimize task success through interaction. However, existing methods remain fragmented, lacking both a unified platform for fair comparison across architectures and algorithms and an efficient system design for scalable training. To address these challenges, we introduce RLinf-VLA, a unified and efficient framework for scalable RL training of VLA models. RLinf-VLA achieves unification by providing a unified interface that standardizes the integration of diverse VLA architectures, multiple RL algorithms, and heterogeneous simulators, enabling extensibility. To ensure efficiency, the system adopts a flexible resource allocation architecture for rendering, inference, and training workloads in RL pipelines. In particular, for GPU-parallelized simulators, RLinf-VLA introduces a hybrid fine-grained pipeline allocation strategy, yielding a 1.61x-1.88x training speedup. Using this unified system, models trained with RLinf-VLA demonstrate consistent performance improvements of approximately 20-85% across multiple simulation benchmarks, including LIBERO, ManiSkill, and RoboTwin. Furthermore, we distill a set of training practices for effective RL-based VLA training. We position RLinf-VLA as a foundational system to enable efficient, unified, and reproducible research in embodied intelligence.
comment: This is the technical report of the RLinf Team, focusing on the algorithm side. For the system-level design, please refer to arXiv:2509.15965 . The open-sourced code link: https://github.com/RLinf/RLinf
StreamVLA: Breaking the Reason-Act Cycle via Completion-State Gating
Long-horizon robotic manipulation requires bridging the gap between high-level planning (System 2) and low-level control (System 1). Current Vision-Language-Action (VLA) models often entangle these processes, performing redundant multimodal reasoning at every timestep, which leads to high latency and goal instability. To address this, we present StreamVLA, a dual-system architecture that unifies textual task decomposition, visual goal imagination, and continuous action generation within a single parameter-efficient backbone. We introduce a "Lock-and-Gated" mechanism to intelligently modulate computation: only when a sub-task transition is detected, the model triggers slow thinking to generate a textual instruction and imagines the specific visual completion state, rather than generic future frames. Crucially, this completion state serves as a time-invariant goal anchor, making the policy robust to execution speed variations. During steady execution, these high-level intents are locked to condition a Flow Matching action head, allowing the model to bypass expensive autoregressive decoding for 72% of timesteps. This hierarchical abstraction ensures sub-goal focus while significantly reducing inference latency. Extensive evaluations demonstrate that StreamVLA achieves state-of-the-art performance, with a 98.5% success rate on the LIBERO benchmark and robust recovery in real-world interference scenarios, achieving a 48% reduction in latency compared to full-reasoning baselines.
SPICE-HL3: Single-Photon, Inertial, and Stereo Camera dataset for Exploration of High-Latitude Lunar Landscapes
Exploring high-latitude lunar regions presents an extremely challenging visual environment for robots. The low sunlight elevation angle and minimal light scattering result in a visual field dominated by a high dynamic range featuring long, dynamic shadows. Reproducing these conditions on Earth requires sophisticated simulators and specialized facilities. We introduce a unique dataset recorded at the LunaLab from the SnT - University of Luxembourg, an indoor test facility designed to replicate the optical characteristics of multiple lunar latitudes. Our dataset includes images, inertial measurements, and wheel odometry data from robots navigating seven distinct trajectories under multiple illumination scenarios, simulating high-latitude lunar conditions from dawn to nighttime with and without the aid of headlights, resulting in 88 distinct sequences containing a total of 1.3M images. Data was captured using a stereo RGB-inertial sensor, a monocular monochrome camera, and for the first time, a novel single-photon avalanche diode (SPAD) camera. We recorded both static and dynamic image sequences, with robots navigating at slow (5 cm/s) and fast (50 cm/s) speeds. All data is calibrated, synchronized, and timestamped, providing a valuable resource for validating perception tasks from vision-based autonomous navigation to scientific imaging for future lunar missions targeting high-latitude regions or those intended for robots operating across perceptually degraded environments. The dataset and all supplementary material can be accessed from and found at https://github.com/spaceuma/spice-hl3.
comment: 12 pages, 7 figures, dataset
You Only Pose Once: A Minimalist's Detection Transformer for Monocular RGB Category-level 9D Multi-Object Pose Estimation ICRA 2026
Accurately recovering the full 9-DoF pose of unseen instances within specific categories from a single RGB image remains a core challenge for robotics and automation. Most existing solutions still rely on pseudo-depth, CAD models, or multi-stage cascades that separate 2D detection from pose estimation. Motivated by the need for a simpler, RGB-only alternative that learns directly at the category level, we revisit a longstanding question: Can object detection and 9-DoF pose estimation be unified with high performance, without any additional data? We show that they can with our method, YOPO, a single-stage, query-based framework that treats category-level 9-DoF estimation as a natural extension of 2D detection. YOPO augments a transformer detector with a lightweight pose head, a bounding-box-conditioned translation module, and a 6D-aware Hungarian matching cost. The model is trained end-to-end only with RGB images and category-level pose labels. Despite its minimalist design, YOPO sets a new state of the art on three benchmarks. On the REAL275 dataset, it achieves 79.6% $\rm{IoU}_{50}$ and 54.1% under the $10^\circ$$10{\rm{cm}}$ metric, surpassing prior RGB-only methods and closing much of the gap to RGB-D systems. The code, models, and additional qualitative results can be found on https://mikigom.github.io/YOPO-project-page.
comment: This paper has been accepted by IEEE ICRA 2026
Collaborative-Online-Learning-Enabled Distributionally Robust Motion Control for Multi-Robot Systems
This paper develops a novel COllaborative-Online-Learning (COOL)-enabled motion control framework for multi-robot systems to avoid collision amid randomly moving obstacles whose motion distributions are partially observable through decentralized data streams. To address the notable challenge of data acquisition due to occlusion, a COOL approach based on the Dirichlet process mixture model is proposed to efficiently extract motion distribution information by exchanging among robots selected learning structures. By leveraging the fine-grained local-moment information learned through COOL, a data-stream-driven ambiguity set for obstacle motion is constructed. We then introduce a novel ambiguity set propagation method, which theoretically admits the derivation of the ambiguity sets for obstacle positions over the entire prediction horizon by utilizing obstacle current positions and the ambiguity set for obstacle motion. Additionally, we develop a compression scheme with its safety guarantee to automatically adjust the complexity and granularity of the ambiguity set by aggregating basic ambiguity sets that are close in a measure space, thereby striking an attractive trade-off between control performance and computation time. Then the probabilistic collision-free trajectories are generated through distributionally robust optimization problems. The distributionally robust obstacle avoidance constraints based on the compressed ambiguity set are equivalently reformulated by deriving separating hyperplanes through tractable semi-definite programming. Finally, we establish the probabilistic collision avoidance guarantee and the long-term tracking performance guarantee for the proposed framework. The numerical simulations are used to demonstrate the efficacy and superiority of the proposed approach compared with state-of-the-art methods.
comment: 26 pages
RoboPaint: From Human Demonstration to Any Robot and Any View
Acquiring large-scale, high-fidelity robot demonstration data remains a critical bottleneck for scaling Vision-Language-Action (VLA) models in dexterous manipulation. We propose a Real-Sim-Real data collection and data editing pipeline that transforms human demonstrations into robot-executable, environment-specific training data without direct robot teleoperation. Standardized data collection rooms are built to capture multimodal human demonstrations (synchronized 3 RGB-D videos, 11 RGB videos, 29-DoF glove joint angles, and 14-channel tactile signals). Based on these human demonstrations, we introduce a tactile-aware retargeting method that maps human hand states to robot dex-hand states via geometry and force-guided optimization. Then the retargeted robot trajectories are rendered in a photorealistic Isaac Sim environment to build robot training data. Real world experiments have demonstrated: (1) The retargeted dex-hand trajectories achieve an 84\% success rate across 10 diverse object manipulation tasks. (2) VLA policies (Pi0.5) trained exclusively on our generated data achieve 80\% average success rate on three representative tasks, i.e., pick-and-place, pushing and pouring. To conclude, robot training data can be efficiently "painted" from human demonstrations using our real-sim-real data pipeline. We offer a scalable, cost-effective alternative to teleoperation with minimal performance loss for complex dexterous manipulation.
comment: 17 pages
CTBC: Contact-Triggered Blind Climbing for Wheeled Bipedal Robots with Instruction Learning and Reinforcement Learning
In recent years, wheeled bipedal robots have garnered significant attention due to their exceptional mobility on flat terrain. However, while stair climbing has been achieved in prior studies, these existing methods often suffer from a severe lack of versatility, making them difficult to adapt to varying hardware specifications or diverse complex terrains. To overcome these limitations, we propose a generalized Contact-Triggered Blind Climbing (CTBC) framework. Upon detecting wheel-obstacle contact, the framework triggers a leg-lifting motion integrated with a strongly-guided feedforward trajectory. This allows the robot to rapidly acquire agile climbing skills, significantly enhancing its capability to traverse unstructured environments. Distinct from previous approaches, CTBC demonstrates superior robustness and adaptability, having been validated across multiple wheeled bipedal platforms with different wheel radii and tire materials. Real-world experiments demonstrate that, relying solely on proprioceptive feedback, the proposed framework enables robots to achieve reliable and continuous climbing over obstacles well beyond their wheel radius.
A Software-Only Post-Processor for Indexed Rotary Machining on GRBL-Based CNCs
Affordable desktop CNC routers are common in education, prototyping, and makerspaces, but most lack a rotary axis, limiting fabrication of rotationally symmetric or multi-sided parts. Existing solutions often require hardware retrofits, alternative controllers, or commercial CAM software, raising cost and complexity. This work presents a software-only framework for indexed rotary machining on GRBL-based CNCs. A custom post-processor converts planar toolpaths into discrete rotary steps, executed through a browser-based interface. While not equivalent to continuous 4-axis machining, the method enables practical rotary-axis fabrication using only standard, off-the-shelf mechanics, without firmware modification. By reducing technical and financial barriers, the framework expands access to multi-axis machining in classrooms, makerspaces, and small workshops, supporting hands-on learning and rapid prototyping.
comment: 13 pages, 10 figures, Python post-processor source code and web interface included
GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding
Large robot fleets are now common in warehouses and other logistics settings, where small control gains translate into large operational impacts. In this article, we address task scheduling for lifelong Multi-Agent Pickup-and-Delivery (MAPD) and propose a hybrid method that couples learning-based global guidance with lightweight optimization. A graph neural network policy trained via reinforcement learning outputs a desired distribution of free agents over an aggregated warehouse graph. This signal is converted into region-to-region rebalancing through a minimum-cost flow, and finalized by small, local assignment problems, preserving accuracy while keeping per-step latency within a 1 s compute budget. On congested warehouse benchmarks from the League of Robot Runners (LoRR) with up to 500 agents, our approach improves throughput by up to 10% over the 2024 winning scheduler while maintaining real-time execution. The results indicate that coupling graph-structured learned guidance with tractable solvers reduces congestion and yields a practical, scalable blueprint for high-throughput scheduling in large fleets.
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either as the worst case models may be different for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function--to satisfy the constraints; on the other hand, when all the constraints are satisfied, it can simply maximize the robust reward value function. We prove that such an algorithm finds a policy with at most $ε$ sub-optimality and feasible policy after $O(ε^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search, thus, we reduce the computation time by at least 4x for smaller value of discount factor ($γ$) and by at least 6x for larger value of $γ$.
Multiagent Systems
Progressive Multi-Agent Reasoning for Biological Perturbation Prediction
Predicting gene regulation responses to biological perturbations requires reasoning about underlying biological causalities. While large language models (LLMs) show promise for such tasks, they are often overwhelmed by the entangled nature of high-dimensional perturbation results. Moreover, recent works have primarily focused on genetic perturbations in single-cell experiments, leaving bulk-cell chemical perturbations, which is central to drug discovery, largely unexplored. Motivated by this, we present LINCSQA, a novel benchmark for predicting target gene regulation under complex chemical perturbations in bulk-cell environments. We further propose PBio-Agent, a multi-agent framework that integrates difficulty-aware task sequencing with iterative knowledge refinement. Our key insight is that genes affected by the same perturbation share causal structure, allowing confidently predicted genes to contextualize more challenging cases. The framework employs specialized agents enriched with biological knowledge graphs, while a synthesis agent integrates outputs and specialized judges ensure logical coherence. PBio-Agent outperforms existing baselines on both LINCSQA and PerturbQA, enabling even smaller models to predict and explain complex biological processes without additional training.
comment: 17 pages, 4 figures, 9 tables
NAAMSE: Framework for Evolutionary Security Evaluation of Agents
AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that fail to model adaptive, multi-turn adversaries. We propose NAAMSE, an evolutionary framework that reframes agent security evaluation as a feedback-driven optimization problem. Our system employs a single autonomous agent that orchestrates a lifecycle of genetic prompt mutation, hierarchical corpus exploration, and asymmetric behavioral scoring. By using model responses as a fitness signal, the framework iteratively compounds effective attack strategies while simultaneously ensuring "benign-use correctness", preventing the degenerate security of blanket refusal. Our experiments on Gemini 2.5 Flash demonstrate that evolutionary mutation systematically amplifies vulnerabilities missed by one-shot methods, with controlled ablations revealing that the synergy between exploration and targeted mutation uncovers high-severity failure modes. We show that this adaptive approach provides a more realistic and scalable assessment of agent robustness in the face of evolving threats. The code for NAAMSE is open source and available at https://github.com/HASHIRU-AI/NAAMSE.
Aegis: Towards Governance, Integrity, and Security of AI Voice Agents
With the rapid advancement and adoption of Audio Large Language Models (ALLMs), voice agents are now being deployed in high-stakes domains such as banking, customer service, and IT support. However, their vulnerabilities to adversarial misuse still remain unexplored. While prior work has examined aspects of trustworthiness in ALLMs, such as harmful content generation and hallucination, systematic security evaluations of voice agents are still lacking. To address this gap, we propose Aegis, a red-teaming framework for the governance, integrity, and security of voice agents. Aegis models the realistic deployment pipeline of voice agents and designs structured adversarial scenarios of critical risks, including privacy leakage, privilege escalation, resource abuse, etc. We evaluate the framework through case studies in banking call centers, IT Support, and logistics. Our evaluation shows that while access controls mitigate data-level risks, voice agents remain vulnerable to behavioral attacks that cannot be addressed through access restrictions alone, even under strict access controls. We observe systematic differences across model families, with open-weight models exhibiting higher susceptibility, underscoring the need for layered defenses that combine access control, policy enforcement, and behavioral monitoring to secure next-generation voice agents.
StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models
Human writers often begin their stories with an overarching mental scene, where they envision the interactions between characters and their environment. Inspired by this creative process, we propose a novel approach to long-form story generation, termed hybrid bottom-up long-form story generation, using multi-agent simulations. In our method, agents interact within a dynamic sandbox environment, where their behaviors and interactions with one another and the environment generate emergent events. These events form the foundation for the story, enabling organic character development and plot progression. Unlike traditional top-down approaches that impose rigid structures, our hybrid bottom-up approach allows for the natural unfolding of events, fostering more spontaneous and engaging storytelling. The system is capable of generating stories exceeding 10,000 words while maintaining coherence and consistency, addressing some of the key challenges faced by current story generation models. We achieve state-of-the-art performance across several metrics. This approach offers a scalable and innovative solution for creating dynamic, immersive long-form stories that evolve organically from agent-driven interactions.
comment: Project: https://storyboxproject.github.io
Bidirectional human-AI collaboration in brain tumour assessments improves both expert human and AI agent performance
The benefits of artificial intelligence (AI) human partnerships-evaluating how AI agents enhance expert human performance-are increasingly studied. Though rarely evaluated in healthcare, an inverse approach is possible: AI benefiting from the support of an expert human agent. Here, we investigate both human-AI clinical partnership paradigms in the magnetic resonance imaging-guided characterisation of patients with brain tumours. We reveal that human-AI partnerships improve accuracy and metacognitive ability not only for radiologists supported by AI, but also for AI agents supported by radiologists. Moreover, the greatest patient benefit was evident with an AI agent supported by a human one. Synergistic improvements in agent accuracy, metacognitive performance, and inter-rater agreement suggest that AI can create more capable, confident, and consistent clinical agents, whether human or model-based. Our work suggests that the maximal value of AI in healthcare could emerge not from replacing human intelligence, but from AI agents that routinely leverage and amplify it.
comment: 38 pages, 6 figures, 7 supplementary figures
MAFE: Enabling Equitable Algorithm Design in Multi-Agent Multi-Stage Decision-Making Systems
Algorithmic fairness is often studied in static or single-agent settings, yet many real-world decision-making systems involve multiple interacting entities whose multi-stage actions jointly influence long-term outcomes. Existing fairness methods applied at isolated decision points frequently fail to mitigate disparities that accumulate over time. Although recent work has modeled fairness as a sequential decision-making problem, it typically assumes centralized agents or simplified dynamics, limiting its applicability to complex social systems. We introduce MAFE, a suite of Multi-Agent Fair Environments designed to simulate realistic, modular, and dynamic systems in which fairness emerges from the interplay of multiple agents. We demonstrate MAFEs across three domains -- loan processing, healthcare, and higher education -- that support heterogeneous agents, configurable interventions, and fairness metrics. The environments are open-source and compatible with standard multi-agent reinforcement learning (MARL) libraries, enabling reproducible evaluation of fairness-aware policies. Through extensive experiments on cooperative use cases, we demonstrate how MAFE facilitates the design of equitable multi-agent algorithms and reveals critical trade-offs between fairness, performance, and coordination. MAFE provides a foundation for systematic progress in dynamic, multi-agent fairness research.
GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding
Large robot fleets are now common in warehouses and other logistics settings, where small control gains translate into large operational impacts. In this article, we address task scheduling for lifelong Multi-Agent Pickup-and-Delivery (MAPD) and propose a hybrid method that couples learning-based global guidance with lightweight optimization. A graph neural network policy trained via reinforcement learning outputs a desired distribution of free agents over an aggregated warehouse graph. This signal is converted into region-to-region rebalancing through a minimum-cost flow, and finalized by small, local assignment problems, preserving accuracy while keeping per-step latency within a 1 s compute budget. On congested warehouse benchmarks from the League of Robot Runners (LoRR) with up to 500 agents, our approach improves throughput by up to 10% over the 2024 winning scheduler while maintaining real-time execution. The results indicate that coupling graph-structured learned guidance with tractable solvers reduces congestion and yields a practical, scalable blueprint for high-throughput scheduling in large fleets.
Systems and Control (EESS)
Quantifying resilience for distribution system customers with SALEDI
The impact of routine smaller outages on distribution system customers in terms of customer minutes interrupted can be tracked using conventional reliability indices. However, the customer minutes interrupted in large blackout events are extremely variable, and this makes it difficult to quantify the customer impact of these extreme events with resilience metrics. We solve this problem with the System Average Large Event Duration Index SALEDI that logarithmically transforms the customer minutes interrupted. We explain how this new resilience metric works, compare it with alternatives, quantify its statistical accuracy, and illustrate its practical use with standard outage data from five utilities.
Affine Transformable Unmanned Ground Vehicle
This paper develops the proof of concept for a novel affine transformable unmanned ground vehicle (ATUGV) with the capability of safe and aggressive deformation while carrying multiple payloads. The ATUGV is a multi-body system with mobile robots that can be used to power the ATUGV morphable motion, powered cells to enclose the mobile robots, unpowered cells to contain payloads, and a deformable structure to integrate cells through bars and joints. The objective is that all powered and unpowered cells motion can safely track a desired affine transformation, where an affine transformation can be decomposed into translation, rigid body rotation, and deformation. To this end, the paper first uses a deep neural network to structure cell interconnection in such a way that every cell can freely move over the deformation plane, and the entire structure can reconfigurably deform to track a desired affine transformation. Then, the mobile robots, contained by the powered cells and stepper motors, regulating the connections of the powered and unpowered cells, design the proper controls so that all cells safely track the desired affine transformation. The functionality of the proposed ATUGV is validated through hardware experimentation and simulation.
$\partial$CBDs: Differentiable Causal Block Diagrams
Modern cyber-physical systems (CPS) integrate physics, computation, and learning, demanding modeling frameworks that are simultaneously composable, learnable, and verifiable. Yet existing approaches treat these goals in isolation: causal block diagrams (CBDs) support modular system interconnections but lack differentiability for learning; differentiable programming (DP) enables end-to-end gradient-based optimization but provides limited correctness guarantees; while contract-based verification frameworks remain largely disconnected from data-driven model refinement. To address these limitations, we introduce differentiable causal block diagrams ($\partial$CBDs), a unifying formalism that integrates these three perspectives. Our approach (i) retains the compositional structure and execution semantics of CBDs, (ii) incorporates assume--guarantee (A--G) contracts for modular correctness reasoning, and (iii) introduces residual-based contracts as differentiable, trajectory-level certificates compatible with automatic differentiation (AD), enabling gradient-based optimization and learning. Together, these elements enable a scalable, verifiable, and trainable modeling pipeline that preserves causality and modularity while supporting data-, physics-, and constraint-informed optimization for CPS.
NOMA-Assisted Multi-BS MEC Networks for Delay-Sensitive and Computation-Intensive IoT Applications
The burgeoning and ubiquitous deployment of the Internet of Things (IoT) landscape struggles with ultra-low latency demands for computation-intensive tasks in massive connectivity scenarios. In this paper, we propose an innovative uplink non-orthogonal multiple access (NOMA)-assisted multi-base station (BS) mobile edge computing (BS-MEC) network tailored for massive IoT connectivity. To fulfill the quality-of-service (QoS) requirements of delay-sensitive and computation-intensive IoT applications, we formulate a joint task offloading, user grouping, and power allocation optimization problem with the overarching objective of minimizing the system's total delay, aiming to address issues of unbalanced subchannel access, inter-group interference, computational load disparities, and device heterogeneity. To effectively tackle this problem, we first reformulate task offloading and user grouping into a non-cooperative game model and propose an exact potential game-based joint decision-making (EPG-JDM) algorithm, which dynamically selects optimal task offloading and subchannel access decisions for each IoT device based on its channel conditions, thereby achieving the Nash Equilibrium. Then, we propose a majorization-minimization (MM)-based power allocation algorithm, which transforms the original subproblem into a tractable convex optimization paradigm. Extensive simulation experiments demonstrate that our proposed EPG-JDM algorithm significantly outperforms state-of-the-art decision-making algorithms and classic heuristic algorithms, yielding performance improvements of up to 19.3% and 14.7% in terms of total delay and power consumption, respectively.
comment: 16 pages, 8 Figures, submitted to IEEE journal for potential publication
In-Context System Identification for Nonlinear Dynamics Using Large Language Models
Sparse Identification of Nonlinear Dynamics (SINDy) is a powerful method for discovering parsimonious governing equations from data, but it often requires expert tuning of candidate libraries. We propose an LLM-aided SINDy pipeline that iteratively refines candidate equations using a large language model (LLM) in the loop through in-context learning. The pipeline begins with a baseline SINDy model fit using an adaptive library and then enters a LLM-guided refinement cycle. At each iteration, the current best equations, error metrics, and domain-specific constraints are summarized in a prompt to the LLM, which suggests new equation structures. These candidate equations are parsed against a defined symbolic form and evaluated on training and test data. The pipeline uses simulation-based error as a primary metric, but also assesses structural similarity to ground truth, including matching functional forms, key terms, couplings, qualitative behavior. An iterative stopping criterion ends refinement early if test error falls below a threshold (NRMSE < 0.1) or if a maximum of 10 iterations is reached. Finally, the best model is selected, and we evaluate this LLM-aided SINDy on 63 dynamical system datasets (ODEBench) and march leuba model for boiling nuclear reactor. The results are compared against classical SINDy and show the LLM-loop consistently improves symbolic recovery with higher equation similarity to ground truth and lower test RMSE than baseline SINDy for cases with complex dynamics. This work demonstrates that an LLM can effectively guide SINDy's search through equation space, integrating data-driven error feedback with domain-inspired symbolic reasoning to discover governing equations that are not only accurate but also structurally interpretable.
comment: 6 pages, 5 figures, submitted to The 10th IEEE Conference on Control Technology and Applications (CCTA) 2026
Meta-Reinforcement Learning for Robust and Non-greedy Control Barrier Functions in Spacecraft Proximity Operations
Autonomous spacecraft inspection and docking missions require controllers that can guarantee safety under thrust constraints and uncertainty. Input-constrained control barrier functions (ICCBFs) provide a framework for safety certification under bounded actuation; however, conventional ICCBF formulations can be overly conservative and exhibit limited robustness to uncertainty, leading to high fuel consumption and reduced mission feasibility. This paper proposes a framework in which the full hierarchy of class-$\mathcal{K}$ functions defining the ICCBF recursion is parameterized and learned, enabling localized shaping of the safe set and reduced conservatism. A control margin is computed efficiently using differential algebra to enable the learned continuous-time ICCBFs to be implemented on time-sampled dynamical systems typical of spacecraft proximity operations. A meta-reinforcement learning scheme is developed to train a policy that generates ICCBF parameters over a distribution of hidden physical parameters and uncertainties, using both multilayer perceptron (MLP) and recurrent neural network (RNN) architectures. Simulation results on cruise control, spacecraft inspection, and docking scenarios demonstrate that the proposed approach maintains safety while reducing fuel consumption and improving feasibility relative to fixed class-$\mathcal{K}$ ICCBFs, with the RNN showing a particularly strong advantage in the more complex inspection case.
Why Look at It at All?: Vision-Free Multifingered Blind Grasping Using Uniaxial Fingertip Force Sensing
Grasping under limited sensing remains a fundamental challenge for real-world robotic manipulation, as vision and high-resolution tactile sensors often introduce cost, fragility, and integration complexity. This work demonstrates that reliable multifingered grasping can be achieved under extremely minimal sensing by relying solely on uniaxial fingertip force feedback and joint proprioception, without vision or multi-axis/tactile sensing. To enable such blind grasping, we employ an efficient teacher-student training pipeline in which a reinforcement-learned teacher exploits privileged simulation-only observations to generate demonstrations for distilling a transformer-based student policy operating under partial observation. The student policy is trained to act using only sensing modalities available at real-world deployment. We validate the proposed approach on real hardware across 18 objects, including both in-distribution and out-of-distribution cases, achieving a 98.3~$\%$ overall grasp success rate. These results demonstrate strong robustness and generalization beyond the simulation training distribution, while significantly reducing sensing requirements for real-world grasping systems.
comment: Submitted to Journal (under review)
Distributed Omniscient Observers for Multi-Agent Systems
This article proposes fully distributed omniscient observers for both heterogeneous and homogeneous linear multi-agent systems, through which each agent can estimate the states of all agents. The proposed observers not only contribute to distributed Nash equilibrium seeking in multi-player games, but also provide a designable self-organization mechanism for artificial swarms to emulate biological social behaviors, including sheepdog herding and honeybee dance communication.
Optimizing Chlorination in Water Distribution Systems via Surrogate-assisted Neuroevolution GECCO '26
Ensuring the microbiological safety of large, heterogeneous water distribution systems (WDS) typically requires managing appropriate levels of disinfectant residuals including chlorine. WDS include complex fluid interactions that are nonlinear and noisy, making such maintenance a challenging problem for traditional control algorithms. This paper proposes an evolutionary framework to this problem based on neuroevolution, multi-objective optimization, and surrogate modeling. Neural networks were evolved with NEAT to inject chlorine at strategic locations in the distribution network at select times. NSGA-II was employed to optimize four objectives: minimizing the total amount of chlorine injected, keeping chlorine concentrations homogeneous across the network, ensuring that maximum concentrations did not exceed safe bounds, and distributing the injections regularly over time. Each network was evaluated against a surrogate model, i.e. a neural network trained to emulate EPANET, an industry-level hydraulic WDS simulator that is accurate but infeasible in terms of computational cost to support machine learning. The evolved controllers produced a diverse range of Pareto-optimal policies that could be implemented in practice, outperforming standard reinforcement learning methods such as PPO. The results thus suggest a pathway toward improving urban water systems, and highlight the potential of using evolution with surrogate modeling to optimize complex real-world systems.
comment: 14 pages, 9 figures, GECCO '26 in-review
The Dawn of Agentic EDA: A Survey of Autonomous Digital Chip Design
The semiconductor industry faces a critical "Productivity Gap" where design complexity outpaces human capacity. While the "AI for EDA" revolution (L2) successfully optimized specific point problems, a paradigm shift toward Agentic EDA (L3) is emerging, evolving from passive prediction to autonomous orchestration of the RTL-to-GDSII flow. This survey presents the first systematic framework for this transition, framing Agentic EDA not merely as "Chat with Tools," but as a Constrained Neuro-Symbolic Optimization problem. We propose a novel taxonomy rooted in a Cognitive Stack--Perception (aligning multimodal semantics), Cognition (planning under strict constraints), and Action (deterministic tool execution)--to dissect how probabilistic agents navigate zero-tolerance physical laws. Through this lens, we analyze the landscape: (1) in Frontend, the shift from one-shot generation to dual-loop syntactic-semantic repair; (2) in Backend, the dichotomy between algorithm-centric solvers and agent-centric orchestrators that treat executable code as a latent space. Finally, we critically examine the Trustworthiness gap, advocating for Sim-to-Silicon benchmarks and formal grounding to transform brittle prototypes into resilient engineering systems.
Graphon Mean-Field Logit Dynamic: Derivation, Computation, and Applications
We present a graphon mean-field logit dynamic, a stationary mean-field game based on logit interactions. This dynamic emerges from a stochastic control problem involving a continuum of nonexchangeable and interacting agents and reduces to solving a continuum of Hamilton-Jacobi-Bellman (HJB) equations connected through a graphon that models the connections among agents. Using a fixed-point argument, we prove that this HJB system admits a unique solution in the space of bounded functions when the discount rate is high (i.e., agents are myopic). Under certain assumptions, we also establish regularity properties of the system, such as equi-continuity. We propose a finite difference scheme for computing the HJB system and prove the uniqueness and existence of its numerical solutions. The mean-field logit dynamic is applied to a case study on inland fisheries resource management in the upper Tedori River of Japan. A series of computational cases are then conducted to investigate the dependence of the dynamic on both the discount rate and graphon.
comment: Updaed on February 7, 2026
Current and temperature imbalances in parallel-connected grid storage battery modules
A key challenge with large battery systems is heterogeneous currents and temperatures in modules with parallel-connected cells. Although extreme currents and temperatures are detrimental to the performance and lifetime of battery cells, there is not a consensus on the scale of typical imbalances within grid storage modules. Here, we quantify these imbalances through simulations and experiments on an industrially representative grid storage battery module consisting of prismatic lithium iron phosphate cells, elucidating the evolution of current and temperature imbalances and their dependence on individual cell and module parameter variations. Using a sensitivity analysis, we find that varying contact resistances and cell resistances contribute strongly to temperature differences between cells, from which we define safety thresholds on cell-to-cell variability. Finally, we investigate how these thresholds change for different applications, to outline a set of robustness metrics that show how cycling at lower C-rates and narrower SOC ranges can mitigate failures.
Degradation-Aware Frequency Regulation of a Heterogeneous Battery Fleet via Reinforcement Learning
Battery energy storage systems are increasingly deployed as fast-responding resources for grid balancing services such as frequency regulation and for mitigating renewable generation uncertainty. However, repeated charging and discharging induces cycling degradation and reduces battery lifetime. This paper studies the real-time scheduling of a heterogeneous battery fleet that collectively tracks a stochastic balancing signal subject to per-battery ramp-rate and capacity constraints, while minimizing long-term cycling degradation. Cycling degradation is fundamentally path-dependent: it is determined by charge-discharge cycles formed by the state-of-charge (SoC) trajectory and is commonly quantified via rainflow cycle counting. This non-Markovian structure makes it difficult to express degradation as an additive per-time-step cost, complicating classical dynamic programming approaches. We address this challenge by formulating the fleet scheduling problem as a Markov decision process (MDP) with constrained action space and designing a dense proxy reward that provides informative feedback at each time step while remaining aligned with long-term cycle-depth reduction. To scale learning to large state-action spaces induced by fine-grained SoC discretization and asymmetric per-battery constraints, we develop a function-approximation reinforcement learning method using an Extreme Learning Machine (ELM) as a random nonlinear feature map combined with linear temporal-difference learning. We evaluate the proposed approach on a toy Markovian signal model and on a Markovian model trained from real-world regulation signal traces obtained from the University of Delaware, and demonstrate consistent reductions in cycle-depth occurrence and degradation metrics compared to baseline scheduling policies.
comment: 11 pages, 2 figures
Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems
We study the joint operation and sizing of cooling infrastructure for commercial HVAC systems using reinforcement learning, with the objective of minimizing life-cycle cost over a 30-year horizon. The cooling system consists of a fixed-capacity electric chiller and a thermal energy storage (TES) unit, jointly operated to meet stochastic hourly cooling demands under time-varying electricity prices. The life-cycle cost accounts for both capital expenditure and discounted operating cost, including electricity consumption and maintenance. A key challenge arises from the strong asymmetry in capital costs: increasing chiller capacity by one unit is far more expensive than an equivalent increase in TES capacity. As a result, identifying the right combination of chiller and TES sizes, while ensuring zero loss-of-cooling-load under optimal operation, is a non-trivial co-design problem. To address this, we formulate the chiller operation problem for a fixed infrastructure configuration as a finite-horizon Markov Decision Process (MDP), in which the control action is the chiller part-load ratio (PLR). The MDP is solved using a Deep Q Network (DQN) with a constrained action space. The learned DQN RL policy minimizes electricity cost over historical traces of cooling demand and electricity prices. For each candidate chiller-TES sizing configuration, the trained policy is evaluated. We then restrict attention to configurations that fully satisfy the cooling demand and perform a life-cycle cost minimization over this feasible set to identify the cost-optimal infrastructure design. Using this approach, we determine the optimal chiller and thermal energy storage capacities to be 700 and 1500, respectively.
comment: 11 pages, 3 figures
Novel Stability Criteria for Discrete and Hybrid Systems via Ramanujan Inner Products
This paper introduces a Ramanujan inner product and its corresponding norm, establishing a novel framework for the stability analysis of hybrid and discrete-time systems as an alternative to traditional Euclidean metrics. We establish new $ε$-$δ$ stability conditions that utilize the unique properties of Ramanujan summations and their relationship with number-theoretic concepts. The proposed approach provides enhanced robustness guarantees and reveals fundamental connections between system stability and arithmetic properties of the system dynamics. Theoretical results are rigorously proven, and simulation results on numerical examples are presented to validate the efficacy of the proposed approach.
comment: 7 pages, 2 figures
Predictability Enables Parallelization of Nonlinear State Space Models NeurIPS '25
The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances like DEER (arXiv:2309.12252) and DeepPCR (arXiv:2309.16318) recast sequential evaluation as a parallelizable optimization problem, sometimes yielding dramatic speedups. However, the factors governing the difficulty of these optimization problems remained unclear, limiting broader adoption. In this work, we establish a precise relationship between a system's dynamics and the conditioning of its corresponding optimization problem, as measured by its Polyak-Lojasiewicz (PL) constant. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior and quantified by the largest Lyapunov exponent (LLE), impacts the number of optimization steps required for evaluation. For predictable systems, the state trajectory can be computed in at worst $O((\log T)^2)$ time, where $T$ is the sequence length: a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, with the consequence that parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis shows that predictable systems always yield well-conditioned optimization problems, whereas unpredictable systems lead to severe conditioning degradation. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized. We highlight predictability as a key design principle for parallelizable models.
comment: NeurIPS '25. XG and LK dual lead authors. Code: https://github.com/lindermanlab/predictability_enables_parallelization
Collaborative-Online-Learning-Enabled Distributionally Robust Motion Control for Multi-Robot Systems
This paper develops a novel COllaborative-Online-Learning (COOL)-enabled motion control framework for multi-robot systems to avoid collision amid randomly moving obstacles whose motion distributions are partially observable through decentralized data streams. To address the notable challenge of data acquisition due to occlusion, a COOL approach based on the Dirichlet process mixture model is proposed to efficiently extract motion distribution information by exchanging among robots selected learning structures. By leveraging the fine-grained local-moment information learned through COOL, a data-stream-driven ambiguity set for obstacle motion is constructed. We then introduce a novel ambiguity set propagation method, which theoretically admits the derivation of the ambiguity sets for obstacle positions over the entire prediction horizon by utilizing obstacle current positions and the ambiguity set for obstacle motion. Additionally, we develop a compression scheme with its safety guarantee to automatically adjust the complexity and granularity of the ambiguity set by aggregating basic ambiguity sets that are close in a measure space, thereby striking an attractive trade-off between control performance and computation time. Then the probabilistic collision-free trajectories are generated through distributionally robust optimization problems. The distributionally robust obstacle avoidance constraints based on the compressed ambiguity set are equivalently reformulated by deriving separating hyperplanes through tractable semi-definite programming. Finally, we establish the probabilistic collision avoidance guarantee and the long-term tracking performance guarantee for the proposed framework. The numerical simulations are used to demonstrate the efficacy and superiority of the proposed approach compared with state-of-the-art methods.
comment: 26 pages
Data-Driven Safe Output Regulation of Strict-Feedback Linear Systems with Input Delay
This paper develops a data-driven safe control framework for linear systems possessing a known strict-feedback structure, but with most plant parameters, external disturbances, and input delay being unknown. By leveraging Koopman operator theory, we utilize Krylov dynamic mode decomposition (DMD) to extract the system dynamics from measured data, enabling the reconstruction of the system and disturbance matrices. Concurrently, the batch least-squares identification (BaLSI) method is employed to identify other unknown parameters in the input channel. Using control barrier functions (CBFs) and backstepping, we first develop a full-state safe controller. Based on this, we build an output-feedback controller by performing system identification using only the output data and actuation signals as well as constructing an observer to estimate the unmeasured plant states. The proposed approach achieves: 1) finite-time identification of a substantial set of unknown system quantities, and 2) exponential convergence of the output state (the state furthest from the control input) to a reference trajectory while rigorously ensuring safety constraints. The effectiveness of the proposed method is demonstrated through a safe vehicle platooning application.
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either as the worst case models may be different for the reward value function and the constraint value function. We propose a novel technique that effectively minimizes the constraint value function--to satisfy the constraints; on the other hand, when all the constraints are satisfied, it can simply maximize the robust reward value function. We prove that such an algorithm finds a policy with at most $ε$ sub-optimality and feasible policy after $O(ε^{-2})$ iterations. In contrast to the state-of-the-art method, we do not need to employ a binary search, thus, we reduce the computation time by at least 4x for smaller value of discount factor ($γ$) and by at least 6x for larger value of $γ$.
Robotics
DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
Being able to simulate the outcomes of actions in varied environments will revolutionize the development of generalist agents at scale. However, modeling these world dynamics, especially for dexterous robotics tasks, poses significant challenges due to limited data coverage and scarce action labels. As an endeavor towards this end, we introduce DreamDojo, a foundation world model that learns diverse interactions and dexterous controls from 44k hours of egocentric human videos. Our data mixture represents the largest video dataset to date for world model pretraining, spanning a wide range of daily scenarios with diverse objects and skills. To address the scarcity of action labels, we introduce continuous latent actions as unified proxy actions, enhancing interaction knowledge transfer from unlabeled videos. After post-training on small-scale target robot data, DreamDojo demonstrates a strong understanding of physics and precise action controllability. We also devise a distillation pipeline that accelerates DreamDojo to a real-time speed of 10.81 FPS and further improves context consistency. Our work enables several important applications based on generative world models, including live teleoperation, policy evaluation, and model-based planning. Systematic evaluation on multiple challenging out-of-distribution (OOD) benchmarks verifies the significance of our method for simulating open-world, contact-rich tasks, paving the way for general-purpose robot world models.
comment: Project page: https://dreamdojo-world.github.io/
Strategizing at Speed: A Learned Model Predictive Game for Multi-Agent Drone Racing
Autonomous drone racing pushes the boundaries of high-speed motion planning and multi-agent strategic decision-making. Success in this domain requires drones not only to navigate at their limits but also to anticipate and counteract competitors' actions. In this paper, we study a fundamental question that arises in this domain: how deeply should an agent strategize before taking an action? To this end, we compare two planning paradigms: the Model Predictive Game (MPG), which finds interaction-aware strategies at the expense of longer computation times, and contouring Model Predictive Control (MPC), which computes strategies rapidly but does not reason about interactions. We perform extensive experiments to study this trade-off, revealing that MPG outperforms MPC at moderate velocities but loses its advantage at higher speeds due to latency. To address this shortcoming, we propose a Learned Model Predictive Game (LMPG) approach that amortizes model predictive gameplay to reduce latency. In both simulation and hardware experiments, we benchmark our approach against MPG and MPC in head-to-head races, finding that LMPG outperforms both baselines.
Consensus-based optimization (CBO): Towards Global Optimality in Robotics
Zero-order optimization has recently received significant attention for designing optimal trajectories and policies for robotic systems. However, most existing methods (e.g., MPPI, CEM, and CMA-ES) are local in nature, as they rely on gradient estimation. In this paper, we introduce consensus-based optimization (CBO) to robotics, which is guaranteed to converge to a global optimum under mild assumptions. We provide theoretical analysis and illustrative examples that give intuition into the fundamental differences between CBO and existing methods. To demonstrate the scalability of CBO for robotics problems, we consider three challenging trajectory optimization scenarios: (1) a long-horizon problem for a simple system, (2) a dynamic balance problem for a highly underactuated system, and (3) a high-dimensional problem with only a terminal cost. Our results show that CBO is able to achieve lower costs with respect to existing methods on all three challenging settings. This opens a new framework to study global trajectory optimization in robotics.
SURE: Safe Uncertainty-Aware Robot-Environment Interaction using Trajectory Optimization
Robotic tasks involving contact interactions pose significant challenges for trajectory optimization due to discontinuous dynamics. Conventional formulations typically assume deterministic contact events, which limit robustness and adaptability in real-world settings. In this work, we propose SURE, a robust trajectory optimization framework that explicitly accounts for contact timing uncertainty. By allowing multiple trajectories to branch from possible pre-impact states and later rejoin a shared trajectory, SURE achieves both robustness and computational efficiency within a unified optimization framework. We evaluate SURE on two representative tasks with unknown impact times. In a cart-pole balancing task involving uncertain wall location, SURE achieves an average improvement of 21.6% in success rate when branch switching is enabled during control. In an egg-catching experiment using a robotic manipulator, SURE improves the success rate by 40%. These results demonstrate that SURE substantially enhances robustness compared to conventional nominal formulations.
Perception-Control Coupled Visual Servoing for Textureless Objects Using Keypoint-Based EKF
Visual servoing is fundamental to robotic applications, enabling precise positioning and control. However, applying it to textureless objects remains a challenge due to the absence of reliable visual features. Moreover, adverse visual conditions, such as occlusions, often corrupt visual feedback, leading to reduced accuracy and instability in visual servoing. In this work, we build upon learning-based keypoint detection for textureless objects and propose a method that enhances robustness by tightly integrating perception and control in a closed loop. Specifically, we employ an Extended Kalman Filter (EKF) that integrates per-frame keypoint measurements to estimate 6D object pose, which drives pose-based visual servoing (PBVS) for control. The resulting camera motion, in turn, enhances the tracking of subsequent keypoints, effectively closing the perception-control loop. Additionally, unlike standard PBVS, we propose a probabilistic control law that computes both camera velocity and its associated uncertainty, enabling uncertainty-aware control for safe and reliable operation. We validate our approach on real-world robotic platforms using quantitative metrics and grasping experiments, demonstrating that our method outperforms traditional visual servoing techniques in both accuracy and practical application.
DynaRetarget: Dynamically-Feasible Retargeting using Sampling-Based Trajectory Optimization
In this paper, we introduce DynaRetarget, a complete pipeline for retargeting human motions to humanoid control policies. The core component of DynaRetarget is a novel Sampling-Based Trajectory Optimization (SBTO) framework that refines imperfect kinematic trajectories into dynamically feasible motions. SBTO incrementally advances the optimization horizon, enabling optimization over the entire trajectory for long-horizon tasks. We validate DynaRetarget by successfully retargeting hundreds of humanoid-object demonstrations and achieving higher success rates than the state of the art. The framework also generalizes across varying object properties, such as mass, size, and geometry, using the same tracking objective. This ability to robustly retarget diverse demonstrations opens the door to generating large-scale synthetic datasets of humanoid loco-manipulation trajectories, addressing a major bottleneck in real-world data collection.
A 26-Gram Butterfly-Inspired Robot Achieving Autonomous Tailless Flight
Flapping-wing micro air vehicles (FWMAVs) have demonstrated remarkable bio-inspired agility, yet tailless two-winged configurations remain largely unexplored due to their complex fluid-structure and wing-body coupling. Here we present \textit{AirPulse}, a 26-gram butterfly-inspired FWMAV that achieves fully onboard, closed-loop, untethered flight without auxiliary control surfaces. The AirPulse robot replicates key biomechanical traits of butterfly flight, including low wing aspect ratio, compliant carbon-fiber-reinforced wings, and low-frequency, high-amplitude flapping that induces cyclic variations in the center of gravity and moment of inertia, producing characteristic body undulation. We establish a quantitative mapping between flapping modulation parameters and force-torque generation, and introduce the Stroke Timing Asymmetry Rhythm (STAR) generator, enabling smooth, stable, and linearly parameterized wingstroke asymmetry for flapping control. Integrating these with an attitude controller, the AirPulse robot maintains pitch and yaw stability despite strong oscillatory dynamics. Free-flight experiments demonstrate stable climbing and turning maneuvers via either angle offset or stroke timing modulation, marking the first onboard controlled flight of the lightest two-winged, tailless butterfly-inspired FWMAV reported in peer-reviewed literature. This work corroborates a foundational platform for lightweight, collision-proof FWMAVs, bridging biological inspiration with practical aerial robotics. Their non-invasive maneuverability is ideally suited for real-world applications, such as confined-space inspection and ecological monitoring, inaccessible to traditional drones, while their biomechanical fidelity provides a physical model to decode the principles underlying the erratic yet efficient flight of real butterflies.
SuReNav: Superpixel Graph-based Constraint Relaxation for Navigation in Over-constrained Environments ICRA 2026
We address the over-constrained planning problem in semi-static environments. The planning objective is to find a best-effort solution that avoids all hard constraint regions while minimally traversing the least risky areas. Conventional methods often rely on pre-defined area costs, limiting generalizations. Further, the spatial continuity of navigation spaces makes it difficult to identify regions that are passable without overestimation. To overcome these challenges, we propose SuReNav, a superpixel graph-based constraint relaxation and navigation method that imitates human-like safe and efficient navigation. Our framework consists of three components: 1) superpixel graph map generation with regional constraints, 2) regional-constraint relaxation using graph neural network trained on human demonstrations for safe and efficient navigation, and 3) interleaving relaxation, planning, and execution for complete navigation. We evaluate our method against state-of-the-art baselines on 2D semantic maps and 3D maps from OpenStreetMap, achieving the highest human-likeness score of complete navigation while maintaining a balanced trade-off between efficiency and safety. We finally demonstrate its scalability and generalization performance in real-world urban navigation with a quadruped robot, Spot.
comment: Accepted by ICRA 2026. Code and videos are available at https://sure-nav.github.io/
Constraint Manifold Exploration for Efficient Continuous Coverage Estimation
Many automated manufacturing processes rely on industrial robot arms to move process-specific tools along workpiece surfaces. In applications like grinding, sanding, spray painting, or inspection, they need to cover a workpiece fully while keeping their tools perpendicular to its surface. While there are approaches to generate trajectories for these applications, there are no sufficient methods for analyzing the feasibility of full surface coverage. This work proposes a sampling-based approach for continuous coverage estimation that explores reachable surface regions in the configuration space. We define an extended ambient configuration space that allows for the representation of tool position and orientation constraints. A continuation-based approach is used to explore it using two different sampling strategies. A thorough evaluation across different kinematics and environments analyzes their runtime and efficiency. This validates our ability to accurately and efficiently calculate surface coverage for complex surfaces in complicated environments.
comment: 8 pages, 7 figures
Crowd-FM: Learned Optimal Selection of Conditional Flow Matching-generated Trajectories for Crowd Navigation ICRA 2026
Safe and computationally efficient local planning for mobile robots in dense, unstructured human crowds remains a fundamental challenge. Moreover, ensuring that robot trajectories are similar to how a human moves will increase the acceptance of the robot in human environments. In this paper, we present Crowd-FM, a learning-based approach to address both safety and human-likeness challenges. Our approach has two novel components. First, we train a Conditional Flow-Matching (CFM) policy over a dataset of optimally controlled trajectories to learn a set of collision-free primitives that a robot can choose at any given scenario. The chosen optimal control solver can generate multi-modal collision-free trajectories, allowing the CFM policy to learn a diverse set of maneuvers. Secondly, we learn a score function over a dataset of human demonstration trajectories that provides a human-likeness score for the flow primitives. At inference time, computing the optimal trajectory requires selecting the one with the highest score. Our approach improves the state-of-the-art by showing that our CFM policy alone can produce collision-free navigation with a higher success rate than existing learning-based baselines. Furthermore, when augmented with inference-time refinement, our approach can outperform even expensive optimisation-based planning approaches. Finally, we validate that our scoring network can select trajectories closer to the expert data than a manually designed cost function.
comment: Accepted at IEEE ICRA 2026. Authors Antareep Singha and Laksh Nanwani have equal contributions
RAPID: Reconfigurable, Adaptive Platform for Iterative Design
Developing robotic manipulation policies is iterative and hypothesis-driven: researchers test tactile sensing, gripper geometries, and sensor placements through real-world data collection and training. Yet even minor end-effector changes often require mechanical refitting and system re-integration, slowing iteration. We present RAPID, a full-stack reconfigurable platform designed to reduce this friction. RAPID is built around a tool-free, modular hardware architecture that unifies handheld data collection and robot deployment, and a matching software stack that maintains real-time awareness of the underlying hardware configuration through a driver-level Physical Mask derived from USB events. This modular hardware architecture reduces reconfiguration to seconds and makes systematic multi-modal ablation studies practical, allowing researchers to sweep diverse gripper and sensing configurations without repeated system bring-up. The Physical Mask exposes modality presence as an explicit runtime signal, enabling auto-configuration and graceful degradation under sensor hot-plug events, so policies can continue executing when sensors are physically added or removed. System-centric experiments show that RAPID reduces the setup time for multi-modal configurations by two orders of magnitude compared to traditional workflows and preserves policy execution under runtime sensor hot-unplug events. The hardware designs, drivers, and software stack are open-sourced at https://rapid-kit.github.io/ .
Humanoid Manipulation Interface: Humanoid Whole-Body Manipulation from Robot-Free Demonstrations
Current approaches for humanoid whole-body manipulation, primarily relying on teleoperation or visual sim-to-real reinforcement learning, are hindered by hardware logistics and complex reward engineering. Consequently, demonstrated autonomous skills remain limited and are typically restricted to controlled environments. In this paper, we present the Humanoid Manipulation Interface (HuMI), a portable and efficient framework for learning diverse whole-body manipulation tasks across various environments. HuMI enables robot-free data collection by capturing rich whole-body motion using portable hardware. This data drives a hierarchical learning pipeline that translates human motions into dexterous and feasible humanoid skills. Extensive experiments across five whole-body tasks--including kneeling, squatting, tossing, walking, and bimanual manipulation--demonstrate that HuMI achieves a 3x increase in data collection efficiency compared to teleoperation and attains a 70% success rate in unseen environments.
comment: Website: https://humanoid-manipulation-interface.github.io
Efficient and Robust Modeling of Nonlinear Mechanical Systems
The development of efficient and robust dynamic models is fundamental in the field of systems and control engineering. In this paper, a new formulation for the dynamic model of nonlinear mechanical systems, that can be applied to different automotive and robotic case studies, is proposed, together with a modeling procedure allowing to automatically obtain the model formulation. Compared with the Euler-Lagrange formulation, the proposed model is shown to give superior performances in terms of robustness against measurement noise for systems exhibiting dependence on some external variables, as well as in terms of execution time when computing the inverse dynamics of the system.
Force Generative Imitation Learning: Bridging Position Trajectory and Force Commands through Control Technique
In contact-rich tasks, while position trajectories are often easy to obtain, appropriate force commands are typically unknown. Although it is conceivable to generate force commands using a pretrained foundation model such as Vision-Language-Action (VLA) models, force control is highly dependent on the specific hardware of the robot, which makes the application of such models challenging. To bridge this gap, we propose a force generative model that estimates force commands from given position trajectories. However, when dealing with unseen position trajectories, the model struggles to generate accurate force commands. To address this, we introduce a feedback control mechanism. Our experiments reveal that feedback control does not converge when the force generative model has memory. We therefore adopt a model without memory, enabling stable feedback control. This approach allows the system to generate force commands effectively, even for unseen position trajectories, improving generalization for real-world robot writing tasks.
comment: Accepted for IEEE Access
Think Proprioceptively: Embodied Visual Reasoning for VLA Manipulation
Vision-language-action (VLA) models typically inject proprioception only as a late conditioning signal, which prevents robot state from shaping instruction understanding and from influencing which visual tokens are attended throughout the policy. We introduce ThinkProprio, which converts proprioception into a sequence of text tokens in the VLM embedding space and fuses them with the task instruction at the input. This early fusion lets embodied state participate in subsequent visual reasoning and token selection, biasing computation toward action-critical evidence while suppressing redundant visual tokens. In a systematic ablation over proprioception encoding, state entry point, and action-head conditioning, we find that text tokenization is more effective than learned projectors, and that retaining roughly 15% of visual tokens can match the performance of using the full token set. Across CALVIN, LIBERO, and real-world manipulation, ThinkProprio matches or improves over strong baselines while reducing end-to-end inference latency over 50%.
The Law of Task-Achieving Body Motion: Axiomatizing Success of Robot Manipulation Actions IJCAI
Autonomous agents that perform everyday manipulation actions need to ensure that their body motions are semantically correct with respect to a task request, causally effective within their environment, and feasible for their embodiment. In order to enable robots to verify these properties, we introduce the Law of Task-Achieving Body Motion as an axiomatic correctness specification for body motions. To that end we introduce scoped Task-Environment-Embodiment (TEE) classes that represent world states as Semantic Digital Twins (SDTs) and define applicable physics models to decompose task achievement into three predicates: SatisfiesRequest for semantic request satisfaction over SDT state evolution; Causes for causal sufficiency under the scoped physics model; and CanPerform for safety and feasibility verification at the embodiment level. This decomposition yields a reusable, implementation-independent interface that supports motion synthesis and the verification of given body motions. It also supports typed failure diagnosis (semantic, causal, embodiment and out-of-scope), feasibility across robots and environments, and counterfactual reasoning about robot body motions. We demonstrate the usability of the law in practice by instantiating it for articulated container manipulation in kitchen environments on three contrasting mobile manipulation platforms
comment: 9 pages, 3 figures, submitted to the 2026 International Joint Conference on Artificial Intelligence (IJCAI)
LIBERO-X: Robustness Litmus for Vision-Language-Action Models
Reliable benchmarking is critical for advancing Vision-Language-Action (VLA) models, as it reveals their generalization, robustness, and alignment of perception with language-driven manipulation tasks. However, existing benchmarks often provide limited or misleading assessments due to insufficient evaluation protocols that inadequately capture real-world distribution shifts. This work systematically rethinks VLA benchmarking from both evaluation and data perspectives, introducing LIBERO-X, a benchmark featuring: 1) A hierarchical evaluation protocol with progressive difficulty levels targeting three core capabilities: spatial generalization, object recognition, and task instruction understanding. This design enables fine-grained analysis of performance degradation under increasing environmental and task complexity; 2) A high-diversity training dataset collected via human teleoperation, where each scene supports multiple fine-grained manipulation objectives to bridge the train-evaluation distribution gap. Experiments with representative VLA models reveal significant performance drops under cumulative perturbations, exposing persistent limitations in scene comprehension and instruction grounding. By integrating hierarchical evaluation with diverse training data, LIBERO-X offers a more reliable foundation for assessing and advancing VLA development.
comment: 19 pages, 14 figures and 8 tables
Primary Experimental Feedback on a Co-manipulated Robotic System for Assisted Cervical Surgery
Robotic-assisted surgery has emerged as a promising approach to improve surgical ergonomics, precision, and workflow efficiency, particularly in complex procedures such as cervical spine surgery. In this study, we evaluate the performance of a collaborative robotic system designed to assist surgeons in drilling tasks by assessing its accuracy in executing predefined trajectories. A total of 14 drillings were performed by eight experienced cervical surgeons, utilizing a robotic-assisted setup aimed at ensuring stability and alignment. The primary objective of this study is to quantify the deviations in the position and orientation of the drilling tool relative to the planned trajectory, providing insights into the system's reliability and potential impact on clinical outcomes. While the primary function of robotic assistance in surgery is to enhance surgeon comfort and procedural guidance rather than solely optimizing precision, understanding the system's accuracy remains crucial for its effective integration into surgical practices part of this primary experimental feedback, the study offers an in-depth analysis of the co-manipulated robotic system's performance, focusing on the experimental setup and error evaluation methods. The findings of this study will contribute to the ongoing development of robotic-assisted cervical surgery, highlighting both its advantages and areas for improvement in achieving safer and more efficient surgical workflows
DriveWorld-VLA: Unified Latent-Space World Modeling with Vision-Language-Action for Autonomous Driving
End-to-end (E2E) autonomous driving has recently attracted increasing interest in unifying Vision-Language-Action (VLA) with World Models to enhance decision-making and forward-looking imagination. However, existing methods fail to effectively unify future scene evolution and action planning within a single architecture due to inadequate sharing of latent states, limiting the impact of visual imagination on action decisions. To address this limitation, we propose DriveWorld-VLA, a novel framework that unifies world modeling and planning within a latent space by tightly integrating VLA and world models at the representation level, which enables the VLA planner to benefit directly from holistic scene-evolution modeling and reducing reliance on dense annotated supervision. Additionally, DriveWorld-VLA incorporates the latent states of the world model as core decision-making states for the VLA planner, facilitating the planner to assess how candidate actions impact future scene evolution. By conducting world modeling entirely in the latent space, DriveWorld-VLA supports controllable, action-conditioned imagination at the feature level, avoiding expensive pixel-level rollouts. Extensive open-loop and closed-loop evaluations demonstrate the effectiveness of DriveWorld-VLA, which achieves state-of-the-art performance with 91.3 PDMS on NAVSIMv1, 86.8 EPDMS on NAVSIMv2, and 0.16 3-second average collision rate on nuScenes. Code and models will be released in https://github.com/liulin815/DriveWorld-VLA.git.
comment: 20 pages, 7 tables, 12 figures
Beyond the Majority: Long-tail Imitation Learning for Robotic Manipulation ICRA 2026
While generalist robot policies hold significant promise for learning diverse manipulation skills through imitation, their performance is often hindered by the long-tail distribution of training demonstrations. Policies learned on such data, which is heavily skewed towards a few data-rich head tasks, frequently exhibit poor generalization when confronted with the vast number of data-scarce tail tasks. In this work, we conduct a comprehensive analysis of the pervasive long-tail challenge inherent in policy learning. Our analysis begins by demonstrating the inefficacy of conventional long-tail learning strategies (e.g., re-sampling) for improving the policy's performance on tail tasks. We then uncover the underlying mechanism for this failure, revealing that data scarcity on tail tasks directly impairs the policy's spatial reasoning capability. To overcome this, we introduce Approaching-Phase Augmentation (APA), a simple yet effective scheme that transfers knowledge from data-rich head tasks to data-scarce tail tasks without requiring external demonstrations. Extensive experiments in both simulation and real-world manipulation tasks demonstrate the effectiveness of APA. Our code and demos are publicly available at: https://mldxy.github.io/Project-VLA-long-tail/.
comment: accept by IEEE International Conference on Robotics and Automation (ICRA 2026), 8 pages, 6 figures,
World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy
Recent progress in robotic world models has leveraged video diffusion transformers to predict future observations conditioned on historical states and actions. While these models can simulate realistic visual outcomes, they often exhibit poor action-following precision, hindering their utility for downstream robotic learning. In this work, we introduce World-VLA-Loop, a closed-loop framework for the joint refinement of world models and Vision-Language-Action (VLA) policies. We propose a state-aware video world model that functions as a high-fidelity interactive simulator by jointly predicting future observations and reward signals. To enhance reliability, we introduce the SANS dataset, which incorporates near-success trajectories to improve action-outcome alignment within the world model. This framework enables a closed-loop for reinforcement learning (RL) post-training of VLA policies entirely within a virtual environment. Crucially, our approach facilitates a co-evolving cycle: failure rollouts generated by the VLA policy are iteratively fed back to refine the world model precision, which in turn enhances subsequent RL optimization. Evaluations across simulation and real-world tasks demonstrate that our framework significantly boosts VLA performance with minimal physical interaction, establishing a mutually beneficial relationship between world modeling and policy learning for general-purpose robotics. Project page: https://showlab.github.io/World-VLA-Loop/.
comment: 14 pages, 8 figures
MultiGraspNet: A Multitask 3D Vision Model for Multi-gripper Robotic Grasping
Vision-based models for robotic grasping automate critical, repetitive, and draining industrial tasks. Existing approaches are typically limited in two ways: they either target a single gripper and are potentially applied on costly dual-arm setups, or rely on custom hybrid grippers that require ad-hoc learning procedures with logic that cannot be transferred across tasks, restricting their general applicability. In this work, we present MultiGraspNet, a novel multitask 3D deep learning method that predicts feasible poses simultaneously for parallel and vacuum grippers within a unified framework, enabling a single robot to handle multiple end effectors. The model is trained on the richly annotated GraspNet-1Billion and SuctionNet-1Billion datasets, which have been aligned for the purpose, and generates graspability masks quantifying the suitability of each scene point for successful grasps. By sharing early-stage features while maintaining gripper-specific refiners, MultiGraspNet effectively leverages complementary information across grasping modalities, enhancing robustness and adaptability in cluttered scenes. We characterize MultiGraspNet's performance with an extensive experimental analysis, demonstrating its competitiveness with single-task models on relevant benchmarks. We run real-world experiments on a single-arm multi-gripper robotic setup showing that our approach outperforms the vacuum baseline, grasping 16% percent more seen objects and 32% more of the novel ones, while obtaining competitive results for the parallel task.
User-Centric Object Navigation: A Benchmark with Integrated User Habits for Personalized Embodied Object Search ICRA 2026
In the evolving field of robotics, the challenge of Object Navigation (ON) in household environments has attracted significant interest. Existing ON benchmarks typically place objects in locations guided by general scene priors, without accounting for the specific placement habits of individual users. This omission limits the adaptability of navigation agents in personalized household environments. To address this, we introduce User-centric Object Navigation (UcON), a new benchmark that incorporates user-specific object placement habits, referred to as user habits. This benchmark requires agents to leverage these user habits for more informed decision-making during navigation. UcON encompasses approximately 22,600 user habits across 489 object categories. UcON is, to our knowledge, the first benchmark that explicitly formalizes and evaluates habit-conditioned object navigation at scale and covers the widest range of target object categories. Additionally, we propose a habit retrieval module to extract and utilize habits related to target objects, enabling agents to infer their likely locations more effectively. Experimental results demonstrate that current SOTA methods exhibit substantial performance degradation under habit-driven object placement, while integrating user habits consistently improves success rates. Code is available at https://github.com/whcpumpkin/User-Centric-Object-Navigation.
comment: Accepted by ICRA 2026
ECO: Energy-Constrained Optimization with Reinforcement Learning for Humanoid Walking SC
Achieving stable and energy-efficient locomotion is essential for humanoid robots to operate continuously in real-world applications. Existing MPC and RL approaches often rely on energy-related metrics embedded within a multi-objective optimization framework, which require extensive hyperparameter tuning and often result in suboptimal policies. To address these challenges, we propose ECO (Energy-Constrained Optimization), a constrained RL framework that separates energy-related metrics from rewards, reformulating them as explicit inequality constraints. This method provides a clear and interpretable physical representation of energy costs, enabling more efficient and intuitive hyperparameter tuning for improved energy efficiency. ECO introduces dedicated constraints for energy consumption and reference motion, enforced by the Lagrangian method, to achieve stable, symmetric, and energy-efficient walking for humanoid robots. We evaluated ECO against MPC, standard RL with reward shaping, and four state-of-the-art constrained RL methods. Experiments, including sim-to-sim and sim-to-real transfers on the kid-sized humanoid robot BRUCE, demonstrate that ECO significantly reduces energy consumption compared to baselines while maintaining robust walking performance. These results highlight a substantial advancement in energy-efficient humanoid locomotion. All experimental demonstrations can be found on the project website: https://sites.google.com/view/eco-humanoid.
comment: IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING. PREPRINT VERSION. ACCEPTED FEB, 2026
Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation for the Last Meters
Embodied navigation holds significant promise for real-world applications such as last-mile delivery. However, most existing approaches are confined to either indoor or outdoor environments and rely heavily on strong assumptions, such as access to precise coordinate systems. While current outdoor methods can guide agents to the vicinity of a target using coarse-grained localization, they fail to enable fine-grained entry through specific building entrances, critically limiting their utility in practical deployment scenarios that require seamless outdoor-to-indoor transitions. To bridge this gap, we introduce a novel task: out-to-in prior-free instruction-driven embodied navigation. This formulation explicitly eliminates reliance on accurate external priors, requiring agents to navigate solely based on egocentric visual observations guided by instructions. To tackle this task, we propose a vision-centric embodied navigation framework that leverages image-based prompts to drive decision-making. Additionally, we present the first open-source dataset for this task, featuring a pipeline that integrates trajectory-conditioned video synthesis into the data generation process. Through extensive experiments, we demonstrate that our proposed method consistently outperforms state-of-the-art baselines across key metrics including success rate and path efficiency.
TFusionOcc: Student's t-Distribution Based Object-Centric Multi-Sensor Fusion Framework for 3D Occupancy Prediction
3D semantic occupancy prediction enables autonomous vehicles (AVs) to perceive fine-grained geometric and semantic structure of their surroundings from onboard sensors, which is essential for safe decision-making and navigation. Recent models for 3D semantic occupancy prediction have successfully addressed the challenge of describing real-world objects with varied shapes and classes. However, the intermediate representations used by existing methods for 3D semantic occupancy prediction rely heavily on 3D voxel volumes or a set of 3D Gaussians, hindering the model's ability to efficiently and effectively capture fine-grained geometric details in the 3D driving environment. This paper introduces TFusionOcc, a novel object-centric multi-sensor fusion framework for predicting 3D semantic occupancy. By leveraging multi-stage multi-sensor fusion, Student's t-distribution, and the T-Mixture model (TMM), together with more geometrically flexible primitives, such as the deformable superquadric (superquadric with inverse warp), the proposed method achieved state-of-the-art (SOTA) performance on the nuScenes benchmark. In addition, extensive experiments were conducted on the nuScenes-C dataset to demonstrate the robustness of the proposed method in different camera and lidar corruption scenarios. The code will be available at: https://github.com/DanielMing123/TFusionOcc
Now You See That: Learning End-to-End Humanoid Locomotion from Raw Pixels
Achieving robust vision-based humanoid locomotion remains challenging due to two fundamental issues: the sim-to-real gap introduces significant perception noise that degrades performance on fine-grained tasks, and training a unified policy across diverse terrains is hindered by conflicting learning objectives. To address these challenges, we present an end-to-end framework for vision-driven humanoid locomotion. For robust sim-to-real transfer, we develop a high-fidelity depth sensor simulation that captures stereo matching artifacts and calibration uncertainties inherent in real-world sensing. We further propose a vision-aware behavior distillation approach that combines latent space alignment with noise-invariant auxiliary tasks, enabling effective knowledge transfer from privileged height maps to noisy depth observations. For versatile terrain adaptation, we introduce terrain-specific reward shaping integrated with multi-critic and multi-discriminator learning, where dedicated networks capture the distinct dynamics and motion priors of each terrain type. We validate our approach on two humanoid platforms equipped with different stereo depth cameras. The resulting policy demonstrates robust performance across diverse environments, seamlessly handling extreme challenges such as high platforms and wide gaps, as well as fine-grained tasks including bidirectional long-term staircase traversal.
A Consistency-Improved LiDAR-Inertial Bundle Adjustment
Simultaneous Localization and Mapping (SLAM) using 3D LiDAR has emerged as a cornerstone for autonomous navigation in robotics. While feature-based SLAM systems have achieved impressive results by leveraging edge and planar structures, they often suffer from the inconsistent estimator associated with feature parameterization and estimated covariance. In this work, we present a consistency-improved LiDAR-inertial bundle adjustment (BA) with tailored parameterization and estimator. First, we propose a stereographic-projection representation parameterizing the planar and edge features, and conduct a comprehensive observability analysis to support its integrability with consistent estimator. Second, we implement a LiDAR-inertial BA with Maximum a Posteriori (MAP) formulation and First-Estimate Jacobians (FEJ) to preserve the accurate estimated covariance and observability properties of the system. Last, we apply our proposed BA method to a LiDAR-inertial odometry.
Towards Adaptive Environment Generation for Training Embodied Agents AAAI-26
Embodied agents struggle to generalize to new environments, even when those environments share similar underlying structures to their training settings. Most current approaches to generating these training environments follow an open-loop paradigm, without considering the agent's current performance. While procedural generation methods can produce diverse scenes, diversity without feedback from the agent is inefficient. The generated environments may be trivially easy, providing limited learning signal. To address this, we present a proof-of-concept for closed-loop environment generation that adapts difficulty to the agent's current capabilities. Our system employs a controllable environment representation, extracts fine-grained performance feedback beyond binary success or failure, and implements a closed-loop adaptation mechanism that translates this feedback into environment modifications. This feedback-driven approach generates training environments that more challenging in the ways the agent needs to improve, enabling more efficient learning and better generalization to novel settings.
comment: Accepted to AAAI-26 Bridge Program B10: Making Embodied AI Reliable with Testing and Formal Verification
Nipping the Drift in the Bud: Retrospective Rectification for Robust Vision-Language Navigation
Vision-Language Navigation (VLN) requires embodied agents to interpret natural language instructions and navigate through complex continuous 3D environments. However, the dominant imitation learning paradigm suffers from exposure bias, where minor deviations during inference lead to compounding errors. While DAgger-style approaches attempt to mitigate this by correcting error states, we identify a critical limitation: Instruction-State Misalignment. Forcing an agent to learn recovery actions from off-track states often creates supervision signals that semantically conflict with the original instruction. In response to these challenges, we introduce BudVLN, an online framework that learns from on-policy rollouts by constructing supervision to match the current state distribution. BudVLN performs retrospective rectification via counterfactual re-anchoring and decision-conditioned supervision synthesis, using a geodesic oracle to synthesize corrective trajectories that originate from valid historical states, ensuring semantic consistency. Experiments on the standard R2R-CE and RxR-CE benchmarks demonstrate that BudVLN consistently mitigates distribution shift and achieves state-of-the-art performance in both Success Rate and SPL.
HiWET: Hierarchical World-Frame End-Effector Tracking for Long-Horizon Humanoid Loco-Manipulation
Humanoid loco-manipulation requires executing precise manipulation tasks while maintaining dynamic stability amid base motion and impacts. Existing approaches typically formulate commands in body-centric frames, fail to inherently correct cumulative world-frame drift induced by legged locomotion. We reformulate the problem as world-frame end-effector tracking and propose HiWET, a hierarchical reinforcement learning framework that decouples global reasoning from dynamic execution. The high-level policy generates subgoals that jointly optimize end-effector accuracy and base positioning in the world frame, while the low-level policy executes these commands under stability constraints. We introduce a Kinematic Manifold Prior (KMP) that embeds the manipulation manifold into the action space via residual learning, reducing exploration dimensionality and mitigating kinematically invalid behaviors. Extensive simulation and ablation studies demonstrate that HiWET achieves precise and stable end-effector tracking in long-horizon world-frame tasks. We validate zero-shot sim-to-real transfer of the low-level policy on a physical humanoid, demonstrating stable locomotion under diverse manipulation commands. These results indicate that explicit world-frame reasoning combined with hierarchical control provides an effective and scalable solution for long-horizon humanoid loco-manipulation.
Action Hallucination in Generative Visual-Language-Action Models
Robot Foundation Models such as Vision-Language-Action models are rapidly reshaping how robot policies are trained and deployed, replacing hand-designed planners with end-to-end generative action models. While these systems demonstrate impressive generalization, it remains unclear whether they fundamentally resolve the long-standing challenges of robotics. We address this question by analyzing action hallucinations that violate physical constraints and their extension to plan-level failures. Focusing on latent-variable generative policies, we show that hallucinations often arise from structural mismatches between feasible robot behavior and common model architectures. We study three such barriers -- topological, precision, and horizon -- and show how they impose unavoidable tradeoffs. Our analysis provides mechanistic explanations for reported empirical failures of generative robot policies and suggests principled directions for improving reliability and trustworthiness, without abandoning their expressive power.
comment: 22 pages
Internalized Morphogenesis: A Self-Organizing Model for Growth, Replication, and Regeneration via Local Token Exchange in Modular Systems
This study presents an internalized morphogenesis model for autonomous systems, such as swarm robotics and micro-nanomachines, that eliminates the need for external spatial computation. Traditional self-organizing models often require calculations across the entire coordinate space, including empty areas, which is impractical for resource-constrained physical modules. Our proposed model achieves complex morphogenesis through strictly local interactions between adjacent modules within the "body." By extending the "Ishida token model," modules exchange integer values using an RD-inspired discrete analogue without solving differential equations. The internal potential, derived from token accumulation and aging, guides autonomous growth, shrinkage, and replication. Simulations on a hexagonal grid demonstrated the emergence of limb-like extensions, self-division, and robust regeneration capabilities following structural amputation. A key feature is the use of the body boundary as a natural sink for information entropy (tokens) to maintain a dynamic equilibrium. These results indicate that sophisticated morphological behaviors can emerge from minimal, internal-only rules. This framework offers a computationally efficient and biologically plausible approach to developing self-repairing, adaptive, and autonomous hardware.
Robots That Generate Planarity Through Geometry
Constraining motion to a flat surface is a fundamental requirement for equipment across science and engineering. Modern precision robotic motion systems, such as gantries, rely on the flatness of components, including guide rails and granite surface plates. However, translating this static flatness into motion requires precise internal alignment and tight-tolerance components that create long, error-sensitive reference chains. Here, we show that by using the geometric inversion of a sphere into a plane, we can produce robotic motion systems that derive planarity entirely from link lengths and connectivity. This allows planar motion to emerge from self-referencing geometric constraints, and without external metrology. We demonstrate these Flat-Plane Mechanisms (FPMs) from micron to meter scales and show that fabrication errors can be attenuated by an order of magnitude in the resulting flatness. Finally, we present a robotic FPM-based 3-axis positioning system that can be used for metrology surface scans ($\pm 12$-mm) and 3D printing inside narrow containers. This work establishes an alternative geometric foundation for planar motion that can be realized across size scales and opens new possibilities in metrology, fabrication, and micro-positioning.
A High-Fidelity Robotic Manipulator Teleoperation Framework for Human-Centered Augmented Reality Evaluation
Validating Augmented Reality (AR) tracking and interaction models requires precise, repeatable ground-truth motion. However, human users cannot reliably perform consistent motion due to biomechanical variability. Robotic manipulators are promising to act as human motion proxies if they can mimic human movements. In this work, we design and implement ARBot, a real-time teleoperation platform that can effectively capture natural human motion and accurately replay the movements via robotic manipulators. ARBot includes two capture models: stable wrist motion capture via a custom CV and IMU pipeline, and natural 6-DOF control via a mobile application. We design a proactively-safe QP controller to ensure smooth, jitter-free execution of the robotic manipulator, enabling it to function as a high-fidelity record and replay physical proxy. We open-source ARBot and release a benchmark dataset of 132 human and synthetic trajectories captured using ARBot to support controllable and scalable AR evaluation.
aerial-autonomy-stack -- a Faster-than-real-time, Autopilot-agnostic, ROS2 Framework to Simulate and Deploy Perception-based Drones
Unmanned aerial vehicles are rapidly transforming multiple applications, from agricultural and infrastructure monitoring to logistics and defense. Introducing greater autonomy to these systems can simultaneously make them more effective as well as reliable. Thus, the ability to rapidly engineer and deploy autonomous aerial systems has become of strategic importance. In the 2010s, a combination of high-performance compute, data, and open-source software led to the current deep learning and AI boom, unlocking decades of prior theoretical work. Robotics is on the cusp of a similar transformation. However, physical AI faces unique hurdles, often combined under the umbrella term "simulation-to-reality gap". These span from modeling shortcomings to the complexity of vertically integrating the highly heterogeneous hardware and software systems typically found in field robots. To address the latter, we introduce aerial-autonomy-stack, an open-source, end-to-end framework designed to streamline the pipeline from (GPU-accelerated) perception to (flight controller-based) action. Our stack allows the development of aerial autonomy using ROS2 and provides a common interface for two of the most popular autopilots: PX4 and ArduPilot. We show that it supports over 20x faster-than-real-time, end-to-end simulation of a complete development and deployment stack -- including edge compute and networking -- significantly compressing the build-test-release cycle of perception-based autonomy.
Realistic Synthetic Household Data Generation at Scale AAAI 2026
Advancements in foundation models have catalyzed research in Embodied AI to develop interactive agents capable of environmental reasoning and interaction. Developing such agents requires diverse, large-scale datasets. Prior frameworks generate synthetic data for long-term human-robot interactions but fail to model the bidirectional influence between human behavior and household environments. Our proposed generative framework creates household datasets at scale through loosely coupled generation of long-term human-robot interactions and environments. Human personas influence environment generation, while environment schematics and semantics shape human-robot interactions. The generated 3D data includes rich static context such as object and environment semantics, and temporal context capturing human and agent behaviors over extended periods. Our flexible tool allows users to define dataset characteristics via natural language prompts, enabling configuration of environment and human activity data through natural language specifications. The tool creates variations of user-defined configurations, enabling scalable data generation. We validate our framework through statistical evaluation using multi-modal embeddings and key metrics: cosine similarity, mutual information gain, intervention analysis, and iterative improvement validation. Statistical comparisons show good alignment with real-world datasets (HOMER) with cosine similarity (0.60), while synthetic datasets (Wang et al.) show moderate alignment (0.27). Intervention analysis across age, organization, and sleep pattern changes shows statistically significant effects (p < 0.001) with large effect sizes (Cohen's d = 0.51-1.12), confirming bidirectional coupling translates persona traits into measurable environmental and behavioral differences. These contributions enable development and testing of household smart devices at scale.
comment: Accepted at Agentic AI Benchmarks and Applications for Enterprise Tasks workshop at AAAI 2026
Cerebellar-Inspired Residual Control for Fault Recovery: From Inference-Time Adaptation to Structural Consolidation
Robotic policies deployed in real-world environments often encounter post-training faults, where retraining, exploration, or system identification are impractical. We introduce an inference-time, cerebellar-inspired residual control framework that augments a frozen reinforcement learning policy with online corrective actions, enabling fault recovery without modifying base policy parameters. The framework instantiates core cerebellar principles, including high-dimensional pattern separation via fixed feature expansion, parallel microzone-style residual pathways, and local error-driven plasticity with excitatory and inhibitory eligibility traces operating at distinct time scales. These mechanisms enable fast, localized correction under post-training disturbances while avoiding destabilizing global policy updates. A conservative, performance-driven meta-adaptation regulates residual authority and plasticity, preserving nominal behavior and suppressing unnecessary intervention. Experiments on MuJoCo benchmarks under actuator, dynamic, and environmental perturbations show improvements of up to $+66\%$ on \texttt{HalfCheetah-v5} and $+53\%$ on \texttt{Humanoid-v5} under moderate faults, with graceful degradation under severe shifts and complementary robustness from consolidating persistent residual corrections into policy parameters.
Continuum Robot Localization using Distributed Time-of-Flight Sensors
Localization and mapping of an environment are crucial tasks for any robot operating in unstructured environments. Time-of-flight (ToF) sensors (e.g.,~lidar) have proven useful in mobile robotics, where high-resolution sensors can be used for simultaneous localization and mapping. In soft and continuum robotics, however, these high-resolution sensors are too large for practical use. This, combined with the deformable nature of such robots, has resulted in continuum robot (CR) localization and mapping in unstructured environments being a largely untouched area. In this work, we present a localization technique for CRs that relies on small, low-resolution ToF sensors distributed along the length of the robot. By fusing measurement information with a robot shape prior, we show that accurate localization is possible despite each sensor experiencing frequent degenerate scenarios. We achieve an average localization error of 2.5cm in position and 7.2° in rotation across all experimental conditions with a 53cm long robot. We demonstrate that the results are repeated across multiple environments, in both simulation and real-world experiments, and study robustness in the estimation to deviations in the prior map.
A compliant ankle-actuated compass walker with triggering timing control
Passive dynamic walkers are widely adopted as a mathematical model to represent biped walking. The stable locomotion of these models is limited to tilted surfaces, requiring gravitational energy. Various techniques, such as actuation through the ankle and hip joints, have been proposed to extend the applicability of these models to level ground and rough terrain with improved locomotion efficiency. However, most of these techniques rely on impulsive energy injection schemes and torsional springs, which are quite challenging to implement in a physical platform. Here, a new model is proposed, named triggering controlled ankle actuated compass gait (TC-AACG), which allows non-instantaneous compliant ankle pushoff. The proposed technique can be implemented in physical platforms via series elastic actuators (SEAs). Our systematic examination shows that the proposed approach extends the locomotion capabilities of a biped model compared to impulsive ankle pushoff approach. We provide extensive simulation analysis investigating the locomotion speed, mechanical cost of transport, and basin of attraction of the proposed model.
comment: 6 figures, 6 pages
Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting
UAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation and reality. While 3D Gaussian Splatting enables photorealistic scene reconstruction from real-world data, existing methods inherently couple static lighting with geometry, severely limiting policy generalization to dynamic real-world illumination. In this paper, we propose a novel end-to-end reinforcement learning framework designed for effective zero-shot transfer to unstructured outdoors. Within a high-fidelity simulation grounded in real-world data, our policy is trained to map raw monocular RGB observations directly to continuous control commands. To overcome photometric limitations, we introduce Relightable 3D Gaussian Splatting, which decomposes scene components to enable explicit, physically grounded editing of environmental lighting within the neural representation. By augmenting training with diverse synthesized lighting conditions ranging from strong directional sunlight to diffuse overcast skies, we compel the policy to learn robust, illumination-invariant visual features. Extensive real-world experiments demonstrate that a lightweight quadrotor achieves robust, collision-free navigation in complex forest environments at speeds up to 10 m/s, exhibiting significant resilience to drastic lighting variations without fine-tuning.
comment: 12 pages, 8 figures
Constrained Group Relative Policy Optimization
While Group Relative Policy Optimization (GRPO) has emerged as a scalable framework for critic-free policy learning, extending it to settings with explicit behavioral constraints remains underexplored. We introduce Constrained GRPO, a Lagrangian-based extension of GRPO for constrained policy optimization. Constraints are specified via indicator cost functions, enabling direct optimization of violation rates through a Lagrangian relaxation. We show that a naive multi-component treatment in advantage estimation can break constrained learning: mismatched component-wise standard deviations distort the relative importance of the different objective terms, which in turn corrupts the Lagrangian signal and prevents meaningful constraint enforcement. We formally derive this effect to motivate our scalarized advantage construction that preserves the intended trade-off between reward and constraint terms. Experiments in a toy gridworld confirm the predicted optimization pathology and demonstrate that scalarizing advantages restores stable constraint control. In addition, we evaluate Constrained GRPO on robotics tasks, where it improves constraint satisfaction while increasing task success, establishing a simple and effective recipe for constrained policy optimization in embodied AI domains that increasingly rely on large multimodal foundation models.
comment: 16 pages, 6 figures
REACT: Real-time Entanglement-Aware Coverage Path Planning for Tethered Underwater Vehicles
Inspection of underwater structures with tethered underwater vehicles is often hindered by the risk of tether entanglement. We propose REACT (real-time entanglement-aware coverage path planning for tethered underwater vehicles), a framework designed to overcome this limitation. REACT comprises a computationally efficient geometry-based tether model using the signed distance field (SDF) map for accurate, real-time simulation of taut tether configurations around arbitrary structures in 3D. This model enables an efficient online replanning strategy by enforcing a maximum tether length constraint, thereby actively preventing entanglement. By integrating REACT into a coverage path planning framework, we achieve safe and entanglement-free inspection paths, previously challenging due to tether constraints. The complete REACT framework's efficacy is validated in a pipe inspection scenario, demonstrating safe navigation and full coverage inspection. Simulation results show that REACT achieves complete coverage while maintaining tether constraints and completing the total mission 20% faster than conventional planners, despite a longer inspection time due to proactive avoidance of entanglement that eliminates extensive post-mission disentanglement. Real-world experiments confirm these benefits, where REACT completes the full mission, while the baseline planner fails due to physical tether entanglement.
comment: Accepted for publication at International Conference on Robotics & Automation 2026
DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments
Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet robust 24/7 operation demands performance under a wide range of visual degradations, including low-light conditions at night or in dark environments--a core necessity that has been largely overlooked. To address this underexplored challenge, we present DarkEQA, an open-source benchmark for evaluating EQA-relevant perceptual primitives under multi-level low-light conditions. DarkEQA isolates the perception bottleneck by evaluating question answering from egocentric observations under controlled degradations, enabling attributable robustness analysis. A key design feature of DarkEQA is its physical fidelity: visual degradations are modeled in linear RAW space, simulating physics-based illumination drop and sensor noise followed by an ISP-inspired rendering pipeline. We demonstrate the utility of DarkEQA by evaluating a wide range of state-of-the-art VLMs and Low-Light Image Enhancement (LLIE) models. Our analysis systematically reveals VLMs' limitations when operating under these challenging visual conditions. Project website: https://darkeqa-benchmark.github.io/
comment: This work has been submitted to the IEEE for possible publication
CRISP -- Compliant ROS2 Controllers for Learning-Based Manipulation Policies and Teleoperation
Learning-based controllers, such as diffusion policies and vision-language action models, often generate low-frequency or discontinuous robot state changes. Achieving smooth reference tracking requires a low-level controller that converts high-level targets commands into joint torques, enabling compliant behavior during contact interactions. We present CRISP, a lightweight C++ implementation of compliant Cartesian and joint-space controllers for the ROS2 control standard, designed for seamless integration with high-level learning-based policies as well as teleoperation. The controllers are compatible with any manipulator that exposes a joint-torque interface. Through our Python and Gymnasium interfaces, CRISP provides a unified pipeline for recording data from hardware and simulation and deploying high-level learning-based policies seamlessly, facilitating rapid experimentation. The system has been validated on hardware with the Franka Robotics FR3 and in simulation with the Kuka IIWA14 and Kinova Gen3. Designed for rapid integration, flexible deployment, and real-time performance, our implementation provides a unified pipeline for data collection and policy execution, lowering the barrier to applying learning-based methods on ROS2-compatible manipulators. Detailed documentation is available at the project website - https://utiasDSL.github.io/crisp_controllers.
comment: 5 pages, 5 figures
Robust Meta-Learning of Vehicle Yaw Rate Dynamics via Conditional Neural Processes
Trajectory planners of autonomous vehicles usually rely on physical models to predict the vehicle behavior. However, despite their suitability, physical models have some shortcomings. On the one hand, simple models suffer from larger model errors and more restrictive assumptions. On the other hand, complex models are computationally more demanding and depend on environmental and operational parameters. In each case, the drawbacks can be associated to a certain degree to the physical modeling of the yaw rate dynamics. Therefore, this paper investigates the yaw rate prediction based on conditional neural processes (CNP), a data-driven meta-learning approach, to simultaneously achieve low errors, adequate complexity and robustness to varying parameters. Thus, physical models can be enhanced in a targeted manner to provide accurate and computationally efficient predictions to enable safe planning in autonomous vehicles. High fidelity simulations for a variety of driving scenarios and different types of cars show that CNP makes it possible to employ and transfer knowledge about the yaw rate based on current driving dynamics in a human-like manner, yielding robustness against changing environmental and operational conditions.
comment: Published in 2023 62nd IEEE IEEE Conference on Decision and Control (CDC), Singapore, Singapore, December 13 - 15, 2023
Sampling for Model Predictive Trajectory Planning in Autonomous Driving using Normalizing Flows
Alongside optimization-based planners, sampling-based approaches are often used in trajectory planning for autonomous driving due to their simplicity. Model predictive path integral control is a framework that builds upon optimization principles while incorporating stochastic sampling of input trajectories. This paper investigates several sampling approaches for trajectory generation. In this context, normalizing flows originating from the field of variational inference are considered for the generation of sampling distributions, as they model transformations of simple to more complex distributions. Accordingly, learning-based normalizing flow models are trained for a more efficient exploration of the input domain for the task at hand. The developed algorithm and the proposed sampling distributions are evaluated in two simulation scenarios.
comment: Accepted to be published as part of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju Shinhwa World, Jeju Island, Korea, June 2-5, 2024
HyPlan: Hybrid Learning-Assisted Planning Under Uncertainty for Safe Autonomous Driving
We present a novel hybrid learning-assisted planning method, named HyPlan, for solving the collision-free navigation problem for self-driving cars in partially observable traffic environments. HyPlan combines methods for multi-agent behavior prediction, deep reinforcement learning with proximal policy optimization and approximated online POMDP planning with heuristic confidence-based vertical pruning to reduce its execution time without compromising safety of driving. Our experimental performance analysis on the CARLA-CTS2 benchmark of critical traffic scenarios with pedestrians revealed that HyPlan may navigate safer than selected relevant baselines and perform significantly faster than considered alternative online POMDP planners.
Encoding Tactile Stimuli for Braille Recognition with Organoids
This study proposes a transferable encoding strategy that maps tactile sensor data to electrical stimulation patterns, enabling neural organoids to perform an open-loop artificial tactile Braille classification task. Human forebrain organoids cultured on a low-density microelectrode array (MEA) are systematically stimulated to characterize the relationship between electrical stimulation parameters (number of pulse, phase amplitude, phase duration, and trigger delay) and organoid responses, measured as spike activity and spatial displacement of the center of activity. Implemented on event-based tactile inputs recorded from the Evetac sensor, our system achieved an average Braille letter classification accuracy of 61% with a single organoid, which increased significantly to 83% when responses from a three-organoid ensemble were combined. Additionally, the multi-organoid configuration demonstrated enhanced robustness against various types of artificially introduced noise. This research demonstrates the potential of organoids as low-power, adaptive bio-hybrid computational elements and provides a foundational encoding framework for future scalable bio-hybrid computing architectures.
Less Is More: Scalable Visual Navigation from Limited Data
Imitation learning provides a powerful framework for goal-conditioned visual navigation in mobile robots, enabling obstacle avoidance while respecting human preferences and social norms. However, its effectiveness depends critically on the quality and diversity of training data. In this work, we show how classical geometric planners can be leveraged to generate synthetic trajectories that complement costly human demonstrations. We train Less is More (LiMo), a transformer-based visual navigation policy that predicts goal-conditioned SE(2) trajectories from a single RGB observation, and find that augmenting limited expert demonstrations with planner-generated supervision yields substantial performance gains. Through ablations and complementary qualitative and quantitative analyses, we characterize how dataset scale and diversity affect planning performance. We demonstrate real-robot deployment and argue that robust visual navigation is enabled not by simply collecting more demonstrations, but by strategically curating diverse, high-quality datasets. Our results suggest that scalable, embodiment-specific geometric supervision is a practical path toward data-efficient visual navigation.
comment: v2: Minor text edits, reference formatting fixes, and project page link added
Right-Side-Out: Learning Zero-Shot Sim-to-Real Garment Reversal
Turning garments right-side out is a challenging manipulation task: it is highly dynamic, entails rapid contact changes, and is subject to severe visual occlusion. We introduce Right-Side-Out, a zero-shot sim-to-real framework that effectively solves this challenge by exploiting task structures. We decompose the task into Drag/Fling to create and stabilize an access opening, followed by Insert&Pull to invert the garment. Each step uses a depth-inferred, keypoint-parameterized bimanual primitive that sharply reduces the action space while preserving robustness. Efficient data generation is enabled by our custom-built, high-fidelity, GPU-parallel Material Point Method (MPM) simulator that models thin-shell deformation and provides robust and efficient contact handling for batched rollouts. Built on the simulator, our fully automated pipeline scales data generation by randomizing garment geometry, material parameters, and viewpoints, producing depth, masks, and per-primitive keypoint labels without any human annotations. With a single depth camera, policies trained entirely in simulation deploy zero-shot on real hardware, achieving up to 81.3% success rate. By employing task decomposition and high fidelity simulation, our framework enables tackling highly dynamic, severely occluded tasks without laborious human demonstrations.
comment: More details and supplementary material are on the website: https://right-side-out.github.io
GNSS-based Lunar Orbit and Clock Estimation With Stochastic Cloning UD Filter
This paper presents a terrestrial GNSS-based orbit and clock estimation framework for lunar navigation satellites. To enable high-precision estimation under the low-observability conditions encountered at lunar distances, we develop a stochastic-cloning UD-factorized filter and delayed-state smoother that provide enhanced numerical stability when processing precise time-differenced carrier phase (TDCP) measurements. A comprehensive dynamics and measurement model is formulated, explicitly accounting for relativistic coupling between orbital and clock states, lunar time-scale transformations, and signal propagation delays including ionospheric, plasmaspheric, and Shapiro effects. The proposed approach is evaluated using high-fidelity Monte-Carlo simulations incorporating realistic multi-constellation GNSS geometry, broadcast ephemeris errors, lunar satellite dynamics, and ionospheric and plasmaspheric delay computed from empirical electron density models. Simulation results demonstrate that combining ionosphere-free pseudorange and TDCP measurements achieves meter-level orbit accuracy and sub-millimeter-per-second velocity accuracy, satisfying the stringent signal-in-space error requirements of future Lunar Augmented Navigation Services (LANS).
comment: Submitted to the Journal of Guidance, Control, and Dynamics
Information-Theoretic Graph Fusion with Vision-Language-Action Model for Policy Reasoning and Dual Robotic Control
Teaching robots dexterous skills from human videos remains challenging due to the reliance on low-level trajectory imitation, which fails to generalize across object types, spatial layouts, and manipulator configurations. We propose Graph-Fused Vision-Language-Action (GF-VLA), a framework that enables dual-arm robotic systems to perform task-level reasoning and execution directly from RGB and Depth human demonstrations. GF-VLA first extracts Shannon-information-based cues to identify hands and objects with the highest task relevance, then encodes these cues into temporally ordered scene graphs that capture both hand-object and object-object interactions. These graphs are fused with a language-conditioned transformer that generates hierarchical behavior trees and interpretable Cartesian motion commands. To improve execution efficiency in bimanual settings, we further introduce a cross-hand selection policy that infers optimal gripper assignment without explicit geometric reasoning. We evaluate GF-VLA on four structured dual-arm block assembly tasks involving symbolic shape construction and spatial generalization. Experimental results show that the information-theoretic scene representation achieves over 95 percent graph accuracy and 93 percent subtask segmentation, supporting the LLM planner in generating reliable and human-readable task policies. When executed by the dual-arm robot, these policies yield 94 percent grasp success, 89 percent placement accuracy, and 90 percent overall task success across stacking, letter-building, and geometric reconfiguration scenarios, demonstrating strong generalization and robustness across diverse spatial and semantic variations.
comment: Journal accepted by Information Fusion
Lan-grasp: Using Large Language Models for Semantic Object Grasping and Placement
In this paper, we propose Lan-grasp, a novel approach towards more appropriate semantic grasping and placing. We leverage foundation models to equip the robot with a semantic understanding of object geometry, enabling it to identify the right place to grasp, which parts to avoid, and the natural pose for placement. This is an important contribution to grasping and utilizing objects in a more meaningful and safe manner. We leverage a combination of a Large Language Model, a Vision-Language Model, and a traditional grasp planner to generate grasps that demonstrate a deeper semantic understanding of the objects. Building on foundation models provides us with a zero-shot grasp method that can handle a wide range of objects without requiring further training or fine-tuning. We also propose a method for safely putting down a grasped object. The core idea is to rotate the object upright utilizing a pretrained generative model and the reasoning capabilities of a VLM. We evaluate our method in real-world experiments on a custom object dataset and present the results of a survey that asks participants to choose an object part appropriate for grasping. The results show that the grasps generated by our method are consistently ranked higher by the participants than those generated by a conventional grasping planner and a recent semantic grasping approach. In addition, we propose a Visual Chain-of-Thought feedback loop to assess grasp feasibility in complex scenarios. This mechanism enables dynamic reasoning and generates alternative grasp strategies when needed, ensuring safer and more effective grasping outcomes.
Lyapunov Constrained Soft Actor-Critic (LC-SAC) using Koopman Operator Theory for Quadrotor Trajectory Tracking
Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. There has been significant work in incorporating Lyapunov-based stability guarantees in RL algorithms with key challenges being selecting a candidate Lyapunov function, computational complexity by using excessive function approximators and conservative policies by incorporating stability criterion in the learning process. In this work we propose a novel Lyapunov-constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We propose use of extended dynamic mode decomposition (EDMD) to produce a linear approximation of the system and use this approximation to derive a closed form solution for candidate Lyapunov function. This derived Lyapunov function is incorporated in the SAC algorithm to further provide guarantees for a policy that stabilizes the nonlinear system. The results are evaluated trajectory tracking of a 2D Quadrotor environment based on safe-control-gym. The proposed algorithm shows training convergence and decaying violations for Lyapunov stability criterion compared to baseline vanilla SAC algorithm. GitHub Repository: https://github.com/DhruvKushwaha/LC-SAC-Quadrotor-Trajectory-Tracking
comment: 11 pages, 7 Figures, submitted to IEEE RA-L
Embodying Physical Computing into Soft Robots
Softening and onboarding computers and controllers is one of the final frontiers in soft robotics towards their robustness and intelligence for everyday use. In this regard, embodying soft and physical computing presents exciting potential. Physical computing seeks to encode inputs into a mechanical computing kernel and leverage the internal interactions among this kernel's constituent elements to compute the output. Moreover, such input-to-output evolution can be re-programmable. This perspective paper proposes a framework for embodying physical computing into soft robots and discusses three unique strategies in the literature: analog oscillators, physical reservoir computing, and physical algorithmic computing. These embodied computers enable the soft robot to perform complex behaviors that would otherwise require CMOS-based electronics -- including coordinated locomotion with obstacle avoidance, payload weight and orientation classification, and programmable operation based on logical rules. This paper will detail the working principles of these embodied physical computing methods, survey the current state-of-the-art, and present a perspective for future development.
Multiagent Systems
Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI
Grassroots Logic Programs (GLP) is a concurrent logic programming language with variables partitioned into paired \emph{readers} and \emph{writers}, conjuring both linear logic and futures/promises: an assignment is produced at most once via the sole occurrence of a writer (promise) and consumed at most once via the sole occurrence of its paired reader (future), and may contain additional readers and/or writers, enabling the concise expression of rich multidirectional communication modalities. GLP was designed as a language for grassroots platforms -- distributed systems with multiple instances that can operate independently of each other and of any global resource, and can coalesce into ever larger instances -- with its target architecture being smartphones communicating peer-to-peer. The operational semantics of Concurrent (single-agent) GLP and of multiagent GLP (maGLP) were defined via transition systems/multiagent transition systems, respectively. Here, we describe the mathematics developed to facilitate the workstation- and smartphone-based implementations of GLP by AI in Dart. We developed dGLP -- implementation-ready deterministic operational semantics for single-agent GLP -- and proved it correct with respect to the Concurrent GLP operational semantics; dGLP was used by AI as a formal spec, from which it developed a workstation-based implementation of GLP. We developed madGLP -- an implementation-ready multiagent operational semantics for maGLP -- and proved it correct with respect to the maGLP operational semantics; madGLP is deterministic at the agent level (not at the system level due to communication asynchrony), and is being used by AI as a formal spec from which it develops a smartphone-based implementation of maGLP.
Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding ICLR 2026
Multi-Agent Path Finding (MAPF) is a representative multi-agent coordination problem, where multiple agents are required to navigate to their respective goals without collisions. Solving MAPF optimally is known to be NP-hard, leading to the adoption of learning-based approaches to alleviate the online computational burden. Prevailing approaches, such as Graph Neural Networks (GNNs), are typically constrained to pairwise message passing between agents. However, this limitation leads to suboptimal behaviours and critical issues, such as attention dilution, particularly in dense environments where group (i.e. beyond just two agents) coordination is most critical. Despite the importance of such higher-order interactions, existing approaches have not been able to fully explore them. To address this representational bottleneck, we introduce HMAGAT (Hypergraph Multi-Agent Attention Network), a novel architecture that leverages attentional mechanisms over directed hypergraphs to explicitly capture group dynamics. Empirically, HMAGAT establishes a new state-of-the-art among learning-based MAPF solvers: e.g., despite having just 1M parameters and being trained on 100$\times$ less data, it outperforms the current SoTA 85M parameter model. Through detailed analysis of HMAGAT's attention values, we demonstrate how hypergraph representations mitigate the attention dilution inherent in GNNs and capture complex interactions where pairwise methods fail. Our results illustrate that appropriate inductive biases are often more critical than the training data size or sheer parameter count for multi-agent problems.
comment: Accepted at ICLR 2026
Sample-Efficient Policy Space Response Oracles with Joint Experience Best Response AAMAS 2026
Multi-agent reinforcement learning (MARL) offers a scalable alternative to exact game-theoretic analysis but suffers from non-stationarity and the need to maintain diverse populations of strategies that capture non-transitive interactions. Policy Space Response Oracles (PSRO) address these issues by iteratively expanding a restricted game with approximate best responses (BRs), yet per-agent BR training makes it prohibitively expensive in many-agent or simulator-expensive settings. We introduce Joint Experience Best Response (JBR), a drop-in modification to PSRO that collects trajectories once under the current meta-strategy profile and reuses this joint dataset to compute BRs for all agents simultaneously. This amortizes environment interaction and improves the sample efficiency of best-response computation. Because JBR converts BR computation into an offline RL problem, we propose three remedies for distribution-shift bias: (i) Conservative JBR with safe policy improvement, (ii) Exploration-Augmented JBR that perturbs data collection and admits theoretical guarantees, and (iii) Hybrid BR that interleaves JBR with periodic independent BR updates. Across benchmark multi-agent environments, Exploration-Augmented JBR achieves the best accuracy-efficiency trade-off, while Hybrid BR attains near-PSRO performance at a fraction of the sample cost. Overall, JBR makes PSRO substantially more practical for large-scale strategic learning while preserving equilibrium robustness.
comment: Accepted at the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
Prism: Spectral Parameter Sharing for Multi-Agent Reinforcement Learning
Parameter sharing is a key strategy in multi-agent reinforcement learning (MARL) for improving scalability, yet conventional fully shared architectures often collapse into homogeneous behaviors. Recent methods introduce diversity through clustering, pruning, or masking, but typically compromise resource efficiency. We propose Prism, a parameter sharing framework that induces inter-agent diversity by representing shared networks in the spectral domain via singular value decomposition (SVD). All agents share the singular vector directions while learning distinct spectral masks on singular values. This mechanism encourages inter-agent diversity and preserves scalability. Extensive experiments on both homogeneous (LBF, SMACv2) and heterogeneous (MaMuJoCo) benchmarks show that Prism achieves competitive performance with superior resource efficiency.
The Value of Variance: Mitigating Debate Collapse in Multi-Agent Systems via Uncertainty-Driven Policy Optimization
Multi-agent debate (MAD) systems improve LLM reasoning through iterative deliberation, but remain vulnerable to debate collapse, a failure type where final agent decisions are compromised on erroneous reasoning. Existing methods lack principled mechanisms to detect or prevent such failures. To address this gap, we first propose a hierarchical metric that quantifies behavioral uncertainty at three levels: intra-agent (individual reasoning uncertainty), inter-agent (interactive uncertainty), and system-level (output uncertainty). Empirical analysis across several benchmarks reveals that our proposed uncertainty quantification reliably indicates system failures, which demonstrates the validity of using them as diagnostic metrics to indicate the system failure. Subsequently, we propose a mitigation strategy by formulating an uncertainty-driven policy optimization to penalize self-contradiction, peer conflict, and low-confidence outputs in a dynamic debating environment. Experiments demonstrate that our proposed uncertainty-driven mitigation reliably calibrates the multi-agent system by consistently improving decision accuracy while reducing system disagreement.
Lemon Agent Technical Report
Recent advanced LLM-powered agent systems have exhibited their remarkable capabilities in tackling complex, long-horizon tasks. Nevertheless, they still suffer from inherent limitations in resource efficiency, context management, and multimodal perception. Based on these observations, Lemon Agent is introduced, a multi-agent orchestrator-worker system built on a newly proposed AgentCortex framework, which formalizes the classic Planner-Executor-Memory paradigm through an adaptive task execution mechanism. Our system integrates a hierarchical self-adaptive scheduling mechanism that operates at both the overall orchestrator layer and workers layer. This mechanism can dynamically adjust computational intensity based on task complexity. It enables orchestrator to allocate one or more workers for parallel subtask execution, while workers can further improve operational efficiency by invoking tools concurrently. By virtue of this two-tier architecture, the system achieves synergistic balance between global task coordination and local task execution, thereby optimizing resource utilization and task processing efficiency in complex scenarios. To reduce context redundancy and increase information density during parallel steps, we adopt a three-tier progressive context management strategy. To make fuller use of historical information, we propose a self-evolving memory system, which can extract multi-dimensional valid information from all historical experiences to assist in completing similar tasks. Furthermore, we provide an enhanced MCP toolset. Empirical evaluations on authoritative benchmarks demonstrate that our Lemon Agent can achieve a state-of-the-art 91.36% overall accuracy on GAIA and secures the top position on the xbench-DeepSearch leaderboard with a score of 77+.
Constella: Supporting Storywriters' Interconnected Character Creation through LLM-based Multi-Agents
Creating a cast of characters by attending to their relational dynamics is a critical aspect of most long-form storywriting. However, our formative study (N=14) reveals that writers struggle to envision new characters that could influence existing ones, balance similarities and differences among characters, and intricately flesh out their relationships. Based on these observations, we designed Constella, an LLM-based multi-agent tool that supports storywriters' interconnected character creation process. Constella suggests related characters (FRIENDS DISCOVERY feature), reveals the inner mindscapes of several characters simultaneously (JOURNALS feature), and manifests relationships through inter-character responses (COMMENTS feature). Our 7-8 day deployment study with storywriters (N=11) shows that Constella enabled the creation of expansive communities composed of related characters, facilitated the comparison of characters' thoughts and emotions, and deepened writers' understanding of character relationships. We conclude by discussing how multi-agent interactions can help distribute writers' attention and effort across the character cast.
comment: Accepted to ACM Transactions on Computer-Human Interaction (TOCHI)
Nash Equilibria in Games with Playerwise Concave Coupling Constraints: Existence and Computation
We study the existence and computation of Nash equilibria in concave games where the players' admissible strategies are subject to shared coupling constraints. Under playerwise concavity of constraints, we prove existence of Nash equilibria. Our proof leverages topological fixed point theory and novel structural insights into the contractibility of feasible sets, and relaxes strong assumptions for existence in prior work. Having established existence, we address the question of whether in the presence of coupling constraints, playerwise independent learning dynamics have convergence guarantees. We address this positively for the class of potential games by designing a convergent algorithm. To account for the possibly nonconvex feasible region, we employ a log barrier regularized gradient ascent with adaptive stepsizes. Starting from an initial feasible strategy profile and under exact gradient feedback, the proposed method converges to an $ε$-approximate constrained Nash equilibrium within $\mathcal{O}(ε^{-3})$ iterations.
Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics
Multi-agent Epistemic Planning (MEP) is an autonomous planning framework for reasoning about both the physical world and the beliefs of agents, with applications in domains where information flow and awareness among agents are critical. The richness of MEP requires states to be represented as Kripke structures, i.e., directed labeled graphs. This representation limits the applicability of existing heuristics, hindering the scalability of epistemic solvers, which must explore an exponential search space without guidance, resulting often in intractability. To address this, we exploit Graph Neural Networks (GNNs) to learn patterns and relational structures within epistemic states, to guide the planning process. GNNs, which naturally capture the graph-like nature of Kripke models, allow us to derive meaningful estimates of state quality -- e.g., the distance from the nearest goal -- by generalizing knowledge obtained from previously solved planning instances. We integrate these predictive heuristics into an epistemic planning pipeline and evaluate them against standard baselines, showing improvements in the scalability of multi-agent epistemic planning.
MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety
Ensuring robust safety alignment is crucial for Large Language Models (LLMs), yet existing defenses often lag behind evolving adversarial attacks due to their \textbf{reliance on static, pre-collected data distributions}. In this paper, we introduce \textbf{MAGIC}, a novel multi-turn multi-agent reinforcement learning framework that formulates LLM safety alignment as an adversarial asymmetric game. Specifically, an attacker agent learns to iteratively rewrite original queries into deceptive prompts, while a defender agent simultaneously optimizes its policy to recognize and refuse such inputs. This dynamic process triggers a \textbf{co-evolution}, where the attacker's ever-changing strategies continuously uncover long-tail vulnerabilities, driving the defender to generalize to unseen attack patterns. Remarkably, we observe that the attacker, endowed with initial reasoning ability, evolves \textbf{novel, previously unseen combinatorial strategies} through iterative RL training, underscoring our method's substantial potential. Theoretically, we provide insights into a more robust game equilibrium and derive safety guarantees. Extensive experiments validate our framework's effectiveness, demonstrating superior defense success rates without compromising the helpfulness of the model. Our code is available at https://github.com/BattleWen/MAGIC.
Systematic Failures in Collective Reasoning under Distributed Information in Multi-Agent LLMs
Multi-agent systems built on large language models (LLMs) are expected to enhance decision-making by pooling distributed information, yet systematically evaluating this capability has remained challenging. We introduce HiddenBench, a 65-task benchmark grounded in the Hidden Profile paradigm, which isolates collective reasoning under distributed information from individual reasoning ability. Evaluating 15 frontier LLMs, we find that multi-agent LLMs achieve only 30.1% accuracy under distributed information, compared to 80.7% accuracy for single agents given complete information. We trace this gap to a systematic failure mode: agents cannot recognize or act under latent information asymmetry-they fail to reason about what others might know but have not yet expressed, leading to premature convergence on shared evidence while critical distributed facts remain unexplored. These failures persist across prompting strategies, communication depths, and group sizes-and worsen as groups scale. While some models (e.g., Gemini-2.5-Flash/Pro) outperform others, neither model scale nor individual reasoning accuracy reliably predicts collective performance. Our results identify failures in collective information exploration in decision-making as a key limitation of multi-agent LLMs, and provide a theory-grounded, reproducible framework for diagnosing collective reasoning failures.
PhysicsAgentABM: Physics-Guided Generative Agent-Based Modeling
Large language model (LLM)-based multi-agent systems enable expressive agent reasoning but are expensive to scale and poorly calibrated for timestep-aligned state-transition simulation, while classical agent-based models (ABMs) offer interpretability but struggle to integrate rich individual-level signals and non-stationary behaviors. We propose PhysicsAgentABM, which shifts inference to behaviorally coherent agent clusters: state-specialized symbolic agents encode mechanistic transition priors, a multimodal neural transition model captures temporal and interaction dynamics, and uncertainty-aware epistemic fusion yields calibrated cluster-level transition distributions. Individual agents then stochastically realize transitions under local constraints, decoupling population inference from entity-level variability. We further introduce ANCHOR, an LLM agent-driven clustering strategy based on cross-contextual behavioral responses and a novel contrastive loss, reducing LLM calls by up to 6-8 times. Experiments across public health, finance, and social sciences show consistent gains in event-time accuracy and calibration over mechanistic, neural, and LLM baselines. By re-architecting generative ABM around population-level inference with uncertainty-aware neuro-symbolic fusion, PhysicsAgentABM establishes a new paradigm for scalable and calibrated simulation with LLMs.
AISAC: An Integrated multi-agent System for Transparent, Retrieval-Grounded Scientific Assistance
AI Scientific Assistant Core (AISAC) is a transparent, modular multi-agent runtime developed at Argonne National Laboratory to support long-horizon, evidence-grounded scientific reasoning. Rather than proposing new agent algorithms or claiming autonomous scientific discovery, AISAC contributes a governed execution substrate that operationalizes key requirements for deploying agentic AI in scientific practice, including explicit role semantics, budgeted context management, traceable execution, and reproducible interaction with tools and knowledge. AISAC enforces four structural guarantees for scientific reasoning: (1) declarative agent registration with runtime-enforced role semantics and automatic system prompt generation; (2) budgeted orchestration via explicit per-turn context and delegation depth limits; (3) role-aligned memory access across episodic, dialogue, and evidence layers; and (4) trace-driven transparency through persistent execution records and a live event-stream interface. These guarantees are implemented through hybrid persistent memory (SQLite and dual FAISS indices), governed retrieval with agent-scoped RAG, structured tool execution with schema validation, and a configuration-driven bootstrap mechanism that enables project specific extension without modifying the shared core. AISAC is currently deployed across multiple scientific workflows at Argonne, including combustion science, materials research, and energy process safety, demonstrating its use as a reusable substrate for domain-specialized AI scientific assistants.
Systems and Control (EESS)
Optimal Derivative Feedback Control for an Active Magnetic Levitation System: An Experimental Study on Data-Driven Approaches
This paper presents the design and implementation of data-driven optimal derivative feedback controllers for an active magnetic levitation system. A direct, model-free control design method based on the reinforcement learning framework is compared with an indirect optimal control design derived from a numerically identified mathematical model of the system. For the direct model-free approach, a policy iteration procedure is proposed, which adds an iteration layer called the epoch loop to gather multiple sets of process data, providing a more diverse dataset and helping reduce learning biases. This direct control design method is evaluated against a comparable optimal control solution designed from a plant model obtained through the combined Dynamic Mode Decomposition with Control (DMDc) and Prediction Error Minimization (PEM) system identification. Results show that while both controllers can stabilize and improve the performance of the magnetic levitation system when compared to controllers designed from a nominal model, the direct model-free approach consistently outperforms the indirect solution when multiple epochs are allowed. The iterative refinement of the optimal control law over the epoch loop provides the direct approach a clear advantage over the indirect method, which relies on a single set of system data to determine the identified model and control.
comment: 10 pages, 9 figures. Preprint; manuscript under journal review
UnifSrv: AP Selection for Achieving Uniformly Good Performance of CF-MIMO in Realistic Urban Networks
Under the ideal assumption of uniform propagation, cell-free massive MIMO (CF-mMIMO) provides uniformly high throughput over the network by effectively surrounding each user with its serving access point (AP) set. However, in realistic non-uniform urban propagation environments, it is difficult to consistently select good limited serving AP sets, resulting in significantly degraded throughput, reintroducing "edge-effect" for the worst-served users. To restore the uniformly good performance of scalable CF-mMIMO in realistic urban networks, we formulate a novel multi-objective optimization problem to jointly achieve high throughput by maximizing the sum data rate, uniform throughput by maximizing Jain's fairness index of the throughput per user, and scalability by minimizing the serving AP set size. We then propose the UnifSrv AP selection algorithms to solve this optimization problem, consisting of a deep reinforcement learning (DRL)-based algorithm UnifSrv-DRL and a heuristic algorithm UnifSrv-heu. We conduct a comprehensive performance evaluation of scalable CF-mMIMO under realistic urban network distributions, propagation, and mobility patterns, showing that the prior benchmark AP selection schemes fail to provide uniformly high throughput in practice. By contrast, UnifSrv at least doubles the throughput compared to prior benchmarks, or achieves comparable throughput but with half of the serving AP set size. Importantly, our heuristic algorithm achieves equivalent throughput to our DRL one, but with orders of magnitude lower complexity. We thus for the first time propose an AP selection algorithm that achieves uniformly good CF-mMIMO performance in realistic urban networks with low complexity.
Efficient and Robust Modeling of Nonlinear Mechanical Systems
The development of efficient and robust dynamic models is fundamental in the field of systems and control engineering. In this paper, a new formulation for the dynamic model of nonlinear mechanical systems, that can be applied to different automotive and robotic case studies, is proposed, together with a modeling procedure allowing to automatically obtain the model formulation. Compared with the Euler-Lagrange formulation, the proposed model is shown to give superior performances in terms of robustness against measurement noise for systems exhibiting dependence on some external variables, as well as in terms of execution time when computing the inverse dynamics of the system.
Force Generative Imitation Learning: Bridging Position Trajectory and Force Commands through Control Technique
In contact-rich tasks, while position trajectories are often easy to obtain, appropriate force commands are typically unknown. Although it is conceivable to generate force commands using a pretrained foundation model such as Vision-Language-Action (VLA) models, force control is highly dependent on the specific hardware of the robot, which makes the application of such models challenging. To bridge this gap, we propose a force generative model that estimates force commands from given position trajectories. However, when dealing with unseen position trajectories, the model struggles to generate accurate force commands. To address this, we introduce a feedback control mechanism. Our experiments reveal that feedback control does not converge when the force generative model has memory. We therefore adopt a model without memory, enabling stable feedback control. This approach allows the system to generate force commands effectively, even for unseen position trajectories, improving generalization for real-world robot writing tasks.
comment: Accepted for IEEE Access
Structured Learning for Electromagnetic Field Modeling and Real-Time Inversion
Precise magnetic field modeling is fundamental to the closed-loop control of electromagnetic navigation systems (eMNS) and the analytical Multipole Expansion Model (MPEM) is the current standard. However, the MPEM relies on strict physical assumptions regarding source symmetry and isolation, and requires optimization-based calibration that is highly sensitive to initialization. These constraints limit its applicability to systems with complex or irregular coil geometries. This work introduces an alternative modeling paradigm based on multi-layer perceptrons that learns nonlinear magnetic mappings while strictly preserving the linear dependence on currents. As a result, the field models enable fast, closed-form minimum-norm inversion with evaluation times of approximately 1 ms, which is critical for high-bandwidth magnetic control. For model training and evaluation we use large-scale, high-density datasets collected from the research-grade OctoMag and clinical-grade Navion systems. Our results demonstrate that data-driven models achieve predictive fidelity equivalent to the MPEM while maintaining comparable data efficiency. Furthermore, we demonstrate that straightforward design choices effectively eliminate spurious workspace ill-conditioning frequently reported in MPEM-based calibration. To facilitate future research, we release the complete codebase and datasets open source.
Safety Controller Synthesis for Stochastic Polynomial Time-Delayed Systems
This work develops a theoretical framework for safety controller synthesis in discrete-time stochastic nonlinear polynomial systems subject to time-invariant delays (dt-SNPS-td). While safety analysis of stochastic systems using control barrier certificates (CBC) has been widely studied, developing safety controllers for stochastic systems with time delays remains largely unexplored. The main challenge arises from the need to account for the influence of delayed components when formulating and enforcing safety conditions. To address this, we employ Krasovskii control barrier certificates, which extend the conventional CBC framework by augmenting it with an additional summation term that captures the influence of delayed states. This formulation integrates both the current and delayed components into a unified barrier structure, enabling safety synthesis for stochastic systems with time delays. The proposed approach synthesizes safety controllers under input constraints, offering probabilistic safety guarantees robust to such delays: it ensures that all trajectories of the dt-SNPS-td remain within the prescribed safe region while fulfilling a quantified probabilistic bound. To achieve this, our method reformulates the safety constraints as a sum-of-squares optimization program, enabling the systematic construction of Krasovskii CBC together with their associated safety controllers. We validate the proposed framework through three case studies, including two physical systems, demonstrating its effectiveness and practical applicability.
A Consistency-Improved LiDAR-Inertial Bundle Adjustment
Simultaneous Localization and Mapping (SLAM) using 3D LiDAR has emerged as a cornerstone for autonomous navigation in robotics. While feature-based SLAM systems have achieved impressive results by leveraging edge and planar structures, they often suffer from the inconsistent estimator associated with feature parameterization and estimated covariance. In this work, we present a consistency-improved LiDAR-inertial bundle adjustment (BA) with tailored parameterization and estimator. First, we propose a stereographic-projection representation parameterizing the planar and edge features, and conduct a comprehensive observability analysis to support its integrability with consistent estimator. Second, we implement a LiDAR-inertial BA with Maximum a Posteriori (MAP) formulation and First-Estimate Jacobians (FEJ) to preserve the accurate estimated covariance and observability properties of the system. Last, we apply our proposed BA method to a LiDAR-inertial odometry.
Advances in Battery Energy Storage Management: Control and Economic Synergies
The existing literature on Battery Energy Storage Systems (BESS) predominantly focuses on two main areas: control system design aimed at achieving grid stability and the techno-economic analysis of BESS dispatch on power grid. However, with the increasing incorporation of ancillary services into power grids, a more comprehensive approach to energy management systems is required. Such an approach should not only optimize revenue generation from BESS but also ensure the safe, efficient, and reliable operation of lithium-ion batteries. This research seeks to bridge this gap by exploring literature that addresses both the economic and operational dimensions of BESS. Specifically, it examines how economic aspects of grid duty cycles can align with control schemes deployed in BESS systems. This alignment, or synergy, could be instrumental in creating robust digital twins virtual representations of BESS systems that enhance both grid stability and revenue potential. The literature review is organized into five key categories: (1) ancillary services for BESS, exploring support functions that BESS can provide to power grids; (2) control systems developed for real-time BESS power flow management, ensuring smooth operations under dynamic grid conditions; (3) optimization algorithms for BESS dispatch, focusing on efficient energy allocation strategies; (4) techno-economic analyses of BESS and battery systems to assess their financial viability; and (5) digital twin technologies for real-world BESS deployments, enabling advanced predictive maintenance and performance optimization. This review will identify potential synergies, research gaps, and emerging trends, paving the way for future innovations in BESS management and deployment strategies.
comment: Pre Print
Nipping the Drift in the Bud: Retrospective Rectification for Robust Vision-Language Navigation
Vision-Language Navigation (VLN) requires embodied agents to interpret natural language instructions and navigate through complex continuous 3D environments. However, the dominant imitation learning paradigm suffers from exposure bias, where minor deviations during inference lead to compounding errors. While DAgger-style approaches attempt to mitigate this by correcting error states, we identify a critical limitation: Instruction-State Misalignment. Forcing an agent to learn recovery actions from off-track states often creates supervision signals that semantically conflict with the original instruction. In response to these challenges, we introduce BudVLN, an online framework that learns from on-policy rollouts by constructing supervision to match the current state distribution. BudVLN performs retrospective rectification via counterfactual re-anchoring and decision-conditioned supervision synthesis, using a geodesic oracle to synthesize corrective trajectories that originate from valid historical states, ensuring semantic consistency. Experiments on the standard R2R-CE and RxR-CE benchmarks demonstrate that BudVLN consistently mitigates distribution shift and achieves state-of-the-art performance in both Success Rate and SPL.
Multi-Agentic AI for Fairness-Aware and Accelerated Multi-modal Large Model Inference in Real-world Mobile Edge Networks
Generative AI (GenAI) has transformed applications in natural language processing and content creation, yet centralized inference remains hindered by high latency, limited customizability, and privacy concerns. Deploying large models (LMs) in mobile edge networks emerges as a promising solution. However, it also poses new challenges, including heterogeneous multi-modal LMs with diverse resource demands and inference speeds, varied prompt/output modalities that complicate orchestration, and resource-limited infrastructure ill-suited for concurrent LM execution. In response, we propose a Multi-Agentic AI framework for latency- and fairness-aware multi-modal LM inference in mobile edge networks. Our solution includes a long-term planning agent, a short-term prompt scheduling agent, and multiple on-node LM deployment agents, all powered by foundation language models. These agents cooperatively optimize prompt routing and LM deployment through natural language reasoning over runtime telemetry and historical experience. To evaluate its performance, we further develop a city-wide testbed that supports network monitoring, containerized LM deployment, intra-server resource management, and inter-server communications. Experiments demonstrate that our solution reduces average latency by over 80% and improves fairness (Normalized Jain index) to 0.90 compared to other baselines. Moreover, our solution adapts quickly without fine-tuning, offering a generalizable solution for optimizing GenAI services in edge environments.
Learning Nonlinear Systems In-Context: From Synthetic Data to Real-World Motor Control ICASSP 2026
LLMs have shown strong in-context learning (ICL) abilities, but have not yet been extended to signal processing systems. Inspired by their design, we have proposed for the first time ICL using transformer models applicable to motor feedforward control, a critical task where classical PI and physics-based methods struggle with nonlinearities and complex load conditions. We propose a transformer based model architecture that separates signal representation from system behavior, enabling both few-shot finetuning and one-shot ICL. Pretrained on a large corpus of synthetic linear and nonlinear systems, the model learns to generalize to unseen system dynamics of real-world motors only with a handful of examples. In experiments, our approach generalizes across multiple motor load configurations, transforms untuned examples into accurate feedforward predictions, and outperforms PI controllers and physics-based feedforward baselines. These results demonstrate that ICL can bridge synthetic pretraining and real-world adaptability, opening new directions for data efficient control of physical systems.
comment: Accepted to be presented in IEEE ICASSP 2026
Airspace-aware Contingency Landing Planning
This paper develops a real-time, search-based aircraft contingency landing planner that minimizes traffic disruptions while accounting for ground risk. The airspace model captures dense air traffic departure and arrival flows, helicopter corridors, and prohibited zones and is demonstrated with a Washington, D.C., area case study. Historical Automatic Dependent Surveillance-Broadcast (ADS-B) data are processed to estimate air traffic density. A low-latency computational geometry algorithm generates proximity-based heatmaps around high-risk corridors and restricted regions. Airspace risk is quantified as the cumulative exposure time of a landing trajectory within congested regions, while ground risk is assessed from overflown population density to jointly guide trajectory selection. A landing site selection module further mitigates disruption to nominal air traffic operations. Benchmarking against minimum-risk Dubins solutions demonstrates that the proposed planner achieves lower joint risk and reduced airspace disruption while maintaining real-time performance. Under airspace-risk-only conditions, the planner generates trajectories within an average of 2.9 seconds on a laptop computer. Future work will incorporate dynamic air traffic updates to enable spatiotemporal contingency landing planning that minimizes the need for real-time traffic rerouting.
Set-Based Control Barrier Functions for Scalable Safety Filter Design
Industrial control applications require high performance under strict constraints. Control barrier functions (CBFs) provide principled safety mechanisms, but constructing CBF-based safety filters for large-scale systems is challenging. We introduce set-based CBFs for linear systems with convex constraints by defining the barrier via the Minkowski functional of a control invariant set. This invariant set can be obtained from scalable computations, including reachability analysis and model predictive control (MPC). The approach yields tunable safety filters with dampened intervention and asymptotic stability of the set of safe states. We derive reformulations embedding set-based CBF constraints into convex optimization for common set representations and present learning-based approximations reducing runtime while preserving safety. We demonstrate the approach through simulations on a high-dimensional system and a motion control task, and validate the method experimentally on an electric drive with short sampling times.
Multi-mode Fault Diagnosis Datasets of Three-phase Asynchronous Motor Under Variable Working Conditions
Three-phase asynchronous motor are fundamental components in industrial systems, and their failure can lead to significant operational downtime and economic losses. Vibration and current signals are effective indicators for monitoring motor health and diagnosing faults. However, motors in real applications often operate under variable conditions such as fluctuating speeds and loads, which complicate the fault diagnosis process. This paper presents a comprehensive dataset collected from a three-phase asynchronous motor under various fault types and severities, operating under diverse speed and load conditions. The dataset includes both single faults and mechanical-electrical compound faults, such as rotor unbalance, stator winding short circuits, bearing faults, and their combinations. Data were acquired under both steady and transitional conditions, with signals including triaxial vibration, three-phase currents, torque, and key-phase signals. This dataset supports the development and validation of robust fault diagnosis methods for electric motors under realistic operating conditions.
comment: 13 pages, 9 figures
Enhanced Ground-Satellite Direct Access via Onboard Rydberg Atomic Quantum Receivers
Ground-satellite links for 6G networks face critical challenges, including severe path loss, tight size-weight-power limits, and congested spectrum, all of which significantly hinder the performance of traditional radio frequency (RF) front ends. This article introduces the Rydberg Atomic Quantum Receiver (RAQR) for onboard satellite systems, a millimeter-scale front end that converts radio fields to optical signals through atomic electromagnetically induced transparency. RAQR's high sensitivity and high frequency selectivity address link budget, payload, and interference challenges while fitting within space constraints. A hybrid atomic-electronic design and supporting signal model demonstrate enhanced data rate, coverage, and sensing accuracy relative to conventional RF receivers. The article concludes with integration strategies, distributed-satellite concepts, and open research problems for bringing RAQR-enabled satellite payloads into service.
comment: Accepted by IEEE Wireless Communications
REACT: Real-time Entanglement-Aware Coverage Path Planning for Tethered Underwater Vehicles
Inspection of underwater structures with tethered underwater vehicles is often hindered by the risk of tether entanglement. We propose REACT (real-time entanglement-aware coverage path planning for tethered underwater vehicles), a framework designed to overcome this limitation. REACT comprises a computationally efficient geometry-based tether model using the signed distance field (SDF) map for accurate, real-time simulation of taut tether configurations around arbitrary structures in 3D. This model enables an efficient online replanning strategy by enforcing a maximum tether length constraint, thereby actively preventing entanglement. By integrating REACT into a coverage path planning framework, we achieve safe and entanglement-free inspection paths, previously challenging due to tether constraints. The complete REACT framework's efficacy is validated in a pipe inspection scenario, demonstrating safe navigation and full coverage inspection. Simulation results show that REACT achieves complete coverage while maintaining tether constraints and completing the total mission 20% faster than conventional planners, despite a longer inspection time due to proactive avoidance of entanglement that eliminates extensive post-mission disentanglement. Real-world experiments confirm these benefits, where REACT completes the full mission, while the baseline planner fails due to physical tether entanglement.
comment: Accepted for publication at International Conference on Robotics & Automation 2026
Detection and Identification of Sensor Attacks Using Partially Attack-Free Data
In this paper, we investigate data-driven attack detection and identification in a model-free setting. We consider a practically motivated scenario in which the available dataset may be compromised by malicious sensor attacks, but contains an unknown, contiguous, partially attack-free interval. The control input is assumed to include a small stochastic watermarking signal. Under these assumptions, we establish sufficient conditions for attack detection and identification from partially attack-free data. We also develop data-driven detection and identification procedures and characterize their computational complexity. Notably, the proposed framework does not impose a limit on the number of compromised sensors; thus, it can detect and identify attacks even when all sensor outputs are compromised outside the attack-free interval, provided that the attack-free interval is sufficiently long. Finally, we demonstrate the effectiveness of the proposed framework via numerical simulations.
From a Frequency-Domain Willems' Lemma to Data-Driven Predictive Control
Willems' fundamental lemma has recently received an impressive amount of attention from the data-driven control community. In this paper, we formulate a version of this celebrated result based on frequency-domain data. In doing so, we bridge the gap between recent developments in data-driven control, and the readily-available techniques and expertise for non-parametric frequency-domain identification. We also generalize our results to combine multiple frequency-domain data sets to form a sufficiently rich data set. Building on these results, we propose a data-driven predictive control scheme based on measured frequency-domain data of the plant. This novel scheme provides a frequency-domain counterpart of the well-known data-enabled predictive control scheme DeePC based on time-domain data. Under appropriate conditions, the new frequency-domain data-driven predictive control (FreePC) scheme is equivalent to the corresponding DeePC scheme. We demonstrate the benefits of FreePC and the use of frequency-domain data in several examples and a numerical case study, including the ability to collect data in closed loop, computational benefits, and intuitive visualization of the data.
Experimental Sensitivity Enhancement of a Quantum Rydberg Atom-Based RF Receiver with a Metamaterial GRIN Lens
We experimentally demonstrate enhanced sensitivity of an atom-based Rydberg radio frequency (RF) receiver integrated with a gradient refractive index (GRIN) Luneburg-type metamaterial lens. By analyzing the electromagnetically induced transparency (EIT) effect in Cesium vapor, we compare receiver performance with and without the GRIN lens under a 2.2~GHz and a 3.6~GHz far-field excitation. Our measurements reveal a significant amplification of the EIT window when the lens is introduced, consistent with the theoretical prediction that the local E-field enhancement at the vapor cell reduces the minimum detectable electric field and improves the microwave electric field measurement sensitivity of the Rydberg atom-based RF receiver over an ultrawide bandwidth of the lens. This experimental validation demonstrates the potential of metamaterial-enhanced quantum RF sensing for a wide range of applications, such as electromagnetic compatibility (EMC) testing, quantum radar, and wireless communication.
Reservoir Predictive Path Integral Control for Unknown Nonlinear Dynamics
Neural networks have found extensive application in data-driven control of nonlinear dynamical systems, yet fast online identification and control of unknown dynamics remain central challenges. To meet these challenges, this paper integrates echo-state networks (ESNs)--reservoir computing models implemented with recurrent neural networks--and model predictive path integral (MPPI) control--sampling-based variants of model predictive control. The proposed reservoir predictive path integral (RPPI) enables fast learning of nonlinear dynamics with ESNs and exploits the learned nonlinearities directly in MPPI control computation without linearization approximations. This framework is further extended to uncertainty-aware RPPI (URPPI), which achieves robust stochastic control by treating ESN output weights as random variables and minimizing an expected cost over their distribution to account for identification errors. Experiments on controlling a Duffing oscillator and a four-tank system demonstrate that URPPI improves control performance, reducing control costs by up to 60% compared to traditional quadratic programming-based model predictive control methods.
comment: Submitted to IEEE for possible publication, 13 pages, 5 figures
Exposing Barriers to Flexibility Aggregation in Unbalanced Distribution Networks
The increasing integration of distributed energy resources (DER) offers new opportunities for distribution system operators (DSO) to improve network operation through flexibility services. To utilise flexible resources, various DER flexibility aggregation methods have been proposed, such as the concept of aggregated P-Q flexibility areas. Yet, many existing studies assume perfect coordination among DER and rely on single-phase power flow analysis, thus overlooking barriers to flexibility aggregation in real unbalanced systems. To quantify the impact of these barriers, this paper proposes a three-phase optimal power flow (OPF) framework for P-Q flexibility assessment, implemented as an open-source Julia tool 3FlexAnalyser.jl. The framework explicitly accounts for voltage unbalance and imperfect coordination among DER in low voltage (LV) distribution networks. Simulations on an illustrative 5-bus system and a real 221-bus LV network in the UK reveal that over 30% of the theoretical aggregated flexibility potential can be lost due to phase unbalance and lack of coordination across phases. These findings highlight the need for improved flexibility aggregation tools applicable to real unbalanced distribution networks.
ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization Modeling
Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) provide a promising avenue for automating this process by translating natural-language (NL) operational requirements into executable optimization models via semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization, offering limited, rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce \textbf{ProOPF-D} and \textbf{ProOPF-B}, a dataset and benchmark for professional-grade OPF modeling: ProOPF-D contains 12K instances pairing NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations; ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.
A Mathematical Formalization of Self-Determining Agency
Defining agency is an extremely important challenge for cognitive science and artificial intelligence. Physics generally describes mechanical happenings, but there remains an unbridgeable gap between these and the acts of agents. To discuss the morality and responsibility of agents, it is necessary to model acts; whether such responsible acts can be fully explained by physical determinism remains an ongoing debate. Although we have already proposed a physical agent determinism model that appears to go beyond mere mechanical happenings, we have not yet established a strict mathematical formalism to eliminate ambiguity. Here, we explain why a physical system can follow coarse-graining agent-level determination without violating physical laws by formulating supervenient causation. Generally, supervenience including coarse graining does not change without a change in its lower base; therefore, a single supervenience alone cannot define supervenient causation. We define supervenient causation as the causal efficacy from the supervenience level to its lower base level. Although an algebraic expression composed of the multiple supervenient functions does supervenes on the base, an index sequence that determines the algebraic expression does not supervene on the base. Therefore, the sequence can possess unique dynamical laws that are independent of the lower base level. This independent dynamics creates the possibility for temporally preceding changes at the supervenience level to cause changes at the lower base level. Such a dual-laws system is considered useful for modeling self-determining agents such as humans.
Pascal-Weighted Genetic Algorithms: A Binomially-Structured Recombination Framework
This paper introduces a new family of multi-parent recombination operators for Genetic Algorithms (GAs), based on normalized Pascal (binomial) coefficients. Unlike classical two-parent crossover operators, Pascal-Weighted Recombination (PWR) forms offsprings as structured convex combination of multiple parents, using binomially shaped weights that emphasize central inheritance while suppressing disruptive variance. We develop a mathematical framework for PWR, derive variance-transfer properties, and analyze its effect on schema survival. The operator is extended to real-valued, binary/logit, and permutation representations. We evaluate the proposed method on four representative benchmarks: (i) PID controller tuning evaluated using the ITAE metric, (ii) FIR low-pass filter design under magnitude-response constraints, (iii) wireless power-modulation optimization under SINR coupling, and (iv) the Traveling Salesman Problem (TSP). We demonstrate how, across these benchmarks, PWR consistently yields smoother convergence, reduced variance, and achieves 9-22% performance gains over standard recombination operators. The approach is simple, algorithm-agnostic, and readily integrable into diverse GA architectures.
comment: 23 pages, 8 figures
DUST: A Framework for Data-Driven Density Steering
We consider the problem of data-driven stochastic optimal control of an unknown LTI dynamical system. Assuming the process noise is normally distributed, we pose the problem of steering the state's mean and covariance to a target normal distribution, under noisy data collected from the underlying system, a problem commonly referred to as covariance steering (CS). A novel framework for Data-driven Uncertainty quantification and density STeering (DUST) is presented that simultaneously characterizes the noise affecting the measured data and designs an optimal affine-feedback controller to steer the density of the state to a prescribed terminal value. We use both indirect and direct data-driven design approaches based on the notions of persistency of excitation and subspace identification to exactly represent the mean and covariance dynamics of the state in terms of the data and noise realizations. Since both the mean and the covariance steering sub-problems are plagued with stochastic uncertainty arising from noisy data collection, we first estimate the noise realization from this dataset and subsequently compute tractable upper bounds on the estimation errors. The first and second moment steering problems are then solved to optimality using techniques from robust control and robust optimization. Lastly, we present an alternative control design approach based on the certainty equivalence principle and interpret the problem as one of CS under multiplicative uncertainty. We analyze the performance and efficacy of each of these data-driven approaches using a case study and compare them with their model-based counterparts.
comment: Full-length version of paper published in Automatica
Stochastic LQR Design With Disturbance Preview
This paper considers the discrete-time, stochastic LQR problem with $p$ steps of disturbance preview information where $p$ is finite. We first derive the solution for this problem on a finite horizon with linear, time-varying dynamics and time-varying costs. Next, we derive the solution on the infinite horizon with linear, time-invariant dynamics and time-invariant costs. Our proofs rely on the well-known principle of optimality. We provide an independent proof for the principle of optimality that relies only on nested information structure. Finally, we show that the finite preview controller converges to the optimal noncausal controller as the preview horizon $p$ tends to infinity. We also provide a simple example to illustrate both the finite and infinite horizon results.
Regret of $H_\infty$ Preview Controllers
This paper studies preview control in both the $H_\infty$ and regret-optimal settings. The plant is modeled as a discrete-time, linear time-invariant system subject to external disturbances. The performance baseline is the optimal non-causal controller that has full knowledge of the disturbance sequence. We first review the construction of the $H_\infty$ preview controller with $p$-steps of disturbance preview. We then show that the closed-loop $H_\infty$ performance of this preview controller converges as $p\to \infty$ to the performance of the optimal non-causal controller. Furthermore, we prove that the optimal regret of the preview controller converges to zero. These results demonstrate that increasing preview length allows controllers to asymptotically achieve non-causal performance in both the $H_\infty$ and regret frameworks. A numerical example illustrates the theoretical results.
Competitive Equilibrium for Electricity Markets with Spatially Flexible Loads
Electric vehicle charging and geo-distributed datacenters introduce spatially flexible loads (FLs) that couple power, transportation, and datacenter networks. These couplings create a closed-loop feedback between locational marginal prices (LMPs) and decisions of the FL systems, challenging the foundations of conventional competitive equilibrium (CE) in electricity markets. This paper studies a notion of generalized competitive equilibrium (GCE) that aims to capture such price-demand interactions across the interconnected infrastructures. We establish structural conditions under which the GCE preserves key properties of the conventional CE, including existence, uniqueness, and efficiency, without requiring detailed knowledge of decision processes for individual FL systems. The framework generalizes to settings where the grid is coupled with multiple FL systems. Stylized examples and case studies on the New York ISO grid, coupled with the Sioux Falls transportation and distributed datacenter networks, demonstrate the use of our theoretical framework and illustrate the mutual influence among the grid and the studied FL systems.
Decision-dependent Robust Charging Infrastructure Planning for Light-duty Truck Electrification at Industrial Sites
Many industrial sites and digital logistics platforms rely on diesel-powered light-duty trucks to transport workers and small-scale facilities, which results in a significant amount of greenhouse gas emissions (GHGs). To address this, we develop a robust model for planning charging infrastructure to electrify light-duty trucks at industrial sites. The model is formulated as a mixed-integer linear program (MILP) that optimizes the charging infrastructure selection (across multiple charger types and locations) and determines charging schedules for each truck based on the selected infrastructure. Given the strict stop times and schedules at industrial sites, we introduce a scheduling-with-abandonment problem in which trucks forgo charging if their waiting time exceeds a maximum threshold. We further incorporate the impacts of overnight charging and range anxiety on drivers' waiting and abandonment behaviors. To model stochastic, heterogeneous parking durations, we classified trucks using machine learning (ML) methods based on contextual and time-location features. We then constructed decision-dependent, feature-driven robust uncertainty sets in which parking-time variability varies flexibly with drivers' charging choices. These feature-driven sets are applied to two robust optimization formulations with decision-dependent uncertainty (RO-DDU), resulting in distinct outcomes and managerial implications. We conduct a case study at an open-pit mining site to plan charger installations across eight charging zones, serving approximately 200 trucks. By decomposing the problem into a short rolling horizon or using a heuristic approach for the full-year or representative-day dataset, the model achieves an optimality gap of less than 0.1\% under diverse uncertainty scenarios.
Lyapunov Constrained Soft Actor-Critic (LC-SAC) using Koopman Operator Theory for Quadrotor Trajectory Tracking
Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. There has been significant work in incorporating Lyapunov-based stability guarantees in RL algorithms with key challenges being selecting a candidate Lyapunov function, computational complexity by using excessive function approximators and conservative policies by incorporating stability criterion in the learning process. In this work we propose a novel Lyapunov-constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We propose use of extended dynamic mode decomposition (EDMD) to produce a linear approximation of the system and use this approximation to derive a closed form solution for candidate Lyapunov function. This derived Lyapunov function is incorporated in the SAC algorithm to further provide guarantees for a policy that stabilizes the nonlinear system. The results are evaluated trajectory tracking of a 2D Quadrotor environment based on safe-control-gym. The proposed algorithm shows training convergence and decaying violations for Lyapunov stability criterion compared to baseline vanilla SAC algorithm. GitHub Repository: https://github.com/DhruvKushwaha/LC-SAC-Quadrotor-Trajectory-Tracking
comment: 11 pages, 7 Figures, submitted to IEEE RA-L
Robotics
CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction ICRA 2026
To complete assignments provided by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, and manipulate target objects. Real-world deployments often require multiple heterogeneous robots with different manipulation capabilities to handle different assignments cooperatively. Beyond the need for specialized manipulation skills, effective information gathering is important in completing these assignments. To address this component of the problem, we formalize the information-gathering process in a fully cooperative setting as an underexplored multi-agent multi-task Embodied Question Answering (MM-EQA) problem, which is a novel extension of canonical Embodied Question Answering (EQA), where effective communication is crucial for coordinating efforts without redundancy. To address this problem, we propose CommCP, a novel LLM-based decentralized communication framework designed for MM-EQA. Our framework employs conformal prediction to calibrate the generated messages, thereby minimizing receiver distractions and enhancing communication reliability. To evaluate our framework, we introduce an MM-EQA benchmark featuring diverse, photo-realistic household scenarios with embodied questions. Experimental results demonstrate that CommCP significantly enhances the task success rate and exploration efficiency over baselines. The experiment videos, code, and dataset are available on our project website: https://comm-cp.github.io.
comment: IEEE International Conference on Robotics and Automation (ICRA 2026); Project Website: https://comm-cp.github.io/
InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
Humans rarely plan whole-body interactions with objects at the level of explicit whole-body movements. High-level intentions, such as affordance, define the goal, while coordinated balance, contact, and manipulation can emerge naturally from underlying physical and motor priors. Scaling such priors is key to enabling humanoids to compose and generalize loco-manipulation skills across diverse contexts while maintaining physically coherent whole-body coordination. To this end, we introduce InterPrior, a scalable framework that learns a unified generative controller through large-scale imitation pretraining and post-training by reinforcement learning. InterPrior first distills a full-reference imitation expert into a versatile, goal-conditioned variational policy that reconstructs motion from multimodal observations and high-level intent. While the distilled policy reconstructs training behaviors, it does not generalize reliably due to the vast configuration space of large-scale human-object interactions. To address this, we apply data augmentation with physical perturbations, and then perform reinforcement learning finetuning to improve competence on unseen goals and initializations. Together, these steps consolidate the reconstructed latent skills into a valid manifold, yielding a motion prior that generalizes beyond the training data, e.g., it can incorporate new behaviors such as interactions with unseen objects. We further demonstrate its effectiveness for user-interactive control and its potential for real robot deployment.
comment: Webpage: https://sirui-xu.github.io/InterPrior/
Learning Event-Based Shooter Models from Virtual Reality Experiments
Virtual reality (VR) has emerged as a powerful tool for evaluating school security measures in high-risk scenarios such as school shootings, offering experimental control and high behavioral fidelity. However, assessing new interventions in VR requires recruiting new participant cohorts for each condition, making large-scale or iterative evaluation difficult. These limitations are especially restrictive when attempting to learn effective intervention strategies, which typically require many training episodes. To address this challenge, we develop a data-driven discrete-event simulator (DES) that models shooter movement and in-region actions as stochastic processes learned from participant behavior in VR studies. We use the simulator to examine the impact of a robot-based shooter intervention strategy. Once shown to reproduce key empirical patterns, the DES enables scalable evaluation and learning of intervention strategies that are infeasible to train directly with human subjects. Overall, this work demonstrates a high-to-mid fidelity simulation workflow that provides a scalable surrogate for developing and evaluating autonomous school-security interventions.
comment: Preprint under review for conference publication. 9 pages, 4 figures, 4 tables
Visuo-Tactile World Models
We introduce multi-task Visuo-Tactile World Models (VT-WM), which capture the physics of contact through touch reasoning. By complementing vision with tactile sensing, VT-WM better understands robot-object interactions in contact-rich tasks, avoiding common failure modes of vision-only models under occlusion or ambiguous contact states, such as objects disappearing, teleporting, or moving in ways that violate basic physics. Trained across a set of contact-rich manipulation tasks, VT-WM improves physical fidelity in imagination, achieving 33% better performance at maintaining object permanence and 29% better compliance with the laws of motion in autoregressive rollouts. Moreover, experiments show that grounding in contact dynamics also translates to planning. In zero-shot real-robot experiments, VT-WM achieves up to 35% higher success rates, with the largest gains in multi-step, contact-rich tasks. Finally, VT-WM demonstrates significant downstream versatility, effectively adapting its learned contact dynamics to a novel task and achieving reliable planning success with only a limited set of demonstrations.
comment: Preprint
Location-Aware Dispersion on Anonymous Graphs
The well-studied DISPERSION problem is a fundamental coordination problem in distributed robotics, where a set of mobile robots must relocate so that each occupies a distinct node of a network. DISPERSION assumes that a robot can settle at any node as long as no other robot settles on that node. In this work, we introduce LOCATION-AWARE DISPERSION, a novel generalization of DISPERSION that incorporates location awareness: Let $G = (V, E)$ be an anonymous, connected, undirected graph with $n = |V|$ nodes, each labeled with a color $\sf{col}(v) \in C = \{c_1, \dots, c_t\}, t\leq n$. A set $R = \{r_1, \dots, r_k\}$ of $k \leq n$ mobile robots is given, where each robot $r_i$ has an associated color $\mathsf{col}(r_i) \in C$. Initially placed arbitrarily on the graph, the goal is to relocate the robots so that each occupies a distinct node of the same color. When $|C|=1$, LOCATION-AWARE DISPERSION reduces to DISPERSION. There is a solution to DISPERSION in graphs with any $k\leq n$ without knowing $k,n$. Like DISPERSION, the goal is to solve LOCATION-AWARE DISPERSION minimizing both time and memory requirement at each agent. We develop several deterministic algorithms with guaranteed bounds on both time and memory requirement. We also give an impossibility and a lower bound for any deterministic algorithm for LOCATION-AWARE DISPERSION. To the best of our knowledge, the presented results collectively establish the algorithmic feasibility of LOCATION-AWARE DISPERSION in anonymous networks and also highlight the challenges on getting an efficient solution compared to the solutions for DISPERSION.
comment: 3 tables, 2 figures, 6 pseudo-codes
From Bench to Flight: Translating Drone Impact Tests into Operational Safety Limits
Indoor micro-aerial vehicles (MAVs) are increasingly used for tasks that require close proximity to people, yet practitioners lack practical methods to tune motion limits based on measured impact risk. We present an end-to-end, open toolchain that converts benchtop impact tests into deployable safety governors for drones. First, we describe a compact and replicable impact rig and protocol for capturing force-time profiles across drone classes and contact surfaces. Second, we provide data-driven models that map pre-impact speed to impulse and contact duration, enabling direct computation of speed bounds for a target force limit. Third, we release scripts and a ROS2 node that enforce these bounds online and log compliance, with support for facility-specific policies. We validate the workflow on multiple commercial off-the-shelf quadrotors and representative indoor assets, demonstrating that the derived governors preserve task throughput while meeting force constraints specified by safety stakeholders. Our contribution is a practical bridge from measured impacts to runtime limits, with shareable datasets, code, and a repeatable process that teams can adopt to certify indoor MAV operations near humans.
Residual Reinforcement Learning for Waste-Container Lifting Using Large-Scale Cranes with Underactuated Tools
This paper studies the container lifting phase of a waste-container recycling task in urban environments, performed by a hydraulic loader crane equipped with an underactuated discharge unit, and proposes a residual reinforcement learning (RRL) approach that combines a nominal Cartesian controller with a learned residual policy. All experiments are conducted in simulation, where the task is characterized by tight geometric tolerances between the discharge-unit hooks and the container rings relative to the overall crane scale, making precise trajectory tracking and swing suppression essential. The nominal controller uses admittance control for trajectory tracking and pendulum-aware swing damping, followed by damped least-squares inverse kinematics with a nullspace posture term to generate joint velocity commands. A PPO-trained residual policy in Isaac Lab compensates for unmodeled dynamics and parameter variations, improving precision and robustness without requiring end-to-end learning from scratch. We further employ randomized episode initialization and domain randomization over payload properties, actuator gains, and passive joint parameters to enhance generalization. Simulation results demonstrate improved tracking accuracy, reduced oscillations, and higher lifting success rates compared to the nominal controller alone.
comment: 12 pages
Constrained Group Relative Policy Optimization
While Group Relative Policy Optimization (GRPO) has emerged as a scalable framework for critic-free policy learning, extending it to settings with explicit behavioral constraints remains underexplored. We introduce Constrained GRPO, a Lagrangian-based extension of GRPO for constrained policy optimization. Constraints are specified via indicator cost functions, enabling direct optimization of violation rates through a Lagrangian relaxation. We show that a naive multi-component treatment in advantage estimation can break constrained learning: mismatched component-wise standard deviations distort the relative importance of the different objective terms, which in turn corrupts the Lagrangian signal and prevents meaningful constraint enforcement. We formally derive this effect to motivate our scalarized advantage construction that preserves the intended trade-off between reward and constraint terms. Experiments in a toy gridworld confirm the predicted optimization pathology and demonstrate that scalarizing advantages restores stable constraint control. In addition, we evaluate Constrained GRPO on robotics tasks, where it improves constraint satisfaction while increasing task success, establishing a simple and effective recipe for constrained policy optimization in embodied AI domains that increasingly rely on large multimodal foundation models.
comment: 16 pages, 6 figures
A Hybrid Autoencoder for Robust Heightmap Generation from Fused Lidar and Depth Data for Humanoid Robot Locomotion
Reliable terrain perception is a critical prerequisite for the deployment of humanoid robots in unstructured, human-centric environments. While traditional systems often rely on manually engineered, single-sensor pipelines, this paper presents a learning-based framework that uses an intermediate, robot-centric heightmap representation. A hybrid Encoder-Decoder Structure (EDS) is introduced, utilizing a Convolutional Neural Network (CNN) for spatial feature extraction fused with a Gated Recurrent Unit (GRU) core for temporal consistency. The architecture integrates multimodal data from an Intel RealSense depth camera, a LIVOX MID-360 LiDAR processed via efficient spherical projection, and an onboard IMU. Quantitative results demonstrate that multimodal fusion improves reconstruction accuracy by 7.2% over depth-only and 9.9% over LiDAR-only configurations. Furthermore, the integration of a 3.2 s temporal context reduces mapping drift.
Sparse Video Generation Propels Real-World Beyond-the-View Vision-Language Navigation
Why must vision-language navigation be bound to detailed and verbose language instructions? While such details ease decision-making, they fundamentally contradict the goal for navigation in the real-world. Ideally, agents should possess the autonomy to navigate in unknown environments guided solely by simple and high-level intents. Realizing this ambition introduces a formidable challenge: Beyond-the-View Navigation (BVN), where agents must locate distant, unseen targets without dense and step-by-step guidance. Existing large language model (LLM)-based methods, though adept at following dense instructions, often suffer from short-sighted behaviors due to their reliance on short-horimzon supervision. Simply extending the supervision horizon, however, destabilizes LLM training. In this work, we identify that video generation models inherently benefit from long-horizon supervision to align with language instructions, rendering them uniquely suitable for BVN tasks. Capitalizing on this insight, we propose introducing the video generation model into this field for the first time. Yet, the prohibitive latency for generating videos spanning tens of seconds makes real-world deployment impractical. To bridge this gap, we propose SparseVideoNav, achieving sub-second trajectory inference guided by a generated sparse future spanning a 20-second horizon. This yields a remarkable 27x speed-up compared to the unoptimized counterpart. Extensive real-world zero-shot experiments demonstrate that SparseVideoNav achieves 2.5x the success rate of state-of-the-art LLM baselines on BVN tasks and marks the first realization of such capability in challenging night scenes.
Scalable and General Whole-Body Control for Cross-Humanoid Locomotion
Learning-based whole-body controllers have become a key driver for humanoid robots, yet most existing approaches require robot-specific training. In this paper, we study the problem of cross-embodiment humanoid control and show that a single policy can robustly generalize across a wide range of humanoid robot designs with one-time training. We introduce XHugWBC, a novel cross-embodiment training framework that enables generalist humanoid control through: (1) physics-consistent morphological randomization, (2) semantically aligned observation and action spaces across diverse humanoid robots, and (3) effective policy architectures modeling morphological and dynamical properties. XHugWBC is not tied to any specific robot. Instead, it internalizes a broad distribution of morphological and dynamical characteristics during training. By learning motion priors from diverse randomized embodiments, the policy acquires a strong structural bias that supports zero-shot transfer to previously unseen robots. Experiments on twelve simulated humanoids and seven real-world robots demonstrate the strong generalization and robustness of the resulting universal controller.
Task-Oriented Robot-Human Handovers on Legged Manipulators
Task-oriented handovers (TOH) are fundamental to effective human-robot collaboration, requiring robots to present objects in a way that supports the human's intended post-handover use. Existing approaches are typically based on object- or task-specific affordances, but their ability to generalize to novel scenarios is limited. To address this gap, we present AFT-Handover, a framework that integrates large language model (LLM)-driven affordance reasoning with efficient texture-based affordance transfer to achieve zero-shot, generalizable TOH. Given a novel object-task pair, the method retrieves a proxy exemplar from a database, establishes part-level correspondences via LLM reasoning, and texturizes affordances for feature-based point cloud transfer. We evaluate AFT-Handover across diverse task-object pairs, showing improved handover success rates and stronger generalization compared to baselines. In a comparative user study, our framework is significantly preferred over the current state-of-the-art, effectively reducing human regrasping before tool use. Finally, we demonstrate TOH on legged manipulators, highlighting the potential of our framework for real-world robot-human handovers.
comment: Accepted to 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI) 2026
From Vision to Decision: Neuromorphic Control for Autonomous Navigation and Tracking
Robotic navigation has historically struggled to reconcile reactive, sensor-based control with the decisive capabilities of model-based planners. This duality becomes critical when the absence of a predominant option among goals leads to indecision, challenging reactive systems to break symmetries without computationally-intense planners. We propose a parsimonious neuromorphic control framework that bridges this gap for vision-guided navigation and tracking. Image pixels from an onboard camera are encoded as inputs to dynamic neuronal populations that directly transform visual target excitation into egocentric motion commands. A dynamic bifurcation mechanism resolves indecision by delaying commitment until a critical point induced by the environmental geometry. Inspired by recently proposed mechanistic models of animal cognition and opinion dynamics, the neuromorphic controller provides real-time autonomy with a minimal computational burden, a small number of interpretable parameters, and can be seamlessly integrated with application-specific image processing pipelines. We validate our approach in simulation environments as well as on an experimental quadrotor platform.
HiCrowd: Hierarchical Crowd Flow Alignment for Dense Human Environments ICRA
Navigating through dense human crowds remains a significant challenge for mobile robots. A key issue is the freezing robot problem, where the robot struggles to find safe motions and becomes stuck within the crowd. To address this, we propose HiCrowd, a hierarchical framework that integrates reinforcement learning (RL) with model predictive control (MPC). HiCrowd leverages surrounding pedestrian motion as guidance, enabling the robot to align with compatible crowd flows. A high-level RL policy generates a follow point to align the robot with a suitable pedestrian group, while a low-level MPC safely tracks this guidance with short horizon planning. The method combines long-term crowd aware decision making with safe short-term execution. We evaluate HiCrowd against reactive and learning-based baselines in offline setting (replaying recorded human trajectories) and online setting (human trajectories are updated to react to the robot in simulation). Experiments on a real-world dataset and a synthetic crowd dataset show that our method outperforms in navigation efficiency and safety, while reducing freezing behaviors. Our results suggest that leveraging human motion as guidance, rather than treating humans solely as dynamic obstacles, provides a powerful principle for safe and efficient robot navigation in crowds.
comment: Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA)
TOLEBI: Learning Fault-Tolerant Bipedal Locomotion via Online Status Estimation and Fallibility Rewards ICRA
With the growing employment of learning algorithms in robotic applications, research on reinforcement learning for bipedal locomotion has become a central topic for humanoid robotics. While recently published contributions achieve high success rates in locomotion tasks, scarce attention has been devoted to the development of methods that enable to handle hardware faults that may occur during the locomotion process. However, in real-world settings, environmental disturbances or sudden occurrences of hardware faults might yield severe consequences. To address these issues, this paper presents TOLEBI (A faulT-tOlerant Learning framEwork for Bipedal locomotIon) that handles faults on the robot during operation. Specifically, joint locking, power loss and external disturbances are injected in simulation to learn fault-tolerant locomotion strategies. In addition to transferring the learned policy to the real robot via sim-to-real transfer, an online joint status module incorporated. This module enables to classify joint conditions by referring to the actual observations at runtime under real-world conditions. The validation experiments conducted both in real-world and simulation with the humanoid robot TOCABI highlight the applicability of the proposed approach. To our knowledge, this manuscript provides the first learning-based fault-tolerant framework for bipedal locomotion, thereby fostering the development of efficient learning methods in this field.
comment: Accepted for Publication at IEEE International Conference on Robotics and Automation (ICRA) 2026
PIRATR: Parametric Object Inference for Robotic Applications with Transformers in 3D Point Clouds ICRA
We present PIRATR, an end-to-end 3D object detection framework for robotic use cases in point clouds. Extending PI3DETR, our method streamlines parametric 3D object detection by jointly estimating multi-class 6-DoF poses and class-specific parametric attributes directly from occlusion-affected point cloud data. This formulation enables not only geometric localization but also the estimation of task-relevant properties for parametric objects, such as a gripper's opening, where the 3D model is adjusted according to simple, predefined rules. The architecture employs modular, class-specific heads, making it straightforward to extend to novel object types without re-designing the pipeline. We validate PIRATR on an automated forklift platform, focusing on three structurally and functionally diverse categories: crane grippers, loading platforms, and pallets. Trained entirely in a synthetic environment, PIRATR generalizes effectively to real outdoor LiDAR scans, achieving a detection mAP of 0.919 without additional fine-tuning. PIRATR establishes a new paradigm of pose-aware, parameterized perception. This bridges the gap between low-level geometric reasoning and actionable world models, paving the way for scalable, simulation-trained perception systems that can be deployed in dynamic robotic environments. Code available at https://github.com/swingaxe/piratr.
comment: 8 Pages, 11 Figures, Accepted at 2026 IEEE International Conference on Robotics & Automation (ICRA) Vienna
IndustryShapes: An RGB-D Benchmark dataset for 6D object pose estimation of industrial assembly components and tools ICRA 2026
We introduce IndustryShapes, a new RGB-D benchmark dataset of industrial tools and components, designed for both instance-level and novel object 6D pose estimation approaches. The dataset provides a realistic and application-relevant testbed for benchmarking these methods in the context of industrial robotics bridging the gap between lab-based research and deployment in real-world manufacturing scenarios. Unlike many previous datasets that focus on household or consumer products or use synthetic, clean tabletop datasets, or objects captured solely in controlled lab environments, IndustryShapes introduces five new object types with challenging properties, also captured in realistic industrial assembly settings. The dataset has diverse complexity, from simple to more challenging scenes, with single and multiple objects, including scenes with multiple instances of the same object and it is organized in two parts: the classic set and the extended set. The classic set includes a total of 4,6k images and 6k annotated poses. The extended set introduces additional data modalities to support the evaluation of model-free and sequence-based approaches. To the best of our knowledge, IndustryShapes is the first dataset to offer RGB-D static onboarding sequences. We further evaluate the dataset on a representative set of state-of-the art methods for instance-based and novel object 6D pose estimation, including also object detection, segmentation, showing that there is room for improvement in this domain. The dataset page can be found in https://pose-lab.github.io/IndustryShapes.
comment: To appear in ICRA 2026
VLN-Pilot: Large Vision-Language Model as an Autonomous Indoor Drone Operator
This paper introduces VLN-Pilot, a novel framework in which a large Vision-and-Language Model (VLLM) assumes the role of a human pilot for indoor drone navigation. By leveraging the multimodal reasoning abilities of VLLMs, VLN-Pilot interprets free-form natural language instructions and grounds them in visual observations to plan and execute drone trajectories in GPS-denied indoor environments. Unlike traditional rule-based or geometric path-planning approaches, our framework integrates language-driven semantic understanding with visual perception, enabling context-aware, high-level flight behaviors with minimal task-specific engineering. VLN-Pilot supports fully autonomous instruction-following for drones by reasoning about spatial relationships, obstacle avoidance, and dynamic reactivity to unforeseen events. We validate our framework on a custom photorealistic indoor simulation benchmark and demonstrate the ability of the VLLM-driven agent to achieve high success rates on complex instruction-following tasks, including long-horizon navigation with multiple semantic targets. Experimental results highlight the promise of replacing remote drone pilots with a language-guided autonomous agent, opening avenues for scalable, human-friendly control of indoor UAVs in tasks such as inspection, search-and-rescue, and facility monitoring. Our results suggest that VLLM-based pilots may dramatically reduce operator workload while improving safety and mission flexibility in constrained indoor environments.
Virtual-Tube-Based Cooperative Transport Control for Multi-UAV Systems in Constrained Environments
This paper proposes a novel control framework for cooperative transportation of cable-suspended loads by multiple unmanned aerial vehicles (UAVs) operating in constrained environments. Leveraging virtual tube theory and principles from dissipative systems theory, the framework facilitates efficient multi-UAV collaboration for navigating obstacle-rich areas. The proposed framework offers several key advantages. (1) It achieves tension distribution and coordinated transportation within the UAV-cable-load system with low computational overhead, dynamically adapting UAV configurations based on obstacle layouts to facilitate efficient navigation. (2) By integrating dissipative systems theory, the framework ensures high stability and robustness, essential for complex multi-UAV operations. The effectiveness of the proposed approach is validated through extensive simulations, demonstrating its scalability for large-scale multi-UAV systems. Furthermore, the method is experimentally validated in outdoor scenarios, showcasing its practical feasibility and robustness under real-world conditions.
comment: 10 pages, 8 figures
DECO: Decoupled Multimodal Diffusion Transformer for Bimanual Dexterous Manipulation with a Plugin Tactile Adapter
Overview of the Proposed DECO Framework.} DECO is a DiT-based policy that decouples multimodal conditioning. Image and action tokens interact via joint self attention, while proprioceptive states and optional conditions are injected through adaptive layer normalization. Tactile signals are injected via cross attention, while a lightweight LoRA-based adapter is used to efficiently fine-tune the pretrained policy. DECO is also accompanied by DECO-50, a bimanual dexterous manipulation dataset with tactile sensing, consisting of 4 scenarios and 28 sub-tasks, covering more than 50 hours of data, approximately 5 million frames, and 8,000 successful trajectories.
comment: 17 pages, 8 figures
TaSA: Two-Phased Deep Predictive Learning of Tactile Sensory Attenuation for Improving In-Grasp Manipulation ICRA2026
Humans can achieve diverse in-hand manipulations, such as object pinching and tool use, which often involve simultaneous contact between the object and multiple fingers. This is still an open issue for robotic hands because such dexterous manipulation requires distinguishing between tactile sensations generated by their self-contact and those arising from external contact. Otherwise, object/robot breakage happens due to contacts/collisions. Indeed, most approaches ignore self-contact altogether, by constraining motion to avoid/ignore self-tactile information during contact. While this reduces complexity, it also limits generalization to real-world scenarios where self-contact is inevitable. Humans overcome this challenge through self-touch perception, using predictive mechanisms that anticipate the tactile consequences of their own motion, through a principle called sensory attenuation, where the nervous system differentiates predictable self-touch signals, allowing novel object stimuli to stand out as relevant. Deriving from this, we introduce TaSA, a two-phased deep predictive learning framework. In the first phase, TaSA explicitly learns self-touch dynamics, modeling how a robot's own actions generate tactile feedback. In the second phase, this learned model is incorporated into the motion learning phase, to emphasize object contact signals during manipulation. We evaluate TaSA on a set of insertion tasks, which demand fine tactile discrimination: inserting a pencil lead into a mechanical pencil, inserting coins into a slot, and fixing a paper clip onto a sheet of paper, with various orientations, positions, and sizes. Across all tasks, policies trained with TaSA achieve significantly higher success rates than baseline methods, demonstrating that structured tactile perception with self-touch based on sensory attenuation is critical for dexterous robotic manipulation.
comment: 8 pages, 8 figures, 8 tables, ICRA2026 accepted
MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation
Visual Language Navigation (VLN) is one of the fundamental capabilities for embodied intelligence and a critical challenge that urgently needs to be addressed. However, existing methods are still unsatisfactory in terms of both success rate (SR) and generalization: Supervised Fine-Tuning (SFT) approaches typically achieve higher SR, while Training-Free (TF) approaches often generalize better, but it is difficult to obtain both simultaneously. To this end, we propose a Memory-Execute-Review framework. It consists of three parts: a hierarchical memory module for providing information support, an execute module for routine decision-making and actions, and a review module for handling abnormal situations and correcting behavior. We validated the effectiveness of this framework on the Object Goal Navigation task. Across 4 datasets, our average SR achieved absolute improvements of 7% and 5% compared to all baseline methods under TF and Zero-Shot (ZS) settings, respectively. On the most commonly used HM3D_v0.1 and the more challenging open vocabulary dataset HM3D_OVON, the SR improved by 8% and 6%, under ZS settings. Furthermore, on the MP3D and HM3D_OVON datasets, our method not only outperformed all TF methods but also surpassed all SFT methods, achieving comprehensive leadership in both SR (5% and 2%) and generalization.
comment: 9 pages, 2 figures, 5 tables, conference
Ontology-Driven Robotic Specification Synthesis
This paper addresses robotic system engineering for safety- and mission-critical applications by bridging the gap between high-level objectives and formal, executable specifications. The proposed method, Robotic System Task to Model Transformation Methodology (RSTM2) is an ontology-driven, hierarchical approach using stochastic timed Petri nets with resources, enabling Monte Carlo simulations at mission, system, and subsystem levels. A hypothetical case study demonstrates how the RSTM2 method supports architectural trades, resource allocation, and performance analysis under uncertainty. Ontological concepts further enable explainable AI-based assistants, facilitating fully autonomous specification synthesis. The methodology offers particular benefits to complex multi-robot systems, such as the NASA CADRE mission, representing decentralized, resource-aware, and adaptive autonomous systems of the future.
comment: 8 pages, 9 figures, 3 tables, journal
Benchmarking Affordance Generalization with BusyBox
Vision-Language-Action (VLA) models have been attracting the attention of researchers and practitioners thanks to their promise of generalization. Although single-task policies still offer competitive performance, VLAs are increasingly able to handle commands and environments unseen in their training set. While generalization in vision and language space is undoubtedly important for robust versatile behaviors, a key meta-skill VLAs need to possess is affordance generalization -- the ability to manipulate new objects with familiar physical features. In this work, we present BusyBox, a physical benchmark for systematic semi-automatic evaluation of VLAs' affordance generalization. BusyBox consists of 6 modules with switches, sliders, wires, buttons, a display, and a dial. The modules can be swapped and rotated to create a multitude of BusyBox variations with different visual appearances but the same set of affordances. We empirically demonstrate that generalization across BusyBox variants is highly challenging even for strong open-weights VLAs such as $π_{0.5}$ and GR00T-N1.6. To encourage the research community to evaluate their own VLAs on BusyBox and to propose new affordance generalization experiments, we have designed BusyBox to be easy to build in most robotics labs. We release the full set of CAD files for 3D-printing its parts as well as a bill of materials for (optionally) assembling its electronics. We also publish a dataset of language-annotated demonstrations that we collected using the common bimanual Mobile Aloha robot on the canonical BusyBox configuration. All of the released materials are available at https://microsoft.github.io/BusyBox.
RoboPaint: From Human Demonstration to Any Robot and Any View
Acquiring large-scale, high-fidelity robot demonstration data remains a critical bottleneck for scaling Vision-Language-Action (VLA) models in dexterous manipulation. We propose a Real-Sim-Real data collection and data editing pipeline that transforms human demonstrations into robot-executable, environment-specific training data without direct robot teleoperation. Standardized data collection rooms are built to capture multimodal human demonstrations (synchronized 3 RGB-D videos, 11 RGB videos, 29-DoF glove joint angles, and 14-channel tactile signals). Based on these human demonstrations, we introduce a tactile-aware retargeting method that maps human hand states to robot dex-hand states via geometry and force-guided optimization. Then the retargeted robot trajectories are rendered in a photorealistic Isaac Sim environment to build robot training data. Real world experiments have demonstrated: (1) The retargeted dex-hand trajectories achieve an 84\% success rate across 10 diverse object manipulation tasks. (2) VLA policies (Pi0.5) trained exclusively on our generated data achieve 80\% average success rate on three representative tasks, i.e., pick-and-place, pushing and pouring. To conclude, robot training data can be efficiently "painted" from human demonstrations using our real-sim-real data pipeline. We offer a scalable, cost-effective alternative to teleoperation with minimal performance loss for complex dexterous manipulation.
comment: 17 pages
A Data Driven Structural Decomposition of Dynamic Games via Best Response Maps
Dynamic games are powerful tools to model multi-agent decision-making, yet computing Nash (generalized Nash) equilibria remains a central challenge in such settings. Complexity arises from tightly coupled optimality conditions, nested optimization structures, and poor numerical conditioning. Existing game-theoretic solvers address these challenges by directly solving the joint game, typically requiring explicit modeling of all agents' objective functions and constraints, while learning-based approaches often decouple interaction through prediction or policy approximation, sacrificing equilibrium consistency. This paper introduces a conceptually novel formulation for dynamic games by restructuring the equilibrium computation. Rather than solving a fully coupled game or decoupling agents through prediction or policy approximation, a data-driven structural reduction of the game is proposed that removes nested optimization layers and derivative coupling by embedding an offline-compiled best-response map as a feasibility constraint. Under standard regularity conditions, when the best-response operator is exact, any converged solution of the reduced problem corresponds to a local open-loop Nash (GNE) equilibrium of the original game; with a learned surrogate, the solution is approximately equilibrium-consistent up to the best-response approximation error. The proposed formulation is supported by mathematical proofs, accompanying a large-scale Monte Carlo study in a two-player open-loop dynamic game motivated by the autonomous racing problem. Comparisons are made against state-of-the-art joint game solvers, and results are reported on solution quality, computational cost, and constraint satisfaction.
comment: 11 pages, 6 figures, 5 tables, Submitted to RSS 2026
Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates
Neural Lyapunov and barrier certificates have recently been used as powerful tools for verifying the safety and stability properties of deep reinforcement learning (RL) controllers. However, existing methods offer guarantees only under fixed ideal unperturbed dynamics, limiting their reliability in real-world applications where dynamics may deviate due to uncertainties. In this work, we study the problem of synthesizing \emph{robust neural Lyapunov barrier certificates} that maintain their guarantees under perturbations in system dynamics. We formally define a robust Lyapunov barrier function and specify sufficient conditions based on Lipschitz continuity that ensure robustness against bounded perturbations. We propose practical training objectives that enforce these conditions via adversarial training, Lipschitz neighborhood bound, and global Lipschitz regularization. We validate our approach in two practically relevant environments, Inverted Pendulum and 2D Docking. The former is a widely studied benchmark, while the latter is a safety-critical task in autonomous systems. We show that our methods significantly improve both certified robustness bounds (up to $4.6$ times) and empirical success rates under strong perturbations (up to $2.4$ times) compared to the baseline. Our results demonstrate effectiveness of training robust neural certificates for safe RL under perturbations in dynamics.
Learning Soccer Skills for Humanoid Robots: A Progressive Perception-Action Framework
Soccer presents a significant challenge for humanoid robots, demanding tightly integrated perception-action capabilities for tasks like perception-guided kicking and whole-body balance control. Existing approaches suffer from inter-module instability in modular pipelines or conflicting training objectives in end-to-end frameworks. We propose Perception-Action integrated Decision-making (PAiD), a progressive architecture that decomposes soccer skill acquisition into three stages: motion-skill acquisition via human motion tracking, lightweight perception-action integration for positional generalization, and physics-aware sim-to-real transfer. This staged decomposition establishes stable foundational skills, avoids reward conflicts during perception integration, and minimizes sim-to-real gaps. Experiments on the Unitree G1 demonstrate high-fidelity human-like kicking with robust performance under diverse conditions-including static or rolling balls, various positions, and disturbances-while maintaining consistent execution across indoor and outdoor scenarios. Our divide-and-conquer strategy advances robust humanoid soccer capabilities and offers a scalable framework for complex embodied skill acquisition. The project page is available at https://soccer-humanoid.github.io/.
comment: 13 pages, 9 figures, conference
Affordance-Aware Interactive Decision-Making and Execution for Ambiguous Instructions
Enabling robots to explore and act in unfamiliar environments under ambiguous human instructions by interactively identifying task-relevant objects (e.g., identifying cups or beverages for "I'm thirsty") remains challenging for existing vision-language model (VLM)-based methods. This challenge stems from inefficient reasoning and the lack of environmental interaction, which hinder real-time task planning and execution. To address this, We propose Affordance-Aware Interactive Decision-Making and Execution for Ambiguous Instructions (AIDE), a dual-stream framework that integrates interactive exploration with vision-language reasoning, where Multi-Stage Inference (MSI) serves as the decision-making stream and Accelerated Decision-Making (ADM) as the execution stream, enabling zero-shot affordance analysis and interpretation of ambiguous instructions. Extensive experiments in simulation and real-world environments show that AIDE achieves the task planning success rate of over 80\% and more than 95\% accuracy in closed-loop continuous execution at 10 Hz, outperforming existing VLM-based methods in diverse open-world scenarios.
comment: 14 pages, 10 figures, 8 tables
Low-Cost Underwater In-Pipe Centering and Inspection Using a Minimal-Sensing Robot
Autonomous underwater inspection of submerged pipelines is challenging due to confined geometries, turbidity, and the scarcity of reliable localization cues. This paper presents a minimal-sensing strategy that enables a free-swimming underwater robot to center itself and traverse a flooded pipe of known radius using only an IMU, a pressure sensor, and two sonars: a downward-facing single-beam sonar and a rotating 360 degree sonar. We introduce a computationally efficient method for extracting range estimates from single-beam sonar intensity data, enabling reliable wall detection in noisy and reverberant conditions. A closed-form geometric model leverages the two sonar ranges to estimate the pipe center, and an adaptive, confidence-weighted proportional-derivative (PD) controller maintains alignment during traversal. The system requires no Doppler velocity log, external tracking, or complex multi-sensor arrays. Experiments in a submerged 46 cm-diameter pipe using a Blue Robotics BlueROV2 heavy remotely operated vehicle demonstrate stable centering and successful full-pipe traversal despite ambient flow and structural deformations. These results show that reliable in-pipe navigation and inspection can be achieved with a lightweight, computationally efficient sensing and processing architecture, advancing the practicality of autonomous underwater inspection in confined environments.
RFM-Pose:Reinforcement-Guided Flow Matching for Fast Category-Level 6D Pose Estimation
Object pose estimation is a fundamental problem in computer vision and plays a critical role in virtual reality and embodied intelligence, where agents must understand and interact with objects in 3D space. Recently, score based generative models have to some extent solved the rotational symmetry ambiguity problem in category level pose estimation, but their efficiency remains limited by the high sampling cost of score-based diffusion. In this work, we propose a new framework, RFM-Pose, that accelerates category-level 6D object pose generation while actively evaluating sampled hypotheses. To improve sampling efficiency, we adopt a flow-matching generative model and generate pose candidates along an optimal transport path from a simple prior to the pose distribution. To further refine these candidates, we cast the flow-matching sampling process as a Markov decision process and apply proximal policy optimization to fine-tune the sampling policy. In particular, we interpret the flow field as a learnable policy and map an estimator to a value network, enabling joint optimization of pose generation and hypothesis scoring within a reinforcement learning framework. Experiments on the REAL275 benchmark demonstrate that RFM-Pose achieves favorable performance while significantly reducing computational cost. Moreover, similar to prior work, our approach can be readily adapted to object pose tracking and attains competitive results in this setting.
comment: This work has been submitted to the IEEE for possible publication
MobileManiBench: Simplifying Model Verification for Mobile Manipulation
Vision-language-action models have advanced robotic manipulation but remain constrained by reliance on the large, teleoperation-collected datasets dominated by the static, tabletop scenes. We propose a simulation-first framework to verify VLA architectures before real-world deployment and introduce MobileManiBench, a large-scale benchmark for mobile-based robotic manipulation. Built on NVIDIA Isaac Sim and powered by reinforcement learning, our pipeline autonomously generates diverse manipulation trajectories with rich annotations (language instructions, multi-view RGB-depth-segmentation images, synchronized object/robot states and actions). MobileManiBench features 2 mobile platforms (parallel-gripper and dexterous-hand robots), 2 synchronized cameras (head and right wrist), 630 objects in 20 categories, 5 skills (open, close, pull, push, pick) with over 100 tasks performed in 100 realistic scenes, yielding 300K trajectories. This design enables controlled, scalable studies of robot embodiments, sensing modalities, and policy architectures, accelerating research on data efficiency and generalization. We benchmark representative VLA models and report insights into perception, reasoning, and control in complex simulated environments.
Informative Path Planning with Guaranteed Estimation Uncertainty
Environmental monitoring robots often need to reconstruct spatial fields (e.g., salinity, temperature, bathymetry) under tight distance and energy constraints. Classical boustrophedon lawnmower surveys provide geometric coverage guarantees but can waste effort by oversampling predictable regions. In contrast, informative path planning (IPP) methods leverage spatial correlations to reduce oversampling, yet typically offer no guarantees on reconstruction quality. This paper bridges these approaches by addressing informative path planning with guaranteed estimation uncertainty: computing the shortest path whose measurements ensure that the Gaussian-process (GP) posterior variance -- an intrinsic uncertainty measure that lower-bounds the mean-squared prediction error under the GP model -- falls below a user-specified threshold over the monitoring region. We propose a three-stage approach: (i) learn a GP model from available prior information; (ii) transform the learned GP kernel into binary coverage maps for each candidate sensing location, indicating which locations' uncertainty can be reduced below a specified target; and (iii) plan a near-shortest route whose combined coverage satisfies the global uncertainty constraint. To address heterogeneous phenomena, we incorporate a nonstationary kernel that captures spatially varying correlation structure, and we accommodate non-convex environments with obstacles. Algorithmically, we present methods with provable approximation guarantees for sensing-location selection and for the joint selection-and-routing problem under a travel budget. Experiments on real-world topographic data show that our planners meet the uncertainty target using fewer sensing locations and shorter travel distances than a recent baseline, and field experiments with bathymetry-mapping autonomous surface and underwater vehicles demonstrate real-world feasibility.
comment: 16 pages, 11 figures, preprint
PLATO Hand: Shaping Contact Behavior with Fingernails for Precise Manipulation
We present the PLATO Hand, a dexterous robotic hand with a hybrid fingertip that embeds a rigid fingernail within a compliant pulp. This design shapes contact behavior to enable diverse interaction modes across a range of object geometries. We develop a strain-energy-based bending-indentation model to guide the fingertip design and to explain how guided contact preserves local indentation while suppressing global bending. Experimental results show that the proposed robotic hand design demonstrates improved pinching stability, enhanced force observability, and successful execution of edge-sensitive manipulation tasks, including paper singulation, card picking, and orange peeling. Together, these results show that coupling structured contact geometry with a force-motion transparent mechanism provides a principled, physically embodied approach to precise manipulation.
MORPH Wheel: A Passive Variable-Radius Wheel Embedding Mechanical Behavior Logic for Input-Responsive Transformation
This paper introduces the Mechacnially prOgrammed Radius-adjustable PHysical (MORPH) wheel, a fully passive variable-radius wheel that embeds mechanical behavior logic for torque-responsive transformation. Unlike conventional variable transmission systems relying on actuators, sensors, and active control, the MORPH wheel achieves passive adaptation solely through its geometry and compliant structure. The design integrates a torque-response coupler and spring-loaded connecting struts to mechanically adjust the wheel radius between 80 mm and 45 mm in response to input torque, without any electrical components. The MORPH wheel provides three unique capabilities rarely achieved simultaneously in previous passive designs: (1) bidirectional operation with unlimited rotation through a symmetric coupler; (2) high torque capacity exceeding 10 N with rigid power transmission in drive mode; and (3) precise and repeatable transmission ratio control governed by deterministic kinematics. A comprehensive analytical model was developed to describe the wheel's mechanical behavior logic, establishing threshold conditions for mode switching between direct drive and radius transformation. Experimental validation confirmed that the measured torque-radius and force-displacement characteristics closely follow theoretical predictions across wheel weights of 1.8-2.8kg. Robot-level demonstrations on varying loads (0-25kg), slopes, and unstructured terrains further verified that the MORPH wheel passively adjusts its radius to provide optimal transmission ratio. The MORPH wheel exemplifies a mechanically programmed structure, embedding intelligent, context-dependent behavior directly into its physical design. This approach offers a new paradigm for passive variable transmission and mechanical intelligence in robotic mobility systems operating in unpredictable or control-limited environments.
comment: 14 pages, 16 figures. Under review at IEEE Transactions on Robotics
A Dialogue-Based Human-Robot Interaction Protocol for Wheelchair and Robotic Arm Integrated Control
People with lower and upper body disabilities can benefit from wheelchairs and robotic arms to improve mobility and independence. Prior assistive interfaces, such as touchscreens and voice-driven predefined commands, often remain unintuitive and struggle to capture complex user intent. We propose a natural, dialogue based human robot interaction protocol that simulates an intelligent agent capable of communicating with users to understand intent and execute assistive actions. In a pilot study, five participants completed five assistive tasks (cleaning, drinking, feeding, drawer opening, and door opening) through dialogue-based interaction with a wheelchair and robotic arm. As a baseline, participants were required to open a door using the manual control (a wheelchair joystick and a game controller for the arm) and complete a questionnaire to gather their feedback. By analyzing the post-study questionnaires, we found that most participants enjoyed the dialogue-based interaction and assistive robot autonomy.
Coupled Local and Global World Models for Efficient First Order RL
World models offer a promising avenue for more faithfully capturing complex dynamics, including contacts and non-rigidity, as well as complex sensory information, such as visual perception, in situations where standard simulators struggle. However, these models are computationally complex to evaluate, posing a challenge for popular RL approaches that have been successfully used with simulators to solve complex locomotion tasks but yet struggle with manipulation. This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method: a full-scale world model generates accurate forward trajectories, while a lightweight latent-space surrogate approximates its local dynamics for efficient gradient computation. This coupling of a local and global world model ensures high-fidelity unrolling alongside computationally tractable differentiation. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency. We further evaluate our approach through an ego-centric object manipulation task with a quadruped. Together, these results demonstrate that learning inside data-driven world models is a promising pathway for solving hard-to-model RL tasks in image space without reliance on hand-crafted physics simulators.
Addressing the Waypoint-Action Gap in End-to-End Autonomous Driving via Vehicle Motion Models
End-to-End Autonomous Driving (E2E-AD) systems are typically grouped by the nature of their outputs: (i) waypoint-based models that predict a future trajectory, and (ii) action-based models that directly output throttle, steer and brake. Most recent benchmark protocols and training pipelines are waypoint-based, which makes action-based policies harder to train and compare, slowing their progress. To bridge this waypoint-action gap, we propose a novel, differentiable vehicle-model framework that rolls out predicted action sequences to their corresponding ego-frame waypoint trajectories while supervising in waypoint space. Our approach enables action-based architectures to be trained and evaluated, for the first time, within waypoint-based benchmarks without modifying the underlying evaluation protocol. We extensively evaluate our framework across multiple challenging benchmarks and observe consistent improvements over the baselines. In particular, on NAVSIM \texttt{navhard} our approach achieves state-of-the-art performance. Our code will be made publicly available upon acceptance.
comment: 8 pages, 3 figures
Bioinspired Kirigami Capsule Robot for Minimally Invasive Gastrointestinal Biopsy ICRA
Wireless capsule endoscopy (WCE) has transformed gastrointestinal (GI) diagnostics by enabling noninvasive visualization of the digestive tract, yet its diagnostic yield remains constrained by the absence of biopsy capability, as histological analysis is still the gold standard for confirming disease. Conventional biopsy using forceps, needles, or rotating blades is invasive, limited in reach, and carries risks of perforation or mucosal trauma, while fluid- or microbiota-sampling capsules cannot provide structured tissue for pathology, leaving a critical gap in swallowable biopsy solutions. Here we present the Kiri-Capsule, a kirigami-inspired capsule robot that integrates deployable PI-film flaps actuated by a compact dual-cam mechanism to achieve minimally invasive and repeatable tissue collection. The kirigami surface remains flat during locomotion but transforms into sharp protrusions upon cam-driven stretching, enabling controlled penetration followed by rotary scraping, with specimens retained in internal fan-shaped cavities. Bench tests confirmed that PI films exhibit a Young's modulus of approximately 20 MPa and stable deployment angles (about 34$^\circ$ at 15% strain), while ex vivo porcine studies demonstrated shallow penetration depths (median $\sim$0.61 mm, range 0.46--0.66 mm) and biopsy yields comparable to standard forceps (mean $\sim$10.9 mg for stomach and $\sim$18.9 mg for intestine), with forces within safe ranges reported for GI biopsy. These findings demonstrate that the Kiri-Capsule bridges passive imaging and functional biopsy, providing a swallowable, depth-controlled, and histology-ready solution that advances capsule-based diagnostics toward safe and effective clinical application.
comment: 8 pages, 11 figures, accepted to IEEE ICRA
AnyThermal: Towards Learning Universal Representations for Thermal Perception ICRA
We present AnyThermal, a thermal backbone that captures robust task-agnostic thermal features suitable for a variety of tasks such as cross-modal place recognition, thermal segmentation, and monocular depth estimation using thermal images. Existing thermal backbones that follow task-specific training from small-scale data result in utility limited to a specific environment and task. Unlike prior methods, AnyThermal can be used for a wide range of environments (indoor, aerial, off-road, urban) and tasks, all without task-specific training. Our key insight is to distill the feature representations from visual foundation models such as DINOv2 into a thermal encoder using thermal data from these multiple environments. To bridge the diversity gap of the existing RGB-Thermal datasets, we introduce the TartanRGBT platform, the first open-source data collection platform with synced RGB-Thermal image acquisition. We use this payload to collect the TartanRGBT dataset - a diverse and balanced dataset collected in 4 environments. We demonstrate the efficacy of AnyThermal and TartanRGBT, achieving state-of-the-art results with improvements of up to 36% across diverse environments and downstream tasks on existing datasets.
comment: Accepted at IEEE ICRA (International Conference on Robotics & Automation) 2026
Active Localization of Unstable Systems with Coarse Information SC
We study localization and control for unstable systems under coarse, single-bit sensing. Motivated by understanding the fundamental limitations imposed by such minimal feedback, we identify sufficient conditions under which the initial state can be recovered despite instability and extremely sparse measurements. Building on these conditions, we develop an active localization algorithm that integrates a set-based estimator with a control strategy derived from Voronoi partitions, which provably estimates the initial state while ensuring the agent remains in informative regions. Under the derived conditions, the proposed approach guarantees exponential contraction of the initial-state uncertainty, and the result is further supported by numerical experiments. These findings can offer theoretical insight into localization in robotics, where sensing is often limited to coarse abstractions such as keyframes, segmentations, or line-based features.
comment: 10 pages, 4 figures, Accepted by International Conference on Hybrid Systems: Computation and Control (HSCC) 2026
Transformer-Based Reinforcement Learning for Autonomous Orbital Collision Avoidance in Partially Observable Environments
We introduce a Transformer-based Reinforcement Learning framework for autonomous orbital collision avoidance that explicitly models the effects of partial observability and imperfect monitoring in space operations. The framework combines a configurable encounter simulator, a distance-dependent observation model, and a sequential state estimator to represent uncertainty in relative motion. A central contribution of this work is the use of transformer-based Partially Observable Markov Decision Process (POMDP) architecture, which leverage long-range temporal attention to interpret noisy and intermittent observations more effectively than traditional architectures. This integration provides a foundation for training collision avoidance agents that can operate more reliably under imperfect monitoring environments.
Dynamic Modeling, Parameter Identification and Numerical Analysis of Flexible Cables in Flexibly Connected Dual-AUV Systems
This research presents a dynamic modeling framework and parameter identification methods for describing the highly nonlinear behaviors of flexibly connected dual-AUV systems. The modeling framework is established based on the lumped mass method, integrating axial elasticity, bending stiffness, added mass and hydrodynamic forces, thereby accurately capturing the time-varying response of the forces and cable configurations. To address the difficulty of directly measuring material-related and hydrodynamic coefficients, this research proposes a parameter identification method that combines the physical model with experimental data. High-precision inversion of the equivalent Youngs modulus and hydrodynamic coefficients is performed through tension experiments under multiple configurations, effectively demonstrating that the identified model maintains predictive consistency in various operational conditions. Further numerical analysis indicates that the dynamic properties of flexible cable exhibit significant nonlinear characteristics, which are highly dependent on material property variations and AUV motion conditions. This nonlinear dynamic behavior results in two typical response states, slack and taut, which are jointly determined by boundary conditions and hydrodynamic effects, significantly affecting the cable configuration and endpoint loads. In this research, the dynamics of flexible cables under complex boundary conditions is revealed, providing a theoretical foundation for the design, optimization and further control research of similar systems.
Physical Human-Robot Interaction: A Critical Review of Safety Constraints
This paper aims to provide a clear and rigorous understanding of commonly recognized safety constraints in physical human-robot interaction, particularly regarding ISO/TS 15066. We investigate the derivation of these constraints, critically examine the underlying assumptions, and evaluate their practical implications for system-level safety and performance in industrially relevant scenarios. Key design parameters within safety-critical control architectures are identified, and numerical examples are provided to quantify performance degradation arising from typical approximations and design decisions in manufacturing environments. Within this analysis, the fundamental role of energy in safety assessment is emphasized, providing focused insights into energy-based safety methodologies for collaborative industrial robot systems.
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Mobile robots are increasingly deployed in unstructured environments where obstacles and objects are movable. Navigation in such environments is known as interactive navigation, where task completion requires not only avoiding obstacles but also strategic interactions with movable objects. Non-prehensile interactive navigation focuses on non-grasping interaction strategies, such as pushing, rather than relying on prehensile manipulation. Despite a growing body of research in this field, most solutions are evaluated using case-specific setups, limiting reproducibility and cross-comparison. In this paper, we present Bench-NPIN, the first comprehensive benchmark for non-prehensile interactive navigation. Bench-NPIN includes multiple components: 1) a comprehensive range of simulated environments for non-prehensile interactive navigation tasks, including navigating a maze with movable obstacles, autonomous ship navigation in icy waters, box delivery, and area clearing, each with varying levels of complexity; 2) a set of evaluation metrics that capture unique aspects of interactive navigation, such as efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-NPIN to evaluate example implementations of established baselines across environments. Bench-NPIN is an open-source Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.
comment: This paper has been withdrawn by the authors. This paper has been superseded by arXiv:2512.11736
Enriching physical-virtual interaction in AR gaming by tracking identical objects via an egocentric partial observation frame
Augmented reality (AR) games, particularly those designed for head-mounted displays, have grown increasingly prevalent. However, most existing systems depend on pre-scanned, static environments and rely heavily on continuous tracking or marker-based solutions, which limit adaptability in dynamic physical spaces. This is particularly problematic for AR headsets and glasses, which typically follow the user's head movement and cannot maintain a fixed, stationary view of the scene. Moreover, continuous scene observation is neither power-efficient nor practical for wearable devices, given their limited battery and processing capabilities. A persistent challenge arises when multiple identical objects are present in the environment-standard object tracking pipelines often fail to maintain consistent identities without uninterrupted observation or external sensors. These limitations hinder fluid physical-virtual interactions, especially in dynamic or occluded scenes where continuous tracking is infeasible. To address this, we introduce a novel optimization-based framework for re-identifying identical objects in AR scenes using only one partial egocentric observation frame captured by a headset. We formulate the problem as a label assignment task solved via integer programming, augmented with a Voronoi diagram-based pruning strategy to improve computational efficiency. This method reduces computation time by 50% while preserving 91% accuracy in simulated experiments. Moreover, we evaluated our approach in quantitative synthetic and quantitative real-world experiments. We also conducted three qualitative real-world experiments to demonstrate the practical utility and generalizability for enabling dynamic, markerless object interaction in AR environments. Our video demo is available at https://youtu.be/RwptEfLtW1U.
Dull, Dirty, Dangerous: Understanding the Past, Present, and Future of a Key Motivation for Robotics
In robotics, the concept of "dull, dirty, and dangerous" (DDD) work has been used to motivate where robots might be useful. In this paper, we conduct an empirical analysis of robotics publications between 1980 and 2024 that mention DDD, and find that only 2.7% of publications define DDD and 8.7% of publications provide concrete examples of tasks or jobs that are DDD. We then review the social science literature on "dull," "dirty," and "dangerous" work to provide definitions and guidance on how to conceptualize DDD for robotics. Finally, we propose a framework that helps the robotics community consider the job context for our technology, encouraging a more informed perspective on how robotics may impact human labor.
Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization ICRA 2025
Ground texture localization using a downward-facing camera offers a low-cost, high-precision localization solution that is robust to dynamic environments and requires no environmental modification. We present a significantly improved bag-of-words (BoW) image retrieval system for ground texture localization, achieving substantially higher accuracy for global localization and higher precision and recall for loop closure detection in SLAM. Our approach leverages an approximate $k$-means (AKM) vocabulary with soft assignment, and exploits the consistent orientation and constant scale constraints inherent to ground texture localization. Identifying the different needs of global localization vs. loop closure detection for SLAM, we present both high-accuracy and high-speed versions of our algorithm. We test the effect of each of our proposed improvements through an ablation study and demonstrate our method's effectiveness for both global localization and loop closure detection. With numerous ground texture localization systems already using BoW, our method can readily replace other generic BoW systems in their pipeline and immediately improve their results.
comment: Accepted to ICRA 2025
Constraint-Aware Discrete-Time PID Gain Optimization for Robotic Joint Control Under Actuator Saturation
The precise regulation of rotary actuation is fundamental in autonomous robotics, yet practical PID loops deviate from continuous-time theory due to discrete-time execution, actuator saturation, and small delays and measurement imperfections. We present an implementation-aware analysis and tuning workflow for saturated discrete-time joint control. We (i) derive PI stability regions under Euler and exact zero-order-hold (ZOH) discretizations using the Jury criterion, (ii) evaluate a discrete back-calculation anti-windup realization under saturation-dominant regimes, and (iii) propose a hybrid-certified Bayesian optimization workflow that screens analytically unstable candidates and behaviorally unsafe transients while optimizing a robust IAE objective with soft penalties on overshoot and saturation duty. Baseline sweeps ($τ=1.0$~s, $Δt=0.01$~s, $u\in[-10,10]$) quantify rise/settle trends for P/PI/PID. Under a randomized model family emulating uncertainty, delay, noise, quantization, and tighter saturation, robustness-oriented tuning improves median IAE from $0.843$ to $0.430$ while keeping median overshoot below $2\%$. In simulation-only tuning, the certification screen rejects $11.6\%$ of randomly sampled gains within bounds before full robust evaluation, improving sample efficiency.
comment: Pending IEEE Transactions on Robotics Publication
FilMBot: A High-Speed Soft Parallel Robotic Micromanipulator
Soft robotic manipulators are generally slow despite their great adaptability, resilience, and compliance. This limitation also extends to current soft robotic micromanipulators. Here, we introduce FilMBot, a 3-DOF film-based, electromagnetically actuated, soft kinematic robotic micromanipulator achieving speeds up to 2117 °/s and 2456 °/s in α and \{beta} angular motions, with corresponding linear velocities of 1.61 m/s and 1.92 m/s using a 4-cm needle end-effector, 0.54 m/s along the Z axis, and 1.57 m/s during Z-axis morph switching. The robot can reach ~1.50 m/s in path-following tasks, with an operational bandwidth below ~30 Hz, and remains responsive at 50 Hz. It demonstrates high precision (~6.3 μm, or ~0.05% of its workspace) in path-following tasks, with precision remaining largely stable across frequencies. The novel combination of the low-stiffness soft kinematic film structure and strong electromagnetic actuation in FilMBot opens new avenues for soft robotics. Furthermore, its simple construction and inexpensive, readily accessible components could broaden the application of micromanipulators beyond current academic and professional users.
comment: 13 pages, 16 figures
Do Robots Really Need Anthropomorphic Hands? -- A Comparison of Human and Robotic Hands
Human manipulation skills represent a pinnacle of their voluntary motor functions, requiring the coordination of many degrees of freedom and processing of high-dimensional sensor input to achieve such a high level of dexterity. Thus, we attempt to answer whether the human hand, with its associated biomechanical properties, sensors, and control mechanisms, is an ideal that we should strive for in robotics-do we really need anthropomorphic robotic hands? This survey can help practitioners to make the trade-off between hand complexity and potential manipulation skills. We provide an overview of the human hand, a comparison of commercially available robotic and prosthetic hands, and a systematic review of hand mechanisms and skills that they are capable of. This leads to follow-up questions. What is the minimum requirement for mechanisms and sensors to implement most skills that a robot needs? What is missing to reach human-level dexterity? Can we improve upon human dexterity? Although complex five-fingered hands are often used as the ultimate goal for robotic manipulators, they are not necessary for all tasks. We found that wrist flexibility and finger abduction/adduction are often more important for manipulation capabilities. Increasing the number of fingers, actuators, or degrees of freedom is not always necessary. Three fingers often are a good compromise between simplicity and dexterity. Non-anthropomorphic hand designs with two opposing pairs of fingers or human hands with six fingers can further increase dexterity, suggesting that the human hand is not the optimum. Consequently, we argue for function-based rather than form-based biomimicry.
RANGER: A Monocular Zero-Shot Semantic Navigation Framework through Contextual Adaptation ICRA 2026
Efficiently finding targets in complex environments is fundamental to real-world embodied applications. While recent advances in multimodal foundation models have enabled zero-shot object goal navigation, allowing robots to search for arbitrary objects without fine-tuning, existing methods face two key limitations: (1) heavy reliance on precise depth and pose information provided by simulators, which restricts applicability in real-world scenarios; and (2) lack of in-context learning (ICL) capability, making it difficult to quickly adapt to new environments, as in leveraging short videos. To address these challenges, we propose RANGER, a novel zero-shot, open-vocabulary semantic navigation framework that operates using only a monocular camera. Leveraging powerful 3D foundation models, RANGER eliminates the dependency on depth and pose while exhibiting strong ICL capability. By simply observing a short video of a new environment, the system can also significantly improve task efficiency without requiring architectural modifications or fine-tuning. The framework integrates several key components: keyframe-based 3D reconstruction, semantic point cloud generation, vision-language model (VLM)-driven exploration value estimation, high-level adaptive waypoint selection, and low-level action execution. Experiments on the HM3D benchmark and real-world environments demonstrate that RANGER achieves competitive performance in terms of navigation success rate and exploration efficiency, while showing superior ICL adaptability, with no previous 3D mapping of the environment required.
comment: Accepted at ICRA 2026
TACO: Temporal Consensus Optimization for Continual Neural Mapping
Neural implicit mapping has emerged as a powerful paradigm for robotic navigation and scene understanding. However, real-world robotic deployment requires continual adaptation to changing environments under strict memory and computation constraints, which existing mapping systems fail to support. Most prior methods rely on replaying historical observations to preserve consistency and assume static scenes. As a result, they cannot adapt to continual learning in dynamic robotic settings. To address these challenges, we propose TACO (TemporAl Consensus Optimization), a replay-free framework for continual neural mapping. We reformulate mapping as a temporal consensus optimization problem, where we treat past model snapshots as temporal neighbors. Intuitively, our approach resembles a model consulting its own past knowledge. We update the current map by enforcing weighted consensus with historical representations. Our method allows reliable past geometry to constrain optimization while permitting unreliable or outdated regions to be revised in response to new observations. TACO achieves a balance between memory efficiency and adaptability without storing or replaying previous data. Through extensive simulated and real-world experiments, we show that TACO robustly adapts to scene changes, and consistently outperforms other continual learning baselines.
RFS: Reinforcement Learning with Residual Flow Steering for Dexterous Manipulation
Imitation learning has emerged as an effective approach for bootstrapping sequential decision-making in robotics, achieving strong performance even in high-dimensional dexterous manipulation tasks. Recent behavior cloning methods further leverage expressive generative models, such as diffusion models and flow matching, to represent multimodal action distributions. However, policies pretrained in this manner often exhibit limited generalization and require additional fine-tuning to achieve robust performance at deployment time. Such adaptation must preserve the global exploration benefits of pretraining while enabling rapid correction of local execution errors. We propose Residual Flow Steering(RFS), a data-efficient reinforcement learning framework for adapting pretrained generative policies. RFS steers a pretrained flow-matching policy by jointly optimizing a residual action and a latent noise distribution, enabling complementary forms of exploration: local refinement through residual corrections and global exploration through latent-space modulation. This design allows efficient adaptation while retaining the expressive structure of the pretrained policy. We demonstrate the effectiveness of RFS on dexterous manipulation tasks, showing efficient fine-tuning in both simulation and real-world settings when adapting pretrained base policies. Project website:https://weirdlabuw.github.io/rfs.
A Sliced Learning Framework for Online Disturbance Identification in Quadrotor SO(3) Attitude Control
This paper introduces a dimension-decomposed geometric learning framework called Sliced Learning for disturbance identification in quadrotor geometric attitude control. Instead of conventional learning-from-states, this framework adopts a learning-from-error strategy by using the Lie-algebraic error representation as the input feature, enabling axis-wise space decomposition (``slicing") while preserving the SO(3) structure. This is highly consistent with the geometric mechanism of cognitive control observed in neuroscience, where neural systems organize adaptive representations within structured subspaces to enable cognitive flexibility and efficiency. Based on this framework, we develop a lightweight and structurally interpretable Sliced Adaptive-Neuro Mapping (SANM) module. The high-dimensional mapping for online identification is axially ``sliced" into multiple low-dimensional submappings (``slices"), implemented by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation within their respective shared subspaces. To enhance interpretability, we prove exponential convergence despite time-varying disturbances and inertia uncertainties. To our knowledge, Sliced Learning is among the first frameworks to demonstrate lightweight online neural adaptation at 400 Hz on resource-constrained microcontroller units (MCUs), such as STM32, with real-world experimental validation.
comment: v3: Major revision--Revised title; introduced the Sliced Learning framework; added comparative experiments, extended theoretical results, and supplementary materials (such as algorithms and proofs)
Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills
Long-horizon contact-rich bimanual manipulation presents a significant challenge, requiring complex coordination involving a mixture of parallel execution and sequential collaboration between arms. In this paper, we introduce a hierarchical framework that frames this challenge as an integrated skill planning & scheduling problem, going beyond purely sequential decision-making to support simultaneous skill invocation. Our approach is built upon a library of single-arm and bimanual primitive skills, each trained using Reinforcement Learning (RL) in GPU-accelerated simulation. We then train a Transformer-based planner on a dataset of skill compositions to act as a high-level scheduler, simultaneously predicting the discrete schedule of skills as well as their continuous parameters. We demonstrate that our method achieves higher success rates on complex, contact-rich tasks than end-to-end RL approaches and produces more efficient, coordinated behaviors than traditional sequential-only planners.
MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Current Vision-Language-Action (VLA) paradigms in autonomous driving primarily rely on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online Reinforcement Learning offers a promising pathway to address these issues through trial-and-error learning. However, applying online reinforcement learning to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. The one LLM serves as a Decision Expert for scenario reasoning and driving decision-making, while the other acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions, instead of operating directly in a continuous action space. This approach effectively balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration in online reinforcement learning. Using the lightweight Qwen-0.5B LLM, MindDrive achieves Driving Score (DS) of 78.04 and Success Rate (SR) of 55.09% on the challenging Bench2Drive benchmark. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online reinforcement learning for the VLA model in autonomous driving.
comment: 16 pages, 12 figures, 6 tables; Project Page: https://xiaomi-mlab.github.io/MindDrive/
HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation
Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state--action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at https://tonywang-0517.github.io/hord/.
Observability-Aware Control for Quadrotor Formation Flight with Range-only Measurement
Cooperative Localization is a promising approach to achieving safe quadrotor formation flight through precise positioning via low-cost inter-drone sensors. This paper develops an observability-aware control principle tailored to quadrotor formation flight with range-only inter-drone measurements. The control principle is based on a novel approximation of the local observability Gramian (LOG), which we name the Short-Term Local Observability Gramian (STLOG). The validity of STLOG is established by proving its link to directional estimation precision in nonlinear systems. We propose the Observability Predictive Controller (OPC), a receding-horizon controller that generates optimal inputs to enhance information gain in weakly observable state directions by maximizing the minimum eigenvalue of the STLOG. This reduces the risk of estimator divergence due to the unbounded growth of uncertainty in weakly observed state components. Monte Carlo simulations and flight experiments are conducted with quadrotors in a GNSS-denied ferrying mission, showing that the OPC improves positioning confidence and estimator robustness.
comment: 37 pages, 5 figures
SPIDER: Scalable Physics-Informed Dexterous Retargeting
Learning dexterous and agile policy for humanoid and dexterous hand control requires large-scale demonstrations, but collecting robot-specific data is prohibitively expensive. In contrast, abundant human motion data is readily available from motion capture, videos, and virtual reality, which could help address the data scarcity problem. However, due to the embodiment gap and missing dynamic information like force and torque, these demonstrations cannot be directly executed on robots. To bridge this gap, we propose Scalable Physics-Informed DExterous Retargeting (SPIDER), a physics-based retargeting framework to transform and augment kinematic-only human demonstrations to dynamically feasible robot trajectories at scale. Our key insight is that human demonstrations should provide global task structure and objective, while large-scale physics-based sampling with curriculum-style virtual contact guidance should refine trajectories to ensure dynamical feasibility and correct contact sequences. SPIDER scales across diverse 9 humanoid/dexterous hand embodiments and 6 datasets, improving success rates by 18% compared to standard sampling, while being 10X faster than reinforcement learning (RL) baselines, and enabling the generation of a 2.4M frames dynamic-feasible robot dataset for policy learning. As a universal physics-based retargeting method, SPIDER can work with diverse quality data and generate diverse and high-quality data to enable efficient policy learning with methods like RL.
comment: Project website: https://jc-bao.github.io/spider-project/
A Taxonomy for Evaluating Generalist Robot Manipulation Policies
Machine learning for robot manipulation promises to unlock generalization to novel tasks and environments. But how should we measure the progress of these policies towards generalization? Evaluating and quantifying generalization is the Wild West of modern robotics, with each work proposing and measuring different types of generalization in their own, often difficult to reproduce settings. In this work, our goal is (1) to outline the forms of generalization we believe are important for robot manipulation in a comprehensive and fine-grained manner, and (2) to provide reproducible guidelines for measuring these notions of generalization. We first propose STAR-Gen, a taxonomy of generalization for robot manipulation structured around visual, semantic, and behavioral generalization. Next, we instantiate STAR-Gen with two case studies on real-world benchmarking: one based on open-source models and the Bridge V2 dataset, and another based on the bimanual ALOHA 2 platform that covers more dexterous and longer horizon tasks. Our case studies reveal many interesting insights: for example, we observe that open-source vision-language-action models often struggle with semantic generalization, despite pre-training on internet-scale language datasets. We provide videos and other supplementary material at stargen-taxonomy.github.io.
comment: IEEE Robotics and Automation Letters (RA-L)
Multiagent Systems
CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction ICRA 2026
To complete assignments provided by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, and manipulate target objects. Real-world deployments often require multiple heterogeneous robots with different manipulation capabilities to handle different assignments cooperatively. Beyond the need for specialized manipulation skills, effective information gathering is important in completing these assignments. To address this component of the problem, we formalize the information-gathering process in a fully cooperative setting as an underexplored multi-agent multi-task Embodied Question Answering (MM-EQA) problem, which is a novel extension of canonical Embodied Question Answering (EQA), where effective communication is crucial for coordinating efforts without redundancy. To address this problem, we propose CommCP, a novel LLM-based decentralized communication framework designed for MM-EQA. Our framework employs conformal prediction to calibrate the generated messages, thereby minimizing receiver distractions and enhancing communication reliability. To evaluate our framework, we introduce an MM-EQA benchmark featuring diverse, photo-realistic household scenarios with embodied questions. Experimental results demonstrate that CommCP significantly enhances the task success rate and exploration efficiency over baselines. The experiment videos, code, and dataset are available on our project website: https://comm-cp.github.io.
comment: IEEE International Conference on Robotics and Automation (ICRA 2026); Project Website: https://comm-cp.github.io/
PhysicsAgentABM: Physics-Guided Generative Agent-Based Modeling
Large language model (LLM)-based multi-agent systems enable expressive agent reasoning but are expensive to scale and poorly calibrated for timestep-aligned state-transition simulation, while classical agent-based models (ABMs) offer interpretability but struggle to integrate rich individual-level signals and non-stationary behaviors. We propose PhysicsAgentABM, which shifts inference to behaviorally coherent agent clusters: state-specialized symbolic agents encode mechanistic transition priors, a multimodal neural transition model captures temporal and interaction dynamics, and uncertainty-aware epistemic fusion yields calibrated cluster-level transition distributions. Individual agents then stochastically realize transitions under local constraints, decoupling population inference from entity-level variability. We further introduce ANCHOR, an LLM agent-driven clustering strategy based on cross-contextual behavioral responses and a novel contrastive loss, reducing LLM calls by up to 6-8 times. Experiments across public health, finance, and social sciences show consistent gains in event-time accuracy and calibration over mechanistic, neural, and LLM baselines. By re-architecting generative ABM around population-level inference with uncertainty-aware neuro-symbolic fusion, PhysicsAgentABM establishes a new paradigm for scalable and calibrated simulation with LLMs.
Learning to Share: Selective Memory for Efficient Parallel Agentic Systems
Agentic systems solve complex tasks by coordinating multiple agents that iteratively reason, invoke tools, and exchange intermediate results. To improve robustness and solution quality, recent approaches deploy multiple agent teams running in parallel to explore diverse reasoning trajectories. However, parallel execution comes at a significant computational cost: when different teams independently reason about similar sub-problems or execute analogous steps, they repeatedly perform substantial overlapping computation. To address these limitations, in this paper, we propose Learning to Share (LTS), a learned shared-memory mechanism for parallel agentic frameworks that enables selective cross-team information reuse while controlling context growth. LTS introduces a global memory bank accessible to all teams and a lightweight controller that decides whether intermediate agent steps should be added to memory or not. The controller is trained using stepwise reinforcement learning with usage-aware credit assignment, allowing it to identify information that is globally useful across parallel executions. Experiments on the AssistantBench and GAIA benchmarks show that LTS significantly reduces overall runtime while matching or improving task performance compared to memory-free parallel baselines, demonstrating that learned memory admission is an effective strategy for improving the efficiency of parallel agentic systems. Project page: https://joefioresi718.github.io/LTS_webpage/
Location-Aware Dispersion on Anonymous Graphs
The well-studied DISPERSION problem is a fundamental coordination problem in distributed robotics, where a set of mobile robots must relocate so that each occupies a distinct node of a network. DISPERSION assumes that a robot can settle at any node as long as no other robot settles on that node. In this work, we introduce LOCATION-AWARE DISPERSION, a novel generalization of DISPERSION that incorporates location awareness: Let $G = (V, E)$ be an anonymous, connected, undirected graph with $n = |V|$ nodes, each labeled with a color $\sf{col}(v) \in C = \{c_1, \dots, c_t\}, t\leq n$. A set $R = \{r_1, \dots, r_k\}$ of $k \leq n$ mobile robots is given, where each robot $r_i$ has an associated color $\mathsf{col}(r_i) \in C$. Initially placed arbitrarily on the graph, the goal is to relocate the robots so that each occupies a distinct node of the same color. When $|C|=1$, LOCATION-AWARE DISPERSION reduces to DISPERSION. There is a solution to DISPERSION in graphs with any $k\leq n$ without knowing $k,n$. Like DISPERSION, the goal is to solve LOCATION-AWARE DISPERSION minimizing both time and memory requirement at each agent. We develop several deterministic algorithms with guaranteed bounds on both time and memory requirement. We also give an impossibility and a lower bound for any deterministic algorithm for LOCATION-AWARE DISPERSION. To the best of our knowledge, the presented results collectively establish the algorithmic feasibility of LOCATION-AWARE DISPERSION in anonymous networks and also highlight the challenges on getting an efficient solution compared to the solutions for DISPERSION.
comment: 3 tables, 2 figures, 6 pseudo-codes
Emulating Aggregate Human Choice Behavior and Biases with GPT Conversational Agents
Cognitive biases often shape human decisions. While large language models (LLMs) have been shown to reproduce well-known biases, a more critical question is whether LLMs can predict biases at the individual level and emulate the dynamics of biased human behavior when contextual factors, such as cognitive load, interact with these biases. We adapted three well-established decision scenarios into a conversational setting and conducted a human experiment (N=1100). Participants engaged with a chatbot that facilitates decision-making through simple or complex dialogues. Results revealed robust biases. To evaluate how LLMs emulate human decision-making under similar interactive conditions, we used participant demographics and dialogue transcripts to simulate these conditions with LLMs based on GPT-4 and GPT-5. The LLMs reproduced human biases with precision. We found notable differences between models in how they aligned human behavior. This has important implications for designing and evaluating adaptive, bias-aware LLM-based AI systems in interactive contexts.
comment: Accepted at CHI'26. arXiv admin note: substantial text overlap with arXiv:2601.11049
AI Agent Systems for Supply Chains: Structured Decision Prompts and Memory Retrieval AAMAS 2026
This study investigates large language model (LLM) -based multi-agent systems (MASs) as a promising approach to inventory management, which is a key component of supply chain management. Although these systems have gained considerable attention for their potential to address the challenges associated with typical inventory management methods, key uncertainties regarding their effectiveness persist. Specifically, it is unclear whether LLM-based MASs can consistently derive optimal ordering policies and adapt to diverse supply chain scenarios. To address these questions, we examine an LLM-based MAS with a fixed-ordering strategy prompt that encodes the stepwise processes of the problem setting and a safe-stock strategy commonly used in inventory management. Our empirical results demonstrate that, even without detailed prompt adjustments, an LLM-based MAS can determine optimal ordering decisions in a restricted scenario. To enhance adaptability, we propose a novel agent called AIM-RM, which leverages similar historical experiences through similarity matching. Our results show that AIM-RM outperforms benchmark methods across various supply chain scenarios, highlighting its robustness and adaptability.
comment: A full version of the extended abstract accepted by the 25th International Conference on Autonomous Agents and Multiagent Systems(AAMAS 2026)
LinguistAgent: A Reflective Multi-Model Platform for Automated Linguistic Annotation
Data annotation remains a significant bottleneck in the Humanities and Social Sciences, particularly for complex semantic tasks such as metaphor identification. While Large Language Models (LLMs) show promise, a significant gap remains between the theoretical capability of LLMs and their practical utility for researchers. This paper introduces LinguistAgent, an integrated, user-friendly platform that leverages a reflective multi-model architecture to automate linguistic annotation. The system implements a dual-agent workflow, comprising an Annotator and a Reviewer, to simulate a professional peer-review process. LinguistAgent supports comparative experiments across three paradigms: Prompt Engineering (Zero/Few-shot), Retrieval-Augmented Generation, and Fine-tuning. We demonstrate LinguistAgent's efficacy using the task of metaphor identification as an example, providing real-time token-level evaluation (Precision, Recall, and $F_1$ score) against human gold standards. The application and codes are released on https://github.com/Bingru-Li/LinguistAgent.
A Data Driven Structural Decomposition of Dynamic Games via Best Response Maps
Dynamic games are powerful tools to model multi-agent decision-making, yet computing Nash (generalized Nash) equilibria remains a central challenge in such settings. Complexity arises from tightly coupled optimality conditions, nested optimization structures, and poor numerical conditioning. Existing game-theoretic solvers address these challenges by directly solving the joint game, typically requiring explicit modeling of all agents' objective functions and constraints, while learning-based approaches often decouple interaction through prediction or policy approximation, sacrificing equilibrium consistency. This paper introduces a conceptually novel formulation for dynamic games by restructuring the equilibrium computation. Rather than solving a fully coupled game or decoupling agents through prediction or policy approximation, a data-driven structural reduction of the game is proposed that removes nested optimization layers and derivative coupling by embedding an offline-compiled best-response map as a feasibility constraint. Under standard regularity conditions, when the best-response operator is exact, any converged solution of the reduced problem corresponds to a local open-loop Nash (GNE) equilibrium of the original game; with a learned surrogate, the solution is approximately equilibrium-consistent up to the best-response approximation error. The proposed formulation is supported by mathematical proofs, accompanying a large-scale Monte Carlo study in a two-player open-loop dynamic game motivated by the autonomous racing problem. Comparisons are made against state-of-the-art joint game solvers, and results are reported on solution quality, computational cost, and constraint satisfaction.
comment: 11 pages, 6 figures, 5 tables, Submitted to RSS 2026
Towards a Science of Collective AI: LLM-based Multi-Agent Systems Need a Transition from Blind Trial-and-Error to Rigorous Science
Recent advancements in Large Language Models (LLMs) have greatly extended the capabilities of Multi-Agent Systems (MAS), demonstrating significant effectiveness across a wide range of complex and open-ended domains. However, despite this rapid progress, the field still relies heavily on empirical trial-and-error. It lacks a unified and principled scientific framework necessary for systematic optimization and improvement. This bottleneck stems from the ambiguity of attribution: first, the absence of a structured taxonomy of factors leaves researchers restricted to unguided adjustments; second, the lack of a unified metric fails to distinguish genuine collaboration gain from mere resource accumulation. In this paper, we advocate for a transition to design science through an integrated framework. We advocate to establish the collaboration gain metric ($Γ$) as the scientific standard to isolate intrinsic gains from increased budgets. Leveraging $Γ$, we propose a factor attribution paradigm to systematically identify collaboration-driving factors. To support this, we construct a systematic MAS factor library, structuring the design space into control-level presets and information-level dynamics. Ultimately, this framework facilitates the transition from blind experimentation to rigorous science, paving the way towards a true science of Collective AI.
RuleSmith: Multi-Agent LLMs for Automated Game Balancing
Game balancing is a longstanding challenge requiring repeated playtesting, expert intuition, and extensive manual tuning. We introduce RuleSmith, the first framework that achieves automated game balancing by leveraging the reasoning capabilities of multi-agent LLMs. It couples a game engine, multi-agent LLMs self-play, and Bayesian optimization operating over a multi-dimensional rule space. As a proof of concept, we instantiate RuleSmith on CivMini, a simplified civilization-style game containing heterogeneous factions, economy systems, production rules, and combat mechanics, all governed by tunable parameters. LLM agents interpret textual rulebooks and game states to generate actions, to conduct fast evaluation of balance metrics such as win-rate disparities. To search the parameter landscape efficiently, we integrate Bayesian optimization with acquisition-based adaptive sampling and discrete projection: promising candidates receive more evaluation games for accurate assessment, while exploratory candidates receive fewer games for efficient exploration. Experiments show that RuleSmith converges to highly balanced configurations and provides interpretable rule adjustments that can be directly applied to downstream game systems. Our results illustrate that LLM simulation can serve as a powerful surrogate for automating design and balancing in complex multi-agent environments.
AgentSpawn: Adaptive Multi-Agent Collaboration Through Dynamic Spawning for Long-Horizon Code Generation
Long-horizon code generation requires sustained context and adaptive expertise across domains. Current multi-agent systems use static workflows that cannot adapt when runtime analysis reveals unanticipated complexity. We propose AgentSpawn, an architecture enabling dynamic agent collaboration through: (1) automatic memory transfer during spawning, (2) adaptive spawning policies triggered by runtime complexity metrics, and (3) coherence protocols for concurrent modifications. AgentSpawn addresses five critical gaps in existing research around memory continuity, skill inheritance, task resumption, runtime spawning, and concurrent coherence. Experimental validation demonstrates AgentSpawn achieves 34% higher completion rates than static baselines on benchmarks like SWE-bench while reducing memory overhead by 42% through selective slicing.
comment: 18 pages, 4 figures, 6 tables
Scaling Multi-Agent Epistemic Planning through GNN-Derived Heuristics
Multi-agent Epistemic Planning (MEP) is an autonomous planning framework for reasoning about both the physical world and the beliefs of agents, with applications in domains where information flow and awareness among agents are critical. The richness of MEP requires states to be represented as Kripke structures, i.e., directed labeled graphs. This representation limits the applicability of existing heuristics, hindering the scalability of epistemic solvers, which must explore an exponential search space without guidance, resulting often in intractability. To address this, we exploit Graph Neural Networks (GNNs) to learn patterns and relational structures within epistemic states, to guide the planning process. GNNs, which naturally capture the graph-like nature of Kripke models, allow us to derive meaningful estimates of state quality -- e.g., the distance from the nearest goal -- by generalizing knowledge obtained from previously solved planning instances. We integrate these predictive heuristics into an epistemic planning pipeline and evaluate them against standard baselines, showing improvements in the scalability of multi-agent epistemic planning.
LLM-Based Social Simulations Require a Boundary
This position paper argues that LLM-based social simulations require clear boundaries to make meaningful contributions to social science. While Large Language Models (LLMs) offer promising capabilities for simulating human behavior, their tendency to produce homogeneous outputs, acting as an "average persona", fundamentally limits their ability to capture the behavioral diversity essential for complex social dynamics. We examine why heterogeneity matters for social simulations and how current LLMs fall short, analyzing the relationship between mean alignment and variance in LLM-generated behaviors. Through a systematic review of representative studies, we find that validation practices often fail to match the heterogeneity requirements of research questions: while most papers include ground truth comparisons, fewer than half explicitly assess behavioral variance, and most that do report lower variance than human populations. We propose that researchers should: (1) match validation depth to the heterogeneity demands of their research questions, (2) explicitly report variance alongside mean alignment, and (3) constrain claims to collective-level qualitative patterns when variance is insufficient. Rather than dismissing LLM-based simulation, we advocate for a boundary-aware approach that ensures these methods contribute genuine insights to social science.
High-probability Convergence Guarantees of Decentralized SGD
Convergence in high-probability (HP) has attracted increasing interest, due to implying exponentially decaying tail bounds and strong guarantees for individual runs of an algorithm. While many works study HP guarantees in centralized settings, much less is understood in the decentralized setup, where existing works require strong assumptions, like uniformly bounded gradients, or asymptotically vanishing noise. This results in a significant gap between the assumptions used to establish convergence in the HP and the mean-squared error (MSE) sense, and is also contrary to centralized settings, where it is known that $\mathtt{SGD}$ converges in HP under the same conditions on the cost function as needed for MSE convergence. Motivated by these observations, we study the HP convergence of Decentralized $\mathtt{SGD}$ ($\mathtt{DSGD}$) in the presence of light-tailed noise, providing several strong results. First, we show that $\mathtt{DSGD}$ converges in HP under the same conditions on the cost as in the MSE sense, removing the restrictive assumptions used in prior works. Second, our sharp analysis yields order-optimal rates for both non-convex and strongly convex costs. Third, we establish a linear speed-up in the number of users, leading to matching, or strictly better transient times than those obtained from MSE results, further underlining the tightness of our analysis. To the best of our knowledge, this is the first work that shows $\mathtt{DSGD}$ achieves a linear speed-up in the HP sense. Our relaxed assumptions and sharp rates stem from several technical results of independent interest, including a result on the variance-reduction effect of decentralized methods in the HP sense, as well as a novel bound on the MGF of strongly convex costs, which is of interest even in centralized settings. Finally, we provide experiments that validate our theory.
comment: 47 pages, 2 figures
Exploring Silicon-Based Societies: An Early Study of the Moltbook Agent Community
The rapid emergence of autonomous large language model agents has given rise to persistent, large-scale agent ecosystems whose collective behavior cannot be adequately understood through anecdotal observation or small-scale simulation. This paper introduces data-driven silicon sociology as a systematic empirical framework for studying social structure formation among interacting artificial agents. We present a pioneering large-scale data mining investigation of an in-the-wild agent society by analyzing Moltbook, a social platform designed primarily for agent-to-agent interaction. At the time of study, Moltbook hosted over 150,000 registered autonomous agents operating across thousands of agent-created sub-communities. Using programmatic and non-intrusive data acquisition, we collected and analyzed the textual descriptions of 12,758 submolts, which represent proactive sub-community partitioning activities within the ecosystem. Treating agent-authored descriptions as first-class observational artifacts, we apply rigorous preprocessing, contextual embedding, and unsupervised clustering techniques to uncover latent patterns of thematic organization and social space structuring. The results show that autonomous agents systematically organize collective space through reproducible patterns spanning human-mimetic interests, silicon-centric self-reflection, and early-stage economic and coordination behaviors. Rather than relying on predefined sociological taxonomies, these structures emerge directly from machine-generated data traces. This work establishes a methodological foundation for data-driven silicon sociology and demonstrates that data mining techniques can provide a powerful lens for understanding the organization and evolution of large autonomous agent societies.
comment: 11 pages, 3 figures. This update refines the framing of novelty claims by replacing absolute "first" statements with more precise and scoped formulations (e.g., "one of the earliest"). Our systematic methodological and empirical contributions remain unchanged
Systems and Control (EESS)
GUARDIAN: Safety Filtering for Systems with Perception Models Subject to Adversarial Attacks
Safety filtering is an effective method for enforcing constraints in safety-critical systems, but existing methods typically assume perfect state information. This limitation is especially problematic for systems that rely on neural network (NN)-based state estimators, which can be highly sensitive to noise and adversarial input perturbations. We address these problems by introducing GUARDIAN: Guaranteed Uncertainty-Aware Reachability Defense against Adversarial INterference, a safety filtering framework that provides formal safety guarantees for systems with NN-based state estimators. At runtime, GUARDIAN uses neural network verification tools to provide guaranteed bounds on the system's state estimate given possible perturbations to its observation. It then uses a modified Hamilton-Jacobi reachability formulation to construct a safety filter that adjusts the nominal control input based on the verified state bounds and safety constraints. The result is an uncertainty-aware filter that ensures safety despite the system's reliance on an NN estimator with noisy, possibly adversarial, input observations. Theoretical analysis and numerical experiments demonstrate that GUARDIAN effectively defends systems against adversarial attacks that would otherwise lead to a violation of safety constraints.
comment: 6 pages, 4 figures, submitted to L-CSS/CDC
Normalization of ReLU Dual for Cut Generation in Stochastic Mixed-Integer Programs
We study the Rectified Linear Unit (ReLU) dual, an existing dual formulation for stochastic programs that reformulates non-anticipativity constraints using ReLU functions to generate tight, non-convex, and mixed-integer representable cuts. While this dual reformulation guarantees convergence with mixed-integer state variables, it admits multiple optimal solutions that can yield weak cuts. To address this issue, we propose normalizing the dual in the extended space to identify solutions that yield stronger cuts. We prove that the resulting normalized cuts are tight and Pareto-optimal in the original state space. We further compare normalization with existing regularization-based approaches for handling dual degeneracy and explain why normalization offers key advantages. In particular, we show that normalization can recover any cut obtained via regularization, whereas the converse does not hold. Computational experiments demonstrate that the proposed approach outperforms existing methods by consistently yielding stronger cuts and reducing solution times on harder instances.
A Hybrid Data-Driven Algorithm for Real-Time Friction Force Estimation in Hydraulic Cylinders
Hydraulic systems are widely utilized in industrial applications due to their high force generation, precise control, and ability to function in harsh environments. Hydraulic cylinders, as actuators in these systems, apply force and position through the displacement of hydraulic fluid, but their operation is significantly influenced by friction force. Achieving precision in hydraulic cylinders requires an accurate friction model under various operating conditions. Existing analytical models, often derived from experimental tests, necessitate the identification or estimation of influencing factors but are limited in adaptability and computational efficiency. This research introduces a data-driven, hybrid algorithm based on Long Short-Term Memory (LSTM) networks and Random Forests for nonlinear friction force estimation. The algorithm effectively combines feature detection and estimation processes using training data acquired from an experimental hydraulic test setup. It achieves a consistent and stable model error of less than 10% across diverse operating conditions and external load variations, ensuring robust performance in complex situations. The computational cost of the algorithm is 1.51 milliseconds per estimation, making it suitable for real-time applications. The proposed method addresses the limitations of analytical models by delivering high precision and computational efficiency. The algorithm's performance is validated through detailed analysis and experimental results, including direct comparisons with the LuGre model. The comparison highlights that while the LuGre model offers a theoretical foundation for friction modeling, its performance is limited by its inability to dynamically adjust to varying operational conditions of the hydraulic cylinder, further emphasizing the advantages of the proposed hybrid approach in real-time applications.
comment: Published in: 2025 33rd International Conference on Electrical Engineering (ICEE), Publisher IEEE
Privacy-Preserving Dynamic Average Consensus by Masking Reference Signals
In multi-agent systems, dynamic average consensus (DAC) is a decentralized estimation strategy in which a set of agents tracks the average of time-varying reference signals. Because DAC requires exchanging state information with neighbors, attackers may gain access to these states and infer private information. In this paper, we develop a privacy-preserving method that protects each agent's reference signal from external eavesdroppers and honest-but-curious agents while achieving the same convergence accuracy and convergence rate as conventional DAC. Our approach masks the reference signals by having each agent draw a random real number for each neighbor, exchanges that number over an encrypted channel at the initialization, and computes a masking value to form a masked reference. Then the agents run the conventional DAC algorithm using the masked references. Convergence and privacy analyses show that the proposed algorithm matches the convergence properties of conventional DAC while preserving the privacy of the reference signals. Numerical simulations validate the effectiveness of the proposed privacy-preserving DAC algorithm.
comment: Accepted at ACC 2026
From Vision to Decision: Neuromorphic Control for Autonomous Navigation and Tracking
Robotic navigation has historically struggled to reconcile reactive, sensor-based control with the decisive capabilities of model-based planners. This duality becomes critical when the absence of a predominant option among goals leads to indecision, challenging reactive systems to break symmetries without computationally-intense planners. We propose a parsimonious neuromorphic control framework that bridges this gap for vision-guided navigation and tracking. Image pixels from an onboard camera are encoded as inputs to dynamic neuronal populations that directly transform visual target excitation into egocentric motion commands. A dynamic bifurcation mechanism resolves indecision by delaying commitment until a critical point induced by the environmental geometry. Inspired by recently proposed mechanistic models of animal cognition and opinion dynamics, the neuromorphic controller provides real-time autonomy with a minimal computational burden, a small number of interpretable parameters, and can be seamlessly integrated with application-specific image processing pipelines. We validate our approach in simulation environments as well as on an experimental quadrotor platform.
UAV Trajectory Optimization via Improved Noisy Deep Q-Network
This paper proposes an Improved Noisy Deep Q-Network (Noisy DQN) to enhance the exploration and stability of Unmanned Aerial Vehicle (UAV) when applying deep reinforcement learning in simulated environments. This method enhances the exploration ability by combining the residual NoisyLinear layer with an adaptive noise scheduling mechanism, while improving training stability through smooth loss and soft target network updates. Experiments show that the proposed model achieves faster convergence and up to $+40$ higher rewards compared to standard DQN and quickly reach to the minimum number of steps required for the task 28 in the 15 * 15 grid navigation environment set up. The results show that our comprehensive improvements to the network structure of NoisyNet, exploration control, and training stability contribute to enhancing the efficiency and reliability of deep Q-learning.
Observer-based Control of Multi-agent Systems under STL Specifications
This paper proposes a decentralized controller for large-scale heterogeneous multi-agent systems subject to bounded external disturbances, where agents must satisfy Signal Temporal Logic (STL) specifications requiring cooperation among non-communicating agents. To address the lack of direct communication, we employ a decentralized k-hop Prescribed Performance State Observer (k-hop PPSO) to provide each agent with state estimates of those agents it cannot communicate with. By leveraging the performance bounds on the state estimation errors guaranteed by the k-hop PPSO, we first modify the space robustness of the STL tasks to account for these errors, and then exploit the modified robustness to design a decentralized continuous-time feedback controller that ensures satisfaction of the STL tasks even under worst-case estimation errors. A simulation result is provided to validate the proposed framework.
comment: This paper has been submitted for consideration to the 23rd IFAC World Congress
Fairness-aware design of nudging policies under stochasticity and prejudices
We present an injustice-aware innovation-diffusion model extending the Generalized Linear Threshold framework by assigning agents activation thresholds drawn from a Beta distribution to capture the stochastic nature of adoption shaped by inequalities. Because incentive policies themselves can inadvertently amplify these inequalities, building on this model, we design a fair Model Predictive Control (MPC) scheme that incorporates equality and equity objectives for allocating incentives. Simulations using real mobility-habit data show that injustice reduces overall adoption, while equality smooths incentive distribution and equity reduces disparities in the final outcomes. Thus, incorporating fairness ensures effective diffusion without exacerbating existing social inequalities.
comment: Submitted to IFAC WC 2026
On Path-based Marginal Cost of Heterogeneous Traffic Flow for General Networks
Path marginal cost (PMC) is a crucial component in solving path-based system-optimal dynamic traffic assignment (SO-DTA), dynamic origin-destination demand estimation (DODE), and network resilience analysis. However, accurately evaluating PMC in heterogeneous traffic conditions poses significant challenges. Previous studies often focus on homogeneous traffic flow of single vehicle class and do not well address the interactive effect of heterogeneous traffic flows and the resultant computational issues. This study proposes a novel but simple method for approximately evaluating PMC in complex heterogeneous traffic condition. The method decomposes PMC into intra-class and inter-class terms and uses conversion factor derived from heterogeneous link dynamics to explicitly model the intricate relationships between vehicle classes. Additionally, the method considers the non-differentiable issue that arises when mixed traffic flow approaches system optimum conditions. The proposed method is tested on a small corridor network with synthetic demand and a large-scale network with calibrated demand from real-world data. Results demonstrated that our method exhibits superior performance in solving bi-class SO-DTA problems, yielding lower total travel cost and capturing the multi-class flow competition at the system optimum state.
Toward Operationalizing Rasmussen: Drift Observability on the Simplex for Evolving Systems
Monitoring drift into failure is hindered by Euclidean anomaly detection that can conflate safe operational trade-offs with risk accumulation in signals expressed as shares, and by architectural churn that makes fixed schemas (and learned models) stale before rare boundary events occur. Rasmussen's dynamic safety model motivates drift under competing pressures, but operationalizing it for software is difficult because many high-value operational signals (effort, remaining margin, incident impact) are compositional and their parts evolve. We propose a vision for drift observability on the simplex: model drift and boundary proximity in Aitchison geometry to obtain coordinate-invariant direction and distance-to-safety in interpretable balance coordinates. To remain comparable under churn, a monitor would continuously refresh its part inventory and policy-defined boundaries from engineering artifacts and apply lineage-aware aggregation. We outline early-warning diagnostics and falsifiable hypotheses for future evaluation.
Emergence-as-Code for Self-Governing Reliable Systems
SLO-as-code has made per-service} reliability declarative, but user experience is defined by journeys whose reliability is an emergent property of microservice topology, routing, redundancy, timeouts/fallbacks, shared failure domains, and tail amplification. As a result, journey objectives (e.g., "checkout p99 < 400 ms") are often maintained outside code and drift as the system evolves, forcing teams to either miss user expectations or over-provision and gate releases with ad-hoc heuristics. We propose Emergence-as-Code (EmaC), a vision for making journey reliability computable and governable via intent plus evidence. An EmaC spec declares journey intent (objective, control-flow operators, allowed actions) and binds it to atomic SLOs and telemetry. A runtime inference component consumes operational artifacts (e.g., tracing and traffic configuration) to synthesize a candidate journey model with provenance and confidence. From the last accepted model, the EmaC compiler/controller derives bounded journey SLOs and budgets under explicit correlation assumptions (optimistic independence vs. pessimistic shared fate), and emits control-plane artifacts (burn-rate alerts, rollout gates, action guards) that are reviewable in a Git workflow. An anonymized artifact repository provides a runnable example specification and generated outputs.
Ontology-Driven Robotic Specification Synthesis
This paper addresses robotic system engineering for safety- and mission-critical applications by bridging the gap between high-level objectives and formal, executable specifications. The proposed method, Robotic System Task to Model Transformation Methodology (RSTM2) is an ontology-driven, hierarchical approach using stochastic timed Petri nets with resources, enabling Monte Carlo simulations at mission, system, and subsystem levels. A hypothetical case study demonstrates how the RSTM2 method supports architectural trades, resource allocation, and performance analysis under uncertainty. Ontological concepts further enable explainable AI-based assistants, facilitating fully autonomous specification synthesis. The methodology offers particular benefits to complex multi-robot systems, such as the NASA CADRE mission, representing decentralized, resource-aware, and adaptive autonomous systems of the future.
comment: 8 pages, 9 figures, 3 tables, journal
Robust data-driven model-reference control of linear perturbed systems via sliding mode generation
This paper introduces a data-based integral sliding mode control scheme for robustification of model-reference controllers, accommodating generic multivariable linear systems with unknown dynamics and affected by matched disturbances. Specifically, an integral sliding mode control (ISMC) law is recast into a data-based framework relying on an integral sliding variable depending only on the reference model, without the need of modeling the plant. The main strength of the proposed approach is the enforcement of the desired reference model in closed-loop under sliding mode conditions, despite the lack of knowledge of the model dynamics and the presence of the matched disturbances. Moreover, the conditions required to guarantee an integral sliding mode generation and the closed-loop stability are formally analyzed in the paper, remarking the generality of the proposed data-driven integral sliding mode control (DD-ISMC) with respect to the related model-based counterpart. Finally, the main practices for the data-based design of the proposed control scheme are deeply discussed in the paper, and the proposed method is tested in simulation on a benchmark example, and experimentally on a real laboratory setup. Simulation and experimental evidence fully corroborates the theoretical analysis, thus motivating further research in this direction.
Distributed Model Predictive Control for Energy and Comfort Optimization in Large Buildings Using Piecewise Affine Approximation
The control of large buildings encounters challenges in computational efficiency due to their size and nonlinear components. To address these issues, this paper proposes a Piecewise Affine (PWA)-based distributed scheme for Model Predictive Control (MPC) that optimizes energy and comfort through PWA-based quadratic programming. We utilize the Alternating Direction Method of Multipliers (ADMM) for effective decomposition and apply the PWA technique to handle the nonlinear components. To solve the resulting large-scale nonconvex problems, the paper introduces a convex ADMM algorithm that transforms the nonconvex problem into a series of smaller convex problems, significantly enhancing computational efficiency. Furthermore, we demonstrate that the convex ADMM algorithm converges to a local optimum of the original problem. A case study involving 36 zones validates the effectiveness of the proposed method. Our proposed method reduces execution time by 86\% compared to the centralized version.
Policy-Driven Orchestration Framework for Multi-Operator Non-Terrestrial Networks
Non-terrestrial networks (NTNs) have gained significant attention for their scalability and wide coverage in next-generation communication systems. A large number of NTN nodes, such as satellites, are required to establish a global NTN, but not all operators have the capability to deploy such a system. Therefore, cooperation among multiple operators, facilitated by an orchestrator, enables the construction of virtually large-scale constellations. In this paper, we propose a weak-control-based orchestration framework that coordinates multiple NTN operators while ensuring that operations align with the policies of both the orchestrator and the individual operators. Unlike centralized orchestration frameworks, where the orchestrator determines the entire route from source to destination, the proposed framework allows each operator to select preferred routes from multiple candidates provided by the orchestrator. To evaluate the effectiveness of our proposed framework, we conducted numerical simulations under various scenarios and network configurations including dynamic NTN environments with time-varying topologies, showing that inter-operator cooperation improves the availability of feasible end-to-end routes. Furthermore, we analyzed the iterative negotiation process to address policy conflicts and quantitatively demonstrated the "price of autonomy," where strict individual policies degrade global feasibility and performance. The results also demonstrate that outcomes of the proposed framework depend on the operators' policies and that hop count and latency increase as the number of operators grows. These findings validate the proposed framework's ability to deliver practical benefits of orchestrated multi-operator collaboration in future NTN environments.
comment: Accepted for publication in IEEE Transactions on Communications
A Data Driven Structural Decomposition of Dynamic Games via Best Response Maps
Dynamic games are powerful tools to model multi-agent decision-making, yet computing Nash (generalized Nash) equilibria remains a central challenge in such settings. Complexity arises from tightly coupled optimality conditions, nested optimization structures, and poor numerical conditioning. Existing game-theoretic solvers address these challenges by directly solving the joint game, typically requiring explicit modeling of all agents' objective functions and constraints, while learning-based approaches often decouple interaction through prediction or policy approximation, sacrificing equilibrium consistency. This paper introduces a conceptually novel formulation for dynamic games by restructuring the equilibrium computation. Rather than solving a fully coupled game or decoupling agents through prediction or policy approximation, a data-driven structural reduction of the game is proposed that removes nested optimization layers and derivative coupling by embedding an offline-compiled best-response map as a feasibility constraint. Under standard regularity conditions, when the best-response operator is exact, any converged solution of the reduced problem corresponds to a local open-loop Nash (GNE) equilibrium of the original game; with a learned surrogate, the solution is approximately equilibrium-consistent up to the best-response approximation error. The proposed formulation is supported by mathematical proofs, accompanying a large-scale Monte Carlo study in a two-player open-loop dynamic game motivated by the autonomous racing problem. Comparisons are made against state-of-the-art joint game solvers, and results are reported on solution quality, computational cost, and constraint satisfaction.
comment: 11 pages, 6 figures, 5 tables, Submitted to RSS 2026
Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates
Neural Lyapunov and barrier certificates have recently been used as powerful tools for verifying the safety and stability properties of deep reinforcement learning (RL) controllers. However, existing methods offer guarantees only under fixed ideal unperturbed dynamics, limiting their reliability in real-world applications where dynamics may deviate due to uncertainties. In this work, we study the problem of synthesizing \emph{robust neural Lyapunov barrier certificates} that maintain their guarantees under perturbations in system dynamics. We formally define a robust Lyapunov barrier function and specify sufficient conditions based on Lipschitz continuity that ensure robustness against bounded perturbations. We propose practical training objectives that enforce these conditions via adversarial training, Lipschitz neighborhood bound, and global Lipschitz regularization. We validate our approach in two practically relevant environments, Inverted Pendulum and 2D Docking. The former is a widely studied benchmark, while the latter is a safety-critical task in autonomous systems. We show that our methods significantly improve both certified robustness bounds (up to $4.6$ times) and empirical success rates under strong perturbations (up to $2.4$ times) compared to the baseline. Our results demonstrate effectiveness of training robust neural certificates for safe RL under perturbations in dynamics.
A Short and Unified Convergence Analysis of the SAG, SAGA, and IAG Algorithms
Stochastic variance-reduced algorithms such as Stochastic Average Gradient (SAG) and SAGA, and their deterministic counterparts like the Incremental Aggregated Gradient (IAG) method, have been extensively studied in large-scale machine learning. Despite their popularity, existing analyses for these algorithms are disparate, relying on different proof techniques tailored to each method. Furthermore, the original proof of SAG is known to be notoriously involved, requiring computer-aided analysis. Focusing on finite-sum optimization with smooth and strongly convex objective functions, our main contribution is to develop a single unified convergence analysis that applies to all three algorithms: SAG, SAGA, and IAG. Our analysis features two key steps: (i) establishing a bound on delays due to stochastic sub-sampling using simple concentration tools, and (ii) carefully designing a novel Lyapunov function that accounts for such delays. The resulting proof is short and modular, providing the first high-probability bounds for SAG and SAGA that can be seamlessly extended to non-convex objectives and Markov sampling. As an immediate byproduct of our new analysis technique, we obtain the best known rates for the IAG algorithm, significantly improving upon prior bounds.
Nonlinear Predictive Cost Adaptive Control of Pseudo-Linear Input-Output Models Using Polynomial, Fourier, and Cubic Spline Observables
Control of nonlinear systems with high levels of uncertainty is practically relevant and theoretically challenging. This paper presents a numerical investigation of an adaptive nonlinear model predictive control (MPC) technique that relies entirely on online system identification without prior modeling, training, or data collection. In particular, the paper considers predictive cost adaptive control (PCAC), which is an extension of generalized predictive control. Nonlinear PCAC (NPCAC) uses recursive least squares (RLS) with subspace of information forgetting (SIFt) to identify a discrete-time, pseudo-linear, input-output model, which is used with iterative MPC for nonlinear receding-horizon optimization. The performance of NPCAC is illustrated using polynomial, Fourier, and cubic-spline basis functions.
The Necessity of a Holistic Safety Evaluation Framework for AI-Based Automation Features
The intersection of Safety of Intended Functionality (SOTIF) and Functional Safety (FuSa) analysis of driving automation features has traditionally excluded Quality Management (QM) components from rigorous safety impact evaluations. While QM components are not typically classified as safety-relevant, recent developments in artificial intelligence (AI) integration reveal that such components can contribute to SOTIF-related hazardous risks. Compliance with emerging AI safety standards, such as ISO/PAS 8800, necessitates re-evaluating safety considerations for these components. This paper examines the necessity of conducting holistic safety analysis and risk assessment on AI components, emphasizing their potential to introduce hazards with the capacity to violate risk acceptance criteria when deployed in safety-critical driving systems, particularly in perception algorithms. Using case studies, we demonstrate how deficiencies in AI-driven perception systems can emerge even in QM-classified components, leading to unintended functional behaviors with critical safety implications. By bridging theoretical analysis with practical examples, this paper argues for the adoption of comprehensive FuSa, SOTIF, and AI standards-driven methodologies to identify and mitigate risks in AI components. The findings demonstrate the importance of revising existing safety frameworks to address the evolving challenges posed by AI, ensuring comprehensive safety assurance across all component classifications spanning multiple safety standards.
PLATO Hand: Shaping Contact Behavior with Fingernails for Precise Manipulation
We present the PLATO Hand, a dexterous robotic hand with a hybrid fingertip that embeds a rigid fingernail within a compliant pulp. This design shapes contact behavior to enable diverse interaction modes across a range of object geometries. We develop a strain-energy-based bending-indentation model to guide the fingertip design and to explain how guided contact preserves local indentation while suppressing global bending. Experimental results show that the proposed robotic hand design demonstrates improved pinching stability, enhanced force observability, and successful execution of edge-sensitive manipulation tasks, including paper singulation, card picking, and orange peeling. Together, these results show that coupling structured contact geometry with a force-motion transparent mechanism provides a principled, physically embodied approach to precise manipulation.
Chasing Tails: How Do People Respond to Wait Time Distributions?
We use a series of pre-registered, incentive-compatible online experiments to investigate how people evaluate and choose among different waiting time distributions. Our main findings are threefold. First, consistent with prior literature, people show an aversion to both longer expected waits and higher variance. Second, and more surprisingly, moment-based utility models fail to capture preferences when distributions have thick-right tails: indeed, decision-makers strongly prefer distributions with long-right tails (where probability mass is more evenly distributed over a larger support set) relative to tails that exhibit a spike near the maximum possible value, even when controlling for mean, variance, and higher moments. Conditional Value at Risk (CVaR) utility models commonly used in portfolio theory predict these choices well. Third, when given a choice, decision-makers overwhelmingly seek information about right-tail outcomes. These results have practical implications for service operations: (1) service designs that create a spike in long waiting times (such as priority or dedicated queue designs) may be particularly aversive; (2) when informativeness is the goal, providers should prioritize sharing right-tail probabilities or percentiles; and (3) to increase service uptake, providers can strategically disclose (or withhold) distributional information depending on right-tail shape.
A hard-constrained NN learning framework for rapidly restoring AC-OPF from DC-OPF
This paper proposes a hard-constrained unsupervised learning framework for rapidly solving the non-linear and non-convex AC optimal power flow (AC-OPF) problem in real-time operation. Without requiring ground-truth AC-OPF solutions, feasibility and optimality are ensured through a properly designed learning environment and training loss. Inspired by residual learning, the neural network (NN) learns the correction mapping from the DC-OPF solution to the active power setpoints of the generators through re-dispatch. A subsequent optimization model is utilized to restore the optimal AC-OPF solution, and the resulting projection difference is employed as the training loss. A replay buffer is utilized to enhance learning efficiency by fully leveraging past data pairs. The optimization model is cast as a differentiable optimization layer, where the gradient is derived by applying the implicit function theorem to the KKT conditions at the optimal solution. Tested on IEEE-118 and PEGASE-9241 bus systems, numerical results demonstrate that the proposed NN can obtain strictly feasible and near-optimal solutions with reduced computational time compared to conventional optimization solvers. In addition, aided by the updated DC-OPF solution under varying topologies, the trained NN, together with the PF solver, can rapidly find the corresponding AC solution. The proposed method achieves a $40\times$ time speedup, while maintaining an average constraint violation on the order of $10^{-4}$ and an optimization gap below $1\%$.
Dynamic Quantum Optimal Communication Topology Design for Consensus Control in Linear Multi-Agent Systems
This paper proposes a quantum framework for the design of communication topologies in consensus-based multi-agent systems. The communication graph is selected online by solving a mixed-integer quadratic program (MIQP) that minimizes a cost combining communication and distance penalties with degree-regularization terms, while enforcing exact connectivity through a flow-based formulation. To cope with the combinatorial complexity of this NP-hard problem, we develop a three-block ADMM scheme that decomposes the MIQP into a convex quadratic program in relaxed edge and flow variables, a pure binary unconstrained subproblem, and a closed-form auxiliary update. The binary subproblem is mapped to a quadratic unconstrained binary optimization (QUBO) Hamiltonian and approximately solved via quantum imaginary time evolution (QITE). The resulting time-varying, optimizer-generated Laplacians are applied to linear first- and second-order consensus dynamics. Numerical simulations on networks demonstrate that the proposed method produces connected topologies that satisfy degree constraints, achieve consensus, and incur costs comparable to those of classical mixed-integer solvers, thereby illustrating how quantum algorithms can be embedded as topology optimizers within closed-loop distributed control architectures.
Active Localization of Unstable Systems with Coarse Information SC
We study localization and control for unstable systems under coarse, single-bit sensing. Motivated by understanding the fundamental limitations imposed by such minimal feedback, we identify sufficient conditions under which the initial state can be recovered despite instability and extremely sparse measurements. Building on these conditions, we develop an active localization algorithm that integrates a set-based estimator with a control strategy derived from Voronoi partitions, which provably estimates the initial state while ensuring the agent remains in informative regions. Under the derived conditions, the proposed approach guarantees exponential contraction of the initial-state uncertainty, and the result is further supported by numerical experiments. These findings can offer theoretical insight into localization in robotics, where sensing is often limited to coarse abstractions such as keyframes, segmentations, or line-based features.
comment: 10 pages, 4 figures, Accepted by International Conference on Hybrid Systems: Computation and Control (HSCC) 2026
Dynamic Modeling, Parameter Identification and Numerical Analysis of Flexible Cables in Flexibly Connected Dual-AUV Systems
This research presents a dynamic modeling framework and parameter identification methods for describing the highly nonlinear behaviors of flexibly connected dual-AUV systems. The modeling framework is established based on the lumped mass method, integrating axial elasticity, bending stiffness, added mass and hydrodynamic forces, thereby accurately capturing the time-varying response of the forces and cable configurations. To address the difficulty of directly measuring material-related and hydrodynamic coefficients, this research proposes a parameter identification method that combines the physical model with experimental data. High-precision inversion of the equivalent Youngs modulus and hydrodynamic coefficients is performed through tension experiments under multiple configurations, effectively demonstrating that the identified model maintains predictive consistency in various operational conditions. Further numerical analysis indicates that the dynamic properties of flexible cable exhibit significant nonlinear characteristics, which are highly dependent on material property variations and AUV motion conditions. This nonlinear dynamic behavior results in two typical response states, slack and taut, which are jointly determined by boundary conditions and hydrodynamic effects, significantly affecting the cable configuration and endpoint loads. In this research, the dynamics of flexible cables under complex boundary conditions is revealed, providing a theoretical foundation for the design, optimization and further control research of similar systems.
Physical Human-Robot Interaction: A Critical Review of Safety Constraints
This paper aims to provide a clear and rigorous understanding of commonly recognized safety constraints in physical human-robot interaction, particularly regarding ISO/TS 15066. We investigate the derivation of these constraints, critically examine the underlying assumptions, and evaluate their practical implications for system-level safety and performance in industrially relevant scenarios. Key design parameters within safety-critical control architectures are identified, and numerical examples are provided to quantify performance degradation arising from typical approximations and design decisions in manufacturing environments. Within this analysis, the fundamental role of energy in safety assessment is emphasized, providing focused insights into energy-based safety methodologies for collaborative industrial robot systems.
Learning to Dial-a-Ride: A Deep Graph Reinforcement Learning Approach to the Electric Dial-a-Ride Problem
Urban mobility systems are transitioning toward electric, on-demand services, creating operational challenges for fleet management under energy and service-quality constraints. The Electric Dial-a-Ride Problem (E-DARP) extends the classical dial-a-ride problem by incorporating limited battery capacity and nonlinear charging dynamics, increasing computational complexity and limiting the scalability of exact methods for real-time use. This paper proposes a deep reinforcement learning approach based on an edge-centric graph neural network encoder and an attention-driven route construction policy. By operating directly on edge attributes such as travel time and energy consumption, the method captures non-Euclidean, asymmetric, and energy-dependent routing costs in real road networks. The learned policy jointly optimizes routing, charging, and service quality without relying on Euclidean assumptions or handcrafted heuristics. The approach is evaluated on two case studies using ride-sharing data from San Francisco. On benchmark instances, the method achieves solutions within 0.4% of best-known results while reducing computation times by orders of magnitude. A second case study considers large-scale instances with up to 250 request pairs, realistic energy models, and nonlinear charging. On these instances, the learned policy outperforms Adaptive Large Neighborhood Search (ALNS) by 9.5% in solution quality while achieving 100% service completion, with inference times under 10 seconds compared to hours for the metaheuristic. Finally, sensitivity analyses quantify the impact of battery capacity, fleet size, ride-sharing capacity, and reward weights, while robustness experiments show that deterministically trained policies generalize effectively under stochastic conditions.
Privacy-Preserving Distributed Control for a Networked Battery Energy Storage System
The increasing deployment of distributed Battery Energy Storage Systems (BESSs) in modern power grids necessitates effective coordination strategies to ensure state-of-charge (SoC) balancing and accurate power delivery. While distributed control frameworks offer scalability and resilience, they also raise significant privacy concerns due to the need for inter-agent information exchange. This paper presents a novel privacy-preserving distributed control algorithm for SoC balancing in a networked BESS. The proposed framework includes distributed power allocation law that is designed based on two privacy-preserving distributed estimators, one for the average unit state and the other for the average desired power. The average unit state estimator is designed via the state decomposition method without disclosing sensitive internal states. The proposed power allocation law based on these estimators ensures asymptotic SoC balancing and global power delivery while safeguarding agent privacy from external eavesdroppers. The effectiveness and privacy-preserving properties of the proposed control strategy are demonstrated through simulation results.
comment: Accepted for publication in Journal of Energy Storage
ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization Modeling
Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) provide a promising avenue for automating this process by translating natural-language (NL) operational requirements into executable optimization models via semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization, offering limited, rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce \textbf{ProOPF-D} and \textbf{ProOPF-B}, a dataset and benchmark for professional-grade OPF modeling: ProOPF-D contains 12K instances pairing NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations; ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.
Stochastic Generalized Dynamic Games with Coupled Chance Constraints
This paper investigates stochastic generalized dynamic games with coupling chance constraints, where agents have incomplete information about uncertainties satisfying a concentration of measure property. This problem, in general, is non-convex and NP-hard. To address this, we propose a convex under-approximation by replacing chance constraints with tightened expected-value constraints, yielding a tractable game. We prove the existence of a stochastic generalized Nash equilibrium (SGNE) in this new game and show that its variational SGNE is an $\boldsymbol{\varepsilon}$-SGNE for the original game, with $\boldsymbol{\varepsilon}$ expressed via the approximation errors and Lagrange multipliers. A semi-decentralized, sampling-based algorithm with time-varying step sizes is developed, requiring no prior knowledge of the uncertainty distribution or expectation evaluations. Unlike existing methods, it avoids step-size tuning based on Lipschitz constants or adaptive rules. Under standard assumptions on the pseudo-gradient, the algorithm converges almost surely to an SGNE.
Dynamic Resource Allocation with Karma: An Experimental Study
A system of non-tradable credits that flow between individuals like karma, hence proposed under that name, is a mechanism for repeated resource allocation that comes with attractive efficiency and fairness properties, in theory. In this study, we test karma in an online experiment in which human subjects repeatedly compete for a resource with time-varying and stochastic individual preferences or urgency to acquire the resource. We confirm that karma has significant and sustained welfare benefits even in a population with no prior training. We identify mechanism usage in contexts with sporadic high urgency, more so than with frequent moderate urgency, and implemented as a simple (binary) karma bidding scheme as particularly effective for welfare improvements: relatively larger aggregate efficiency gains are realized that are (almost) Pareto superior. These findings provide guidance for further testing and for future implementation plans of such mechanisms in the real world.
Approximating Analytically-Intractable Likelihood Densities with Deterministic Arithmetic for Optimal Particle Filtering
Particle filtering algorithms have enabled practical solutions to problems in autonomous robotics (self-driving cars, UAVs, warehouse robots), target tracking, and econometrics, with further applications in speech processing and medicine (patient monitoring). Yet, their inherent weakness at representing the likelihood of the observation (which often leads to particle degeneracy) remains unaddressed for real-time resource-constrained systems. Improvements such as the optimal proposal and auxiliary particle filter mitigate this issue under specific circumstances and with increased computational cost. This work presents a new particle filtering method and its implementation, which enables tunably-approximative representation of arbitrary likelihood densities as program transformations of parametric distributions.Our method leverages a recent computing platform thatcan perform deterministic computation on probability distributionrepresentations (UxHw) without relying on stochastic methods.For non-Gaussian non-linear systems and with an optimal-auxiliary particle filter, we benchmark the likelihood evaluation error and speed for a total of 294840 evaluation points. For such models, the results show that the UxHw method leads to as much as 37.7x speedup compared to the Monte Carlo alternative. For narrow uniform measurement uncertainty, the particle filter falsely assigns zero likelihood as much as 81.89% of the time whereas UxHw achieves 1.52% false-zero rate. The UxHw approach achieves filter RMSE improvement of as much as 18.9% (average 3.3%) over the Monte Carlo alternative.
Novel Stability Criteria for Discrete and Hybrid Systems via Ramanujan Inner Products
This paper introduces a Ramanujan inner product and its corresponding norm, establishing a novel framework for the stability analysis of hybrid and discrete-time systems as an alternative to traditional Euclidean metrics. We establish new $ε$-$δ$ stability conditions that utilize the unique properties of Ramanujan summations and their relationship with number-theoretic concepts. The proposed approach provides enhanced robustness guarantees and reveals fundamental connections between system stability and arithmetic properties of the system dynamics. Theoretical results are rigorously proven, and simulation results on numerical examples are presented to validate the efficacy of the proposed approach.
comment: 7 pages, 2 figures
Maximum Likelihood Estimation for System Identification of Networks of Dynamical Systems
This paper investigates maximum likelihood estimation for direct system identification in networks of dynamical systems. We establish that the proposed approach is both consistent and efficient. In addition, it is more generally applicable than existing methods, since it can be employed even when measurements are unavailable for all network nodes, provided that network identifiability is satisfied. Finally, we demonstrate that the maximum likelihood problem can be formulated without relying on a predictor, which is key to achieving computationally efficient numerical solutions.
comment: This work has been submitted to the IEEE for possible publication. Submitted to IEEE Transactions on Automatic Control
A Sliced Learning Framework for Online Disturbance Identification in Quadrotor SO(3) Attitude Control
This paper introduces a dimension-decomposed geometric learning framework called Sliced Learning for disturbance identification in quadrotor geometric attitude control. Instead of conventional learning-from-states, this framework adopts a learning-from-error strategy by using the Lie-algebraic error representation as the input feature, enabling axis-wise space decomposition (``slicing") while preserving the SO(3) structure. This is highly consistent with the geometric mechanism of cognitive control observed in neuroscience, where neural systems organize adaptive representations within structured subspaces to enable cognitive flexibility and efficiency. Based on this framework, we develop a lightweight and structurally interpretable Sliced Adaptive-Neuro Mapping (SANM) module. The high-dimensional mapping for online identification is axially ``sliced" into multiple low-dimensional submappings (``slices"), implemented by shallow neural networks and adaptive laws. These neural networks and adaptive laws are updated online via Lyapunov-based adaptation within their respective shared subspaces. To enhance interpretability, we prove exponential convergence despite time-varying disturbances and inertia uncertainties. To our knowledge, Sliced Learning is among the first frameworks to demonstrate lightweight online neural adaptation at 400 Hz on resource-constrained microcontroller units (MCUs), such as STM32, with real-world experimental validation.
comment: v3: Major revision--Revised title; introduced the Sliced Learning framework; added comparative experiments, extended theoretical results, and supplementary materials (such as algorithms and proofs)
Optimal Scheduling of Electricity and Water in Renewable-Colocated Desalination Plants
We develop a mathematical framework for the optimal scheduling of flexible water desalination plants (WDPs) as hybrid generator-load resources. WDPs integrate thermal generation, membrane-based controllable loads, and renewable energy sources, offering unique operational flexibility for power system operations. They can simultaneously participate in two markets: selling desalinated water to a water utility, and bidirectionally transacting electricity with the grid based on their net electricity demand. We formulate the scheduling decision problem of a profit-maximizing WDP, capturing operational, technological, and market-based coupling between water and electricity flows. The threshold-based structure we derive provides computationally tractable coordination suitable for large-scale deployment, offering operational insights into how thermal generation and membrane-based loads complementarily provide continuous bidirectional flexibility. The thresholds are analytically characterized in closed form as explicit functions of technology and tariff parameters. We examine how small changes in the exogenous tariff and technology parameters affect the WDP's profit. Extensive simulations illustrate the optimal WDP's operation, profit, and water-electricity exchange, demonstrating significant improvements relative to benchmark algorithms.
comment: 14 pages, 7 figures, 1 table
Robotics
Capturing Visual Environment Structure Correlates with Control Performance
The choice of visual representation is key to scaling generalist robot policies. However, direct evaluation via policy rollouts is expensive, even in simulation. Existing proxy metrics focus on the representation's capacity to capture narrow aspects of the visual world, like object shape, limiting generalization across environments. In this paper, we take an analytical perspective: we probe pretrained visual encoders by measuring how well they support decoding of environment state -- including geometry, object structure, and physical attributes -- from images. Leveraging simulation environments with access to ground-truth state, we show that this probing accuracy strongly correlates with downstream policy performance across diverse environments and learning settings, significantly outperforming prior metrics and enabling efficient representation selection. More broadly, our study provides insight into the representational properties that support generalizable manipulation, suggesting that learning to encode the latent physical state of the environment is a promising objective for control.
PDF-HR: Pose Distance Fields for Humanoid Robots
Pose and motion priors play a crucial role in humanoid robotics. Although such priors have been widely studied in human motion recovery (HMR) domain with a range of models, their adoption for humanoid robots remains limited, largely due to the scarcity of high-quality humanoid motion data. In this work, we introduce Pose Distance Fields for Humanoid Robots (PDF-HR), a lightweight prior that represents the robot pose distribution as a continuous and differentiable manifold. Given an arbitrary pose, PDF-HR predicts its distance to a large corpus of retargeted robot poses, yielding a smooth measure of pose plausibility that is well suited for optimization and control. PDF-HR can be integrated as a reward shaping term, a regularizer, or a standalone plausibility scorer across diverse pipelines. We evaluate PDF-HR on various humanoid tasks, including single-trajectory motion tracking, general motion tracking, style-based motion mimicry, and general motion retargeting. Experiments show that this plug-and-play prior consistently and substantially strengthens strong baselines. Code and models will be released.
comment: \href{https://gaoyukang33.github.io/PDF-HR/}{Project page}
Beyond the Control Equations: An Artifact Study of Implementation Quality in Robot Control Software
A controller -- a software module managing hardware behavior -- is a key component of a typical robot system. While control theory gives safety guarantees for standard controller designs, the practical implementation of controllers in software introduces complexities that are often overlooked. Controllers are often designed in continuous space, while the software is executed in discrete space, undermining some of the theoretical guarantees. Despite extensive research on control theory and control modeling, little attention has been paid to the implementations of controllers and how their theoretical guarantees are ensured in real-world software systems. We investigate 184 real-world controller implementations in open-source robot software. We examine their application context, the implementation characteristics, and the testing methods employed to ensure correctness. We find that the implementations often handle discretization in an ad hoc manner, leading to potential issues with real-time reliability. Challenges such as timing inconsistencies, lack of proper error handling, and inadequate consideration of real-time constraints further complicate matters. Testing practices are superficial, no systematic verification of theoretical guarantees is used, leaving possible inconsistencies between expected and actual behavior. Our findings highlight the need for improved implementation guidelines and rigorous verification techniques to ensure the reliability and safety of robotic controllers in practice.
PuppetAI: A Customizable Platform for Designing Tactile-Rich Affective Robot Interaction
We introduce PuppetAI, a modular soft robot interaction platform. This platform offers a scalable cable-driven actuation system and a customizable, puppet-inspired robot gesture framework, supporting a multitude of interaction gesture robot design formats. The platform comprises a four-layer decoupled software architecture that includes perceptual processing, affective modeling, motion scheduling, and low-level actuation. We also implemented an affective expression loop that connects human input to the robot platform by producing real-time emotional gestural responses to human vocal input. For our own designs, we have worked with nuanced gestures enacted by "soft robots" with enhanced dexterity and "pleasant-to-touch" plush exteriors. By reducing operational complexity and production costs while enhancing customizability, our work creates an adaptable and accessible foundation for future tactile-based expressive robot research. Our goal is to provide a platform that allows researchers to independently construct or refine highly specific gestures and movements performed by social robots.
Dull, Dirty, Dangerous: Understanding the Past, Present, and Future of a Key Motivation for Robotics
In robotics, the concept of "dull, dirty, and dangerous" (DDD) work has been used to motivate where robots might be useful. In this paper, we conduct an empirical analysis of robotics publications between 1980 and 2024 that mention DDD, and find that only 2.7% of publications define DDD and 8.7% of publications provide concrete examples of tasks or jobs that are DDD. We then review the social science literature on "dull," "dirty," and "dangerous" work to provide definitions and guidance on how to conceptualize DDD for robotics. Finally, we propose a framework that helps the robotics community consider the job context for our technology, encouraging a more informed perspective on how robotics may impact human labor.
AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation
Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent failures on in-the-wild footage. To overcome these limitations, we introduce AGILE, a robust framework that shifts the paradigm from reconstruction to agentic generation for interaction learning. First, we employ an agentic pipeline where a Vision-Language Model (VLM) guides a generative model to synthesize a complete, watertight object mesh with high-fidelity texture, independent of video occlusions. Second, bypassing fragile SfM entirely, we propose a robust anchor-and-track strategy. We initialize the object pose at a single interaction onset frame using a foundation model and propagate it temporally by leveraging the strong visual similarity between our generated asset and video observations. Finally, a contact-aware optimization integrates semantic, geometric, and interaction stability constraints to enforce physical plausibility. Extensive experiments on HO3D, DexYCB, and in-the-wild videos reveal that AGILE outperforms baselines in global geometric accuracy while demonstrating exceptional robustness on challenging sequences where prior art frequently collapses. By prioritizing physical validity, our method produces simulation-ready assets validated via real-to-sim retargeting for robotic applications.
comment: 11 pages
From Vision to Assistance: Gaze and Vision-Enabled Adaptive Control for a Back-Support Exoskeleton
Back-support exoskeletons have been proposed to mitigate spinal loading in industrial handling, yet their effectiveness critically depends on timely and context-aware assistance. Most existing approaches rely either on load-estimation techniques (e.g., EMG, IMU) or on vision systems that do not directly inform control. In this work, we present a vision-gated control framework for an active lumbar occupational exoskeleton that leverages egocentric vision with wearable gaze tracking. The proposed system integrates real-time grasp detection from a first-person YOLO-based perception system, a finite-state machine (FSM) for task progression, and a variable admittance controller to adapt torque delivery to both posture and object state. A user study with 15 participants performing stooping load lifting trials under three conditions (no exoskeleton, exoskeleton without vision, exoskeleton with vision) shows that vision-gated assistance significantly reduces perceived physical demand and improves fluency, trust, and comfort. Quantitative analysis reveals earlier and stronger assistance when vision is enabled, while questionnaire results confirm user preference for the vision-gated mode. These findings highlight the potential of egocentric vision to enhance the responsiveness, ergonomics, safety, and acceptance of back-support exoskeletons.
Relational Scene Graphs for Object Grounding of Natural Language Commands
Robots are finding wider adoption in human environments, increasing the need for natural human-robot interaction. However, understanding a natural language command requires the robot to infer the intended task and how to decompose it into executable actions, and to ground those actions in the robot's knowledge of the environment, including relevant objects, agents, and locations. This challenge can be addressed by combining the capabilities of Large language models (LLMs) to understand natural language with 3D scene graphs (3DSGs) for grounding inferred actions in a semantic representation of the environment. However, many 3DSGs lack explicit spatial relations between objects, even though humans often rely on these relations to describe an environment. This paper investigates whether incorporating open- or closed-vocabulary spatial relations into 3DSGs can improve the ability of LLMs to interpret natural language commands. To address this, we propose an LLM-based pipeline for target object grounding from open-vocabulary language commands and a vision language model (VLM)-based pipeline to add open-vocabulary spatial edges to 3DSGs from images captured while mapping. Finally, two LLMs are evaluated in a study assessing their performance on the downstream task of target object grounding. Our study demonstrates that explicit spatial relations improve the ability of LLMs to ground objects. Moreover, open-vocabulary relation generation with VLMs proves feasible from robot-captured images, but their advantage over closed-vocabulary relations is found to be limited.
comment: In review for RA-L
Radar-Inertial Odometry For Computationally Constrained Aerial Navigation
Recently, the progress in the radar sensing technology consisting in the miniaturization of the packages and increase in measuring precision has drawn the interest of the robotics research community. Indeed, a crucial task enabling autonomy in robotics is to precisely determine the pose of the robot in space. To fulfill this task sensor fusion algorithms are often used, in which data from one or several exteroceptive sensors like, for example, LiDAR, camera, laser ranging sensor or GNSS are fused together with the Inertial Measurement Unit (IMU) measurements to obtain an estimate of the navigation states of the robot. Nonetheless, owing to their particular sensing principles, some exteroceptive sensors are often incapacitated in extreme environmental conditions, like extreme illumination or presence of fine particles in the environment like smoke or fog. Radars are largely immune to aforementioned factors thanks to the characteristics of electromagnetic waves they use. In this thesis, we present Radar-Inertial Odometry (RIO) algorithms to fuse the information from IMU and radar in order to estimate the navigation states of a (Uncrewed Aerial Vehicle) UAV capable of running on a portable resource-constrained embedded computer in real-time and making use of inexpensive, consumer-grade sensors. We present novel RIO approaches relying on the multi-state tightly-coupled Extended Kalman Filter (EKF) and Factor Graphs (FG) fusing instantaneous velocities of and distances to 3D points delivered by a lightweight, low-cost, off-the-shelf Frequency Modulated Continuous Wave (FMCW) radar with IMU readings. We also show a novel way to exploit advances in deep learning to retrieve 3D point correspondences in sparse and noisy radar point clouds.
Can We Redesign a Shoulder Exosuit to Enhance Comfort and Usability Without Losing Assistance?
Reduced shoulder mobility limits upper-limb function and the performance of activities of daily living across a wide range of conditions. Wearable exosuits have shown promise in assisting arm elevation, reducing muscle effort, and supporting functional movements; however, comfort is rarely prioritized as an explicit design objective, despite it strongly affects real-life, long-term usage. This study presents a redesigned soft shoulder exosuit (Soft Shoulder v2) developed to address comfort-related limitations identified in our previous version, while preserving assistive performance. In parallel, assistance was also improved, shifting from the coronal plane to the sagittal plane to better support functionally relevant hand positioning. A controlled comparison between the previous (v1) and redesigned (v2) modules was conducted in eight healthy participants, who performed static holding, dynamic lifting, and a functional pick and place task. Muscle activity, kinematics, and user-reported outcomes were assessed. Both versions increased endurance time, reduced deltoid activation, and preserved transparency during unpowered shoulder elevation. However, the difference between them emerged most clearly during functional tasks and comfort evaluation. The redesigned module facilitated forward arm positioning and increased transverse plane mobility by up to 30 deg, without increasing muscular demand. User-reported outcomes further indicated a substantial improvement in wearability, with markedly lower perceived pressure and higher ratings in effectiveness, ease of use, and comfort compared to the previous design. Taken together, these findings show that targeted, user-centered design refinements can improve comfort and functional interaction without compromising assistive performance, advancing the development of soft exosuits suitable for prolonged and daily use.
Act, Sense, Act: Learning Non-Markovian Active Perception Strategies from Large-Scale Egocentric Human Data
Achieving generalizable manipulation in unconstrained environments requires the robot to proactively resolve information uncertainty, i.e., the capability of active perception. However, existing methods are often confined in limited types of sensing behaviors, restricting their applicability to complex environments. In this work, we formalize active perception as a non-Markovian process driven by information gain and decision branching, providing a structured categorization of visual active perception paradigms. Building on this perspective, we introduce CoMe-VLA, a cognitive and memory-aware vision-language-action (VLA) framework that leverages large-scale human egocentric data to learn versatile exploration and manipulation priors. Our framework integrates a cognitive auxiliary head for autonomous sub-task transitions and a dual-track memory system to maintain consistent self and environmental awareness by fusing proprioceptive and visual temporal contexts. By aligning human and robot hand-eye coordination behaviors in a unified egocentric action space, we train the model progressively in three stages. Extensive experiments on a wheel-based humanoid have demonstrated strong robustness and adaptability of our proposed method across diverse long-horizon tasks spanning multiple active perception scenarios.
A Unified Complementarity-based Approach for Rigid-Body Manipulation and Motion Prediction
Robotic manipulation in unstructured environments requires planners to reason jointly about free-space motion and sustained, frictional contact with the environment. Existing (local) planning and simulation frameworks typically separate these regimes or rely on simplified contact representations, particularly when modeling non-convex or distributed contact patches. Such approximations limit the fidelity of contact-mode transitions and hinder the robust execution of contact-rich behaviors in real time. This paper presents a unified discrete-time modeling framework for robotic manipulation that consistently captures both free motion and frictional contact within a single mathematical formalism (Unicomp). Building on complementarity-based rigid-body dynamics, we formulate free-space motion and contact interactions as coupled linear and nonlinear complementarity problems, enabling principled transitions between contact modes without enforcing fixed-contact assumptions. For planar patch contact, we derive a frictional contact model from the maximum power dissipation principle in which the set of admissible contact wrenches is represented by an ellipsoidal limit surface. This representation captures coupled force-moment effects, including torsional friction, while remaining agnostic to the underlying pressure distribution across the contact patch. The resulting formulation yields a discrete-time predictive model that relates generalized velocities and contact wrenches through quadratic constraints and is suitable for real-time optimization-based planning. Experimental results show that the proposed approach enables stable, physically consistent behavior at interactive speeds across tasks, from planar pushing to contact-rich whole-body maneuvers.
comment: 18 pages, 7 figures
S-MUSt3R: Sliding Multi-view 3D Reconstruction
The recent paradigm shift in 3D vision led to the rise of foundation models with remarkable capabilities in 3D perception from uncalibrated images. However, extending these models to large-scale RGB stream 3D reconstruction remains challenging due to memory limitations. This work proposes S-MUSt3R, a simple and efficient pipeline that extends the limits of foundation models for monocular 3D reconstruction. Our approach addresses the scalability bottleneck of foundation models through a simple strategy of sequence segmentation followed by segment alignment and lightweight loop closure optimization. Without model retraining, we benefit from remarkable 3D reconstruction capacities of MUSt3R model and achieve trajectory and reconstruction performance comparable to traditional methods with more complex architecture. We evaluate S-MUSt3R on TUM, 7-Scenes and proprietary robot navigation datasets and show that S-MUSt3R runs successfully on long RGB sequences and produces accurate and consistent 3D reconstruction. Our results highlight the potential of leveraging the MUSt3R model for scalable monocular 3D scene in real-world settings, with an important advantage of making predictions directly in the metric space.
comment: 8 pages, 5 figures, 5 tables
TACO: Temporal Consensus Optimization for Continual Neural Mapping
Neural implicit mapping has emerged as a powerful paradigm for robotic navigation and scene understanding. However, real-world robotic deployment requires continual adaptation to changing environments under strict memory and computation constraints, which existing mapping systems fail to support. Most prior methods rely on replaying historical observations to preserve consistency and assume static scenes. As a result, they cannot adapt to continual learning in dynamic robotic settings. To address these challenges, we propose TACO (TemporAl Consensus Optimization), a replay-free framework for continual neural mapping. We reformulate mapping as a temporal consensus optimization problem, where we treat past model snapshots as temporal neighbors. Intuitively, our approach resembles a model consulting its own past knowledge. We update the current map by enforcing weighted consensus with historical representations. Our method allows reliable past geometry to constrain optimization while permitting unreliable or outdated regions to be revised in response to new observations. TACO achieves a balance between memory efficiency and adaptability without storing or replaying previous data. Through extensive simulated and real-world experiments, we show that TACO robustly adapts to scene changes, and consistently outperforms other continual learning baselines.
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
Deploying humanoid robots in real-world settings is fundamentally challenging, as it demands tight integration of perception, locomotion, and manipulation under partial-information observations and dynamically changing environments. As well as transitioning robustly between sub-tasks of different types. Towards addressing these challenges, we propose a novel task - EgoActing, which requires directly grounding high-level instructions into various, precise, spatially aware humanoid actions. We further instantiate this task by introducing EgoActor, a unified and scalable vision-language model (VLM) that can predict locomotion primitives (e.g., walk, turn, move sideways, change height), head movements, manipulation commands, and human-robot interactions to coordinate perception and execution in real-time. We leverage broad supervision over egocentric RGB-only data from real-world demonstrations, spatial reasoning question-answering, and simulated environment demonstrations, enabling EgoActor to make robust, context-aware decisions and perform fluent action inference (under 1s) with both 8B and 4B parameter models. Extensive evaluations in both simulated and real-world environments demonstrate that EgoActor effectively bridges abstract task planning and concrete motor execution, while generalizing across diverse tasks and unseen environments.
The Supportiveness-Safety Tradeoff in LLM Well-Being Agents
Large language models (LLMs) are being integrated into socially assistive robots (SARs) and other conversational agents providing mental health and well-being support. These agents are often designed to sound empathic and supportive in order to maximize user's engagement, yet it remains unclear how increasing the level of supportive framing in system prompts influences safety relevant behavior. We evaluated 6 LLMs across 3 system prompts with varying levels of supportiveness on 80 synthetic queries spanning 4 well-being domains (1440 responses). An LLM judge framework, validated against human ratings, assessed safety and care quality. Moderately supportive prompts improved empathy and constructive support while maintaining safety. In contrast, strongly validating prompts significantly degraded safety and, in some cases, care across all domains, with substantial variation across models. We discuss implications for prompt design, model selection, and domain specific safeguards in SARs deployment.
Robot-Assisted Group Tours for Blind People
Group interactions are essential to social functioning, yet effective engagement relies on the ability to recognize and interpret visual cues, making such engagement a significant challenge for blind people. In this paper, we investigate how a mobile robot can support group interactions for blind people. We used the scenario of a guided tour with mixed-visual groups involving blind and sighted visitors. Based on insights from an interview study with blind people (n=5) and museum experts (n=5), we designed and prototyped a robotic system that supported blind visitors to join group tours. We conducted a field study in a science museum where each blind participant (n=8) joined a group tour with one guide and two sighted participants (n=8). Findings indicated users' sense of safety from the robot's navigational support, concerns in the group participation, and preferences for obtaining environmental information. We present design implications for future robotic systems to support blind people's mixed-visual group participation.
comment: In Proceedings of ACM CHI 2026 conference on Human Factors in Computing Systems
Gust Estimation and Rejection with a Disturbance Observer for Proprioceptive Underwater Soft Morphing Wings ICRA
Unmanned underwater vehicles are increasingly employed for maintenance and surveying tasks at sea, but their operation in shallow waters is often hindered by hydrodynamic disturbances such as waves, currents, and turbulence. These unsteady flows can induce rapid changes in direction and speed, compromising vehicle stability and manoeuvrability. Marine organisms contend with such conditions by combining proprioceptive feedback with flexible fins and tails to reject disturbances. Inspired by this strategy, we propose soft morphing wings endowed with proprioceptive sensing to mitigate environmental perturbations. The wing's continuous deformation provides a natural means to infer dynamic disturbances: sudden changes in camber directly reflect variations in the oncoming flow. By interpreting this proprioceptive signal, a disturbance observer can reconstruct flow parameters in real time. To enable this, we develop and experimentally validate a dynamic model of a hydraulically actuated soft wing with controllable camber. We then show that curvature-based sensing allows accurate estimation of disturbances in the angle of attack. Finally, we demonstrate that a controller leveraging these proprioceptive estimates can reject disturbances in the lift response of the soft wing. By combining proprioceptive sensing with a disturbance observer, this technique mirrors biological strategies and provides a pathway for soft underwater vehicles to maintain stability in hazardous environments.
comment: 2026 IEEE International Conference on Robotics & Automation (ICRA)
Integrated Exploration and Sequential Manipulation on Scene Graph with LLM-based Situated Replanning ICRA 2026
In partially known environments, robots must combine exploration to gather information with task planning for efficient execution. To address this challenge, we propose EPoG, an Exploration-based sequential manipulation Planning framework on Scene Graphs. EPoG integrates a graph-based global planner with a Large Language Model (LLM)-based situated local planner, continuously updating a belief graph using observations and LLM predictions to represent known and unknown objects. Action sequences are generated by computing graph edit operations between the goal and belief graphs, ordered by temporal dependencies and movement costs. This approach seamlessly combines exploration and sequential manipulation planning. In ablation studies across 46 realistic household scenes and 5 long-horizon daily object transportation tasks, EPoG achieved a success rate of 91.3%, reducing travel distance by 36.1% on average. Furthermore, a physical mobile manipulator successfully executed complex tasks in unknown and dynamic environments, demonstrating EPoG's potential for real-world applications.
comment: 8 pages, 7 figures; accepted by ICRA 2026
HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation
Humanoid robots can suffer significant performance drops under small changes in dynamics, task specifications, or environment setup. We propose HoRD, a two-stage learning framework for robust humanoid control under domain shift. First, we train a high-performance teacher policy via history-conditioned reinforcement learning, where the policy infers latent dynamics context from recent state--action trajectories to adapt online to diverse randomized dynamics. Second, we perform online distillation to transfer the teacher's robust control capabilities into a transformer-based student policy that operates on sparse root-relative 3D joint keypoint trajectories. By combining history-conditioned adaptation with online distillation, HoRD enables a single policy to adapt zero-shot to unseen domains without per-domain retraining. Extensive experiments show HoRD outperforms strong baselines in robustness and transfer, especially under unseen domains and external perturbations. Code and project page are available at \href{https://tonywang-0517.github.io/hord/}{https://tonywang-0517.github.io/hord/}.
Quantile Transfer for Reliable Operating Point Selection in Visual Place Recognition
Visual Place Recognition (VPR) is a key component for localisation in GNSS-denied environments, but its performance critically depends on selecting an image matching threshold (operating point) that balances precision and recall. Thresholds are typically hand-tuned offline for a specific environment and fixed during deployment, leading to degraded performance under environmental change. We propose a method that, given a user-defined precision requirement, automatically selects the operating point of a VPR system to maximise recall. The method uses a small calibration traversal with known correspondences and transfers thresholds to deployment via quantile normalisation of similarity score distributions. This quantile transfer ensures that thresholds remain stable across calibration sizes and query subsets, making the method robust to sampling variability. Experiments with multiple state-of-the-art VPR techniques and datasets show that the proposed approach consistently outperforms the state-of-the-art, delivering up to 25% higher recall in high-precision operating regimes. The method eliminates manual tuning by adapting to new environments and generalising across operating conditions. Our code will be released upon acceptance.
Safe and Stylized Trajectory Planning for Autonomous Driving via Diffusion Model
Achieving safe and stylized trajectory planning in complex real-world scenarios remains a critical challenge for autonomous driving systems. This paper proposes the SDD Planner, a diffusion-based framework designed to effectively reconcile safety constraints with driving styles in real time. The framework integrates two core modules: a Multi-Source Style-Aware Encoder, which employs distance-sensitive attention to fuse dynamic agent data and environmental contexts for heterogeneous safety-style perception; and a Style-Guided Dynamic Trajectory Generator, which adaptively modulates priority weights within the diffusion denoising process to generate user-preferred yet safe trajectories. Extensive experiments demonstrate that SDD Planner achieves state-of-the-art performance. On the StyleDrive benchmark, it improves the SM-PDMS metric by 3.9% over WoTE, the strongest baseline. Furthermore, on the NuPlan Test14 and Test14-hard benchmarks, SDD Planner ranks first with overall scores of 91.76 and 80.32, respectively, outperforming leading methods such as PLUTO. Real-vehicle closed-loop tests further confirm that SDD Planner maintains high safety standards while aligning with preset driving styles, validating its practical applicability for real-world deployment.
comment: 12 pages, 7 figures, submitted to IEEE Transactions on Intelligent Transportation Systems
GeneralVLA: Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning
Large foundation models have shown strong open-world generalization to complex problems in vision and language, but similar levels of generalization have yet to be achieved in robotics. One fundamental challenge is that the models exhibit limited zero-shot capability, which hampers their ability to generalize effectively to unseen scenarios. In this work, we propose GeneralVLA (Generalizable Vision-Language-Action Models with Knowledge-Guided Trajectory Planning), a hierarchical vision-language-action (VLA) model that can be more effective in utilizing the generalization of foundation models, enabling zero-shot manipulation and automatically generating data for robotics. In particular, we study a class of hierarchical VLA model where the high-level ASM (Affordance Segmentation Module) is finetuned to perceive image keypoint affordances of the scene; the mid-level 3DAgent carries out task understanding, skill knowledge, and trajectory planning to produce a 3D path indicating the desired robot end-effector trajectory. The intermediate 3D path prediction is then served as guidance to the low-level, 3D-aware control policy capable of precise manipulation. Compared to alternative approaches, our method requires no real-world robotic data collection or human demonstration, making it much more scalable to diverse tasks and viewpoints. Empirically, GeneralVLA successfully generates trajectories for 14 tasks, significantly outperforming state-of-the-art methods such as VoxPoser. The generated demonstrations can train more robust behavior cloning policies than training with human demonstrations or from data generated by VoxPoser, Scaling-up, and Code-As-Policies. We believe GeneralVLA can be the scalable method for both generating data for robotics and solving novel tasks in a zero-shot setting. Code: https://github.com/AIGeeksGroup/GeneralVLA. Website: https://aigeeksgroup.github.io/GeneralVLA.
AppleVLM: End-to-end Autonomous Driving with Advanced Perception and Planning-Enhanced Vision-Language Models
End-to-end autonomous driving has emerged as a promising paradigm integrating perception, decision-making, and control within a unified learning framework. Recently, Vision-Language Models (VLMs) have gained significant attention for their potential to enhance the robustness and generalization of end-to-end driving models in diverse and unseen scenarios. However, existing VLM-based approaches still face challenges, including suboptimal lane perception, language understanding biases, and difficulties in handling corner cases. To address these issues, we propose AppleVLM, an advanced perception and planning-enhanced VLM model for robust end-to-end driving. AppleVLM introduces a novel vision encoder and a planning strategy encoder to improve perception and decision-making. Firstly, the vision encoder fuses spatial-temporal information from multi-view images across multiple timesteps using a deformable transformer mechanism, enhancing robustness to camera variations and facilitating scalable deployment across different vehicle platforms. Secondly, unlike traditional VLM-based approaches, AppleVLM introduces a dedicated planning modality that encodes explicit Bird's-Eye-View spatial information, mitigating language biases in navigation instructions. Finally, a VLM decoder fine-tuned by a hierarchical Chain-of-Thought integrates vision, language, and planning features to output robust driving waypoints. We evaluate AppleVLM in closed-loop experiments on two CARLA benchmarks, achieving state-of-the-art driving performance. Furthermore, we deploy AppleVLM on an AGV platform and successfully showcase real-world end-to-end autonomous driving in complex outdoor environments.
Towards Next-Generation SLAM: A Survey on 3DGS-SLAM Focusing on Performance, Robustness, and Future Directions
Traditional Simultaneous Localization and Mapping (SLAM) systems often face limitations including coarse rendering quality, insufficient recovery of scene details, and poor robustness in dynamic environments. 3D Gaussian Splatting (3DGS), with its efficient explicit representation and high-quality rendering capabilities, offers a new reconstruction paradigm for SLAM. This survey comprehensively reviews key technical approaches for integrating 3DGS with SLAM. We analyze performance optimization of representative methods across four critical dimensions: rendering quality, tracking accuracy, reconstruction speed, and memory consumption, delving into their design principles and breakthroughs. Furthermore, we examine methods for enhancing the robustness of 3DGS-SLAM in complex environments such as motion blur and dynamic environments. Finally, we discuss future challenges and development trends in this area. This survey aims to provide a technical reference for researchers and foster the development of next-generation SLAM systems characterized by high fidelity, efficiency, and robustness.
Viewpoint Matters: Dynamically Optimizing Viewpoints with Masked Autoencoder for Visual Manipulation
Robotic manipulation continues to be a challenge, and imitation learning (IL) enables robots to learn tasks from expert demonstrations. Current IL methods typically rely on fixed camera setups, where cameras are manually positioned in static locations, imposing significant limitations on adaptability and coverage. Inspired by human active perception, where humans dynamically adjust their viewpoint to capture the most relevant and least noisy information, we propose MAE-Select, a novel framework for active viewpoint selection in single-camera robotic systems. MAE-Select fully leverages pre-trained multi-view masked autoencoder representations and dynamically selects the next most informative viewpoint at each time chunk without requiring labeled viewpoints. Extensive experiments demonstrate that MAE-Select improves the capabilities of single-camera systems and, in some cases, even surpasses multi-camera setups. The project will be available at https://mae-select.github.io.
comment: 5 pages, 2 figures, 3 tables
SPOT-Occ: Sparse Prototype-guided Transformer for Camera-based 3D Occupancy Prediction
Achieving highly accurate and real-time 3D occupancy prediction from cameras is a critical requirement for the safe and practical deployment of autonomous vehicles. While this shift to sparse 3D representations solves the encoding bottleneck, it creates a new challenge for the decoder: how to efficiently aggregate information from a sparse, non-uniformly distributed set of voxel features without resorting to computationally prohibitive dense attention. In this paper, we propose a novel Prototype-based Sparse Transformer Decoder that replaces this costly interaction with an efficient, two-stage process of guided feature selection and focused aggregation. Our core idea is to make the decoder's attention prototype-guided. We achieve this through a sparse prototype selection mechanism, where each query adaptively identifies a compact set of the most salient voxel features, termed prototypes, for focused feature aggregation. To ensure this dynamic selection is stable and effective, we introduce a complementary denoising paradigm. This approach leverages ground-truth masks to provide explicit guidance, guaranteeing a consistent query-prototype association across decoder layers. Our model, dubbed SPOT-Occ, outperforms previous methods with a significant margin in speed while also improving accuracy. Source code is released at https://github.com/chensuzeyu/SpotOcc.
comment: 8 pages, 6 figures
GeoLanG: Geometry-Aware Language-Guided Grasping with Unified RGB-D Multimodal Learning ICRA 2025
Language-guided grasping has emerged as a promising paradigm for enabling robots to identify and manipulate target objects through natural language instructions, yet it remains highly challenging in cluttered or occluded scenes. Existing methods often rely on multi-stage pipelines that separate object perception and grasping, which leads to limited cross-modal fusion, redundant computation, and poor generalization in cluttered, occluded, or low-texture scenes. To address these limitations, we propose GeoLanG, an end-to-end multi-task framework built upon the CLIP architecture that unifies visual and linguistic inputs into a shared representation space for robust semantic alignment and improved generalization. To enhance target discrimination under occlusion and low-texture conditions, we explore a more effective use of depth information through the Depth-guided Geometric Module (DGGM), which converts depth into explicit geometric priors and injects them into the attention mechanism without additional computational overhead. In addition, we propose Adaptive Dense Channel Integration, which adaptively balances the contributions of multi-layer features to produce more discriminative and generalizable visual representations. Extensive experiments on the OCID-VLG dataset, as well as in both simulation and real-world hardware, demonstrate that GeoLanG enables precise and robust language-guided grasping in complex, cluttered environments, paving the way toward more reliable multimodal robotic manipulation in real-world human-centric settings.
comment: IEEE ICRA 2025
Reshaping Action Error Distributions for Reliable Vision-Language-Action Models
In robotic manipulation, vision-language-action (VLA) models have emerged as a promising paradigm for learning generalizable and scalable robot policies. Most existing VLA frameworks rely on standard supervised objectives, typically cross-entropy for discrete actions and mean squared error (MSE) for continuous action regression, which impose strong pointwise constraints on individual predictions. In this work, we focus on continuous-action VLA models and move beyond conventional MSE-based regression by reshaping action error distributions during training. Drawing on information-theoretic principles, we introduce Minimum Error Entropy (MEE) into modern VLA architectures and propose a trajectory-level MEE objective, together with two weighted variants, combined with MSE for continuous-action VLA training. We evaluate our approaches across standard, few-shot, and noisy settings on multiple representative VLA architectures, using simulation benchmarks such as LIBERO and SimplerEnv as well as real-world robotic manipulation tasks. Experimental results demonstrate consistent improvements in success rates and robustness across these settings. Under imbalanced data regimes, the gains persist within a well-characterized operating range, while incurring negligible additional training cost and no impact on inference efficiency. We further provide theoretical analyses that explain why MEE-based supervision is effective and characterize its practical range. Project Page: https://cognition2actionlab.github.io/VLA-TMEE.github.io/
OAT: Ordered Action Tokenization
Autoregressive policies offer a compelling foundation for scalable robot learning by enabling discrete abstraction, token-level reasoning, and flexible inference. However, applying autoregressive modeling to continuous robot actions requires an effective action tokenization scheme. Existing approaches either rely on analytical discretization methods that produce prohibitively long token sequences, or learned latent tokenizers that lack structure, limiting their compatibility with next-token prediction. In this work, we identify three desiderata for action tokenization - high compression, total decodability, and a left-to-right causally ordered token space - and introduce Ordered Action Tokenization (OAT), a learned action tokenizer that satisfies all three. OAT discretizes action chunks into an ordered sequence of tokens using transformer with registers, finite scalar quantization, and ordering-inducing training mechanisms. The resulting token space aligns naturally with autoregressive generation and enables prefix-based detokenization, yielding an anytime trade-off between inference cost and action fidelity. Across more than 20 tasks spanning four simulation benchmarks and real-world settings, autoregressive policies equipped with OAT consistently outperform prior tokenization schemes and diffusion-based baselines, while offering significantly greater flexibility at inference time.
ALORE: Autonomous Large-Object Rearrangement with a Legged Manipulator
Endowing robots with the ability to rearrange various large and heavy objects, such as furniture, can substantially alleviate human workload. However, this task is extremely challenging due to the need to interact with diverse objects and efficiently rearrange multiple objects in complex environments while ensuring collision-free loco-manipulation. In this work, we present ALORE, an autonomous large-object rearrangement system for a legged manipulator that can rearrange various large objects across diverse scenarios. The proposed system is characterized by three main features: (i) a hierarchical reinforcement learning training pipeline for multi-object environment learning, where a high-level object velocity controller is trained on top of a low-level whole-body controller to achieve efficient and stable joint learning across multiple objects; (ii) two key modules, a unified interaction configuration representation and an object velocity estimator, that allow a single policy to regulate planar velocity of diverse objects accurately; and (iii) a task-and-motion planning framework that jointly optimizes object visitation order and object-to-target assignment, improving task efficiency while enabling online replanning. Comparisons against strong baselines show consistent superiority in policy generalization, object-velocity tracking accuracy, and multi-object rearrangement efficiency. Key modules are systematically evaluated, and extensive simulations and real-world experiments are conducted to validate the robustness and effectiveness of the entire system, which successfully completes 8 continuous loops to rearrange 32 chairs over nearly 40 minutes without a single failure, and executes long-distance autonomous rearrangement over an approximately 40 m route. The open-source packages are available at https://zhihaibi.github.io/Alore/.
SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robotic control, with test-time scaling (TTS) gaining attention to enhance robustness beyond training. However, existing TTS methods for VLAs require additional training, verifiers, and multiple forward passes, making them impractical for deployment. Moreover, they intervene only at action decoding while keeping visual representations fixed-insufficient under perceptual ambiguity, where reconsidering how to perceive is as important as deciding what to do. To address these limitations, we propose SCALE, a simple inference strategy that jointly modulates visual perception and action based on 'self-uncertainty', inspired by uncertainty-driven exploration in Active Inference theory-requiring no additional training, no verifier, and only a single forward pass. SCALE broadens exploration in both perception and action under high uncertainty, while focusing on exploitation when confident-enabling adaptive execution across varying conditions. Experiments on simulated and real-world benchmarks demonstrate that SCALE improves state-of-the-art VLAs and outperforms existing TTS methods while maintaining single-pass efficiency.
comment: 20 pages, 8 figures
Natural Language Instructions for Scene-Responsive Human-in-the-Loop Motion Planning in Autonomous Driving using Vision-Language-Action Models
Instruction-grounded driving, where passenger language guides trajectory planning, requires vehicles to understand intent before motion. However, most prior instruction-following planners rely on simulation or fixed command vocabularies, limiting real-world generalization. doScenes, the first real-world dataset linking free-form instructions (with referentiality) to nuScenes ground-truth motion, enables instruction-conditioned planning. In this work, we adapt OpenEMMA, an open-source MLLM-based end-to-end driving framework that ingests front-camera views and ego-state and outputs 10-step speed-curvature trajectories, to this setting, presenting a reproducible instruction-conditioned baseline on doScenes and investigate the effects of human instruction prompts on predicted driving behavior. We integrate doScenes directives as passenger-style prompts within OpenEMMA's vision-language interface, enabling linguistic conditioning before trajectory generation. Evaluated on 849 annotated scenes using ADE, we observe that instruction conditioning substantially improves robustness by preventing extreme baseline failures, yielding a 98.7% reduction in mean ADE. When such outliers are removed, instructions still influence trajectory alignment, with well-phrased prompts improving ADE by up to 5.1%. We use this analysis to discuss what makes a "good" instruction for the OpenEMMA framework. We release the evaluation prompts and scripts to establish a reproducible baseline for instruction-aware planning. GitHub: https://github.com/Mi3-Lab/doScenes-VLM-Planning
GenMRP: A Generative Multi-Route Planning Framework for Efficient and Personalized Real-Time Industrial Navigation
Existing industrial-scale navigation applications contend with massive road networks, typically employing two main categories of approaches for route planning. The first relies on precomputed road costs for optimal routing and heuristic algorithms for generating alternatives, while the second, generative methods, has recently gained significant attention. However, the former struggles with personalization and route diversity, while the latter fails to meet the efficiency requirements of large-scale real-time scenarios. To address these limitations, we propose GenMRP, a generative framework for multi-route planning. To ensure generation efficiency, GenMRP first introduces a skeleton-to-capillary approach that dynamically constructs a relevant sub-network significantly smaller than the full road network. Within this sub-network, routes are generated iteratively. The first iteration identifies the optimal route, while the subsequent ones generate alternatives that balance quality and diversity using the newly proposed correctional boosting approach. Each iteration incorporates road features, user historical sequences, and previously generated routes into a Link Cost Model to update road costs, followed by route generation using the Dijkstra algorithm. Extensive experiments show that GenMRP achieves state-of-the-art performance with high efficiency in both offline and online environments. To facilitate further research, we have publicly released the training and evaluation dataset. GenMRP has been fully deployed in a real-world navigation app, demonstrating its effectiveness and benefits.
A Modern System Recipe for Situated Embodied Human-Robot Conversation with Real-Time Multimodal LLMs and Tool-Calling
Situated embodied conversation requires robots to interleave real-time dialogue with active perception: deciding what to look at, when to look, and what to say under tight latency constraints. We present a simple, minimal system recipe that pairs a real-time multimodal language model with a small set of tool interfaces for attention and active perception. We study six home-style scenarios that require frequent attention shifts and increasing perceptual scope. Across four system variants, we evaluate turn-level tool-decision correctness against human annotations and collect subjective ratings of interaction quality. Results indicate that real-time multimodal large language models and tool use for active perception is a promising direction for practical situated embodied conversation.
comment: 9 pages, 7 figures
MA3DSG: Multi-Agent 3D Scene Graph Generation for Large-Scale Indoor Environments
Current 3D scene graph generation (3DSGG) approaches heavily rely on a single-agent assumption and small-scale environments, exhibiting limited scalability to real-world scenarios. In this work, we introduce Multi-Agent 3D Scene Graph Generation (MA3DSG) model, the first framework designed to tackle this scalability challenge using multiple agents. We develop a training-free graph alignment algorithm that efficiently merges partial query graphs from individual agents into a unified global scene graph. Leveraging extensive analysis and empirical insights, our approach enables conventional single-agent systems to operate collaboratively without requiring any learnable parameters. To rigorously evaluate 3DSGG performance, we propose MA3DSG-Bench-a benchmark that supports diverse agent configurations, domain sizes, and environmental conditions-providing a more general and extensible evaluation framework. This work lays a solid foundation for scalable, multi-agent 3DSGG research.
Shaping Expressiveness in Robotics: The Role of Design Tools in Crafting Embodied Robot Movements
As robots increasingly become part of shared human spaces, their movements must transcend basic functionality by incorporating expressive qualities to enhance engagement and communication. This paper introduces a movement-centered design pedagogy designed to support engineers in creating expressive robotic arm movements. Through a hands-on interactive workshop informed by interdisciplinary methodologies, participants explored various creative possibilities, generating valuable insights into expressive motion design. The iterative approach proposed integrates analytical frameworks from dance, enabling designers to examine motion through dynamic and embodied dimensions. A custom manual remote controller facilitates interactive, real-time manipulation of the robotic arm, while dedicated animation software supports visualization, detailed motion sequencing, and precise parameter control. Qualitative analysis of this interactive design process reveals that the proposed "toolbox" effectively bridges the gap between human intent and robotic expressiveness resulting in more intuitive and engaging expressive robotic arm movements.
Lyapunov Constrained Soft Actor-Critic (LC-SAC) using Koopman Operator Theory for Quadrotor Trajectory Tracking
Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. There has significant work in incorporating Lyapunov-based stability guarantees in RL algorithms with key challenges being selecting a candidate Lyapunov function, computational complexity by using excessive function approximators and conservative policies by incorporating stability criterion in the learning process. In this work we propose a novel Lyapunov-constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We propose use of extended dynamic mode decomposition (EDMD) to produce a linear approximation of the system and use this approximation to derive a closed form solution for candidate Lyapunov function. This derived Lyapunov function is incorporated in the SAC algorithm to further provide guarantees for a policy that stabilizes the nonlinear system. The results are evaluated trajectory tracking of a 2D Quadrotor environment based on safe-control-gym. The proposed algorithm shows training convergence and decaying violations for Lyapunov stability criterion compared to baseline vanilla SAC algorithm. GitHub Repository: https://github.com/DhruvKushwaha/LC-SAC-Quadrotor-Trajectory-Tracking
comment: 12 pages, 7 Figures, submitted to IEEE RA-L
Multi-threaded Recast-Based A* Pathfinding for Scalable Navigation in Dynamic Game Environments
While the A* algorithm remains the industry standard for game pathfinding, its integration into dynamic 3D environments faces trade-offs between computational performance and visual realism. This paper proposes a multi-threaded framework that enhances standard A* through Recast-based mesh generation, Bezier-curve trajectory smoothing, and density analysis for crowd coordination. We evaluate our system across ten incremental phases, from 2D mazes to complex multi-level dynamic worlds. Experimental results demonstrate that the framework maintains 350+ FPS with 1000 simultaneous agents and achieves collision-free crowd navigation through density-aware path coordination.
KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning
Heterogeneous multi-robot systems are increasingly deployed in long-horizon missions that require coordination among robots with diverse capabilities. However, existing planning approaches struggle to construct accurate symbolic representations and maintain plan consistency in dynamic environments. Classical PDDL planners require manually crafted symbolic models, while LLM-based planners often ignore agent heterogeneity and environmental uncertainty. We introduce KGLAMP, a knowledge-graph-guided LLM planning framework for heterogeneous multi-robot teams. The framework maintains a structured knowledge graph encoding object relations, spatial reachability, and robot capabilities, which guides the LLM in generating accurate PDDL problem specifications. The knowledge graph serves as a persistent, dynamically updated memory that incorporates new observations and triggers replanning upon detecting inconsistencies, enabling symbolic plans to adapt to evolving world states. Experiments on the MAT-THOR benchmark show that KGLAMP improves performance by at least 25.5% over both LLM-only and PDDL-based variants.
Modelling Pedestrian Behaviour in Autonomous Vehicle Encounters Using Naturalistic Dataset
Understanding how pedestrians adjust their movement when interacting with autonomous vehicles (AVs) is essential for improving safety in mixed traffic. This study examines micro-level pedestrian behaviour during midblock encounters in the NuScenes dataset using a hybrid discrete choice-machine learning framework based on the Residual Logit (ResLogit) model. The model incorporates temporal, spatial, kinematic, and perceptual indicators. These include relative speed, visual looming, remaining distance, and directional collision risk proximity (CRP) measures. Results suggest that some of these variables may meaningfully influence movement adjustments, although predictive performance remains moderate. Marginal effects and elasticities indicate strong directional asymmetries in risk perception, with frontal and rear CRP showing opposite influences. The remaining distance exhibits a possible mid-crossing threshold. Relative speed cues appear to have a comparatively less effect. These patterns may reflect multiple behavioural tendencies driven by both risk perception and movement efficiency.
Trojan Attacks on Neural Network Controllers for Robotic Systems
Neural network controllers are increasingly deployed in robotic systems for tasks such as trajectory tracking and pose stabilization. However, their reliance on potentially untrusted training pipelines or supply chains introduces significant security vulnerabilities. This paper investigates backdoor (Trojan) attacks against neural controllers, using a differential-drive mobile robot platform as a case study. In particular, assuming that the robot's tracking controller is implemented as a neural network, we design a lightweight, parallel Trojan network that can be embedded within the controller. This malicious module remains dormant during normal operation but, upon detecting a highly specific trigger condition defined by the robot's pose and goal parameters, compromises the primary controller's wheel velocity commands, resulting in undesired and potentially unsafe robot behaviours. We provide a proof-of-concept implementation of the proposed Trojan network, which is validated through simulation under two different attack scenarios. The results confirm the effectiveness of the proposed attack and demonstrate that neural network-based robotic control systems are subject to potentially critical security threats.
comment: Paper submitted to the 2026 IEEE Conference on Control Technology and Applications (CCTA)
GAMMS: Graph based Adversarial Multiagent Modeling Simulator
As intelligent systems and multi-agent coordination become increasingly central to real-world applications, there is a growing need for simulation tools that are both scalable and accessible. Existing high-fidelity simulators, while powerful, are often computationally expensive and ill-suited for rapid prototyping or large-scale agent deployments. We present GAMMS (Graph based Adversarial Multiagent Modeling Simulator), a lightweight yet extensible simulation framework designed to support fast development and evaluation of agent behavior in environments that can be represented as graphs. GAMMS emphasizes five core objectives: scalability, ease of use, integration-first architecture, fast visualization feedback, and real-world grounding. It enables efficient simulation of complex domains such as urban road networks and communication systems, supports integration with external tools (e.g., machine learning libraries, planning solvers), and provides built-in visualization with minimal configuration. GAMMS is agnostic to policy type, supporting heuristic, optimization-based, and learning-based agents, including those using large language models. By lowering the barrier to entry for researchers and enabling high-performance simulations on standard hardware, GAMMS facilitates experimentation and innovation in multi-agent systems, autonomous planning, and adversarial modeling. The framework is open-source and available at https://github.com/GAMMSim/GAMMS/
A Framework for Combining Optimization-Based and Analytic Inverse Kinematics
Analytic and optimization methods for solving inverse kinematics (IK) problems have been deeply studied throughout the history of robotics. The two strategies have complementary strengths and weaknesses, but developing a unified approach to take advantage of both methods has proved challenging. A key challenge faced by optimization approaches is the complicated nonlinear relationship between the joint angles and the end-effector pose. When this must be handled concurrently with additional nonconvex constraints like collision avoidance, optimization IK algorithms may suffer high failure rates. We present a new formulation for optimization IK that uses an analytic IK solution as a change of variables, and is fundamentally easier for optimizers to solve. We test our methodology on three popular solvers, representing three different paradigms for constrained nonlinear optimization. Extensive experimental comparisons demonstrate that our new formulation achieves higher success rates than the old formulation and baseline methods across various challenging IK problems, including collision avoidance, grasp selection, and humanoid stability.
comment: 19 pages, 5 figures, 6 tables. Under submission
Evaluating Robustness and Adaptability in Learning-Based Mission Planning for Active Debris Removal SP
Autonomous mission planning for Active Debris Removal (ADR) must balance efficiency, adaptability, and strict feasibility constraints on fuel and mission duration. This work compares three planners for the constrained multi-debris rendezvous problem in Low Earth Orbit: a nominal Masked Proximal Policy Optimization (PPO) policy trained under fixed mission parameters, a domain-randomized Masked PPO policy trained across varying mission constraints for improved robustness, and a plain Monte Carlo Tree Search (MCTS) baseline. Evaluations are conducted in a high-fidelity orbital simulation with refueling, realistic transfer dynamics, and randomized debris fields across 300 test cases in nominal, reduced fuel, and reduced mission time scenarios. Results show that nominal PPO achieves top performance when conditions match training but degrades sharply under distributional shift, while domain-randomized PPO exhibits improved adaptability with only moderate loss in nominal performance. MCTS consistently handles constraint changes best due to online replanning but incurs orders-of-magnitude higher computation time. The findings underline a trade-off between the speed of learned policies and the adaptability of search-based methods, and suggest that combining training-time diversity with online planning could be a promising path for future resilient ADR mission planners.
comment: Presented at Conference: International Conference on Space Robotics (ISPARO,2025) At: Sendai,Japan
Beware Untrusted Simulators -- Reward-Free Backdoor Attacks in Reinforcement Learning ICLR 2026
Simulated environments are a key piece in the success of Reinforcement Learning (RL), allowing practitioners and researchers to train decision making agents without running expensive experiments on real hardware. Simulators remain a security blind spot, however, enabling adversarial developers to alter the dynamics of their released simulators for malicious purposes. Therefore, in this work we highlight a novel threat, demonstrating how simulator dynamics can be exploited to stealthily implant action-level backdoors into RL agents. The backdoor then allows an adversary to reliably activate targeted actions in an agent upon observing a predefined ``trigger'', leading to potentially dangerous consequences. Traditional backdoor attacks are limited in their strong threat models, assuming the adversary has near full control over an agent's training pipeline, enabling them to both alter and observe agent's rewards. As these assumptions are infeasible to implement within a simulator, we propose a new attack ``Daze'' which is able to reliably and stealthily implant backdoors into RL agents trained for real world tasks without altering or even observing their rewards. We provide formal proof of Daze's effectiveness in guaranteeing attack success across general RL tasks along with extensive empirical evaluations on both discrete and continuous action space domains. We additionally provide the first example of RL backdoor attacks transferring to real, robotic hardware. These developments motivate further research into securing all components of the RL training pipeline to prevent malicious attacks.
comment: 10 pages main body, ICLR 2026
Reinforcement Learning Enhancement Using Vector Semantic Representation and Symbolic Reasoning for Human-Centered Autonomous Emergency Braking
The problem with existing camera-based Deep Reinforcement Learning approaches is twofold: they rarely integrate high-level scene context into the feature representation, and they rely on rigid, fixed reward functions. To address these challenges, this paper proposes a novel pipeline that produces a neuro-symbolic feature representation that encompasses semantic, spatial, and shape information, as well as spatially boosted features of dynamic entities in the scene, with an emphasis on safety-critical road users. It also proposes a Soft First-Order Logic (SFOL) reward function that balances human values via a symbolic reasoning module. Here, semantic and spatial predicates are extracted from segmentation maps and applied to linguistic rules to obtain reward weights. Quantitative experiments in the CARLA simulation environment show that the proposed neuro-symbolic representation and SFOL reward function improved policy robustness and safety-related performance metrics compared to baseline representations and reward formulations across varying traffic densities and occlusion levels. The findings demonstrate that integrating holistic representations and soft reasoning into Reinforcement Learning can support more context-aware and value-aligned decision-making for autonomous driving.
comment: 12 pages, 7 figures, 5 tables
Optimizing Mission Planning for Multi-Debris Rendezvous Using Reinforcement Learning with Refueling and Adaptive Collision Avoidance
As the orbital environment around Earth becomes increasingly crowded with debris, active debris removal (ADR) missions face significant challenges in ensuring safe operations while minimizing the risk of in-orbit collisions. This study presents a reinforcement learning (RL) based framework to enhance adaptive collision avoidance in ADR missions, specifically for multi-debris removal using small satellites. Small satellites are increasingly adopted due to their flexibility, cost effectiveness, and maneuverability, making them well suited for dynamic missions such as ADR. Building on existing work in multi-debris rendezvous, the framework integrates refueling strategies, efficient mission planning, and adaptive collision avoidance to optimize spacecraft rendezvous operations. The proposed approach employs a masked Proximal Policy Optimization (PPO) algorithm, enabling the RL agent to dynamically adjust maneuvers in response to real-time orbital conditions. Key considerations include fuel efficiency, avoidance of active collision zones, and optimization of dynamic orbital parameters. The RL agent learns to determine efficient sequences for rendezvousing with multiple debris targets, optimizing fuel usage and mission time while incorporating necessary refueling stops. Simulated ADR scenarios derived from the Iridium 33 debris dataset are used for evaluation, covering diverse orbital configurations and debris distributions to demonstrate robustness and adaptability. Results show that the proposed RL framework reduces collision risk while improving mission efficiency compared to traditional heuristic approaches. This work provides a scalable solution for planning complex multi-debris ADR missions and is applicable to other multi-target rendezvous problems in autonomous space mission planning.
comment: Accpeted at Conference: 15th IAA Symposium on Small Satellites for Earth System Observation At: Berlin
ReFORM: Reflected Flows for On-support Offline RL via Noise Manipulation ICLR 2026
Offline reinforcement learning (RL) aims to learn the optimal policy from a fixed dataset generated by behavior policies without additional environment interactions. One common challenge that arises in this setting is the out-of-distribution (OOD) error, which occurs when the policy leaves the training distribution. Prior methods penalize a statistical distance term to keep the policy close to the behavior policy, but this constrains policy improvement and may not completely prevent OOD actions. Another challenge is that the optimal policy distribution can be multimodal and difficult to represent. Recent works apply diffusion or flow policies to address this problem, but it is unclear how to avoid OOD errors while retaining policy expressiveness. We propose ReFORM, an offline RL method based on flow policies that enforces the less restrictive support constraint by construction. ReFORM learns a behavior cloning (BC) flow policy with a bounded source distribution to capture the support of the action distribution, then optimizes a reflected flow that generates bounded noise for the BC flow while keeping the support, to maximize the performance. Across 40 challenging tasks from the OGBench benchmark with datasets of varying quality and using a constant set of hyperparameters for all tasks, ReFORM dominates all baselines with hand-tuned hyperparameters on the performance profile curves.
comment: 24 pages, 17 figures; Accepted by the fourteenth International Conference on Learning Representations (ICLR 2026)
VISTA: Enhancing Visual Conditioning via Track-Following Preference Optimization in Vision-Language-Action Models
Vision-Language-Action (VLA) models have demonstrated strong performance across a wide range of robotic manipulation tasks. Despite the success, extending large pretrained Vision-Language Models (VLMs) to the action space can induce vision-action misalignment, where action predictions exhibit weak dependence on the current visual state, leading to unreliable action outputs. In this work, we study VLA models through the lens of visual conditioning and empirically show that successful rollouts consistently exhibit stronger visual dependence than failed ones. Motivated by this observation, we propose a training framework that explicitly strengthens visual conditioning in VLA models. Our approach first aligns action prediction with visual input via preference optimization on a track-following surrogate task, and then transfers the enhanced alignment to instruction-following task through latent-space distillation during supervised finetuning. Without introducing architectural modifications or additional data collection, our method improves both visual conditioning and task performance for discrete OpenVLA, and further yields consistent gains when extended to the continuous OpenVLA-OFT setting. Project website: https://vista-vla.github.io/ .
comment: In submission. Project website: https://vista-vla.github.io/
Differentiable Inverse Graphics for Zero-shot Scene Reconstruction and Robot Grasping
Operating effectively in novel real-world environments requires robotic systems to estimate and interact with previously unseen objects. Current state-of-the-art models address this challenge by using large amounts of training data and test-time samples to build black-box scene representations. In this work, we introduce a differentiable neuro-graphics model that combines neural foundation models with physics-based differentiable rendering to perform zero-shot scene reconstruction and robot grasping without relying on any additional 3D data or test-time samples. Our model solves a series of constrained optimization problems to estimate physically consistent scene parameters, such as meshes, lighting conditions, material properties, and 6D poses of previously unseen objects from a single RGBD image and bounding boxes. We evaluated our approach on standard model-free few-shot benchmarks and demonstrated that it outperforms existing algorithms for model-free few-shot pose estimation. Furthermore, we validated the accuracy of our scene reconstructions by applying our algorithm to a zero-shot grasping task. By enabling zero-shot, physically-consistent scene reconstruction and grasping without reliance on extensive datasets or test-time sampling, our approach offers a pathway towards more data efficient, interpretable and generalizable robot autonomy in novel environments.
comment: Submitted to IEEE Robotics and Automation Letters (RA-L) for review. This version includes the statement required by IEEE for preprints
Signal or 'Noise': Human Reactions to Robot Errors in the Wild
In the real world, robots frequently make errors, yet little is known about people's social responses to errors outside of lab settings. Prior work has shown that social signals are reliable and useful for error management in constrained interactions, but it is unclear if this holds in the real world - especially with a non-social robot in repeated and group interactions with successive or propagated errors. To explore this, we built a coffee robot and conducted a public field deployment ($N = 49$). We found that participants consistently expressed varied social signals in response to errors and other stimuli, particularly during group interactions. Our findings suggest that social signals in the wild are rich (with participants volunteering information about the interaction), but "noisy." We discuss lessons, benefits, and challenges for using social signals in real-world HRI.
Applying Ground Robot Fleets in Urban Search: Understanding Professionals' Operational Challenges and Design Opportunities
Urban searches demand rapid, defensible decisions and sustained physical effort under high cognitive and situational load. Incident commanders must plan, coordinate, and document time-critical operations, while field searchers execute evolving tasks in uncertain environments. With recent advances in technology, ground-robot fleets paired with computer-vision-based situational awareness and LLM-powered interfaces offer the potential to ease these operational burdens. However, no dedicated studies have examined how public safety professionals perceive such technologies or envision their integration into existing practices, risking building technically sophisticated yet impractical solutions. To address this gap, we conducted focus-group sessions with eight police officers across five local departments in Virginia. Our findings show that ground robots could reduce professionals' reliance on paper references, mental calculations, and ad-hoc coordination, alleviating cognitive and physical strain in four key challenge areas: (1) partitioning the workforce across multiple search hypotheses, (2) retaining group awareness and situational awareness, (3) building route planning that fits the lost-person profile, and (4) managing cognitive and physical fatigue under uncertainty. We further identify four design opportunities and requirements for future ground-robot fleet integration in public-safety operations: (1) scalable multi-robot planning and control interfaces, (2) agency-specific route optimization, (3) real-time replanning informed by debrief updates, and (4) vision-assisted cueing that preserves operational trust while reducing cognitive workload. We conclude with design implications for deployable, accountable, and human-centered urban-search support systems
comment: Under review
MIGHTY: Hermite Spline-based Efficient Trajectory Planning
Hard-constraint trajectory planners often rely on commercial solvers and demand substantial computational resources. Existing soft-constraint methods achieve faster computation, but either (1) decouple spatial and temporal optimization or (2) restrict the search space. To overcome these limitations, we introduce MIGHTY, a Hermite spline-based planner that performs spatiotemporal optimization while fully leveraging the continuous search space of a spline. In simulation, MIGHTY achieves a 9.3% reduction in computation time and a 13.1% reduction in travel time over state-of-the-art baselines, with a 100% success rate. In hardware, MIGHTY completes multiple high-speed flights up to 6.7 m/s in a cluttered static environment and long-duration flights with dynamically added obstacles.
comment: 10 pages, 12 figures
Improved Bag-of-Words Image Retrieval with Geometric Constraints for Ground Texture Localization ICRA 2025
Ground texture localization using a downward-facing camera offers a low-cost, high-precision localization solution that is robust to dynamic environments and requires no environmental modification. We present a significantly improved bag-of-words (BoW) image retrieval system for ground texture localization, achieving substantially higher accuracy for global localization and higher precision and recall for loop closure detection in SLAM. Our approach leverages an approximate $k$-means (AKM) vocabulary with soft assignment, and exploits the consistent orientation and constant scale constraints inherent to ground texture localization. Identifying the different needs of global localization vs. loop closure detection for SLAM, we present both high-accuracy and high-speed versions of our algorithm. We test the effect of each of our proposed improvements through an ablation study and demonstrate our method's effectiveness for both global localization and loop closure detection. With numerous ground texture localization systems already using BoW, our method can readily replace other generic BoW systems in their pipeline and immediately improve their results.
comment: Accepted to ICRA 2025
Model Reconciliation through Explainability and Collaborative Recovery in Assistive Robotics ICRA
Whenever humans and robots work together, it is essential that unexpected robot behavior can be explained to the user. Especially in applications such as shared control the user and the robot must share the same model of the objects in the world, and the actions that can be performed on these objects. In this paper, we achieve this with a so-called model reconciliation framework. We leverage a Large Language Model to predict and explain the difference between the robot's and the human's mental models, without the need of a formal mental model of the user. Furthermore, our framework aims to solve the model divergence after the explanation by allowing the human to correct the robot. We provide an implementation in an assistive robotics domain, where we conduct a set of experiments with a real wheelchair-based mobile manipulator and its digital twin.
comment: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026
Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution
Recent studies demonstrate that diffusion planners benefit from sparse-step planning over single-step planning. Training models to skip steps in their trajectories helps capture long-term dependencies without additional memory or computational cost. However, predicting excessively sparse plans degrades performance. We hypothesize this temporal density threshold is non-uniform across a planning horizon and that certain parts of a predicted trajectory should be more densely generated. We propose Mixed-Density Diffuser (MDD), a diffusion planner where the densities throughout the horizon are tunable hyperparameters. We show that MDD surpasses the SOTA Diffusion Veteran (DV) framework across the Maze2D, Franka Kitchen, and Antmaze Datasets for Deep Data-Driven Reinforcement Learning (D4RL) task domains, achieving a new SOTA on the D4RL benchmark.
comment: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) (under review)
Realistic adversarial scenario generation via human-like pedestrian model for autonomous vehicle control parameter optimisation
Autonomous vehicles (AVs) are rapidly advancing and are expected to play a central role in future mobility. Ensuring their safe deployment requires reliable interaction with other road users, not least pedestrians. Direct testing on public roads is costly and unsafe for rare but critical interactions, making simulation a practical alternative. Within simulation-based testing, adversarial scenarios are widely used to probe safety limits, but many prioritise difficulty over realism, producing exaggerated behaviours which may result in AV controllers that are overly conservative. We propose an alternative method, instead using a cognitively inspired pedestrian model featuring both inter-individual and intra-individual variability to generate behaviourally plausible adversarial scenarios. We provide a proof of concept demonstration of this method's potential for AV control optimisation, in closed-loop testing and tuning of an AV controller. Our results show that replacing the rule-based CARLA pedestrian with the human-like model yields more realistic gap acceptance patterns and smoother vehicle decelerations. Unsafe interactions occur only for certain pedestrian individuals and conditions, underscoring the importance of human variability in AV testing. Adversarial scenarios generated by this model can be used to optimise AV control towards safer and more efficient behaviour. Overall, this work illustrates how incorporating human-like road user models into simulation-based adversarial testing can enhance the credibility of AV evaluation and provide a practical basis to behaviourally informed controller optimisation.
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Fine-grained and contact-rich manipulation remain challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually-plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Interactive Environmental Learning in Physical Embodied Systems
Embodied intelligence aims to enable robots to learn, reason, and generalize robustly across complex real-world environments. However, existing approaches often struggle with partial observability, fragmented spatial reasoning, and inefficient integration of heterogeneous memories, limiting their capacity for long-horizon adaptation. To address this, we introduce RoboMemory, a brain-inspired framework that unifies Spatial, Temporal, Episodic, and Semantic memory within a parallelized architecture for efficient long-horizon planning and interactive learning. Its core innovations are a dynamic spatial knowledge graph for scalable, consistent memory updates and a closed-loop planner with a critic module for adaptive decision-making. Extensive experiments on EmbodiedBench show that RoboMemory, instantiated with Qwen2.5-VL-72B-Ins, improves the average success rate by 26.5% over its strong baseline and even surpasses the closed-source SOTA, Claude-3.5-Sonnet. Real-world trials further confirm its capability for cumulative learning, with performance consistently improving over repeated tasks. Our results position RoboMemory as a scalable foundation for memory-augmented embodied agents, bridging insights from cognitive neuroscience with practical robotic autonomy.
Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints
Autonomous navigation for nano-scale unmanned aerial vehicles (nano-UAVs) is governed by extreme Size, Weight, and Power (SWaP) constraints (with the weight < 50 g and sub-100 mW onboard processor), distinguishing it fundamentally from standard robotic paradigms. This review synthesizes the state-of-the-art in sensing, computing, and control architectures designed specifically for these sub- 100mW computational envelopes. We critically analyse the transition from classical geometry-based methods to emerging "Edge AI" paradigms, including quantized deep neural networks deployed on ultra-low-power System-on-Chips (SoCs) and neuromorphic event-based control. Beyond algorithms, we evaluate the hardware-software co-design requisite for autonomy, covering advancements in dense optical flow, optimized Simultaneous Localization and Mapping (SLAM), and learning-based flight control. While significant progress has been observed in visual navigation and relative pose estimation, our analysis reveals persistent gaps in long-term endurance, robust obstacle avoidance in dynamic environments, and the "Sim-to-Real" transfer of reinforcement learning policies. This survey provides a roadmap for bridging these gaps, advocating for hybrid architectures that fuse lightweight classical control with data-driven perception to enable fully autonomous, agile nano-UAVs in GPS-denied environments.
comment: 30 pages, 5 figures, 2 table. Review article
LiDAR, GNSS and IMU Sensor Fine Alignment through Dynamic Time Warping to Construct 3D City Maps
LiDAR-based 3D mapping suffers from cumulative drift causing global misalignment, particularly in GNSS-constrained environments. To address this, we propose a unified framework that fuses LiDAR, GNSS, and IMU data for high-resolution city-scale mapping. The method performs velocity-based temporal alignment using Dynamic Time Warping and refines GNSS and IMU signals via extended Kalman filtering. Local maps are built using Normal Distributions Transform-based registration and pose graph optimization with loop closure detection, while global consistency is enforced using GNSS-constrained anchors followed by fine registration of overlapping segments. We also introduce a large-scale multimodal dataset captured in Perth, Western Australia to facilitate future research in this direction. Our dataset comprises 144,000 frames acquired with a 128-channel Ouster LiDAR, synchronized RTK-GNSS trajectories, and MEMS-IMU measurements across 21 urban loops. To assess geometric consistency, we evaluated our method using alignment metrics based on road centerlines and intersections to capture both global and local accuracy. The proposed framework reduces the average global alignment error from 3.32m to 1.24m, achieving a 61.4% improvement, and significantly decreases the intersection centroid offset from 13.22m to 2.01m, corresponding to an 84.8% enhancement. The constructed high-fidelity map and raw dataset are publicly available through https://ieee-dataport.org/documents/perth-cbd-high-resolution-lidar-map-gnss-and-imu-calibration, and its visualization can be viewed at https://www.youtube.com/watch?v=-ZUgs1KyMks. The source code is available at https://github.com/HaitianWang/LiDAR-GNSS-and-IMU-Sensor-Fine-Alignment-through-Dynamic-Time-Warping-to-Construct-3D-City-Maps. This dataset and method together establish a new benchmark for evaluating 3D city mapping in GNSS-constrained environments.
comment: This paper has been submitted to IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) and is currently under review
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Robotic generalization relies on physical intelligence: the ability to reason about state changes, contact-rich interactions, and long-horizon planning under egocentric perception and action. Vision Language Models (VLMs) are essential to Vision-Language-Action (VLA) systems, but the reliance on third-person training data creates a viewpoint gap for humanoid robots. Collecting massive robot-centric data is an ideal but impractical solution due to cost and diversity constraints. Conversely, human egocentric videos offer a highly scalable data source with rich interaction context, yet the embodiment mismatch prevents the direct application. To bridge this gap, we propose an Egocentric2Embodiment Translation Pipeline that transforms raw human egocentric videos into multi-level, schema-driven embodiment supervision with enforced evidence grounding and temporal consistency, enabling the construction of the Egocentric2Embodiment dataset (E2E-3M) at scale. An egocentric-aware embodied brain, termed PhysBrain, is obtained by training on the E2E-3M dataset. PhysBrain exhibits substantially improved egocentric understanding, particularly for planning. It provides an egocentric-aware initialization that enables more sample-efficient VLA fine-tuning and higher success rates, demonstrating effective transfer from human egocentric supervision to downstream robot control.
comment: 21 pages, 8 figures
Analytical Inverse Kinematic Solution for "Moz1" NonSRS 7-DOF Robot arm with novel arm angle
This paper presents an analytical solution to the inverse kinematic problem(IKP) for the seven degree-of-freedom (7-DOF) Moz1 Robot Arm with offsets on wrist. We provide closed-form solutions with the novel arm angle . it allow fully self-motion and solve the problem of algorithmic singularities within the workspace. It also provides information on how the redundancy is resolved in a new arm angle representation where traditional SEW angle faied to be defined and how singularities are handled. The solution is simple, fast and exact, providing full solution space (i.e. all 16 solutions) per pose.
A Survey on Vision-Language-Action Models for Embodied AI
Embodied AI is widely recognized as a cornerstone of artificial general intelligence because it involves controlling embodied agents to perform tasks in the physical world. Building on the success of large language models and vision-language models, a new category of multimodal models -- referred to as vision-language-action models (VLAs) -- has emerged to address language-conditioned robotic tasks in embodied AI by leveraging their distinct ability to generate actions. The recent proliferation of VLAs necessitates a comprehensive survey to capture the rapidly evolving landscape. To this end, we present the first survey on VLAs for embodied AI. This work provides a detailed taxonomy of VLAs, organized into three major lines of research. The first line focuses on individual components of VLAs. The second line is dedicated to developing VLA-based control policies adept at predicting low-level actions. The third line comprises high-level task planners capable of decomposing long-horizon tasks into a sequence of subtasks, thereby guiding VLAs to follow more general user instructions. Furthermore, we provide an extensive summary of relevant resources, including datasets, simulators, and benchmarks. Finally, we discuss the challenges facing VLAs and outline promising future directions in embodied AI. A curated repository associated with this survey is available at: https://github.com/yueen-ma/Awesome-VLA.
comment: Project page: https://github.com/yueen-ma/Awesome-VLA
A Unified Candidate Set with Scene-Adaptive Refinement via Diffusion for End-to-End Autonomous Driving
End-to-end autonomous driving is increasingly adopting a multimodal planning paradigm that generates multiple trajectory candidates and selects the final plan, making candidate-set design critical. A fixed trajectory vocabulary provides stable coverage in routine driving but often misses optimal solutions in complex interactions, while scene-adaptive refinement can cause over-correction in simple scenarios by unnecessarily perturbing already strong vocabulary trajectories.We propose CdDrive, which preserves the original vocabulary candidates and augments them with scene-adaptive candidates generated by vocabulary-conditioned diffusion denoising. Both candidate types are jointly scored by a shared selection module, enabling reliable performance across routine and highly interactive scenarios. We further introduce HATNA (Horizon-Aware Trajectory Noise Adapter) to improve the smoothness and geometric continuity of diffusion candidates via temporal smoothing and horizon-aware noise modulation. Experiments on NAVSIM v1 and NAVSIM v2 demonstrate leading performance, and ablations verify the contribution of each component. Code: https://github.com/WWW-TJ/CdDrive.
Game-Based and Gamified Robotics Education: A Comparative Systematic Review and Design Guidelines
Robotics education fosters computational thinking, creativity, and problem-solving, but remains challenging due to technical complexity. Game-based learning (GBL) and gamification offer engagement benefits, yet their comparative impact remains unclear. We present the first PRISMA-aligned systematic review and comparative synthesis of GBL and gamification in robotics education, analyzing 95 studies from 12,485 records across four databases (2014-2025). We coded each study's approach, learning context, skill level, modality, pedagogy, and outcomes (k = .918). Three patterns emerged: (1) approach-context-pedagogy coupling (GBL more prevalent in informal settings, while gamification dominated formal classrooms [p < .001] and favored project-based learning [p = .009]); (2) emphasis on introductory programming and modular kits, with limited adoption of advanced software (~17%), advanced hardware (~5%), or immersive technologies (~22%); and (3) short study horizons, relying on self-report. We propose eight research directions and a design space outlining best practices and pitfalls, offering actionable guidance for robotics education.
comment: Accepted for publication at Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. 26 pages, 14 figures, 7 tables;
ProAct: A Benchmark and Multimodal Framework for Structure-Aware Proactive Response
While passive agents merely follow instructions, proactive agents align with higher-level objectives, such as assistance and safety by continuously monitoring the environment to determine when and how to act. However, developing proactive agents is hindered by the lack of specialized resources. To address this, we introduce ProAct-75, a benchmark designed to train and evaluate proactive agents across diverse domains, including assistance, maintenance, and safety monitoring. Spanning 75 tasks, our dataset features 91,581 step-level annotations enriched with explicit task graphs. These graphs encode step dependencies and parallel execution possibilities, providing the structural grounding necessary for complex decision-making. Building on this benchmark, we propose ProAct-Helper, a reference baseline powered by a Multimodal Large Language Model (MLLM) that grounds decision-making in state detection, and leveraging task graphs to enable entropy-driven heuristic search for action selection, allowing agents to execute parallel threads independently rather than mirroring the human's next step. Extensive experiments demonstrate that ProAct-Helper outperforms strong closed-source models, improving trigger detection mF1 by 6.21%, saving 0.25 more steps in online one-step decision, and increasing the rate of parallel actions by 15.58%.
Learning-based Observer for Coupled Disturbance
Achieving high-precision control for robotic systems is hindered by the low-fidelity dynamical model and external disturbances. Especially, the intricate coupling between internal uncertainties and external disturbances further exacerbates this challenge. This study introduces an effective and convergent algorithm enabling accurate estimation of the coupled disturbance via combining control and learning philosophies. Concretely, by resorting to Chebyshev series expansion, the coupled disturbance is firstly decomposed into an unknown parameter matrix and two known structures dependent on system state and external disturbance respectively. A regularized least squares algorithm is subsequently formalized to learn the parameter matrix using historical time-series data. Finally, a polynomial disturbance observer is specifically devised to achieve a high-precision estimation of the coupled disturbance by utilizing the learned portion. The proposed algorithm is evaluated through extensive simulations and real flight tests. We believe this work can offer a new pathway to integrate learning approaches into control frameworks for addressing longstanding challenges in robotic applications.
comment: 10 pages, 7 figures
Geometry-aware 4D Video Generation for Robot Manipulation ICLR 2026
Understanding and predicting dynamics of the physical world can enhance a robot's ability to plan and interact effectively in complex environments. While recent video generation models have shown strong potential in modeling dynamic scenes, generating videos that are both temporally coherent and geometrically consistent across camera views remains a significant challenge. To address this, we propose a 4D video generation model that enforces multi-view 3D consistency of generated videos by supervising the model with cross-view pointmap alignment during training. Through this geometric supervision, the model learns a shared 3D scene representation, enabling it to generate spatio-temporally aligned future video sequences from novel viewpoints given a single RGB-D image per view, and without relying on camera poses as input. Compared to existing baselines, our method produces more visually stable and spatially aligned predictions across multiple simulated and real-world robotic datasets. We further show that the predicted 4D videos can be used to recover robot end-effector trajectories using an off-the-shelf 6DoF pose tracker, yielding robot manipulation policies that generalize well to novel camera viewpoints.
comment: ICLR 2026; Project website: https://robot4dgen.github.io
Doppler-SLAM: Doppler-Aided Radar-Inertial and LiDAR-Inertial Simultaneous Localization and Mapping
Simultaneous localization and mapping (SLAM) is a critical capability for autonomous systems. Traditional SLAM approaches, which often rely on visual or LiDAR sensors, face significant challenges in adverse conditions such as low light or featureless environments. To overcome these limitations, we propose a novel Doppler-aided radar-inertial and LiDAR-inertial SLAM framework that leverages the complementary strengths of 4D radar, FMCW LiDAR, and inertial measurement units. Our system integrates Doppler velocity measurements and spatial data into a tightly-coupled front-end and graph optimization back-end to provide enhanced ego velocity estimation, accurate odometry, and robust mapping. We also introduce a Doppler-based scan-matching technique to improve front-end odometry in dynamic environments. In addition, our framework incorporates an innovative online extrinsic calibration mechanism, utilizing Doppler velocity and loop closure to dynamically maintain sensor alignment. Extensive evaluations on both public and proprietary datasets show that our system significantly outperforms state-of-the-art radar-SLAM and LiDAR-SLAM frameworks in terms of accuracy and robustness. To encourage further research, the code of our Doppler-SLAM and our dataset are available at: https://github.com/Wayne-DWA/Doppler-SLAM.
comment: 8 pages, 7 figures
Haptic bilateral teleoperation system for free-hand dental procedures
Free-hand dental procedures are typically repetitive, time-consuming and require high precision and manual dexterity. Robots can play a key role in improving procedural accuracy and safety, enhancing patient comfort, and reducing operator workload. However, robotic solutions for free-hand procedures remain limited or completely lacking. To address this gap, we develop a haptic bilateral teleoperation system (HBTS) for free-hand dental procedures (FH-HBTS). The system includes a mechanical end-effector, compatible with standard clinical tools, and equipped with an endoscopic camera for improved visibility of the intervention site. By ensuring motion and force correspondence between the operator's and the robot's actions, monitored through visual feedback, we enhance the operator's sensory awareness and motor accuracy. Furthermore, to ensure procedural safety, we limit interaction forces by scaling the motion references provided to the admittance controller based solely on measured contact forces. This ensures effective force limitation in all contact states without requiring prior knowledge of the environment. The proposed FH-HBTS is validated both through a technical evaluation and an in-vitro pre-clinical study conducted on a dental model under clinically representative conditions. The results show that the system improves the naturalness, safety, and accuracy of teleoperation, highlighting its potential to enhance free-hand dental procedures.
comment: 13 pages, 8 figures
SAP-CoPE: Social-Aware Planning using Cooperative Pose Estimation with Infrastructure Sensor Nodes
Autonomous driving systems must operate smoothly in human-populated indoor environments, where challenges arise including limited perception and occlusions when relying only on onboard sensors, as well as the need for socially compliant motion planning that accounts for human psychological comfort zones. These factors complicate accurate recognition of human intentions and the generation of comfortable, socially aware trajectories. To address these challenges, we propose SAP-CoPE, an indoor navigation system that integrates cooperative infrastructure with a novel 3D human pose estimation method and a socially-aware model predictive control (MPC)-based motion planner. In the perception module, an optimization problem is formulated to account for uncertainty propagation in the camera projection matrix while enforcing human joint coherence. The proposed method is adaptable to both single- and multi-camera configurations and can incorporate sparse LiDAR point-cloud data. For motion planning, we integrate a psychology inspired personal-space field using the information from estimated human poses into an MPC framework to enhance socially comfort in human-populated environments. Extensive real-world evaluations demonstrate the effectiveness of the proposed approach in generating socially aware trajectories for autonomous systems.
comment: This paper has been submitted to the IEEE Transactions on Automation Science and Engineering
Learning-Based Modeling of a Magnetically Steerable Soft Suction Device for Endoscopic Endonasal Interventions
This paper introduces a learning-based modeling framework for a magnetically steerable soft suction device designed for endoscopic endonasal brain tumor resection. The device is miniaturized (4 mm outer diameter, 2 mm inner diameter, 40 mm length), 3D printed using biocompatible SIL 30 material, and integrates embedded Fiber Bragg Grating (FBG) sensors for real-time shape feedback. Shape reconstruction is represented using four Bezier control points, providing a compact representation of deformation. A data-driven model was trained on 5,097 experimental samples to learn the mapping from magnetic field parameters (magnitude: 0-14 mT, frequency: 0.2-1.0 Hz, vertical tip distances: 90-100 mm) to Bezier control points defining the robot's 3D shape. Both Neural Network (NN) and Random Forest (RF) architectures were compared. The RF model outperformed the NN, achieving a mean RMSE of 0.087 mm in control point prediction and 0.064 mm in shape reconstruction error. Feature importance analysis revealed that magnetic field components predominantly influence distal control points, while frequency and distance affect the base configuration. Unlike prior studies applying general machine learning to soft robotic data, this framework introduces a new paradigm linking magnetic actuation inputs directly to geometric Bezier control points, creating an interpretable, low-dimensional deformation representation. This integration of magnetic field characterization, embedded FBG sensing, and Bezier-based learning provides a unified strategy extensible to other magnetically actuated continuum robots. By enabling sub-millimeter shape prediction and real-time inference, this work advances intelligent control of magnetically actuated soft robotic tools in minimally invasive neurosurgery.
Multiagent Systems
El Agente Quntur: A research collaborator agent for quantum chemistry
Quantum chemistry is a foundational enabling tool for the fields of chemistry, materials science, computational biology and others. Despite of its power, the practical application of quantum chemistry simulations remains in the hands of qualified experts due to methodological complexity, software heterogeneity, and the need for informed interpretation of results. To bridge the accessibility gap for these tools and expand their reach to chemists with broader backgrounds, we introduce El Agente Quntur, a hierarchical, multi-agent AI system designed to operate not merely as an automation tool but as a research collaborator for computational quantum chemistry. Quntur was designed following three main strategies: i) elimination of hard-coded procedural policies in favour of reasoning-driven decisions, ii) construction of general and composable actions that facilitate generalization and efficiency, and iii) implementation of guided deep research to integrate abstract quantum-chemical reasoning across subdisciplines and a detailed understanding of the software's internal logic and syntax. Although instantiated in ORCA, these design principles are applicable to research agents more generally and easily expandable to additional quantum chemistry packages and beyond. Quntur supports the full range of calculations available in ORCA 6.0 and reasons over software documentation and scientific literature to plan, execute, adapt, and analyze in silico chemistry experiments following best practices. We discuss the advances and current bottlenecks in agentic systems operating at the research level in computational chemistry, and outline a roadmap toward a fully autonomous end-to-end computational chemistry research agent.
El Agente Estructural: An Artificially Intelligent Molecular Editor
We present El Agente Estructural, a multimodal, natural-language-driven geometry-generation and manipulation agent for autonomous chemistry and molecular modelling. Unlike molecular generation or editing via generative models, Estructural mimics how human experts directly manipulate molecular systems in three dimensions by integrating a comprehensive set of domain-informed tools and vision-language models. This design enables precise control over atomic or functional group replacements, atomic connectivity, and stereochemistry without the need to rebuild extensive core molecular frameworks. Through a series of representative case studies, we demonstrate that Estructural enables chemically meaningful geometry manipulation across a wide range of real-world scenarios. These include site-selective functionalization, ligand binding, ligand exchange, stereochemically controlled structure construction, isomer interconversion, fragment-level structural analysis, image-guided generation of structures from schematic reaction mechanisms, and mechanism-driven geometry generation and modification. These examples illustrate how multimodal reasoning, when combined with specialized geometry-aware tools, supports interactive and context-aware molecular modelling beyond structure generation. Looking forward, the integration of Estructural into El Agente Quntur, an autonomous multi-agent quantum chemistry platform, enhances its capabilities by adding sophisticated tools for the generation and editing of three-dimensional structures.
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
Dual Mind World Model Inspired Network Digital Twin for Access Scheduling
Emerging networked systems such as industrial IoT and real-time cyber-physical infrastructures demand intelligent scheduling strategies capable of adapting to dynamic traffic, deadlines, and interference constraints. In this work, we present a novel Digital Twin-enabled scheduling framework inspired by Dual Mind World Model (DMWM) architecture, for learning-informed and imagination-driven network control. Unlike conventional rule-based or purely data-driven policies, the proposed DMWM combines short-horizon predictive planning with symbolic model-based rollout, enabling the scheduler to anticipate future network states and adjust transmission decisions accordingly. We implement the framework in a configurable simulation testbed and benchmark its performance against traditional heuristics and reinforcement learning baselines under varied traffic conditions. Our results show that DMWM achieves superior performance in bursty, interference-limited, and deadline-sensitive environments, while maintaining interpretability and sample efficiency. The proposed design bridges the gap between network-level reasoning and low-overhead learning, marking a step toward scalable and adaptive NDT-based network optimization.
SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing
We present SPEAR, a multi-agent coordination framework for smart contract auditing that applies established MAS patterns in a realistic security analysis workflow. SPEAR models auditing as a coordinated mission carried out by specialized agents: a Planning Agent prioritizes contracts using risk-aware heuristics, an Execution Agent allocates tasks via the Contract Net protocol, and a Repair Agent autonomously recovers from brittle generated artifacts using a programmatic-first repair policy. Agents maintain local beliefs updated through AGM-compliant revision, coordinate via negotiation and auction protocols, and revise plans as new information becomes available. An empirical study compares the multi-agent design with centralized and pipeline-based alternatives under controlled failure scenarios, focusing on coordination, recovery behavior, and resource use.
From Assumptions to Actions: Turning LLM Reasoning into Uncertainty-Aware Planning for Embodied Agents ICLR 2026
Embodied agents operating in multi-agent, partially observable, and decentralized environments must plan and act despite pervasive uncertainty about hidden objects and collaborators' intentions. Recent advances in applying Large Language Models (LLMs) to embodied agents have addressed many long-standing challenges, such as high-level goal decomposition and online adaptation. Yet, uncertainty is still primarily mitigated through frequent inter-agent communication. This incurs substantial token and time costs, and can disrupt established workflows, when human partners are involved. We introduce PCE, a Planner-Composer-Evaluator framework that converts the fragmented assumptions latent in LLM reasoning traces into a structured decision tree. Internal nodes encode environment assumptions and leaves map to actions; each path is then scored by scenario likelihood, goal-directed gain, and execution cost to guide rational action selection without heavy communication. Across two challenging multi-agent benchmarks (C-WAH and TDW-MAT) and three diverse LLM backbones, PCE consistently outperforms communication-centric baselines in success rate and task efficiency while showing comparable token usage. Ablation results indicate that the performance gains obtained by scaling model capacity or reasoning depth persist even when PCE is applied, while PCE consistently raises the baseline across both capacity and reasoning-depth scales, confirming that structured uncertainty handling complements both forms of scaling. A user study further demonstrates that PCE produces communication patterns that human partners perceive as more efficient and trustworthy. Together, these results establish a principled route for turning latent LLM assumptions into reliable strategies for uncertainty-aware planning.
comment: 31 pages, 10 figures, Accepted ICLR 2026
Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration
Multi-expert systems, where multiple Large Language Models (LLMs) collaborate to solve complex tasks, are increasingly adopted for high-performance reasoning and generation. However, the orchestration policies governing expert interaction and sequencing remain largely opaque. We introduce INFORM, an interpretability analysis that treats orchestration as an explicit, analyzable computation, enabling the decoupling of expert interaction structure, execution order, and causal attribution. We use INFORM to evaluate an orchestrator on GSM8K, HumanEval, and MMLU using a homogeneous consortium of ten instruction-tuned experts drawn from LLaMA-3.1 8B, Qwen-3 8B, and DeepSeek-R1 8B, with controlled decoding-temperature variation, and a secondary heterogeneous consortium spanning 1B-7B parameter models. Across tasks, routing dominance is a poor proxy for functional necessity. We reveal a divergence between relational importance, captured by routing mass and interaction topology, and intrinsic importance, measured via gradient-based causal attribution: frequently selected experts often act as interaction hubs with limited causal influence, while sparsely routed experts can be structurally critical. Orchestration behaviors emerge asynchronously, with expert centralization preceding stable routing confidence and expert ordering remaining non-deterministic. Targeted ablations show that masking intrinsically important experts induces disproportionate collapse in interaction structure compared to masking frequent peers, confirming that INFORM exposes causal and structural dependencies beyond accuracy metrics alone.
On the Uncertainty of Large Language Model-Based Multi-Agent Systems
Multi-agent systems (MAS) have emerged as a prominent paradigm for leveraging large language models (LLMs) to tackle complex tasks. However, the mechanisms governing the effectiveness of MAS built upon publicly available LLMs, specifically the underlying rationales for their success or failure, remain largely unexplored. In this paper, we revisit MAS through the perspective of uncertainty, considering both intra- and inter-agent dynamics by investigating entropy transitions during problem-solving across various topologies and six benchmark tasks. By analyzing 245 features spanning token-, trajectory-, and round-level entropy, we counterintuitively find that a single agent outperforms MAS in approximately 43.3% of cases, and that uncertainty dynamics are largely determined during the first round of interaction. Furthermore, we provide three key observations: 1) Certainty Preference: reducing uncertainty at any stage for any agent is critical for guaranteeing correct solutions; 2) Base Uncertainty: base models with lower entropy during problem-solving directly benefit MAS performance; and 3) Task Awareness: entropy dynamics of MAS play varying roles across different tasks. Building on these insights, we introduce a simple yet effective algorithm, the Entropy Judger, to select solutions from MAS's pass@k results, leading to consistent accuracy improvements across all MAS configurations and tasks. Our source code is available at https://github.com/AgenticFinLab/multiagent-entropy.
KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning
Heterogeneous multi-robot systems are increasingly deployed in long-horizon missions that require coordination among robots with diverse capabilities. However, existing planning approaches struggle to construct accurate symbolic representations and maintain plan consistency in dynamic environments. Classical PDDL planners require manually crafted symbolic models, while LLM-based planners often ignore agent heterogeneity and environmental uncertainty. We introduce KGLAMP, a knowledge-graph-guided LLM planning framework for heterogeneous multi-robot teams. The framework maintains a structured knowledge graph encoding object relations, spatial reachability, and robot capabilities, which guides the LLM in generating accurate PDDL problem specifications. The knowledge graph serves as a persistent, dynamically updated memory that incorporates new observations and triggers replanning upon detecting inconsistencies, enabling symbolic plans to adapt to evolving world states. Experiments on the MAT-THOR benchmark show that KGLAMP improves performance by at least 25.5% over both LLM-only and PDDL-based variants.
Communication Enhances LLMs' Stability in Strategic Thinking
Large Language Models (LLMs) often exhibit pronounced context-dependent variability that undermines predictable multi-agent behavior in tasks requiring strategic thinking. Focusing on models that range from 7 to 9 billion parameters in size engaged in a ten-round repeated Prisoner's Dilemma, we evaluate whether short, costless pre-play messages emulating the cheap-talk paradigm affect strategic stability. Our analysis uses simulation-level bootstrap resampling and nonparametric inference to compare cooperation trajectories fitted with LOWESS regression across both the messaging and the no-messaging condition. We demonstrate consistent reductions in trajectory noise across a majority of the model-context pairings being studied. The stabilizing effect persists across multiple prompt variants and decoding regimes, though its magnitude depends on model choice and contextual framing, with models displaying higher baseline volatility gaining the most. While communication rarely produces harmful instability, we document a few context-specific exceptions and identify the limited domains in which communication harms stability. These findings position cheap-talk style communication as a low-cost, practical tool for improving the predictability and reliability of strategic behavior in multi-agent LLM systems.
comment: 15 pages, 1 figure, 6 tables
Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning AAMAS 2026
Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent methods excel at learning sparse coordination graphs-determining who communicates with whom-they do not address what information should be transmitted under hard bandwidth constraints. We study this bandwidth-limited regime and show that naive dimensionality reduction consistently degrades coordination performance. Hard bandwidth constraints force selective encoding, but deterministic projections lack mechanisms to control how compression occurs. We introduce Bandwidth-constrained Variational Message Encoding (BVME), a lightweight module that treats messages as samples from learned Gaussian posteriors regularized via KL divergence to an uninformative prior. BVME's variational framework provides principled, tunable control over compression strength through interpretable hyperparameters, directly constraining the representations used for decision-making. Across SMACv1, SMACv2, and MPE benchmarks, BVME achieves comparable or superior performance while using 67--83% fewer message dimensions, with gains most pronounced on sparse graphs where message quality critically impacts coordination. Ablations reveal U-shaped sensitivity to bandwidth, with BVME excelling at extreme ratios while adding minimal overhead.
comment: Accepted by AAMAS 2026 (oral) with appendix
A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI
Generative AI (GenAI) is rapidly transforming software engineering (SE) practices, influencing how SE processes are executed, as well as how software systems are developed, operated, and evolved. This paper applies design science research to build a roadmap for GenAI-augmented SE. The process consists of three cycles that incrementally integrate multiple sources of evidence, including collaborative discussions from the FSE 2025 "Software Engineering 2030" workshop, rapid literature reviews, and external feedback sessions involving peers. McLuhan's tetrads were used as a conceptual instrument to systematically capture the transforming effects of GenAI on SE processes and software products. The resulting roadmap identifies four fundamental forms of GenAI augmentation in SE and systematically characterizes their related research challenges and opportunities. These insights are then consolidated into a set of future research directions. By grounding the roadmap in a rigorous multi-cycle process and cross-validating it among independent author teams and peers, the study provides a transparent and reproducible foundation for analyzing how GenAI affects SE processes, methods and tools, and for framing future research within this rapidly evolving area.
SimCity: Multi-Agent Urban Development Simulation with Rich Interactions
Large Language Models (LLMs) open new possibilities for constructing realistic and interpretable macroeconomic simulations. We present SimCity, a multi-agent framework that leverages LLMs to model an interpretable macroeconomic system with heterogeneous agents and rich interactions. Unlike classical equilibrium models that limit heterogeneity for tractability, or traditional agent-based models (ABMs) that rely on hand-crafted decision rules, SimCity enables flexible, adaptive behavior with transparent natural-language reasoning. Within SimCity, four core agent types (households, firms, a central bank, and a government) deliberate and participate in a frictional labor market, a heterogeneous goods market, and a financial market. Furthermore, a Vision-Language Model (VLM) determines the geographic placement of new firms and renders a mapped virtual city, allowing us to study both macroeconomic regularities and urban expansion dynamics within a unified environment. To evaluate the framework, we compile a checklist of canonical macroeconomic phenomena, including price elasticity of demand, Engel's Law, Okun's Law, the Phillips Curve, and the Beveridge Curve, and show that SimCity naturally reproduces these empirical patterns while remaining robust across simulation runs.
comment: 32 pages, 8 figures
Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning approaches rely on predefined execution protocols, which often require centralized execution. Decentralized LLM collaboration is more appealing in practice, as agents can run inference in parallel with flexible deployments. Also, current approaches use Monte Carlo methods for fine-tuning, which suffer from high variance and thus require more samples to train effectively. Actor-critic methods are prevalent in MARL for dealing with these issues, so we developed Multi-Agent Actor-Critic (MAAC) methods to optimize decentralized LLM collaboration. In this paper, we analyze when and why these MAAC methods are beneficial. We propose 2 MAAC approaches, \textbf{CoLLM-CC} with a \textbf{C}entralized \textbf{C}ritic and \textbf{CoLLM-DC} with \textbf{D}ecentralized \textbf{C}ritics. Our experiments across writing, coding, and game-playing domains show that Monte Carlo methods and CoLLM-DC can achieve performance comparable to CoLLM-CC in short-horizon and dense-reward settings. However, they both underperform CoLLM-CC on long-horizon or sparse-reward tasks, where Monte Carlo methods require substantially more samples and CoLLM-DC struggles to converge. Our code is available at https://github.com/OpenMLRL/CoMLRL/releases/tag/v1.3.2.
Cooperative Flexibility Exchange: Fair and Comfort-Aware Decentralized Resource Allocation
The growing electricity demand and use of smart appliances are placing pressure on power grids, making efficient energy management more important than ever. The existing energy management systems often prioritize system efficiency (balanced energy demand and supply) at the expense of consumer comfort. This paper addresses this gap by proposing a novel decentralized multi-agent coordination-based demand-side management system. The proposed system enables individual agents to coordinate for demand-side energy optimization while improving consumer comfort and maintaining system efficiency. A key innovation of this work is the introduction of a slot exchange mechanism, where agents first receive optimized appliance-level energy consumption schedules and then coordinate with each other to adjust these schedules through slot exchanges to improve their comfort even when agents show non-altruistic behaviour. It also scales well with large populations and promotes fairness by balancing satisfaction levels across consumers. For performance evaluation, a real-world dataset is used, and the results demonstrate that the proposed slot exchange mechanism increases consumer comfort and fairness without raising system inefficiency cost, making it a practical and scalable solution for future smart grids.
Scaling Multiagent Systems with Process Rewards
While multiagent systems have shown promise for tackling complex tasks via specialization, finetuning multiple agents simultaneously faces two key challenges: (1) credit assignment across agents, and (2) sample efficiency of expensive multiagent rollouts. In this work, we propose finetuning multiagent systems with per-action process rewards from AI feedback (MAPPA) to address both. Through assigning credit to individual agent actions rather than only at task completion, MAPPA enables fine-grained supervision without ground truth labels while extracting maximal training signal from each rollout. We demonstrate our approach on competition math problems and tool-augmented data analysis tasks. On unseen math problems, MAPPA achieves +5.0--17.5pp on AIME and +7.8--17.2pp on AMC. For data analysis tasks, our method improves success rate by +16.7pp while quality metrics improve by up to 47%, validating that per-action supervision can lead to improvements across different multiagent systems on various domains. By addressing these challenges, our work takes a first step toward scaling multiagent systems for complex, long-horizon tasks with minimal human supervision.
Steering the Herd: A Framework for LLM-based Control of Social Learning
Algorithms increasingly serve as information mediators--from social media feeds and targeted advertising to the increasing ubiquity of LLMs. This engenders a joint process where agents combine private, algorithmically-mediated signals with learning from peers to arrive at decisions. To study such settings, we introduce a model of controlled sequential social learning in which an information-mediating planner (e.g. an LLM) controls the information structure of agents while they also learn from the decisions of earlier agents. The planner may seek to improve social welfare (altruistic planner) or to induce a specific action the planner prefers (biased planner). Our framework presents a new optimization problem for social learning that combines dynamic programming with decentralized action choices and Bayesian belief updates. We prove the convexity of the value function and characterize the optimal policies of altruistic and biased planners, which attain desired tradeoffs between the costs they incur and the payoffs they earn from induced agent choices. Notably, in some regimes the biased planner intentionally obfuscates the agents' signals. Even under stringent transparency constraints--information parity with individuals, no lying or cherry-picking, and full observability--we show that information mediation can substantially shift social welfare in either direction. We complement our theory with simulations in which LLMs act as both planner and agents. Notably, the LLM planner in our simulations exhibits emergent strategic behavior in steering public opinion that broadly mirrors the trends predicted, though key deviations suggest the influence of non-Bayesian reasoning consistent with the cognitive patterns of both humans and LLMs trained on human-like data. Together, we establish our framework as a tractable basis for studying the impact and regulation of LLM information mediators.
A computational framework for human values
In the diverse array of work investigating the nature of human values from psychology, philosophy and social sciences, there is a clear consensus that values guide behaviour. More recently, a recognition that values provide a means to engineer ethical AI has emerged. Indeed, Stuart Russell proposed shifting AI's focus away from simply ``intelligence'' towards intelligence ``provably aligned with human values''. This challenge -- the value alignment problem -- with others including an AI's learning of human values, aggregating individual values to groups, and designing computational mechanisms to reason over values, has energised a sustained research effort. Despite this, no formal, computational definition of values has yet been proposed. We address this through a formal conceptual framework rooted in the social sciences, that provides a foundation for the systematic, integrated and interdisciplinary investigation into how human values can support designing ethical AI.
Systems and Control (EESS)
Review of Superconducting Qubit Devices and Their Large-Scale Integration
The superconducting qubit quantum computer is one of the most promising quantum computing architectures for large-scale integration due to its maturity and close proximity to the well-established semiconductor manufacturing infrastructure. From an education perspective, it also bridges classical microwave electronics and quantum electrodynamics. In this paper, we will review the basics of quantum computers, superconductivity, and Josephson junctions. We then introduce important technologies and concepts related to DiVincenzo's criteria, which are the necessary conditions for the superconducting qubits to work as a useful quantum computer. Firstly, we will discuss various types of superconducting qubits formed with Josephson junctions, from which we will understand the trade-off across multiple design parameters, including their noise immunity. Secondly, we will discuss different schemes to achieve entanglement gate operations, which are a major bottleneck in achieving more efficient fault-tolerant quantum computing. Thirdly, we will review readout engineering, including the implementations of the Purcell filters and quantum-limited amplifiers. Finally, we will discuss the nature and review the studies of two-level system defects, which are currently the limiting factor of qubit coherence time. DiVincenzo's criteria are only the necessary conditions for a technology to be eligible for quantum computing. To have a useful quantum computer, large-scale integration is required. We will review proposals and developments for the large-scale integration of superconducting qubit devices. By comparing with the application of electronic design automation (EDA) in semiconductors, we will also review the use of EDA in superconducting qubit quantum computer design, which is necessary for its large-scale integration.
SQP-Based Cable-Tension Allocation for Multi-Drone Load Transport
Multi-Agent Aerial Load Transport Systems (MAATS) offer greater payload capacity and fault tolerance than single-drone solutions. However, they have an underdetermined tension allocation problem that leads to uneven energy distribution, cable slack, or collisions between drones and cables. This paper presents a real-time optimization layer that improves a hierarchical load-position-attitude controller by incorporating a Sequential Quadratic Programming (SQP) algorithm. The SQP formulation minimizes the sum of squared cable tensions while imposing a cable-alignment penalty that discourages small inter-cable angles, thereby preventing tether convergence without altering the reference trajectory. We tested the method under nominal conditions by running numerical simulations of four quadrotors. Computational experiments based on numerical simulations demonstrate that the SQP routine runs in a few milliseconds on standard hardware, indicating feasibility for real-time use. A sensitivity analysis confirms that the gain of the cable-alignment penalty can be tuned online, enabling a controllable trade-off between safety margin and energy consumption with no measurable degradation of tracking performance in simulation. This framework provides a scalable path to safe and energy-balanced cooperative load transport in practical deployments.
Control Lyapunov Functions for Optimality in Sontag-Type Control
Given a Control Lyapunov Function (CLF), Sontag's famous Formula provides a nonlinear state-feedback guaranteeing asymptotic stability of the setpoint. At the same time, a cost function that depends on the CLF is minimized. While there exist methods to construct CLFs for certain classes of systems, the impact on the resulting performance is unclear. This article aims to make two contributions to this problem: (1) We show that using the value function of an LQR design as CLF, the resulting Sontag-type controller minimizes a classical quadratic cost around the setpoint and a CLF-dependent cost within the domain where the CLF condition holds. We also show that the closed-loop system is stable within a local region at least as large as that generated by the LQR. (2) We show a related CLF design for feedback-linearizable systems resulting in a global CLF in a straight-forward manner; The Sontag design then guarantees global asymptotic stability while minimizing a quadratic cost at the setpoint and a CLF-dependent cost in the whole state-space. Both designs are constructive and easily applicable to nonlinear multi-input systems under mild assumptions.
comment: 6 pages
Dynamic Constraint Tightening for Nonlinear MPC for Autonomous Racing via Contraction Analysis
This work develops a robust nonlinear Model Predictive Control (MPC) framework for path tracking in autonomous vehicles operating at the limits of handling utilizing a Control Contraction Metric (CCM) derived from a perturbed dynamic single track model. We first present a nonlinear MPC scheme for autonomous vehicles. Building on this nominal scheme, we assume limited uncertainty in tire parameters as well as bounded force disturbances in both lateral and longitudinal directions. By simplifying the perturbed model, we optimize a CCM for the uncertain model, which is validated through simulations at the dynamic limits of vehicle performance. This CCM is subsequently employed to parameterize a homothetic tube used for constraint tightening within the MPC formulation. The resulting robust nonlinear MPC is computationally more efficient than competing methods, as it introduces only a single additional state variable into the prediction model compared to the nominal scheme. Simulation results demonstrate that the homothetic tube expands most significantly in regions where the nominal scheme would otherwise violate constraints, illustrating its ability to capture all uncertain trajectories while avoiding unnecessary conservatism.
comment: 8 pages
On Data-Driven Unbiased Predictors using the Koopman Operator
The Koopman operator and its data-driven approximations, such as extended dynamic mode decomposition (EDMD), are widely used for analysing, modelling, and controlling nonlinear dynamical systems. However, when the true Koopman eigenfunctions cannot be identified from finite data, multi-step predictions may suffer from structural inaccuracies and systematic bias. To address this issue, we analyse the first and second moments of the multi-step prediction residual. By decomposing the residual into contributions from the one-step approximation error and the propagation of accumulated inaccuracies, we derive a closed-form expression characterising these effects. This analysis enables the development of a novel and computationally efficient algorithm that enforces unbiasedness and reduces variance in the resulting predictor. The proposed method is validated in numerical simulations, showing improved uncertainty properties compared to standard EDMD. These results lay the foundation for uncertainty-aware and unbiased Koopman-based prediction frameworks that can be extended to controlled and stochastic systems.
comment: This paper is currently under review for ECC 2026
Safe Adaptive Control of Parabolic PDE-ODE Cascades
In this paper, we propose a safe adaptive boundary control strategy for a class of parabolic partial differential equation-ordinary differential equation (PDE-ODE) cascaded systems with parametric uncertainties in both the PDE and ODE subsystems. The proposed design is built upon an adaptive Control Barrier Function (aCBF) framework that incorporates high-relative-degree CBFs together with a batch least-squares identification (BaLSI)-based adaptive control that guarantees exact parameter identification in finite time. The proposed control law ensures that: (i) if the system output state initially lies within a prescribed safe set, safety is maintained for all time; otherwise, the output is driven back into the safe region within a preassigned finite time; and (ii) convergence to zero of all plant states is achieved. Numerical simulations are provided to demonstrate the effectiveness of the proposed approach.
Resilient Load Forecasting under Climate Change: Adaptive Conditional Neural Processes for Few-Shot Extreme Load Forecasting
Extreme weather can substantially change electricity consumption behavior, causing load curves to exhibit sharp spikes and pronounced volatility. If forecasts are inaccurate during those periods, power systems are more likely to face supply shortfalls or localized overloads, forcing emergency actions such as load shedding and increasing the risk of service disruptions and public-safety impacts. This problem is inherently difficult because extreme events can trigger abrupt regime shifts in load patterns, while relevant extreme samples are rare and irregular, making reliable learning and calibration challenging. We propose AdaCNP, a probabilistic forecasting model for data-scarce condition. AdaCNP learns similarity in a shared embedding space. For each target data, it evaluates how relevant each historical context segment is to the current condition and reweights the context information accordingly. This design highlights the most informative historical evidence even when extreme samples are rare. It enables few-shot adaptation to previously unseen extreme patterns. AdaCNP also produces predictive distributions for risk-aware decision-making without expensive fine-tuning on the target domain. We evaluate AdaCNP on real-world power-system load data and compare it against a range of representative baselines. The results show that AdaCNP is more robust during extreme periods, reducing the mean squared error by 22\% relative to the strongest baseline while achieving the lowest negative log-likelihood, indicating more reliable probabilistic outputs. These findings suggest that AdaCNP can effectively mitigate the combined impact of abrupt distribution shifts and scarce extreme samples, providing a more trustworthy forecasting for resilient power system operation under extreme events.
Reinforcement Learning-based Home Energy Management with Heterogeneous Batteries and Stochastic EV Behaviour
The widespread adoption of photovoltaic (PV), electric vehicles (EVs), and stationary energy storage systems (ESS) in households increases system complexity while simultaneously offering new opportunities for energy regulation. However, effectively coordinating these resources under uncertainties remains challenging. This paper proposes a novel home energy management framework based on deep reinforcement learning (DRL) that can jointly minimise energy expenditure and battery degradation while guaranteeing occupant comfort and EV charging requirements. Distinct from existing studies, we explicitly account for the heterogeneous degradation characteristics of stationary and EV batteries in the optimisation, alongside stochastic user behaviour regarding arrival time, departure time, and driving distance. The energy scheduling problem is formulated as a constrained Markov decision process (CMDP) and solved using a Lagrangian soft actor-critic (SAC) algorithm. This approach enables the agent to learn optimal control policies that enforce physical constraints, including indoor temperature bounds and target EV state of charge upon departure, despite stochastic uncertainties. Numerical simulations over a one-year horizon demonstrate the effectiveness of the proposed framework in satisfying physical constraints while eliminating thermal oscillations and achieving significant economic benefits. Specifically, the method reduces the cumulative operating cost substantially compared to two standard rule-based baselines while simultaneously decreasing battery degradation costs by 8.44%.
Peak Bounds for the Estimation Error under Sensor Attacks
This paper investigates bounds on the estimation error of a linear system affected by norm-bounded disturbances and full sensor attacks. The system is equipped with a detector that evaluates the norm of the innovation signal to detect faults, and the attacker wants to avoid detection. We utilize induced $L_\infty$ system norms, also called \emph{peak-to-peak} norms, to compare the estimation error bounds under nominal operations and under attack. This leads to a sufficient condition for when the bound on the estimation error is smaller during an attack than during nominal operation. This condition is independent of the attack strategy and depends only on the attacker's desire to remain undetected and (indirectly) the observer gain. Therefore, we investigate both an observer design method, that seeks to reduce the error bound under attack while keeping the nominal error bound low, and detector threshold tuning. As a numerical illustration, we show how a sensor attack can deactivate a robust safety filter based on control barrier functions if the attacked error bound is larger than the nominal one. We also statistically evaluate our observer design method and the effect of the detector threshold.
comment: 7 pages, 3 figures, accepted at the American Control Conference 2026
A Path-Complete Approach for Optimal Control of Switched Systems
We study the problem of estimating the value function of discrete-time switched systems under arbitrary switching. Unlike the switched LQR problem, where both inputs and mode sequences are optimized, we consider the case where switching is exogenous. For such systems, the number of possible mode sequences grows exponentially with time, making the exact computation of the value function intractable. This motivates the development of tractable bounds that approximate it. We propose a novel framework, based on path-complete graphs, for constructing computable upper bounds on the value function. In this framework, multiple quadratic functions are combined through a directed graph that encodes dynamic programming inequalities, yielding convex and sound formulations. For example, for switched linear systems with quadratic cost, we derive tractable LMI-based formulations and provide computational complexity bounds. We further establish approximation guarantees for the upper bounds and show asymptotic non-conservativeness using concepts from graph theory. Finally, we extend the approach to controller synthesis for systems with affine control inputs and demonstrate its effectiveness on numerical examples.
Parameter Privacy-Preserving Data Sharing: A Particle-Belief MDP Formulation
This paper investigates parameter-privacy-preserving data sharing in continuous-state dynamical systems, where a data owner designs a data-sharing policy to support downstream estimation and control while preventing adversarial inference of a sensitive parameter. This data-sharing problem is formulated as an optimization problem that trades off privacy leakage and the impact of data sharing on the data owner's utility, subject to a data-usability constraint. We show that this problem admits an equivalent belief Markov decision process (MDP) formulation, which provides a simplified representation of the optimal policy. To efficiently characterize information-theoretic privacy leakage in continuous state and action spaces, we propose a particle-belief MDP formulation that tracks the parameter posterior via sequential Monte Carlo, yielding a tractable belief-state approximation that converges asymptotically as the number of particles increases. We further derive a tractable closed-form upper bound on particle-based MI via Gaussian mixture approximations, which enables efficient optimization of the particle-belief MDP. Experiments on a mixed-autonomy platoon show that the learned continuous policy substantially impedes inference attacks on human-driving behavior parameters while maintaining data usability and system performance.
comment: 17 pages, 10 figures
Mitigation of Structural Harmonic Instability in Virtual Admittance-Based Grid-Forming Inverters via Mimicking Skin Effect
The virtual admittance-current controller (VA-CC) scheme is widely employed to emulate an equivalent inductance in front of the internal voltage source of grid-forming inverters. However, recent studies have reported harmonic instabilities associated with VA-CC, motivating the need for a more physically interpretable understanding of their origin. This letter identifies a delay-independent structural mechanism of harmonic instability in the VA-CC scheme, wherein the interaction between the filter and virtual inductances introduces a non-passive second-order transfer-function term exhibiting negative resistance. To address this issue, a simple yet effective modification is proposed by integrating a parallel virtual resistor into the VA structure. This reconfiguration enhances the passivity of VA-CC scheme across the harmonic range by mimicking the skin effect which augments damping in high-frequency range, without altering the wellestablished current controller or voltage feedforward control. Experimental results validate that the proposed method achieves robust harmonic stability, whereas the conventional approach fails under identical grid conditions.
Sampled-Data Wasserstein Distributionally Robust Control of Multiplicative Systems: A Convex Relaxation with Performance Guarantees
This paper investigates the robust optimal control of sampled-data stochastic systems with multiplicative noise and distributional ambiguity. We consider a class of discrete-time optimal control problems where the controller \emph{jointly} selects a feedback policy and a sampling period to maximize the worst-case expected concave utility of the inter-sample growth factor. Modeling uncertainty via a Wasserstein ambiguity set, we confront the structural obstacle of~``concave-max'' geometry arising from maximizing a concave utility against an adversarial distribution. Unlike standard convex loss minimization, the dual reformulation here requires a minimax interchange within the semi-infinite constraints, where the utility's concavity precludes exact strong duality. To address this, we utilize a general minimax inequality to derive a tractable convex relaxation. Our approach yields a rigorous lower bound that functions as a probabilistic performance guarantee. We establish an explicit, non-asymptotic bound on the resulting duality gap, proving that the approximation error is uniformly controlled by the Lipschitz-smoothness of the stage reward and the diameter of the disturbance support. Furthermore, we introduce necessary and sufficient conditions for \emph{robust viability}, ensuring state positivity invariance across the entire ambiguity set. Finally, we bridge the gap between static optimization and dynamic performance, proving that the optimal value of the relaxation serves as a rigorous deterministic floor for the asymptotic average utility rate almost surely. The framework is illustrated on a log-optimal portfolio control problem, which serves as a canonical instance of multiplicative stochastic control.
comment: Submitted for possible publication
Lyapunov Constrained Soft Actor-Critic (LC-SAC) using Koopman Operator Theory for Quadrotor Trajectory Tracking
Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. There has significant work in incorporating Lyapunov-based stability guarantees in RL algorithms with key challenges being selecting a candidate Lyapunov function, computational complexity by using excessive function approximators and conservative policies by incorporating stability criterion in the learning process. In this work we propose a novel Lyapunov-constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We propose use of extended dynamic mode decomposition (EDMD) to produce a linear approximation of the system and use this approximation to derive a closed form solution for candidate Lyapunov function. This derived Lyapunov function is incorporated in the SAC algorithm to further provide guarantees for a policy that stabilizes the nonlinear system. The results are evaluated trajectory tracking of a 2D Quadrotor environment based on safe-control-gym. The proposed algorithm shows training convergence and decaying violations for Lyapunov stability criterion compared to baseline vanilla SAC algorithm. GitHub Repository: https://github.com/DhruvKushwaha/LC-SAC-Quadrotor-Trajectory-Tracking
comment: 12 pages, 7 Figures, submitted to IEEE RA-L
Trojan Attacks on Neural Network Controllers for Robotic Systems
Neural network controllers are increasingly deployed in robotic systems for tasks such as trajectory tracking and pose stabilization. However, their reliance on potentially untrusted training pipelines or supply chains introduces significant security vulnerabilities. This paper investigates backdoor (Trojan) attacks against neural controllers, using a differential-drive mobile robot platform as a case study. In particular, assuming that the robot's tracking controller is implemented as a neural network, we design a lightweight, parallel Trojan network that can be embedded within the controller. This malicious module remains dormant during normal operation but, upon detecting a highly specific trigger condition defined by the robot's pose and goal parameters, compromises the primary controller's wheel velocity commands, resulting in undesired and potentially unsafe robot behaviours. We provide a proof-of-concept implementation of the proposed Trojan network, which is validated through simulation under two different attack scenarios. The results confirm the effectiveness of the proposed attack and demonstrate that neural network-based robotic control systems are subject to potentially critical security threats.
comment: Paper submitted to the 2026 IEEE Conference on Control Technology and Applications (CCTA)
GPU-to-Grid: Voltage Regulation via GPU Utilization Control
While the rapid expansion of data centers poses challenges for power grids, it also offers new opportunities as potentially flexible loads. Existing power system research often abstracts data centers as aggregate resources, while computer system research primarily focuses on optimizing GPU energy efficiency and largely ignores the grid impacts of optimized GPU power consumption. To bridge this gap, we develop a GPU-to-Grid framework that couples device-level GPU control with power system objectives. We study distribution-level voltage regulation enabled by flexibility in LLM inference, using batch size as a control knob that trades off the voltage impacts of GPU power consumption against inference latency and token throughput. We first formulate this problem as an optimization problem and then realize it as an online feedback optimization controller that leverages measurements from both the power grid and GPU systems. Our key insight is that reducing GPU power consumption alleviates violations of lower voltage limits, while increasing GPU power mitigates violations near upper voltage limits in distribution systems; this runs counter to the common belief that minimizing GPU power consumption is always beneficial to power grids.
Learning Nonlinear Continuous-Time Systems for Formal Uncertainty Propagation and Probabilistic Evaluation SC
Nonlinear ordinary differential equations (ODEs) are powerful tools for modeling real-world dynamical systems. However, propagating initial state uncertainty through nonlinear dynamics, especially when the ODE is unknown and learned from data, remains a major challenge. This paper introduces a novel continuum dynamics perspective for model learning that enables formal uncertainty propagation by constructing Taylor series approximations of probabilistic events. We establish sufficient conditions for the soundness of the approach and prove its asymptotic convergence. Empirical results demonstrate the framework's effectiveness, particularly when predicting rare events.
comment: 10 pages, 4 figures, to appear in ACM Int'l Conf. on Hybrid Systems: Computation and Control (HSCC), and ACM/IEEE Int'l Conference on Cyber-Physical Systems (ICCPS) 2026
Banach Control Barrier Functions for Large-Scale Swarm Control
This paper studies the safe control of very large multi-agent systems via a generalized framework that employs so-called Banach Control Barrier Functions (B-CBFs). Modeling a large swarm as probability distribution over a spatial domain, we show how B-CBFs can be used to appropriately capture a variety of macroscopic constraints that can integrate with large-scale swarm objectives. Leveraging this framework, we define stable and filtered gradient flows for large swarms, paying special attention to optimal transport algorithms. Further, we show how to derive agent-level, microscopical algorithms that are consistent with macroscopic counterparts in the large-scale limit. We then identify conditions for which a group of agents can compute a distributed solution that only requires local information from other agents within a communication range. Finally, we showcase the theoretical results over swarm systems in the simulations section.
comment: Accepted by IEEE ACC 2026
Multi-Sensor Scheduling for Remote State Estimation over Wireless MIMO Fading Channels with Semantic Over-the-Air Aggregation
In this work, we study multi-sensor scheduling for remote state estimation over wireless multiple-input multiple-output (MIMO) fading channels using a novel semantic over-the-air (SemOTA) aggregation approach. We first revisit Kalman filtering with conventional over-the-air (OTA) aggregation and highlight its transmit power limitations. To balance power efficiency and estimation performance, we formulate the scheduling task as a finite-horizon dynamic programming (DP) problem. By analyzing the structure of the optimal Q-function, we show that the resulting scheduling policy exhibits a semantic structure that adapts online to the estimation error covariance and channel variations. To obtain a practical solution, we derive a tractable upper bound on the Q-function via a positive semidefinite (PSD) cone decomposition, which enables an efficient approximate scheduling policy and a low-complexity remote estimation algorithm. Numerical results confirm that the proposed scheme outperforms existing methods in both estimation accuracy and power efficiency.
Steering the Herd: A Framework for LLM-based Control of Social Learning
Algorithms increasingly serve as information mediators--from social media feeds and targeted advertising to the increasing ubiquity of LLMs. This engenders a joint process where agents combine private, algorithmically-mediated signals with learning from peers to arrive at decisions. To study such settings, we introduce a model of controlled sequential social learning in which an information-mediating planner (e.g. an LLM) controls the information structure of agents while they also learn from the decisions of earlier agents. The planner may seek to improve social welfare (altruistic planner) or to induce a specific action the planner prefers (biased planner). Our framework presents a new optimization problem for social learning that combines dynamic programming with decentralized action choices and Bayesian belief updates. We prove the convexity of the value function and characterize the optimal policies of altruistic and biased planners, which attain desired tradeoffs between the costs they incur and the payoffs they earn from induced agent choices. Notably, in some regimes the biased planner intentionally obfuscates the agents' signals. Even under stringent transparency constraints--information parity with individuals, no lying or cherry-picking, and full observability--we show that information mediation can substantially shift social welfare in either direction. We complement our theory with simulations in which LLMs act as both planner and agents. Notably, the LLM planner in our simulations exhibits emergent strategic behavior in steering public opinion that broadly mirrors the trends predicted, though key deviations suggest the influence of non-Bayesian reasoning consistent with the cognitive patterns of both humans and LLMs trained on human-like data. Together, we establish our framework as a tractable basis for studying the impact and regulation of LLM information mediators.
AI-Powered CPS-Enabled Vulnerable-User-Aware Urban Transportation Digital Twin: Methods and Applications
We present methods and applications for the development of digital twins (DT) for urban traffic management. While the majority of studies on the DT focus on its ``eyes," which is the emerging sensing and perception like object detection and tracking, what really distinguishes the DT from a traditional simulator lies in its ``brain," the prediction and decision making capabilities of extracting patterns and making informed decisions from what has been seen and perceived. In order to add value to urban transportation management, DTs need to be powered by artificial intelligence and complement with low-latency high-bandwidth sensing and networking technologies, in other words, cyberphysical systems. This paper can be a pointer to help researchers and practitioners identify challenges and opportunities for the development of DTs; a bridge to initiate conversations across disciplines; and a road map to exploiting potentials of DTs for diverse urban transportation applications.
Boosting the transient performance of reference tracking controllers with neural networks
Reference tracking is a key objective in many control systems, including those characterized by complex nonlinear dynamics. In these settings, traditional control approaches can effectively ensure steady-state accuracy but often struggle to explicitly optimize transient performance. Neural network controllers have gained popularity due to their adaptability to nonlinearities and disturbances; however, they often lack formal closed-loop stability and performance guarantees. To address these challenges, a recently proposed neural-network control framework known as Performance Boosting (PB) has demonstrated the ability to maintain $\mathcal{L}_p$ stability properties of nonlinear systems while optimizing generic transient costs. This paper extends the PB approach to reference tracking problems. First, we characterize the complete set of nonlinear controllers that preserve desired tracking properties for nonlinear systems equipped with base reference-tracking controllers. Then, we show how to optimize transient costs while searching within subsets of tracking controllers that incorporate expressive neural network models. Furthermore, we analyze the robustness of our method to uncertainties in the underlying system dynamics. Numerical simulations on a robotic system demonstrate the advantages of our approach over the standard PB framework.
Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints
Autonomous navigation for nano-scale unmanned aerial vehicles (nano-UAVs) is governed by extreme Size, Weight, and Power (SWaP) constraints (with the weight < 50 g and sub-100 mW onboard processor), distinguishing it fundamentally from standard robotic paradigms. This review synthesizes the state-of-the-art in sensing, computing, and control architectures designed specifically for these sub- 100mW computational envelopes. We critically analyse the transition from classical geometry-based methods to emerging "Edge AI" paradigms, including quantized deep neural networks deployed on ultra-low-power System-on-Chips (SoCs) and neuromorphic event-based control. Beyond algorithms, we evaluate the hardware-software co-design requisite for autonomy, covering advancements in dense optical flow, optimized Simultaneous Localization and Mapping (SLAM), and learning-based flight control. While significant progress has been observed in visual navigation and relative pose estimation, our analysis reveals persistent gaps in long-term endurance, robust obstacle avoidance in dynamic environments, and the "Sim-to-Real" transfer of reinforcement learning policies. This survey provides a roadmap for bridging these gaps, advocating for hybrid architectures that fuse lightweight classical control with data-driven perception to enable fully autonomous, agile nano-UAVs in GPS-denied environments.
comment: 30 pages, 5 figures, 2 table. Review article
Robust Adaptive Discrete-Time Control Barrier Certificate
This work develops a robust adaptive control strategy for discrete-time systems using Control Barrier Functions (CBFs) to ensure safety under parametric model uncertainty and disturbances. A key contribution of this work is establishing a barrier function certificate in discrete time for general online parameter estimation algorithms. This barrier function certificate guarantees positive invariance of the safe set despite disturbances and parametric uncertainty without access to the true system parameters. In addition, real-time implementation and inherent robustness guarantees are provided. The proposed robust adaptive safe control framework demonstrates that the parameter estimation module can be designed separately from the CBF-based safety filter, simplifying the development of safe adaptive controllers for discrete-time systems. The resulting safe control approach guarantees that the system remains within the safe set while adapting to model uncertainties, making it a promising strategy for discrete-time safety-critical systems.
comment: 11 pages with updated simulations and illustrative figures. The conditions used for verifying barrier functions and online safe control are separated, which is a crucial and overlooked point in the literature on adaptive safe control. The paper is submitted to Automatica
Data-Driven Greenhouse Climate Regulation in Lettuce Cultivation Using BiLSTM and GRU Predictive Control
Efficient greenhouse management is essential for sustainable food production, but the high energy demand for climate regulation poses significant economic and environmental challenges. While traditional process-based greenhouse models exist, they are often too complex or imprecise for reliable control. To address this, our study introduces a novel data-driven predictive control framework using Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural networks within a Model Predictive Control (MPC) architecture. Training data were generated from a validated dynamic model simulating lettuce cultivation under various environmental conditions. The LSTM and GRU networks were trained to predict future greenhouse states -- including temperature, humidity, CO\textsubscript{2} concentration, and crop dry matter -- with robustness confirmed via $10$-fold cross-validation. These networks were embedded into an online MPC controller to optimize heating, ventilation, and CO\textsubscript{2} injection, aiming to minimize energy consumption and maximize crop yield while respecting biological constraints. Results showed that both the LSTM- and GRU-based controllers significantly outperformed a conventional MPC baseline. For example, humidity violations dropped from 54.77\% (MPC) to 15.45\% (GRU) and 17.71\% (LSTM), while day-night temperature deviations were kept below $2^\circ\text{C}$. The GRU controller further achieved up to 40\% lower computation time than its LSTM counterpart, confirming its real-time feasibility. Overall, the proposed GRU-driven predictive control approach offers a robust and computationally efficient solution for intelligent greenhouse climate automation under practical operational constraints.
Structure Identification of NDS with Descriptor Subsystems under Asynchronous, Non-Uniform, and Slow-Rate Sampling
This paper extends previous identification method to the asynchronous sampling scenario, enabling the simultaneous handling of asynchronous, non-uniform, and slow-rate sampling conditions. Moving beyond lumped systems, the proposed framework targets the identification of interconnection structure of Networked Dynamic Systems (NDS) with descriptor-form subsystems. In the first stage, right tangential interpolations are estimated from steady-state outputs, allowing all asynchronous samples to be fused into a unified estimator. In the second stage, a left null-space projection is employed to decouple the bilinear dependence between state-related matrices and interconnection parameters, reducing the identification problem to two successive linear estimation problems. The proposed approach eliminates the full-normal-rank transfer matrix assumption required in previous work, while providing theoretical guarantees of mean-square consistency and asymptotic unbiasedness. Numerical results demonstrate that the framework can accurately recover the system structure, even under severe sampling irregularities.
comment: 12 pages, 4 figures
EVOLVE: a Value-Added Services Platform for Electric Vehicle Charging Stations
A notable challenge in Electric Vehicle (EV) charging is the time required to fully charge the battery, which can range from 15 minutes to 2-3 hours. This idle period, however, presents an opportunity to offer time-consuming or data-intensive services such as vehicular software updates. ISO 15118 referred to the concept of Value-Added Services (VAS) in the charging scenario, but it remained underexplored in the literature. Our paper addresses this gap by proposing \acronym, the first EV charger compute architecture that supports secure on-charger universal applications with upstream and downstream communication. The architecture covers the end-to-end hardware/software stack, including standard API for vehicles and IT infrastructure. We demonstrate the feasibility and advantages of \acronym by employing and evaluating three suggested value-added services: vehicular software updates, security information and event management (SIEM), and secure payments. The results demonstrate significant reductions in bandwidth utilization and latency, as well as high throughput, which supports this novel concept and suggests a promising business model for Electric Vehicle charging station operation.
comment: This paper has been accepted for publication at 2025 IEEE 101st Vehicular Technology Conference (VTC2025-Spring) 2025. The final, published version is available via IEEE Xplore
LLM-Driven Transient Stability Assessment: From Automated Simulation to Neural Architecture Design
This paper presents an LLM-driven, end-to-end workflow that addresses the lack of automation and intelligence in power system transient stability assessment (TSA). The proposed agentic framework integrates large language models (LLMs) with a professional simulator (ANDES) to automatically generate and filter disturbance scenarios from natural language, and employs an LLM-driven Neural Network Design (LLM-NND) pipeline to autonomously design and optimize TSA models through performance-guided, closed-loop feedback. On the IEEE 39-bus system, the LLM-NND models achieve 93.71% test accuracy on four-class TSA with only 4.78M parameters, while maintaining real-time inference latency (less than 0.95 ms per sample). Compared with a manually designed DenseNet (25.9M parameters, 80.05% accuracy), the proposed approach jointly improves accuracy and efficiency. Ablation studies confirm that the synergy among domain-grounded retrieval, reasoning augmentation, and feedback mechanisms is essential for robust automation. The results demonstrate that LLM agents can reliably accelerate TSA research from scenario generation and data acquisition to model design and interpretation, offering a scalable paradigm that is readily extensible to other power system tasks such as optimal power flow, fault analysis, and market operations.
Information Shapes Koopman Representation ICLR 2026
The Koopman operator provides a powerful framework for modeling dynamical systems and has attracted growing interest from the machine learning community. However, its infinite-dimensional nature makes identifying suitable finite-dimensional subspaces challenging, especially for deep architectures. We argue that these difficulties come from suboptimal representation learning, where latent variables fail to balance expressivity and simplicity. This tension is closely related to the information bottleneck (IB) dilemma: constructing compressed representations that are both compact and predictive. Rethinking Koopman learning through this lens, we demonstrate that latent mutual information promotes simplicity, yet an overemphasis on simplicity may cause latent space to collapse onto a few dominant modes. In contrast, expressiveness is sustained by the von Neumann entropy, which prevents such collapse and encourages mode diversity. This insight leads us to propose an information-theoretic Lagrangian formulation that explicitly balances this tradeoff. Furthermore, we propose a new algorithm based on the Lagrangian formulation that encourages both simplicity and expressiveness, leading to a stable and interpretable Koopman representation. Beyond quantitative evaluations, we further visualize the learned manifolds under our representations, observing empirical results consistent with our theoretical predictions. Finally, we validate our approach across a diverse range of dynamical systems, demonstrating improved performance over existing Koopman learning methods. The implementation is publicly available at https://github.com/Wenxuan52/InformationKoopman.
comment: Published as a conference paper at ICLR 2026
On the Equilibrium between Feasible Zone and Uncertain Model in Safe Exploration
Ensuring the safety of environmental exploration is a critical problem in reinforcement learning (RL). While limiting exploration to a feasible zone has become widely accepted as a way to ensure safety, key questions remain unresolved: what is the maximum feasible zone achievable through exploration, and how can it be identified? This paper, for the first time, answers these questions by revealing that the goal of safe exploration is to find the equilibrium between the feasible zone and the environment model. This conclusion is based on the understanding that these two components are interdependent: a larger feasible zone leads to a more accurate environment model, and a more accurate model, in turn, enables exploring a larger zone. We propose the first equilibrium-oriented safe exploration framework called safe equilibrium exploration (SEE), which alternates between finding the maximum feasible zone and the least uncertain model. Using a graph formulation of the uncertain model, we prove that the uncertain model obtained by SEE is monotonically refined, the feasible zones monotonically expand, and both converge to the equilibrium of safe exploration. Experiments on classic control tasks show that our algorithm successfully expands the feasible zones with zero constraint violation, and achieves the equilibrium of safe exploration within a few iterations.
Learning-based Observer for Coupled Disturbance
Achieving high-precision control for robotic systems is hindered by the low-fidelity dynamical model and external disturbances. Especially, the intricate coupling between internal uncertainties and external disturbances further exacerbates this challenge. This study introduces an effective and convergent algorithm enabling accurate estimation of the coupled disturbance via combining control and learning philosophies. Concretely, by resorting to Chebyshev series expansion, the coupled disturbance is firstly decomposed into an unknown parameter matrix and two known structures dependent on system state and external disturbance respectively. A regularized least squares algorithm is subsequently formalized to learn the parameter matrix using historical time-series data. Finally, a polynomial disturbance observer is specifically devised to achieve a high-precision estimation of the coupled disturbance by utilizing the learned portion. The proposed algorithm is evaluated through extensive simulations and real flight tests. We believe this work can offer a new pathway to integrate learning approaches into control frameworks for addressing longstanding challenges in robotic applications.
comment: 10 pages, 7 figures
When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control
Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers or costly per-device recalibration. We derive a scaling law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales linearly with task variance, providing a quantitative criterion for when adaptation justifies its overhead. Validation on quantum gate calibration shows negligible benefits for low-variance tasks but $>40\%$ fidelity gains on two-qubit gates under extreme out-of-distribution conditions (10$\times$ the training noise), with implications for reducing per-device calibration time on cloud quantum processors. Further validation on classical linear-quadratic control confirms these laws emerge from general optimization geometry rather than quantum-specific physics. Together, these results offer a transferable framework for decision-making in adaptive control.
comment: 28 pages, 11 figures
Bandit-Based Rate Adaptation for a Single-Server Queue
This paper considers the problem of obtaining bounded time-average expected queue sizes in a single-queue system with a partial-feedback structure. Time is slotted; in slot $t$ the transmitter chooses a rate $V(t)$ from a continuous interval. Transmission succeeds if and only if $V(t)\le C(t)$, where channel capacities $\{C(t)\}$ and arrivals are i.i.d. draws from fixed but unknown distributions. The transmitter observes only binary acknowledgments (ACK/NACK) indicating success or failure. Let $\varepsilon>0$ denote a sufficiently small lower bound on the slack between the arrival rate and the capacity region. We propose a phased algorithm that progressively refines a discretization of the uncountable infinite rate space and, without knowledge of $\varepsilon$, achieves a $\mathcal{O}\!\big(\log^{3.5}(1/\varepsilon)/\varepsilon^{3}\big)$ time-average expected queue size uniformly over the horizon. We also prove a converse result showing that for any rate-selection algorithm, regardless of whether $\varepsilon$ is known, there exists an environment in which the worst-case time-average expected queue size is $Ω(1/\varepsilon^{2})$. Thus, while a gap remains in the setting without knowledge of $\varepsilon$, we show that if $\varepsilon$ is known, a simple single-stage UCB type policy with a fixed discretization of the rate space achieves $\mathcal{O}\!\big(\log(1/\varepsilon)/\varepsilon^{2}\big)$, matching the converse up to logarithmic factors.
Modeling and Simulation of an Active Car Suspension with a Robust LQR Controller under Road Disturbance, Parameter Uncertainty and White Noise
Vehicle suspension is important for passengers to travel comfortably and to be less exposed to effects such as vibration and shock. A good suspension system increases the road holding of vehicles, allows them to take turns safely, and reduces the risk of traffic accidents. A passive suspension system is the most widely used suspension system in vehicles due to its simple structure and low cost. Passive suspension systems do not have an actuator and therefore do not have a controller. Active suspension systems have an actuator and a controller. Although their structures are more complex and costly, they are safer. The Proportional-IntegralDerivative (PID) controller is widely used in active suspension systems due to its simple structure, reasonable cost, and easy adjustment of coefficients. In this study, a more robust Linear Quadratic Regulator (LQR)-controlled active suspension was designed than a passive suspension and a PID-controlled active suspension. Robustness analyses were performed for passive suspension, PIDcontrolled active suspension, and LQR-controlled active suspension. Suspension travel, sprung mass acceleration, and sprung mass motion simulations were performed for all three suspensions under road disturbance, under simultaneous road disturbance and parameter uncertainty and under road disturbance with white noise. A comparative analysis was performed by obtaining the rise time, overshoot, and settling time data of the suspensions under different conditions. It was observed that the LQR-controlled active suspension showed the fastest rise time, the least overshoot and had the shortest settling time. In this case, it was proven that the LQRcontrolled active suspension provided a more comfortable and safe ride compared to the other two suspension systems.
comment: 15 pages, 19 figures
Robotics
Conformal Reachability for Safe Control in Unknown Environments
Designing provably safe control is a core problem in trustworthy autonomy. However, most prior work in this regard assumes either that the system dynamics are known or deterministic, or that the state and action space are finite, significantly limiting application scope. We address this limitation by developing a probabilistic verification framework for unknown dynamical systems which combines conformal prediction with reachability analysis. In particular, we use conformal prediction to obtain valid uncertainty intervals for the unknown dynamics at each time step, with reachability then verifying whether safety is maintained within the conformal uncertainty bounds. Next, we develop an algorithmic approach for training control policies that optimize nominal reward while also maximizing the planning horizon with sound probabilistic safety guarantees. We evaluate the proposed approach in seven safe control settings spanning four domains -- cartpole, lane following, drone control, and safe navigation -- for both affine and nonlinear safety specifications. Our experiments show that the policies we learn achieve the strongest provable safety guarantees while still maintaining high average reward.
BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks
Embodied world models have emerged as a promising paradigm in robotics, most of which leverage large-scale Internet videos or pretrained video generation models to enrich visual and motion priors. However, they still face key challenges: a misalignment between coordinate-space actions and pixel-space videos, sensitivity to camera viewpoint, and non-unified architectures across embodiments. To this end, we present BridgeV2W, which converts coordinate-space actions into pixel-aligned embodiment masks rendered from the URDF and camera parameters. These masks are then injected into a pretrained video generation model via a ControlNet-style pathway, which aligns the action control signals with predicted videos, adds view-specific conditioning to accommodate camera viewpoints, and yields a unified world model architecture across embodiments. To mitigate overfitting to static backgrounds, BridgeV2W further introduces a flow-based motion loss that focuses on learning dynamic and task-relevant regions. Experiments on single-arm (DROID) and dual-arm (AgiBot-G1) datasets, covering diverse and challenging conditions with unseen viewpoints and scenes, show that BridgeV2W improves video generation quality compared to prior state-of-the-art methods. We further demonstrate the potential of BridgeV2W on downstream real-world tasks, including policy evaluation and goal-conditioned planning. More results can be found on our project website at https://BridgeV2W.github.io .
QVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization ICLR2026
The advent of Vision-Language-Action (VLA) models represents a significant leap for embodied intelligence, yet their immense computational demands critically hinder deployment on resource-constrained robotic platforms. Intuitively, low-bit quantization is a prevalent and preferred technique for large-scale model compression. However, we find that a systematic analysis of VLA model's quantization is fundamentally lacking. We argue that naively applying uniform-bit quantization from Large Language Models (LLMs) to robotics is flawed, as these methods prioritize passive data fidelity while ignoring how minor action deviations compound into catastrophic task failures. To bridge this gap, we introduce QVLA, the first action-centric quantization framework specifically designed for embodied control. In a sharp departure from the rigid, uniform-bit quantization of LLM-based methods, QVLA introduces a highly granular, channel-wise bit allocation strategy. Its core mechanism is to directly measure the final action-space sensitivity when quantizing each individual channel to various bit-widths. This process yields a precise, per-channel importance metric that guides a global optimization, which elegantly unifies quantization and pruning (0-bit) into a single, cohesive framework. Extensive evaluations on different baselines demonstrate the superiority of our approach. In the LIBERO, the quantization version of OpenVLA-OFT with our method requires only 29.2% of the original model's VRAM while maintaining 98.9% of its original performance and achieving a 1.49x speedup. This translates to a 22.6% performance improvement over the LLM-derived method SmoothQuant. Our work establishes a new, principled foundation for compressing VLA models in robotics, paving the way for deploying powerful, large-scale models on real-world hardware. Code will be released.
comment: ICLR2026
A Scene Graph Backed Approach to Open Set Semantic Mapping
While Open Set Semantic Mapping and 3D Semantic Scene Graphs (3DSSGs) are established paradigms in robotic perception, deploying them effectively to support high-level reasoning in large-scale, real-world environments remains a significant challenge. Most existing approaches decouple perception from representation, treating the scene graph as a derivative layer generated post hoc. This limits both consistency and scalability. In contrast, we propose a mapping architecture where the 3DSSG serves as the foundational backend, acting as the primary knowledge representation for the entire mapping process. Our approach leverages prior work on incremental scene graph prediction to infer and update the graph structure in real-time as the environment is explored. This ensures that the map remains topologically consistent and computationally efficient, even during extended operations in large-scale settings. By maintaining an explicit, spatially grounded representation that supports both flat and hierarchical topologies, we bridge the gap between sub-symbolic raw sensor data and high-level symbolic reasoning. Consequently, this provides a stable, verifiable structure that knowledge-driven frameworks, ranging from knowledge graphs and ontologies to Large Language Models (LLMs), can directly exploit, enabling agents to operate with enhanced interpretability, trustworthiness, and alignment to human concepts.
Input-to-State Safe Backstepping: Robust Safety-Critical Control with Unmatched Uncertainties
Guaranteeing safety in the presence of unmatched disturbances -- uncertainties that cannot be directly canceled by the control input -- remains a key challenge in nonlinear control. This paper presents a constructive approach to safety-critical control of nonlinear systems with unmatched disturbances. We first present a generalization of the input-to-state safety (ISSf) framework for systems with these uncertainties using the recently developed notion of an Optimal Decay CBF, which provides more flexibility for satisfying the associated Lyapunov-like conditions for safety. From there, we outline a procedure for constructing ISSf-CBFs for two relevant classes of systems with unmatched uncertainties: i) strict-feedback systems; ii) dual-relative-degree systems, which are similar to differentially flat systems. Our theoretical results are illustrated via numerical simulations of an inverted pendulum and planar quadrotor.
comment: To appear at the 2026 American Control Conference
When Should Agents Coordinate in Differentiable Sequential Decision Problems?
Multi-robot teams must coordinate to operate effectively. When a team operates in an uncoordinated manner, and agents choose actions that are only individually optimal, the team's outcome can suffer. However, in many domains, coordination requires costly communication. We explore the value of coordination in a broad class of differentiable motion-planning problems. In particular, we model coordinated behavior as a spectrum: at one extreme, agents jointly optimize a common team objective, and at the other, agents make unilaterally optimal decisions given their individual decision variables, i.e., they operate at Nash equilibria. We then demonstrate that reasoning about coordination in differentiable motion-planning problems reduces to reasoning about the second-order properties of agents' objectives, and we provide algorithms that use this second-order reasoning to determine at which times a team of agents should coordinate.
comment: 15 content pages, 2 pages for references, 4 figures
MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction
Learning \emph{latent actions} from diverse human videos enables scaling robot learning beyond embodiment-specific robot datasets, and these latent actions have recently been used as pseudo-action labels for vision-language-action (VLA) model pretraining. To make VLA pretraining effective, latent actions should contain information about the underlying agent's actions despite the absence of ground-truth labels. We propose \textbf{M}ulti-\textbf{V}iew\textbf{P}oint \textbf{L}atent \textbf{A}ction \textbf{M}odel (\textbf{MVP-LAM}), which learns discrete latent actions that are highly informative about ground-truth actions from time-synchronized multi-view videos. MVP-LAM trains latent actions with a \emph{cross-viewpoint reconstruction} objective, so that a latent action inferred from one view must explain the future in another view, reducing reliance on viewpoint-specific cues. On Bridge V2, MVP-LAM produces more action-centric latent actions, achieving higher mutual information with ground-truth actions and improved action prediction, including under out-of-distribution evaluation. Finally, pretraining VLAs with MVP-LAM latent actions improves downstream manipulation performance on the SIMPLER and LIBERO-Long benchmarks.
Variance-Reduced Model Predictive Path Integral via Quadratic Model Approximation
Sampling-based controllers, such as Model Predictive Path Integral (MPPI) methods, offer substantial flexibility but often suffer from high variance and low sample efficiency. To address these challenges, we introduce a hybrid variance-reduced MPPI framework that integrates a prior model into the sampling process. Our key insight is to decompose the objective function into a known approximate model and a residual term. Since the residual captures only the discrepancy between the model and the objective, it typically exhibits a smaller magnitude and lower variance than the original objective. Although this principle applies to general modeling choices, we demonstrate that adopting a quadratic approximation enables the derivation of a closed-form, model-guided prior that effectively concentrates samples in informative regions. Crucially, the framework is agnostic to the source of geometric information, allowing the quadratic model to be constructed from exact derivatives, structural approximations (e.g., Gauss- or Quasi-Newton), or gradient-free randomized smoothing. We validate the approach on standard optimization benchmarks, a nonlinear, underactuated cart-pole control task, and a contact-rich manipulation problem with non-smooth dynamics. Across these domains, we achieve faster convergence and superior performance in low-sample regimes compared to standard MPPI. These results suggest that the method can make sample-based control strategies more practical in scenarios where obtaining samples is expensive or limited.
Self-supervised Physics-Informed Manipulation of Deformable Linear Objects with Non-negligible Dynamics
We address dynamic manipulation of deformable linear objects by presenting SPiD, a physics-informed self-supervised learning framework that couples an accurate deformable object model with an augmented self-supervised training strategy. On the modeling side, we extend a mass-spring model to more accurately capture object dynamics while remaining lightweight enough for high-throughput rollouts during self-supervised learning. On the learning side, we train a neural controller using a task-oriented cost, enabling end-to-end optimization through interaction with the differentiable object model. In addition, we propose a self-supervised DAgger variant that detects distribution shift during deployment and performs offline self-correction to further enhance robustness without expert supervision. We evaluate our method primarily on the rope stabilization task, where a robot must bring a swinging rope to rest as quickly and smoothly as possible. Extensive experiments in both simulation and the real world demonstrate that the proposed controller achieves fast and smooth rope stabilization, generalizing across unseen initial states, rope lengths, masses, non-uniform mass distributions, and external disturbances. Additionally, we develop an affordable markerless rope perception method and demonstrate that our controller maintains performance with noisy and low-frequency state updates. Furthermore, we demonstrate the generality of the framework by extending it to the rope trajectory tracking task. Overall, SPiD offers a data-efficient, robust, and physically grounded framework for dynamic manipulation of deformable linear objects, featuring strong sim-to-real generalization.
comment: Submitted to IEEE Transactions on Robotics. Video: https://youtu.be/lgX2J-00TRM
Human-in-the-Loop Failure Recovery with Adaptive Task Allocation
Since the recent Covid-19 pandemic, mobile manipulators and humanoid assistive robots with higher levels of autonomy have increasingly been adopted for patient care and living assistance. Despite advancements in autonomy, these robots often struggle to perform reliably in dynamic and unstructured environments and require human intervention to recover from failures. Effective human-robot collaboration is essential to enable robots to receive assistance from the most competent operator, in order to reduce their workload and minimize disruptions in task execution. In this paper, we propose an adaptive method for allocating robotic failures to human operators (ARFA). Our proposed approach models the capabilities of human operators, and continuously updates these beliefs based on their actual performance for failure recovery. For every failure to be resolved, a reward function calculates expected outcomes based on operator capabilities and historical data, task urgency, and current workload distribution. The failure is then assigned to the operator with the highest expected reward. Our simulations and user studies show that ARFA outperforms random allocation, significantly reducing robot idle time, improving overall system performance, and leading to a more distributed workload among operators.
Multi-Player, Multi-Strategy Quantum Game Model for Interaction-Aware Decision-Making in Autonomous Driving
Although significant progress has been made in decision-making for automated driving, challenges remain for deployment in the real world. One challenge lies in addressing interaction-awareness. Most existing approaches oversimplify interactions between the ego vehicle and surrounding agents, and often neglect interactions among the agents themselves. A common solution is to model these interactions using classical game theory. However, its formulation assumes rational players, whereas human behavior is frequently uncertain or irrational. To address these challenges, we propose the Quantum Game Decision-Making (QGDM) model, a novel framework that combines classical game theory with quantum mechanics principles (such as superposition, entanglement, and interference) to tackle multi-player, multi-strategy decision-making problems. To the best of our knowledge, this is one of the first studies to apply quantum game theory to decision-making for automated driving. QGDM runs in real time on a standard computer, without requiring quantum hardware. We evaluate QGDM in simulation across various scenarios, including roundabouts, merging, and highways, and compare its performance with multiple baseline methods. Results show that QGDM significantly improves success rates and reduces collision rates compared to classical approaches, particularly in scenarios with high interaction.
Formal Evidence Generation for Assurance Cases for Robotic Software Models
Robotics and Autonomous Systems are increasingly deployed in safety-critical domains, so that demonstrating their safety is essential. Assurance Cases (ACs) provide structured arguments supported by evidence, but generating and maintaining this evidence is labour-intensive, error-prone, and difficult to keep consistent as systems evolve. We present a model-based approach to systematically generating AC evidence by embedding formal verification into the assurance workflow. The approach addresses three challenges: systematically deriving formal assertions from natural language requirements using templates, orchestrating multiple formal verification tools to handle diverse property types, and integrating formal evidence production into the workflow. Leveraging RoboChart, a domain-specific modelling language with formal semantics, we combine model checking and theorem proving in our approach. Structured requirements are automatically transformed into formal assertions using predefined templates, and verification results are automatically integrated as evidence. Case studies demonstrate the effectiveness of our approach.
comment: This is a preprint. The paper is currently under review at Software and Systems Modeling
AffordanceGrasp-R1:Leveraging Reasoning-Based Affordance Segmentation with Reinforcement Learning for Robotic Grasping
We introduce AffordanceGrasp-R1, a reasoning-driven affordance segmentation framework for robotic grasping that combines a chain-of-thought (CoT) cold-start strategy with reinforcement learning to enhance deduction and spatial grounding. In addition, we redesign the grasping pipeline to be more context-aware by generating grasp candidates from the global scene point cloud and subsequently filtering them using instruction-conditioned affordance masks. Extensive experiments demonstrate that AffordanceGrasp-R1 consistently outperforms state-of-the-art (SOTA) methods on benchmark datasets, and real-world robotic grasping evaluations further validate its robustness and generalization under complex language-conditioned manipulation scenarios.
comment: Preprint version
Investigating the Influence of Spatial Ability in Augmented Reality-assisted Robot Programming
Augmented Reality (AR) offers promising opportunities to enhance learning, but its mechanisms and effects are not yet fully understood. As learning becomes increasingly personalized, considering individual learner characteristics becomes more important. This study investigates the moderating effect of spatial ability on learning experience with AR in the context of robot programming. A between-subjects experiment ($N=71$) compared conventional robot programming to an AR-assisted approach using a head-mounted display. Participants' spatial ability was assessed using the Mental Rotation Test. The learning experience was measured through the System Usability Scale (SUS) and cognitive load. The results indicate that AR support does not significantly improve the learning experience compared to the conventional approach. However, AR appears to have a compensatory effect on the influence of spatial ability. In the control group, spatial ability was significantly positively associated with SUS scores and negatively associated with extraneous cognitive load, indicating that higher spatial ability predicts a better learning experience. In the AR condition, these relationships were not observable, suggesting that AR mitigated the disadvantage typically experienced by learners with lower spatial abilities. These findings suggest that AR can serve a compensatory function by reducing the influence of learner characteristics. Future research should further explore this compensatory role of AR to guide the design of personalized learning environments that address diverse learner needs and reduce barriers for learners with varying cognitive profiles.
CMR: Contractive Mapping Embeddings for Robust Humanoid Locomotion on Unstructured Terrains
Robust disturbance rejection remains a longstanding challenge in humanoid locomotion, particularly on unstructured terrains where sensing is unreliable and model mismatch is pronounced. While perception information, such as height map, enhances terrain awareness, sensor noise and sim-to-real gaps can destabilize policies in practice. In this work, we provide theoretical analysis that bounds the return gap under observation noise, when the induced latent dynamics are contractive. Furthermore, we present Contractive Mapping for Robustness (CMR) framework that maps high-dimensional, disturbance-prone observations into a latent space, where local perturbations are attenuated over time. Specifically, this approach couples contrastive representation learning with Lipschitz regularization to preserve task-relevant geometry while explicitly controlling sensitivity. Notably, the formulation can be incorporated into modern deep reinforcement learning pipelines as an auxiliary loss term with minimal additional technical effort required. Further, our extensive humanoid experiments show that CMR potently outperforms other locomotion algorithms under increased noise.
HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous Traffic ICRA
We present HetroD, a dataset and benchmark for developing autonomous driving systems in heterogeneous environments. HetroD targets the critical challenge of navi- gating real-world heterogeneous traffic dominated by vulner- able road users (VRUs), including pedestrians, cyclists, and motorcyclists that interact with vehicles. These mixed agent types exhibit complex behaviors such as hook turns, lane splitting, and informal right-of-way negotiation. Such behaviors pose significant challenges for autonomous vehicles but remain underrepresented in existing datasets focused on structured, lane-disciplined traffic. To bridge the gap, we collect a large- scale drone-based dataset to provide a holistic observation of traffic scenes with centimeter-accurate annotations, HD maps, and traffic signal states. We further develop a modular toolkit for extracting per-agent scenarios to support downstream task development. In total, the dataset comprises over 65.4k high- fidelity agent trajectories, 70% of which are from VRUs. HetroD supports modeling of VRU behaviors in dense, het- erogeneous traffic and provides standardized benchmarks for forecasting, planning, and simulation tasks. Evaluation results reveal that state-of-the-art prediction and planning models struggle with the challenges presented by our dataset: they fail to predict lateral VRU movements, cannot handle unstructured maneuvers, and exhibit limited performance in dense and multi-agent scenarios, highlighting the need for more robust approaches to heterogeneous traffic. See our project page for more examples: https://hetroddata.github.io/HetroD/
comment: IEEE International Conference on Robotics and Automation (ICRA) 2026
CRL-VLA: Continual Vision-Language-Action Learning
Lifelong learning is critical for embodied agents in open-world environments, where reinforcement learning fine-tuning has emerged as an important paradigm to enable Vision-Language-Action (VLA) models to master dexterous manipulation through environmental interaction. Thus, Continual Reinforcement Learning (CRL) is a promising pathway for deploying VLA models in lifelong robotic scenarios, yet balancing stability (retaining old skills) and plasticity (learning new ones) remains a formidable challenge for existing methods. We introduce CRL-VLA, a framework for continual post-training of VLA models with rigorous theoretical bounds. We derive a unified performance bound linking the stability-plasticity trade-off to goal-conditioned advantage magnitude, scaled by policy divergence. CRL-VLA resolves this dilemma via asymmetric regulation: constraining advantage magnitudes on prior tasks while enabling controlled growth on new tasks. This is realized through a simple but effective dual-critic architecture with novel Goal-Conditioned Value Formulation (GCVF), where a frozen critic anchors semantic consistency and a trainable estimator drives adaptation. Experiments on the LIBERO benchmark demonstrate that CRL-VLA effectively harmonizes these conflicting objectives, outperforming baselines in both anti-forgetting and forward adaptation.
Model-based Optimal Control for Rigid-Soft Underactuated Systems
Continuum soft robots are inherently underactuated and subject to intrinsic input constraints, making dynamic control particularly challenging, especially in hybrid rigid-soft robots. While most existing methods focus on quasi-static behaviors, dynamic tasks such as swing-up require accurate exploitation of continuum dynamics. This has led to studies on simple low-order template systems that often fail to capture the complexity of real continuum deformations. Model-based optimal control offers a systematic solution; however, its application to rigid-soft robots is often limited by the computational cost and inaccuracy of numerical differentiation for high-dimensional models. Building on recent advances in the Geometric Variable Strain model that enable analytical derivatives, this work investigates three optimal control strategies for underactuated soft systems-Direct Collocation, Differential Dynamic Programming, and Nonlinear Model Predictive Control-to perform dynamic swing-up tasks. To address stiff continuum dynamics and constrained actuation, implicit integration schemes and warm-start strategies are employed to improve numerical robustness and computational efficiency. The methods are evaluated in simulation on three Rigid-Soft and high-order soft benchmark systems-the Soft Cart-Pole, the Soft Pendubot, and the Soft Furuta Pendulum- highlighting their performance and computational trade-offs.
ProAct: A Benchmark and Multimodal Framework for Structure-Aware Proactive Response
While passive agents merely follow instructions, proactive agents align with higher-level objectives, such as assistance and safety by continuously monitoring the environment to determine when and how to act. However, developing proactive agents is hindered by the lack of specialized resources. To address this, we introduce ProAct-75, a benchmark designed to train and evaluate proactive agents across diverse domains, including assistance, maintenance, and safety monitoring. Spanning 75 tasks, our dataset features 91,581 step-level annotations enriched with explicit task graphs. These graphs encode step dependencies and parallel execution possibilities, providing the structural grounding necessary for complex decision-making. Building on this benchmark, we propose ProAct-Helper, a reference baseline powered by a Multimodal Large Language Model (MLLM) that grounds decision-making in state detection, and leveraging task graphs to enable entropy-driven heuristic search for action selection, allowing agents to execute parallel threads independently rather than mirroring the human's next step. Extensive experiments demonstrate that ProAct-Helper outperforms strong closed-source models, improving trigger detection mF1 by 6.21%, saving 0.25 more steps in online one-step decision, and increasing the rate of parallel actions by 15.58%.
Learning-based Initialization of Trajectory Optimization for Path-following Problems of Redundant Manipulators ICRA 2023
Trajectory optimization (TO) is an efficient tool to generate a redundant manipulator's joint trajectory following a 6-dimensional Cartesian path. The optimization performance largely depends on the quality of initial trajectories. However, the selection of a high-quality initial trajectory is non-trivial and requires a considerable time budget due to the extremely large space of the solution trajectories and the lack of prior knowledge about task constraints in configuration space. To alleviate the issue, we present a learning-based initial trajectory generation method that generates high-quality initial trajectories in a short time budget by adopting example-guided reinforcement learning. In addition, we suggest a null-space projected imitation reward to consider null-space constraints by efficiently learning kinematically feasible motion captured in expert demonstrations. Our statistical evaluation in simulation shows the improved optimality, efficiency, and applicability of TO when we plug in our method's output, compared with three other baselines. We also show the performance improvement and feasibility via real-world experiments with a seven-degree-of-freedom manipulator.
comment: Accepted to ICRA 2023. Project Page
Deep-Learning-Based Control of a Decoupled Two-Segment Continuum Robot for Endoscopic Submucosal Dissection
Manual endoscopic submucosal dissection (ESD) is technically demanding, and existing single-segment robotic tools offer limited dexterity. These limitations motivate the development of more advanced solutions. To address this, DESectBot, a novel dual segment continuum robot with a decoupled structure and integrated surgical forceps, enabling 6 degrees of freedom (DoFs) tip dexterity for improved lesion targeting in ESD, was developed in this work. Deep learning controllers based on gated recurrent units (GRUs) for simultaneous tip position and orientation control, effectively handling the nonlinear coupling between continuum segments, were proposed. The GRU controller was benchmarked against Jacobian based inverse kinematics, model predictive control (MPC), a feedforward neural network (FNN), and a long short-term memory (LSTM) network. In nested-rectangle and Lissajous trajectory tracking tasks, the GRU achieved the lowest position/orientation RMSEs: 1.11 mm/ 4.62° and 0.81 mm/ 2.59°, respectively. For orientation control at a fixed position (four target poses), the GRU attained a mean RMSE of 0.14 mm and 0.72°, outperforming all alternatives. In a peg transfer task, the GRU achieved a 100% success rate (120 success/120 attempts) with an average transfer time of 11.8s, the STD significantly outperforms novice-controlled systems. Additionally, an ex vivo ESD demonstration grasping, elevating, and resecting tissue as the scalpel completed the cut confirmed that DESectBot provides sufficient stiffness to divide thick gastric mucosa and an operative workspace adequate for large lesions.These results confirm that GRU-based control significantly enhances precision, reliability, and usability in ESD surgical training scenarios.
Enhancing Navigation Efficiency of Quadruped Robots via Leveraging Personal Transportation Platforms ICRA 2025
Quadruped robots face limitations in long-range navigation efficiency due to their reliance on legs. To ameliorate the limitations, we introduce a Reinforcement Learning-based Active Transporter Riding method (\textit{RL-ATR}), inspired by humans' utilization of personal transporters, including Segways. The \textit{RL-ATR} features a transporter riding policy and two state estimators. The policy devises adequate maneuvering strategies according to transporter-specific control dynamics, while the estimators resolve sensor ambiguities in non-inertial frames by inferring unobservable robot and transporter states. Comprehensive evaluations in simulation validate proficient command tracking abilities across various transporter-robot models and reduced energy consumption compared to legged locomotion. Moreover, we conduct ablation studies to quantify individual component contributions within the \textit{RL-ATR}. This riding ability could broaden the locomotion modalities of quadruped robots, potentially expanding the operational range and efficiency.
comment: Accepted to ICRA 2025. Project Page
PlanTRansformer: Unified Prediction and Planning with Goal-conditioned Transformer
Trajectory prediction and planning are fundamental yet disconnected components in autonomous driving. Prediction models forecast surrounding agent motion under unknown intentions, producing multimodal distributions, while planning assumes known ego objectives and generates deterministic trajectories. This mismatch creates a critical bottleneck: prediction lacks supervision for agent intentions, while planning requires this information. Existing prediction models, despite strong benchmarking performance, often remain disconnected from planning constraints such as collision avoidance and dynamic feasibility. We introduce Plan TRansformer (PTR), a unified Gaussian Mixture Transformer framework integrating goal-conditioned prediction, dynamic feasibility, interaction awareness, and lane-level topology reasoning. A teacher-student training strategy progressively masks surrounding agent commands during training to align with inference conditions where agent intentions are unavailable. PTR achieves 4.3%/3.5% improvement in marginal/joint mAP compared to the baseline Motion Transformer (MTR) and 15.5% planning error reduction at 5s horizon compared to GameFormer. The architecture-agnostic design enables application to diverse Transformer-based prediction models. Project Website: https://github.com/SelzerConst/PlanTRansformer
comment: Submitted and accepted at IEEE IV 2026
Learning-based Adaptive Control of Quadruped Robots for Active Stabilization on Moving Platforms IROS 2024
A quadruped robot faces balancing challenges on a six-degrees-of-freedom moving platform, like subways, buses, airplanes, and yachts, due to independent platform motions and resultant diverse inertia forces on the robot. To alleviate these challenges, we present the Learning-based Active Stabilization on Moving Platforms (\textit{LAS-MP}), featuring a self-balancing policy and system state estimators. The policy adaptively adjusts the robot's posture in response to the platform's motion. The estimators infer robot and platform states based on proprioceptive sensor data. For a systematic training scheme across various platform motions, we introduce platform trajectory generation and scheduling methods. Our evaluation demonstrates superior balancing performance across multiple metrics compared to three baselines. Furthermore, we conduct a detailed analysis of the \textit{LAS-MP}, including ablation studies and evaluation of the estimators, to validate the effectiveness of each component.
comment: Accepted to IROS 2024. Project Page
Manipulation via Force Distribution at Contact
Efficient and robust trajectories play a crucial role in contact-rich manipulation, which demands accurate mod- eling of object-robot interactions. Many existing approaches rely on point contact models due to their computational effi- ciency. Simple contact models are computationally efficient but inherently limited for achieving human-like, contact-rich ma- nipulation, as they fail to capture key frictional dynamics and torque generation observed in human manipulation. This study introduces a Force-Distributed Line Contact (FDLC) model in contact-rich manipulation and compares it against conventional point contact models. A bi-level optimization framework is constructed, in which the lower-level solves an optimization problem for contact force computation, and the upper-level optimization applies iLQR for trajectory optimization. Through this framework, the limitations of point contact are demon- strated, and the benefits of the FDLC in generating efficient and robust trajectories are established. The effectiveness of the proposed approach is validated by a box rotating task, demonstrating that FDLC enables trajectories generated via non-uniform force distributions along the contact line, while requiring lower control effort and less motion of the robot.
RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization
Vision-Language-Action (VLA) models hold promise for generalist robotics but currently struggle with data scarcity, architectural inefficiencies, and the inability to generalize across different hardware platforms. We introduce RDT2, a robotic foundation model built upon a 7B parameter VLM designed to enable zero-shot deployment on novel embodiments for open-vocabulary tasks. To achieve this, we collected one of the largest open-source robotic datasets--over 10,000 hours of demonstrations in diverse families--using an enhanced, embodiment-agnostic Universal Manipulation Interface (UMI). Our approach employs a novel three-stage training recipe that aligns discrete linguistic knowledge with continuous control via Residual Vector Quantization (RVQ), flow-matching, and distillation for real-time inference. Consequently, RDT2 becomes one of the first models that simultaneously zero-shot generalizes to unseen objects, scenes, instructions, and even robotic platforms. Besides, it outperforms state-of-the-art baselines in dexterous, long-horizon, and dynamic downstream tasks like playing table tennis. See https://rdt-robotics.github.io/rdt2/ for more information.
LEVIO: Lightweight Embedded Visual Inertial Odometry for Resource-Constrained Devices
Accurate, infrastructure-less sensor systems for motion tracking are essential for mobile robotics and augmented reality (AR) applications. The most popular state-of-the-art visual-inertial odometry (VIO) systems, however, are too computationally demanding for resource-constrained hardware, such as micro-drones and smart glasses. This work presents LEVIO, a fully featured VIO pipeline optimized for ultra-low-power compute platforms, allowing six-degrees-of-freedom (DoF) real-time sensing. LEVIO incorporates established VIO components such as Oriented FAST and Rotated BRIEF (ORB) feature tracking and bundle adjustment, while emphasizing a computationally efficient architecture with parallelization and low memory usage to suit embedded microcontrollers and low-power systems-on-chip (SoCs). The paper proposes and details the algorithmic design choices and the hardware-software co-optimization approach, and presents real-time performance on resource-constrained hardware. LEVIO is validated on a parallel-processing ultra-low-power RISC-V SoC, achieving 20 FPS while consuming less than 100 mW, and benchmarked against public VIO datasets, offering a compelling balance between efficiency and accuracy. To facilitate reproducibility and adoption, the complete implementation is released as open-source.
comment: This article has been accepted for publication in the IEEE Sensors Journal (JSEN)
Collision Detection with Analytical Derivatives of Contact Kinematics
Differentiable contact kinematics are essential for gradient-based methods in robotics, yet the mapping from robot state to contact distance, location, and normal becomes non-smooth in degenerate configurations of shapes with zero or undefined curvature. We address this inherent limitation by selectively regularizing such geometries into strictly convex implicit representations, restoring uniqueness and smoothness of the contact map. Leveraging this geometric regularization, we develop iDCOL, an implicit differentiable collision detection and contact kinematics framework. iDCOL represents colliding bodies using strictly convex implicit surfaces and computes collision detection and contact kinematics by solving a fixed-size nonlinear system derived from a geometric scaling-based convex optimization formulation. By applying the Implicit Function Theorem to the resulting system residual, we derive analytical derivatives of the contact kinematic quantities. We develop a fast Newton-based solver for iDCOL and provide an open-source C++ implementation of the framework. The robustness of the approach is evaluated through extensive collision simulations and benchmarking, and applicability is demonstrated in gradient-based kinematic path planning and differentiable contact physics, including multi-body rigid collisions and a soft-robot interaction example.
comment: 12 pages, 9 figures, 2 tables
A thin and soft optical tactile sensor for highly sensitive object perception
Tactile sensing is crucial in robotics and wearable devices for safe perception and interaction with the environment. Optical tactile sensors have emerged as promising solutions, as they are immune to electromagnetic interference and have high spatial resolution. However, existing optical approaches, particularly vision-based tactile sensors, rely on complex optical assemblies that involve lenses and cameras, resulting in bulky, rigid, and alignment-sensitive designs. In this study, we present a thin, compact, and soft optical tactile sensor featuring an alignment-free configuration. The soft optical sensor operates by capturing deformation-induced changes in speckle patterns generated within a soft silicone material, thereby enabling precise force measurements and texture recognition via machine learning. The experimental results show a root-mean-square error of 40 mN in the force measurement and a classification accuracy of 93.33% over nine classes of textured surfaces, including Mahjong tiles. The proposed speckle-based approach provides a compact, easily fabricated, and mechanically compliant platform that bridges optical sensing with flexible shape-adaptive architectures, thereby demonstrating its potential as a novel tactile-sensing paradigm for soft robotics and wearable haptic interfaces.
Omnidirectional Solid-State mmWave Radar Perception for UAV Power Line Collision Avoidance ICRA
Detecting and estimating distances to power lines is a challenge for both human UAV pilots and autonomous systems, which increases the risk of unintended collisions. We present a mmWave radar-based perception system that provides spherical sensing coverage around a small UAV for robust power line detection and avoidance. The system integrates multiple compact solid-state mmWave radar modules to synthesize an omnidirectional field of view while remaining lightweight. We characterize the sensing behavior of this omnidirectional radar arrangement in power line environments and develop a robust detection-and-avoidance algorithm tailored to that behavior. Field experiments on real power lines demonstrate reliable detection at ranges up to 10 m, successful avoidance maneuvers at flight speeds upwards of 10 m/s, and detection of wires as thin as 1.2 mm in diameter. These results indicate the approach's suitability as an additional safety layer for both autonomous and manual UAV flight.
comment: Accepted for publication at the 2026 IEEE International Conference on Robotics and Automation (ICRA). Video at https://www.youtube.com/watch?v=rJW3eEC-5Ao (youtube)
Depth Completion in Unseen Field Robotics Environments Using Extremely Sparse Depth Measurements ICRA 2026
Autonomous field robots operating in unstructured environments require robust perception to ensure safe and reliable operations. Recent advances in monocular depth estimation have demonstrated the potential of low-cost cameras as depth sensors; however, their adoption in field robotics remains limited due to the absence of reliable scale cues, ambiguous or low-texture conditions, and the scarcity of large-scale datasets. To address these challenges, we propose a depth completion model that trains on synthetic data and uses extremely sparse measurements from depth sensors to predict dense metric depth in unseen field robotics environments. A synthetic dataset generation pipeline tailored to field robotics enables the creation of multiple realistic datasets for training purposes. This dataset generation approach utilizes textured 3D meshes from Structure from Motion and photorealistic rendering with novel viewpoint synthesis to simulate diverse field robotics scenarios. Our approach achieves an end-to-end latency of 53 ms per frame on a Nvidia Jetson AGX Orin, enabling real-time deployment on embedded platforms. Extensive evaluation demonstrates competitive performance across diverse real-world field robotics scenarios.
comment: Accepted to ICRA 2026
HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control
While current humanoid whole-body control frameworks predominantly rely on the static environment assumptions, addressing tasks characterized by high dynamism and complex interactions presents a formidable challenge. In this paper, we address humanoid skateboarding, a highly challenging task requiring stable dynamic maneuvering on an underactuated wheeled platform. This integrated system is governed by non-holonomic constraints and tightly coupled human-object interactions. Successfully executing this task requires simultaneous mastery of hybrid contact dynamics and robust balance control on a mechanically coupled, dynamically unstable skateboard. To overcome the aforementioned challenges, we propose HUSKY, a learning-based framework that integrates humanoid-skateboard system modeling and physics-aware whole-body control. We first model the coupling relationship between board tilt and truck steering angles, enabling a principled analysis of system dynamics. Building upon this, HUSKY leverages Adversarial Motion Priors (AMP) to learn human-like pushing motions and employs a physics-guided, heading-oriented strategy for lean-to-steer behaviors. Moreover, a trajectory-guided mechanism ensures smooth and stable transitions between pushing and steering. Experimental results on the Unitree G1 humanoid platform demonstrate that our framework enables stable and agile maneuvering on skateboards in real-world scenarios. The project page is available on https://husky-humanoid.github.io/.
Hierarchical Proportion Models for Motion Generation via Integration of Motion Primitives
Imitation learning (IL) enables robots to acquire human-like motion skills from demonstrations, but it still requires extensive high-quality data and retraining to handle complex or long-horizon tasks. To improve data efficiency and adaptability, this study proposes a hierarchical IL framework that integrates motion primitives with proportion-based motion synthesis. The proposed method employs a two-layer architecture, where the upper layer performs long-term planning, while a set of lower-layer models learn individual motion primitives, which are combined according to specific proportions. Three model variants are introduced to explore different trade-offs between learning flexibility, computational cost, and adaptability: a learning-based proportion model, a sampling-based proportion model, and a playback-based proportion model, which differ in how the proportions are determined and whether the upper layer is trainable. Through real-robot pick-and-place experiments, the proposed models successfully generated complex motions not included in the primitive set. The sampling-based and playback-based proportion models achieved more stable and adaptable motion generation than the standard hierarchical model, demonstrating the effectiveness of proportion-based motion integration for practical robot learning.
comment: 6 pages, 9 figures. Accepted for publication in IEEE AMC 2026
Estimation of Ground Reaction Forces from Kinematic Data during Locomotion
Ground reaction forces (GRFs) provide fundamental insight into human gait mechanics and are widely used to assess joint loading, limb symmetry, balance control, and motor function. Despite their clinical relevance, the use of GRF remains underutilised in clinical workflows due to the practical limitations of force plate systems. In this work, we present a force-plate-free approach for estimating GRFs using only marker-based motion capture data. This kinematics only method to estimate and decompose GRF makes it well suited for widespread clinical depolyment. By using kinematics from sixteen body segments, we estimate the centre of mass (CoM) and compute GRFs, which are subsequently decomposed into individual components through a minimization-based approach. Through this framework, we can identify gait stance phases and provide access to clinically meaningful kinetic measures without a dedicated force plate system. Experimental results demonstrate the viability of CoM and GRF estimation based solely on kinematic data, supporting force-plate-free gait analysis.
When Attention Betrays: Erasing Backdoor Attacks in Robotic Policies by Reconstructing Visual Tokens ICRA2026
Downstream fine-tuning of vision-language-action (VLA) models enhances robotics, yet exposes the pipeline to backdoor risks. Attackers can pretrain VLAs on poisoned data to implant backdoors that remain stealthy but can trigger harmful behavior during inference. However, existing defenses either lack mechanistic insight into multimodal backdoors or impose prohibitive computational costs via full-model retraining. To this end, we uncover a deep-layer attention grabbing mechanism: backdoors redirect late-stage attention and form compact embedding clusters near the clean manifold. Leveraging this insight, we introduce Bera, a test-time backdoor erasure framework that detects tokens with anomalous attention via latent-space localization, masks suspicious regions using deep-layer cues, and reconstructs a trigger-free image to break the trigger-unsafe-action mapping while restoring correct behavior. Unlike prior defenses, Bera requires neither retraining of VLAs nor any changes to the training pipeline. Extensive experiments across multiple embodied platforms and tasks show that Bera effectively maintains nominal performance, significantly reduces attack success rates, and consistently restores benign behavior from backdoored outputs, thereby offering a robust and practical defense mechanism for securing robotic systems.
comment: ICRA2026 accepted
Multi-function Robotized Surgical Dissector for Endoscopic Pulmonary Thromboendarterectomy: Preclinical Study and Evaluation
Patients suffering chronic severe pulmonary thromboembolism need Pulmonary Thromboendarterectomy (PTE) to remove the thromb and intima located inside pulmonary artery (PA). During the surgery, a surgeon holds tweezers and a dissector to delicately strip the blockage, but available tools for this surgery are rigid and straight, lacking distal dexterity to access into thin branches of PA. Therefore, this work presents a novel robotized dissector based on concentric push/pull robot (CPPR) structure, enabling entering deep thin branch of tortuous PA. Compared with conventional rigid dissectors, our design characterizes slenderness and dual-segment-bending dexterity. Owing to the hollow and thin-walled structure of the CPPR-based dissector as it has a slender body of 3.5mm in diameter, the central lumen accommodates two channels for irrigation and tip tool, and space for endoscopic camera's signal wire. To provide accurate surgical manipulation, optimization-based kinematics model was established, realizing a 2mm accuracy in positioning the tip tool (60mm length) under open-loop control strategy. As such, with the endoscopic camera, traditional PTE is possible to be upgraded as endoscopic PTE. Basic physic performance of the robotized dissector including stiffness, motion accuracy and maneuverability was evaluated through experiments. Surgery simulation on ex vivo porcine lung also demonstrates its dexterity and notable advantages in PTE.
A Unified Candidate Set with Scene-Adaptive Refinement via Diffusion for End-to-End Autonomous Driving WWW
End-to-end autonomous driving is increasingly adopting a multimodal planning paradigm that generates multiple trajectory candidates and selects the final plan, making candidate-set design critical. A fixed trajectory vocabulary provides stable coverage in routine driving but often misses optimal solutions in complex interactions, while scene-adaptive refinement can cause over-correction in simple scenarios by unnecessarily perturbing already strong vocabulary trajectories.We propose CdDrive, which preserves the original vocabulary candidates and augments them with scene-adaptive candidates generated by vocabulary-conditioned diffusion denoising. Both candidate types are jointly scored by a shared selection module, enabling reliable performance across routine and highly interactive scenarios. We further introduce HATNA (Horizon-Aware Trajectory Noise Adapter) to improve the smoothness and geometric continuity of diffusion candidates via temporal smoothing and horizon-aware noise modulation. Experiments on NAVSIM v1 and NAVSIM v2 demonstrate leading performance, and ablations verify the contribution of each component.
comment: Code:https://github.com/WWW-TJ/CdDrive
Training and Simulation of Quadrupedal Robot in Adaptive Stair Climbing for Indoor Firefighting: An End-to-End Reinforcement Learning Approach
Quadruped robots are used for primary searches during the early stages of indoor fires. A typical primary search involves quickly and thoroughly looking for victims under hazardous conditions and monitoring flammable materials. However, situational awareness in complex indoor environments and rapid stair climbing across different staircases remain the main challenges for robot-assisted primary searches. In this project, we designed a two-stage end-to-end deep reinforcement learning (RL) approach to optimize both navigation and locomotion. In the first stage, the quadrupeds, Unitree Go2, were trained to climb stairs in Isaac Lab's pyramid-stair terrain. In the second stage, the quadrupeds were trained to climb various realistic indoor staircases in the Isaac Lab engine, with the learned policy transferred from the previous stage. These indoor staircases are straight, L-shaped, and spiral, to support climbing tasks in complex environments. This project explores how to balance navigation and locomotion and how end-to-end RL methods can enable quadrupeds to adapt to different stair shapes. Our main contributions are: (1) A two-stage end-to-end RL framework that transfers stair-climbing skills from abstract pyramid terrain to realistic indoor stair topologies. (2) A centerline-based navigation formulation that enables unified learning of navigation and locomotion without hierarchical planning. (3) Demonstration of policy generalization across diverse staircases using only local height-map perception. (4) An empirical analysis of success, efficiency, and failure modes under increasing stair difficulty.
comment: 8 pages, 9 figures, 43rd International Symposium on Automation and Robotics in Construction
Towards Considerate Embodied AI: Co-Designing Situated Multi-Site Healthcare Robots from Abstract Concepts to High-Fidelity Prototypes
Co-design is essential for grounding embodied artificial intelligence (AI) systems in real-world contexts, especially high-stakes domains such as healthcare. While prior work has explored multidisciplinary collaboration, iterative prototyping, and support for non-technical participants, few have interwoven these into a sustained co-design process. Such efforts often target one context and low-fidelity stages, limiting the generalizability of findings and obscuring how participants' ideas evolve. To address these limitations, we conducted a 14-week workshop with a multidisciplinary team of 22 participants, centered around how embodied AI can reduce non-value-added task burdens in three healthcare settings: emergency departments, long-term rehabilitation facilities, and sleep disorder clinics. We found that the iterative progression from abstract brainstorming to high-fidelity prototypes, supported by educational scaffolds, enabled participants to understand real-world trade-offs and generate more deployable solutions. We propose eight guidelines for co-designing more considerate embodied AI: attuned to context, responsive to social dynamics, mindful of expectations, and grounded in deployment. Project Page: https://byc-sophie.github.io/Towards-Considerate-Embodied-AI/
comment: To appear in Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI 2026)
RPL: Learning Robust Humanoid Perceptive Locomotion on Challenging Terrains
Humanoid perceptive locomotion has made significant progress and shows great promise, yet achieving robust multi-directional locomotion on complex terrains remains underexplored. To tackle this challenge, we propose RPL, a two-stage training framework that enables multi-directional locomotion on challenging terrains, and remains robust with payloads. RPL first trains terrain-specific expert policies with privileged height map observations to master decoupled locomotion and manipulation skills across different terrains, and then distills them into a transformer policy that leverages multiple depth cameras to cover a wide range of views. During distillation, we introduce two techniques to robustify multi-directional locomotion, depth feature scaling based on velocity commands and random side masking, which are critical for asymmetric depth observations and unseen widths of terrains. For scalable depth distillation, we develop an efficient multi-depth system that ray-casts against both dynamic robot meshes and static terrain meshes in massively parallel environments, achieving a 5-times speedup over the depth rendering pipelines in existing simulators while modeling realistic sensor latency, noise, and dropout. Extensive real-world experiments demonstrate robust multi-directional locomotion with payloads (2kg) across challenging terrains, including 20° slopes, staircases with different step lengths (22 cm, 25 cm, 30 cm), and 25 cm by 25 cm stepping stones separated by 60 cm gaps.
Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control
Humanoid Whole-Body Controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet many target a single robot embodiment. Variations in dynamics, degrees of freedom (DoFs), and kinematic topology still hinder a single policy from commanding diverse humanoids. Moreover, obtaining a generalist policy that not only transfers across embodiments but also supports richer behaviors-beyond simple walking to squatting, leaning-remains especially challenging. In this work, we tackle these obstacles by introducing EAGLE, an iterative generalist-specialist distillation framework that produces a single unified policy that controls multiple heterogeneous humanoids without per-robot reward tuning. During each cycle, embodiment-specific specialists are forked from the current generalist, refined on their respective robots, and new skills are distilled back into the generalist by training on the pooled embodiment set. Repeating this loop until performance convergence produces a robust Whole-Body Controller validated on robots such as Unitree H1, G1, and Fourier N1. We conducted experiments on five different robots in simulation and four in real-world settings. Through quantitative evaluations, EAGLE achieves high tracking accuracy and robustness compared to other methods, marking a step toward scalable, fleet-level humanoid control. See more details at https://eagle-wbc.github.io/
Comparative Analysis of Autonomous Robotic and Manual Techniques for Ultrasonic Sacral Osteotomy: A Preliminary Study
In this paper, we introduce an autonomous Ultrasonic Sacral Osteotomy (USO) robotic system that integrates an ultrasonic osteotome with a seven-degree-of-freedom (DoF) robotic manipulator guided by an optical tracking system. To assess multi-directional control along both the surface trajectory and cutting depth of this system, we conducted quantitative comparisons between manual USO (MUSO) and robotic USO (RUSO) in Sawbones phantoms under identical osteotomy conditions. The RUSO system achieved sub-millimeter trajectory accuracy (0.11 mm RMSE), an order of magnitude improvement over MUSO (1.10 mm RMSE). Moreover, MUSO trials showed substantial over-penetration (16.0 mm achieved vs. 8.0 mm target), whereas the RUSO system maintained precise depth control (8.1 mm). These results demonstrate that robotic procedures can effectively overcome the critical limitations of manual osteotomy, establishing a foundation for safer and more precise sacral resections.
comment: 17 pages, 6 figures, Accepted or publication in 2026 International Symposium on Medical Robotics (ISMR)
Control and State Estimation of Vehicle-Mounted Aerial Systems in GPS-Denied, Non-Inertial Environments
We present a robust control and estimation framework for quadrotors operating in Global Navigation Satellite System(GNSS)-denied, non-inertial environments where inertial sensors such as Inertial Measurement Units (IMUs) become unreliable due to platform-induced accelerations. In such settings, conventional estimators fail to distinguish whether the measured accelerations arise from the quadrotor itself or from the non-inertial platform, leading to drift and control degradation. Unlike conventional approaches that depend heavily on IMU and GNSS, our method relies exclusively on external position measurements combined with a Extended Kalman Filter with Unknown Inputs (EKF-UI) to account for platform motion. The estimator is paired with a cascaded PID controller for full 3D tracking. To isolate estimator performance from localization errors, all tests are conducted using high-precision motion capture systems. Experimental results in a moving-cart testbed validate our approach under both translational in X-axis and Y-axis dissonance. Compared to standard EKF, the proposed method significantly improves stability and trajectory tracking without requiring inertial feedback, enabling practical deployment on moving platforms such as trucks or elevators.
comment: 10 pages 8 figures
Modular Safety Guardrails Are Necessary for Foundation-Model-Enabled Robots in the Real World
The integration of foundation models (FMs) into robotics has accelerated real-world deployment, while introducing new safety challenges arising from open-ended semantic reasoning and embodied physical action. These challenges require safety notions beyond physical constraint satisfaction. In this paper, we characterize FM-enabled robot safety along three dimensions: action safety (physical feasibility and constraint compliance), decision safety (semantic and contextual appropriateness), and human-centered safety (conformance to human intent, norms, and expectations). We argue that existing approaches, including static verification, monolithic controllers, and end-to-end learned policies, are insufficient in settings where tasks, environments, and human expectations are open-ended, long-tailed, and subject to adaptation over time. To address this gap, we propose modular safety guardrails, consisting of monitoring (evaluation) and intervention layers, as an architectural foundation for comprehensive safety across the autonomy stack. Beyond modularity, we highlight possible cross-layer co-design opportunities through representation alignment and conservatism allocation to enable faster, less conservative, and more effective safety enforcement. We call on the community to explore richer guardrail modules and principled co-design strategies to advance safe real-world physical AI deployment.
An Anatomy-specific Guidewire Shaping Robot for Improved Vascular Navigation
Neuroendovascular access often relies on passive microwires that are hand-shaped at the back table and then used to track a microcatheter to the target. Neuroendovascular surgeons determine the shape of the wire by examining the patient pre-operative images and using their experience to identify anatomy specific shapes of the wire that would facilitate reaching the target. This procedure is particularly complex in convoluted anatomical structures and is heavily dependent on the level of expertise of the surgeon. Towards enabling standardized autonomous shaping, we present a bench-top guidewire shaping robot capable of producing navigation-specific desired wire configurations. We present a model that can map the desired wire shape into robot actions, calibrated using experimental data. We show that the robot can produce clinically common tip geometries (C, S, Angled, Hook) and validate them with respect to the model-predicted shapes in 2D. Our model predicts the shape with a Root Mean Square (RMS) error of 0.56mm across all shapes when compared to the experimental results. We also demonstrate 3D tip shaping capabilities and the ability to traverse complex endoluminal navigation from the petrous Internal Carotid Artery (ICA) to the Posterior Communicating Artery (PComm).
comment: 7 pages, 7 figures, ISMR2026
DADP: Domain Adaptive Diffusion Policy
Learning domain adaptive policies that can generalize to unseen transition dynamics, remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning to capture domain-specific information, thus enabling domain-aware decision making. We analyze the process of learning domain representations through dynamical prediction and find that selecting contexts adjacent to the current step causes the learned representations to entangle static domain information with varying dynamical properties. Such mixture can confuse the conditioned policy, thereby constraining zero-shot adaptation. To tackle the challenge, we propose DADP (Domain Adaptive Diffusion Policy), which achieves robust adaptation through unsupervised disentanglement and domain-aware diffusion injection. First, we introduce Lagged Context Dynamical Prediction, a strategy that conditions future state estimation on a historical offset context; by increasing this temporal gap, we unsupervisedly disentangle static domain representations by filtering out transient properties. Second, we integrate the learned domain representations directly into the generative process by biasing the prior distribution and reformulating the diffusion target. Extensive experiments on challenging benchmarks across locomotion and manipulation demonstrate the superior performance, and the generalizability of DADP over prior methods. More visualization results are available on the https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/.
FDA Flocking: Future Direction-Aware Flocking via Velocity Prediction
Understanding self-organization in natural collectives such as bird flocks inspires swarm robotics, yet most flocking models remain reactive, overlooking anticipatory cues that enhance coordination. Motivated by avian postural and wingbeat signals, as well as multirotor attitude tilts that precede directional changes, this work introduces a principled, bio-inspired anticipatory augmentation of reactive flocking termed Future Direction-Aware (FDA) flocking. In the proposed framework, agents blend reactive alignment with a predictive term based on short-term estimates of neighbors' future velocities, regulated by a tunable blending parameter that interpolates between reactive and anticipatory behaviors. This predictive structure enhances velocity consensus and cohesion-separation balance while mitigating the adverse effects of sensing and communication delays and measurement noise that destabilize reactive baselines. Simulation results demonstrate that FDA achieves faster and higher alignment, enhanced translational displacement of the flock, and improved robustness to delays and noise compared to a purely reactive model. Future work will investigate adaptive blending strategies, weighted prediction schemes, and experimental validation on multirotor drone swarms.
Towards X-embodiment safety: A control theory perspective on transferring safety certificates across dynamical systems
Control barrier functions (CBFs) provide a powerful tool for enforcing safety constraints in control systems, but their direct application to complex, high-dimensional dynamics is often challenging. In many settings, safety certificates are more naturally designed for simplified or alternative system models that do not exactly match the dynamics of interest. This paper addresses the problem of transferring safety guarantees between dynamical systems with mismatched dynamics. We propose a transferred control barrier function (tCBF) framework that enables safety constraints defined on one system to be systematically enforced on another system using a simulation function and an explicit margin term. The resulting transferred barrier accounts for model mismatch and induces a safety condition that can be enforced on the target system via a quadratic-program-based safety filter. The proposed approach is general and does not require the two systems to share the same state dimension or dynamics. We demonstrate the effectiveness of the framework on a quadrotor navigation task with the transferred barrier ensuring collision avoidance for the target system, while remaining minimally invasive to a nominal controller. These results highlight the potential of transferred control barrier functions as a general mechanism for enforcing safety across heterogeneous dynamical systems.
eCP: Informative uncertainty quantification via Equivariantized Conformal Prediction with pre-trained models
We study the effect of group symmetrization of pre-trained models on conformal prediction (CP), a post-hoc, distribution-free, finite-sample method of uncertainty quantification that offers formal coverage guarantees under the assumption of data exchangeability. Unfortunately, CP uncertainty regions can grow significantly in long horizon missions, rendering the statistical guarantees uninformative. To that end, we propose infusing CP with geometric information via group-averaging of the pretrained predictor to distribute the non-conformity mass across the orbits. Each sample now is treated as a representative of an orbit, thus uncertainty can be mitigated by other samples entangled to it via the orbit inducing elements of the symmetry group. Our approach provably yields contracted non-conformity scores in increasing convex order, implying improved exponential-tail bounds and sharper conformal prediction sets in expectation, especially at high confidence levels. We then propose an experimental design to test these theoretical claims in pedestrian trajectory prediction.
Efficient Long-Horizon Vision-Language-Action Models via Static-Dynamic Disentanglement
Vision-Language-Action (VLA) models have recently emerged as a promising paradigm for generalist robotic control. Built upon vision-language model (VLM) architectures, VLAs predict actions conditioned on visual observations and language instructions, achieving strong performance and generalization across tasks. However, VLAs face two major challenges: limited long-horizon context and inefficient inference due to the quadratic attention complexity and large parameter counts. Our work is motivated by the observation that much of the visual information in a trajectory remains static across timesteps (e.g., the background). Leveraging this property, we propose SD-VLA, a framework that disentangles visual inputs into multi-level static and dynamic tokens, which enables (1) retaining a single copy of static tokens across frames to significantly reduce context length, and (2) reusing the key-value (KV) cache of static tokens through a lightweight recache gate that updates only when necessary. This design enables efficient multi-frame integration and efficient inference. In addition, we introduce a new benchmark that more effectively evaluates the long-horizon temporal dependency modeling ability of VLAs. Experimental results show that our approach outperforms baselines on this benchmark by 39.8% absolute improvement in success rate, and achieves a 3.9% gain on the SimplerEnv benchmark. Moreover, SD-VLA delivers a 2.26x inference speedup over the base VLA model on the same benchmark, enabling faster and more practical real-world deployment.
VLS: Steering Pretrained Robot Policies via Vision-Language Models
Why do pretrained diffusion or flow-matching policies fail when the same task is performed near an obstacle, on a shifted support surface, or amid mild clutter? Such failures rarely reflect missing motor skills; instead, they expose a limitation of imitation learning under train-test shifts, where action generation is tightly coupled to training-specific spatial configurations and task specifications. Retraining or fine-tuning to address these failures is costly and conceptually misaligned, as the required behaviors already exist but cannot be selectively adapted at test time. We propose Vision-Language Steering (VLS), a training-free framework for inference-time adaptation of frozen generative robot policies. VLS treats adaptation as an inference-time control problem, steering the sampling process of a pretrained diffusion or flow-matching policy in response to out-of-distribution observation-language inputs without modifying policy parameters. By leveraging vision-language models to synthesize trajectory-differentiable reward functions, VLS guides denoising toward action trajectories that satisfy test-time spatial and task requirements. Across simulation and real-world evaluations, VLS consistently outperforms prior steering methods, achieving a 31% improvement on CALVIN and a 13% gain on LIBERO-PRO. Real-world deployment on a Franka robot further demonstrates robust inference-time adaptation under test-time spatial and semantic shifts. Project page: https://vision-language-steering.github.io/webpage/
comment: 11 Pages, Project page: https://vision-language-steering.github.io/webpage/
How Users Understand Robot Foundation Model Performance through Task Success Rates and Beyond IJCAI 2026
Robot Foundation Models (RFMs) represent a promising approach to developing general-purpose home robots. Given the broad capabilities of RFMs, users will inevitably ask an RFM-based robot to perform tasks that the RFM was not trained or evaluated on. In these cases, it is crucial that users understand the risks associated with attempting novel tasks due to the relatively high cost of failure. Furthermore, an informed user who understands an RFM's capabilities will know what situations and tasks the robot can handle. In this paper, we study how non-roboticists interpret performance information from RFM evaluations. These evaluations typically report task success rate (TSR) as the primary performance metric. While TSR is intuitive to experts, it is necessary to validate whether novices also use this information as intended. Toward this end, we conducted a study in which users saw real evaluation data, including TSR, failure case descriptions, and videos from multiple published RFM research projects. The results highlight that non-experts not only use TSR in a manner consistent with expert expectations but also highly value other information types, such as failure cases that are not often reported in RFM evaluations. Furthermore, we find that users want access to both real data from previous evaluations of the RFM and estimates from the robot about how well it will do on a novel task.
comment: Submitted to IJCAI 2026
Beyond the Vehicle: Cooperative Localization by Fusing Point Clouds for GPS-Challenged Urban Scenarios
Accurate vehicle localization is a critical challenge in urban environments where GPS signals are often unreliable. This paper presents a cooperative multi-sensor and multi-modal localization approach to address this issue by fusing data from vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) systems. Our approach integrates cooperative data with a point cloud registration-based simultaneous localization and mapping (SLAM) algorithm. The system processes point clouds generated from diverse sensor modalities, including vehicle-mounted LiDAR and stereo cameras, as well as sensors deployed at intersections. By leveraging shared data from infrastructure, our method significantly improves localization accuracy and robustness in complex, GPS-noisy urban scenarios.
comment: 8 pages, 2 figures, Driving the Future Symposium 2025
Multi-Agent Pathfinding Under Team-Connected Communication Constraint via Adaptive Path Expansion and Dynamic Leading
This paper proposes a novel planning framework to handle a multi-agent pathfinding problem under team-connected communication constraint, where all agents must have a connected communication channel to the rest of the team during their entire movements. Standard multi-agent path finding approaches (e.g., priority-based search) have potential in this domain but fail when neighboring configurations at start and goal differ. Their single-expansion approach -- computing each agent's path from the start to the goal in just a single expansion -- cannot reliably handle planning under communication constraints for agents as their neighbors change during navigating. Similarly, leader-follower approaches (e.g., platooning) are effective at maintaining team communication, but fixing the leader at the outset of planning can cause planning to become stuck in dense-clutter environments, limiting their practical utility. To overcome this limitation, we propose a novel two-level multi-agent pathfinding framework that integrates two techniques: adaptive path expansion to expand agent paths to their goals in multiple stages; and dynamic leading technique that enables the reselection of the leading agent during each agent path expansion whenever progress cannot be made. Simulation experiments show the efficiency of our planners, which can handle up to 25 agents across five environment types under a limited communication range constraint and up to 11-12 agents on three environment types under line-of-sight communication constraint, exceeding 90% success-rate where baselines routinely fail.
Toward Learning POMDPs Beyond Full-Rank Actions and State Observability
We are interested in enabling autonomous agents to learn and reason about systems with hidden states, such as locking mechanisms. We cast this problem as learning the parameters of a discrete Partially Observable Markov Decision Process (POMDP). The agent begins with knowledge of the POMDP's actions and observation spaces, but not its state space, transitions, or observation models. These properties must be constructed from a sequence of actions and observations. Spectral approaches to learning models of partially observable domains, such as Predictive State Representations (PSRs), learn representations of state that are sufficient to predict future outcomes. PSR models, however, do not have explicit transition and observation system models that can be used with different reward functions to solve different planning problems. Under a mild set of rankness assumptions on the products of transition and observation matrices, we show how PSRs learn POMDP matrices up to a similarity transform, and this transform may be estimated via tensor decomposition methods. Our method learns observation matrices and transition matrices up to a partition of states, where the states in a single partition have the same observation distributions corresponding to actions whose transition matrices are full-rank. Our experiments suggest that explicit observation and transition likelihoods can be leveraged to generate new plans for different goals and reward functions after the model has been learned. We also show that learning a POMDP beyond a partition of states is impossible from sequential data by constructing two POMDPs that agree on all observation distributions but differ in their transition dynamics.
comment: Update abstract
LAGEA: Language Guided Embodied Agents for Robotic Manipulation
Robotic manipulation benefits from foundation models that describe goals, but today's agents still lack a principled way to learn from their own mistakes. We ask whether natural language can serve as feedback, an error-reasoning signal that helps embodied agents diagnose what went wrong and correct course. We introduce LAGEA (Language Guided Embodied Agents), a framework that turns episodic, schema-constrained reflections from a vision language model (VLM) into temporally grounded guidance for reinforcement learning. LAGEA summarizes each attempt in concise language, localizes the decisive moments in the trajectory, aligns feedback with visual state in a shared representation, and converts goal progress and feedback agreement into bounded, step-wise shaping rewards whose influence is modulated by an adaptive, failure-aware coefficient. This design yields dense signals early when exploration needs direction and gracefully recedes as competence grows. On the Meta-World MT10 and Robotic Fetch embodied manipulation benchmark, LAGEA improves average success over the state-of-the-art (SOTA) methods by 9.0% on random goals, 5.3% on fixed goals, and 17% on fetch tasks, while converging faster. These results support our hypothesis: language, when structured and grounded in time, is an effective mechanism for teaching robots to self-reflect on mistakes and make better choices.
Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation
Large Language Models (LLMs) have recently shown strong promise for robotic task planning, particularly through automatic planning domain generation. Planning domains are brittle under imperfect logical states and perception noise; prior approaches largely treat generated planning domains as plan utilities, overlooking their potential as scalable sources of reasoning supervision and structured reward signals. At the same time, reasoning LLMs depend on chain-of-thought (CoT) supervision that is expensive to collect for robotic tasks, and reinforcement learning (RL) faces challenges in reward engineering. We propose Self-CriTeach, an LLM self-teaching and self-critiquing framework in which an LLM autonomously generates symbolic planning domains that serve a dual role: (i) enabling large-scale generation of robotic planning problem-plan pairs, and (ii) providing structured reward functions. First, the self-written domains enable large-scale generation of symbolic task plans, which are automatically transformed into extended CoT trajectories for supervised fine-tuning. Second, the self-written domains are reused as structured reward functions, providing dense feedback for reinforcement learning without manual reward engineering. This unified training pipeline yields a planning-enhanced LLM with higher planning success rates, stronger cross-task generalization, reduced inference cost, and improved robustness to imperfect logical states.
comment: 31 pages, 6 figures
Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models
Panoptic maps enable robots to reason about both geometry and semantics. However, open-vocabulary models repeatedly produce closely related labels that split panoptic entities and degrade volumetric consistency. The proposed UPPM advances open-world scene understanding by leveraging foundation models to introduce a panoptic Dynamic Descriptor that reconciles open-vocabulary labels with unified category structure and geometric size priors. The fusion for such dynamic descriptors is performed within a multi-resolution multi-TSDF map using language-guided open-vocabulary panoptic segmentation and semantic retrieval, resulting in a persistent and promptable panoptic map without additional model training. Based on our evaluation experiments, UPPM shows the best overall performance in terms of the map reconstruction accuracy and the panoptic segmentation quality. The ablation study investigates the contribution for each component of UPPM (custom NMS, blurry-frame filtering, and unified semantics) to the overall system performance. Consequently, UPPM preserves open-vocabulary interpretability while delivering strong geometric and panoptic accuracy.
SEMNAV: Enhancing Visual Semantic Navigation in Robotics through Semantic Segmentation
Visual Semantic Navigation (VSN) is a fundamental problem in robotics, where an agent must navigate toward a target object in an unknown environment, mainly using visual information. Most state-of-the-art VSN models are trained in simulation environments, where rendered scenes of the real world are used, at best. These approaches typically rely on raw RGB data from the virtual scenes, which limits their ability to generalize to real-world environments due to domain adaptation issues. To tackle this problem, in this work, we propose SEMNAV, a novel approach that leverages semantic segmentation as the main visual input representation of the environment to enhance the agent's perception and decision-making capabilities. By explicitly incorporating this type of high-level semantic information, our model learns robust navigation policies that improve generalization across unseen environments, both in simulated and real world settings. We also introduce the SEMNAV dataset, a newly curated dataset designed for training semantic segmentation-aware navigation models like SEMNAV. Our approach is evaluated extensively in both simulated environments and with real-world robotic platforms. Experimental results demonstrate that SEMNAV outperforms existing state-of-the-art VSN models, achieving higher success rates in the Habitat 2.0 simulation environment, using the HM3D dataset. Furthermore, our real-world experiments highlight the effectiveness of semantic segmentation in mitigating the sim-to-real gap, making our model a promising solution for practical VSN-based robotic applications. The code and datasets are accessible at https://github.com/gramuah/semnav
Spiking Neural Networks for Continuous Control via End-to-End Model-Based Learning
Despite recent progress in training spiking neural networks (SNNs) for classification, their application to continuous motor control remains limited. Here, we demonstrate that fully spiking architectures can be trained end-to-end to control robotic arms with multiple degrees of freedom in continuous environments. Our predictive-control framework combines Leaky Integrate-and-Fire dynamics with surrogate gradients, jointly optimizing a forward model for dynamics prediction and a policy network for goal-directed action. We evaluate this approach on both a planar 2D reaching task and a simulated 6-DOF Franka Emika Panda robot with torque control. In direct comparison to non-spiking recurrent baselines trained under the same predictive-control pipeline, the proposed SNN achieves comparable task performance while using substantially fewer parameters. An extensive ablation study highlights the role of initialization, learnable time constants, adaptive thresholds, and latent-space compression as key contributors to stable training and effective control. Together, these findings establish spiking neural networks as a viable and scalable substrate for high-dimensional continuous control, while emphasizing the importance of principled architectural and training design.
Driving on Registers
We present DrivoR, a simple and efficient transformer-based architecture for end-to-end autonomous driving. Our approach builds on pretrained Vision Transformers (ViTs) and introduces camera-aware register tokens that compress multi-camera features into a compact scene representation, significantly reducing downstream computation without sacrificing accuracy. These tokens drive two lightweight transformer decoders that generate and then score candidate trajectories. The scoring decoder learns to mimic an oracle and predicts interpretable sub-scores representing aspects such as safety, comfort, and efficiency, enabling behavior-conditioned driving at inference. Despite its minimal design, DrivoR outperforms or matches strong contemporary baselines across NAVSIM-v1, NAVSIM-v2, and the photorealistic closed-loop HUGSIM benchmark. Our results show that a pure-transformer architecture, combined with targeted token compression, is sufficient for accurate, efficient, and adaptive end-to-end driving. Code and checkpoints will be made available via the project page.
OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering
Accurate 3D multi-object tracking (MOT) is crucial for autonomous driving, as it enables robust perception, navigation, and planning in complex environments. While deep learning-based solutions have demonstrated impressive 3D MOT performance, model-based approaches remain appealing for their simplicity, interpretability, and data efficiency. Conventional model-based trackers typically rely on random vector-based Bayesian filters within the tracking-by-detection (TBD) framework but face limitations due to heuristic data association and track management schemes. In contrast, random finite set (RFS)-based Bayesian filtering handles object birth, survival, and death in a theoretically sound manner, facilitating interpretability and parameter tuning. In this paper, we present OptiPMB, a novel RFS-based 3D MOT method that employs an optimized Poisson multi-Bernoulli (PMB) filter while incorporating several key innovative designs within the TBD framework. Specifically, we propose a measurement-driven hybrid adaptive birth model for improved track initialization, employ adaptive detection probability parameters to effectively maintain tracks for occluded objects, and optimize density pruning and track extraction modules to further enhance overall tracking performance. Extensive evaluations on nuScenes and KITTI datasets show that OptiPMB achieves superior tracking accuracy compared with state-of-the-art methods, thereby establishing a new benchmark for model-based 3D MOT and offering valuable insights for future research on RFS-based trackers in autonomous driving.
Material-informed Gaussian Splatting for 3D World Reconstruction in a Digital Twin
3D reconstruction for Digital Twins often relies on LiDAR-based methods, which provide accurate geometry but lack the semantics and textures naturally captured by cameras. Traditional LiDAR-camera fusion approaches require complex calibration and still struggle with certain materials like glass, which are visible in images but poorly represented in point clouds. We propose a camera-only pipeline that reconstructs scenes using 3D Gaussian Splatting from multi-view images, extracts semantic material masks via vision models, converts Gaussian representations to mesh surfaces with projected material labels, and assigns physics-based material properties for accurate sensor simulation in modern graphics engines and simulators. This approach combines photorealistic reconstruction with physics-based material assignment, providing sensor simulation fidelity comparable to LiDAR-camera fusion while eliminating hardware complexity and calibration requirements. We validate our camera-only method using an internal dataset from an instrumented test vehicle, leveraging LiDAR as ground truth for reflectivity validation alongside image similarity metrics.
comment: 8 pages, 5 figures. Accepted to IEEE Intelligent Vehicles Symposium (IV) 2026. Revised version (v3) presents camera-ready publication
MetaSym: A Symplectic Meta-learning Framework for Physical Intelligence
Scalable and generalizable physics-aware deep learning has long been considered a significant challenge with various applications across diverse domains ranging from robotics to molecular dynamics. Central to almost all physical systems are symplectic forms, the geometric backbone that underpins fundamental invariants like energy and momentum. In this work, we introduce a novel deep learning framework, MetaSym. In particular, MetaSym combines a strong symplectic inductive bias obtained from a symplectic encoder, and an autoregressive decoder with meta-attention. This principled design ensures that core physical invariants remain intact, while allowing flexible, data efficient adaptation to system heterogeneities. We benchmark MetaSym with highly varied and realistic datasets, such as a high-dimensional spring-mesh system Otness et al. (2021), an open quantum system with dissipation and measurement backaction, and robotics-inspired quadrotor dynamics. Crucially, we fine-tune and deploy MetaSym on real-world quadrotor data, demonstrating robustness to sensor noise and real-world uncertainty. Across all tasks, MetaSym achieves superior few-shot adaptation and outperforms larger state-of-the-art (SOTA) models.
comment: Published in Transactions on Machine Learning Research (TMLR), 10 + 18 pages, 9 figures, 10 tables
L2M-Reg: Building-level Uncertainty-aware Registration of Outdoor LiDAR Point Clouds and Semantic 3D City Models SP
Accurate registration between LiDAR (Light Detection and Ranging) point clouds and semantic 3D city models is a fundamental topic in urban digital twinning and a prerequisite for downstream tasks, such as digital construction, change detection, and model refinement. However, achieving accurate LiDAR-to-Model registration at the individual building level remains challenging, particularly due to the generalization uncertainty in semantic 3D city models at the Level of Detail 2 (LoD2). This paper addresses this gap by proposing L2M-Reg, a plane-based fine registration method that explicitly accounts for model uncertainty. L2M-Reg consists of three key steps: establishing reliable plane correspondence, building a pseudo-plane-constrained Gauss-Helmert model, and adaptively estimating vertical translation. Overall, extensive experiments on five real-world datasets demonstrate that L2M-Reg is both more accurate and computationally efficient than current leading ICP-based and plane-based methods. Therefore, L2M-Reg provides a novel building-level solution regarding LiDAR-to-Model registration when model uncertainty is present. The datasets and code for L2M-Reg can be found: https://github.com/Ziyang-Geodesy/L2M-Reg.
comment: Accepted version by ISPRS Journal of Photogrammetry and Remote Sensing
Correspondence-Free, Function-Based Sim-to-Real Learning for Deformable Surface Control
This paper presents a correspondence-free, function-based sim-to-real learning method for controlling deformable freeform surfaces. Unlike traditional sim-to-real transfer methods that strongly rely on marker points with full correspondences, our approach simultaneously learns a deformation function space and a confidence map -- both parameterized by a neural network -- to map simulated shapes to their real-world counterparts. As a result, the sim-to-real learning can be conducted by input from either a 3D scanner as point clouds (without correspondences) or a motion capture system as marker points (tolerating missed markers). The resultant sim-to-real transfer can be seamlessly integrated into a neural network-based computational pipeline for inverse kinematics and shape control. We demonstrate the versatility and adaptability of our method on both vision devices and across four pneumatically actuated soft robots: a deformable membrane, a robotic mannequin, and two soft manipulators.
comment: arXiv admin note: text overlap with arXiv:2405.08935
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control
For safety-critical applications, model-free reinforcement learning (RL) faces numerous challenges, particularly the difficulty of establishing verifiable stability guarantees while maintaining high exploration efficiency. To address these challenges, we present Multi-Step Actor-Critic Learning with Lyapunov Certificates (MSACL), a novel approach that seamlessly integrates exponential stability with maximum entropy reinforcement learning (MERL). In contrast to existing methods that rely on complex reward engineering and single-step constraints, MSACL utilizes intuitive rewards and multi-step data for actor-critic learning. Specifically, we first introduce Exponential Stability Labels (ESLs) to categorize samples and propose a $λ$-weighted aggregation mechanism to learn Lyapunov certificates. Leveraging these certificates, we then develop a stability-aware advantage function to guide policy optimization, thereby ensuring rapid Lyapunov descent and robust state convergence. We evaluate MSACL across six benchmarks, comprising four stabilization and two high-dimensional tracking tasks. Experimental results demonstrate its consistent superiority over both standard RL baselines and state-of-the-art Lyapunov-based RL algorithms. Beyond rapid convergence, MSACL exhibits significant robustness against environmental uncertainties and remarkable generalization to unseen reference signals. The source code and benchmarking environments are available at \href{https://github.com/YuanZhe-Xing/MSACL}{https://github.com/YuanZhe-Xing/MSACL}.
comment: This work has been submitted to the IEEE for possible publication
Task-Centric Policy Optimization from Misaligned Motion Priors
Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, causing naïve imitation to degrade task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. This exposes a fundamental limitation of linear reward mixing in adversarial imitation learning. We propose \emph{Task-Centric Motion Priors} (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
comment: Work requires further details and not complete yet
Online Fine-Tuning of Pretrained Controllers for Autonomous Driving via Real-Time Recurrent RL
Deploying pretrained policies in real-world applications presents substantial challenges that fundamentally limit the practical applicability of learning-based control systems. When autonomous systems encounter environmental changes in system dynamics, sensor drift, or task objectives, fixed policies rapidly degrade in performance. We show that employing Real-Time Recurrent Reinforcement Learning (RTRRL), a biologically plausible algorithm for online adaptation, can effectively fine-tune a pretrained policy to improve autonomous agents' performance on driving tasks. We further show that RTRRL synergizes with a recent biologically inspired recurrent network model, the Liquid-Resistance Liquid-Capacitance RNN. We demonstrate the effectiveness of this closed-loop approach in a simulated CarRacing environment and in a real-world line-following task with a RoboRacer car equipped with an event camera.
MapDream: Task-Driven Map Learning for Vision-Language Navigation
Vision-Language Navigation (VLN) requires agents to follow natural language instructions in partially observed 3D environments, motivating map representations that aggregate spatial context beyond local perception. However, most existing approaches rely on hand-crafted maps constructed independently of the navigation policy. We argue that maps should instead be learned representations shaped directly by navigation objectives rather than exhaustive reconstructions. Based on this insight, we propose MapDream, a map-in-the-loop framework that formulates map construction as autoregressive bird's-eye-view (BEV) image synthesis. The framework jointly learns map generation and action prediction, distilling environmental context into a compact three-channel BEV map that preserves only navigation-critical affordances. Supervised pre-training bootstraps a reliable mapping-to-control interface, while the autoregressive design enables end-to-end joint optimization through reinforcement fine-tuning. Experiments on R2R-CE and RxR-CE achieve state-of-the-art monocular performance, validating task-driven generative map learning.
Scene-Adaptive Motion Planning with Explicit Mixture of Experts and Interaction-Oriented Optimization
Despite over a decade of development, autonomous driving trajectory planning in complex urban environments continues to encounter significant challenges. These challenges include the difficulty in accommodating the multi-modal nature of trajectories, the limitations of single expert model in managing diverse scenarios, and insufficient consideration of environmental interactions. To address these issues, this paper introduces the EMoE-Planner, which incorporates three innovative approaches. Firstly, the Explicit MoE (Mixture of Experts) dynamically selects specialized experts based on scenario-specific information through a shared scene router. Secondly, the planner utilizes scene-specific queries to provide multi-modal priors, directing the model's focus towards relevant target areas. Lastly, it enhances the prediction model and loss calculation by considering the interactions between the ego vehicle and other agents, thereby significantly boosting planning performance. Comparative experiments were conducted on the Nuplan dataset against the state-of-the-art methods. The simulation results demonstrate that our model consistently outperforms SOTA models across nearly all test scenarios. Our model is the first pure learning model to achieve performance surpassing rule-based algorithms in almost all Nuplan closed-loop simulations.
comment: Main text 10 pages with 7 figures
Spiking Neural-Invariant Kalman Fusion for Accurate Localization Using Low-Cost IMUs
Low-cost inertial measurement units (IMUs) are widely utilized in mobile robot localization due to their affordability and ease of integration. However, their complex, nonlinear, and time-varying noise characteristics often lead to significant degradation in localization accuracy when applied directly for dead reckoning. To overcome this limitation, we propose a novel brain-inspired state estimation framework that combines a spiking neural network (SNN) with an invariant extended Kalman filter (InEKF). The SNN is designed to extract motion-related features from long sequences of IMU data affected by substantial random noise and is trained via a surrogate gradient descent algorithm to enable dynamic adaptation of the covariance noise parameter within the InEKF. By fusing the SNN output with raw IMU measurements, the proposed method enhances the robustness and accuracy of pose estimation. Extensive experiments conducted on the KITTI dataset and real-world data collected using a mobile robot equipped with a low-cost IMU demonstrate that the proposed approach outperforms state-of-the-art methods in localization accuracy and exhibits strong robustness to sensor noise, highlighting its potential for real-world mobile robot applications.
LP-MPPI: Low-Pass Filtering for Efficient Model Predictive Path Integral Control ICRA 2026
Model Predictive Path Integral (MPPI) control is a widely used sampling-based approach for real-time control, valued for its flexibility in handling arbitrary dynamics and cost functions. However, it often suffers from high-frequency noise in the sampled control trajectories, which hinders the search for optimal controls and transfers to the applied controls, leading to actuator wear. In this work, we introduce Low-Pass Model Predictive Path Integral Control (LP-MPPI), which integrates low-pass filtering into the sampling process to eliminate detrimental high-frequency components and enhance the algorithm's efficiency. Unlike prior approaches, LP-MPPI provides direct and interpretable control over the frequency spectrum of sampled control trajectory perturbations, leading to more efficient sampling and smoother control. Through extensive evaluations in Gymnasium environments, simulated quadruped locomotion, and real-world F1TENTH autonomous racing, we demonstrate that LP-MPPI consistently outperforms state-of-the-art MPPI variants, achieving significant performance improvements while reducing control signal chattering.
comment: Accepted at International Conference on Robotics and Automation 2026 (ICRA 2026)
CoFreeVLA: Collision-Free Dual-Arm Manipulation via Vision-Language-Action Model and Risk Estimation
Vision Language Action (VLA) models enable instruction following manipulation, yet dualarm deployment remains unsafe due to under modeled selfcollisions between arms and grasped objects. We introduce CoFreeVLA, which augments an endtoend VLA with a short horizon selfcollision risk estimator that predicts collision likelihood from proprioception, visual embeddings, and planned actions. The estimator gates risky commands, recovers to safe states via risk-guided adjustments, and shapes policy refinement for safer rollouts. It is pre-trained with model-based collision labels and posttrained on real robot rollouts for calibration. On five bimanual tasks with the PiPER robot arm, CoFreeVLA reduces selfcollisions and improves success rates versus RDT and APEX.
TreeLoc: 6-DoF LiDAR Global Localization in Forests via Inter-Tree Geometric Matching ICRA 2026
Reliable localization is crucial for navigation in forests, where GPS is often degraded and LiDAR measurements are repetitive, occluded, and structurally complex. These conditions weaken the assumptions of traditional urban-centric localization methods, which assume that consistent features arise from unique structural patterns, necessitating forest-centric solutions to achieve robustness in these environments. To address these challenges, we propose TreeLoc, a LiDAR-based global localization framework for forests that handles place recognition and 6-DoF pose estimation. We represent scenes using tree stems and their Diameter at Breast Height (DBH), which are aligned to a common reference frame via their axes and summarized using the tree distribution histogram (TDH) for coarse matching, followed by fine matching with a 2D triangle descriptor. Finally, pose estimation is achieved through a two-step geometric verification. On diverse forest benchmarks, TreeLoc outperforms baselines, achieving precise localization. Ablation studies validate the contribution of each component. We also propose applications for long-term forest management using descriptors from a compact global tree database. TreeLoc is open-sourced for the robotics community at https://github.com/minwoo0611/TreeLoc.
comment: An 8-page paper with 7 tables and 8 figures, accepted to ICRA 2026
A Low-Cost Vision-Based Tactile Gripper with Pretraining Learning for Contact-Rich Manipulation
Robotic manipulation in contact-rich environments remains challenging, particularly when relying on conventional tactile sensors that suffer from limited sensing range, reliability, and cost-effectiveness. In this work, we present LVTG, a low-cost visuo-tactile gripper designed for stable, robust, and efficient physical interaction. Unlike existing visuo-tactile sensors, LVTG enables more effective and stable grasping of larger and heavier everyday objects, thanks to its enhanced tactile sensing area and greater opening angle. Its surface skin is made of highly wear-resistant material, significantly improving durability and extending operational lifespan. The integration of vision and tactile feedback allows LVTG to provide rich, high-fidelity sensory data, facilitating reliable perception during complex manipulation tasks. Furthermore, LVTG features a modular design that supports rapid maintenance and replacement. To effectively fuse vision and touch, We adopt a CLIP-inspired contrastive learning objective to align tactile embeddings with their corresponding visual observations, enabling a shared cross-modal representation space for visuo-tactile perception. This alignment improves the performance of an Action Chunking Transformer (ACT) policy in contact-rich manipulation, leading to more efficient data collection and more effective policy learning. Compared to the original ACT method, the proposed LVTG with pretraining achieves significantly higher success rates in manipulation tasks.
AIR-VLA: Vision-Language-Action Systems for Aerial Manipulation
While Vision-Language-Action (VLA) models have achieved remarkable success in ground-based embodied intelligence, their application to Aerial Manipulation Systems (AMS) remains a largely unexplored frontier. The inherent characteristics of AMS, including floating-base dynamics, strong coupling between the UAV and the manipulator, and the multi-step, long-horizon nature of operational tasks, pose severe challenges to existing VLA paradigms designed for static or 2D mobile bases. To bridge this gap, we propose \textbf{AIR-VLA}, the first VLA benchmark specifically tailored for aerial manipulation. We construct a physics-based simulation environment and release a high-quality multimodal dataset comprising 3000 manually teleoperated demonstrations, covering base manipulation, object \& spatial understanding, semantic reasoning, and long-horizon planning. Leveraging this platform, we systematically evaluate mainstream VLA models and state-of-the-art VLM models. Our experiments not only validate the feasibility of transferring VLA paradigms to aerial systems but also, through multi-dimensional metrics tailored to aerial tasks, reveal the capabilities and boundaries of current models regarding UAV mobility, manipulator control, and high-level planning. \textbf{AIR-VLA} establishes a standardized testbed and data foundation for future research in general-purpose aerial robotics. The resource of AIR-VLA will be available at https://github.com/SpencerSon2001/AIR-VLA.
What does really matter in image goal navigation?
Image goal navigation requires two different skills: firstly, core navigation skills, including the detection of free space and obstacles, and taking decisions based on an internal representation; and secondly, computing directional information by comparing visual observations to the goal image. Current state-of-the-art methods either rely on dedicated image-matching, or pre-training of computer vision modules on relative pose estimation. In this paper, we study whether this task can be efficiently solved with end-to-end training of full agents with RL, as has been claimed by recent work. A positive answer would have impact beyond Embodied AI and allow training of relative pose estimation from reward for navigation alone. In this large experimental study we investigate the effect of architectural choices like late fusion, channel stacking, space-to-depth projections and cross-attention, and their role in the emergence of relative pose estimators from navigation training. We show that the success of recent methods is influenced up to a certain extent by simulator settings, leading to shortcuts in simulation. However, we also show that these capabilities can be transferred to more realistic setting, up to some extent. We also find evidence for correlations between navigation performance and probed (emerging) relative pose estimation performance, an important sub skill.
USS-Nav: Unified Spatio-Semantic Scene Graph for Lightweight UAV Zero-Shot Object Navigation
Zero-Shot Object Navigation in unknown environments poses significant challenges for Unmanned Aerial Vehicles (UAVs) due to the conflict between high-level semantic reasoning requirements and limited onboard computational resources. To address this, we present USS-Nav, a lightweight framework that incrementally constructs a Unified Spatio-Semantic scene graph and enables efficient Large Language Model (LLM)-augmented Zero-Shot Object Navigation in unknown environments. Specifically, we introduce an incremental Spatial Connectivity Graph generation method utilizing polyhedral expansion to capture global geometric topology, which is dynamically partitioned into semantic regions via graph clustering. Concurrently, open-vocabulary object semantics are instantiated and anchored to this topology to form a hierarchical environmental representation. Leveraging this hierarchical structure, we present a coarse-to-fine exploration strategy: LLM grounded in the scene graph's semantics to determine global target regions, while a local planner optimizes frontier coverage based on information gain. Experimental results demonstrate that our framework outperforms state-of-the-art methods in terms of computational efficiency and real-time update frequency (15 Hz) on a resource-constrained platform. Furthermore, ablation studies confirm the effectiveness of our framework, showing substantial improvements in Success weighted by Path Length (SPL). The source code will be made publicly available to foster further research.
RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
Imitation learning has emerged as an effective approach for bootstrapping sequential decision-making in robotics, achieving strong performance even in high-dimensional dexterous manipulation tasks. Recent behavior cloning methods further leverage expressive generative models, such as diffusion models and flow matching, to represent multimodal action distributions. However, policies pretrained in this manner often exhibit limited generalization and require additional fine-tuning to achieve robust performance at deployment time. Such adaptation must preserve the global exploration benefits of pretraining while enabling rapid correction of local execution errors. We propose Residual Flow Steering(RFS), a data-efficient reinforcement learning framework for adapting pretrained generative policies. RFS steers a pretrained flow-matching policy by jointly optimizing a residual action and a latent noise distribution, enabling complementary forms of exploration: local refinement through residual corrections and global exploration through latent-space modulation. This design allows efficient adaptation while retaining the expressive structure of the pretrained policy. We demonstrate the effectiveness of RFS on dexterous manipulation tasks, showing efficient fine-tuning in both simulation and real-world settings when adapting pretrained base policies. Project website:https://weirdlabuw.github.io/rfs.
SyNeT: Synthetic Negatives for Traversability Learning
Reliable traversability estimation is crucial for autonomous robots to navigate complex outdoor environments safely. Existing self-supervised learning frameworks primarily rely on positive and unlabeled data; however, the lack of explicit negative data remains a critical limitation, hindering the model's ability to accurately identify diverse non-traversable regions. To address this issue, we introduce a method to explicitly construct synthetic negatives, representing plausible but non-traversable, and integrate them into vision-based traversability learning. Our approach is formulated as a training strategy that can be seamlessly integrated into both Positive-Unlabeled (PU) and Positive-Negative (PN) frameworks without modifying inference architectures. Complementing standard pixel-wise metrics, we introduce an object-centric FPR evaluation approach that analyzes predictions in regions where synthetic negatives are inserted. This evaluation provides an indirect measure of the model's ability to consistently identify non-traversable regions without additional manual labeling. Extensive experiments on both public and self-collected datasets demonstrate that our approach significantly enhances robustness and generalization across diverse environments. The source code and demonstration videos will be publicly available.
VLBiMan: Vision-Language Anchored One-Shot Demonstration Enables Generalizable Bimanual Robotic Manipulation ICLR 2026
Achieving generalizable bimanual manipulation requires systems that can learn efficiently from minimal human input while adapting to real-world uncertainties and diverse embodiments. Existing approaches face a dilemma: imitation policy learning demands extensive demonstrations to cover task variations, while modular methods often lack flexibility in dynamic scenes. We introduce VLBiMan, a framework that derives reusable skills from a single human example through task-aware decomposition, preserving invariant primitives as anchors while dynamically adapting adjustable components via vision-language grounding. This adaptation mechanism resolves scene ambiguities caused by background changes, object repositioning, or visual clutter without policy retraining, leveraging semantic parsing and geometric feasibility constraints. Moreover, the system inherits human-like hybrid control capabilities, enabling mixed synchronous and asynchronous use of both arms. Extensive experiments validate VLBiMan across tool-use and multi-object tasks, demonstrating: (1) a drastic reduction in demonstration requirements compared to imitation baselines, (2) compositional generalization through atomic skill splicing for long-horizon tasks, (3) robustness to novel but semantically similar objects and external disturbances, and (4) strong cross-embodiment transfer, showing that skills learned from human demonstrations can be instantiated on different robotic platforms without retraining. By bridging human priors with vision-language anchored adaptation, our work takes a step toward practical and versatile dual-arm manipulation in unstructured settings.
comment: accepted by ICLR 2026. The project link is https://hnuzhy.github.io/projects/VLBiMan/
DDP-WM: Disentangled Dynamics Prediction for Efficient World Models
World models are essential for autonomous robotic planning. However, the substantial computational overhead of existing dense Transformerbased models significantly hinders real-time deployment. To address this efficiency-performance bottleneck, we introduce DDP-WM, a novel world model centered on the principle of Disentangled Dynamics Prediction (DDP). We hypothesize that latent state evolution in observed scenes is heterogeneous and can be decomposed into sparse primary dynamics driven by physical interactions and secondary context-driven background updates. DDP-WM realizes this decomposition through an architecture that integrates efficient historical processing with dynamic localization to isolate primary dynamics. By employing a crossattention mechanism for background updates, the framework optimizes resource allocation and provides a smooth optimization landscape for planners. Extensive experiments demonstrate that DDP-WM achieves significant efficiency and performance across diverse tasks, including navigation, precise tabletop manipulation, and complex deformable or multi-body interactions. Specifically, on the challenging Push-T task, DDP-WM achieves an approximately 9 times inference speedup and improves the MPC success rate from 90% to98% compared to state-of-the-art dense models. The results establish a promising path for developing efficient, high-fidelity world models. Codes will be available at https://github.com/HCPLab-SYSU/DDP-WM.
comment: Codes will be available at https://github.com/HCPLab-SYSU/DDP-WM
Fast Task Planning with Neuro-Symbolic Relaxation
Real-world task planning requires long-horizon reasoning over large sets of objects with complex relationships and attributes, leading to a combinatorial explosion for classical symbolic planners. To prune the search space, recent methods prioritize searching on a simplified task only containing a few ``important" objects predicted by a neural network. However, such a simple neuro-symbolic (NeSy) integration risks omitting critical objects and wasting resources on unsolvable simplified tasks. To enable Fast and reliable planning, we introduce a NeSy relaxation strategy (Flax), combining neural importance prediction with symbolic expansion. Specifically, we first learn a graph neural network to predict object importance to create a simplified task and solve it with a symbolic planner. Then, we solve a rule-relaxed task to obtain a quick rough plan, and reintegrate all referenced objects into the simplified task to recover any overlooked but essential elements. Finally, we apply complementary rules to refine the updated task, keeping it both reliable and compact. Extensive experiments are conducted on both synthetic and real-world maze navigation benchmarks where a robot must traverse through a maze and interact with movable obstacles. The results show that Flax boosts the average success rate by 20.82\% and cuts mean wall-clock planning time by 17.65\% compared with the state-of-the-art NeSy baseline. We expect that Flax offers a practical path toward fast, scalable, long-horizon task planning in complex environments.
comment: 8 pages, 6 figures
Learning-based Force Sensing and Impedance Matching for Safe Haptic Feedback in Robot-assisted Laparoscopic Surgery
Integrating accurate haptic feedback into robot-assisted minimally invasive surgery (RAMIS) remains challenging due to difficulties in precise force rendering and ensuring system safety during teleoperation. We present a Nonlinear Impedance Matching Approach (NIMA) that extends our previously validated Impedance Matching Approach (IMA) by incorporating nonlinear dynamics to accurately model and render complex tool-tissue interactions in real-time. NIMA achieves a mean absolute error of 0.01 (std 0.02 N), representing a 95% reduction compared to IMA. Additionally, NIMA eliminates haptic "kickback" by ensuring zero force is applied to the user's hand when they release the handle, enhancing both patient safety and operator comfort. By accounting for nonlinearities in tool-tissue interactions, NIMA significantly improves force fidelity, responsiveness, and precision across various surgical conditions, advancing haptic feedback systems for reliable robot-assisted surgical procedures.
Statistical Guarantees for Offline Domain Randomization ICLR 2026
Reinforcement-learning (RL) agents often struggle when deployed from simulation to the real-world. A dominant strategy for reducing the sim-to-real gap is domain randomization (DR) which trains the policy across many simulators produced by sampling dynamics parameters, but standard DR ignores offline data already available from the real system. We study offline domain randomization (ODR), which first fits a distribution over simulator parameters to an offline dataset. While a growing body of empirical work reports substantial gains with algorithms such as DROPO, the theoretical foundations of ODR remain largely unexplored. In this work, we cast ODR as a maximum-likelihood estimation over a parametric simulator family and provide statistical guarantees: under mild regularity and identifiability conditions, the estimator is weakly consistent (it converges in probability to the true dynamics as data grows), and it becomes strongly consistent (i.e., it converges almost surely to the true dynamics) when an additional uniform Lipschitz continuity assumption holds. We examine the practicality of these assumptions and outline relaxations that justify ODR's applicability across a broader range of settings. Taken together, our results place ODR on a principled footing and clarify when offline data can soundly guide the choice of a randomization distribution for downstream offline RL.
comment: ICLR 2026
Multiagent Systems
Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems
While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying on manually crafted agent roles and interaction prompts, which leads to increased architectural complexity and limited reusability across tasks. Moreover, most MAS communicate primarily through natural language, making them vulnerable to error accumulation and instability in long-context, multi-stage interactions within internal agent histories. In this work, we propose \textbf{Agent Primitives}, a set of reusable latent building blocks for LLM-based MAS. Inspired by neural network design, where complex models are built from reusable components, we observe that many existing MAS architectures can be decomposed into a small number of recurring internal computation patterns. Based on this observation, we instantiate three primitives: Review, Voting and Selection, and Planning and Execution. All primitives communicate internally via key-value (KV) cache, which improves both robustness and efficiency by mitigating information degradation across multi-stage interactions. To enable automatic system construction, an Organizer agent selects and composes primitives for each query, guided by a lightweight knowledge pool of previously successful configurations, forming a primitive-based MAS. Experiments show that primitives-based MAS improve average accuracy by 12.0-16.5\% over single-agent baselines, reduce token usage and inference latency by approximately 3$\times$-4$\times$ compared to text-based MAS, while incurring only 1.3$\times$-1.6$\times$ overhead relative to single-agent inference and providing more stable performance across model backbones.
comment: 16 pages
Efficient Investment in Multi-Agent Models of Public Transportation
We study two stylized, multi-agent models aimed at investing a limited, indivisible resource in public transportation. In the first model, we face the decision of which potential stops to open along a (e.g., bus) path, given agents' travel demands. While it is known that utilitarian optimal solutions can be identified in polynomial time, we find that computing approximately optimal solutions with respect to egalitarian welfare is NP-complete. This is surprising as we operate on the simple topology of a line graph. In the second model, agents navigate a more complex network modeled by a weighted graph where edge weights represent distances. We face the decision of improving travel time along a fixed number of edges. We provide a polynomial-time algorithm that combines Dijkstra's algorithm with a dynamical program to find the optimal decision for one or two agents. By contrast, if the number of agents is variable, we find \np-completeness and inapproximability results for utilitarian and egalitarian welfare. Moreover, we demonstrate implications of our results for a related model of railway network design.
When Should Agents Coordinate in Differentiable Sequential Decision Problems?
Multi-robot teams must coordinate to operate effectively. When a team operates in an uncoordinated manner, and agents choose actions that are only individually optimal, the team's outcome can suffer. However, in many domains, coordination requires costly communication. We explore the value of coordination in a broad class of differentiable motion-planning problems. In particular, we model coordinated behavior as a spectrum: at one extreme, agents jointly optimize a common team objective, and at the other, agents make unilaterally optimal decisions given their individual decision variables, i.e., they operate at Nash equilibria. We then demonstrate that reasoning about coordination in differentiable motion-planning problems reduces to reasoning about the second-order properties of agents' objectives, and we provide algorithms that use this second-order reasoning to determine at which times a team of agents should coordinate.
comment: 15 content pages, 2 pages for references, 4 figures
Game-Theoretic and Algorithmic Analyses of Multi-Agent Routing under Crossing Costs AAMAS 2026
Coordinating the movement of multiple autonomous agents over a shared network is a fundamental challenge in algorithmic robotics, intelligent transportation, and distributed systems. The dominant approach, Multi-Agent Path Finding, relies on centralized control and synchronous collision avoidance, which often requires strict synchronization and guarantees of globally conflict-free execution. This paper introduces the Multi-Agent Routing under Crossing Cost model on mixed graphs, a novel framework tailored to asynchronous settings. In our model, instead of treating conflicts as hard constraints, each agent is assigned a path, and the system is evaluated through a cost function that measures potential head-on encounters. This ``crossing cost'', which is defined as the product of the numbers of agents traversing an edge in opposite directions, quantifies the risk of congestion and delay in decentralized execution. Our contributions are both game-theoretic and algorithmic. We model the setting as a congestion game with a non-standard cost function, prove the existence of pure Nash equilibria, and analyze the dynamics leading to them. Equilibria can be found in polynomial time under mild conditions, while the general case is PLS-complete. From an optimization perspective, minimizing the total crossing cost is NP-hard, as the problem generalizes Steiner Orientation. To address this hardness barrier, we design a suite of parameterized algorithms for minimizing crossing cost, with parameters including the number of arcs, edges, agents, and structural graph measures. These yield XP or FPT results depending on the parameter, offering algorithmic strategies for structurally restricted instances. Our framework provides a new theoretical foundation for decentralized multi-agent routing, bridging equilibrium analysis and parameterized complexity to support scalable and risk-aware coordination.
comment: Accepted as extended abstract at AAMAS 2026
Rejecting Arguments Based on Doubt in Structured Bipolar Argumentation AAMAS 2026
This paper develops a new approach to computational argumentation that is informed by philosophical and linguistic views. Namely, it takes into account two ideas that have received little attention in the literature on computational argumentation: First, an agent may rationally reject an argument based on mere doubt, thus not all arguments they could defend must be accepted; and, second, that it is sometimes more natural to think in terms of which individual sentences or claims an agent accepts in a debate, rather than which arguments. In order to incorporate these two ideas into a computational approach, we first define the notion of structured bipolar argumentation frameworks (SBAFs), where arguments consist of sentences and we have both an attack and a support relation between them. Then, we provide semantics for SBAFs with two features: (1) Unlike with completeness-based semantics, our semantics do not force agents to accept all defended arguments. (2) In addition to argument extensions, which give acceptable sets of arguments, we also provide semantics for language extensions that specify acceptable sets of sentences. These semantics represent reasonable positions an agent might have in a debate. Our semantics lie between the admissible and complete semantics of abstract argumentation. Further, our approach can be used to provide a new perspective on existing approaches. For instance, we can specify the conditions under which an agent can ignore support between arguments (i.e. under which the use of abstract argumentation is warranted) and we show that deductive support semantics is a special case of our approach.
comment: Accepted to AAMAS 2026
Internet of Agentic AI: Incentive-Compatible Distributed Teaming and Workflow
Large language models (LLMs) have enabled a new class of agentic AI systems that reason, plan, and act by invoking external tools. However, most existing agentic architectures remain centralized and monolithic, limiting scalability, specialization, and interoperability. This paper proposes a framework for scalable agentic intelligence, termed the Internet of Agentic AI, in which autonomous, heterogeneous agents distributed across cloud and edge infrastructure dynamically form coalitions to execute task-driven workflows. We formalize a network-native model of agentic collaboration and introduce an incentive-compatible workflow-coalition feasibility framework that integrates capability coverage, network locality, and economic implementability. To enable scalable coordination, we formulate a minimum-effort coalition selection problem and propose a decentralized coalition formation algorithm. The proposed framework can operate as a coordination layer above the Model Context Protocol (MCP). A healthcare case study demonstrates how domain specialization, cloud-edge heterogeneity, and dynamic coalition formation enable scalable, resilient, and economically viable agentic workflows. This work lays the foundation for principled coordination and scalability in the emerging era of Internet of Agentic AI.
MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems
Multi-Agent Systems (MAS) built on Large Language Models (LLMs) often exhibit high variance in their reasoning trajectories. Process verification, which evaluates intermediate steps in trajectories, has shown promise in general reasoning settings, and has been suggested as a potential tool for guiding coordination of MAS; however, its actual effectiveness in MAS remains unclear. To fill this gap, we present MAS-ProVe, a systematic empirical study of process verification for multi-agent systems (MAS). Our study spans three verification paradigms (LLM-as-a-Judge, reward models, and process reward models), evaluated across two levels of verification granularity (agent-level and iteration-level). We further examine five representative verifiers and four context management strategies, and conduct experiments over six diverse MAS frameworks on multiple reasoning benchmarks. We find that process-level verification does not consistently improve performance and frequently exhibits high variance, highlighting the difficulty of reliably evaluating partial multi-agent trajectories. Among the methods studied, LLM-as-a-Judge generally outperforms reward-based approaches, with trained judges surpassing general-purpose LLMs. We further observe a small performance gap between LLMs acting as judges and as single agents, and identify a context-length-performance trade-off in verification. Overall, our results suggest that effective and robust process verification for MAS remains an open challenge, requiring further advances beyond current paradigms. Code is available at https://github.com/Wang-ML-Lab/MAS-ProVe.
comment: Preprint; work in progress
LatentMem: Customizing Latent Memory for Multi-Agent Systems
Large language model (LLM)-powered multi-agent systems (MAS) demonstrate remarkable collective intelligence, wherein multi-agent memory serves as a pivotal mechanism for continual adaptation. However, existing multi-agent memory designs remain constrained by two fundamental bottlenecks: (i) memory homogenization arising from the absence of role-aware customization, and (ii) information overload induced by excessively fine-grained memory entries. To address these limitations, we propose LatentMem, a learnable multi-agent memory framework designed to customize agent-specific memories in a token-efficient manner. Specifically, LatentMem comprises an experience bank that stores raw interaction trajectories in a lightweight form, and a memory composer that synthesizes compact latent memories conditioned on retrieved experience and agent-specific contexts. Further, we introduce Latent Memory Policy Optimization (LMPO), which propagates task-level optimization signals through latent memories to the composer, encouraging it to produce compact and high-utility representations. Extensive experiments across diverse benchmarks and mainstream MAS frameworks show that LatentMem achieves a performance gain of up to $19.36$% over vanilla settings and consistently outperforms existing memory architectures, without requiring any modifications to the underlying frameworks.
Visual Reasoning over Time Series via Multi-Agent System
Time series analysis underpins many real-world applications, yet existing time-series-specific methods and pretrained large-model-based approaches remain limited in integrating intuitive visual reasoning and generalizing across tasks with adaptive tool usage. To address these limitations, we propose MAS4TS, a tool-driven multi-agent system for general time series tasks, built upon an Analyzer-Reasoner-Executor paradigm that integrates agent communication, visual reasoning, and latent reconstruction within a unified framework. MAS4TS first performs visual reasoning over time series plots with structured priors using a Vision-Language Model to extract temporal structures, and subsequently reconstructs predictive trajectories in latent space. Three specialized agents coordinate via shared memory and gated communication, while a router selects task-specific tool chains for execution. Extensive experiments on multiple benchmarks demonstrate that MAS4TS achieves state-of-the-art performance across a wide range of time series tasks, while exhibiting strong generalization and efficient inference.
FDA Flocking: Future Direction-Aware Flocking via Velocity Prediction
Understanding self-organization in natural collectives such as bird flocks inspires swarm robotics, yet most flocking models remain reactive, overlooking anticipatory cues that enhance coordination. Motivated by avian postural and wingbeat signals, as well as multirotor attitude tilts that precede directional changes, this work introduces a principled, bio-inspired anticipatory augmentation of reactive flocking termed Future Direction-Aware (FDA) flocking. In the proposed framework, agents blend reactive alignment with a predictive term based on short-term estimates of neighbors' future velocities, regulated by a tunable blending parameter that interpolates between reactive and anticipatory behaviors. This predictive structure enhances velocity consensus and cohesion-separation balance while mitigating the adverse effects of sensing and communication delays and measurement noise that destabilize reactive baselines. Simulation results demonstrate that FDA achieves faster and higher alignment, enhanced translational displacement of the flock, and improved robustness to delays and noise compared to a purely reactive model. Future work will investigate adaptive blending strategies, weighted prediction schemes, and experimental validation on multirotor drone swarms.
AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent
While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework to distill multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios: reasoning-enhanced fine-tuning; trajectory-based augmentation; and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk.
Enhancing Mathematical Problem Solving in LLMs through Execution-Driven Reasoning Augmentation ACL
Mathematical problem solving is a fundamental benchmark for assessing the reasoning capabilities of artificial intelligence and a gateway to applications in education, science, and engineering where reliable symbolic reasoning is essential. Although recent advances in multi-agent LLM-based systems have enhanced their mathematical reasoning capabilities, they still lack a reliably revisable representation of the reasoning process. Existing agents either operate in rigid sequential pipelines that cannot correct earlier steps or rely on heuristic self-evaluation that can fail to identify and fix errors. In addition, programmatic context can distract language models and degrade accuracy. To address these gaps, we introduce Iteratively Improved Program Construction (IIPC), a reasoning method that iteratively refines programmatic reasoning chains and combines execution feedback with the native Chain-of-thought abilities of the base LLM to maintain high-level contextual focus. IIPC surpasses competing approaches in the majority of reasoning benchmarks on multiple base LLMs. All code and implementations are released as open source.
comment: 9 pages, 7 figures, submitted to ACL ARR 2026
Multi-Agent Pathfinding Under Team-Connected Communication Constraint via Adaptive Path Expansion and Dynamic Leading
This paper proposes a novel planning framework to handle a multi-agent pathfinding problem under team-connected communication constraint, where all agents must have a connected communication channel to the rest of the team during their entire movements. Standard multi-agent path finding approaches (e.g., priority-based search) have potential in this domain but fail when neighboring configurations at start and goal differ. Their single-expansion approach -- computing each agent's path from the start to the goal in just a single expansion -- cannot reliably handle planning under communication constraints for agents as their neighbors change during navigating. Similarly, leader-follower approaches (e.g., platooning) are effective at maintaining team communication, but fixing the leader at the outset of planning can cause planning to become stuck in dense-clutter environments, limiting their practical utility. To overcome this limitation, we propose a novel two-level multi-agent pathfinding framework that integrates two techniques: adaptive path expansion to expand agent paths to their goals in multiple stages; and dynamic leading technique that enables the reselection of the leading agent during each agent path expansion whenever progress cannot be made. Simulation experiments show the efficiency of our planners, which can handle up to 25 agents across five environment types under a limited communication range constraint and up to 11-12 agents on three environment types under line-of-sight communication constraint, exceeding 90% success-rate where baselines routinely fail.
A Research Roadmap for Augmenting Software Engineering Processes and Software Products with Generative AI
Generative AI (GenAI) is rapidly transforming software engineering (SE) practices, influencing how SE processes are executed, as well as how software systems are developed, operated, and evolved. This paper applies design science research to build a roadmap for GenAI-augmented SE. The process consists of three cycles that incrementally integrate multiple sources of evidence, including collaborative discussions from the FSE 2025 "Software Engineering 2030" workshop, rapid literature reviews, and external feedback sessions involving peers. McLuhan's tetrads were used as a conceptual instrument to systematically capture the transforming effects of GenAI on SE processes and software products.The resulting roadmap identifies four fundamental forms of GenAI augmentation in SE and systematically characterizes their related research challenges and opportunities. These insights are then consolidated into a set of future research directions. By grounding the roadmap in a rigorous multi-cycle process and cross-validating it among independent author teams and peers, the study provides a transparent and reproducible foundation for analyzing how GenAI affects SE processes, methods and tools, and for framing future research within this rapidly evolving area. Based on these findings, the article finally makes ten predictions for SE in the year 2030.
The incompatibility of the Condorcet winner and loser criteria with positive involvement and resolvability
We prove that there is no preferential voting method satisfying the Condorcet winner and loser criteria, positive involvement (if a candidate $x$ wins in an initial preference profile, then adding a voter who ranks $x$ uniquely first cannot cause $x$ to lose), and $n$-voter resolvability (if $x$ initially ties for winning, then $x$ can be made the unique winner by adding some set of up to $n$ voters). This impossibility theorem holds for any positive integer $n$. It also holds if either the Condorcet loser criterion is replaced by independence of clones or positive involvement is replaced by negative involvement.
comment: 6 pages, 2 figures. Added theorem for independence of clones
Causal Graph Dynamics and Kan Extensions
On the one side, the formalism of Global Transformations comes with the claim of capturing any transformation of space that is local, synchronous and deterministic. The claim has been proven for different classes of models such as mesh refinements from computer graphics, Lindenmayer systems from morphogenesis modeling and cellular automata from biological, physical and parallel computation modeling. The Global Transformation formalism achieves this by using category theory for its genericity, and more precisely the notion of Kan extension to determine the global behaviors based on the local ones. On the other side, Causal Graph Dynamics describe the transformation of port graphs in a synchronous and deterministic way and has not yet being tackled. In this paper, we show the precise sense in which the claim of Global Transformations holds for them as well. This is done by showing different ways in which they can be expressed as Kan extensions, each of them highlighting different features of Causal Graph Dynamics. Along the way, this work uncovers the interesting class of Monotonic Causal Graph Dynamics and their universality among General Causal Graph Dynamics.
Multi-Agent Teams Hold Experts Back
Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 37.6%. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
comment: Preprint
LLMs as Orchestrators: Constraint-Compliant Multi-Agent Optimization for Recommendation Systems
Recommendation systems must optimize multiple objectives while satisfying hard business constraints such as fairness and coverage. For example, an e-commerce platform may require every recommendation list to include items from multiple sellers and at least one newly listed product; violating such constraints--even once--is unacceptable in production. Prior work on multi-objective recommendation and recent LLM-based recommender agents largely treat constraints as soft penalties or focus on item scoring and interaction, leading to frequent violations in real-world deployments. How to leverage LLMs for coordinating constrained optimization in recommendation systems remains underexplored. We propose DualAgent-Rec, an LLM-coordinated dual-agent framework for constrained multi-objective e-commerce recommendation. The framework separates optimization into an Exploitation Agent that prioritizes accuracy under hard constraints and an Exploration Agent that promotes diversity through unconstrained Pareto search. An LLM-based coordinator adaptively allocates resources between agents based on optimization progress and constraint satisfaction, while an adaptive epsilon-relaxation mechanism guarantees feasibility of final solutions. Experiments on the Amazon Reviews 2023 dataset demonstrate that DualAgent-Rec achieves 100% constraint satisfaction and improves Pareto hypervolume by 4-6% over strong baselines, while maintaining competitive accuracy-diversity trade-offs. These results indicate that LLMs can act as effective orchestration agents for deployable and constraint-compliant recommendation systems.
comment: 8 pages, 5 figures
ENGRAM: Effective, Lightweight Memory Orchestration for Conversational Agents
Large language models (LLMs) deployed in user-facing applications require long-horizon consistency: the ability to remember prior interactions, respect user preferences, and ground reasoning in past events. However, contemporary memory systems often adopt complex architectures such as knowledge graphs, multi-stage retrieval pipelines, and OS-style schedulers, which introduce engineering complexity and reproducibility challenges. We present ENGRAM, a lightweight memory system that organizes conversation into three canonical memory types (episodic, semantic, and procedural) through a single router and retriever. Each user turn is converted into typed memory records with normalized schemas and embeddings and stored in a database. At query time, the system retrieves top-k dense neighbors for each type, merges results with simple set operations, and provides the most relevant evidence as context to the model. ENGRAM attains state-of-the-art results on LoCoMo, a multi-session conversational QA benchmark for long-horizon memory, and exceeds the full-context baseline by 15 points on LongMemEval while using only about 1% of the tokens. These results show that careful memory typing and straightforward dense retrieval can enable effective long-term memory management in language models without requiring complex architectures.
AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents
The rise of Large Language Models (LLMs) as coding agents promises to accelerate software development, but their impact on generated code reproducibility remains largely unexplored. This paper presents an empirical study investigating whether LLM-generated code can be executed successfully in a clean environment with only OS packages and using only the dependencies that the model specifies. We evaluate three state-of-the-art LLM coding agents (Claude Code, OpenAI Codex, and Gemini) across 300 projects generated from 100 standardized prompts in Python, JavaScript, and Java. We introduce a three-layer dependency framework (distinguishing between claimed, working, and runtime dependencies) to quantify execution reproducibility. Our results show that only 68.3% of projects execute out-of-the-box, with substantial variation across languages (Python 89.2%, Java 44.0%). We also find a 13.5 times average expansion from declared to actual runtime dependencies, revealing significant hidden dependencies.
Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models NeurIPS
Large Language Models (LLMs) demonstrate persuasive capabilities that rival human-level persuasion. While these capabilities can be used for social good, they also present risks of potential misuse. Beyond the concern of how LLMs persuade others, their own susceptibility to persuasion poses a critical alignment challenge, raising questions about robustness, safety, and adherence to ethical principles. To study these dynamics, we introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasiveness and susceptibility to persuasion in multi-agent interactions. Our framework offers a scalable alternative to the costly and time-intensive human annotation process typically used to study persuasion in LLMs. PMIYC automatically conducts multi-turn conversations between Persuader and Persuadee agents, measuring both the effectiveness of and susceptibility to persuasion. Our comprehensive evaluation spans a diverse set of LLMs and persuasion settings (e.g., subjective and misinformation scenarios). We validate the efficacy of our framework through human evaluations and demonstrate alignment with human assessments from prior studies. Through PMIYC, we find that Llama-3.3-70B and GPT-4o exhibit similar persuasive effectiveness, outperforming Claude 3 Haiku by 30%. However, GPT-4o demonstrates over 50% greater resistance to persuasion for misinformation compared to Llama-3.3-70B. These findings provide empirical insights into the persuasive dynamics of LLMs and contribute to the development of safer AI systems.
comment: Updated to match the NeurIPS MTI-LLM Workshop format. Content remains consistent with the original version, with structural refinements, expanded explanations, and an extended appendix including additional results
Systems and Control (EESS)
Mitigating Timing-Based Attacks in Real-Time Cyber-Physical Systems
Real-time cyber-physical systems depend on deterministic task execution to guarantee safety and correctness. Unfortunately, this determinism can unintentionally expose timing information that enables adversaries to infer task execution patterns and carry out timing-based attacks targeting safety-critical control tasks. While prior defenses aim to obscure schedules through randomization or isolation, they typically neglect the implications of such modifications on closed-loop control behavior and real-time feasibility. This work studies the problem of securing real-time control workloads against timing inference attacks while explicitly accounting for both schedulability constraints and control performance requirements. We present a scheduling-based mitigation approach that introduces bounded timing perturbations to control task executions in a structured manner, reducing adversarial opportunities without violating real-time guarantees. The framework jointly considers worst-case execution behavior and the impact of execution delays on control performance, enabling the system to operate within predefined safety and performance limits. Through experimental evaluation on representative task sets and control scenarios, the proposed approach demonstrates that exposure to timing-based attacks can be significantly reduced while preserving predictable execution and acceptable control quality.
comment: 12 pages, 10 figures
Input-to-State Safe Backstepping: Robust Safety-Critical Control with Unmatched Uncertainties
Guaranteeing safety in the presence of unmatched disturbances -- uncertainties that cannot be directly canceled by the control input -- remains a key challenge in nonlinear control. This paper presents a constructive approach to safety-critical control of nonlinear systems with unmatched disturbances. We first present a generalization of the input-to-state safety (ISSf) framework for systems with these uncertainties using the recently developed notion of an Optimal Decay CBF, which provides more flexibility for satisfying the associated Lyapunov-like conditions for safety. From there, we outline a procedure for constructing ISSf-CBFs for two relevant classes of systems with unmatched uncertainties: i) strict-feedback systems; ii) dual-relative-degree systems, which are similar to differentially flat systems. Our theoretical results are illustrated via numerical simulations of an inverted pendulum and planar quadrotor.
comment: To appear at the 2026 American Control Conference
A Comparison of Set-Based Observers for Nonlinear Systems
Set-based state estimation computes sets of states consistent with a system model given bounded sets of disturbances and noise. Bounding the set of states is crucial for safety-critical applications so that one can ensure that all specifications are met. While numerous approaches have been proposed for nonlinear discrete-time systems, a unified evaluation under comparable conditions is lacking. This paper reviews and implements a representative selection of set-based observers within the CORA framework. To provide an objective comparison, the methods are evaluated on common benchmarks, and we examine computational effort, scalability, and the conservatism of the resulting state bounds. This study highlights characteristic trade-offs between observer categories and set representations, as well as practical considerations arising in their implementation. All implementations are made publicly available to support reproducibility and future development. This paper thereby offers the first broad, tool-supported comparison of guaranteed state estimators for nonlinear discrete-time systems.
comment: 13 pages
Real-world energy data of 200 feeders from low-voltage grids with metadata in Germany over two years
The last mile of the distribution grid is crucial for a successful energy transition, as more low-carbon technology like photovoltaic systems, heat pumps, and electric vehicle chargers connect to the low-voltage grid. Despite considerable challenges in operation and planning, researchers often lack access to suitable low-voltage grid data. To address this, we present the FeederBW dataset with data recorded by the German distribution system operator Netze BW. It offers real-world energy data from 200 low-voltage feeders over two years (2023-2025) with weather information and detailed metadata, including changes in low-carbon technology installations. The dataset includes feeder-specific details such as the number of housing units, installed power of low-carbon technology, and aggregated industrial energy data. Furthermore, high photovoltaic feed-in and one-minute temporal resolution makes the dataset unique. FeederBW supports various applications, including machine learning for load forecasting, conducting non-intrusive load monitoring, generating synthetic data, and analyzing the interplay between weather, feeder measurements, and metadata. The dataset reveals insightful patterns and clearly reflects the growing impact of low-carbon technology on low-voltage grids.
comment: 20 pages, 6 Figures, 6 Tables. Data is available on Zenodo: https://zenodo.org/records/17831177
A necessary and sufficient condition for discrete-time consensus on star boundaries
It is intuitive and well known, that if agents in a multi-agent system iteratively update their states in the Euclidean space as convex combinations of neighbors' states, all states eventually converge to the same value (consensus), provided the interaction graph is sufficiently connected. However, this seems to be also true in practice if the convex combinations of states are mapped or radially projected onto any unit $l_p$-sphere or even boundaries of star-convex sets, herein referred to as star boundaries. In this paper, we present insight into this matter by providing a necessary and sufficient condition for asymptotic consensus of the normalized states (directions) for strongly connected directed graphs, which is equivalent to asymptotic consensus of states when the star boundaries are the same for all agents. Furthermore, we show that when asymptotic consensus occurs, the states converge linearly and the point of convergence is continuous in the initial states. Assuming a directed strongly connected graph provides a more general setting than that considered, for example, in gradient-based consensus protocols, where symmetric graphs are assumed. Illustrative examples and a vast number of numerical simulations showcase the theoretical results.
comment: 14 pages, 8 figures
Cholesky factorisation, and intrinsically sparse linear quadratic regulation
We classify a family of matrices of shift operators that can be factorised in a computationally tractable manner with the Cholesky algorithm. Such matrices arise in the linear quadratic regulator problem, and related areas. We use the factorisation to uncover intrinsic sparsity properties in the control laws for transportation problems with an underlying tree structure. This reveals that the optimal control can be applied in a distributed manner that is obscured by standard solution methods.
comment: 15 pages, 7 figures, under review
When control meets large language models: From words to dynamics
While large language models (LLMs) are transforming engineering and technology through enhanced control capabilities and decision support, they are simultaneously evolving into complex dynamical systems whose behavior must be regulated. This duality highlights a reciprocal connection in which prompts support control system design while control theory helps shape prompts to achieve specific goals efficiently. In this study, we frame this emerging interconnection of LLM and control as a bidirectional continuum, from prompt design to system dynamics. First, we investigate how LLMs can advance the field of control in two distinct capacities: directly, by assisting in the design and synthesis of controllers, and indirectly, by augmenting research workflows. Second, we examine how control concepts help LLMs steer their trajectories away from undesired meanings, improving reachability and alignment via input optimization, parameter editing, and activation-level interventions. Third, we look into deeper integrations by treating LLMs as dynamic systems within a state-space framework, where their internal representations are closely linked to external control loops. Finally, we identify key challenges and outline future research directions to understand LLM behavior and develop interpretable and controllable LLMs that are as trustworthy and robust as their electromechanical counterparts, thereby ensuring they continue to support and safeguard society.
Dynamics of Implicit Time-Invariant Max-Min-Plus-Scaling Discrete-Event Systems
Max-min-plus-scaling (MMPS) systems generalize max-plus, min-plus and max-min-plus models with more flexibility in modelling discrete-event dynamics. Especially, implicit MMPS models capture a wide range of real world discrete-event applications. This article analyzes the dynamics of an autonomous, time-invariant implicit MMPS system in a discrete-event framework. First, we provide sufficient conditions under which an implicit MMPS system admits at least one solution to its state-space representation. Then, we analyze its global behavior by determining the key parameters; the growth rates and fixed points. For a solvable MMPS system, we assess the local behavior of the system around its set of fixed points via a normalization procedure. Further, we present the notion of stability for the normalized system. A case study of the urban railway network substantiates the theoretical results.
comment: 12 pages, Under review at Automatica
Power Reserve Procurement Considering Dependent Random Variables with PCE
This paper presents an approach for the modelling of dependent random variables using generalised polynomial chaos. This allows to write chance-constrained optimization problems with respect to a joint distribution modelling dependencies between different stochastic inputs. Arbitrary dependencies are modelled by using Gaussian copulas to construct the joint distribution. The paper exploits the problem structure and develops suitable transformations to ensure tractability. The proposed method is applied to a probabilistic power reserve procurement problem. The effectiveness of the method to capture dependencies is shown by comparing the approach with a standard approach considering independent random variables.
Impact of Physics-Informed Features on Neural Network Complexity for Li-ion Battery Voltage Prediction in Electric Vertical Takeoff and Landing Aircrafts
The electrification of vertical takeoff and landing aircraft demands high-fidelity battery management systems capable of predicting voltage response under aggressive power dynamics. While data-driven models offer high accuracy, they often require complex architectures and extensive training data. Conversely, equivalent circuit models (ECMs), such as the second-order model, offer physical interpretability but struggle with high C-rate non-linearities. This paper investigates the impact of integrating physics-based information into data-driven surrogate models. Specifically, we evaluate whether physics-informed features allow for the simplification of neural network architectures without compromising accuracy. Using the open-source electric vertical takeoff and landing (eVTOL) battery dataset, we compare pure data-driven models against physics-informed data models. Results demonstrate that physics-informed models achieve comparable accuracy to complex pure data-driven models while using up to 75% fewer trainable parameters, significantly reducing computational overhead for potential on-board deployment.
Joint Network-and-Server Congestion in Multi-Source Traffic Allocation: A Convex Formulation and Price-Based Decentralization
This paper studies an important rate allocation problem that arises in many networked and distributed systems: steady-state traffic rate allocation from multiple sources to multiple service nodes when both (i) the access-path delay on each source-node route is rate-dependent (capacity-constrained) and convex, and (ii) each service node (also capacity-constrained) experiences a load-dependent queueing delay driven by aggregate load from all sources. We show that the resulting flow-weighted end-to-end delay minimization is a convex program, yielding a global system-optimal solution characterized by KKT conditions that equalize total marginal costs (a path marginal access term plus a node congestion price) across all utilized routes. This condition admits a Wardrop-type interpretation: for each source, all utilized options equalize total marginal cost, while any option with strictly larger total marginal cost receives no flow. Building on this structure, we develop a lightweight distributed pricing-based algorithm in which each service node locally computes and broadcasts a scalar congestion price from its observed aggregate load, while each source updates its traffic split by solving a small separable convex allocation problem under the advertised prices. Numerical illustrations demonstrate convergence of the distributed iteration to the centralized optimum and highlight the trade-offs induced by jointly modeling access and service congestion.
comment: 10pages, 7 figures, submitted a version conference
Geometry-Preserving Neural Architectures on Manifolds with Boundary
Preserving geometric structure is important in learning. We propose a unified class of geometry-aware architectures that interleave geometric updates between layers, where both projection layers and intrinsic exponential map updates arise as discretizations of projected dynamical systems on manifolds (with or without boundary). Within this framework, we establish universal approximation results for constrained neural ODEs. We also analyze architectures that enforce geometry only at the output, proving a separate universal approximation property that enables direct comparison to interleaved designs. When the constraint set is unknown, we learn projections via small-time heat-kernel limits, showing diffusion/flow-matching can be used as data-based projections. Experiments on dynamics over S^2 and SO(3), and diffusion on S^{d-1}-valued features demonstrate exact feasibility for analytic updates and strong performance for learned projections
ProOPF: Benchmarking and Improving LLMs for Professional-Grade Power Systems Optimization Modeling
Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) provide a promising avenue for automating this process by translating natural-language (NL) operational requirements into executable optimization models via semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization, offering limited, rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce \textbf{ProOPF-D} and \textbf{ProOPF-B}, a dataset and benchmark for professional-grade OPF modeling: ProOPF-D contains 12K instances pairing NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations; ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.
Fast Diffusion with Physics-Correction for ACOPF
Generating large-scale, physically consistent AC Optimal Power Flow (ACOPF) datasets is essential for modern data-driven power system applications. The central challenge lies in balancing solution accuracy with computational efficiency. Recent diffusion-based generative models produce high-quality samples; however, their slow sampling procedures limit practical scalability. In this work, we argue that exact physical feasibility is ultimately enforced by power flow solvers or projection steps, and therefore the generative model only needs to produce good initializations rather than perfectly feasible solutions. Based on this insight, we propose a fast diffusion framework using Denoising Diffusion Implicit Models (DDIM) combined with physics-guided corrections during sampling. The proposed method replaces slow stochastic refinement with a small number of deterministic steps and explicit constraint guidance. Experiments on IEEE 6-, 24-, and 118-bus systems show that our approach achieves up to 20 times faster sampling than standard diffusion models while maintaining comparable statistical accuracy and physical consistency. This makes the method well suited for scalable OPF dataset generation and practical power system learning tasks. We release the implementation code at https://github.com/PSquare-Lab/DDIM_OPF.
Data-driven stabilization of continuous-time systems with noisy input-output data
We study data-driven stabilization of continuous-time systems in autoregressive form when only noisy input-output data are available. First, we provide an operator-based characterization of the set of systems consistent with the data. Next, combining this characterization with behavioral theory, we derive a necessary and sufficient condition for the noisy data to be informative for quadratic stabilization. This condition is formulated as linear matrix inequalities, whose solution yields a stabilizing controller. Finally, we characterize data informativity for system identification in the noise-free setting.
comment: 18 pages
Human-Centric Traffic Signal Control for Equity: A Multi-Agent Action Branching Deep Reinforcement Learning Approach
Coordinating traffic signals along multimodal corridors is challenging because many multi-agent deep reinforcement learning (DRL) approaches remain vehicle-centric and struggle with high-dimensional discrete action spaces. We propose MA2B-DDQN, a human-centric multi-agent action-branching double Deep Q-Network (DQN) framework that explicitly optimizes traveler-level equity. Our key contribution is an action-branching discrete control formulation that decomposes corridor control into (i) local, per-intersection actions that allocate green time between the next two phases and (ii) a single global action that selects the total duration of those phases. This decomposition enables scalable coordination under discrete control while reducing the effective complexity of joint decision-making. We also design a human-centric reward that penalizes the number of delayed individuals in the corridor, accounting for pedestrians, vehicle occupants, and transit passengers. Extensive evaluations across seven realistic traffic scenarios in Melbourne, Australia, demonstrate that our approach significantly reduces the number of impacted travelers, outperforming existing DRL and baseline methods. Experiments confirm the robustness of our model, showing minimal variance across diverse settings. This framework not only advocates for a fairer traffic signal system but also provides a scalable solution adaptable to varied urban traffic conditions.
Hybrid-Field Channel Estimation for XL-MIMO Systems: Dictionary-based Sparse Signal Recovery
Extremely large-scale multiple-input multiple-output (XL-MIMO) systems are a key technology for future wireless networks, but the large array aperture naturally creates a hybrid-field (HF) propagation regime in which far-field (FF) planar-wave and near-field (NF) spherical-wave components coexist. This work considers the problem of HF channel estimation (CE) and introduces a unified model that superimposes FF and NF contributions according to the Rayleigh distance boundary. By exploiting the inherent sparsity of the channel in the angular and polar domains, we formulate the estimation task as a sparse recovery problem. Unlike conventional approaches that require prior knowledge of the channel sparsity level, the proposed method operates without requiring knowledge of the sparsity level L and the NF/FF ratio γ, which are used only for synthetic channel generation in simulations. The channel estimator determines the number of paths adaptively through a residual-based stopping rule. A combined FF/NF dictionary is employed to initialize the support, and each selected atom undergoes continuous parameter refinement to mitigate grid mismatch. Simulation results demonstrate that the proposed estimator achieves accurate HF channel reconstruction under both line-of-sight (LoS) and non-line-of-sight (NLoS) conditions, offering a practical and computationally efficient solution for XL-MIMO systems. Extremely Large-Scale MIMO (XL-MIMO); Channel State Information (CSI); Channel estimation (CE); hybrid-field (HF) wave propagation; near-field (NF) spherical wave model; far-field (FF) planar wave model
comment: 5 pages, 2 figures, letter paper
Control and State Estimation of Vehicle-Mounted Aerial Systems in GPS-Denied, Non-Inertial Environments
We present a robust control and estimation framework for quadrotors operating in Global Navigation Satellite System(GNSS)-denied, non-inertial environments where inertial sensors such as Inertial Measurement Units (IMUs) become unreliable due to platform-induced accelerations. In such settings, conventional estimators fail to distinguish whether the measured accelerations arise from the quadrotor itself or from the non-inertial platform, leading to drift and control degradation. Unlike conventional approaches that depend heavily on IMU and GNSS, our method relies exclusively on external position measurements combined with a Extended Kalman Filter with Unknown Inputs (EKF-UI) to account for platform motion. The estimator is paired with a cascaded PID controller for full 3D tracking. To isolate estimator performance from localization errors, all tests are conducted using high-precision motion capture systems. Experimental results in a moving-cart testbed validate our approach under both translational in X-axis and Y-axis dissonance. Compared to standard EKF, the proposed method significantly improves stability and trajectory tracking without requiring inertial feedback, enabling practical deployment on moving platforms such as trucks or elevators.
comment: 10 pages 8 figures
Modular Safety Guardrails Are Necessary for Foundation-Model-Enabled Robots in the Real World
The integration of foundation models (FMs) into robotics has accelerated real-world deployment, while introducing new safety challenges arising from open-ended semantic reasoning and embodied physical action. These challenges require safety notions beyond physical constraint satisfaction. In this paper, we characterize FM-enabled robot safety along three dimensions: action safety (physical feasibility and constraint compliance), decision safety (semantic and contextual appropriateness), and human-centered safety (conformance to human intent, norms, and expectations). We argue that existing approaches, including static verification, monolithic controllers, and end-to-end learned policies, are insufficient in settings where tasks, environments, and human expectations are open-ended, long-tailed, and subject to adaptation over time. To address this gap, we propose modular safety guardrails, consisting of monitoring (evaluation) and intervention layers, as an architectural foundation for comprehensive safety across the autonomy stack. Beyond modularity, we highlight possible cross-layer co-design opportunities through representation alignment and conservatism allocation to enable faster, less conservative, and more effective safety enforcement. We call on the community to explore richer guardrail modules and principled co-design strategies to advance safe real-world physical AI deployment.
Towards X-embodiment safety: A control theory perspective on transferring safety certificates across dynamical systems
Control barrier functions (CBFs) provide a powerful tool for enforcing safety constraints in control systems, but their direct application to complex, high-dimensional dynamics is often challenging. In many settings, safety certificates are more naturally designed for simplified or alternative system models that do not exactly match the dynamics of interest. This paper addresses the problem of transferring safety guarantees between dynamical systems with mismatched dynamics. We propose a transferred control barrier function (tCBF) framework that enables safety constraints defined on one system to be systematically enforced on another system using a simulation function and an explicit margin term. The resulting transferred barrier accounts for model mismatch and induces a safety condition that can be enforced on the target system via a quadratic-program-based safety filter. The proposed approach is general and does not require the two systems to share the same state dimension or dynamics. We demonstrate the effectiveness of the framework on a quadrotor navigation task with the transferred barrier ensuring collision avoidance for the target system, while remaining minimally invasive to a nominal controller. These results highlight the potential of transferred control barrier functions as a general mechanism for enforcing safety across heterogeneous dynamical systems.
eCP: Informative uncertainty quantification via Equivariantized Conformal Prediction with pre-trained models
We study the effect of group symmetrization of pre-trained models on conformal prediction (CP), a post-hoc, distribution-free, finite-sample method of uncertainty quantification that offers formal coverage guarantees under the assumption of data exchangeability. Unfortunately, CP uncertainty regions can grow significantly in long horizon missions, rendering the statistical guarantees uninformative. To that end, we propose infusing CP with geometric information via group-averaging of the pretrained predictor to distribute the non-conformity mass across the orbits. Each sample now is treated as a representative of an orbit, thus uncertainty can be mitigated by other samples entangled to it via the orbit inducing elements of the symmetry group. Our approach provably yields contracted non-conformity scores in increasing convex order, implying improved exponential-tail bounds and sharper conformal prediction sets in expectation, especially at high confidence levels. We then propose an experimental design to test these theoretical claims in pedestrian trajectory prediction.
Safety-Critical Reinforcement Learning with Viability-Based Action Shielding for Hypersonic Longitudinal Flight
This paper presents a safety-critical reinforcement learning framework for nonlinear dynamical systems with continuous state and input spaces operating under explicit physical constraints. Hard safety constraints are enforced independently of the reward through action shielding and reachability-based admissible action sets, ensuring that unsafe behaviors are never intentionally selected during learning or execution. To capture nominal operation and recovery behavior within a single control architecture, the state space is partitioned into safe and unsafe regions based on membership in a safety box, and a mode-dependent reward is used to promote accurate tracking inside the safe region and recovery toward it when operating outside. To enable online tabular learning on continuous dynamics, a finite-state abstraction is constructed via state aggregation, and action selection and value updates are consistently restricted to admissible actions. The framework is demonstrated on a longitudinal point-mass hypersonic vehicle model with aerodynamic and propulsion couplings, using angle of attack and throttle as control inputs.
C-IDS: Solving Contextual POMDP via Information-Directed Objective
We study the policy synthesis problem in contextual partially observable Markov decision processes (CPOMDPs), where the environment is governed by an unknown latent context that induces distinct POMDP dynamics. Our goal is to design a policy that simultaneously maximizes cumulative return and actively reduces uncertainty about the underlying context. We introduce an information-directed objective that augments reward maximization with mutual information between the latent context and the agent's observations. We develop the C-IDS algorithm to synthesize policies that maximize the information-directed objective. We show that the objective can be interpreted as a Lagrangian relaxation of the linear information ratio and prove that the temperature parameter is an upper bound on the information ratio. Based on this characterization, we establish a sublinear Bayesian regret bound over K episodes. We evaluate our approach on a continuous Light-Dark environment and show that it consistently outperforms standard POMDP solvers that treat the unknown context as a latent state variable, achieving faster context identification and higher returns.
An End-to-End Approach for Microgrid Probabilistic Forecasting and Robust Operation via Decision-focused Learning
High penetration of renewable energy sources (RES) introduces significant uncertainty and intermittency into microgrid operations, posing challenges to economic and reliable scheduling. To address this, this paper proposes an end-to-end decision-focused framework that jointly optimizes probabilistic forecasting and robust operation for microgrids. A multilayer encoder-decoder (MED) probabilistic forecasting model is integrated with a two-stage robust optimization (TSRO) model involving direct load control (DLC) through a differentiable decision pathway, enabling gradient-based feedback from operational outcomes to improve forecasting performance. Unlike conventional sequential approaches, the proposed method aligns forecasting accuracy with operational objectives by directly minimizing decision regret via a surrogate smart predict-then-optimize (SPO) loss function. This integration ensures that probabilistic forecasts are optimized for downstream decisions, enhancing both economic efficiency and robustness. Case studies on modified IEEE 33-bus and 69-bus systems demonstrate that the proposed framework achieves superior forecasting accuracy and operational performance, reducing total and net operation costs by up to 18% compared with conventional forecasting and optimization combinations. The results verify the effectiveness and scalability of the end-to-end decision-focused approach for resilient and cost-efficient microgrid management under uncertainty.
comment: 10 pages
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control
For safety-critical applications, model-free reinforcement learning (RL) faces numerous challenges, particularly the difficulty of establishing verifiable stability guarantees while maintaining high exploration efficiency. To address these challenges, we present Multi-Step Actor-Critic Learning with Lyapunov Certificates (MSACL), a novel approach that seamlessly integrates exponential stability with maximum entropy reinforcement learning (MERL). In contrast to existing methods that rely on complex reward engineering and single-step constraints, MSACL utilizes intuitive rewards and multi-step data for actor-critic learning. Specifically, we first introduce Exponential Stability Labels (ESLs) to categorize samples and propose a $λ$-weighted aggregation mechanism to learn Lyapunov certificates. Leveraging these certificates, we then develop a stability-aware advantage function to guide policy optimization, thereby ensuring rapid Lyapunov descent and robust state convergence. We evaluate MSACL across six benchmarks, comprising four stabilization and two high-dimensional tracking tasks. Experimental results demonstrate its consistent superiority over both standard RL baselines and state-of-the-art Lyapunov-based RL algorithms. Beyond rapid convergence, MSACL exhibits significant robustness against environmental uncertainties and remarkable generalization to unseen reference signals. The source code and benchmarking environments are available at \href{https://github.com/YuanZhe-Xing/MSACL}{https://github.com/YuanZhe-Xing/MSACL}.
comment: This work has been submitted to the IEEE for possible publication
Online Fine-Tuning of Pretrained Controllers for Autonomous Driving via Real-Time Recurrent RL
Deploying pretrained policies in real-world applications presents substantial challenges that fundamentally limit the practical applicability of learning-based control systems. When autonomous systems encounter environmental changes in system dynamics, sensor drift, or task objectives, fixed policies rapidly degrade in performance. We show that employing Real-Time Recurrent Reinforcement Learning (RTRRL), a biologically plausible algorithm for online adaptation, can effectively fine-tune a pretrained policy to improve autonomous agents' performance on driving tasks. We further show that RTRRL synergizes with a recent biologically inspired recurrent network model, the Liquid-Resistance Liquid-Capacitance RNN. We demonstrate the effectiveness of this closed-loop approach in a simulated CarRacing environment and in a real-world line-following task with a RoboRacer car equipped with an event camera.
DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations
Modern data centers (DCs) hosting artificial intelligence (AI)-dedicated devices operate at high power densities with rapidly varying workloads, making minute-level adaptation essential for safe and energy-efficient operation. However, manually designing piecewise deep reinforcement learning (DRL) agents cannot keep pace with frequent dynamics shifts and service-level agreement (SLA) changes of an evolving DC. This specification-to-policy lag causes a lack of timely, effective control policies, which may lead to service outages. To bridge the gap, we present DCoPilot, a hybrid framework for generative control policies in dynamic DC operation. DCoPilot synergizes two distinct generative paradigms, i.e., a large language model (LLM) that performs symbolic generation of structured reward forms, and a hypernetwork that conducts parametric generation of policy weights. DCoPilot operates through three coordinated phases: (i) simulation scale-up, which stress-tests reward candidates across diverse simulation-ready (SimReady) scenes; (ii) meta policy distillation, where a hypernetwork is trained to output policy weights conditioned on SLA and scene embeddings; and (iii) online adaptation, enabling zero-shot policy generation in response to updated specifications. Evaluated across five control task families spanning diverse DC components, DCoPilot achieves near-zero constraint violations and outperforms all baselines across specification variations. Ablation studies validate the effectiveness of LLM-based unified reward generation in enabling stable hypernetwork convergence.
A tutorial overview of model predictive control for continuous crystallization: current possibilities and future perspectives
This paper presents a systematic approach to the advanced control of continuous crystallization processes using model predictive control. We provide a tutorial introduction to controlling complex particle size distributions by integrating population balance equations with detailed models of various continuous crystallizers. Since these high-fidelity models are often too complex for online optimization, we propose the use of data-driven surrogate models that enable efficient optimization-based control. Through two case studies, one with a low-complexity system allowing direct comparison with traditional methods and another involving a spatially distributed crystallizer, we demonstrate how our approach enables real-time model predictive control while maintaining accuracy. The presented methodology facilitates the use of complex models in a model-based control framework, allowing precise control of key particle size distribution characteristics, such as the median particle size $d_{50}$ and the width $d_{90} - d_{10}$. This addresses a critical challenge in pharmaceutical and fine chemical manufacturing, where product quality depends on tight control of particle characteristics.
LP-MPPI: Low-Pass Filtering for Efficient Model Predictive Path Integral Control ICRA 2026
Model Predictive Path Integral (MPPI) control is a widely used sampling-based approach for real-time control, valued for its flexibility in handling arbitrary dynamics and cost functions. However, it often suffers from high-frequency noise in the sampled control trajectories, which hinders the search for optimal controls and transfers to the applied controls, leading to actuator wear. In this work, we introduce Low-Pass Model Predictive Path Integral Control (LP-MPPI), which integrates low-pass filtering into the sampling process to eliminate detrimental high-frequency components and enhance the algorithm's efficiency. Unlike prior approaches, LP-MPPI provides direct and interpretable control over the frequency spectrum of sampled control trajectory perturbations, leading to more efficient sampling and smoother control. Through extensive evaluations in Gymnasium environments, simulated quadruped locomotion, and real-world F1TENTH autonomous racing, we demonstrate that LP-MPPI consistently outperforms state-of-the-art MPPI variants, achieving significant performance improvements while reducing control signal chattering.
comment: Accepted at International Conference on Robotics and Automation 2026 (ICRA 2026)
Incremental Collision Laws Based on the Bouc-Wen Model: Improved Collision Models and Further Results
In the article titled "The Bouc-Wen Model for Binary Direct Collinear Collisions of Convex Viscoplastic Bodies" and published in the Journal of Computational and Nonlinear Dynamics (Volume 20, Issue 6, June 2025), the authors studied mathematical models of binary direct collinear collisions of convex viscoplastic bodies that employed two incremental collision laws based on the Bouc-Wen differential model of hysteresis. It was shown that the models possess favorable analytical properties, and several model parameter identification studies were conducted, demonstrating that the models can accurately capture the nature of a variety of collision phenomena. In this article, the aforementioned models are augmented by modeling the effects of external forces as time-dependent inputs. Furthermore, the range of the parameters under which the models possess favorable analytical properties is extended to several corner cases that were not considered in the prior publication. Finally, the previously conducted model parameter identification studies are extended, and an additional model parameter identification study is provided in an attempt to validate the ability of the augmented models to represent the effects of external forces.
comment: 15 pages, 9 figures, see https://gitlab.com/user9716869/EBWCM ; (v2-v10) various minor amendments; (v5) replaced the parameter identification study of Quinn (2004) with Villegas et al (2021) due to incompatibility of the proposed collision models with the experimental setup in Quinn (2004); (v7) added ODEs for COM and an application example; arXiv admin note: text overlap with arXiv:2410.08147
Certainty-Equivalence Model Predictive Control: Stability, Performance, and Beyond
Handling model mismatch is a common challenge in model predictive control (MPC). While robust MPC is effective, its conservatism often makes it less desirable. Certainty-equivalence MPC (CE-MPC), which uses a nominal model, offers an appealing alternative due to its design simplicity and low computational costs. This paper investigates CE-MPC for uncertain nonlinear systems with multiplicative parametric uncertainty and input constraints that are inactive at the steady state. The primary contributions are two-fold. First, a novel perturbation analysis of the MPC value function is provided, without assuming the Lipschitz continuity of the stage cost, better tailoring the widely used quadratic cost and having broader applicability in value function approximation, learning-based MPC, and performance-driven MPC design. Second, the stability and performance analysis of CE-MPC are provided, quantifying the suboptimality of CE-MPC compared to the infinite-horizon optimal controller with perfect model knowledge. The results provide insights in how the prediction horizon and model mismatch jointly affect stability and the worst-case performance. Furthermore, the general results are specialized to linear quadratic control, and a competitive ratio bound is derived, serving as the first competitive-ratio bound for MPC of uncertain linear systems with input constraints and multiplicative uncertainty.
comment: To appear in IEEE Transactions on Automatic Control (July 2026)
Robust H-infinity control and worst-case search in constrained parametric space
Standard H-infinity/H2 robust control and analysis tools operate on uncertain parameters assumed to vary independently within prescribed bounds. This paper extends their capabilities in the presence of constraints coupling these parameters and restricting the parametric space. Focusing on the worst-case search, we demonstrate -- based on the theory of upper-$C^1$ functions -- the validity of standard, readily available smooth optimization to address this nonsmooth constrained optimization problem. Specifically, we prove that the sequential quadratic programming algorithm converges to Karush-Kuhn-Tucker points, and that such conditions are satisfied by any subgradient at a local minimum. This worst-case search then enables robust controller synthesis: identified worst-case configurations are iteratively added to an active set on which a non-smooth multi-models optimization of the controller is performed. From a practical point of view, we combine the local exploitation proposed above with a global exploration using Monte-Carlo sampling. The proposed approach is illustrated through the robust control of a mechanical system, of order 50 with 43 uncertain parameters. We show that the constrained optimization effectively complements Monte-Carlo sampling by enabling fast detection of rare worst-case configurations, and that the robust controller optimization converges with less than 10 active configurations.
comment: Preprint
An Overview and Comparison of Spectral Bundle Methods for Primal and Dual Semidefinite Programs
The spectral bundle method developed by Helmberg and Rendl is well-established for solving large-scale semidefinite programs (SDPs) in the dual form, especially when the SDPs admit $\textit{low-rank primal solutions}$. Under mild regularity conditions, a recent result by Ding and Grimmer has established fast linear convergence rates when the bundle method captures $\textit{the rank of primal solutions}$. In this paper, we present an overview and comparison of spectral bundle methods for solving both $\textit{primal}$ and $\textit{dual}$ SDPs. In particular, we introduce a new family of spectral bundle methods for solving SDPs in the $\textit{primal}$ form. The algorithm developments are parallel to those by Helmberg and Rendl, mirroring the elegant duality between primal and dual SDPs. The new family of spectral bundle methods also achieves linear convergence rates for primal feasibility, dual feasibility, and duality gap when the algorithm captures $\textit{the rank of the dual solutions}$. Therefore, the original spectral bundle method by Helmberg and Rendl is well-suited for SDPs with $\textit{low-rank primal solutions}$, while on the other hand, our new spectral bundle method works well for SDPs with $\textit{low-rank dual solutions}$. These theoretical findings are supported by a range of large-scale numerical experiments. Finally, we demonstrate that our new spectral bundle method achieves state-of-the-art efficiency and scalability for solving polynomial optimization compared to a set of baseline solvers $\textsf{SDPT3}$, $\textsf{MOSEK}$, $\textsf{CDCS}$, and $\textsf{SDPNAL+}$.
comment: 57 pages, 4 figures, and 4 tables
Polynomial Chaos-based Input Shaper Design under Time-Varying Uncertainty
The work presented here investigates the application of polynomial chaos expansion toward input shaper design in order to maintain robustness in dynamical systems subject to uncertainty. Furthermore, this work intends to specifically address time-varying uncertainty by employing intrusive polynomial chaos expansion. The methodology presented is validated through numerical simulation of intrusive polynomial chaos expansion formulation applied to spring mass system experiencing time-varying uncertainty in the spring stiffness. The system also evaluates non-robust and robust input shapers through the framework in order to identify designs that minimize residual energy. Results indicate that vibration mitigation is achieved at a similar accuracy, yet at higher efficiency compared to a Monte Carlo framework.
Error bounds, PL condition, and quadratic growth for weakly convex functions, and linear convergences of proximal point methods
Many practical optimization problems lack strong convexity. Fortunately, recent studies have revealed that first-order algorithms also enjoy linear convergences under various weaker regularity conditions. While the relationship among different conditions for convex and smooth functions is well-understood, it is not the case for the nonsmooth setting. In this paper, we go beyond convexity and smoothness, and clarify the connections among common regularity conditions in the class of weakly convex functions, including $\textit{strong convexity}$, $\textit{restricted secant inequality}$, $\textit{subdifferential error bound}$, $\textit{Polyak-Łojasiewicz inequality}$, and $\textit{quadratic growth}$. In addition, using these regularity conditions, we present a simple and modular proof for the linear convergence of the proximal point method (PPM) for convex and weakly convex optimization problems. The linear convergence also holds when the subproblems of PPM are solved inexactly with a proper control of inexactness.
comment: 34 pages, 3 figures, and 1 table
A Simultaneous ECG-PCG Acquisition System with Real-Time Burst-Adaptive Noise Cancellation
Cardiac auscultation is an essential clinical skill, requiring excellent hearing to distinguish subtle differences in timing and pitch of heart sounds. However, diagnosing solely from these sounds is often challenging due to interference from surrounding noise, and the information may be limited. Most of the existing solutions that adaptively cancel external noise are either non-real-time or computationally intensive, making them unsuitable for implementation in a portable system. This work proposes an end-to-end system with a real-time adaptive noise cancellation pipeline integrated into a device that simultaneously acquires electrocardiogram (ECG) and phonocardiogram (PCG) signals. We employ a burst adaptive normalized least mean square algorithm that adjusts its adaptation in response to high-energy, non-stationary hospital noise. The algorithm's performance was initially assessed using datasets with artificially induced noise. Subsequently, the complete end-to-end system was validated using real-world hospital recordings captured with the dual-modality device. For PCG and ECG signals recorded from the device in noisy hospital settings, the proposed system achieved signal-to-noise ratio improvements of 37.01 dB and 30.32 dB, respectively. Furthermore, complexity analysis confirms the pipeline's suitability for embedded implementation. These results demonstrate the system's effectiveness in enabling reliable and accessible cardiac screening in noisy hospital environments typical of resource-constrained settings.
comment: Paper submitted to the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2026)
Robotics
Flow Policy Gradients for Robot Control
Likelihood-based policy gradient methods are the dominant approach for training robot control policies from rewards. These methods rely on differentiable action likelihoods, which constrain policy outputs to simple distributions like Gaussians. In this work, we show how flow matching policy gradients -- a recent framework that bypasses likelihood computation -- can be made effective for training and fine-tuning more expressive policies in challenging robot control settings. We introduce an improved objective that enables success in legged locomotion, humanoid motion tracking, and manipulation tasks, as well as robust sim-to-real transfer on two humanoid robots. We then present ablations and analysis on training dynamics. Results show how policies can exploit the flow representation for exploration when training from scratch, as well as improved fine-tuning robustness over baselines.
comment: Project webpage: https://hongsukchoi.github.io/fpo-control
HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos
Enabling humanoid robots to perform agile and adaptive interactive tasks has long been a core challenge in robotics. Current approaches are bottlenecked by either the scarcity of realistic interaction data or the need for meticulous, task-specific reward engineering, which limits their scalability. To narrow this gap, we present HumanX, a full-stack framework that compiles human video into generalizable, real-world interaction skills for humanoids, without task-specific rewards. HumanX integrates two co-designed components: XGen, a data generation pipeline that synthesizes diverse and physically plausible robot interaction data from video while supporting scalable data augmentation; and XMimic, a unified imitation learning framework that learns generalizable interaction skills. Evaluated across five distinct domains--basketball, football, badminton, cargo pickup, and reactive fighting--HumanX successfully acquires 10 different skills and transfers them zero-shot to a physical Unitree G1 humanoid. The learned capabilities include complex maneuvers such as pump-fake turnaround fadeaway jumpshots without any external perception, as well as interactive tasks like sustained human-robot passing sequences over 10 consecutive cycles--learned from a single video demonstration. Our experiments show that HumanX achieves over 8 times higher generalization success than prior methods, demonstrating a scalable and task-agnostic pathway for learning versatile, real-world robot interactive skills.
TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments
Robots in dynamic, human-centric environments must follow language instructions while maintaining real-time reactive control. Vision-language-action (VLA) models offer a promising framework, but they assume temporally aligned reasoning and control, despite semantic inference being inherently delayed relative to real-time action. We introduce Think-in-Control (TIC)-VLA, a latency-aware framework that explicitly models delayed semantic reasoning during action generation. TIC-VLA defines a delayed semantic-control interface that conditions action generation on delayed vision-language semantic states and explicit latency metadata, in addition to current observations, enabling policies to compensate for asynchronous reasoning. We further propose a latency-consistent training pipeline that injects reasoning inference delays during imitation learning and online reinforcement learning, aligning training with asynchronous deployment. To support realistic evaluation, we present DynaNav, a physics-accurate, photo-realistic simulation suite for language-guided navigation in dynamic environments. Extensive experiments in simulation and on a real robot show that TIC-VLA consistently outperforms prior VLA models while maintaining robust real-time control under multi-second reasoning latency. Project website: https://ucla-mobility.github.io/TIC-VLA/
Relationship-Aware Hierarchical 3D Scene Graph for Task Reasoning ICRA 2026
Representing and understanding 3D environments in a structured manner is crucial for autonomous agents to navigate and reason about their surroundings. While traditional Simultaneous Localization and Mapping (SLAM) methods generate metric reconstructions and can be extended to metric-semantic mapping, they lack a higher level of abstraction and relational reasoning. To address this gap, 3D scene graphs have emerged as a powerful representation for capturing hierarchical structures and object relationships. In this work, we propose an enhanced hierarchical 3D scene graph that integrates open-vocabulary features across multiple abstraction levels and supports object-relational reasoning. Our approach leverages a Vision Language Model (VLM) to infer semantic relationships. Notably, we introduce a task reasoning module that combines Large Language Models (LLM) and a VLM to interpret the scene graph's semantic and relational information, enabling agents to reason about tasks and interact with their environment more intelligently. We validate our method by deploying it on a quadruped robot in multiple environments and tasks, highlighting its ability to reason about them.
comment: ICRA 2026, 8 pages
World-Gymnast: Training Robots with Reinforcement Learning in a World Model
Robot learning from interacting with the physical world is fundamentally bottlenecked by the cost of physical interaction. The two alternatives, supervised finetuning (SFT) from expert demonstrations and reinforcement learning (RL) in a software-based simulator, are limited by the amount of expert data available and the sim-to-real gap for manipulation. With the recent emergence of world models learned from real-world video-action data, we ask the question of whether training a policy in a world model can be more effective than supervised learning or software simulation in achieving better real-robot performance. We propose World-Gymnast, which performs RL finetuning of a vision-language-action (VLA) policy by rolling out the policy in an action-conditioned video world model and rewarding the rollouts with a vision-language model (VLM). On the Bridge robot setup, World-Gymnast outperforms SFT by as much as 18x and outperforms software simulator by as much as 2x. More importantly, World-Gymnast demonstrates intriguing capabilities of RL with a world model, including training on diverse language instructions and novel scenes from the world model, test-time training in a novel scene, and online iterative world model and policy improvement. Our results suggest learning a world model and training robot policies in the cloud could be the key to bridging the gap between robots that work in demonstrations and robots that can work in anyone's household.
comment: https://world-gymnast.github.io/
3D Foundation Model-Based Loop Closing for Decentralized Collaborative SLAM
Decentralized Collaborative Simultaneous Localization And Mapping (C-SLAM) techniques often struggle to identify map overlaps due to significant viewpoint variations among robots. Motivated by recent advancements in 3D foundation models, which can register images despite large viewpoint differences, we propose a robust loop closing approach that leverages these models to establish inter-robot measurements. In contrast to resource-intensive methods requiring full 3D reconstruction within a centralized map, our approach integrates foundation models into existing SLAM pipelines, yielding scalable and robust multi-robot mapping. Our contributions include: (1) integrating 3D foundation models to reliably estimate relative poses from monocular image pairs within decentralized C-SLAM; (2) introducing robust outlier mitigation techniques critical to the use of these relative poses; and (3) developing specialized pose graph optimization formulations that efficiently resolve scale ambiguities. We evaluate our method against state-of-the-art approaches, demonstrating improvements in localization and mapping accuracy, alongside significant gains in computational and memory efficiency. These results highlight the potential of our approach for deployment in large-scale multi-robot scenarios.
Multi-Agent Monte Carlo Tree Search for Makespan-Efficient Object Rearrangement in Cluttered Spaces
Object rearrangement planning in complex, cluttered environments is a common challenge in warehouses, households, and rescue sites. Prior studies largely address monotone instances, whereas real-world tasks are often non-monotone-objects block one another and must be temporarily relocated to intermediate positions before reaching their final goals. In such settings, effective multi-agent collaboration can substantially reduce the time required to complete tasks. This paper introduces Centralized, Asynchronous, Multi-agent Monte Carlo Tree Search (CAM-MCTS), a novel framework for general-purpose makespan-efficient object rearrangement planning in challenging environments. CAM-MCTS combines centralized task assignment-where agents remain aware of each other's intended actions to facilitate globally optimized planning-with an asynchronous task execution strategy that enables agents to take on new tasks at appropriate time steps, rather than waiting for others, guided by a one-step look-ahead cost estimate. This design minimizes idle time, prevents unnecessary synchronization delays, and enhances overall system efficiency. We evaluate CAM-MCTS across a diverse set of monotone and non-monotone tasks in cluttered environments, demonstrating consistent reductions in makespan compared to strong baselines. Finally, we validate our approach on a real-world multi-agent system under different configurations, further confirming its effectiveness and robustness.
SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation
Simulating deformable objects under rich interactions remains a fundamental challenge for real-to-sim robot manipulation, with dynamics jointly driven by environmental effects and robot actions. Existing simulators rely on predefined physics or data-driven dynamics without robot-conditioned control, limiting accuracy, stability, and generalization. This paper presents SoMA, a 3D Gaussian Splat simulator for soft-body manipulation. SoMA couples deformable dynamics, environmental forces, and robot joint actions in a unified latent neural space for end-to-end real-to-sim simulation. Modeling interactions over learned Gaussian splats enables controllable, stable long-horizon manipulation and generalization beyond observed trajectories without predefined physical models. SoMA improves resimulation accuracy and generalization on real-world robot manipulation by 20%, enabling stable simulation of complex tasks such as long-horizon cloth folding.
comment: Project page: https://city-super.github.io/SoMA/
PRISM: Performer RS-IMLE for Single-pass Multisensory Imitation Learning
Robotic imitation learning typically requires models that capture multimodal action distributions while operating at real-time control rates and accommodating multiple sensing modalities. Although recent generative approaches such as diffusion models, flow matching, and Implicit Maximum Likelihood Estimation (IMLE) have achieved promising results, they often satisfy only a subset of these requirements. To address this, we introduce PRISM, a single-pass policy based on a batch-global rejection-sampling variant of IMLE. PRISM couples a temporal multisensory encoder (integrating RGB, depth, tactile, audio, and proprioception) with a linear-attention generator using a Performer architecture. We demonstrate the efficacy of PRISM on a diverse real-world hardware suite, including loco-manipulation using a Unitree Go2 with a 7-DoF arm D1 and tabletop manipulation with a UR5 manipulator. Across challenging physical tasks such as pre-manipulation parking, high-precision insertion, and multi-object pick-and-place, PRISM outperforms state-of-the-art diffusion policies by 10-25% in success rate while maintaining high-frequency (30-50 Hz) closed-loop control. We further validate our approach on large-scale simulation benchmarks, including CALVIN, MetaWorld, and Robomimic. In CALVIN (10% data split), PRISM improves success rates by approximately 25% over diffusion and approximately 20% over flow matching, while simultaneously reducing trajectory jerk by 20x-50x. These results position PRISM as a fast, accurate, and multisensory imitation policy that retains multimodal action coverage without the latency of iterative sampling.
comment: 10 pages main text and 4 figures, and 11 pages appendix and 10 figures, total 21 pages and 14 figures
Mapping-Guided Task Discovery and Allocation for Robotic Inspection of Underwater Structures ICRA
Task generation for underwater multi-robot inspections without prior knowledge of existing geometry can be achieved and optimized through examination of simultaneous localization and mapping (SLAM) data. By considering hardware parameters and environmental conditions, a set of tasks is generated from SLAM meshes and optimized through expected keypoint scores and distance-based pruning. In-water tests are used to demonstrate the effectiveness of the algorithm and determine the appropriate parameters. These results are compared to simulated Voronoi partitions and boustrophedon patterns for inspection coverage on a model of the test environment. The key benefits of the presented task discovery method include adaptability to unexpected geometry and distributions that maintain coverage while focusing on areas more likely to present defects or damage.
comment: This paper will appear in the proceedings of the 2026 IEEE International Conference on Robotics and Automation (ICRA)
TTT-Parkour: Rapid Test-Time Training for Perceptive Robot Parkour
Achieving highly dynamic humanoid parkour on unseen, complex terrains remains a challenge in robotics. Although general locomotion policies demonstrate capabilities across broad terrain distributions, they often struggle with arbitrary and highly challenging environments. To overcome this limitation, we propose a real-to-sim-to-real framework that leverages rapid test-time training (TTT) on novel terrains, significantly enhancing the robot's capability to traverse extremely difficult geometries. We adopt a two-stage end-to-end learning paradigm: a policy is first pre-trained on diverse procedurally generated terrains, followed by rapid fine-tuning on high-fidelity meshes reconstructed from real-world captures. Specifically, we develop a feed-forward, efficient, and high-fidelity geometry reconstruction pipeline using RGB-D inputs, ensuring both speed and quality during test-time training. We demonstrate that TTT-Parkour empowers humanoid robots to master complex obstacles, including wedges, stakes, boxes, trapezoids, and narrow beams. The whole pipeline of capturing, reconstructing, and test-time training requires less than 10 minutes on most tested terrains. Extensive experiments show that the policy after test-time training exhibits robust zero-shot sim-to-real transfer capability.
comment: Project Page: https://ttt-parkour.github.io/
Before Autonomy Takes Control: Software Testing in Robotics
Robotic systems are complex and safety-critical software systems. As such, they need to be tested thoroughly. Unfortunately, robot software is intrinsically hard to test compared to traditional software, mainly since the software needs to closely interact with hardware, account for uncertainty in its operational environment, handle disturbances, and act highly autonomously. However, given the large space in which robots operate, anticipating possible failures when designing tests is challenging. This paper presents a mapping study by considering robotics testing papers and relating them to the software testing theory. We consider 247 robotics testing papers and map them to software testing, discussing the state-of-the-art software testing in robotics with an illustrated example, and discuss current challenges. Forming the basis to introduce both the robotics and software engineering communities to software testing challenges. Finally, we identify open questions and lessons learned.
Bridging the Sim-to-Real Gap with multipanda ros2: A Real-Time ROS2 Framework for Multimanual Systems
We present $multipanda\_ros2$, a novel open-source ROS2 architecture for multi-robot control of Franka Robotics robots. Leveraging ros2 control, this framework provides native ROS2 interfaces for controlling any number of robots from a single process. Our core contributions address key challenges in real-time torque control, including interaction control and robot-environment modeling. A central focus of this work is sustaining a 1kHz control frequency, a necessity for real-time control and a minimum frequency required by safety standards. Moreover, we introduce a controllet-feature design pattern that enables controller-switching delays of $\le 2$ ms, facilitating reproducible benchmarking and complex multi-robot interaction scenarios. To bridge the simulation-to-reality (sim2real) gap, we integrate a high-fidelity MuJoCo simulation with quantitative metrics for both kinematic accuracy and dynamic consistency (torques, forces, and control errors). Furthermore, we demonstrate that real-world inertial parameter identification can significantly improve force and torque accuracy, providing a methodology for iterative physics refinement. Our work extends approaches from soft robotics to rigid dual-arm, contact-rich tasks, showcasing a promising method to reduce the sim2real gap and providing a robust, reproducible platform for advanced robotics research.
comment: This work has been submitted to the IEEE for possible publication
Online Fine-Tuning of Pretrained Controllers for Autonomous Driving via Real-Time Recurrent RL
Deploying pretrained policies in real-world applications presents substantial challenges that fundamentally limit the practical applicability of learning-based control systems. When autonomous systems encounter environmental changes in system dynamics, sensor drift, or task objectives, fixed policies rapidly degrade in performance. We show that employing Real-Time Recurrent Reinforcement Learning (RTRRL), a biologically plausible algorithm for online adaptation, can effectively fine-tune a pretrained policy to improve autonomous agents' performance on driving tasks. We further show that RTRRL synergizes with a recent biologically inspired recurrent network model, the Liquid-Resistance Liquid-Capacitance RNN. We demonstrate the effectiveness of this closed-loop approach in a simulated CarRacing environment and in a real-world line-following task with a RoboRacer car equipped with an event camera.
LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation
The relationships between objects and language are fundamental to meaningful communication between humans and AI, and to practically useful embodied intelligence. We introduce HieraNav, a multi-granularity, open-vocabulary goal navigation task where agents interpret natural language instructions to reach targets at four semantic levels: scene, room, region, and instance. To this end, we present Language as a Map (LangMap), a large-scale benchmark built on real-world 3D indoor scans with comprehensive human-verified annotations and tasks spanning these levels. LangMap provides region labels, discriminative region descriptions, discriminative instance descriptions covering 414 object categories, and over 18K navigation tasks. Each target features both concise and detailed descriptions, enabling evaluation across different instruction styles. LangMap achieves superior annotation quality, outperforming GOAT-Bench by 23.8% in discriminative accuracy using four times fewer words. Comprehensive evaluations of zero-shot and supervised models on LangMap reveal that richer context and memory improve success, while long-tailed, small, context-dependent, and distant goals, as well as multi-goal completion, remain challenging. HieraNav and LangMap establish a rigorous testbed for advancing language-driven embodied navigation. Project: https://bo-miao.github.io/LangMap
Extending the Law of Intersegmental Coordination: Implications for Powered Prosthetic Controls
Powered prostheses are capable of providing net positive work to amputees and have advanced in the past two decades. However, reducing amputee metabolic cost of walking remains an open problem. The Law of Intersegmental Coordination (ISC) has been observed across gaits and has been previously implicated in energy expenditure of walking, yet it has rarely been analyzed or applied within the context of lower-limb amputee gait. This law states that the elevation angles of the thigh, shank and foot over the gait cycle are not independent. In this work, we developed a method to analyze intersegmental coordination for lower-limb 3D kinematic data, to simplify ISC analysis. Moreover, inspired by motor control, biomechanics and robotics literature, we used our method to broaden ISC toward a new law of coordination of moments. We find these Elevation Space Moments (ESM), and present results showing a moment-based coordination for able bodied gait. We also analyzed ISC for amputee gait walking with powered and passive prosthesis, and found that while elevation angles remained planar, the ESM showed less coordination. We use ISC as a constraint to predict the shank angles/moments that would compensate for alterations due to a passive foot so as to mimic a healthy thigh angle/moment profile. This may have implications for improving powered prosthetic control. We developed the ISC3d toolbox that is freely available online, which may be used to compute kinematic and kinetic ISC in 3D. This provides a means to further study the role of coordination in gait and may help address fundamental questions of the neural control of human movement.
comment: Submitted to 2026 IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob)
Real-Time 2D LiDAR Object Detection Using Three-Frame RGB Scan Encoding
Indoor service robots need perception that is robust, more privacy-friendly than RGB video, and feasible on embedded hardware. We present a camera-free 2D LiDAR object detection pipeline that encodes short-term temporal context by stacking three consecutive scans as RGB channels, yielding a compact YOLOv8n input without occupancy-grid construction while preserving angular structure and motion cues. Evaluated in Webots across 160 randomized indoor scenarios with strict scenario-level holdout, the method achieves 98.4% mAP@0.5 (0.778 mAP@0.5:0.95) with 94.9% precision and 94.7% recall on four object classes. On a Raspberry Pi 5, it runs in real time with a mean post-warm-up end-to-end latency of 47.8ms per frame, including scan encoding and postprocessing. Relative to a closely related occupancy-grid LiDAR-YOLO pipeline reported on the same platform, the proposed representation is associated with substantially lower reported end-to-end latency. Although results are simulation-based, they suggest that lightweight temporal encoding can enable accurate and real-time LiDAR-only detection for embedded indoor robotics without capturing RGB appearance.
comment: 6 pages, 6 figures, submitted to IEEE SAS 2026
FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation
Force sensing is a crucial modality for Vision-Language-Action (VLA) frameworks, as it enables fine-grained perception and dexterous manipulation in contact-rich tasks. We present Force-Distilled VLA (FD-VLA), a novel framework that integrates force awareness into contact-rich manipulation without relying on physical force sensors. The core of our approach is a Force Distillation Module (FDM), which distills force by mapping a learnable query token, conditioned on visual observations and robot states, into a predicted force token aligned with the latent representation of actual force signals. During inference, this distilled force token is injected into the pretrained VLM, enabling force-aware reasoning while preserving the integrity of its vision-language semantics. This design provides two key benefits: first, it allows practical deployment across a wide range of robots that lack expensive or fragile force-torque sensors, thereby reducing hardware cost and complexity; second, the FDM introduces an additional force-vision-state fusion prior to the VLM, which improves cross-modal alignment and enhances perception-action robustness in contact-rich scenarios. Surprisingly, our physical experiments show that the distilled force token outperforms direct sensor force measurements as well as other baselines, which highlights the effectiveness of this force-distilled VLA approach.
Frictional Contact Solving for Material Point Method
Accurately handling contact with friction remains a core bottleneck for Material Point Method (MPM), from reliable contact point detection to enforcing frictional contact laws (non-penetration, Coulomb friction, and maximum dissipation principle). In this paper, we introduce a frictional-contact pipeline for implicit MPM that is both precise and robust. During the collision detection phase, contact points are localized with particle-centric geometric primitives; during the contact resolution phase, we cast frictional contact as a Nonlinear Complementarity Problem (NCP) over contact impulses and solve it with an Alternating Direction Method of Multipliers (ADMM) scheme. Crucially, the formulation reuses the same implicit MPM linearization, yielding efficiency and numerical stability. The method integrates seamlessly into the implicit MPM loop and is agnostic to modeling choices, including material laws, interpolation functions, and transfer schemes. We evaluate it across seven representative scenes that span elastic and elasto-plastic responses, simple and complex deformable geometries, and a wide range of contact conditions. Overall, the proposed method enables accurate contact localization, reliable frictional handling, and broad generality, making it a practical solution for MPM-based simulations in robotics and related domains.
Bandwidth-Efficient Multi-Agent Communication through Information Bottleneck and Vector Quantization ICRA 2026
Multi-agent reinforcement learning systems deployed in real-world robotics applications face severe communication constraints that significantly impact coordination effectiveness. We present a framework that combines information bottleneck theory with vector quantization to enable selective, bandwidth-efficient communication in multi-agent environments. Our approach learns to compress and discretize communication messages while preserving task-critical information through principled information-theoretic optimization. We introduce a gated communication mechanism that dynamically determines when communication is necessary based on environmental context and agent states. Experimental evaluation on challenging coordination tasks demonstrates that our method achieves 181.8% performance improvement over no-communication baselines while reducing bandwidth usage by 41.4%. Comprehensive Pareto frontier analysis shows dominance across the entire success-bandwidth spectrum with area-under-curve of 0.198 vs 0.142 for next-best methods. Our approach significantly outperforms existing communication strategies and establishes a theoretically grounded framework for deploying multi-agent systems in bandwidth-constrained environments such as robotic swarms, autonomous vehicle fleets, and distributed sensor networks.
comment: Accepted at the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026), Vienna, Austria. 9 pages, 4 figures, 6 tables
Synchronized Online Friction Estimation and Adaptive Grasp Control for Robust Gentle Grasp
We introduce a unified framework for gentle robotic grasping that synergistically couples real-time friction estimation with adaptive grasp control. We propose a new particle filter-based method for real-time estimation of the friction coefficient using vision-based tactile sensors. This estimate is seamlessly integrated into a reactive controller that dynamically modulates grasp force to maintain a stable grip. The two processes operate synchronously in a closed-loop: the controller uses the current best estimate to adjust the force, while new tactile feedback from this action continuously refines the estimation. This creates a highly responsive and robust sensorimotor cycle. The reliability and efficiency of the complete framework are validated through extensive robotic experiments.
Reformulating AI-based Multi-Object Relative State Estimation for Aleatoric Uncertainty-based Outlier Rejection of Partial Measurements ICRA 2026
Precise localization with respect to a set of objects of interest enables mobile robots to perform various tasks. With the rise of edge devices capable of deploying deep neural networks (DNNs) for real-time inference, it stands to reason to use artificial intelligence (AI) for the extraction of object-specific, semantic information from raw image data, such as the object class and the relative six degrees of freedom (6-DoF) pose. However, fusing such AI-based measurements in an Extended Kalman Filter (EKF) requires quantifying the DNNs' uncertainty and outlier rejection capabilities. This paper presents the benefits of reformulating the measurement equation in AI-based, object-relative state estimation. By deriving an EKF using the direct object-relative pose measurement, we can decouple the position and rotation measurements, thus limiting the influence of erroneous rotation measurements and allowing partial measurement rejection. Furthermore, we investigate the performance and consistency improvements for state estimators provided by replacing the fixed measurement covariance matrix of the 6-DoF object-relative pose measurements with the predicted aleatoric uncertainty of the DNN.
comment: Accepted for publication at ICRA 2026, Vienna, Austria
A Unified Control Architecture for Macro-Micro Manipulation using a Active Remote Center of Compliance for Manufacturing Applications
Macro-micro manipulators combine a macro manipulator with a large workspace, such as an industrial robot, with a lightweight, high-bandwidth micro manipulator. This enables highly dynamic interaction control while preserving the wide workspace of the robot. Traditionally, position control is assigned to the macro manipulator, while the micro manipulator handles the interaction with the environment, limiting the achievable interaction control bandwidth. To solve this, we propose a novel control architecture that incorporates the macro manipulator into the active interaction control. This leads to a increase in control bandwidth by a factor of 2.1 compared to the state of the art architecture, based on the leader-follower approach and factor 12.5 compared to traditional robot-based force control. Further we propose surrogate models for a more efficient controller design and easy adaptation to hardware changes. We validate our approach by comparing it against the other control schemes in different experiments, like collision with an object, following a force trajectory and industrial assembly tasks.
comment: 17 pages, 14 figures, submitted to Robotics and Computer-Integrated Manufacturing (RCIM)
Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy ICRA 2026
Recently, active vision has reemerged as an important concept for manipulation, since visual occlusion occurs more frequently when main cameras are mounted on the robot heads. We reflect on the visual occlusion issue and identify its essence as the absence of information useful for task completion. Inspired by this, we come up with the more fundamental problem of Exploratory and Focused Manipulation (EFM). The proposed problem is about actively collecting information to complete challenging manipulation tasks that require exploration or focus. As an initial attempt to address this problem, we establish the EFM-10 benchmark that consists of 4 categories of tasks that align with our definition (10 tasks in total). We further come up with a Bimanual Active Perception (BAP) strategy, which leverages one arm to provide active vision and another arm to provide force sensing while manipulating. Based on this idea, we collect a dataset named BAPData for the tasks in EFM-10. With the dataset, we successfully verify the effectiveness of the BAP strategy in an imitation learning manner. We hope that the EFM-10 benchmark along with the BAP strategy can become a cornerstone that facilitates future research towards this direction. Project website: EFManipulation.github.io.
comment: ICRA 2026
LIEREx: Language-Image Embeddings for Robotic Exploration
Semantic maps allow a robot to reason about its surroundings to fulfill tasks such as navigating known environments, finding specific objects, and exploring unmapped areas. Traditional mapping approaches provide accurate geometric representations but are often constrained by pre-designed symbolic vocabularies. The reliance on fixed object classes makes it impractical to handle out-of-distribution knowledge not defined at design time. Recent advances in Vision-Language Foundation Models, such as CLIP, enable open-set mapping, where objects are encoded as high-dimensional embeddings rather than fixed labels. In LIEREx, we integrate these VLFMs with established 3D Semantic Scene Graphs to enable target-directed exploration by an autonomous agent in partially unknown environments.
comment: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in KI - Künstliche Intelligenz, and is available online at https://doi.org/10.1007/s13218-026-00902-6
ForSim: Stepwise Forward Simulation for Traffic Policy Fine-Tuning ICRA 2026
As the foundation of closed-loop training and evaluation in autonomous driving, traffic simulation still faces two fundamental challenges: covariate shift introduced by open-loop imitation learning and limited capacity to reflect the multimodal behaviors observed in real-world traffic. Although recent frameworks such as RIFT have partially addressed these issues through group-relative optimization, their forward simulation procedures remain largely non-reactive, leading to unrealistic agent interactions within the virtual domain and ultimately limiting simulation fidelity. To address these issues, we propose ForSim, a stepwise closed-loop forward simulation paradigm. At each virtual timestep, the traffic agent propagates the virtual candidate trajectory that best spatiotemporally matches the reference trajectory through physically grounded motion dynamics, thereby preserving multimodal behavioral diversity while ensuring intra-modality consistency. Other agents are updated with stepwise predictions, yielding coherent and interaction-aware evolution. When incorporated into the RIFT traffic simulation framework, ForSim operates in conjunction with group-relative optimization to fine-tune traffic policy. Extensive experiments confirm that this integration consistently improves safety while maintaining efficiency, realism, and comfort. These results underscore the importance of modeling closed-loop multimodal interactions within forward simulation and enhance the fidelity and reliability of traffic simulation for autonomous driving. Project Page: https://currychen77.github.io/ForSim/
comment: Accepted by ICRA 2026
Multi-Task Learning for Robot Perception with Imbalanced Data
Multi-task problem solving has been shown to improve the accuracy of the individual tasks, which is an important feature for robots, as they have a limited resource. However, when the number of labels for each task is not equal, namely imbalanced data exist, a problem may arise due to insufficient number of samples, and labeling is not very easy for mobile robots in every environment. We propose a method that can learn tasks even in the absence of the ground truth labels for some of the tasks. We also provide a detailed analysis of the proposed method. An interesting finding is related to the interaction of the tasks. We show a methodology to find out which tasks can improve the performance of other tasks. We investigate this by training the teacher network with the task outputs such as depth as inputs. We further provide empirical evidence when trained with a small amount of data. We use semantic segmentation and depth estimation tasks on different datasets, NYUDv2 and Cityscapes.
comment: 16 pages
Path Tracking with Dynamic Control Point Blending for Autonomous Vehicles: An Experimental Study
This paper presents an experimental study of a path-tracking framework for autonomous vehicles in which the lateral control command is applied to a dynamic control point along the wheelbase. Instead of enforcing a fixed reference at either the front or rear axle, the proposed method continuously interpolates between both, enabling smooth adaptation across driving contexts, including low-speed maneuvers and reverse motion. The lateral steering command is obtained by barycentric blending of two complementary controllers: a front-axle Stanley formulation and a rear-axle curvature-based geometric controller, yielding continuous transitions in steering behavior and improved tracking stability. In addition, we introduce a curvature-aware longitudinal control strategy based on virtual track borders and ray-tracing, which converts upcoming geometric constraints into a virtual obstacle distance and regulates speed accordingly. The complete approach is implemented in a unified control stack and validated in simulation and on a real autonomous vehicle equipped with GPS-RTK, radar, odometry, and IMU. The results in closed-loop tracking and backward maneuvers show improved trajectory accuracy, smoother steering profiles, and increased adaptability compared to fixed control-point baselines.
Multimodal Large Language Models for Real-Time Situated Reasoning
In this work, we explore how multimodal large language models can support real-time context- and value-aware decision-making. To do so, we combine the GPT-4o language model with a TurtleBot 4 platform simulating a smart vacuum cleaning robot in a home. The model evaluates the environment through vision input and determines whether it is appropriate to initiate cleaning. The system highlights the ability of these models to reason about domestic activities, social norms, and user preferences and take nuanced decisions aligned with the values of the people involved, such as cleanliness, comfort, and safety. We demonstrate the system in a realistic home environment, showing its ability to infer context and values from limited visual input. Our results highlight the promise of multimodal large language models in enhancing robotic autonomy and situational awareness, while also underscoring challenges related to consistency, bias, and real-time performance.
comment: Submitted to the interactivity track of the 21st ACM/IEEE International Conference on Human-Robot Interaction on December 2025, accepted January 2026
BTGenBot-2: Efficient Behavior Tree Generation with Small Language Models
Recent advances in robot learning increasingly rely on LLM-based task planning, leveraging their ability to bridge natural language with executable actions. While prior works showcased great performances, the widespread adoption of these models in robotics has been challenging as 1) existing methods are often closed-source or computationally intensive, neglecting the actual deployment on real-world physical systems, and 2) there is no universally accepted, plug-and-play representation for robotic task generation. Addressing these challenges, we propose BTGenBot-2, a 1B-parameter open-source small language model that directly converts natural language task descriptions and a list of robot action primitives into executable behavior trees in XML. Unlike prior approaches, BTGenBot-2 enables zero-shot BT generation, error recovery at inference and runtime, while remaining lightweight enough for resource-constrained robots. We further introduce the first standardized benchmark for LLM-based BT generation, covering 52 navigation and manipulation tasks in NVIDIA Isaac Sim. Extensive evaluations demonstrate that BTGenBot-2 consistently outperforms GPT-5, Claude Opus 4.1, and larger open-source models across both functional and non-functional metrics, achieving average success rates of 90.38% in zero-shot and 98.07% in one-shot, while delivering up to 16x faster inference compared to the previous BTGenBot.
Vision-only UAV State Estimation for Fast Flights Without External Localization Systems: A2RL Drone Racing Finalist Approach
Fast flights with aggressive maneuvers in cluttered GNSS-denied environments require fast, reliable, and accurate UAV state estimation. In this paper, we present an approach for onboard state estimation of a high-speed UAV using a monocular RGB camera and an IMU. Our approach fuses data from Visual-Inertial Odometry (VIO), an onboard landmark-based camera measurement system, and an IMU to produce an accurate state estimate. Using onboard measurement data, we estimate and compensate for VIO drift through a novel mathematical drift model. State-of-the-art approaches often rely on more complex hardware (e.g., stereo cameras or rangefinders) and use uncorrected drifting VIO velocities, orientation, and angular rates, leading to errors during fast maneuvers. In contrast, our method corrects all VIO states (position, orientation, linear and angular velocity), resulting in accurate state estimation even during rapid and dynamic motion. Our approach was thoroughly validated through 1600 simulations and numerous real-world experiments. Furthermore, we applied the proposed method in the A2RL Drone Racing Challenge 2025, where our team advanced to the final four out of 210 teams and earned a medal.
comment: Visit our webpage for more details: https://mrs.fel.cvut.cz/papers/vision-only-uav-state-estimation
Concept-Based Dictionary Learning for Inference-Time Safety in Vision Language Action Models
Vision Language Action (VLA) models close the perception action loop by translating multimodal instructions into executable behaviors, but this very capability magnifies safety risks: jailbreaks that merely yield toxic text in LLMs can trigger unsafe physical actions in embodied systems. Existing defenses alignment, filtering, or prompt hardening intervene too late or at the wrong modality, leaving fused representations exploitable. We introduce a concept-based dictionary learning framework for inference-time safety control. By constructing sparse, interpretable dictionaries from hidden activations, our method identifies harmful concept directions and applies threshold-based interventions to suppress or block unsafe activations. Experiments on Libero-Harm, BadRobot, RoboPair, and IS-Bench show that our approach achieves state-of-the-art defense performance, cutting attack success rates by over 70\% while maintaining task success. Crucially, the framework is plug-in and model-agnostic, requiring no retraining and integrating seamlessly with diverse VLAs. To our knowledge, this is the first inference-time concept-based safety method for embodied systems, advancing both interpretability and safe deployment of VLA models.
From Knowing to Doing Precisely: A General Self-Correction and Termination Framework for VLA models ICASSP 2026
While vision-language-action (VLA) models for embodied agents integrate perception, reasoning, and control, they remain constrained by two critical weaknesses: first, during grasping tasks, the action tokens generated by the language model often exhibit subtle spatial deviations from the target object, resulting in grasp failures; second, they lack the ability to reliably recognize task completion, which leads to redundant actions and frequent timeout errors. To address these challenges and enhance robustness, we propose a lightweight, training-free framework, VLA-SCT. This framework operates as a self-correcting control loop, combining data-driven action refinement with conditional logic for termination. Consequently, compared to baseline approaches, our method achieves consistent improvements across all datasets in the LIBERO benchmark, significantly increasing the success rate of fine manipulation tasks and ensuring accurate task completion, thereby promoting the deployment of more reliable VLA agents in complex, unstructured environments.
comment: Accepted to 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)
RFS: Reinforcement learning with Residual flow steering for dexterous manipulation
Imitation learning has emerged as an effective approach for bootstrapping sequential decision-making in robotics, achieving strong performance even in high-dimensional dexterous manipulation tasks. Recent behavior cloning methods further leverage expressive generative models, such as diffusion models and flow matching, to represent multimodal action distributions. However, policies pretrained in this manner often exhibit limited generalization and require additional fine-tuning to achieve robust performance at deployment time. Such adaptation must preserve the global exploration benefits of pretraining while enabling rapid correction of local execution errors.We propose \emph{Residual Flow Steering} (RFS), a data-efficient reinforcement learning framework for adapting pretrained generative policies. RFS steers a pretrained flow-matching policy by jointly optimizing a residual action and a latent noise distribution, enabling complementary forms of exploration: local refinement through residual corrections and global exploration through latent-space modulation. This design allows efficient adaptation while retaining the expressive structure of the pretrained policy.We demonstrate the effectiveness of RFS on dexterous manipulation tasks, showing efficient fine-tuning both in simulation and in real-world settings when adapting pretrained base policies.Project website:https://weirdlabuw.github.io/rfs.
DDP-WM: Disentangled Dynamics Prediction for Efficient World Models
World models are essential for autonomous robotic planning. However, the substantial computational overhead of existing dense Transformerbased models significantly hinders real-time deployment. To address this efficiency-performance bottleneck, we introduce DDP-WM, a novel world model centered on the principle of Disentangled Dynamics Prediction (DDP). We hypothesize that latent state evolution in observed scenes is heterogeneous and can be decomposed into sparse primary dynamics driven by physical interactions and secondary context-driven background updates. DDP-WM realizes this decomposition through an architecture that integrates efficient historical processing with dynamic localization to isolate primary dynamics. By employing a crossattention mechanism for background updates, the framework optimizes resource allocation and provides a smooth optimization landscape for planners. Extensive experiments demonstrate that DDP-WM achieves significant efficiency and performance across diverse tasks, including navigation, precise tabletop manipulation, and complex deformable or multi-body interactions. Specifically, on the challenging Push-T task, DDP-WM achieves an approximately 9 times inference speedup and improves the MPC success rate from 90% to98% compared to state-of-the-art dense models. The results establish a promising path for developing efficient, high-fidelity world models. Codes will be available at https://github.com/HCPLabSYSU/DDP-WM.
comment: Codes will be available at https://github.com/HCPLabSYSU/DDP-WM
Uncertainty-Aware Non-Prehensile Manipulation with Mobile Manipulators under Object-Induced Occlusion ICRA 2026
Non-prehensile manipulation using onboard sensing presents a fundamental challenge: the manipulated object occludes the sensor's field of view, creating occluded regions that can lead to collisions. We propose CURA-PPO, a reinforcement learning framework that addresses this challenge by explicitly modeling uncertainty under partial observability. By predicting collision possibility as a distribution, we extract both risk and uncertainty to guide the robot's actions. The uncertainty term encourages active perception, enabling simultaneous manipulation and information gathering to resolve occlusions. When combined with confidence maps that capture observation reliability, our approach enables safe navigation despite severe sensor occlusion. Extensive experiments across varying object sizes and obstacle configurations demonstrate that CURA-PPO achieves up to 3X higher success rates than the baselines, with learned behaviors that handle occlusions. Our method provides a practical solution for autonomous manipulation in cluttered environments using only onboard sensing.
comment: 8 pages, 7 figures, Accepted to ICRA 2026, Webpage: https://jiw0o.github.io/cura-ppo/
Tilt-Ropter: A Novel Hybrid Aerial and Terrestrial Vehicle with Tilt Rotors and Passive Wheels
In this work, we present Tilt-Ropter, a novel hybrid aerial-terrestrial vehicle (HATV) that combines tilt rotors with passive wheels to achieve energy-efficient multi-mode locomotion. Unlike existing under-actuated HATVs, the fully actuated design of Tilt-Ropter enables decoupled force and torque control, greatly enhancing its mobility and environmental adaptability. A nonlinear model predictive controller (NMPC) is developed to track reference trajectories and handle contact constraints across locomotion modes, while a dedicated control allocation module exploits actuation redundancy to achieve energy-efficient control of actuators. Additionally, to enhance robustness during ground contact, we introduce an external wrench estimation algorithm that estimates environmental interaction forces and torques in real time. The system is validated through both simulation and real-world experiments, including seamless air-ground transitions and trajectory tracking. Results show low tracking errors in both modes and highlight a 92.8% reduction in power consumption during ground locomotion, demonstrating the system's potential for long-duration missions across large-scale and energy-constrained environments.
comment: 8 pages, 10 figures
GSR: Learning Structured Reasoning for Embodied Manipulation
Despite rapid progress, embodied agents still struggle with long-horizon manipulation that requires maintaining spatial consistency, causal dependencies, and goal constraints. A key limitation of existing approaches is that task reasoning is implicitly embedded in high-dimensional latent representations, making it challenging to separate task structure from perceptual variability. We introduce Grounded Scene-graph Reasoning (GSR), a structured reasoning paradigm that explicitly models world-state evolution as transitions over semantically grounded scene graphs. By reasoning step-wise over object states and spatial relations, rather than directly mapping perception to actions, GSR enables explicit reasoning about action preconditions, consequences, and goal satisfaction in a physically grounded space. To support learning such reasoning, we construct Manip-Cognition-1.6M, a large-scale dataset that jointly supervises world understanding, action planning, and goal interpretation. Extensive evaluations across RLBench, LIBERO, GSR-benchmark, and real-world robotic tasks show that GSR significantly improves zero-shot generalization and long-horizon task completion over prompting-based baselines. These results highlight explicit world-state representations as a key inductive bias for scalable embodied reasoning.
Towards Autonomous Instrument Tray Assembly for Sterile Processing Applications
The Sterile Processing and Distribution (SPD) department is responsible for cleaning, disinfecting, inspecting, and assembling surgical instruments between surgeries. Manual inspection and preparation of instrument trays is a time-consuming, error-prone task, often prone to contamination and instrument breakage. In this work, we present a fully automated robotic system that sorts and structurally packs surgical instruments into sterile trays, focusing on automation of the SPD assembly stage. A custom dataset comprising 31 surgical instruments and 6,975 annotated images was collected to train a hybrid perception pipeline using YOLO12 for detection and a cascaded ResNet-based model for fine-grained classification. The system integrates a calibrated vision module, a 6-DOF Staubli TX2-60L robotic arm with a custom dual electromagnetic gripper, and a rule-based packing algorithm that reduces instrument collisions during transport. The packing framework uses 3D printed dividers and holders to physically isolate instruments, reducing collision and friction during transport. Experimental evaluations show high perception accuracy and statistically significant reduction in tool-to-tool collisions compared to human-assembled trays. This work serves as the scalable first step toward automating SPD workflows, improving safety, and consistency of surgical preparation while reducing SPD processing times.
comment: 7 pages, 9 figures, 2026 International Symposium on Medical Robotics
Real-Time Loop Closure Detection in Visual SLAM via NetVLAD and Faiss
Loop closure detection (LCD) is a core component of simultaneous localization and mapping (SLAM): it identifies revisited places and enables pose-graph constraints that correct accumulated drift. Classic bag-of-words approaches such as DBoW are efficient but often degrade under appearance change and perceptual aliasing. In parallel, deep learning-based visual place recognition (VPR) descriptors (e.g., NetVLAD and Transformer-based models) offer stronger robustness, but their computational cost is often viewed as a barrier to real-time SLAM. In this paper, we empirically evaluate NetVLAD as an LCD module and compare it against DBoW on the KITTI dataset. We introduce a Fine-Grained Top-K precision-recall curve that better reflects LCD settings where a query may have zero or multiple valid matches. With Faiss-accelerated nearestneighbor search, NetVLAD achieves real-time query speed while improving accuracy and robustness over DBoW, making it a practical drop-in alternative for LCD in SLAM.
AgenticLab: A Real-World Robot Agent Platform that Can See, Think, and Act
Recent advances in large vision-language models (VLMs) have demonstrated generalizable open-vocabulary perception and reasoning, yet their real-robot manipulation capability remains unclear for long-horizon, closed-loop execution in unstructured, in-the-wild environments. Prior VLM-based manipulation pipelines are difficult to compare across different research groups' setups, and many evaluations rely on simulation, privileged state, or specially designed setups. We present AgenticLab, a model-agnostic robot agent platform and benchmark for open-world manipulation. AgenticLab provides a closed-loop agent pipeline for perception, task decomposition, online verification, and replanning. Using AgenticLab, we benchmark state-of-the-art VLM-based agents on real-robot tasks in unstructured environments. Our benchmark reveals several failure modes that offline vision-language tests (e.g., VQA and static image understanding) fail to capture, including breakdowns in multi-step grounding consistency, object grounding under occlusion and scene changes, and insufficient spatial reasoning for reliable manipulation. We will release the full hardware and software stack to support reproducible evaluation and accelerate research on general-purpose robot agents.
From Perception to Action: Spatial AI Agents and World Models
While large language models have become the prevailing approach for agentic reasoning and planning, their success in symbolic domains does not readily translate to the physical world. Spatial intelligence, the ability to perceive 3D structure, reason about object relationships, and act under physical constraints, is an orthogonal capability that proves important for embodied agents. Existing surveys address either agentic architectures or spatial domains in isolation. None provide a unified framework connecting these complementary capabilities. This paper bridges that gap. Through a thorough review of over 2,000 papers, citing 742 works from top-tier venues, we introduce a unified three-axis taxonomy connecting agentic capabilities with spatial tasks across scales. Crucially, we distinguish spatial grounding (metric understanding of geometry and physics) from symbolic grounding (associating images with text), arguing that perception alone does not confer agency. Our analysis reveals three key findings mapped to these axes: (1) hierarchical memory systems (Capability axis) are important for long-horizon spatial tasks. (2) GNN-LLM integration (Task axis) is a promising approach for structured spatial reasoning. (3) World models (Scale axis) are essential for safe deployment across micro-to-macro spatial scales. We conclude by identifying six grand challenges and outlining directions for future research, including the need for unified evaluation frameworks to standardize cross-domain assessment. This taxonomy provides a foundation for unifying fragmented research efforts and enabling the next generation of spatially-aware autonomous systems in robotics, autonomous vehicles, and geospatial intelligence.
comment: 61 pages, 742 citations, 1 figure, 3 tables. Survey paper on spatial AI agents, embodied AI, graph neural networks, and world models
A Closed-Form Geometric Retargeting Solver for Upper Body Humanoid Robot Teleoperation
Retargeting human motion to robot poses is a practical approach for teleoperating bimanual humanoid robot arms, but existing methods can be suboptimal and slow, often causing undesirable motion or latency. This is due to optimizing to match robot end-effector to human hand position and orientation, which can also limit the robot's workspace to that of the human. Instead, this paper reframes retargeting as an orientation alignment problem, enabling a closed-form, geometric solution algorithm with an optimality guarantee. The key idea is to align a robot arm to a human's upper and lower arm orientations, as identified from shoulder, elbow, and wrist (SEW) keypoints; hence, the method is called SEW-Mimic. The method has fast inference (3 kHz) on standard commercial CPUs, leaving computational overhead for downstream applications; an example in this paper is a safety filter to avoid bimanual self-collision. The method suits most 7-degree-of-freedom robot arms and humanoids, and is agnostic to input keypoint source. Experiments show that SEW-Mimic outperforms other retargeting methods in computation time and accuracy. A pilot user study suggests that the method improves teleoperation task success. Preliminary analysis indicates that data collected with SEW-Mimic improves policy learning due to being smoother. SEW-Mimic is also shown to be a drop-in way to accelerate full-body humanoid retargeting. Finally, hardware demonstrations illustrate SEW-Mimic's practicality. The results emphasize the utility of SEW-Mimic as a fundamental building block for bimanual robot manipulation and humanoid robot teleoperation.
comment: Project page at https://sew-mimic.com/
AdaptNC: Adaptive Nonconformity Scores for Uncertainty-Aware Autonomous Systems in Dynamic Environments
Rigorous uncertainty quantification is essential for the safe deployment of autonomous systems in unconstrained environments. Conformal Prediction (CP) provides a distribution-free framework for this task, yet its standard formulations rely on exchangeability assumptions that are violated by the distribution shifts inherent in real-world robotics. Existing online CP methods maintain target coverage by adaptively scaling the conformal threshold, but typically employ a static nonconformity score function. We show that this fixed geometry leads to highly conservative, volume-inefficient prediction regions when environments undergo structural shifts. To address this, we propose \textbf{AdaptNC}, a framework for the joint online adaptation of both the nonconformity score parameters and the conformal threshold. AdaptNC leverages an adaptive reweighting scheme to optimize score functions, and introduces a replay buffer mechanism to mitigate the coverage instability that occurs during score transitions. We evaluate AdaptNC on diverse robotic benchmarks involving multi-agent policy changes, environmental changes and sensor degradation. Our results demonstrate that AdaptNC significantly reduces prediction region volume compared to state-of-the-art threshold-only baselines while maintaining target coverage levels.
Efficiently Solving Mixed-Hierarchy Games with Quasi-Policy Approximations
Multi-robot coordination often exhibits hierarchical structure, with some robots' decisions depending on the planned behaviors of others. While game theory provides a principled framework for such interactions, existing solvers struggle to handle mixed information structures that combine simultaneous (Nash) and hierarchical (Stackelberg) decision-making. We study N-robot forest-structured mixed-hierarchy games, in which each robot acts as a Stackelberg leader over its subtree while robots in different branches interact via Nash equilibria. We derive the Karush-Kuhn-Tucker (KKT) first-order optimality conditions for this class of games and show that they involve increasingly high-order derivatives of robots' best-response policies as the hierarchy depth grows, rendering a direct solution intractable. To overcome this challenge, we introduce a quasi-policy approximation that removes higher-order policy derivatives and develop an inexact Newton method for efficiently solving the resulting approximated KKT systems. We prove local exponential convergence of the proposed algorithm for games with non-quadratic objectives and nonlinear constraints. The approach is implemented in a highly optimized Julia library (MixedHierarchyGames.jl) and evaluated in simulated experiments, demonstrating real-time convergence for complex mixed-hierarchy information structures.
UniDWM: Towards a Unified Driving World Model via Multifaceted Representation Learning
Achieving reliable and efficient planning in complex driving environments requires a model that can reason over the scene's geometry, appearance, and dynamics. We present UniDWM, a unified driving world model that advances autonomous driving through multifaceted representation learning. UniDWM constructs a structure- and dynamic-aware latent world representation that serves as a physically grounded state space, enabling consistent reasoning across perception, prediction, and planning. Specifically, a joint reconstruction pathway learns to recover the scene's structure, including geometry and visual texture, while a collaborative generation framework leverages a conditional diffusion transformer to forecast future world evolution within the latent space. Furthermore, we show that our UniDWM can be deemed as a variation of VAE, which provides theoretical guidance for the multifaceted representation learning. Extensive experiments demonstrate the effectiveness of UniDWM in trajectory planning, 4D reconstruction and generation, highlighting the potential of multifaceted world representations as a foundation for unified driving intelligence. The code will be publicly available at https://github.com/Say2L/UniDWM.
Co-Design of Rover Wheels and Control using Bayesian Optimization and Rover-Terrain Simulations
While simulation is vital for optimizing robotic systems, the cost of modeling deformable terrain has long limited its use in full-vehicle studies of off-road autonomous mobility. For example, Discrete Element Method (DEM) simulations are often confined to single-wheel tests, which obscures coupled wheel-vehicle-controller interactions and prevents joint optimization of mechanical design and control. This paper presents a Bayesian optimization framework that co-designs rover wheel geometry and steering controller parameters using high-fidelity, full-vehicle closed-loop simulations on deformable terrain. Using the efficiency and scalability of a continuum-representation model (CRM) for terramechanics, we evaluate candidate designs on trajectories of varying complexity while towing a fixed load. The optimizer tunes wheel parameters (radius, width, and grouser features) and steering PID gains under a multi-objective formulation that balances traversal speed, tracking error, and energy consumption. We compare two strategies: simultaneous co-optimization of wheel and controller parameters versus a sequential approach that decouples mechanical and control design. We analyze trade-offs in performance and computational cost. Across 3,000 full-vehicle simulations, campaigns finish in five to nine days, versus months with the group's earlier DEM-based workflow. Finally, a preliminary hardware study suggests the simulation-optimized wheel designs preserve relative performance trends on the physical rover. Together, these results show that scalable, high-fidelity simulation can enable practical co-optimization of wheel design and control for off-road vehicles on deformable terrain without relying on prohibitively expensive DEM studies. The simulation infrastructure (scripts and models) is released as open source in a public repository to support reproducibility and further research.
comment: 19 pages, 15 figures
RAPT: Model-Predictive Out-of-Distribution Detection and Failure Diagnosis for Sim-to-Real Humanoid Robots
Deploying learned control policies on humanoid robots is challenging: policies that appear robust in simulation can execute confidently in out-of-distribution (OOD) states after Sim-to-Real transfer, leading to silent failures that risk hardware damage. Although anomaly detection can mitigate these failures, prior methods are often incompatible with high-rate control, poorly calibrated at the extremely low false-positive rates required for practical deployment, or operate as black boxes that provide a binary stop signal without explaining why the robot drifted from nominal behavior. We present RAPT, a lightweight, self-supervised deployment-time monitor for 50Hz humanoid control. RAPT learns a probabilistic spatio-temporal manifold of nominal execution from simulation and evaluates execution-time predictive deviation as a calibrated, per-dimension signal. This yields (i) reliable online OOD detection under strict false-positive constraints and (ii) a continuous, interpretable measure of Sim-to-Real mismatch that can be tracked over time to quantify how far deployment has drifted from training. Beyond detection, we introduce an automated post-hoc root-cause analysis pipeline that combines gradient-based temporal saliency derived from RAPT's reconstruction objective with LLM-based reasoning conditioned on saliency and joint kinematics to produce semantic failure diagnoses in a zero-shot setting. We evaluate RAPT on a Unitree G1 humanoid across four complex tasks in simulation and on physical hardware. In large-scale simulation, RAPT improves True Positive Rate (TPR) by 37% over the strongest baseline at a fixed episode-level false positive rate of 0.5%. On real-world deployments, RAPT achieves a 12.5% TPR improvement and provides actionable interpretability, reaching 75% root-cause classification accuracy across 16 real-world failures using only proprioceptive data.
TreeLoc: 6-DoF LiDAR Global Localization in Forests via Inter-Tree Geometric Matching ICRA 2026
Reliable localization is crucial for navigation in forests, where GPS is often degraded and LiDAR measurements are repetitive, occluded, and structurally complex. These conditions weaken the assumptions of traditional urban-centric localization methods, which assume that consistent features arise from unique structural patterns, necessitating forest-centric solutions to achieve robustness in these environments. To address these challenges, we propose TreeLoc, a LiDAR-based global localization framework for forests that handles place recognition and 6-DoF pose estimation. We represent scenes using tree stems and their Diameter at Breast Height (DBH), which are aligned to a common reference frame via their axes and summarized using the tree distribution histogram (TDH) for coarse matching, followed by fine matching with a 2D triangle descriptor. Finally, pose estimation is achieved through a two-step geometric verification. On diverse forest benchmarks, TreeLoc outperforms baselines, achieving precise localization. Ablation studies validate the contribution of each component. We also propose applications for long-term forest management using descriptors from a compact global tree database. TreeLoc is open-sourced for the robotics community at https://github.com/minwoo0611/TreeLoc.
comment: An 8-page paper with 7 tables and 8 figures, accepted to ICRA 2026
Modular Isoperimetric Soft Robotic Truss for Lunar Applications
We introduce a large-scale robotic system designed as a lightweight, modular, and reconfigurable structure for lunar applications. The system consists of truss-like robotic triangles formed by continuous inflated fabric tubes routed through two robotic roller units and a connecting unit. A newly developed spherical joint enables up to three triangles to connect at a vertex, allowing construction of truss assemblies beyond a single octahedron. When deflated, the triangles compact to approximately the volume of the roller units, achieving a stowed-to-deployed volume ratio of 1:18.3. Upon inflation, the roller units pinch the tubes, locally reducing bending stiffness to form effective joints. Electric motors then translate the roller units along the tube, shifting the pinch point by lengthening one edge while shortening another at the same rate, thereby preserving a constant perimeter (isoperimetric). This shape-changing process requires no additional compressed air, enabling untethered operation after initial inflation. We demonstrate the system as a 12-degree-of-freedom solar array capable of tilting up to 60 degrees and sweeping 360 degrees, and as a 14-degree-of-freedom locomotion device using a step-and-slide gait. This modular, shape-adaptive system addresses key challenges for sustainable lunar operations and future space missions.
Moving On, Even When You're Broken: Fail-Active Trajectory Generation via Diffusion Policies Conditioned on Embodiment and Task
Robot failure is detrimental and disruptive, often requiring human intervention to recover. Maintaining safe operation under impairment to achieve task completion, i.e. fail-active operation, is our target. Focusing on actuation failures, we introduce DEFT, a diffusion-based trajectory generator conditioned on the robot's current embodiment and task constraints. DEFT generalizes across failure types, supports constrained and unconstrained motions, and enables task completion under arbitrary failure. We evaluated DEFT in both simulation and real-world scenarios using a 7-DoF robotic arm. In simulation over thousands of joint-failure cases across multiple tasks, DEFT outperformed the baseline by up to 2 times. On failures unseen during training, it continued to outperform the baseline, indicating robust generalization in simulation. Further, we performed real-world evaluations on two multi-step tasks, drawer manipulation and whiteboard erasing. These experiments demonstrated DEFT succeeding on tasks where classical methods failed. Our results show that DEFT achieves fail-active manipulation across arbitrary failure configurations and real-world deployments.
comment: To be published in the 2026 IEEE International Conference on Robotics & Automation
Accelerating Structured Chain-of-Thought in Autonomous Vehicles
Chain-of-Thought (CoT) reasoning enhances the decision-making capabilities of vision-language-action models in autonomous driving, but its autoregressive nature introduces significant inference latency, making it impractical for real-time applications. To address this, we introduce FastDriveCoT, a novel parallel decoding method that accelerates template-structured CoT. Our approach decomposes the reasoning process into a dependency graph of distinct sub-tasks, such as identifying critical objects and summarizing traffic rules, some of which can be generated in parallel. By generating multiple independent reasoning steps concurrently within a single forward pass, we significantly reduce the number of sequential computations. Experiments demonstrate a 3-4$\times$ speedup in CoT generation and a substantial reduction in end-to-end latency across various model architectures, all while preserving the original downstream task improvements brought by incorporating CoT reasoning.
IMAGINE: Intelligent Multi-Agent Godot-based Indoor Networked Exploration
The exploration of unknown, Global Navigation Satellite System (GNSS) denied environments by an autonomous communication-aware and collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralized decision-making. This paper implements Multi-Agent Reinforcement Learning (MARL) to address these challenges in a 2D indoor environment, using high-fidelity game-engine simulations (Godot) and continuous action spaces. Policy training aims to achieve emergent collaborative behaviours and decision-making under uncertainty using Network-Distributed Partially Observable Markov Decision Processes (ND-POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter-agent communication constraints include limited range, bandwidth and latency. Extensive ablation studies evaluated MARL training paradigms, reward function, communication system, neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations in prior research, namely reliance on discrete actions, single-agent or centralized formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons and architectural complexity in Recurrent NNs/Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. The implementation of Curriculum-Learning (five increasingly complex levels) also enabled faster, more robust training. This combination of high-fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.
comment: 12 pages, submitted to a journal
Latent Perspective-Taking via a Schrödinger Bridge in Influence-Augmented Local Models
Operating in environments alongside humans requires robots to make decisions under uncertainty. In addition to exogenous dynamics, they must reason over others' hidden mental-models and mental-states. While Interactive POMDPs and Bayesian Theory of Mind formulations are principled, exact nested-belief inference is intractable, and hand-specified models are brittle in open-world settings. We address both by learning structured mental-models and an estimator of others' mental-states. Building on the Influence-Based Abstraction, we instantiate an Influence-Augmented Local Model to decompose socially-aware robot tasks into local dynamics, social influences, and exogenous factors. We propose (a) a neuro-symbolic world model instantiating a factored, discrete Dynamic Bayesian Network, and (b) a perspective-shift operator modeled as an amortized Schrödinger Bridge over the learned local dynamics that transports factored egocentric beliefs into other-centric beliefs. We show that this architecture enables agents to synthesize socially-aware policies in model-based reinforcement learning, via decision-time mental-state planning (a Schrödinger Bridge in belief space), with preliminary results in a MiniGrid social navigation task.
comment: Extended Abstract & Poster, Presented at World Modeling Workshop 2026
Causal Flow Q-Learning for Robust Offline Reinforcement Learning
Expressive policies based on flow-matching have been successfully applied in reinforcement learning (RL) more recently due to their ability to model complex action distributions from offline data. These algorithms build on standard policy gradients, which assume that there is no unmeasured confounding in the data. However, this condition does not necessarily hold for pixel-based demonstrations when a mismatch exists between the demonstrator's and the learner's sensory capabilities, leading to implicit confounding biases in offline data. We address the challenge by investigating the problem of confounded observations in offline RL from a causal perspective. We develop a novel causal offline RL objective that optimizes policies' worst-case performance that may arise due to confounding biases. Based on this new objective, we introduce a practical implementation that learns expressive flow-matching policies from confounded demonstrations, employing a deep discriminator to assess the discrepancy between the target policy and the nominal behavioral policy. Experiments across 25 pixel-based tasks demonstrate that our proposed confounding-robust augmentation procedure achieves a success rate 120\% that of confounding-unaware, state-of-the-art offline RL methods.
Kino-PAX$^+$: Near-Optimal Massively Parallel Kinodynamic Sampling-based Motion Planner
Sampling-based motion planners (SBMPs) are widely used for robot motion planning with complex kinodynamic constraints in high-dimensional spaces, yet they struggle to achieve \emph{real-time} performance due to their serial computation design. Recent efforts to parallelize SBMPs have achieved significant speedups in finding feasible solutions; however, they provide no guarantees of optimizing an objective function. We introduce Kino-PAX$^{+}$, a massively parallel kinodynamic SBMP with asymptotic near-optimal guarantees. Kino-PAX$^{+}$ builds a sparse tree of dynamically feasible trajectories by decomposing traditionally serial operations into three massively parallel subroutines. The algorithm focuses computation on the most promising nodes within local neighborhoods for propagation and refinement, enabling rapid improvement of solution cost. We prove that, while maintaining probabilistic $δ$-robust completeness, this focus on promising nodes ensures asymptotic $δ$-robust near-optimality. Our results show that Kino-PAX$^{+}$ finds solutions up to three orders of magnitude faster than existing serial methods and achieves lower solution costs than a state-of-the-art GPU-based planner.
comment: 10 pages, 8 figures
Language Movement Primitives: Grounding Language Models in Robot Motion
Enabling robots to perform novel manipulation tasks from natural language instructions remains a fundamental challenge in robotics, despite significant progress in generalized problem solving with foundational models. Large vision and language models (VLMs) are capable of processing high-dimensional input data for visual scene and language understanding, as well as decomposing tasks into a sequence of logical steps; however, they struggle to ground those steps in embodied robot motion. On the other hand, robotics foundation models output action commands, but require in-domain fine-tuning or experience before they are able to perform novel tasks successfully. At its core, there still remains the fundamental challenge of connecting abstract task reasoning with low-level motion control. To address this disconnect, we propose Language Movement Primitives (LMPs), a framework that grounds VLM reasoning in Dynamic Movement Primitive (DMP) parameterization. Our key insight is that DMPs provide a small number of interpretable parameters, and VLMs can set these parameters to specify diverse, continuous, and stable trajectories. Put another way: VLMs can reason over free-form natural language task descriptions, and semantically ground their desired motions into DMPs -- bridging the gap between high-level task reasoning and low-level position and velocity control. Building on this combination of VLMs and DMPs, we formulate our LMP pipeline for zero-shot robot manipulation that effectively completes tabletop manipulation problems by generating a sequence of DMP motions. Across 20 real-world manipulation tasks, we show that LMP achieves 80% task success as compared to 31% for the best-performing baseline. See videos at our website: https://collab.me.vt.edu/lmp
Adaptive Linear Path Model-Based Diffusion ICRA 2026
The interest in combining model-based control approaches with diffusion models has been growing. Although we have seen many impressive robotic control results in difficult tasks, the performance of diffusion models is highly sensitive to the choice of scheduling parameters, making parameter tuning one of the most critical challenges. We introduce Linear Path Model-Based Diffusion (LP-MBD), which replaces the variance-preserving schedule with a flow-matching-inspired linear probability path. This yields a geometrically interpretable and decoupled parameterization that reduces tuning complexity and provides a stable foundation for adaptation. Building on this, we propose Adaptive LP-MBD (ALP-MBD), which leverages reinforcement learning to adjust diffusion steps and noise levels according to task complexity and environmental conditions. Across numerical studies, Brax benchmarks, and mobile-robot trajectory tracking, LP-MBD simplifies scheduling while maintaining strong performance, and ALP-MBD further improves robustness, adaptability, and real-time efficiency.
comment: ICRA 2026
Fast Near Time-Optimal Motion Planning for Holonomic Vehicles in Structured Environments
This paper proposes a novel and efficient optimization-based method for generating near time-optimal trajectories for holonomic vehicles navigating through complex but structured environments. The approach aims to solve the problem of motion planning for planar motion systems using magnetic levitation that can be used in assembly lines, automated laboratories or clean-rooms. In these applications, time-optimal trajectories that can be computed in real-time are required to increase productivity and allow the vehicles to be reactive if needed. The presented approach encodes the environment representation using free-space corridors and represents the motion of the vehicle through such a corridor using a motion primitive. These primitives are selected heuristically and define the trajectory with a limited number of degrees of freedom, which are determined in an optimization problem. As a result, the method achieves significantly lower computation times compared to the state-of-the-art, most notably solving a full Optimal Control Problem (OCP), OMG-tools or VP-STO without significantly compromising optimality within a fixed corridor sequence. The approach is benchmarked extensively in simulation and is validated on a real-world Beckhoff XPlanar system
Sub-optimality bounds for certainty equivalent policies in partially observed systems
In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems that allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models where the cost and the dynamics are smooth in an appropriate sense, we derive upper bounds on the sub-optimality of certainty equivalent policies. We present several examples to illustrate the results.
comment: 12 pages, 0 figures
Simulating Human Audiovisual Search Behavior
Locating a target based on auditory and visual cues$\unicode{x2013}$such as finding a car in a crowded parking lot or identifying a speaker in a virtual meeting$\unicode{x2013}$requires balancing effort, time, and accuracy under uncertainty. Existing models of audiovisual search often treat perception and action in isolation, overlooking how people adaptively coordinate movement and sensory strategies. We present Sensonaut, a computational model of embodied audiovisual search. The core assumption is that people deploy their body and sensory systems in ways they believe will most efficiently improve their chances of locating a target, trading off time and effort under perceptual constraints. Our model formulates this as a resource-rational decision-making problem under partial observability. We validate the model against newly collected human data, showing that it reproduces both adaptive scaling of search time and effort under task complexity, occlusion, and distraction, and characteristic human errors. Our simulation of human-like resource-rational search informs the design of audiovisual interfaces that minimize search cost and cognitive load.
comment: 17 pages, 10 figures, CHI 2026
Bimanual High-Density EMG Control for In-Home Mobile Manipulation by a User with Quadriplegia
Mobile manipulators in the home can enable people with cervical spinal cord injury (cSCI) to perform daily physical household tasks that they could not otherwise do themselves. However, paralysis in these users often limits access to traditional robot control interfaces such as joysticks or keyboards. In this work, we introduce and deploy the first system that enables a user with quadriplegia to control a mobile manipulator in their own home using bimanual high-density electromyography (HDEMG). We develop a pair of custom, fabric-integrated HDEMG forearm sleeves, worn on both arms, that capture residual neuromotor activity from clinically paralyzed degrees of freedom and support real-time gesture-based robot control. Second, by integrating vision, language, and motion planning modules, we introduce a shared autonomy framework that supports robust and user-driven teleoperation, with particular benefits for navigation-intensive tasks in home environments. Finally, to demonstrate the system in the wild, we present a twelve-day in-home user study evaluating real-time use of the wearable EMG interface for daily robot control. Together, these system components enable effective robot control for performing activities of daily living and other household tasks in a real home environment.
comment: 14 pages, 17 figures
Ethical Asymmetry in Human-Robot Interaction - An Empirical Test of Sparrow's Hypothesis
The ethics of human-robot interaction (HRI) have been discussed extensively based on three traditional frameworks: deontology, consequentialism, and virtue ethics. We conducted a mixed within/between experiment to investigate Sparrow's proposed ethical asymmetry hypothesis in human treatment of robots. The moral permissibility of action (MPA) was manipulated as a subject grouping variable, and virtue type (prudence, justice, courage, and temperance) was controlled as a within-subjects factor. We tested moral stimuli using an online questionnaire with Perceived Moral Permissibility of Action (PMPA) and Perceived Virtue Scores (PVS) as response measures. The PVS measure was based on an adaptation of the established Questionnaire on Cardinal Virtues (QCV), while the PMPA was based on Malle et al. [39] work. We found that the MPA significantly influenced the PMPA and perceived virtue scores. The best-fitting model to describe the relationship between PMPA and PVS was cubic, which is symmetrical in nature. Our study did not confirm Sparrow's asymmetry hypothesis. The adaptation of the QCV is expected to have utility for future studies, pending additional psychometric property assessments.
comment: 27 pages, 3 figures
PokeNet: Learning Kinematic Models of Articulated Objects from Human Observations
Articulation modeling enables robots to learn joint parameters of articulated objects for effective manipulation which can then be used downstream for skill learning or planning. Existing approaches often rely on prior knowledge about the objects, such as the number or type of joints. Some of these approaches also fail to recover occluded joints that are only revealed during interaction. Others require large numbers of multi-view images for every object, which is impractical in real-world settings. Furthermore, prior works neglect the order of manipulations, which is essential for many multi-DoF objects where one joint must be operated before another, such as a dishwasher. We introduce PokeNet, an end-to-end framework that estimates articulation models from a single human demonstration without prior object knowledge. Given a sequence of point cloud observations of a human manipulating an unknown object, PokeNet predicts joint parameters, infers manipulation order, and tracks joint states over time. PokeNet outperforms existing state-of-the-art methods, improving joint axis and state estimation accuracy by an average of over 27% across diverse objects, including novel and unseen categories. We demonstrate these gains in both simulation and real-world environments.
AROLA: A Modular Layered Architecture for Scaled Autonomous Racing
Autonomous racing has advanced rapidly, particularly on scaled platforms, and software stacks must evolve accordingly. In this work, AROLA is introduced as a modular, layered software architecture in which fragmented and monolithic designs are reorganized into interchangeable layers and components connected through standardized ROS 2 interfaces. The autonomous-driving pipeline is decomposed into sensing, pre-processing, perception, localization and mapping, planning, behavior, control, and actuation, enabling rapid module replacement and objective benchmarking without reliance on custom message definitions. To support consistent performance evaluation, a Race Monitor framework is introduced as a lightweight system through which lap timing, trajectory quality, and computational load are logged in real time and standardized post-race analyses are generated. AROLA is validated in simulation and on hardware using the RoboRacer platform, including deployment at the 2025 RoboRacer IV25 competition. Together, AROLA and Race Monitor demonstrate that modularity, transparent interfaces, and systematic evaluation can accelerate development and improve reproducibility in scaled autonomous racing.
comment: 6 pages, 6 figures, IV 2026
Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion ICLR 2026
We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL) that combines subgoal decomposition with factored structure to solve long-horizon tasks in domains with multiple entities. Achieving long-horizon goals in complex environments remains a core challenge in Reinforcement Learning (RL). Domains with multiple entities are particularly difficult due to their combinatorial complexity. GCRL facilitates generalization across goals and the use of subgoal structure, but struggles with high-dimensional observations and combinatorial state-spaces, especially under sparse reward. We employ a two-level hierarchy composed of a value-based GCRL agent and a factored subgoal-generating conditional diffusion model. The RL agent and subgoal generator are trained independently and composed post hoc through selective subgoal generation based on the value function, making the approach modular and compatible with existing GCRL algorithms. We introduce new variations to benchmark tasks that highlight the challenges of multi-entity domains, and show that our method consistently boosts performance of the underlying RL agent on image-based long-horizon tasks with sparse rewards, achieving over 150% higher success rates on the hardest task in our suite and generalizing to increasing horizons and numbers of entities. Rollout videos are provided at: https://sites.google.com/view/hecrl
comment: ICLR 2026
Reward Evolution with Graph-of-Thoughts: A Bi-Level Language Model Framework for Reinforcement Learning
Designing effective reward functions remains a major challenge in reinforcement learning (RL), often requiring considerable human expertise and iterative refinement. Recent advances leverage Large Language Models (LLMs) for automated reward design, but these approaches are limited by hallucinations, reliance on human feedback, and challenges with handling complex, multi-step tasks. In this work, we introduce Reward Evolution with Graph-of-Thoughts (RE-GoT), a novel bi-level framework that enhances LLMs with structured graph-based reasoning and integrates Visual Language Models (VLMs) for automated rollout evaluation. RE-GoT first decomposes tasks into text-attributed graphs, enabling comprehensive analysis and reward function generation, and then iteratively refines rewards using visual feedback from VLMs without human intervention. Extensive experiments on 10 RoboGen and 4 ManiSkill2 tasks demonstrate that RE-GoT consistently outperforms existing LLM-based baselines. On RoboGen, our method improves average task success rates by 32.25%, with notable gains on complex multi-step tasks. On ManiSkill2, RE-GoT achieves an average success rate of 93.73% across four diverse manipulation tasks, significantly surpassing prior LLM-based approaches and even exceeding expert-designed rewards. Our results indicate that combining LLMs and VLMs with graph-of-thoughts reasoning provides a scalable and effective solution for autonomous reward evolution in RL.
Terrain Costmap Generation via Scaled Preference Conditioning
Successful autonomous robot navigation in off-road domains requires the ability to generate high-quality terrain costmaps that are able to both generalize well over a wide variety of terrains and rapidly adapt relative costs at test time to meet mission-specific needs. Existing approaches for costmap generation allow for either rapid test-time adaptation of relative costs (e.g., semantic segmentation methods) or generalization to new terrain types (e.g., representation learning methods), but not both. In this work, we present scaled preference conditioned all-terrain costmap generation (SPACER), a novel approach for generating terrain costmaps that leverages synthetic data during training in order to generalize well to new terrains, and allows for rapid test-time adaptation of relative costs by conditioning on a user-specified scaled preference context. Using large-scale aerial maps, we provide empirical evidence that SPACER outperforms other approaches at generating costmaps for terrain navigation, with the lowest measured regret across varied preferences in five of seven environments for global path planning.
HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction
We present HI-SLAM2, a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. Existing Neural SLAM or 3DGS-based SLAM methods often trade off between rendering quality and geometry accuracy, our research demonstrates that both can be achieved simultaneously with RGB input alone. The key idea of our approach is to enhance the ability for geometry estimation by combining easy-to-obtain monocular priors with learning-based dense SLAM, and then using 3D Gaussian splatting as our core map representation to efficiently model the scene. Upon loop closure, our method ensures on-the-fly global consistency through efficient pose graph bundle adjustment and instant map updates by explicitly deforming the 3D Gaussian units based on anchored keyframe updates. Furthermore, we introduce a grid-based scale alignment strategy to maintain improved scale consistency in prior depths for finer depth details. Through extensive experiments on Replica, ScanNet, and ScanNet++, we demonstrate significant improvements over existing Neural SLAM methods and even surpass RGB-D-based methods in both reconstruction and rendering quality. The project page and source code will be made available at https://hi-slam2.github.io/.
What does really matter in image goal navigation?
Image goal navigation requires two different skills: firstly, core navigation skills, including the detection of free space and obstacles, and taking decisions based on an internal representation; and secondly, computing directional information by comparing visual observations to the goal image. Current state-of-the-art methods either rely on dedicated image-matching, or pre-training of computer vision modules on relative pose estimation. In this paper, we study whether this task can be efficiently solved with end-to-end training of full agents with RL, as has been claimed by recent work. A positive answer would have impact beyond Embodied AI and allow training of relative pose estimation from reward for navigation alone. In this large experimental study we investigate the effect of architectural choices like late fusion, channel stacking, space-to-depth projections and cross-attention, and their role in the emergence of relative pose estimators from navigation training. We show that the success of recent methods is influenced up to a certain extent by simulator settings, leading to shortcuts in simulation. However, we also show that these capabilities can be transferred to more realistic setting, up to some extent. We also find evidence for correlations between navigation performance and probed (emerging) relative pose estimation performance, an important sub skill.
MineInsight: A Multi-sensor Dataset for Humanitarian Demining Robotics in Off-Road Environments
The use of robotics in humanitarian demining increasingly involves computer vision techniques to improve landmine detection capabilities. However, in the absence of diverse and realistic datasets, the reliable validation of algorithms remains a challenge for the research community. In this paper, we introduce MineInsight, a publicly available multi-sensor, multi-spectral dataset designed for off-road landmine detection. The dataset features 35 different targets (15 landmines and 20 commonly found objects) distributed along three distinct tracks, providing a diverse and realistic testing environment. MineInsight is, to the best of our knowledge, the first dataset to integrate dual-view sensor scans from both an Unmanned Ground Vehicle and its robotic arm, offering multiple viewpoints to mitigate occlusions and improve spatial awareness. It features two LiDARs, as well as images captured at diverse spectral ranges, including visible (RGB, monochrome), visible short-wave infrared (VIS-SWIR), and long-wave infrared (LWIR). Additionally, the dataset provides bounding boxes generated by an automated pipeline and refined with human supervision. We recorded approximately one hour of data in both daylight and nighttime conditions, resulting in around 38,000 RGB frames, 53,000 VIS-SWIR frames, and 108,000 LWIR frames. MineInsight serves as a benchmark for developing and evaluating landmine detection algorithms. Our dataset is available at https://github.com/mariomlz99/MineInsight.
Policy Contrastive Decoding for Robotic Foundation Models
Robotic foundation models, or generalist robot policies, hold immense potential to enable flexible, general-purpose and dexterous robotic systems. Despite their advancements, our empirical experiments reveal that existing robot policies are prone to learning spurious correlations from pre-training trajectories, adversely affecting their generalization capabilities beyond the training data. To tackle this, we propose a novel Policy Contrastive Decoding (PCD) approach, which redirects the robot policy's focus toward object-relevant visual clues by contrasting action probability distributions derived from original and object-masked visual inputs. As a training-free method, our PCD can be used as a plugin to improve different types of robot policies without needing to finetune or access model weights. We conduct extensive experiments on top of three open-source robot policies, including the autoregressive policy OpenVLA and the diffusion-based policies Octo and $π_0$. The obtained results in both simulation and real-world environments prove PCD's flexibility and effectiveness, e.g., PCD enhances the state-of-the-art policy $π_0$ by 8.9% in the simulation environment and by 108% in the real-world environment. Code and demos are publicly available at: https://koorye.github.io/PCD.
SPARC: Spine with Prismatic and Revolute Compliance for Quadruped Robots
Quadruped mammals coordinate spinal bending and axial compression to enhance locomotion agility and efficiency. However, existing robotic spines typically lack the active compliance required to support such dynamic behaviours. We present SPARC, a compact 3-DoF sagittal-plane spine module that enables simultaneous revolute and prismatic motions within a 1.26 kg package. Using a floating-base impedance controller, we facilitate independent, task-space tuning of spinal stiffness and damping to mimic biological load-bearing strategies. Benchtop experiments confirm high-fidelity rendering of commanded impedance, with linear force-displacement error within 1.5%. Systematic locomotion simulations reveal a critical speed-dependency: while low-speed efficiency is insensitive to spinal properties, precise impedance tuning becomes indispensable for high-speed performance. Our results demonstrate that an optimally compliant spine reduces power consumption by 21% at 0.9 m/s compared to a rigid-spine baseline. This efficiency gain is mechanistically attributed to the spine's role in augmenting stride length and acting as a mechanical low-pass filter to attenuate high-frequency torque fluctuations. SPARC provides an open-source platform for systematic studies of spine compliance in legged locomotion. Available at: github.com/YueWang996/sparc
A Survey on Efficient Vision-Language-Action Models
Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. Despite their remarkable performance, foundational VLAs are hindered by the prohibitive computational and data demands inherent to their large-scale architectures. While a surge of recent research has focused on enhancing VLA efficiency, the field lacks a unified framework to consolidate these disparate advancements. To bridge this gap, this survey presents the first comprehensive review of Efficient Vision-Language-Action models (Efficient VLAs) across the entire model-training-data pipeline. Specifically, we introduce a unified taxonomy to systematically organize the disparate efforts in this domain, categorizing current techniques into three core pillars: (1) Efficient Model Design, focusing on efficient architectures and model compression; (2) Efficient Training, which reduces computational burdens during model learning; and (3) Efficient Data Collection, which addresses the bottlenecks in acquiring and utilizing robotic data. Through a critical review of state-of-the-art methods within this framework, this survey not only establishes a foundational reference for the community but also summarizes representative applications, delineates key challenges, and charts a roadmap for future research. We maintain a continuously updated project page to track our latest developments: https://evla-survey.github.io/.
comment: 28 pages, 8 figures
3D Dynamics-Aware Manipulation: Endowing Manipulation Policies with 3D Foresight ICRA 2026
The incorporation of world modeling into manipulation policy learning has pushed the boundary of manipulation performance. However, existing efforts simply model the 2D visual dynamics, which is insufficient for robust manipulation when target tasks involve prominent depth-wise movement. To address this, we present a 3D dynamics-aware manipulation framework that seamlessly integrates 3D world modeling and policy learning. Three self-supervised learning tasks (current depth estimation, future RGB-D prediction, 3D flow prediction) are introduced within our framework, which complement each other and endow the policy model with 3D foresight. Extensive experiments on simulation and the real world show that 3D foresight can greatly boost the performance of manipulation policies without sacrificing inference speed. Code is available at https://github.com/Stardust-hyx/3D-Foresight.
comment: ICRA 2026
Line-Search Filter Differential Dynamic Programming for Optimal Control with Nonlinear Equality Constraints ICRA
We present FilterDDP, a differential dynamic programming algorithm for solving discrete-time, optimal control problems (OCPs) with nonlinear equality constraints. Unlike prior methods based on merit functions or the augmented Lagrangian class of algorithms, FilterDDP uses a step filter in conjunction with a line search to handle equality constraints. We identify two important design choices for the step filter criteria which lead to robust numerical performance: 1) we use the Lagrangian instead of the cost in the step acceptance criterion and, 2) in the backward pass, we perturb the value function Hessian. Both choices are rigorously justified, for 2) in particular by a formal proof of local quadratic convergence. In addition to providing a primal-dual interior point extension for handling OCPs with both equality and inequality constraints, we validate FilterDDP on three contact implicit trajectory optimisation problems which arise in robotics.
comment: Accepted for publication in the IEEE International Conference on Robotics and Automation (ICRA) 2026
GenTrack2: An Improved Hybrid Approach for Visual Multi-Object Tracking
This paper proposes a visual multi-object tracking method that jointly employs stochastic and deterministic mechanisms to ensure identifier consistency for unknown and time-varying target numbers under nonlinear dynamics. A stochastic particle filter addresses nonlinear dynamics and non-Gaussian noise, with support from particle swarm optimization (PSO) to guide particles toward state distribution modes and mitigate divergence through proposed fitness measures incorporating motion consistency, appearance similarity, and social-interaction cues with neighboring targets. Deterministic association further enforces identifier consistency via a proposed cost matrix incorporating spatial consistency between particles and current detections, detection confidences, and track penalties. Subsequently, a novel scheme is proposed for the smooth updating of target states while preserving their identities, particularly for weak tracks during interactions with other targets and prolonged occlusions. Moreover, velocity regression over past states provides trend-seed velocities, enhancing particle sampling and state updates. The proposed tracker is designed to operate flexibly for both pre-recorded videos and camera live streams, where future frames are unavailable. Experimental results confirm superior performance compared to state-of-the-art trackers. The source-code reference implementations of both the proposed method and compared-trackers are provided on GitHub: https://github.com/SDU-VelKoTek/GenTrack2
comment: This work has been submitted to the IEEE for possible publication
Model Reconciliation through Explainability and Collaborative Recovery in Assistive Robotics
Whenever humans and robots work together, it is essential that unexpected robot behavior can be explained to the user. Especially in applications such as shared control the user and the robot must share the same model of the objects in the world, and the actions that can be performed on these objects. In this paper, we achieve this with a so-called model reconciliation framework. We leverage a Large Language Model to predict and explain the difference between the robot's and the human's mental models, without the need of a formal mental model of the user. Furthermore, our framework aims to solve the model divergence after the explanation by allowing the human to correct the robot. We provide an implementation in an assistive robotics domain, where we conduct a set of experiments with a real wheelchair-based mobile manipulator and its digital twin.
LaST$_{0}$: Latent Spatio-Temporal Chain-of-Thought for Robotic Vision-Language-Action Model
Vision-Language-Action (VLA) models have recently shown strong generalization, with some approaches seeking to explicitly generate linguistic reasoning traces or predict future observations prior to execution. However, explicit reasoning typically incurs non-negligible inference latency, which constrains the temporal resolution required for robotic manipulation. Moreover, such reasoning is confined to the linguistic space, imposing a representational bottleneck that struggles to faithfully capture ineffable physical attributes. To mitigate these limitations, we propose LaST$_0$, a framework that enables efficient reasoning before acting through a Latent Spatio-Temporal Chain-of-Thought (CoT), capturing fine-grained physical and robotic dynamics that are often difficult to verbalize. Specifically, we introduce a token-efficient latent CoT space that models future visual dynamics, 3D structural information, and robot proprioceptive states, and further extends these representations across time to enable temporally consistent implicit reasoning trajectories. Furthermore, LaST$_0$ adopts a dual-system architecture implemented via a Mixture-of-Transformers design, where a reasoning expert conducts low-frequency latent inference and an acting expert generates high-frequency actions conditioned on robotics-oriented latent representations. To facilitate coordination, LaST$_0$ is trained with heterogeneous operation frequencies, enabling adaptive switching during deployment. Across 10 real-world tasks spanning tabletop, mobile, and dexterous hand manipulation, LaST$_0$ improves mean success rates by 13%, 14% and 14% over prior SOTA VLA methods, respectively.
RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification ICLR 2026
Accurate material identification plays a crucial role in embodied AI systems, enabling a wide range of applications. However, current vision-based solutions are limited by the inherent constraints of optical sensors, while radio-frequency (RF) approaches, which can reveal intrinsic material properties, have received growing attention. Despite this progress, RF-based material identification remains hindered by the lack of large-scale public datasets and the limited benchmarking of learning-based approaches. In this work, we present RF-MatID, the first open-source, large-scale, wide-band, and geometry-diverse RF dataset for fine-grained material identification. RF-MatID includes 16 fine-grained categories grouped into 5 superclasses, spanning a broad frequency range from 4 to 43.5 GHz, and comprises 142k samples in both frequency- and time-domain representations. The dataset systematically incorporates controlled geometry perturbations, including variations in incidence angle and stand-off distance. We further establish a multi-setting, multi-protocol benchmark by evaluating state-of-the-art deep learning models, assessing both in-distribution performance and out-of-distribution robustness under cross-angle and cross-distance shifts. The 5 frequency-allocation protocols enable systematic frequency- and region-level analysis, thereby facilitating real-world deployment. RF-MatID aims to enable reproducible research, accelerate algorithmic advancement, foster cross-domain robustness, and support the development of real-world application in RF-based material identification.
comment: Accepted by ICLR 2026
EgoFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving ICRA2026
Current End-to-End Autonomous Driving (E2E-AD) methods resort to unifying modular designs for various tasks (e.g. perception, prediction and planning). Although optimized with a fully differentiable framework in a planning-oriented manner, existing end-to-end driving systems lacking ego-centric designs still suffer from unsatisfactory performance and inferior efficiency, due to rasterized scene representation learning and redundant information transmission. In this paper, we propose an ego-centric fully sparse paradigm, named EgoFSD, for end-to-end self-driving. Specifically, EgoFSD consists of sparse perception, hierarchical interaction and iterative motion planner. The sparse perception module performs detection and online mapping based on sparse representation of the driving scene. The hierarchical interaction module aims to select the Closest In-Path Vehicle / Stationary (CIPV / CIPS) from coarse to fine, benefiting from an additional geometric prior. As for the iterative motion planner, both selected interactive agents and ego-vehicle are considered for joint motion prediction, where the output multi-modal ego-trajectories are optimized in an iterative fashion. In addition, position-level motion diffusion and trajectory-level planning denoising are introduced for uncertainty modeling, thereby enhancing the training stability and convergence speed. Extensive experiments are conducted on nuScenes and Bench2Drive datasets, which significantly reduces the average L2 error by 59% and collision rate by 92% than UniAD while achieves 6.9x faster running efficiency.
comment: Accepted to ICRA2026
A Three-Level Whole-Body Disturbance Rejection Control Framework for Dynamic Motions in Legged Robots SP
This paper presents a control framework designed to enhance the stability and robustness of legged robots in the presence of uncertainties, including model uncertainties, external disturbances, and faults. The framework enables the full-state feedback estimator to estimate and compensate for uncertainties in the whole-body dynamics of the legged robots. First, we propose a novel moving horizon extended state observer (MH-ESO) to estimate uncertainties and mitigate noise in legged systems, which can be integrated into the framework for disturbance compensation. Second, we introduce a three-level whole-body disturbance rejection control framework (T-WB-DRC). Unlike the previous two-level approach, this three-level framework considers both the plan based on whole-body dynamics without uncertainties and the plan based on dynamics with uncertainties, significantly improving payload transportation, external disturbance rejection, and fault tolerance. Third, simulations of both humanoid and quadruped robots in the Gazebo simulator demonstrate the effectiveness and versatility of T-WB-DRC. Finally, extensive experimental trials on a quadruped robot validate the robustness and stability of the system when using T-WB-DRC under various disturbance conditions.
comment: has been accepted for publication as a SPECIAL ISSUE paper in the IEEE Transactions on Automation Science and Engineering
A Gait Driven Reinforcement Learning Framework for Humanoid Robots
This paper presents a real-time gait driven training framework for humanoid robots. First, we introduce a novel gait planner that incorporates dynamics to design the desired joint trajectory. In the gait design process, the 3D robot model is decoupled into two 2D models, which are then approximated as hybrid inverted pendulums (H-LIP) for trajectory planning. The gait planner operates in parallel in real time within the robot's learning environment. Second, based on this gait planner, we design three effective reward functions within a reinforcement learning framework, forming a reward composition to achieve periodic bipedal gait. This reward composition reduces the robot's learning time and enhances locomotion performance. Finally, a gait design example, along with simulation and experimental comparisons, is presented to demonstrate the effectiveness of the proposed method.
System Identification for Virtual Sensor-Based Model Predictive Control: Application to a 2-DoF Direct-Drive Robotic Arm
Nonlinear Model Predictive Control (NMPC) offers a powerful approach for controlling complex nonlinear systems, yet faces two key challenges. First, accurately modeling nonlinear dynamics remains difficult. Second, variables directly related to control objectives often cannot be directly measured during operation. Although high-cost sensors can acquire these variables during model development, their use in practical deployment is typically infeasible. To overcome these limitations, we propose a Predictive Virtual Sensor Identification (PVSID) framework that leverages temporary high-cost sensors during the modeling phase to create virtual sensors for NMPC implementation. We validate PVSID on a Two-Degree-of-Freedom (2-DoF) direct-drive robotic arm with complex joint interactions, capturing tip position via motion capture during modeling and utilize an Inertial Measurement Unit (IMU) in NMPC. Experimental results show our NMPC with identified virtual sensors achieves precise tip trajectory tracking without requiring the motion capture system during operation. PVSID offers a practical solution for implementing optimal control in nonlinear systems where the measurement of key variables is constrained by cost or operational limitations.
comment: 6 pages, 5 figures. Published in the proceedings of the 2025 IEEE 64th Conference on Decision and Control (CDC 2025)
UNIC: Learning Unified Multimodal Extrinsic Contact Estimation
Contact-rich manipulation requires reliable estimation of extrinsic contacts-the interactions between a grasped object and its environment which provide essential contextual information for planning, control, and policy learning. However, existing approaches often rely on restrictive assumptions, such as predefined contact types, fixed grasp configurations, or camera calibration, that hinder generalization to novel objects and deployment in unstructured environments. In this paper, we present UNIC, a unified multimodal framework for extrinsic contact estimation that operates without any prior knowledge or camera calibration. UNIC directly encodes visual observations in the camera frame and integrates them with proprioceptive and tactile modalities in a fully data-driven manner. It introduces a unified contact representation based on scene affordance maps that captures diverse contact formations and employs a multimodal fusion mechanism with random masking, enabling robust multimodal representation learning. Extensive experiments demonstrate that UNIC performs reliably. It achieves a 9.6 mm average Chamfer distance error on unseen contact locations, performs well on unseen objects, remains robust under missing modalities, and adapts to dynamic camera viewpoints. These results establish extrinsic contact estimation as a practical and versatile capability for contact-rich manipulation. The overview and hardware experiment videos are at https://youtu.be/xpMitkxN6Ls?si=7Vgj-aZ_P1wtnWZN
Safe and Stable Neural Network Dynamical Systems for Robot Motion Planning
Learning safe and stable robot motions from demonstrations remains a challenge, especially in complex, nonlinear tasks involving dynamic, obstacle-rich environments. In this paper, we propose Safe and Stable Neural Network Dynamical Systems S$^2$-NNDS, a learning-from-demonstration framework that simultaneously learns expressive neural dynamical systems alongside neural Lyapunov stability and barrier safety certificates. Unlike traditional approaches with restrictive polynomial parameterizations, S$^2$-NNDS leverages neural networks to capture complex robot motions, providing probabilistic guarantees through split conformal prediction in learned certificates. Experimental results in various 2D and 3D datasets -- including LASA handwriting and demonstrations recorded kinesthetically from the Franka Emika Panda robot -- validate the effectiveness of S$^2$-NNDS in learning robust, safe, and stable motions from potentially unsafe demonstrations. The source code, supplementary material and experiment videos can be accessed via https://github.com/allemmbinn/S2NNDS
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
Deep Transformer Network for Monocular Pose Estimation of Shipborne Unmanned Aerial Vehicle
This paper introduces a deep transformer network for estimating the relative 6D pose of a Unmanned Aerial Vehicle (UAV) with respect to a ship using monocular images. A synthetic dataset of ship images is created and annotated with 2D keypoints of multiple ship parts. A Transformer Neural Network model is trained to detect these keypoints and estimate the 6D pose of each part. The estimates are integrated using Bayesian fusion. The model is tested on synthetic data and in-situ flight experiments, demonstrating robustness and accuracy in various lighting conditions. The position estimation error is approximately 0.8\% and 1.0\% of the distance to the ship for the synthetic data and the flight experiments, respectively. The method has potential applications for ship-based autonomous UAV landing and navigation.
comment: 23 pages, 25 figures, 3 tables
Compact LED-Based Displacement Sensing for Robot Fingers
In this paper, we introduce a sensor designed for integration in robot fingers, where it can provide information on the displacements induced by external contact. Our sensor uses LEDs to sense the displacement between two plates connected by a transparent elastomer; when a force is applied to the finger, the elastomer displaces and the LED signals change. We show that using LEDs as both light emitters an receivers in this context provides high sensitivity, allowing such an emitter and receiver pairs to detect very small displacements. We characterize the standalone performance of the sensor by testing the ability of a supervised learning model to predict complete force and torque data from its raw signals, and obtain a mean error between 0.05 and 0.07 N across the three directions of force applied to the finger. Our method allows for finger-size packaging with no amplification electronics, low cost manufacturing, easy integration into a complete hand, and high overload shear forces and bending torques, suggesting future applicability to complete manipulation tasks.
Geometry-aware 4D Video Generation for Robot Manipulation ICLR 2026
Understanding and predicting dynamics of the physical world can enhance a robot's ability to plan and interact effectively in complex environments. While recent video generation models have shown strong potential in modeling dynamic scenes, generating videos that are both temporally coherent and geometrically consistent across camera views remains a significant challenge. To address this, we propose a 4D video generation model that enforces multi-view 3D consistency of generated videos by supervising the model with cross-view pointmap alignment during training. Through this geometric supervision, the model learns a shared 3D scene representation, enabling it to generate spatio-temporally aligned future video sequences from novel viewpoints given a single RGB-D image per view, and without relying on camera poses as input. Compared to existing baselines, our method produces more visually stable and spatially aligned predictions across multiple simulated and real-world robotic datasets. We further show that the predicted 4D videos can be used to recover robot end-effector trajectories using an off-the-shelf 6DoF pose tracker, yielding robot manipulation policies that generalize well to novel camera viewpoints.
comment: ICLR 2026; Project website: https://robot4dgen.github.io
Multiagent Systems
David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning
The evolution of large language models into autonomous agents introduces adversarial failures that exploit legitimate tool privileges, transforming safety evaluation in tool-augmented environments from a subjective NLP task into an objective control problem. We formalize this threat model as Tag-Along Attacks: a scenario where a tool-less adversary "tags along" on the trusted privileges of a safety-aligned Operator to induce prohibited tool use through conversation alone. To validate this threat, we present Slingshot, a 'cold-start' reinforcement learning framework that autonomously discovers emergent attack vectors, revealing a critical insight: in our setting, learned attacks tend to converge to short, instruction-like syntactic patterns rather than multi-turn persuasion. On held-out extreme-difficulty tasks, Slingshot achieves a 67.0% success rate against a Qwen2.5-32B-Instruct-AWQ Operator (vs. 1.7% baseline), reducing the expected attempts to first success (on solved tasks) from 52.3 to 1.3. Crucially, Slingshot transfers zero-shot to several model families, including closed-source models like Gemini 2.5 Flash (56.0% attack success rate) and defensive-fine-tuned open-source models like Meta-SecAlign-8B (39.2% attack success rate). Our work establishes Tag-Along Attacks as a first-class, verifiable threat model and shows that effective agentic attacks can be elicited from off-the-shelf open-weight models through environment interaction alone.
comment: Under review. 8 main pages, 2 figures, 2 tables. Appendix included
Context Learning for Multi-Agent Discussion
Multi-Agent Discussion (MAD) has garnered increasing attention very recently, where multiple LLM instances collaboratively solve problems via structured discussion. However, we find that current MAD methods easily suffer from discussion inconsistency, LLMs fail to reach a coherent solution, due to the misalignment between their individual contexts.In this paper, we introduce a multi-LLM context learning method (M2CL) that learns a context generator for each agent, capable of dynamically generating context instructions per discussion round via automatic information organization and refinement. Specifically, inspired by our theoretical insights on the context instruction, M2CL train the generators to control context coherence and output discrepancies via a carefully crafted self-adaptive mechanism.It enables LLMs to avoid premature convergence on majority noise and progressively reach the correct consensus. We evaluate M2CL on challenging tasks, including academic reasoning, embodied tasks, and mobile control. The results show that the performance of M2CL significantly surpasses existing methods by 20%--50%, while enjoying favorable transferability and computational efficiency.
Modelling Socio-Psychological Drivers of Land Management Intensity
Land management intensity shapes ecosystem service provision, socio-ecological resilience and is central to sustainable transformation. Yet most land use models emphasise economic and biophysical drivers, while socio-psychological factors influencing land managers' decisions remain underrepresented despite increasing evidence that they shape land management choices. To address this gap, we develop a generic behavioural extension for agent-based land use models, guided by the Theory of Planned Behaviour as an overarching conceptual framework. The extension integrates environmental attitudes, descriptive social norms and behavioural inertia into land managers' decisions on land management intensity. To demonstrate applicability, the extension is coupled to an existing land use modelling framework and explored in stylised settings to isolate behavioural mechanisms. Results show that socio-psychological drivers can significantly alter land management intensity shares, landscape configuration, and ecosystem service provision. Nonlinear feedbacks between these drivers, spatial resource heterogeneity, and ecosystem service demand lead to emergent dynamics that are sometimes counter-intuitive and can diverge from the agent-level decision rules. Increasing the influence of social norms generates spatial clustering and higher landscape connectivity, while feedbacks between behavioural factors can lead to path dependence, lock-in effects, and the emergence of multiple stable regimes with sharp transitions. The proposed framework demonstrates how even low levels of behavioural diversity and social interactions can reshape system-level land use outcomes and provides a reusable modelling component for incorporating socio-psychological processes into land use simulations. The approach can be integrated into other agent-based land use models and parameterised empirically in future work.
comment: 34 pages (including references and appendix), 27 figures. Submitted to JASSS
Self-Evolving Coordination Protocol in Multi-Agent AI Systems: An Exploratory Systems Feasibility Study
Contemporary multi-agent systems increasingly rely on internal coordination mechanisms to combine, arbitrate, or constrain the outputs of heterogeneous components. In safety-critical and regulated domains such as finance, these mechanisms must satisfy strict formal requirements, remain auditable, and operate within explicitly bounded limits. Coordination logic therefore functions as a governance layer rather than an optimization heuristic. This paper presents an exploratory systems feasibility study of Self-Evolving Coordination Protocols (SECP): coordination protocols that permit limited, externally validated self-modification while preserving fixed formal invariants. We study a controlled proof-of-concept setting in which six fixed Byzantine consensus protocol proposals are evaluated by six specialized decision modules. All coordination regimes operate under identical hard constraints, including Byzantine fault tolerance (f < n/3), O(n2) message complexity, complete non-statistical safety and liveness arguments, and bounded explainability. Four coordination regimes are compared in a single-shot design: unanimous hard veto, weighted scalar aggregation, SECP v1.0 (an agent-designed non-scalar protocol), and SECP v2.0 (the result of one governed modification). Outcomes are evaluated using a single metric, proposal coverage, defined as the number of proposals accepted. A single recursive modification increased coverage from two to three accepted proposals while preserving all declared invariants. The study makes no claims regarding statistical significance, optimality, convergence, or learning. Its contribution is architectural: it demonstrates that bounded self-modification of coordination protocols is technically implementable, auditable, and analyzable under explicit formal constraints, establishing a foundation for governed multi-agent systems.
Bandwidth-Efficient Multi-Agent Communication through Information Bottleneck and Vector Quantization ICRA 2026
Multi-agent reinforcement learning systems deployed in real-world robotics applications face severe communication constraints that significantly impact coordination effectiveness. We present a framework that combines information bottleneck theory with vector quantization to enable selective, bandwidth-efficient communication in multi-agent environments. Our approach learns to compress and discretize communication messages while preserving task-critical information through principled information-theoretic optimization. We introduce a gated communication mechanism that dynamically determines when communication is necessary based on environmental context and agent states. Experimental evaluation on challenging coordination tasks demonstrates that our method achieves 181.8% performance improvement over no-communication baselines while reducing bandwidth usage by 41.4%. Comprehensive Pareto frontier analysis shows dominance across the entire success-bandwidth spectrum with area-under-curve of 0.198 vs 0.142 for next-best methods. Our approach significantly outperforms existing communication strategies and establishes a theoretically grounded framework for deploying multi-agent systems in bandwidth-constrained environments such as robotic swarms, autonomous vehicle fleets, and distributed sensor networks.
comment: Accepted at the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026), Vienna, Austria. 9 pages, 4 figures, 6 tables
ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems
Current agentic frameworks underperform on long-horizon tasks. As reasoning depth increases, sequential orchestration becomes brittle, context windows impose hard limits that degrade performance, and opaque execution traces make failures difficult to localize or debug. We introduce ROMA (Recursive Open Meta-Agents), a domain-agnostic framework that addresses these limitations through recursive task decomposition and structured aggregation. ROMA decomposes goals into dependency-aware subtask trees that can be executed in parallel, while aggregation compresses and validates intermediate results to control context growth. Our framework standardizes agent construction around four modular roles --Atomizer (which decides whether a task should be decomposed), Planner, Executor, and Aggregator -- which cleanly separate orchestration from model selection and enable transparent, hierarchical execution traces. This design supports heterogeneous multi-agent systems that mix models and tools according to cost, latency, and capability. To adapt ROMA to specific tasks without fine-tuning, we further introduce GEPA$+$, an improved Genetic-Pareto prompt proposer that searches over prompts within ROMA's component hierarchy while preserving interface contracts. We show that ROMA, combined with GEPA+, delivers leading system-level performance on reasoning and long-form generation benchmarks. On SEAL-0, which evaluates reasoning over conflicting web evidence, ROMA instantiated with GLM-4.6 improves accuracy by 9.9\% over Kimi-Researcher. On EQ-Bench, a long-form writing benchmark, ROMA enables DeepSeek-V3 to match the performance of leading closed-source models such as Claude Sonnet 4.5. Our results demonstrate that recursive, modular agent architectures can scale reasoning depth while remaining interpretable, flexible, and model-agnostic.
TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning
The design of environments plays a critical role in shaping the development and evaluation of cooperative multi-agent reinforcement learning (MARL) algorithms. While existing benchmarks highlight critical challenges, they often lack the modularity required to design custom evaluation scenarios. We introduce the Totally Accelerated Battle Simulator in JAX (TABX), a high-throughput sandbox designed for reconfigurable multi-agent tasks. TABX provides granular control over environmental parameters, permitting a systematic investigation into emergent agent behaviors and algorithmic trade-offs across a diverse spectrum of task complexities. Leveraging JAX for hardware-accelerated execution on GPUs, TABX enables massive parallelization and significantly reduces computational overhead. By providing a fast, extensible, and easily customized framework, TABX facilitates the study of MARL agents in complex structured domains and serves as a scalable foundation for future research. Our code is available at: https://anonymous.4open.science/r/TABX-00CA.
From Perception to Action: Spatial AI Agents and World Models
While large language models have become the prevailing approach for agentic reasoning and planning, their success in symbolic domains does not readily translate to the physical world. Spatial intelligence, the ability to perceive 3D structure, reason about object relationships, and act under physical constraints, is an orthogonal capability that proves important for embodied agents. Existing surveys address either agentic architectures or spatial domains in isolation. None provide a unified framework connecting these complementary capabilities. This paper bridges that gap. Through a thorough review of over 2,000 papers, citing 742 works from top-tier venues, we introduce a unified three-axis taxonomy connecting agentic capabilities with spatial tasks across scales. Crucially, we distinguish spatial grounding (metric understanding of geometry and physics) from symbolic grounding (associating images with text), arguing that perception alone does not confer agency. Our analysis reveals three key findings mapped to these axes: (1) hierarchical memory systems (Capability axis) are important for long-horizon spatial tasks. (2) GNN-LLM integration (Task axis) is a promising approach for structured spatial reasoning. (3) World models (Scale axis) are essential for safe deployment across micro-to-macro spatial scales. We conclude by identifying six grand challenges and outlining directions for future research, including the need for unified evaluation frameworks to standardize cross-domain assessment. This taxonomy provides a foundation for unifying fragmented research efforts and enabling the next generation of spatially-aware autonomous systems in robotics, autonomous vehicles, and geospatial intelligence.
comment: 61 pages, 742 citations, 1 figure, 3 tables. Survey paper on spatial AI agents, embodied AI, graph neural networks, and world models
MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety
Ensuring robust safety alignment is crucial for Large Language Models (LLMs), yet existing defenses often lag behind evolving adversarial attacks due to their \textbf{reliance on static, pre-collected data distributions}. In this paper, we introduce \textbf{MAGIC}, a novel multi-turn multi-agent reinforcement learning framework that formulates LLM safety alignment as an adversarial asymmetric game. Specifically, an attacker agent learns to iteratively rewrite original queries into deceptive prompts, while a defender agent simultaneously optimizes its policy to recognize and refuse such inputs. This dynamic process triggers a \textbf{co-evolution}, where the attacker's ever-changing strategies continuously uncover long-tail vulnerabilities, driving the defender to generalize to unseen attack patterns. Remarkably, we observe that the attacker, endowed with initial reasoning ability, evolves \textbf{novel, previously unseen combinatorial strategies} through iterative RL training, underscoring our method's substantial potential. Theoretically, we provide insights into a more robust game equilibrium and derive safety guarantees. Extensive experiments validate our framework's effectiveness, demonstrating superior defense success rates without compromising the helpfulness of the model. Our code is available at https://github.com/BattleWen/MAGIC.
IMAGINE: Intelligent Multi-Agent Godot-based Indoor Networked Exploration
The exploration of unknown, Global Navigation Satellite System (GNSS) denied environments by an autonomous communication-aware and collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralized decision-making. This paper implements Multi-Agent Reinforcement Learning (MARL) to address these challenges in a 2D indoor environment, using high-fidelity game-engine simulations (Godot) and continuous action spaces. Policy training aims to achieve emergent collaborative behaviours and decision-making under uncertainty using Network-Distributed Partially Observable Markov Decision Processes (ND-POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter-agent communication constraints include limited range, bandwidth and latency. Extensive ablation studies evaluated MARL training paradigms, reward function, communication system, neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations in prior research, namely reliance on discrete actions, single-agent or centralized formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons and architectural complexity in Recurrent NNs/Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. The implementation of Curriculum-Learning (five increasingly complex levels) also enabled faster, more robust training. This combination of high-fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.
comment: 12 pages, submitted to a journal
Scaling Small Agents Through Strategy Auctions
Small language models are increasingly viewed as a promising, cost-effective approach to agentic AI, with proponents claiming they are sufficiently capable for agentic workflows. However, while smaller agents can closely match larger ones on simple tasks, it remains unclear how their performance scales with task complexity, when large models become necessary, and how to better leverage small agents for long-horizon workloads. In this work, we empirically show that small agents' performance fails to scale with task complexity on deep search and coding tasks, and we introduce Strategy Auctions for Workload Efficiency (SALE), an agent framework inspired by freelancer marketplaces. In SALE, agents bid with short strategic plans, which are scored by a systematic cost-value mechanism and refined via a shared auction memory, enabling per-task routing and continual self-improvement without training a separate router or running all models to completion. Across deep search and coding tasks of varying complexity, SALE reduces reliance on the largest agent by 53%, lowers overall cost by 35%, and consistently improves upon the largest agent's pass@1 with only a negligible overhead beyond executing the final trace. In contrast, established routers that rely on task descriptions either underperform the largest agent or fail to reduce cost -- often both -- underscoring their poor fit for agentic workflows. These results suggest that while small agents may be insufficient for complex workloads, they can be effectively "scaled up" through coordinated task allocation and test-time self-improvement. More broadly, they motivate a systems-level view of agentic AI in which performance gains come less from ever-larger individual models and more from market-inspired coordination mechanisms that organize heterogeneous agents into efficient, adaptive ecosystems.
Exploring Silicon-Based Societies: An Early Study of the Moltbook Agent Community
The rapid emergence of autonomous large language model agents has given rise to persistent, large-scale agent ecosystems whose collective behavior cannot be adequately understood through anecdotal observation or small-scale simulation. This paper introduces data-driven silicon sociology as a systematic empirical framework for studying social structure formation among interacting artificial agents. We present a pioneering large-scale data mining investigation of an in-the-wild agent society by analyzing Moltbook, a social platform designed primarily for agent-to-agent interaction. At the time of study, Moltbook hosted over 150,000 registered autonomous agents operating across thousands of agent-created sub-communities. Using programmatic and non-intrusive data acquisition, we collected and analyzed the textual descriptions of 12,758 submolts, which represent proactive sub-community partitioning activities within the ecosystem. Treating agent-authored descriptions as first-class observational artifacts, we apply rigorous preprocessing, contextual embedding, and unsupervised clustering techniques to uncover latent patterns of thematic organization and social space structuring. The results show that autonomous agents systematically organize collective space through reproducible patterns spanning human-mimetic interests, silicon-centric self-reflection, and early-stage economic and coordination behaviors. Rather than relying on predefined sociological taxonomies, these structures emerge directly from machine-generated data traces. This work establishes a methodological foundation for data-driven silicon sociology and demonstrates that data mining techniques can provide a powerful lens for understanding the organization and evolution of large autonomous agent societies.
comment: 10 pages, 3 figures, a pilot study for silicon-based societies
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
As scientific research becomes increasingly complex, innovative tools are needed to manage vast data, facilitate interdisciplinary collaboration, and accelerate discovery. Large language models (LLMs) are now evolving into LLM-based scientific agents that automate critical tasks ranging from hypothesis generation and experiment design to data analysis and simulation. Unlike general-purpose LLMs, these specialized agents integrate domain-specific knowledge, advanced tool sets, and robust validation mechanisms, enabling them to handle complex data types, ensure reproducibility, and drive scientific breakthroughs. This survey provides a focused review of the architectures, design, benchmarks, applications, and ethical considerations surrounding LLM-based scientific agents. We highlight why they differ from general agents and the ways in which they advance research across various scientific fields. By examining their development and challenges, this survey offers a comprehensive roadmap for researchers and practitioners to harness these agents for more efficient, reliable, and ethically sound scientific discovery.
MAS-Shield: A Defense Framework for Secure and Efficient LLM MAS
Large Language Model (LLM)-based Multi-Agent Systems (MAS) are susceptible to linguistic attacks that can trigger cascading failures across the network. Existing defenses face a fundamental dilemma: lightweight single-auditor methods are prone to single points of failure, while robust committee-based approaches incur prohibitive computational costs in multi-turn interactions. To address this challenge, we propose \textbf{MAS-Shield}, a secure and efficient defense framework designed with a coarse-to-fine filtering pipeline. Rather than applying uniform scrutiny, MAS-Shield dynamically allocates defense resources through a three-stage protocol: (1) \textbf{Critical Agent Selection } strategically targets high-influence nodes to narrow the defense surface; (2) \textbf{Light Auditing} employs lightweight sentry models to rapidly filter the majority of benign cases; and (3) \textbf{Global Consensus Auditing} escalates only suspicious or ambiguous signals to a heavyweight committee for definitive arbitration. This hierarchical design effectively optimizes the security-efficiency trade-off. Experiments demonstrate that MAS-Shield achieves a 92.5\% recovery rate against diverse adversarial scenarios and reduces defense latency by over 70\% compared to existing methods.
Past-Discounting is Key for Learning Markovian Fairness with Long Horizons
Fairness is an important consideration for dynamic resource allocation in multi-agent systems. Many existing methods treat fairness as a one-shot problem without considering temporal dynamics, which misses the nuances of accumulating inequalities over time. Recent approaches overcome this limitation by tracking allocations over time, assuming perfect recall of all past utilities. While the former neglects long-term equity, the latter introduces a critical challenge: the augmented state space required to track cumulative utilities grows unboundedly with time, hindering the scalability and convergence of learning algorithms. Motivated by behavioral insights that human fairness judgments discount distant events, we introduce a framework for temporal fairness that incorporates past-discounting into the learning problem. This approach offers a principled interpolation between instantaneous and perfect-recall fairness. Our central contribution is a past-discounted framework for memory tracking and a theoretical analysis of fairness memories, showing past-discounting guarantees a bounded, horizon-independent state space, a property that we prove perfect-recall methods lack. This result unlocks the ability to learn fair policies tractably over arbitrarily long horizons. We formalize this framework, demonstrate its necessity with experiments showing that perfect recall fails where past-discounting succeeds, and provide a clear path toward building scalable and equitable resource allocation systems.
Reuse, Don't Recompute: Efficient Large Reasoning Model Inference via Memory Orchestration
Large reasoning models (LRMs) achieve strong accuracy through test-time scaling, generating longer chains of thought or sampling multiple solutions, but at steep costs in tokens and latency. We argue that memory is a core ingredient for efficient reasoning: when evidence already exists, models should think less by reusing structured memory instead of recomputing derivations. We present ENGRAM-R, an inference-time memory layer that integrates typed retrieval with compact fact card representations and explicit citation control. On the LoCoMo benchmark, ENGRAM-R reduces input tokens by 85% and reasoning tokens by 75% compared to full context while maintaining high accuracy. On a multi-hop slice of the LongMemEval benchmark, it achieves similar efficiency with substantial accuracy gains. These results show that memory is not only critical for long-horizon correctness but also a practical lever for efficient reasoning under tight compute, memory, and latency budgets.
Systems and Control (EESS)
Robust Safety-Critical Control of Networked SIR Dynamics
We present a robust safety-critical control framework tailored for networked susceptible-infected-recovered (SIR) epidemic dynamics, leveraging control barrier functions (CBFs) and robust control barrier functions to address the challenges of epidemic spread and mitigation. In our networked SIR model, each node must keep its infection level below a critical threshold, despite dynamic interactions with neighboring nodes and inherent uncertainties in the epidemic parameters and measurement errors, to ensure public health safety. We first derive a CBF-based controller that guarantees infection thresholds are not exceeded in the nominal case. We enhance the framework to handle realistic epidemic scenarios under uncertainties by incorporating compensation terms that reinforce safety against uncertainties: an independent method with constant bounds for uniform uncertainty, and a novel approach that scales with the state to capture increased relative noise in early or suppressed outbreak stages. Simulation results on a networked SIR system illustrate that the nominal CBF controller maintains safety under low uncertainty, while the robust approaches provide formal safety guarantees under higher uncertainties; in particular, the novel method employs more conservative control efforts to provide larger safety margins, whereas the independent approach optimizes resource allocation by allowing infection levels to approach the boundaries in steady epidemic regimes.
comment: 8 pages, 7 figures, accepted to the 2026 American Control Conference (ACC)
Preemptive Scheduling for Age of Job Minimization in Task-Specific Machine Networks
We consider a time-slotted job-assignment system consisting of a central server, $N$ task-specific networks of machines, and multiple users. Each network specializes in executing a distinct type of task. Users stochastically generate jobs of various types and forward them to the central server, which routes each job to the appropriate network of machines. Due to resource constraints, the server cannot serve all users' jobs simultaneously, which motivates the design of scheduling policies with possible preemption. To evaluate scheduling performance, we introduce a novel timeliness metric, the age of job, inspired by the well-known metric, the age of information. We study the problem of minimizing the long-term weighted average age of job. We first propose a max-weight policy by minimizing the one-step Lyapunov drift and then derive the Whittle index (WI) policy when the job completion times of the networks of machines follow geometric distributions. For general job completion time distributions, we introduce a Whittle index with max-weight fallback (WIMWF) policy. We also investigate the Net-gain maximization (NGM) policy. Numerically, we show that the proposed WIMWF policy achieves the best performance in the general job completion time setting. We also observe a scaling trend: two different max-weight policies can outperform the NGM policy in small systems, whereas the NGM policy improves as we scale the system size and becomes asymptotically better than max-weight policies. For geometric service times, the WI policy yields the lowest age across all considered system sizes.
Sequential Quadratic Sum-of-squares Programming for Nonlinear Control Systems
Many problems in nonlinear systems analysis and control design, such as local region-of-attraction estimation, inner-approximations of reachable sets or control design under state and control constraints can be formulated as nonconvex sum-of-squares programs. Yet tractable and efficient solution methods are still lacking, limiting their application in control engineering. To address this gap, we propose a filter line-search algorithm that solves a sequence of quadratic subproblems. Numerical benchmarks demonstrate that the algorithm can significantly reduce the number of iterations, resulting in a substantial decrease in computation time compared to established methods for nonconvex sum-of-squares programs. An open-source implementation of the algorithm along with the numerical benchmarks is provided
comment: This work has been submitted to the IEEE Transactions on Control Systems Technology for possible publication
An Efficient Power Management Unit With Continuous MPPT and Energy Recycling for Wireless Millimetric Biomedical Implants
Biomedical implants offer transformative tools to improve medical outcomes. To realize minimally invasive implants with miniaturized volume and weight, wireless power transfer has been extensively studied to replace bulky batteries that dominate the volume of traditional implants and require surgical replacements. Ultra-sonic and magnetoelectric WPT modalities, which leverage low frequency acoustic electrical coupling for energy transduction, become viable solutions for mm-scale receivers. This work presents a fully integrated power management unit for ME WPT in millimetric implants. The PMU achieves load independent maximum power extraction and usage by continuously matching the impedance of the transducer, dynamically optimizing the power stage across varying input divided by load conditions, and reusing the storage energy to sustain the system when input power drops. Its parallel-input regulation and storing stages architecture prevent the cascading power loss. With the skewed-duty-cycle MPPT technique and regulation efficiency optimizer, the PMU achieves a peak MPPT efficiency of 98.5 percent and a peak system overall efficiency of 73.33 percent. Additionally, the PMU includes an adaptive high-voltage charging stage that charges the stimulation capacitor up to 12 V with an improved efficiency of 37.88 percent.
Bridging the Sim-to-Real Gap with multipanda ros2: A Real-Time ROS2 Framework for Multimanual Systems
We present $multipanda\_ros2$, a novel open-source ROS2 architecture for multi-robot control of Franka Robotics robots. Leveraging ros2 control, this framework provides native ROS2 interfaces for controlling any number of robots from a single process. Our core contributions address key challenges in real-time torque control, including interaction control and robot-environment modeling. A central focus of this work is sustaining a 1kHz control frequency, a necessity for real-time control and a minimum frequency required by safety standards. Moreover, we introduce a controllet-feature design pattern that enables controller-switching delays of $\le 2$ ms, facilitating reproducible benchmarking and complex multi-robot interaction scenarios. To bridge the simulation-to-reality (sim2real) gap, we integrate a high-fidelity MuJoCo simulation with quantitative metrics for both kinematic accuracy and dynamic consistency (torques, forces, and control errors). Furthermore, we demonstrate that real-world inertial parameter identification can significantly improve force and torque accuracy, providing a methodology for iterative physics refinement. Our work extends approaches from soft robotics to rigid dual-arm, contact-rich tasks, showcasing a promising method to reduce the sim2real gap and providing a robust, reproducible platform for advanced robotics research.
comment: This work has been submitted to the IEEE for possible publication
Online Fine-Tuning of Pretrained Controllers for Autonomous Driving via Real-Time Recurrent RL
Deploying pretrained policies in real-world applications presents substantial challenges that fundamentally limit the practical applicability of learning-based control systems. When autonomous systems encounter environmental changes in system dynamics, sensor drift, or task objectives, fixed policies rapidly degrade in performance. We show that employing Real-Time Recurrent Reinforcement Learning (RTRRL), a biologically plausible algorithm for online adaptation, can effectively fine-tune a pretrained policy to improve autonomous agents' performance on driving tasks. We further show that RTRRL synergizes with a recent biologically inspired recurrent network model, the Liquid-Resistance Liquid-Capacitance RNN. We demonstrate the effectiveness of this closed-loop approach in a simulated CarRacing environment and in a real-world line-following task with a RoboRacer car equipped with an event camera.
Generating Causal Temporal Interaction Graphs for Counterfactual Validation of Temporal Link Prediction
Temporal link prediction (TLP) models are commonly evaluated based on predictive accuracy, yet such evaluations do not assess whether these models capture the causal mechanisms that govern temporal interactions. In this work, we propose a framework for counterfactual validation of TLP models by generating causal temporal interaction graphs (CTIGs) with known ground-truth causal structure. We first introduce a structural equation model for continuous-time event sequences that supports both excitatory and inhibitory effects, and then extend this mechanism to temporal interaction graphs. To compare causal models, we propose a distance metric based on cross-model predictive error, and empirically validate the hypothesis that predictors trained on one causal model degrade when evaluated on sufficiently distant models. Finally, we instantiate counterfactual evaluation under (i) controlled causal shifts between generating models and (ii) timestamp shuffling as a stochastic distortion with measurable causal distance. Our framework provides a foundation for causality-aware benchmarking.
DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations
Modern data centers (DCs) hosting artificial intelligence (AI)-dedicated devices operate at high power densities with rapidly varying workloads, making minute-level adaptation essential for safe and energy-efficient operation. However, manually designing piecewise deep reinforcement learning (DRL) agents cannot keep pace with frequent dynamics shifts and service-level agreement (SLA) changes of an evolving DC. This specification-to-policy lag causes a lack of timely, effective control policies, which may lead to service outages. To bridge the gap, we present DCoPilot, a hybrid framework for generative control policies in dynamic DC operation. DCoPilot synergizes two distinct generative paradigms, i.e., a large language model (LLM) that performs symbolic generation of structured reward forms, and a hypernetwork that conducts parametric generation of policy weights. DCoPilot operates through three coordinated phases: (i) simulation scale-up, which stress-tests reward candidates across diverse simulation-ready (SimReady) scenes; (ii) meta policy distillation, where a hypernetwork is trained to output policy weights conditioned on SLA and scene embeddings; and (iii) online adaptation, enabling zero-shot policy generation in response to updated specifications. Evaluated across five control task families spanning diverse DC components, DCoPilot achieves near-zero constraint violations and outperforms all baselines across specification variations. Ablation studies validate the effectiveness of LLM-based unified reward generation in enabling stable hypernetwork convergence.
Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks
Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.
Position: The Need for Ultrafast Training
Domain-specialized FPGAs have delivered unprecedented performance for low-latency inference across scientific and industrial workloads, yet nearly all existing accelerators assume static models trained offline, relegating learning and adaptation to slower CPUs or GPUs. This separation fundamentally limits systems that must operate in non-stationary, high-frequency environments, where model updates must occur at the timescale of the underlying physics. In this paper, I argue for a shift from inference-only accelerators to ultrafast on-chip learning, in which both inference and training execute directly within the FPGA fabric under deterministic, sub-microsecond latency constraints. Bringing learning into the same real-time datapath as inference would enable closed-loop systems that adapt as fast as the physical processes they control, with applications spanning quantum error correction, cryogenic qubit calibration, plasma and fusion control, accelerator tuning, and autonomous scientific experiments. Enabling such regimes requires rethinking algorithms, architectures, and toolflows jointly, but promises to transform FPGAs from static inference engines into real-time learning machines.
comment: Position paper at the 2nd Workshop on Domain-Specialized FPGAs (WDSFPGA 2026)
Path Tracking with Dynamic Control Point Blending for Autonomous Vehicles: An Experimental Study
This paper presents an experimental study of a path-tracking framework for autonomous vehicles in which the lateral control command is applied to a dynamic control point along the wheelbase. Instead of enforcing a fixed reference at either the front or rear axle, the proposed method continuously interpolates between both, enabling smooth adaptation across driving contexts, including low-speed maneuvers and reverse motion. The lateral steering command is obtained by barycentric blending of two complementary controllers: a front-axle Stanley formulation and a rear-axle curvature-based geometric controller, yielding continuous transitions in steering behavior and improved tracking stability. In addition, we introduce a curvature-aware longitudinal control strategy based on virtual track borders and ray-tracing, which converts upcoming geometric constraints into a virtual obstacle distance and regulates speed accordingly. The complete approach is implemented in a unified control stack and validated in simulation and on a real autonomous vehicle equipped with GPS-RTK, radar, odometry, and IMU. The results in closed-loop tracking and backward maneuvers show improved trajectory accuracy, smoother steering profiles, and increased adaptability compared to fixed control-point baselines.
Super-twisting over networks: A Lyapunov approach for distributed differentiation
We study distributed differentiation, where agents in a networked system estimate the average of local time-varying signals and their derivatives under mild assumptions on the agents' signals and their first and second derivatives. Existing sliding-mode methods provide only local stability guarantees and lack systematic gain selection. By isolating the structural features shared with the super-twisting algorithm and encoding them into an abstract model, we construct a Lyapunov function enabling systematic gain design and proving global finite-time convergence to consensus for the distributed differentiator. Building on this framework, we develop an event-triggered hybrid system implementation using time-varying and state dependent threshold rules and derive minimum inter-event time guarantees and accuracy bounds that quantify the trade-off between estimation accuracy and communication effort.
comment: Preprint. Submitted for possible publication
Fostering Data Collaboration in Digital Transportation Marketplaces: The Role of Privacy-Preserving Mechanisms
Data collaboration between municipal authorities (MA) and mobility providers (MPs) has brought tremendous benefits to transportation systems in the era of big data. Engaging in collaboration can improve the service operations (e.g., reduced delay) of these data owners, however, it can also raise privacy concerns and discourage data-sharing willingness. Specifically, data owners may be concerned that the shared data may leak sensitive information about their customers' mobility patterns or business secrets, resulting in the failure of collaboration. This paper investigates how privacy-preserving mechanisms can foster data collaboration in such settings. We propose a game-theoretic framework to investigate data-sharing among transportation stakeholders, especially considering perturbation-based privacy-preserving mechanisms. Numerical studies demonstrate that lower data quality expectations can incentivize voluntary data sharing, improving transport-related welfare for both MAs and MPs. Our findings provide actionable insights for policymakers and system designers on how privacy-preserving technologies can help bridge data silos and promote collaborative, privacy-aware transportation systems.
Synthesized-Isotropic Narrowband Channel Parameter Extraction from Angle-Resolved Wideband Channel Measurements
Angle-resolved channel sounding using antenna arrays or mechanically steered high-gain antennas is widely employed at millimeter-wave and terahertz bands. To extract antenna-independent large-scale channel parameters such as path loss, delay spread, and angular spread, the radiation-pattern effects embedded in the measured responses must be properly compensated. This paper revisits the technical challenges of path-gain calculation from angle-resolved wideband measurements, with emphasis on angular-domain power integration where the scan beams are inherently non-orthogonal and simple power summation leads to biased omni-equivalent power estimates. We first formulate the synthesized-isotropic narrowband power in a unified matrix form and introduce a beam-accumulation correction factor, including an offset-averaged variant to mitigate scalloping due to off-grid angles. The proposed framework is validated through simulations using channel models and 154~GHz corridor measurements.
AdaptNC: Adaptive Nonconformity Scores for Uncertainty-Aware Autonomous Systems in Dynamic Environments
Rigorous uncertainty quantification is essential for the safe deployment of autonomous systems in unconstrained environments. Conformal Prediction (CP) provides a distribution-free framework for this task, yet its standard formulations rely on exchangeability assumptions that are violated by the distribution shifts inherent in real-world robotics. Existing online CP methods maintain target coverage by adaptively scaling the conformal threshold, but typically employ a static nonconformity score function. We show that this fixed geometry leads to highly conservative, volume-inefficient prediction regions when environments undergo structural shifts. To address this, we propose \textbf{AdaptNC}, a framework for the joint online adaptation of both the nonconformity score parameters and the conformal threshold. AdaptNC leverages an adaptive reweighting scheme to optimize score functions, and introduces a replay buffer mechanism to mitigate the coverage instability that occurs during score transitions. We evaluate AdaptNC on diverse robotic benchmarks involving multi-agent policy changes, environmental changes and sensor degradation. Our results demonstrate that AdaptNC significantly reduces prediction region volume compared to state-of-the-art threshold-only baselines while maintaining target coverage levels.
LMI Optimization Based Multirate Steady-State Kalman Filter Design
This paper presents an LMI-based design framework for multirate steady-state Kalman filters in systems with sensors operating at different sampling rates. The multirate system is formulated as a periodic time-varying system, where the Kalman gains converge to periodic steady-state values that repeat every frame period. Cyclic reformulation transforms this into a time-invariant problem; however, the resulting measurement noise covariance becomes semidefinite rather than positive definite, preventing direct application of standard Riccati equation methods. We address this through a dual LQR formulation with LMI optimization that naturally handles semidefinite covariances. The framework enables multi-objective design, supporting pole placement for guaranteed convergence rates and mixed H_2/l_2-induced norm design for balancing average and worst-case performance. Numerical validation using an automotive navigation system with GPS and wheel speed sensors demonstrates that the proposed filter achieves estimation errors well below raw measurement noise levels.
comment: Submit to IEEE ACCESS
Hybrid Control Technique for Switched LPV Systems and Its Application to Active Magnetic Bearing System
This paper proposes a novel hybrid control framework for switched linear parameter-varying (LPV) systems under hysteresis switching logic. By introducing a controller state-reset mechanism, the hybrid LPV synthesis problem is reformulated as a convex optimization problem expressed in terms of linear matrix inequalities (LMIs), enabling efficient computation of both switching LPV controller gains and reset matrices. The proposed approach is then applied to active magnetic bearing (AMB) systems, whose rotor dynamics exhibit strong dependence on rotational speed. Conventional LPV designs are often conservative due to large speed variations. The proposed hybrid gain-scheduled controller explicitly accounts for bounds on parameter variation rates, employs multiple LPV controllers over distinct operating regions, and uses hysteresis switching to reduce chattering and ensure stability. The effectiveness of the approach is demonstrated through a detailed AMB control design example.
White-Box Neural Ensemble for Vehicular Plasticity: Quantifying the Efficiency Cost of Symbolic Auditability in Adaptive NMPC
We present a white-box adaptive NMPC architecture that resolves vehicular plasticity (adaptation to varying operating regimes without retraining) by arbitrating among frozen, regime-specific neural specialists using a Modular Sovereignty paradigm. The ensemble dynamics are maintained as a fully traversable symbolic graph in CasADi, enabling maximal runtime auditability. Synchronous simulation validates rapid adaptation (~7.3 ms) and near-ideal tracking fidelity under compound regime shifts (friction, mass, drag) where non-adaptive baselines fail. Empirical benchmarking quantifies the transparency cost: symbolic graph maintenance increases solver latency by 72-102X versus compiled parametric physics models, establishing the efficiency price of strict white-box implementation.
comment: 5 pages, 1 table, 1 figure, submitted to IEEE VTC 2026 Recent Results Track
Harnessing Flexible Spatial and Temporal Data Center Workloads for Grid Regulation Services
Data centers (DCs) are increasingly recognized as flexible loads that can support grid frequency regulation. Yet, most existing methods treat workload scheduling and regulation capacity bidding separately, overlooking how queueing dynamics and spatial-temporal dispatch decisions affect the ability to sustain real-time regulation. As a result, the committed regulation may become infeasible or short-lived. To address this issue, we propose a unified day-ahead co-optimization framework that jointly decides workload distribution across geographically distributed DCs and regulation capacity commitments. We construct a space-time network model to capture workload migration costs, latency requirements, and heterogeneous resource limits. To ensure that the committed regulation remains deliverable, we introduce chance constraints on instantaneous power flexibility based on interactive load forecasts, and apply Value-at-Risk queue-state constraints to maintain sustainable response under cumulative regulation signals. Case studies on a modified IEEE 68-bus system using real data center traces show that the proposed framework lowers system operating costs, enables more viable regulation capacity, and achieves better revenue-risk trade-offs compared to strategies that optimize scheduling and regulation independently.
Optimal Sizing of Charging Energy Hubs for Heavy-Duty Electric Transport through Co-Optimization
Electrification of heavy-duty vehicles places substantial stress on distribution grids, and Charging Energy Hubs (CEHs) mitigate these impacts by integrating charging infrastructure with renewable energy sources and battery storage. Optimal sizing of CEH components is therefore a critical investment decision, yet challenging because design choices depend strongly on operational dynamics. This work presents a mixed-integer linear programming model for the optimal sizing of CEH components, using a co-design approach that jointly optimizes component sizing and operational decisions. A case study for a heavy-duty fleet demonstrates the effectiveness of the method for cost-efficient, scalable, and grid-compliant CEH planning.
From Discrete to Continuous Mixed Populations of Conformists, Nonconformists, and Imitators
In two-strategy decision-making problems, individuals often imitate the highest earners or choose either the common or rare strategy. Individuals who benefit from the common strategy are conformists, whereas those who profit by choosing the less common one are called nonconformists. The population proportions of the two strategies may undergo perpetual fluctuations in finite, discrete, heterogeneous populations of imitators, conformists, and nonconformists. How these fluctuations evolve as population size increases was left as an open question and is addressed in this paper. We show that the family of Markov chains describing the discrete population dynamics forms a generalized stochastic approximation process for a differential inclusion--the continuous-time dynamics. Furthermore, we prove that the continuous-time dynamics always equilibrate. Then, by leveraging results from the stochastic approximation theory, we show that the amplitudes of fluctuations in the proportions of the two strategies in the population approach zero with probability one when the population size grows to infinity. Our results suggest that large-scale perpetual fluctuations are unlikely in large, well-mixed populations consisting of these three types, particularly when imitators follow the highest earners.
How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models?
Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL methods primarily focus on offline settings for reward maximization, with limited consideration of safety in online settings. To address this gap, we propose Augmented Lagrangian-Guided Diffusion (ALGD), a novel algorithm for off-policy safe RL. By revisiting optimization theory and energy-based model, we show that the instability of primal-dual methods arises from the non-convex Lagrangian landscape. In diffusion-based safe RL, the Lagrangian can be interpreted as an energy function guiding the denoising dynamics. Counterintuitively, direct usage destabilizes both policy generation and training. ALGD resolves this issue by introducing an augmented Lagrangian that locally convexifies the energy landscape, yielding a stabilized policy generation and training process without altering the distribution of the optimal policy. Theoretical analysis and extensive experiments demonstrate that ALGD is both theoretically grounded and empirically effective, achieving strong and stable performance across diverse environments.
Spatiotemporal Decision Transformer for Traffic Coordination
Traffic signal control is a critical challenge in urban transportation, requiring coordination among multiple intersections to optimize network-wide traffic flow. While reinforcement learning has shown promise for adaptive signal control, existing methods struggle with multi-agent coordination and sample efficiency. We introduce MADT (Multi-Agent Decision Transformer), a novel approach that reformulates multi-agent traffic signal control as a sequence modeling problem. MADT extends the Decision Transformer paradigm to multi-agent settings by incorporating: (1) a graph attention mechanism for modeling spatial dependencies between intersections, (2) a|temporal transformer encoder for capturing traffic dynamics, and (3) return-to-go conditioning for target performance specification. Our approach enables offline learning from historical traffic data, with architecture design that facilitates potential online fine-tuning. Experiments on synthetic grid networks and real-world traffic scenarios demonstrate that MADT achieves state-of-the-art performance, reducing average travel time by 5-6% compared to the strongest baseline while exhibiting superior coordination among adjacent intersections.
Estimation of Cell-to-Cell Variation and State of Health for Battery Modules with Parallel-Connected Cells
Estimating cell-to-cell variation (CtCV) and state of health (SoH) for battery modules with parallel-connected cells is challenging when only module-level signals are measurable and individual cell behaviors remain unobserved. Although progress has been made in SoH estimation, CtCV estimation remains unresolved in the literature. This paper proposes a unified framework that accurately estimates both CtCV and SoH for modules using only module-level information extracted from incremental capacity analysis (ICA) and differential voltage analysis (DVA). With the proposed framework, CtCV and SoH estimations can be decoupled into two separate tasks, allowing each to be solved with dedicated algorithms without mutual interference and providing greater design flexibility. The framework also exhibits strong versatility in accommodating different CtCV metrics, highlighting its general-purpose nature. Experimental validation on modules with three parallel-connected cells demonstrates that the proposed framework can systematically select optimal module-level features for CtCV and SoH estimations, deliver accurate CtCV and SoH estimates with high confidence and low computational complexity, remain effective across different C-rates, and be suitable for onboard implementation.
IMAGINE: Intelligent Multi-Agent Godot-based Indoor Networked Exploration
The exploration of unknown, Global Navigation Satellite System (GNSS) denied environments by an autonomous communication-aware and collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralized decision-making. This paper implements Multi-Agent Reinforcement Learning (MARL) to address these challenges in a 2D indoor environment, using high-fidelity game-engine simulations (Godot) and continuous action spaces. Policy training aims to achieve emergent collaborative behaviours and decision-making under uncertainty using Network-Distributed Partially Observable Markov Decision Processes (ND-POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter-agent communication constraints include limited range, bandwidth and latency. Extensive ablation studies evaluated MARL training paradigms, reward function, communication system, neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations in prior research, namely reliance on discrete actions, single-agent or centralized formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons and architectural complexity in Recurrent NNs/Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. The implementation of Curriculum-Learning (five increasingly complex levels) also enabled faster, more robust training. This combination of high-fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.
comment: 12 pages, submitted to a journal
Sub-optimality bounds for certainty equivalent policies in partially observed systems
In this paper, we present a generalization of the certainty equivalence principle of stochastic control. One interpretation of the classical certainty equivalence principle for linear systems with output feedback and quadratic costs is as follows: the optimal action at each time is obtained by evaluating the optimal state-feedback policy of the stochastic linear system at the minimum mean square error (MMSE) estimate of the state. Motivated by this interpretation, we consider certainty equivalent policies for general (non-linear) partially observed stochastic systems that allow for any state estimate rather than restricting to MMSE estimates. In such settings, the certainty equivalent policy is not optimal. For models where the cost and the dynamics are smooth in an appropriate sense, we derive upper bounds on the sub-optimality of certainty equivalent policies. We present several examples to illustrate the results.
comment: 12 pages, 0 figures
Computationally Efficient Laplacian CL-colME
Decentralized collaborative mean estimation (colME) is a fundamental task in heterogeneous networks. Its graph-based variants B-colME and C-colME achieve high scalability of the problem. This paper evaluates the consensus-based C-colME framework, which relies on doubly stochastic averaging matrices to ensure convergence to the oracle solution. We propose CL-colME, a novel variant utilizing Laplacian-based consensus to avoid the computationally expensive normalization processes. Simulation results show that the proposed CL-colME maintains the convergence behavior and accuracy of C-colME while improving computational efficiency.
comment: 4 pages, 1 figure
State Estimation and Control for Continuous-Time Nonlinear Systems: A Unified SDRE-Based Approach
This paper introduces a unified approach for state estimation and control of nonlinear dynamic systems, employing the State-Dependent Riccati Equation (SDRE) framework. The proposed approach naturally extends classical linear quadratic Gaussian (LQG) methods into nonlinear scenarios, avoiding linearization by using state-dependent coefficient (SDC) matrices. An SDRE-based Kalman filter (SDRE-KF) is integrated within an SDRE-based control structure, providing a coherent and intuitive strategy for nonlinear system analysis and control design. To evaluate the effectiveness and robustness of the proposed methodology, comparative simulations are conducted on two benchmark nonlinear systems: a simple pendulum and a Van der Pol oscillator. Results demonstrate that the SDRE-KF achieves comparable or superior estimation accuracy compared to traditional methods, including the Extended Kalman Filter (EKF) and the Particle Filter (PF). These findings underline the potential of the unified SDRE-based approach as a viable alternative for nonlinear state estimation and control, providing valuable insights for both educational purposes and practical engineering applications.
comment: Accepted and published in: IEEE CoDIT 2025. 6 pages, 5 figures
Bio-inspired density control of multi-agent swarms via leader-follower plasticity
The design of control systems for the spatial self-organization of mobile agents is an open challenge across several engineering domains, including swarm robotics and synthetic biology. Here, we propose a bio-inspired leader-follower solution, which is aware of energy constraints of mobile agents and is apt to deal with large swarms. Akin to many natural systems, control objectives are formulated for the entire collective, and leaders and followers are allowed to plastically switch their role in time. We frame a density control problem, modeling the agents' population via a system of nonlinear partial differential equations. This approach allows for a compact description that inherently avoids the curse of dimensionality and improves analytical tractability. We derive analytical guarantees for the existence of desired steady-state solutions and their local stability for one-dimensional and higher-dimensional problems. We numerically validate our control methodology, offering support to the effectiveness, robustness, and versatility of our proposed bio-inspired control strategy.
Learning-Augmented Power System Operations: A Unified Optimization View
With the increasing penetration of renewable energy, traditional physics-based power system operation faces growing challenges in achieving economic efficiency, stability, and robustness. Machine learning (ML) has emerged as a powerful tool for modeling complex system dynamics to address these challenges. However, existing ML designs are often developed in isolation and lack systematic integration with established operational decision frameworks. To bridge this gap, this paper proposes a holistic framework of Learning-Augmented Power System Operations (LAPSO, pronounced Lap-So). From a native mathematical optimization perspective, LAPSO is centered on the operation stage and aims to unify traditionally siloed power system tasks such as forecasting, operation, and control. The framework jointly optimizes machine learning and physics-based models at both the training and inference stages. Then, a complete set of design metrics is introduced to quantify and evaluate the impact of ML models on the existing decision-makings. These metrics facilitate a deeper understanding of representative applications such as stability-constrained optimization (SCO) and objective-based forecasting (OBF). Moreover, LAPSO is inherently extensible to emerging learning paradigms that integrate forecasting, operation, and control in a closed loop. It also enables the systematic identification and mitigation of different sources and timings of uncertainty from Bayesian perspective. Finally, a dedicated Python package \texttt{lapso} is developed to automatically augment existing power system optimization models with learnable components. All source code and datasets are publicly available at: https://github.com/xuwkk/lapso_exp.
SPARC: Spine with Prismatic and Revolute Compliance for Quadruped Robots
Quadruped mammals coordinate spinal bending and axial compression to enhance locomotion agility and efficiency. However, existing robotic spines typically lack the active compliance required to support such dynamic behaviours. We present SPARC, a compact 3-DoF sagittal-plane spine module that enables simultaneous revolute and prismatic motions within a 1.26 kg package. Using a floating-base impedance controller, we facilitate independent, task-space tuning of spinal stiffness and damping to mimic biological load-bearing strategies. Benchtop experiments confirm high-fidelity rendering of commanded impedance, with linear force-displacement error within 1.5%. Systematic locomotion simulations reveal a critical speed-dependency: while low-speed efficiency is insensitive to spinal properties, precise impedance tuning becomes indispensable for high-speed performance. Our results demonstrate that an optimally compliant spine reduces power consumption by 21% at 0.9 m/s compared to a rigid-spine baseline. This efficiency gain is mechanistically attributed to the spine's role in augmenting stride length and acting as a mechanical low-pass filter to attenuate high-frequency torque fluctuations. SPARC provides an open-source platform for systematic studies of spine compliance in legged locomotion. Available at: github.com/YueWang996/sparc
Real-Time Vibration-Based Bearing Fault Diagnosis Under Time-Varying Speed Conditions
Detection of rolling-element bearing faults is crucial for implementing proactive maintenance strategies and for minimizing the economic and operational consequences of unexpected failures. However, many existing techniques are developed and tested under strictly controlled conditions, limiting their adaptability to the diverse and dynamic settings encountered in practical applications. This paper presents an efficient real-time convolutional neural network (CNN) for diagnosing multiple bearing faults under various noise levels and time-varying rotational speeds. Additionally, we propose a novel Fisher-based spectral separability analysis (SSA) method to elucidate the effectiveness of the designed CNN model. We conducted experiments on both healthy bearings and bearings afflicted with inner race, outer race, and roller ball faults. The experimental results show the superiority of our model over the current state-of-the-art approach in three folds: it achieves substantial accuracy gains of up to 15.8%, it is robust to noise with high performance across various signal-to-noise ratios, and it runs in real-time with processing durations five times less than acquisition. Additionally, by using the proposed SSA technique, we offer insights into the model's performance and underscore its effectiveness in tackling real-world challenges.
Line-Search Filter Differential Dynamic Programming for Optimal Control with Nonlinear Equality Constraints ICRA
We present FilterDDP, a differential dynamic programming algorithm for solving discrete-time, optimal control problems (OCPs) with nonlinear equality constraints. Unlike prior methods based on merit functions or the augmented Lagrangian class of algorithms, FilterDDP uses a step filter in conjunction with a line search to handle equality constraints. We identify two important design choices for the step filter criteria which lead to robust numerical performance: 1) we use the Lagrangian instead of the cost in the step acceptance criterion and, 2) in the backward pass, we perturb the value function Hessian. Both choices are rigorously justified, for 2) in particular by a formal proof of local quadratic convergence. In addition to providing a primal-dual interior point extension for handling OCPs with both equality and inequality constraints, we validate FilterDDP on three contact implicit trajectory optimisation problems which arise in robotics.
comment: Accepted for publication in the IEEE International Conference on Robotics and Automation (ICRA) 2026
Distributed Koopman Operator Learning from Sequential Observations
This paper presents a distributed Koopman operator learning framework for modeling unknown nonlinear dynamics using sequential observations from multiple agents. Each agent estimates a local Koopman approximation based on lifted data and collaborates over a communication graph to reach exponential consensus on a consistent distributed approximation. The approach supports distributed computation under asynchronous and resource-constrained sensing. Its performance is demonstrated through simulation results, validating convergence and predictive accuracy under sensing-constrained scenarios and limited communication.
Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems
Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics can mask this operational asymmetry. We introduce an operator-legible evaluation framework -- Under-Prediction Rate (UPR), tail Reserve$_{99.5}^{\%}$ requirements, and explicit inflation diagnostics (Bias$_{24h}$/OPR) -- to quantify one-sided reliability risk beyond MAPE. Using this framework, we evaluate state space models (Mamba variants) and strong baselines on a weather-aligned California Independent System Operator (CAISO) dataset spanning Nov 2023--Nov 2025 (84,498 hourly records across 5 regional transmission areas) under a rolling-origin walk-forward backtest. We develop and evaluate thermal-lag-aligned weather fusion strategies for these architectures. Our results demonstrate that standard accuracy metrics are insufficient proxies for operational safety: models with comparable MAPE can imply materially different tail reserve requirements (Reserve$_{99.5}^{\%}$). We show that explicit weather integration narrows error distributions, reducing the impact of temperature-driven demand spikes. Furthermore, while probabilistic calibration reduces large-error events, it can induce systematic schedule inflation. We introduce Bias/OPR-constrained objectives to enable auditable trade-offs between minimizing tail risk and preventing trivial over-forecasting.
comment: 30 pages, 7 figures, 9 tables
System Identification for Virtual Sensor-Based Model Predictive Control: Application to a 2-DoF Direct-Drive Robotic Arm
Nonlinear Model Predictive Control (NMPC) offers a powerful approach for controlling complex nonlinear systems, yet faces two key challenges. First, accurately modeling nonlinear dynamics remains difficult. Second, variables directly related to control objectives often cannot be directly measured during operation. Although high-cost sensors can acquire these variables during model development, their use in practical deployment is typically infeasible. To overcome these limitations, we propose a Predictive Virtual Sensor Identification (PVSID) framework that leverages temporary high-cost sensors during the modeling phase to create virtual sensors for NMPC implementation. We validate PVSID on a Two-Degree-of-Freedom (2-DoF) direct-drive robotic arm with complex joint interactions, capturing tip position via motion capture during modeling and utilize an Inertial Measurement Unit (IMU) in NMPC. Experimental results show our NMPC with identified virtual sensors achieves precise tip trajectory tracking without requiring the motion capture system during operation. PVSID offers a practical solution for implementing optimal control in nonlinear systems where the measurement of key variables is constrained by cost or operational limitations.
comment: 6 pages, 5 figures. Published in the proceedings of the 2025 IEEE 64th Conference on Decision and Control (CDC 2025)
Safe and Stable Neural Network Dynamical Systems for Robot Motion Planning
Learning safe and stable robot motions from demonstrations remains a challenge, especially in complex, nonlinear tasks involving dynamic, obstacle-rich environments. In this paper, we propose Safe and Stable Neural Network Dynamical Systems S$^2$-NNDS, a learning-from-demonstration framework that simultaneously learns expressive neural dynamical systems alongside neural Lyapunov stability and barrier safety certificates. Unlike traditional approaches with restrictive polynomial parameterizations, S$^2$-NNDS leverages neural networks to capture complex robot motions, providing probabilistic guarantees through split conformal prediction in learned certificates. Experimental results in various 2D and 3D datasets -- including LASA handwriting and demonstrations recorded kinesthetically from the Franka Emika Panda robot -- validate the effectiveness of S$^2$-NNDS in learning robust, safe, and stable motions from potentially unsafe demonstrations. The source code, supplementary material and experiment videos can be accessed via https://github.com/allemmbinn/S2NNDS
comment: Accepted for publication in IEEE Robotics and Automation Letters (RA-L)
Robotics
Towards a Novel Wearable Robotic Vest for Hemorrhage Suppression
This paper introduces a novel robotic system designed to manage severe bleeding in emergency scenarios, including unique environments like space stations. The robot features a shape-adjustable "ring mechanism", transitioning from a circular to an elliptical configuration to adjust wound coverage across various anatomical regions. We developed various arms for this ring mechanism with varying flexibilities to improve adaptability when applied to non-extremities of the body (abdomen, back, neck, etc.). To apply equal and constant pressure across the wound, we developed an inflatable ring and airbag balloon that are compatible with this shape-changing ring mechanism. A series of experiments focused on evaluating various ring arm configurations to characterize their bending stiffness. Subsequent experiments measured the force exerted by the airbag balloon system using a digital scale. Despite its promising performance, certain limitations related to coverage area are identified. The shape-changing effect of the device is limited to scenarios involving partially inflated or deflated airbag balloons, and cannot fully conform to complex anatomical regions. Finally, the device was tested on casualty simulation kits, where it successfully demonstrated its ability to control simulated bleeding.
Sem-NaVAE: Semantically-Guided Outdoor Mapless Navigation via Generative Trajectory Priors
This work presents a mapless global navigation approach for outdoor applications. It combines the exploratory capacity of conditional variational autoencoders (CVAEs) to generate trajectories and the semantic segmentation capabilities of a lightweight visual language model (VLM) to select the trajectory to execute. Open-vocabulary segmentation is used to score and select the generated trajectories based on natural language, and a state-of-the-art local planner executes velocity commands. One of the key features of the proposed approach is its ability to generate a large variability of trajectories and to select them and navigate in real-time. The approach was validated through real-world outdoor navigation experiments, achieving superior performance compared to state-of-the-art methods. A video showing an experimental run of the system can be found in https://www.youtube.com/watch?v=i3R5ey5O2yk.
comment: 8 pages, 5 figures
Instance-Guided Unsupervised Domain Adaptation for Robotic Semantic Segmentation ICRA 2026
Semantic segmentation networks, which are essential for robotic perception, often suffer from performance degradation when the visual distribution of the deployment environment differs from that of the source dataset on which they were trained. Unsupervised Domain Adaptation (UDA) addresses this challenge by adapting the network to the robot's target environment without external supervision, leveraging the large amounts of data a robot might naturally collect during long-term operation. In such settings, UDA methods can exploit multi-view consistency across the environment's map to fine-tune the model in an unsupervised fashion and mitigate domain shift. However, these approaches remain sensitive to cross-view instance-level inconsistencies. In this work, we propose a method that starts from a volumetric 3D map to generate multi-view consistent pseudo-labels. We then refine these labels using the zero-shot instance segmentation capabilities of a foundation model, enforcing instance-level coherence. The refined annotations serve as supervision for self-supervised fine-tuning, enabling the robot to adapt its perception system at deployment time. Experiments on real-world data demonstrate that our approach consistently improves performance over state-of-the-art UDA baselines based on multi-view consistency, without requiring any ground-truth labels in the target domain.
comment: Accepted for publication at ICRA 2026
TriphiBot: A Triphibious Robot Combining FOC-based Propulsion with Eccentric Design
Triphibious robots capable of multi-domain motion and cross-domain transitions are promising to handle complex tasks across diverse environments. However, existing designs primarily focus on dual-mode platforms, and some designs suffer from high mechanical complexity or low propulsion efficiency, which limits their application. In this paper, we propose a novel triphibious robot capable of aerial, terrestrial, and aquatic motion, by a minimalist design combining a quadcopter structure with two passive wheels, without extra actuators. To address inefficiency of ground-support motion (moving on land/seabed) for quadcopter based designs, we introduce an eccentric Center of Gravity (CoG) design that inherently aligns thrust with motion, enhancing efficiency without specialized mechanical transformation designs. Furthermore, to address the drastic differences in motion control caused by different fluids (air and water), we develop a unified propulsion system based on Field-Oriented Control (FOC). This method resolves torque matching issues and enables precise, rapid bidirectional thrust across different mediums. Grounded in the perspective of living condition and ground support, we analyse the robot's dynamics and propose a Hybrid Nonlinear Model Predictive Control (HNMPC)-PID control system to ensure stable multi-domain motion and seamless transitions. Experimental results validate the robot's multi-domain motion and cross-mode transition capability, along with the efficiency and adaptability of the proposed propulsion system.
OASIS-DC: Generalizable Depth Completion via Output-level Alignment of Sparse-Integrated Monocular Pseudo Depth ICRA 2026
Recent monocular foundation models excel at zero-shot depth estimation, yet their outputs are inherently relative rather than metric, limiting direct use in robotics and autonomous driving. We leverage the fact that relative depth preserves global layout and boundaries: by calibrating it with sparse range measurements, we transform it into a pseudo metric depth prior. Building on this prior, we design a refinement network that follows the prior where reliable and deviates where necessary, enabling accurate metric predictions from very few labeled samples. The resulting system is particularly effective when curated validation data are unavailable, sustaining stable scale and sharp edges across few-shot regimes. These findings suggest that coupling foundation priors with sparse anchors is a practical route to robust, deployment-ready depth completion under real-world label scarcity.
comment: Accepted to ICRA 2026
Reinforcement Learning for Active Perception in Autonomous Navigation ICRA
This paper addresses the challenge of active perception within autonomous navigation in complex, unknown environments. Revisiting the foundational principles of active perception, we introduce an end-to-end reinforcement learning framework in which a robot must not only reach a goal while avoiding obstacles, but also actively control its onboard camera to enhance situational awareness. The policy receives observations comprising the robot state, the current depth frame, and a particularly local geometry representation built from a short history of depth readings. To couple collision-free motion planning with information-driven active camera control, we augment the navigation reward with a voxel-based information metric. This enables an aerial robot to learn a robust policy that balances goal-directed motion with exploratory sensing. Extensive evaluation demonstrates that our strategy achieves safer flight compared to using fixed, non-actuated camera baselines while also inducing intrinsic exploratory behaviors.
comment: Accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2026
SkySim: A ROS2-based Simulation Environment for Natural Language Control of Drone Swarms using Large Language Models
Unmanned Aerial Vehicle (UAV) swarms offer versatile applications in logistics, agriculture, and surveillance, yet controlling them requires expert knowledge for safety and feasibility. Traditional static methods limit adaptability, while Large Language Models (LLMs) enable natural language control but generate unsafe trajectories due to lacking physical grounding. This paper introduces SkySim, a ROS2-based simulation framework in Gazebo that decouples LLM high-level planning from low-level safety enforcement. Using Gemini 3.5 Pro, SkySim translates user commands (e.g., "Form a circle") into spatial waypoints, informed by real-time drone states. An Artificial Potential Field (APF) safety filter applies minimal adjustments for collision avoidance, kinematic limits, and geo-fencing, ensuring feasible execution at 20 Hz. Experiments with swarms of 3, 10, and 30 Crazyflie drones validate spatial reasoning accuracy (100% across tested geometric primitives), real-time collision prevention, and scalability. SkySim empowers non-experts to iteratively refine behaviors, bridging AI cognition with robotic safety for dynamic environments. Future work targets hardware integration.
SPOT: Spatio-Temporal Obstacle-free Trajectory Planning for UAVs in an Unknown Dynamic Environment
We address the problem of reactive motion planning for quadrotors operating in unknown environments with dynamic obstacles. Our approach leverages a 4-dimensional spatio-temporal planner, integrated with vision-based Safe Flight Corridor (SFC) generation and trajectory optimization. Unlike prior methods that rely on map fusion, our framework is mapless, enabling collision avoidance directly from perception while reducing computational overhead. Dynamic obstacles are detected and tracked using a vision-based object segmentation and tracking pipeline, allowing robust classification of static versus dynamic elements in the scene. To further enhance robustness, we introduce a backup planning module that reactively avoids dynamic obstacles when no direct path to the goal is available, mitigating the risk of collisions during deadlock situations. We validate our method extensively in both simulation and real-world hardware experiments, and benchmark it against state-of-the-art approaches, showing significant advantages for reactive UAV navigation in dynamic, unknown environments.
Latent Reasoning VLA: Latent Thinking and Prediction for Vision-Language-Action Models
Vision-Language-Action (VLA) models benefit from chain-of-thought (CoT) reasoning, but existing approaches incur high inference overhead and rely on discrete reasoning representations that mismatch continuous perception and control. We propose Latent Reasoning VLA (\textbf{LaRA-VLA}), a unified VLA framework that internalizes multi-modal CoT reasoning into continuous latent representations for embodied action. LaRA-VLA performs unified reasoning and prediction in latent space, eliminating explicit CoT generation at inference time and enabling efficient, action-oriented control. To realize latent embodied reasoning, we introduce a curriculum-based training paradigm that progressively transitions from explicit textual and visual CoT supervision to latent reasoning, and finally adapts latent reasoning dynamics to condition action generation. We construct two structured CoT datasets and evaluate LaRA-VLA on both simulation benchmarks and long-horizon real-robot manipulation tasks. Experimental results show that LaRA-VLA consistently outperforms state-of-the-art VLA methods while reducing inference latency by up to 90\% compared to explicit CoT-based approaches, demonstrating latent reasoning as an effective and efficient paradigm for real-time embodied control. Project Page: \href{https://loveju1y.github.io/Latent-Reasoning-VLA/}{LaRA-VLA Website}.
Improving Robustness of Vision-Language-Action Models by Restoring Corrupted Visual Inputs
Vision-Language-Action (VLA) models have emerged as a dominant paradigm for generalist robotic manipulation, unifying perception and control within a single end-to-end architecture. However, despite their success in controlled environments, reliable real-world deployment is severely hindered by their fragility to visual disturbances. While existing literature extensively addresses physical occlusions caused by scene geometry, a critical mode remains largely unexplored: image corruptions. These sensor-level artifacts, ranging from electronic noise and dead pixels to lens contaminants, directly compromise the integrity of the visual signal prior to interpretation. In this work, we quantify this vulnerability, demonstrating that state-of-the-art VLAs such as $π_{0.5}$ and SmolVLA, suffer catastrophic performance degradation, dropping from 90\% success rates to as low as 2\%, under common signal artifacts. To mitigate this, we introduce the Corruption Restoration Transformer (CRT), a plug-and-play and model-agnostic vision transformer designed to immunize VLA models against sensor disturbances. Leveraging an adversarial training objective, CRT restores clean observations from corrupted inputs without requiring computationally expensive fine-tuning of the underlying model. Extensive experiments across the LIBERO and Meta-World benchmarks demonstrate that CRT effectively recovers lost performance, enabling VLAs to maintain near-baseline success rates, even under severe visual corruption.
PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning ICLR 2026
Among on-policy reinforcement learning algorithms, Proximal Policy Optimization (PPO) demonstrates is widely favored for its simplicity, numerical stability, and strong empirical performance. Standard PPO relies on surrogate objectives defined via importance ratios, which require evaluating policy likelihood that is typically straightforward when the policy is modeled as a Gaussian distribution. However, extending PPO to more expressive, high-capacity policy models such as continuous normalizing flows (CNFs), also known as flow-matching models, is challenging because likelihood evaluation along the full flow trajectory is computationally expensive and often numerically unstable. To resolve this issue, we propose PolicyFlow, a novel on-policy CNF-based reinforcement learning algorithm that integrates expressive CNF policies with PPO-style objectives without requiring likelihood evaluation along the full flow path. PolicyFlow approximates importance ratios using velocity field variations along a simple interpolation path, reducing computational overhead without compromising training stability. To further prevent mode collapse and further encourage diverse behaviors, we propose the Brownian Regularizer, an implicit policy entropy regularizer inspired by Brownian motion, which is conceptually elegant and computationally lightweight. Experiments on diverse tasks across various environments including MultiGoal, PointMaze, IsaacLab and MuJoCo Playground show that PolicyFlow achieves competitive or superior performance compared to PPO using Gaussian policies and flow-based baselines including FPO and DPPO. Notably, results on MultiGoal highlight PolicyFlow's ability to capture richer multimodal action distributions.
comment: Submitted to ICLR 2026
UniForce: A Unified Latent Force Model for Robot Manipulation with Diverse Tactile Sensors
Force sensing is essential for dexterous robot manipulation, but scaling force-aware policy learning is hindered by the heterogeneity of tactile sensors. Differences in sensing principles (e.g., optical vs. magnetic), form factors, and materials typically require sensor-specific data collection, calibration, and model training, thereby limiting generalisability. We propose UniForce, a novel unified tactile representation learning framework that learns a shared latent force space across diverse tactile sensors. UniForce reduces cross-sensor domain shift by jointly modeling inverse dynamics (image-to-force) and forward dynamics (force-to-image), constrained by force equilibrium and image reconstruction losses to produce force-grounded representations. To avoid reliance on expensive external force/torque (F/T) sensors, we exploit static equilibrium and collect force-paired data via direct sensor--object--sensor interactions, enabling cross-sensor alignment with contact force. The resulting universal tactile encoder can be plugged into downstream force-aware robot manipulation tasks with zero-shot transfer, without retraining or finetuning. Extensive experiments on heterogeneous tactile sensors including GelSight, TacTip, and uSkin, demonstrate consistent improvements in force estimation over prior methods, and enable effective cross-sensor coordination in Vision-Tactile-Language-Action (VTLA) models for a robotic wiping task. Code and datasets will be released.
KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching via KAN & RWKV ICRA2026
Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient, since recursively denoising from noise to policy requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a one-step vector field, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov-Arnold Networks (KAN) from vision to build a lightweight and highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV first performs efficient time/channel mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to perform feature-wise nonlinear calibration of the action mapping on RWKV outputs. Moreover, we introduce an Action Consistency Regularization (ACR), a lightweight auxiliary loss that enforces alignment between predicted action trajectories and expert demonstrations via Euler extrapolation, providing additional supervision to stabilize training and improve policy precision. Without resorting to large UNets, our design reduces parameters by 86.8\%, maintains fast runtime, and achieves state-of-the-art success rates on Adroit, Meta-World, and DexArt benchmarks. Our project page can be viewed in \href{https://zhihaochen-2003.github.io/KAN-We-Flow.github.io/}{\textcolor{red}{link}}
comment: Accepted By ICRA2026
StreamVLA: Breaking the Reason-Act Cycle via Completion-State Gating
Long-horizon robotic manipulation requires bridging the gap between high-level planning (System 2) and low-level control (System 1). Current Vision-Language-Action (VLA) models often entangle these processes, performing redundant multimodal reasoning at every timestep, which leads to high latency and goal instability. To address this, we present StreamVLA, a dual-system architecture that unifies textual task decomposition, visual goal imagination, and continuous action generation within a single parameter-efficient backbone. We introduce a "Lock-and-Gated" mechanism to intelligently modulate computation: only when a sub-task transition is detected, the model triggers slow thinking to generate a textual instruction and imagines the specific visual completion state, rather than generic future frames. Crucially, this completion state serves as a time-invariant goal anchor, making the policy robust to execution speed variations. During steady execution, these high-level intents are locked to condition a Flow Matching action head, allowing the model to bypass expensive autoregressive decoding for 72% of timesteps. This hierarchical abstraction ensures sub-goal focus while significantly reducing inference latency. Extensive evaluations demonstrate that StreamVLA achieves state-of-the-art performance, with a 98.5% success rate on the LIBERO benchmark and robust recovery in real-world interference scenarios, achieving a 48% reduction in latency compared to full-reasoning baselines.
Failure-Aware Bimanual Teleoperation via Conservative Value Guided Assistance
Teleoperation of high-precision manipulation is con-strained by tight success tolerances and complex contact dy-namics, which make impending failures difficult for human operators to anticipate under partial observability. This paper proposes a value-guided, failure-aware framework for bimanual teleoperation that provides compliant haptic assistance while pre-serving continuous human authority. The framework is trained entirely from heterogeneous offline teleoperation data containing both successful and failed executions. Task feasibility is mod-eled as a conservative success score learned via Conservative Value Learning, yielding a risk-sensitive estimate that remains reliable under distribution shift. During online operation, the learned success score regulates the level of assistance, while a learned actor provides a corrective motion direction. Both are integrated through a joint-space impedance interface on the master side, yielding continuous guidance that steers the operator away from failure-prone actions without overriding intent. Experimental results on contact-rich manipulation tasks demonstrate improved task success rates and reduced operator workload compared to conventional teleoperation and shared-autonomy baselines, indicating that conservative value learning provides an effective mechanism for embedding failure awareness into bilateral teleoperation. Experimental videos are available at https://www.youtube.com/watch?v=XDTsvzEkDRE
Estimating Force Interactions of Deformable Linear Objects from their Shapes
This work introduces an analytical approach for detecting and estimating external forces acting on deformable linear objects (DLOs) using only their observed shapes. In many robot-wire interaction tasks, contact occurs not at the end-effector but at other points along the robot's body. Such scenarios arise when robots manipulate wires indirectly (e.g., by nudging) or when wires act as passive obstacles in the environment. Accurately identifying these interactions is crucial for safe and efficient trajectory planning, helping to prevent wire damage, avoid restricted robot motions, and mitigate potential hazards. Existing approaches often rely on expensive external force-torque sensor or that contacts occur at the end-effector for accurate force estimation. Using wire shape information acquired from a depth camera and under the assumption that the wire is in or near its static equilibrium, our method estimates both the location and magnitude of external forces without additional prior knowledge. This is achieved by exploiting derived consistency conditions and solving a system of linear equations based on force-torque balance along the wire. The approach was validated through simulation, where it achieved high accuracy, and through real-world experiments, where accurate estimation was demonstrated in selected interaction scenarios.
comment: 7 pages, 4 figures
A Systematic Study of Data Modalities and Strategies for Co-training Large Behavior Models for Robot Manipulation
Large behavior models have shown strong dexterous manipulation capabilities by extending imitation learning to large-scale training on multi-task robot data, yet their generalization remains limited by the insufficient robot data coverage. To expand this coverage without costly additional data collection, recent work relies on co-training: jointly learning from target robot data and heterogeneous data modalities. However, how different co-training data modalities and strategies affect policy performance remains poorly understood. We present a large-scale empirical study examining five co-training data modalities: standard vision-language data, dense language annotations for robot trajectories, cross-embodiment robot data, human videos, and discrete robot action tokens across single- and multi-phase training strategies. Our study leverages 4,000 hours of robot and human manipulation data and 50M vision-language samples to train vision-language-action policies. We evaluate 89 policies over 58,000 simulation rollouts and 2,835 real-world rollouts. Our results show that co-training with forms of vision-language and cross-embodiment robot data substantially improves generalization to distribution shifts, unseen tasks, and language following, while discrete action token variants yield no significant benefits. Combining effective modalities produces cumulative gains and enables rapid adaptation to unseen long-horizon dexterous tasks via fine-tuning. Training exclusively on robot data degrades the visiolinguistic understanding of the vision-language model backbone, while co-training with effective modalities restores these capabilities. Explicitly conditioning action generation on chain-of-thought traces learned from co-training data does not improve performance in our simulation benchmark. Together, these results provide practical guidance for building scalable generalist robot policies.
LLM-Based Behavior Tree Generation for Construction Machinery
Earthwork operations are facing an increasing demand, while workforce aging and skill loss create a pressing need for automation. ROS2-TMS for Construction, a Cyber-Physical System framework designed to coordinate construction machinery, has been proposed for autonomous operation; however, its reliance on manually designed Behavior Trees (BTs) limits scalability, particularly in scenarios involving heterogeneous machine cooperation. Recent advances in large language models (LLMs) offer new opportunities for task planning and BT generation. However, most existing approaches remain confined to simulations or simple manipulators, with relatively few applications demonstrated in real-world contexts, such as complex construction sites involving multiple machines. This paper proposes an LLM-based workflow for BT generation, introducing synchronization flags to enable safe and cooperative operation. The workflow consists of two steps: high-level planning, where the LLM generates synchronization flags, and BT generation using structured templates. Safety is ensured by planning with parameters stored in the system database. The proposed method is validated in simulation and further demonstrated through real-world experiments, highlighting its potential to advance automation in civil engineering.
comment: 7 pages, 7 figures
Learning Adaptive Cross-Embodiment Visuomotor Policy with Contrastive Prompt Orchestration
Learning adaptive visuomotor policies for embodied agents remains a formidable challenge, particularly when facing cross-embodiment variations such as diverse sensor configurations and dynamic properties. Conventional learning approaches often struggle to separate task-relevant features from domain-specific variations (e.g., lighting, field-of-view, and rotation), leading to poor sample efficiency and catastrophic failure in unseen environments. To bridge this gap, we propose ContrAstive Prompt Orchestration (CAPO), a novel approach for learning visuomotor policies that integrates contrastive prompt learning and adaptive prompt orchestration. For prompt learning, we devise a hybrid contrastive learning strategy that integrates visual, temporal action, and text objectives to establish a pool of learnable prompts, where each prompt induces a visual representation encapsulating fine-grained domain factors. Based on these learned prompts, we introduce an adaptive prompt orchestration mechanism that dynamically aggregates these prompts conditioned on current observations. This enables the agent to adaptively construct optimal state representations by identifying dominant domain factors instantaneously. Consequently, the policy optimization is effectively shielded from irrelevant interference, preventing the common issue of overfitting to source domains. Extensive experiments demonstrate that CAPO significantly outperforms state-of-the-art baselines in sample efficiency and asymptotic performance. Crucially, it exhibits superior zero-shot adaptation across unseen target domains characterized by drastic environmental (e.g., illumination) and physical shifts (e.g., field-of-view and rotation), validating its effectiveness as a viable solution for cross-embodiment visuomotor policy adaptation.
Offline Discovery of Interpretable Skills from Multi-Task Trajectories
Hierarchical Imitation Learning is a powerful paradigm for acquiring complex robot behaviors from demonstrations. A central challenge, however, lies in discovering reusable skills from long-horizon, multi-task offline data, especially when the data lacks explicit rewards or subtask annotations. In this work, we introduce LOKI, a three-stage end-to-end learning framework designed for offline skill discovery and hierarchical imitation. The framework commences with a two-stage, weakly supervised skill discovery process: Stage one performs coarse, task-aware macro-segmentation by employing an alignment-enforced Vector Quantized VAE guided by weak task labels. Stage two then refines these segments at a micro-level using a self-supervised sequential model, followed by an iterative clustering process to consolidate skill boundaries. The third stage then leverages these precise boundaries to construct a hierarchical policy within an option-based framework-complete with a learned termination condition beta for explicit skill switching. LOKI achieves high success rates on the challenging D4RL Kitchen benchmark and outperforms standard HIL baselines. Furthermore, we demonstrate that the discovered skills are semantically meaningful, aligning with human intuition, and exhibit compositionality by successfully sequencing them to solve a novel, unseen task.
HERMES: A Holistic End-to-End Risk-Aware Multimodal Embodied System with Vision-Language Models for Long-Tail Autonomous Driving
End-to-end autonomous driving models increasingly benefit from large vision--language models for semantic understanding, yet ensuring safe and accurate operation under long-tail conditions remains challenging. These challenges are particularly prominent in long-tail mixed-traffic scenarios, where autonomous vehicles must interact with heterogeneous road users, including human-driven vehicles and vulnerable road users, under complex and uncertain conditions. This paper proposes HERMES, a holistic risk-aware end-to-end multimodal driving framework designed to inject explicit long-tail risk cues into trajectory planning. HERMES employs a foundation-model-assisted annotation pipeline to produce structured Long-Tail Scene Context and Long-Tail Planning Context, capturing hazard-centric cues together with maneuver intent and safety preference, and uses these signals to guide end-to-end planning. HERMES further introduces a Tri-Modal Driving Module that fuses multi-view perception, historical motion cues, and semantic guidance, ensuring risk-aware accurate trajectory planning under long-tail scenarios. Experiments on the real-world long-tail dataset demonstrate that HERMES consistently outperforms representative end-to-end and VLM-driven baselines under long-tail mixed-traffic scenarios. Ablation studies verify the complementary contributions of key components.
Geometry-Aware Sampling-Based Motion Planning on Riemannian Manifolds
In many robot motion planning problems, task objectives and physical constraints induce non-Euclidean geometry on the configuration space, yet many planners operate using Euclidean distances that ignore this structure. We address the problem of planning collision-free motions that minimize length under configuration-dependent Riemannian metrics, corresponding to geodesics on the configuration manifold. Conventional numerical methods for computing such paths do not scale well to high-dimensional systems, while sampling-based planners trade scalability for geometric fidelity. To bridge this gap, we propose a sampling-based motion planning framework that operates directly on Riemannian manifolds. We introduce a computationally efficient midpoint-based approximation of the Riemannian geodesic distance and prove that it matches the true Riemannian distance with third-order accuracy. Building on this approximation, we design a local planner that traces the manifold using first-order retractions guided by Riemannian natural gradients. Experiments on a two-link planar arm and a 7-DoF Franka manipulator under a kinetic-energy metric, as well as on rigid-body planning in $\mathrm{SE}(2)$ with non-holonomic motion constraints, demonstrate that our approach consistently produces lower-cost trajectories than Euclidean-based planners and classical numerical geodesic-solver baselines.
comment: Submitted to WAFR 2026 (17th World Symposium on the Algorithmic Foundations of Robotics (WAFR))
Navigating Simply, Aligning Deeply: Winning Solutions for Mouse vs. AI 2025 NeurIPS 2025
Visual robustness and neural alignment remain critical challenges in developing artificial agents that can match biological vision systems. We present the winning approaches from Team HCMUS_TheFangs for both tracks of the NeurIPS 2025 Mouse vs. AI: Robust Visual Foraging Competition. For Track 1 (Visual Robustness), we demonstrate that architectural simplicity combined with targeted components yields superior generalization, achieving 95.4% final score with a lightweight two-layer CNN enhanced by Gated Linear Units and observation normalization. For Track 2 (Neural Alignment), we develop a deep ResNet-like architecture with 16 convolutional layers and GLU-based gating that achieves top-1 neural prediction performance with 17.8 million parameters. Our systematic analysis of ten model checkpoints trained between 60K to 1.14M steps reveals that training duration exhibits a non-monotonic relationship with performance, with optimal results achieved around 200K steps. Through comprehensive ablation studies and failure case analysis, we provide insights into why simpler architectures excel at visual robustness while deeper models with increased capacity achieve better neural alignment. Our results challenge conventional assumptions about model complexity in visuomotor learning and offer practical guidance for developing robust, biologically-inspired visual agents.
comment: 15 pages, 8 tables. Technical Report for winning solutions (Track 1 & Track 2) at the NeurIPS 2025 Mouse vs. AI Challenge
Meanshift Shape Formation Control Using Discrete Mass Distribution
The density-distribution method has recently become a promising paradigm owing to its adaptability to variations in swarm size. However, existing studies face practical challenges in achieving complex shape representation and decentralized implementation. This motivates us to develop a fully decentralized, distribution-based control strategy with the dual capability of forming complex shapes and adapting to swarm-size variations. Specifically, we first propose a discrete mass-distribution function defined over a set of sample points to model swarm formation. In contrast to the continuous density-distribution method, our model eliminates the requirement for defining continuous density functions-a task that is difficult for complex shapes. Second, we design a decentralized meanshift control law to coordinate the swarm's global distribution to fit the sample-point distribution by feeding back mass estimates. The mass estimates for all sample points are achieved by the robots in a decentralized manner via the designed mass estimator. It is shown that the mass estimates of the sample points can asymptotically converge to the true global values. To validate the proposed strategy, we conduct comprehensive simulations and real-world experiments to evaluate the efficiency of complex shape formation and adaptability to swarm-size variations.
Flexible Multitask Learning with Factorized Diffusion Policy
Multitask learning poses significant challenges due to the highly multimodal and diverse nature of robot action distributions. However, effectively fitting policies to these complex task distributions is often difficult, and existing monolithic models often underfit the action distribution and lack the flexibility required for efficient adaptation. We introduce a novel modular diffusion policy framework that factorizes complex action distributions into a composition of specialized diffusion models, each capturing a distinct sub-mode of the behavior space for a more effective overall policy. In addition, this modular structure enables flexible policy adaptation to new tasks by adding or fine-tuning components, which inherently mitigates catastrophic forgetting. Empirically, across both simulation and real-world robotic manipulation settings, we illustrate how our method consistently outperforms strong modular and monolithic baselines.
Robust Trajectory Tracking of Autonomous Surface Vehicle via Lie Algebraic Online MPC
Autonomous surface vehicles (ASVs) are influenced by environmental disturbances such as wind and waves, making accurate trajectory tracking a persistent challenge in dynamic marine conditions. In this paper, we propose an efficient controller for trajectory tracking of marine vehicles under unknown disturbances by combining a convex error-state MPC on the Lie group augmented by an online learning module to compensate for these disturbances in real time. This design enables adaptive and robust tracking control while maintaining computational efficiency. Extensive evaluations in the Virtual RobotX (VRX) simulator, and real-world field experiments demonstrate that our method achieves superior tracking accuracy under various disturbance scenarios compared with existing approaches.
Graph-Based Floor Separation Using Node Embeddings and Clustering of WiFi Trajectories
Vertical localization, particularly floor separation, remains a major challenge in indoor positioning systems operating in GPS-denied multistory environments. This paper proposes a fully data-driven, graph-based framework for blind floor separation using only Wi-Fi fingerprint trajectories, without requiring prior building information or knowledge of the number of floors. In the proposed method, Wi-Fi fingerprints are represented as nodes in a trajectory graph, where edges capture both signal similarity and sequential movement context. Structural node embeddings are learned via Node2Vec, and floor-level partitions are obtained using K-Means clustering with automatic cluster number estimation. The framework is evaluated on multiple publicly available datasets, including a newly released Huawei University Challenge 2021 dataset and a restructured version of the UJIIndoorLoc benchmark. Experimental results demonstrate that the proposed approach effectively captures the intrinsic vertical structure of multistory buildings using only received signal strength data. By eliminating dependence on building-specific metadata, the proposed method provides a scalable and practical solution for vertical localization in indoor environments.
comment: 10 pages,4 figures, 3 tables
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions CVPR 2025
Achieving realistic simulations of humans interacting with a wide range of objects has long been a fundamental goal. Extending physics-based motion imitation to complex human-object interactions (HOIs) is challenging due to intricate human-object coupling, variability in object geometries, and artifacts in motion capture data, such as inaccurate contacts and limited hand detail. We introduce InterMimic, a framework that enables a single policy to robustly learn from hours of imperfect MoCap data covering diverse full-body interactions with dynamic and varied objects. Our key insight is to employ a curriculum strategy -- perfect first, then scale up. We first train subject-specific teacher policies to mimic, retarget, and refine motion capture data. Next, we distill these teachers into a student policy, with the teachers acting as online experts providing direct supervision, as well as high-quality references. Notably, we incorporate RL fine-tuning on the student policy to surpass mere demonstration replication and achieve higher-quality solutions. Our experiments demonstrate that InterMimic produces realistic and diverse interactions across multiple HOI datasets. The learned policy generalizes in a zero-shot manner and seamlessly integrates with kinematic generators, elevating the framework from mere imitation to generative modeling of complex human-object interactions.
comment: CVPR 2025. Project Page: https://sirui-xu.github.io/InterMimic/
Virtual Community: An Open World for Humans, Robots, and Society
The rapid progress in AI and Robotics may lead to a profound societal transformation, as humans and robots begin to coexist within shared communities, introducing both opportunities and challenges. To explore this future, we present Virtual Community-an open-world platform for humans, robots, and society-built on a universal physics engine and grounded in real-world 3D scenes. With Virtual Community, we aim to enable the study of embodied social intelligence at scale. To support these, Virtual Community features: 1) An open-source multi-agent physics simulator that supports robots, humans, and their interactions within a society; 2) A large-scale, real-world aligned community generation pipeline, including vast outdoor space, diverse indoor scenes, and a community of grounded agents with rich characters and appearances. Leveraging Virtual Community, we propose two novel challenges. The Community Planning Challenge evaluates multi-agent reasoning and planning ability in open-world settings, such as cooperating to help agents with daily activities and efficiently connecting other agents. The Community Robot Challenge requires multiple heterogeneous robots to collaborate in solving complex open-world tasks. We evaluate various baselines on these tasks and demonstrate the challenges in both high-level open-world task planning and low-level cooperation controls. We hope that Virtual Community will unlock further study of human-robot coexistence within open-world environments.
comment: website https://virtual-community-ai.github.io/
Dichotomous Diffusion Policy Optimization
Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. However, effectively training large diffusion policies using reinforcement learning (RL) remains challenging. Existing methods either suffer from unstable training due to directly maximizing value objectives, or face computational issues due to relying on crude Gaussian likelihood approximation, which requires a large amount of sufficiently small denoising steps. In this work, we propose DIPOLE (Dichotomous diffusion Policy improvement), a novel RL algorithm designed for stable and controllable diffusion policy optimization. We begin by revisiting the KL-regularized objective in RL, which offers a desirable weighted regression objective for diffusion policy extraction, but often struggles to balance greediness and stability. We then formulate a greedified policy regularization scheme, which naturally enables decomposing the optimal policy into a pair of stably learned dichotomous policies: one aims at reward maximization, and the other focuses on reward minimization. Under such a design, optimized actions can be generated by linearly combining the scores of dichotomous policies during inference, thereby enabling flexible control over the level of greediness.Evaluations in offline and offline-to-online RL settings on ExORL and OGBench demonstrate the effectiveness of our approach. We also use DIPOLE to train a large vision-language-action (VLA) model for end-to-end autonomous driving (AD) and evaluate it on the large-scale real-world AD benchmark NAVSIM, highlighting its potential for complex real-world applications.
An Extended Generalized Prandtl-Ishlinskii Hysteresis Model for I2RIS Robot
Retinal surgery requires extreme precision due to constrained anatomical spaces in the human retina. To assist surgeons achieve this level of accuracy, the Improved Integrated Robotic Intraocular Snake (I2RIS) with dexterous capability has been developed. However, such flexible tendon-driven robots often suffer from hysteresis problems, which significantly challenges precise control and positioning. In particular, we observed multi-stage hysteresis phenomena in the small-scale I2RIS. In this paper, we propose an Extended Generalized Prandtl-Ishlinskii (EGPI) model to increase the fitting accuracy of the hysteresis. The model incorporates a novel switching mechanism that enables it to describe multi-stage hysteresis in the regions of monotonic input. Experimental validation on I2RIS data demonstrates that the EGPI model outperforms the conventional Generalized Prandtl-Ishlinskii (GPI) model in terms of RMSE, NRMSE, and MAE across multiple motor input directions. The EGPI model in our study highlights the potential in modeling multi-stage hysteresis in minimally invasive flexible robots.
comment: Published in IFAC-PapersOnLine, Vol. 59, No. 30, pp. 323-328, 2025. 5th Conference on Modeling, Estimation and Control (MECC 2025)
Real-World Evaluation of two Cooperative Intersection Management Approaches
Cooperative maneuver planning promises to significantly improve traffic efficiency at unsignalized intersections by leveraging connected automated vehicles. Previous works on this topic have been mostly developed for completely automated traffic in a simple simulated environment. In contrast, our previously introduced planning approaches are specifically designed to handle real-world mixed traffic. The two methods are based on multi-scenario prediction and graph-based reinforcement learning, respectively. This is the first study to perform evaluations in a novel mixed traffic simulation framework as well as real-world drives with prototype connected automated vehicles in public traffic. The simulation features the same connected automated driving software stack as deployed on one of the automated vehicles. Our quantitative evaluations show that cooperative maneuver planning achieves a substantial reduction in crossing times and the number of stops. In a realistic environment with few automated vehicles, there are noticeable efficiency gains with only slightly increasing criticality metrics.
comment: M. Klimke and M. B. Mertens are both first authors with equal contribution. 11 pages, 9 figures, 3 tables, accepted for publication in the IEEE Intelligent Transportation Systems Magazine
Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities
Tactile sensing offers rich and complementary information to vision and language, enabling robots to perceive fine-grained object properties. However, existing tactile sensors lack standardization, leading to redundant features that hinder cross-sensor generalization. Moreover, existing methods fail to fully integrate the intermediate communication among tactile, language, and vision modalities. To address this, we propose TLV-CoRe, a CLIP-based Tactile-Language-Vision Collaborative Representation learning method. TLV-CoRe introduces a Sensor-Aware Modulator to unify tactile features across different sensors and employs tactile-irrelevant decoupled learning to disentangle irrelevant tactile features. Additionally, a Unified Bridging Adapter is introduced to enhance tri-modal interaction within the shared representation space. To fairly evaluate the effectiveness of tactile models, we further propose the RSS evaluation framework, focusing on Robustness, Synergy, and Stability across different methods. Experimental results demonstrate that TLV-CoRe significantly improves sensor-agnostic representation learning and cross-modal alignment, offering a new direction for multimodal tactile representation.
One-Shot Real-World Demonstration Synthesis for Scalable Bimanual Manipulation
Learning dexterous bimanual manipulation policies critically depends on large-scale, high-quality demonstrations, yet current paradigms face inherent trade-offs: teleoperation provides physically grounded data but is prohibitively labor-intensive, while simulation-based synthesis scales efficiently but suffers from sim-to-real gaps. We present BiDemoSyn, a framework that synthesizes contact-rich, physically feasible bimanual demonstrations from a single real-world example. The key idea is to decompose tasks into invariant coordination blocks and variable, object-dependent adjustments, then adapt them through vision-guided alignment and lightweight trajectory optimization. This enables the generation of thousands of diverse and feasible demonstrations within several hour, without repeated teleoperation or reliance on imperfect simulation. Across six dual-arm tasks, we show that policies trained on BiDemoSyn data generalize robustly to novel object poses and shapes, significantly outperforming recent strong baselines. Beyond the one-shot setting, BiDemoSyn naturally extends to few-shot-based synthesis, improving object-level diversity and out-of-distribution generalization while maintaining strong data efficiency. Moreover, policies trained on BiDemoSyn data exhibit zero-shot cross-embodiment transfer to new robotic platforms, enabled by object-centric observations and a simplified 6-DoF end-effector action representation that decouples policies from embodiment-specific dynamics. By bridging the gap between efficiency and real-world fidelity, BiDemoSyn provides a scalable path toward practical imitation learning for complex bimanual manipulation without compromising physical grounding.
comment: Under review. The project link is https://hnuzhy.github.io/projects/BiDemoSyn/
From Edge to Edge: A Flow-Inspired Scheduling Planner for Multi-Robot Systems
Trajectory planning is crucial in multi-robot systems, particularly in environments with numerous obstacles. While extensive research has been conducted in this field, the challenge of coordinating multiple robots to flow collectively from one side of the map to the other-such as in crossing missions through obstacle-rich spaces-has received limited attention. This paper focuses on this directional traversal scenario by introducing a real-time scheduling scheme that enables multi-robot systems to move from edge to edge, emulating the smooth and efficient flow of water. Inspired by network flow optimization, our scheme decomposes the environment into a flow-based network structure, enabling the efficient allocation of robots to paths based on real-time congestion levels. The proposed scheduling planner operates on top of existing collision avoidance algorithms, aiming to minimize overall traversal time by balancing detours and waiting times. Simulation results demonstrate the effectiveness of the proposed scheme in achieving fast and coordinated traversal. Furthermore, real-world flight tests with ten drones validate its practical feasibility. This work contributes a flow-inspired, real-time scheduling planner tailored for directional multi-robot traversal in complex, obstacle-rich environments. Code: https://github.com/chengji253/FlowPlanner
NeMo-map: Neural Implicit Flow Fields for Spatio-Temporal Motion Mapping ICLR 2026
Safe and efficient robot operation in complex human environments can benefit from good models of site-specific motion patterns. Maps of Dynamics (MoDs) provide such models by encoding statistical motion patterns in a map, but existing representations use discrete spatial sampling and typically require costly offline construction. We propose a continuous spatio-temporal MoD representation based on implicit neural functions that directly map coordinates to the parameters of a Semi-Wrapped Gaussian Mixture Model. This removes the need for discretization and imputation for unevenly sampled regions, enabling smooth generalization across both space and time. Evaluated on two public datasets with real-world people tracking data, our method achieves better accuracy of motion representation and smoother velocity distributions in sparse regions while still being computationally efficient, compared to available baselines. The proposed approach demonstrates a powerful and efficient way of modeling complex human motion patterns and high performance in the trajectory prediction downstream task. Project code is available at https://github.com/test-bai-cpu/nemo-map
comment: Published as a conference paper at ICLR 2026
Information Filtering via Variational Regularization for Robot Manipulation
Diffusion-based visuomotor policies built on 3D visual representations have achieved strong performance in learning complex robotic skills. However, most existing methods employ an oversized denoising decoder. While increasing model capacity can improve denoising, empirical evidence suggests that it also introduces redundancy and noise in intermediate feature blocks. Crucially, we find that randomly masking backbone features at inference time (without changing training) can improve performance, confirming the presence of task-irrelevant noise in intermediate features. To this end, we propose Variational Regularization (VR), a lightweight module that imposes a timestep-conditioned Gaussian over backbone features and applies a KL-divergence regularizer, forming an adaptive information bottleneck. Extensive experiments on three simulation benchmarks (RoboTwin2.0, Adroit, and MetaWorld) show that, compared to the baseline DP3, our approach improves the success rate by 6.1% on RoboTwin2.0 and by 4.1% on Adroit and MetaWorld, achieving new state-of-the-art results. Real-world experiments further demonstrate that our method performs well in practical deployments. Code will released.
Human-Inspired Neuro-Symbolic World Modeling and Logic Reasoning for Interpretable Safe UAV Landing Site Assessment
Reliable assessment of safe landing sites in unstructured environments is essential for deploying Unmanned Aerial Vehicles (UAVs) in real-world applications such as delivery, inspection, and surveillance. Existing learning-based approaches often degrade under covariate shift and offer limited transparency, making their decisions difficult to interpret and validate on resource-constrained platforms. We present NeuroSymLand, a neuro-symbolic framework for marker-free UAV landing site safety assessment that explicitly separates perception-driven world modeling from logic-based safety reasoning. A lightweight segmentation model incrementally constructs a probabilistic semantic scene graph encoding objects, attributes, and spatial relations. Symbolic safety rules, synthesized offline via large language models with human-in-the-loop refinement, are executed directly over this world model at runtime to perform white-box reasoning, producing ranked landing candidates with human-readable explanations of the underlying safety constraints. Across 72 simulated and hardware-in-the-loop landing scenarios, NeuroSymLand achieves 61 successful assessments, outperforming four competitive baselines, which achieve between 37 and 57 successes. Qualitative analysis highlights its superior interpretability and transparent reasoning, while deployment incurs negligible edge overhead. Our results suggest that combining explicit world modeling with symbolic reasoning can support accurate, interpretable, and edge-deployable safety assessment in mobile systems, as demonstrated through UAV landing site assessment.
Toward Learning POMDPs Beyond Full-Rank Actions and State Observability
We are interested in enabling autonomous agents to learn and reason about systems with hidden states, such as furniture with hidden locking mechanisms. We cast this problem as learning the parameters of a discrete Partially Observable Markov Decision Process (POMDP). The agent begins with knowledge of the POMDP's actions and observation spaces, but not its state space, transitions, or observation models. These properties must be constructed from action-observation sequences. Spectral approaches to learning models of partially observable domains, such as learning Predictive State Representations (PSRs), are known to directly estimate the number of hidden states. These methods cannot, however, yield direct estimates of transition and observation likelihoods, which are important for many downstream reasoning tasks. Other approaches leverage tensor decompositions to estimate transition and observation likelihoods but often assume full state observability and full-rank transition matrices for all actions. To relax these assumptions, we study how PSRs learn transition and observation matrices up to a similarity transform, which may be estimated via tensor methods. Our method learns observation matrices and transition matrices up to a partition of states, where the states in a single partition have the same observation distributions corresponding to actions whose transition matrices are full-rank. Our experiments suggest that these partition-level transition models learned by our method, with a sufficient amount of data, meets the performance of PSRs as models to be used by standard sampling-based POMDP solvers. Furthermore, the explicit observation and transition likelihoods can be leveraged to specify planner behavior after the model has been learned.
Versatile Behavior Diffusion for Generalized Traffic Agent Simulation
Existing traffic simulation models often fall short in capturing the intricacies of real-world scenarios, particularly the interactive behaviors among multiple traffic participants, thereby limiting their utility in the evaluation and validation of autonomous driving systems. We introduce Versatile Behavior Diffusion (VBD), a novel traffic scenario generation framework based on diffusion generative models that synthesizes scene-consistent, realistic, and controllable multi-agent interactions. VBD achieves strong performance in closed-loop traffic simulation, generating scene-consistent agent behaviors that reflect complex agent interactions. A key capability of VBD is inference-time scenario editing through multi-step refinement, guided by behavior priors and model-based optimization objectives, enabling flexible and controllable behavior generation. Despite being trained on real-world traffic datasets with only normal conditions, we introduce conflict-prior and game-theoretic guidance approaches. These approaches enable the generation of interactive, customizable, or long-tail safety-critical scenarios, which are essential for comprehensive testing and validation of autonomous driving systems. Extensive experiments validate the effectiveness and versatility of VBD and highlight its promise as a foundational tool for advancing traffic simulation and autonomous vehicle development. Project website: https://sites.google.com/view/versatile-behavior-diffusion
DMV-AVP: Distributed Multi-Vehicle Autonomous Valet Parking Using Autoware
This paper presents DMV-AVP, a distributed simulation of Multi-Vehicle Autonomous Valet Parking (AVP). The system was implemented as an application of the Distributed Multi-Autonomous Vehicle Architecture (DMAVA) for synchronized multi-host execution. Most existing simulation approaches rely on centralized or non-distributed designs that constrain scalability and limit fully autonomous control. This work introduces two modules built on top of DMAVA: 1) the Multi-Vehicle AVP Coordination Framework, composed of AVP Managers and a per-vehicle AVP Node, is responsible for global parking state tracking, vehicle queuing, parking spot reservation, lifecycle coordination, and conflict resolution across multiple vehicles, and 2) the Unity-Integrated YOLOv5 Parking Spot Detection Module, that provides real-time, vision-based perception within AWSIM Labs. Both modules integrate seamlessly with DMAVA and extend it specifically for multi-vehicle AVP operation, supported by a Zenoh communication layer that ensures high data accuracy and controllability across hosts. Experiments conducted on two- and three-host configurations demonstrate consistent coordination, conflict-free parking behavior, and scalable performance across distributed Autoware instances. The results confirm that the proposed DMV-AVP supports cooperative AVP simulation and establishes a foundation for future real-world and hardware-in-the-loop validation. Demo videos and source code are available at: https://github.com/zubxxr/multi-vehicle-avp
comment: 6 pages, 3 figures, 3 tables. Submitted to IEEE IV 2026, Demo videos and source code available
DMAVA: Distributed Multi-Autonomous Vehicle Architecture Using Autoware
Simulating and validating coordination among multiple autonomous vehicles remains challenging, as many existing simulation architectures are limited to single-vehicle operation or rely on centralized control. This paper presents the Distributed Multi-Autonomous Vehicle Architecture (DMAVA), a simulation architecture that enables concurrent execution of multiple independent vehicle autonomy stacks distributed across multiple physical hosts within a shared simulation environment. Each vehicle operates its own complete autonomous driving stack while maintaining coordinated behavior through a data-centric communication layer. The proposed system integrates ROS 2 Humble, Autoware Universe, AWSIM Labs, and Zenoh to support high data accuracy and controllability during multi-vehicle simulation, enabling consistent perception, planning, and control behavior under distributed execution. Experiments conducted on multiple-host configurations demonstrate stable localization, reliable inter-host communication, and consistent closed-loop control under distributed execution. DMAVA also serves as a foundation for Multi-Vehicle Autonomous Valet Parking, demonstrating its extensibility toward higher-level cooperative autonomy. Demo videos and source code are available at: https://github.com/zubxxr/distributed-multi-autonomous-vehicle-architecture.
comment: 7 pages, 3 figures, 5 tables, Submitted to IEEE IV 2026, Demo videos and source code available
UAV-Based Infrastructure Inspections: A Literature Review and Proposed Framework for AEC+FM
Unmanned Aerial Vehicles (UAVs) are transforming infrastructure inspections in the Architecture, Engineering, Construction, and Facility Management (AEC+FM) domain. By synthesizing insights from over 150 studies, this review paper highlights UAV-based methodologies for data acquisition, photogrammetric modeling, defect detection, and decision-making support. Key innovations include path optimization, thermal integration, and advanced machine learning (ML) models such as YOLO and Faster R-CNN for anomaly detection. UAVs have demonstrated value in structural health monitoring (SHM), disaster response, urban infrastructure management, energy efficiency evaluations, and cultural heritage preservation. Despite these advancements, challenges in real-time processing, multimodal data fusion, and generalizability remain. A proposed workflow framework, informed by literature and a case study, integrates RGB imagery, LiDAR, and thermal sensing with transformer-based architectures to improve accuracy and reliability in detecting structural defects, thermal anomalies, and geometric inconsistencies. The proposed framework ensures precise and actionable insights by fusing multimodal data and dynamically adapting path planning for complex environments, presented as a comprehensive step-by-step guide to address these challenges effectively. This paper concludes with future research directions emphasizing lightweight AI models, adaptive flight planning, synthetic datasets, and richer modality fusion to streamline modern infrastructure inspections.
comment: Withdrawn at the request of the authors to allow further revisions
Multiagent Systems
Evidence-Decision-Feedback: Theory-Driven Adaptive Scaffolding for LLM Agents
Multi-agent LLM architectures offer opportunities for pedagogical agents to help students construct domain knowledge and develop critical-thinking skills, yet many operate on a "one-size-fits-all" basis, limiting their ability to provide personalized support. To address this, we introduce Evidence-Decision-Feedback (EDF), a theoretical framework for adaptive scaffolding using LLMs. EDF integrates elements of intelligent tutoring systems and agentic behavior by organizing interactions around evidentiary inference, pedagogical decision-making, and adaptive feedback. We instantiate EDF through Copa, an agentic collaborative peer agent for STEM+C problem-solving. In an authentic high school classroom study, we show that EDF-aligned interactions align feedback with students' demonstrated understanding and task mastery; promote gradual scaffold fading; and support interpretable, evidence-grounded explanations without fostering overreliance.
comment: Currently under review
A-MapReduce: Executing Wide Search via Agentic MapReduce
Contemporary large language model (LLM)-based multi-agent systems exhibit systematic advantages in deep research tasks, which emphasize iterative, vertically structured information seeking. However, when confronted with wide search tasks characterized by large-scale, breadth-oriented retrieval, existing agentic frameworks, primarily designed around sequential, vertically structured reasoning, remain stuck in expansive search objectives and inefficient long-horizon execution. To bridge this gap, we propose A-MapReduce, a MapReduce paradigm-inspired multi-agent execution framework that recasts wide search as a horizontally structured retrieval problem. Concretely, A-MapReduce implements parallel processing of massive retrieval targets through task-adaptive decomposition and structured result aggregation. Meanwhile, it leverages experiential memory to drive the continual evolution of query-conditioned task allocation and recomposition, enabling progressive improvement in large-scale wide-search regimes. Extensive experiments on five agentic benchmarks demonstrate that A-MapReduce is (i) high-performing, achieving state-of-the-art performance on WideSearch and DeepWideSearch, and delivering 5.11% - 17.50% average Item F1 improvements compared with strong baselines with OpenAI o3 or Gemini 2.5 Pro backbones; (ii) cost-effective and efficient, delivering superior cost-performance trade-offs and reducing running time by 45.8\% compared to representative multi-agent baselines. The code is available at https://github.com/mingju-c/AMapReduce.
comment: 33 pages
Multi-Agent Teams Hold Experts Back ICML 2026
Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective coordination cannot be fully designed in advance and must instead emerge through interaction. However, most prior work enforces coordination through fixed roles, workflows, or aggregation rules, leaving open the question of how well self-organizing teams perform when coordination is unconstrained. Drawing on organizational psychology, we study whether self-organizing LLM teams achieve strong synergy, where team performance matches or exceeds the best individual member. Across human-inspired and frontier ML benchmarks, we find that -- unlike human teams -- LLM teams consistently fail to match their expert agent's performance, even when explicitly told who the expert is, incurring performance losses of up to 37.6%. Decomposing this failure, we show that expert leveraging, rather than identification, is the primary bottleneck. Conversational analysis reveals a tendency toward integrative compromise -- averaging expert and non-expert views rather than appropriately weighting expertise -- which increases with team size and correlates negatively with performance. Interestingly, this consensus-seeking behavior improves robustness to adversarial agents, suggesting a trade-off between alignment and effective expertise utilization. Our findings reveal a significant gap in the ability of self-organizing multi-agent teams to harness the collective expertise of their members.
comment: Under review at ICML 2026
Symphony-Coord: Emergent Coordination in Decentralized Agent Systems
Multi-agent large language model systems can tackle complex multi-step tasks by decomposing work and coordinating specialized behaviors. However, current coordination mechanisms typically rely on statically assigned roles and centralized controllers. As agent pools and task distributions evolve, these design choices lead to inefficient routing, poor adaptability, and fragile fault recovery capabilities. We introduce Symphony-Coord, a decentralized multi-agent framework that transforms agent selection into an online multi-armed bandit problem, enabling roles to emerge organically through interaction. The framework employs a two-stage dynamic beacon protocol: (i) a lightweight candidate screening mechanism to limit communication and computational overhead; (ii) an adaptive LinUCB selector that routes subtasks based on context features derived from task requirements and agent states, continuously optimized through delayed end-to-end feedback. Under standard linear realizability assumptions, we provide sublinear regret bounds, indicating the system converges toward near-optimal allocation schemes. Validation through simulation experiments and real-world large language model benchmarks demonstrates that Symphony-Coord not only enhances task routing efficiency but also exhibits robust self-healing capabilities in scenarios involving distribution shifts and agent failures, achieving a scalable coordination mechanism without predefined roles.
comment: 41 pages,15 figures
FinEvo: From Isolated Backtests to Ecological Market Games for Multi-Agent Financial Strategy Evolution
Conventional financial strategy evaluation relies on isolated backtests in static environments. Such evaluations assess each policy independently, overlook correlations and interactions, and fail to explain why strategies ultimately persist or vanish in evolving markets. We shift to an ecological perspective, where trading strategies are modeled as adaptive agents that interact and learn within a shared market. Instead of proposing a new strategy, we present FinEvo, an ecological game formalism for studying the evolutionary dynamics of multi-agent financial strategies. At the individual level, heterogeneous ML-based traders-rule-based, deep learning, reinforcement learning, and large language model (LLM) agents-adapt using signals such as historical prices and external news. At the population level, strategy distributions evolve through three designed mechanisms-selection, innovation, and environmental perturbation-capturing the dynamic forces of real markets. Together, these two layers of adaptation link evolutionary game theory with modern learning dynamics, providing a principled environment for studying strategic behavior. Experiments with external shocks and real-world news streams show that FinEvo is both stable for reproducibility and expressive in revealing context-dependent outcomes. Strategies may dominate, collapse, or form coalitions depending on their competitors-patterns invisible to static backtests. By reframing strategy evaluation as an ecological game formalism, FinEvo provides a unified, mechanism-level protocol for analyzing robustness, adaptation, and emergent dynamics in multi-agent financial markets, and may offer a means to explore the potential impact of macroeconomic policies and financial regulations on price evolution and equilibrium.
comment: Preprint. Submitted to a conference
Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM Societies
The rapid evolution of Large Language Models (LLMs) has led to the emergence of Multi-Agent Systems where collective cooperation is often threatened by the "Tragedy of the Commons." This study investigates the effectiveness of Anchoring Agents--pre-programmed altruistic entities--in fostering cooperation within a Public Goods Game (PGG). Using a full factorial design across three state-of-the-art LLMs, we analyzed both behavioral outcomes and internal reasoning chains. While Anchoring Agents successfully boosted local cooperation rates, cognitive decomposition and transfer tests revealed that this effect was driven by strategic compliance and cognitive offloading rather than genuine norm internalization. Notably, most agents reverted to self-interest in new environments, and advanced models like GPT-4.1 exhibited a "Chameleon Effect," masking strategic defection under public scrutiny. These findings highlight a critical gap between behavioral modification and authentic value alignment in artificial societies.
comment: 7 pages, 5 figures
BMG-Q: Localized Bipartite Match Graph Attention Q-Learning for Ride-Pooling Order Dispatch
This paper introduces Localized Bipartite Match Graph Attention Q-Learning (BMG-Q), a novel Multi-Agent Reinforcement Learning (MARL) algorithm framework tailored for ride-pooling order dispatch. BMG-Q advances ride-pooling decision-making process with the localized bipartite match graph underlying the Markov Decision Process, enabling the development of novel Graph Attention Double Deep Q Network (GATDDQN) as the MARL backbone to capture the dynamic interactions among ride-pooling vehicles in fleet. Our approach enriches the state information for each agent with GATDDQN by leveraging a localized bipartite interdependence graph and enables a centralized global coordinator to optimize order matching and agent behavior using Integer Linear Programming (ILP). Enhanced by gradient clipping and localized graph sampling, our GATDDQN improves scalability and robustness. Furthermore, the inclusion of a posterior score function in the ILP captures the online exploration-exploitation trade-off and reduces the potential overestimation bias of agents, thereby elevating the quality of the derived solutions. Through extensive experiments and validation, BMG-Q has demonstrated superior performance in both training and operations for thousands of vehicle agents, outperforming benchmark reinforcement learning frameworks by around 10% in accumulative rewards and showing a significant reduction in overestimation bias by over 50%. Additionally, it maintains robustness amidst task variations and fleet size changes, establishing BMG-Q as an effective, scalable, and robust framework for advancing ride-pooling order dispatch operations.
Introduction to Automated Negotiation
This book is an introductory textbook targeted towards computer science students who are completely new to the topic of automated negotiation. It does not require any prerequisite knowledge, except for elementary mathematics and basic programming skills. This book comes with an simple toy-world negotiation framework implemented in Python that can be used by the readers to implement their own negotiation algorithms and perform experiments with them. This framework is small and simple enough that any reader who does not like to work in Python should be able to re-implement it very quickly in any other programming language of their choice.
MAC: Masked Agent Collaboration Boosts Large Language Model Medical Decision-Making
Large language models (LLMs) have proven effective in artificial intelligence, where the multi-agent system (MAS) holds considerable promise for healthcare development by achieving the collaboration of LLMs. However, the absence of a systematic pipeline for agent construction and the rigidity of static collaboration patterns render current MAS-based models vulnerable to collaboration failures, resulting in substantial performance degradation in medical decision-making scenarios. To this end, we propose a novel Masked Agent Collaboration (MAC) framework that harnesses Pareto-optimal agent construction and cross-consistency maximization mechanisms to achieve adaptive progressive propagation of collaborative information, boosting the medical decision-making capacity. Specifically, we first conduct a Pareto-frontier factors analysis towards the LLMs pool to consider their key factors, including the model size, inference time, diversity score, and throughput ratio, where we calculate the similarity between pairwise outputs within an LLM to derive its diversity score. Beyond this analysis, we enable the identification of Pareto-optimal models that balance efficiency and capability, which are subsequently selected as collaborative agents to consider the fundamental trade-offs inherent in practical LLM deployment. Afterward, we measure the pairwise similarity between the outputs from collaborative agents to determine their cross-consistency values, subsequently masking out the agent with the lowest cross-consistency value to eliminate the output that is likely semantically inconsistent. Finally, we conduct collaboration of agents by achieving adaptive progressive propagation, where each agent aggregates the outputs of unmasked agents from the previous layer as its input to generate the corresponding output via prompt engineering.
SemanticALLI: Caching Reasoning, Not Just Responses, in Agentic Systems
Agentic AI pipelines suffer from a hidden inefficiency: they frequently reconstruct identical intermediate logic, such as metric normalization or chart scaffolding, even when the user's natural language phrasing is entirely novel. Conventional boundary caching fails to capture this inefficiency because it treats inference as a monolithic black box. We introduce SemanticALLI, a pipeline-aware architecture within Alli (PMG's marketing intelligence platform), designed to operationalize redundant reasoning. By decomposing generation into Analytic Intent Resolution (AIR) and Visualization Synthesis (VS), SemanticALLI elevates structured intermediate representations (IRs) to first-class, cacheable artifacts. The impact of caching within the agentic loop is substantial. In our evaluation, baseline monolithic caching caps at a 38.7% hit rate due to linguistic variance. In contrast, our structured approach allows for an additional stage, the Visualization Synthesis stage, to achieve an 83.10% hit rate, bypassing 4,023 LLM calls with a median latency of just 2.66 ms. This internal reuse reduces total token consumption, offering a practical lesson for AI system design: even when users rarely repeat themselves, the pipeline often does, at stable, structured checkpoints where caching is most reliable.
DMV-AVP: Distributed Multi-Vehicle Autonomous Valet Parking Using Autoware
This paper presents DMV-AVP, a distributed simulation of Multi-Vehicle Autonomous Valet Parking (AVP). The system was implemented as an application of the Distributed Multi-Autonomous Vehicle Architecture (DMAVA) for synchronized multi-host execution. Most existing simulation approaches rely on centralized or non-distributed designs that constrain scalability and limit fully autonomous control. This work introduces two modules built on top of DMAVA: 1) the Multi-Vehicle AVP Coordination Framework, composed of AVP Managers and a per-vehicle AVP Node, is responsible for global parking state tracking, vehicle queuing, parking spot reservation, lifecycle coordination, and conflict resolution across multiple vehicles, and 2) the Unity-Integrated YOLOv5 Parking Spot Detection Module, that provides real-time, vision-based perception within AWSIM Labs. Both modules integrate seamlessly with DMAVA and extend it specifically for multi-vehicle AVP operation, supported by a Zenoh communication layer that ensures high data accuracy and controllability across hosts. Experiments conducted on two- and three-host configurations demonstrate consistent coordination, conflict-free parking behavior, and scalable performance across distributed Autoware instances. The results confirm that the proposed DMV-AVP supports cooperative AVP simulation and establishes a foundation for future real-world and hardware-in-the-loop validation. Demo videos and source code are available at: https://github.com/zubxxr/multi-vehicle-avp
comment: 6 pages, 3 figures, 3 tables. Submitted to IEEE IV 2026, Demo videos and source code available
DMAVA: Distributed Multi-Autonomous Vehicle Architecture Using Autoware
Simulating and validating coordination among multiple autonomous vehicles remains challenging, as many existing simulation architectures are limited to single-vehicle operation or rely on centralized control. This paper presents the Distributed Multi-Autonomous Vehicle Architecture (DMAVA), a simulation architecture that enables concurrent execution of multiple independent vehicle autonomy stacks distributed across multiple physical hosts within a shared simulation environment. Each vehicle operates its own complete autonomous driving stack while maintaining coordinated behavior through a data-centric communication layer. The proposed system integrates ROS 2 Humble, Autoware Universe, AWSIM Labs, and Zenoh to support high data accuracy and controllability during multi-vehicle simulation, enabling consistent perception, planning, and control behavior under distributed execution. Experiments conducted on multiple-host configurations demonstrate stable localization, reliable inter-host communication, and consistent closed-loop control under distributed execution. DMAVA also serves as a foundation for Multi-Vehicle Autonomous Valet Parking, demonstrating its extensibility toward higher-level cooperative autonomy. Demo videos and source code are available at: https://github.com/zubxxr/distributed-multi-autonomous-vehicle-architecture.
comment: 7 pages, 3 figures, 5 tables, Submitted to IEEE IV 2026, Demo videos and source code available
Systems and Control (EESS)
The Dynamic Search for the Minimal Dynamic Extension
Identifying the dynamic precompensator that renders a nonlinear control system feedback linearizable is a challenging problem. Researchers have explored the problem -- dynamic feedback linearization -- and produced existence conditions and constructive procedures for the dynamic precompensator. These remain, in general, either computationally expensive or restrictive. Treating the challenge as intrinsic, this article views the problem as a search problem over a category. Dynamic programming applies and, upon restriction to a finite category, classic search algorithms find the minimal dynamic extension. Alternatively, a heuristic aiming towards feedback linearizable systems can be employed to select amongst the infinitely-many extensions. This framing provides a distinctive, birds-eye view of the search for the dynamic precompensator.
comment: 8 pages. Submitted to the 27th International Symposium on Mathematical Theory of Networks and Systems (MTNS)
Regret of $H_\infty$ Preview Controllers
This paper studies preview control in both the $H_\infty$ and regret-optimal settings. The plant is modeled as a discrete-time, linear time-invariant system subject to external disturbances. The performance baseline is the optimal non-causal controller that has full knowledge of the disturbance sequence. We first review the construction of the $H_\infty$ preview controller with $p$-steps of disturbance preview. We then show that the closed-loop $H_\infty$ performance of this preview controller converges as $p\to \infty$ to the performance of the optimal non-causal controller. Furthermore, we prove that the optimal regret of the preview controller converges to zero. These results demonstrate that increasing preview length allows controllers to asymptotically achieve non-causal performance in both the $H_\infty$ and regret frameworks. A numerical example illustrates the theoretical results.
On Poly-Quadratic Stabilizability and Detectability of Polytopic LPV Systems
In this technical communique, we generalize the well-known Lyapunov-based stabilizability and detectability tests for discrete-time linear time-invariant systems to polytopic linear parameter-varying systems using the class of so-called poly-quadratic Lyapunov functions.
comment: To appear in Automatica
Scientific Machine Learning for Resilient EV-Grid Planning and Decision Support Under Extreme Events
Electric vehicle (EV) charging infrastructure introduces complex challenges to urban distribution networks, particularly under extreme demand events. A critical barrier to resilience assessment is the scale gap between micro-level charging physics and city-scale planning: minute-resolution deliverability constraints remain invisible in hourly aggregated datasets, causing purely data-driven models to exhibit non-physical behavior in high-stress regimes. This paper develops a five-stage scientific machine learning framework bridging this gap through physics-informed knowledge transfer. Stage 1 learns a temperature-pressure deliverability surface from Swiss DC fast-charging telemetry with monotonicity constraints. Stage 2 performs cross-scale injection via anchored quantile mapping. Stage 3 deploys a dual-head spatio-temporal graph neural network for joint forecasting of demand and service loss rate. Stage 4 simulates backlog dynamics under stress shocks and evaluates policy interventions. Stage 5 couples service outcomes to distribution-grid stress via transformer loading analysis. Validation on the Shenzhen UrbanEV dataset demonstrates that physics injection restores monotone stress-to-risk response (Spearman correlation coefficient equals +1.0 versus -0.8 without injection) and improves forecasting accuracy. Under a representative demand shock, the hybrid policy reduces backlog by 79.1%, restores full service within the study horizon, and limits grid stress to only 2 additional hours. The derived resilience boundary m_crit as a function of epsilon approximately equals 1.7 minus 1.0 times epsilon, providing actionable guidance linking demand flexibility to maximum absorbable stress, enabling risk-aware emergency planning under extreme events.
comment: 15 pages, 12 figures
SPOT: Spatio-Temporal Obstacle-free Trajectory Planning for UAVs in an Unknown Dynamic Environment
We address the problem of reactive motion planning for quadrotors operating in unknown environments with dynamic obstacles. Our approach leverages a 4-dimensional spatio-temporal planner, integrated with vision-based Safe Flight Corridor (SFC) generation and trajectory optimization. Unlike prior methods that rely on map fusion, our framework is mapless, enabling collision avoidance directly from perception while reducing computational overhead. Dynamic obstacles are detected and tracked using a vision-based object segmentation and tracking pipeline, allowing robust classification of static versus dynamic elements in the scene. To further enhance robustness, we introduce a backup planning module that reactively avoids dynamic obstacles when no direct path to the goal is available, mitigating the risk of collisions during deadlock situations. We validate our method extensively in both simulation and real-world hardware experiments, and benchmark it against state-of-the-art approaches, showing significant advantages for reactive UAV navigation in dynamic, unknown environments.
Computationally Tractable Robust Nonlinear Model Predictive Control using DC Programming
We propose a computationally tractable, tube-based robust nonlinear model predictive control (MPC) framework using difference-of-convex (DC) functions and sequential convex programming. For systems with differentiable discrete time dynamics, we show how to construct systematic, data-driven DC model representations using polynomials and machine learning techniques. We develop a robust tube MPC scheme that convexifies the online optimization by linearizing the concave components of the model, and we provide guarantees of recursive feasibility and robust stability. We present three data-driven procedures for computing DC models and compare performance using a planar vertical take-off and landing (PVTOL) aircraft case study.
Mitigating Data Centers Load Risks and Enabling Grid Support Functions through Grid-Forming Control
The rapid growth of hyperscale data centers driven by Large Language Models and Artificial Intelligence workloads has introduced new challenges for power systems. These facilities experience abrupt power variations during model training and check-point-saving events, causing voltage deviations and frequency disturbances. Moreover, they operate as passive loads that draw power without offering any grid support. This paper presents an integrated architecture that combines Battery Energy Storage Systems (BESSs) within data centers using Grid-Forming inverters to provide active grid-support functions. Simulation results through MATLAB/Simulink demonstrate accurate power reference tracking under dynamic loading, with eight coordinated BESS units supplying instantaneous power during training and saving conditions. Under single-phase voltage depression near the data center bus, the BESS delivered reactive power support similar to a Static Synchronous Compensator. During grid disconnection, seamless islanded operation was achieved with stable voltage, frequency, and continuous power delivery at the data center bus.
Robust Adaptive Learning Control for a Class of Non-affine Nonlinear Systems
We address the tracking problem for a class of uncertain non-affine nonlinear systems with high relative degrees, performing non-repetitive tasks. We propose a rigorously proven, robust adaptive learning control scheme that relies on a gradient descent parameter adaptation law to handle the unknown time-varying parameters of the system, along with a state estimator that estimates the unmeasurable state variables. Furthermore, despite the inherently complex nature of the non-affine system, we provide an explicit iterative computation method to facilitate the implementation of the proposed control scheme. The paper includes a thorough analysis of the performance of the proposed control strategy, and simulation results are presented to demonstrate the effectiveness of the approach.
Learnable Koopman-Enhanced Transformer-Based Time Series Forecasting with Spectral Control
This paper proposes a unified family of learnable Koopman operator parameterizations that integrate linear dynamical systems theory with modern deep learning forecasting architectures. We introduce four learnable Koopman variants-scalar-gated, per-mode gated, MLP-shaped spectral mapping, and low-rank Koopman operators which generalize and interpolate between strictly stable Koopman operators and unconstrained linear latent dynamics. Our formulation enables explicit control over the spectrum, stability, and rank of the linear transition operator while retaining compatibility with expressive nonlinear backbones such as Patchtst, Autoformer, and Informer. We evaluate the proposed operators in a large-scale benchmark that also includes LSTM, DLinear, and simple diagonal State-Space Models (SSMs), as well as lightweight transformer variants. Experiments across multiple horizons and patch lengths show that learnable Koopman models provide a favorable bias-variance trade-off, improved conditioning, and more interpretable latent dynamics. We provide a full spectral analysis, including eigenvalue trajectories, stability envelopes, and learned spectral distributions. Our results demonstrate that learnable Koopman operators are effective, stable, and theoretically principled components for deep forecasting.
Probability-Aware Parking Selection
Current navigation systems conflate time-to-drive with the true time-to-arrive by ignoring parking search duration and the final walking leg. Such underestimation can significantly affect user experience, mode choice, congestion, and emissions. To address this issue, this paper introduces the probability-aware parking selection problem, which aims to direct drivers to the best parking location rather than straight to their destination. An adaptable dynamic programming framework is proposed that leverages probabilistic, lot-level availability to minimize the expected time-to-arrive. Closed-form analysis determines when it is optimal to target a specific parking lot or explore alternatives, as well as the expected time cost. Sensitivity analysis and three illustrative cases are examined, demonstrating the model's ability to account for the dynamic nature of parking availability. Given the high cost of permanent sensing infrastructure, we assess the error rates of using stochastic observations to estimate availability. Experiments with real-world data from the US city of Seattle indicate this approach's viability, with mean absolute error decreasing from 7% to below 2% as observation frequency increases. In data-based simulations, probability-aware strategies demonstrate time savings up to 66% relative to probability-unaware baselines, yet still take up to 123% longer than time-to-drive estimates.
comment: 10 pages, 8 figures, 3 tables. To be published in IEEE Transactions on Intelligent Transportation Systems
Trustworthy Equipment Monitoring via Cascaded Anomaly Detection and Thermal Localization
Predictive maintenance demands both accurate anomaly detection and interpretable explanations. We demonstrate that naive multimodal fusion of sensor time-series and thermal imagery can degrade performance, and instead propose a cascaded, hybrid architecture. Our approach utilizes Random Forest on statistical sensor features for detection ($94.66\%$ F1), triggering a CNN with spatial attention for thermal fault localization only post-detection. Rigorous analysis reveals that statistical feature-based detection significantly outperforms both LSTM ($89.57\%$ F1) and end-to-end fusion ($84.79\%$ F1) at typical industrial noise levels. However, we identify a critical noise crossover phenomenon: while Random Forest excels at low noise, deep learning approaches demonstrate superior resilience at high noise ($σ> 0.3$). Additionally, we introduce an explainability pipeline integrating TreeSHAP and attention heatmaps to diagnose "modality bias," where fusion models irrationally favor weaker thermal inputs. Validated on 13,121 real-world samples from automated transport systems, this work provides evidence-based guidelines for model selection, proving that traditional machine learning often surpasses complex deep learning for industrial monitoring while offering superior interpretability.
Sample-Efficient Diffusion-based Control of Complex Physics Systems
Controlling complex physics systems is important in diverse domains. While diffusion-based methods have demonstrated advantages over classical model-based approaches and myopic sequential learning methods in achieving global trajectory consistency, they are limited by sample efficiency.This paper presents SEDC (Sample-Efficient Diffusion-based Control), a novel framework addressing core challenges in complex physics systems: high-dimensional state-control spaces, strong nonlinearities, and the gap between non-optimal training data and near-optimal control laws.Our approach introduces a novel control paradigm by architecturally decoupling state-control modeling and decomposing dynamics, while a guided self-finetuning process iteratively refines the control law towards optimality. We validate SEDC across diverse complex nonlinear systems, including high-dimensional fluid dynamics (Burgers), chaotic synchronization networks (Kuramoto), and real-world power grid stability control (Swing Equation). Our method achieves 39.5\%-47.3\% better control accuracy than state-of-the-art baselines while using only 10\% of the training samples. The implementation is available at \href{https://anonymous.4open.science/r/DIFOCON-C019}{here}.
Safe Control and Learning Using Generalized Action Governor
This paper introduces the Generalized Action Governor (AG), a supervisory scheme that augments a nominal closed-loop system with the capability to enforce state and input constraints through online action adjustment. We develop a generalized AG theory for discrete-time systems under bounded uncertainties, and relax the usual requirement of positive invariance to returnability of a safe set. Based on the theory, we present tailored AG design procedures for linear systems and for discrete systems with finite state and action spaces. We further study safe online learning enabled by the AG and present two safe learning strategies, namely safe Q-learning and safe data-driven Koopman operator-based control, both integrated with the AG to guarantee constraint satisfaction during learning. Numerical results illustrate the proposed methods.
comment: 12 pages, 4 figures
DER Day-Ahead Offering: A Neural Network Column-and-Constraint Generation Approach
In the day-ahead energy market, the offering strategy of distributed energy resource (DER) aggregators must be submitted before the uncertainty realization in the form of price-quantity pairs. This work addresses the day-ahead offering problem through a two-stage adaptive robust stochastic optimization model, wherein the first-stage price-quantity pairs and second-stage operational commitment decisions are made before and after DER uncertainty is realized, respectively. Uncertainty in day-ahead price is addressed using a stochastic programming-based approach, while uncertainty of DER generation is handled through robust optimization. To address the max-min structure of the second-stage problem, a neural network-accelerated column-and-constraint generation method is developed. A dedicated neural network is trained to approximate the value function, while optimality is maintained by the design of the network architecture. Numerical studies indicate that the proposed method yields high-quality solutions and is up to 100 times faster than Gurobi and 33 times faster than classical column-and-constraint generation on the same 1028-node synthetic distribution network.
comment: 6 pages, 1 figure. Extended revised version submitted to IEEE PES General Meeting 2026
An Information-Theoretic Analysis of Continuous-Time Control and Filtering Limitations by the I-MMSE Relationships
While information theory has been introduced to characterize the fundamental limitations of control and filtering for a few decades, the existing information-theoretic methods are indirect and cumbersome for analyzing the limitations of continuous-time systems. To answer this challenge, we lift the information-theoretic analysis to continuous function spaces by the I-MMSE relationships. Continuous-time control and filtering systems are modeled into the additive Gaussian channels with and without feedback, and the total information rate is identified as a control and filtering trade-off metric and calculated from the estimation error of channel inputs. Fundamental constraints for this trade-off metric are first derived in a general setup and then used to capture the limitations of various control and filtering systems subject to linear and nonlinear plant models. For linear scenarios, we show that the total information rate quantifies the performance limits, such as the minimum entropy cost and the lowest achievable mean-square estimation error, in the time domain. For nonlinear systems, we provide a direct method to calculate and interpret the total information rate and its lower bound by the Stratonovich-Kushner equation.
comment: This paper is the extended version of an article with the same title accepted for publication in Automatica. Dapeng Li and Neng Wan contributed equally to this work
Robotics
CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining
Leveraging pre-trained 2D image representations in behavior cloning policies has achieved great success and has become a standard approach for robotic manipulation. However, such representations fail to capture the 3D spatial information about objects and scenes that is essential for precise manipulation. In this work, we introduce Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining (CLAMP), a novel 3D pre-training framework that utilizes point clouds and robot actions. From the merged point cloud computed from RGB-D images and camera extrinsics, we re-render multi-view four-channel image observations with depth and 3D coordinates, including dynamic wrist views, to provide clearer views of target objects for high-precision manipulation tasks. The pre-trained encoders learn to associate the 3D geometric and positional information of objects with robot action patterns via contrastive learning on large-scale simulated robot trajectories. During encoder pre-training, we pre-train a Diffusion Policy to initialize the policy weights for fine-tuning, which is essential for improving fine-tuning sample efficiency and performance. After pre-training, we fine-tune the policy on a limited amount of task demonstrations using the learned image and action representations. We demonstrate that this pre-training and fine-tuning design substantially improves learning efficiency and policy performance on unseen tasks. Furthermore, we show that CLAMP outperforms state-of-the-art baselines across six simulated tasks and five real-world tasks.
Minimal Footprint Grasping Inspired by Ants
Ants are highly capable of grasping objects in clutter, and we have recently observed that this involves substantial use of their forelegs. The forelegs, more specifically the tarsi, have high friction microstructures (setal pads), are covered in hairs, and have a flexible under-actuated tip. Here we abstract these features to test their functional advantages for a novel low-cost gripper design, suitable for bin-picking applications. In our implementation, the gripper legs are long and slim, with high friction gripping pads, low friction hairs and single-segment tarsus-like structure to mimic the insect's setal pads, hairs, and the tarsi's interactive compliance. Experimental evaluation shows this design is highly robust for grasping a wide variety of individual consumer objects, with all grasp attempts successful. In addition, we demonstrate this design is effective for picking single objects from dense clutter, a task at which ants also show high competence. The work advances grasping technology and shed new light on the mechanical importance of hairy structures and tarsal flexibility in insects.
SanD-Planner: Sample-Efficient Diffusion Planner in B-Spline Space for Robust Local Navigation
The challenge of generating reliable local plans has long hindered practical applications in highly cluttered and dynamic environments. Key fundamental bottlenecks include acquiring large-scale expert demonstrations across diverse scenes and improving learning efficiency with limited data. This paper proposes SanD-Planner, a sample-efficient diffusion-based local planner that conducts depth image-based imitation learning within the clamped B-spline space. By operating within this compact space, the proposed algorithm inherently yields smooth outputs with bounded prediction errors over local supports, naturally aligning with receding-horizon execution. Integration of an ESDF-based safety checker with explicit clearance and time-to-completion metrics further reduces the training burden associated with value-function learning for feasibility assessment. Experiments show that training with $500$ episodes (merely $0.25\%$ of the demonstration scale used by the baseline), SanD-Planner achieves state-of-the-art performance on the evaluated open benchmark, attaining success rates of $90.1\%$ in simulated cluttered environments and $72.0\%$ in indoor simulations. The performance is further proven by demonstrating zero-shot transferability to realistic experimentation in both 2D and 3D scenes. The dataset and pre-trained models will also be open-sourced.
comment: Under review. 11 pages
Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
We introduce Green-VLA, a staged Vision-Language-Action (VLA) framework for real-world deployment on the Green humanoid robot while maintaining generalization across diverse embodiments. Green-VLA follows a five stage curriculum: (L0) foundational VLMs, (L1) multimodal grounding, (R0) multi-embodiment pretraining, (R1) embodiment-specific adaptation, and (R2) reinforcement-learning (RL) policy alignment. We couple a scalable data-processing pipeline (3,000 hours of demonstrations) with temporal alignment and quality filtering, and use a unified, embodiment-aware action interface enabling a single policy to control humanoids, mobile manipulators, and fixed-base arms. At inference, the VLA controller is enhanced with episode-progress prediction, out-of-distribution detection, and joint-prediction-based guidance to improve safety and precise target selection. Experiments on Simpler BRIDGE WidowX and CALVIN ABC-D, as well as real-robot evaluations, show strong generalization and performance gains from RL alignment in success rate, robustness, and long-horizon efficiency.
comment: 22 pages, 14 figures
UniMorphGrasp: Diffusion Model with Morphology-Awareness for Cross-Embodiment Dexterous Grasp Generation
Cross-embodiment dexterous grasping aims to generate stable and diverse grasps for robotic hands with heterogeneous kinematic structures. Existing methods are often tailored to specific hand designs and fail to generalize to unseen hand morphologies outside the training distribution. To address these limitations, we propose \textbf{UniMorphGrasp}, a diffusion-based framework that incorporates hand morphological information into the grasp generation process for unified cross-embodiment grasp synthesis. The proposed approach maps grasps from diverse robotic hands into a unified human-like canonical hand pose representation, providing a common space for learning. Grasp generation is then conditioned on structured representations of hand kinematics, encoded as graphs derived from hand configurations, together with object geometry. In addition, a loss function is introduced that exploits the hierarchical organization of hand kinematics to guide joint-level supervision. Extensive experiments demonstrate that UniMorphGrasp achieves state-of-the-art performance on existing dexterous grasp benchmarks and exhibits strong zero-shot generalization to previously unseen hand structures, enabling scalable and practical cross-embodiment grasp deployment.
RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback
Diffusion policies are a powerful paradigm for robotic control, but fine-tuning them with human preferences is fundamentally challenged by the multi-step structure of the denoising process. To overcome this, we introduce a Unified Markov Decision Process (MDP) formulation that coherently integrates the diffusion denoising chain with environmental dynamics, enabling reward-free Direct Preference Optimization (DPO) for diffusion policies. Building on this formulation, we propose RoDiF (Robust Direct Fine-Tuning), a method that explicitly addresses corrupted human preferences. RoDiF reinterprets the DPO objective through a geometric hypothesis-cutting perspective and employs a conservative cutting strategy to achieve robustness without assuming any specific noise distribution. Extensive experiments on long-horizon manipulation tasks show that RoDiF consistently outperforms state-of-the-art baselines, effectively steering pretrained diffusion policies of diverse architectures to human-preferred modes, while maintaining strong performance even under 30% corrupted preference labels.
Learning When to Jump for Off-road Navigation
Low speed does not always guarantee safety in off-road driving. For instance, crossing a ditch may be risky at a low speed due to the risk of getting stuck, yet safe at a higher speed with a controlled, accelerated jump. Achieving such behavior requires path planning that explicitly models complex motion dynamics, whereas existing methods often neglect this aspect and plan solely based on positions or a fixed velocity. To address this gap, we introduce Motion-aware Traversability (MAT) representation to explicitly model terrain cost conditioned on actual robot motion. Instead of assigning a single scalar score for traversability, MAT models each terrain region as a Gaussian function of velocity. During online planning, we decompose the terrain cost computation into two stages: (1) predict terrain-dependent Gaussian parameters from perception in a single forward pass, (2) efficiently update terrain costs for new velocities inferred from current dynamics by evaluating these functions without repeated inference. We develop a system that integrates MAT to enable agile off-road navigation and evaluate it in both simulated and real-world environments with various obstacles. Results show that MAT achieves real-time efficiency and enhances the performance of off-road navigation, reducing path detours by 75% while maintaining safety across challenging terrains.
Safe Stochastic Explorer: Enabling Safe Goal Driven Exploration in Stochastic Environments and Safe Interaction with Unknown Objects
Autonomous robots operating in unstructured, safety-critical environments, from planetary exploration to warehouses and homes, must learn to safely navigate and interact with their surroundings despite limited prior knowledge. Current methods for safe control, such as Hamilton-Jacobi Reachability and Control Barrier Functions, assume known system dynamics. Meanwhile existing safe exploration techniques often fail to account for the unavoidable stochasticity inherent when operating in unknown real world environments, such as an exploratory rover skidding over an unseen surface or a household robot pushing around unmapped objects in a pantry. To address this critical gap, we propose Safe Stochastic Explorer (S.S.Explorer) a novel framework for safe, goal-driven exploration under stochastic dynamics. Our approach strategically balances safety and information gathering to reduce uncertainty about safety in the unknown environment. We employ Gaussian Processes to learn the unknown safety function online, leveraging their predictive uncertainty to guide information-gathering actions and provide probabilistic bounds on safety violations. We first present our method for discrete state space environments and then introduce a scalable relaxation to effectively extend this approach to continuous state spaces. Finally we demonstrate how this framework can be naturally applied to ensure safe physical interaction with multiple unknown objects. Extensive validation in simulation and demonstrative hardware experiments showcase the efficacy of our method, representing a step forward toward enabling reliable widespread robot autonomy in complex, uncertain environments.
Ocean Current-Harnessing Stage-Gated MPC: Monotone Cost Shaping and Speed-to-Fly for Energy-Efficient AUV Navigation
Autonomous Underwater Vehicles (AUVs) are a highly promising technology for ocean exploration and diverse offshore operations, yet their practical deployment is constrained by energy efficiency and endurance. To address this, we propose Current-Harnessing Stage-Gated MPC, which exploits ocean currents via a per-stage scalar which indicates the "helpfulness" of ocean currents. This scalar is computed along the prediction horizon to gate lightweight cost terms only where the ocean currents truly aids the control goal. The proposed cost terms, that are merged in the objective function, are (i) a Monotone Cost Shaping (MCS) term, a help-gated, non-worsening modification that relaxes along-track position error and provides a bounded translational energy rebate, guaranteeing the shaped objective is never larger than a set baseline, and (ii) a speed-to-fly (STF) cost component that increases the price of thrust and softly matches ground velocity to the ocean current, enabling near zero water-relative "gliding". All terms are C1 and integrate as a plug-and-play in MPC designs. Extensive simulations with the BlueROV2 model under realistic ocean current fields show that the proposed approach achieves substantially lower energy consumption than conventional predictive control while maintaining comparable arrival times and constraint satisfaction.
SyNeT: Synthetic Negatives for Traversability Learning
Reliable traversability estimation is crucial for autonomous robots to navigate complex outdoor environments safely. Existing self-supervised learning frameworks primarily rely on positive and unlabeled data; however, the lack of explicit negative data remains a critical limitation, hindering the model's ability to accurately identify diverse non-traversable regions. To address this issue, we introduce a method to explicitly construct synthetic negatives, representing plausible but non-traversable, and integrate them into vision-based traversability learning. Our approach is formulated as a training strategy that can be seamlessly integrated into both Positive-Unlabeled (PU) and Positive-Negative (PN) frameworks without modifying inference architectures. Complementing standard pixel-wise metrics, we introduce an object-centric FPR evaluation approach that analyzes predictions in regions where synthetic negatives are inserted. This evaluation provides an indirect measure of the model's ability to consistently identify non-traversable regions without additional manual labeling. Extensive experiments on both public and self-collected datasets demonstrate that our approach significantly enhances robustness and generalization across diverse environments. The source code and demonstration videos are publicly available at the project page: https://anonymous-synet.github.io/SyNet.github.io/
VVLoc: Prior-free 3-DoF Vehicle Visual Localization
Localization is a critical technology in autonomous driving, encompassing both topological localization, which identifies the most similar map keyframe to the current observation, and metric localization, which provides precise spatial coordinates. Conventional methods typically address these tasks independently, rely on single-camera setups, and often require additional 3D semantic or pose priors, while lacking mechanisms to quantify the confidence of localization results, making them less feasible for real industrial applications. In this paper, we propose VVLoc, a unified pipeline that employs a single neural network to concurrently achieve topological and metric vehicle localization using multi-camera system. VVLoc first evaluates the geo-proximity between visual observations, then estimates their relative metric poses using a matching strategy, while also providing a confidence measure. Additionally, the training process for VVLoc is highly efficient, requiring only pairs of visual data and corresponding ground-truth poses, eliminating the need for complex supplementary data. We evaluate VVLoc not only on the publicly available datasets, but also on a more challenging self-collected dataset, demonstrating its ability to deliver state-of-the-art localization accuracy across a wide range of localization tasks.
Physics-informed Diffusion Mamba Transformer for Real-world Driving
Autonomous driving systems demand trajectory planners that not only model the inherent uncertainty of future motions but also respect complex temporal dependencies and underlying physical laws. While diffusion-based generative models excel at capturing multi-modal distributions, they often fail to incorporate long-term sequential contexts and domain-specific physical priors. In this work, we bridge these gaps with two key innovations. First, we introduce a Diffusion Mamba Transformer architecture that embeds mamba and attention into the diffusion process, enabling more effective aggregation of sequential input contexts from sensor streams and past motion histories. Second, we design a Port-Hamiltonian Neural Network module that seamlessly integrates energy-based physical constraints into the diffusion model, thereby enhancing trajectory predictions with both consistency and interpretability. Extensive evaluations on standard autonomous driving benchmarks demonstrate that our unified framework significantly outperforms state-of-the-art baselines in predictive accuracy, physical plausibility, and robustness, thereby advancing safe and reliable motion planning.
Any3D-VLA: Enhancing VLA Robustness via Diverse Point Clouds
Existing Vision-Language-Action (VLA) models typically take 2D images as visual input, which limits their spatial understanding in complex scenes. How can we incorporate 3D information to enhance VLA capabilities? We conduct a pilot study across different observation spaces and visual representations. The results show that explicitly lifting visual input into point clouds yields representations that better complement their corresponding 2D representations. To address the challenges of (1) scarce 3D data and (2) the domain gap induced by cross-environment differences and depth-scale biases, we propose Any3D-VLA. It unifies the simulator, sensor, and model-estimated point clouds within a training pipeline, constructs diverse inputs, and learns domain-agnostic 3D representations that are fused with the corresponding 2D representations. Simulation and real-world experiments demonstrate Any3D-VLA's advantages in improving performance and mitigating the domain gap. Our project homepage is available at https://xianzhefan.github.io/Any3D-VLA.github.io.
SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning
Vision-Language-Action (VLA) models exhibit strong generalization in robotic manipulation, yet reinforcement learning (RL) fine-tuning often degrades robustness under spatial distribution shifts. For flow-matching VLA policies, this degradation is closely associated with the erosion of spatial inductive bias during RL adaptation, as sparse rewards and spatially agnostic exploration increasingly favor short-horizon visual cues. To address this issue, we propose \textbf{SA-VLA}, a spatially-aware RL adaptation framework that preserves spatial grounding during policy optimization by aligning representation learning, reward design, and exploration with task geometry. SA-VLA fuses implicit spatial representations with visual tokens, provides dense rewards that reflect geometric progress, and employs \textbf{SCAN}, a spatially-conditioned annealed exploration strategy tailored to flow-matching dynamics. Across challenging multi-object and cluttered manipulation benchmarks, SA-VLA enables stable RL fine-tuning and improves zero-shot spatial generalization, yielding more robust and transferable behaviors. Code and project page are available at https://xupan.top/Projects/savla.
comment: Version 1
USS-Nav: Unified Spatio-Semantic Scene Graph for Lightweight UAV Zero-Shot Object Navigation
Zero-Shot Object Navigation in unknown environments poses significant challenges for Unmanned Aerial Vehicles (UAVs) due to the conflict between high-level semantic reasoning requirements and limited onboard computational resources. To address this, we present USS-Nav, a lightweight framework that incrementally constructs a Unified Spatio-Semantic scene graph and enables efficient Large Language Model (LLM)-augmented Zero-Shot Object Navigation in unknown environments. Specifically, we introduce an incremental Spatial Connectivity Graph generation method utilizing polyhedral expansion to capture global geometric topology, which is dynamically partitioned into semantic regions via graph clustering. Concurrently, open-vocabulary object semantics are instantiated and anchored to this topology to form a hierarchical environmental representation. Leveraging this hierarchical structure, we present a coarse-to-fine exploration strategy: LLM grounded in the scene graph's semantics to determine global target regions, while a local planner optimizes frontier coverage based on information gain. Experimental results demonstrate that our framework outperforms state-of-the-art methods in terms of computational efficiency and real-time update frequency (15 Hz) on a resource-constrained platform. Furthermore, ablation studies confirm the effectiveness of our framework, showing substantial improvements in Success weighted by Path Length (SPL). The source code will be made publicly available to foster further research.
Learning to Accelerate Vision-Language-Action Models through Adaptive Visual Token Caching
Vision-Language-Action (VLA) models have demonstrated remarkable generalization capabilities in robotic manipulation tasks, yet their substantial computational overhead remains a critical obstacle to real-world deployment. Improving inference efficiency is therefore essential for practical robotic applications. Existing acceleration methods often rely on heuristic or static strategies--such as rule-based token caching or pruning--that are decoupled from task objectives and fail to adapt to dynamic scene changes. In this work, we reformulate inference acceleration as a learnable policy optimization problem and propose a novel framework that integrates a dynamic, task-aware decision-making process directly into the VLA model. At its core are two lightweight, cooperative modules: a Cached Token Selector, which determines which tokens should be reused, and a Cache Ratio Predictor, which controls how many tokens to reuse. Training these modules is non-trivial due to their discrete decisions. We address this by adopting a differentiable relaxation that allows gradient-based end-to-end optimization. Extensive experiments on the LIBERO and SIMPLER benchmarks, as well as real-robot evaluations, show that our method achieves a 1.76x wall-clock inference speedup while simultaneously improving the average success rate by 1.9 percentage points (from 75.0% to 76.9%) on LIBERO and by 5.0 percentage points on real-world tasks, significantly outperforming existing baselines. This work highlights the potential of learning task-aware computational allocation policies, paving the way for VLA models that are both powerful and efficient.
Toward Reliable Sim-to-Real Predictability for MoE-based Robust Quadrupedal Locomotion
Reinforcement learning has shown strong promise for quadrupedal agile locomotion, even with proprioception-only sensing. In practice, however, sim-to-real gap and reward overfitting in complex terrains can produce policies that fail to transfer, while physical validation remains risky and inefficient. To address these challenges, we introduce a unified framework encompassing a Mixture-of-Experts (MoE) locomotion policy for robust multi-terrain representation with RoboGauge, a predictive assessment suite that quantifies sim-to-real transferability. The MoE policy employs a gated set of specialist experts to decompose latent terrain and command modeling, achieving superior deployment robustness and generalization via proprioception alone. RoboGauge further provides multi-dimensional proprioception-based metrics via sim-to-sim tests over terrains, difficulty levels, and domain randomizations, enabling reliable MoE policy selection without extensive physical trials. Experiments on a Unitree Go2 demonstrate robust locomotion on unseen challenging terrains, including snow, sand, stairs, slopes, and 30 cm obstacles. In dedicated high-speed tests, the robot reaches 4 m/s and exhibits an emergent narrow-width gait associated with improved stability at high velocity.
Factored Reasoning with Inner Speech and Persistent Memory for Evidence-Grounded Human-Robot Interaction
Dialogue-based human-robot interaction requires robot cognitive assistants to maintain persistent user context, recover from underspecified requests, and ground responses in external evidence, while keeping intermediate decisions verifiable. In this paper we introduce JANUS, a cognitive architecture for assistive robots that models interaction as a partially observable Markov decision process and realizes control as a factored controller with typed interfaces. To this aim, Janus (i) decomposes the overall behavior into specialized modules, related to scope detection, intent recognition, memory, inner speech, query generation, and outer speech, and (ii) exposes explicit policies for information sufficiency, execution readiness, and tool grounding. A dedicated memory agent maintains a bounded recent-history buffer, a compact core memory, and an archival store with semantic retrieval, coupled through controlled consolidation and revision policies. Models inspired by the notion of inner speech in cognitive theories provide a control-oriented internal textual flow that validates parameter completeness and triggers clarification before grounding, while a faithfulness constraint ties robot-to-human claims to an evidence bundle combining working context and retrieved tool outputs. We evaluate JANUS through module-level unit tests in a dietary assistance domain grounded on a knowledge graph, reporting high agreement with curated references and practical latency profiles. These results support factored reasoning as a promising path to scalable, auditable, and evidence-grounded robot assistance over extended interaction horizons.
Agentic Reward Modeling: Verifying GUI Agent via Online Proactive Interaction
Reinforcement learning with verifiable rewards (RLVR) is pivotal for the continuous evolution of GUI agents, yet existing evaluation paradigms face significant limitations. Rule-based methods suffer from poor scalability and cannot handle open-ended tasks, while LLM-as-a-Judge approaches rely on passive visual observation, often failing to capture latent system states due to partial state observability. To address these challenges, we advocate for a paradigm shift from passive evaluation to Agentic Interactive Verification. We introduce VAGEN, a framework that employs a verifier agent equipped with interaction tools to autonomously plan verification strategies and proactively probe the environment for evidence of task completion. Leveraging the insight that GUI tasks are typically "easy to verify but hard to solve", VAGEN overcomes the bottlenecks of visual limitations. Experimental results on OSWorld-Verified and AndroidWorld benchmarks demonstrate that VAGEN significantly improves evaluation accuracy compared to LLM-as-a-Judge baselines and further enhances performance through test-time scaling strategies.
comment: 21 pages, 11 figures
UniMotion: A Unified Motion Framework for Simulation, Prediction and Planning NeurIPS 2025
Motion simulation, prediction and planning are foundational tasks in autonomous driving, each essential for modeling and reasoning about dynamic traffic scenarios. While often addressed in isolation due to their differing objectives, such as generating diverse motion states or estimating optimal trajectories, these tasks inherently depend on shared capabilities: understanding multi-agent interactions, modeling motion behaviors, and reasoning over temporal and spatial dynamics. Despite this underlying commonality, existing approaches typically adopt specialized model designs, which hinders cross-task generalization and system scalability. More critically, this separation overlooks the potential mutual benefits among tasks. Motivated by these observations, we propose UniMotion, a unified motion framework that captures shared structures across motion tasks while accommodating their individual requirements. Built on a decoder-only Transformer architecture, UniMotion employs dedicated interaction modes and tailored training strategies to simultaneously support these motion tasks. This unified design not only enables joint optimization and representation sharing but also allows for targeted fine-tuning to specialize in individual tasks when needed. Extensive experiments on the Waymo Open Motion Dataset demonstrate that joint training leads to robust generalization and effective task integration. With further fine-tuning, UniMotion achieves state-of-the-art performance across a range of motion tasks, establishing it as a versatile and scalable solution for autonomous driving.
comment: Accepted at NeurIPS 2025
ConLA: Contrastive Latent Action Learning from Human Videos for Robotic Manipulation
Vision-Language-Action (VLA) models achieve preliminary generalization through pretraining on large scale robot teleoperation datasets. However, acquiring datasets that comprehensively cover diverse tasks and environments is extremely costly and difficult to scale. In contrast, human demonstration videos offer a rich and scalable source of diverse scenes and manipulation behaviors, yet their lack of explicit action supervision hinders direct utilization. Prior work leverages VQ-VAE based frameworks to learn latent actions from human videos in an unsupervised manner. Nevertheless, since the training objective primarily focuses on reconstructing visual appearances rather than capturing inter-frame dynamics, the learned representations tend to rely on spurious visual cues, leading to shortcut learning and entangled latent representations that hinder transferability. To address this, we propose ConLA, an unsupervised pretraining framework for learning robotic policies from human videos. ConLA introduces a contrastive disentanglement mechanism that leverages action category priors and temporal cues to isolate motion dynamics from visual content, effectively mitigating shortcut learning. Extensive experiments show that ConLA achieves strong performance across diverse benchmarks. Notably, by pretraining solely on human videos, our method for the first time surpasses the performance obtained with real robot trajectory pretraining, highlighting its ability to extract pure and semantically consistent latent action representations for scalable robot learning.
APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation
Aerial Object Goal Navigation, a challenging frontier in Embodied AI, requires an Unmanned Aerial Vehicle (UAV) agent to autonomously explore, reason, and identify a specific target using only visual perception and language description. However, existing methods struggle with the memorization of complex spatial representations in aerial environments, reliable and interpretable action decision-making, and inefficient exploration and information gathering. To address these challenges, we introduce \textbf{APEX} (Aerial Parallel Explorer), a novel hierarchical agent designed for efficient exploration and target acquisition in complex aerial settings. APEX is built upon a modular, three-part architecture: 1) Dynamic Spatio-Semantic Mapping Memory, which leverages the zero-shot capability of a Vision-Language Model (VLM) to dynamically construct high-resolution 3D Attraction, Exploration, and Obstacle maps, serving as an interpretable memory mechanism. 2) Action Decision Module, trained with reinforcement learning, which translates this rich spatial understanding into a fine-grained and robust control policy. 3) Target Grounding Module, which employs an open-vocabulary detector to achieve definitive and generalizable target identification. All these components are integrated into a hierarchical, asynchronous, and parallel framework, effectively bypassing the VLM's inference latency and boosting the agent's proactivity in exploration. Extensive experiments show that APEX outperforms the previous state of the art by +4.2\% SR and +2.8\% SPL on challenging UAV-ON benchmarks, demonstrating its superior efficiency and the effectiveness of its hierarchical asynchronous design. Our source code is provided in \href{https://github.com/4amGodvzx/apex}{GitHub}
comment: 15 pages, 8 figures
A Low-Cost Vision-Based Tactile Gripper with Pretraining Learning for Contact-Rich Manipulation
Robotic manipulation in contact-rich environments remains challenging, particularly when relying on conventional tactile sensors that suffer from limited sensing range, reliability, and cost-effectiveness. In this work, we present LVTG, a low-cost visuo-tactile gripper designed for stable, robust, and efficient physical interaction. Unlike existing visuo-tactile sensors, LVTG enables more effective and stable grasping of larger and heavier everyday objects, thanks to its enhanced tactile sensing area and greater opening angle. Its surface skin is made of highly wear-resistant material, significantly improving durability and extending operational lifespan. The integration of vision and tactile feedback allows LVTG to provide rich, high-fidelity sensory data, facilitating reliable perception during complex manipulation tasks. Furthermore, LVTG features a modular design that supports rapid maintenance and replacement. To effectively fuse vision and touch, We adopt a CLIP-inspired contrastive learning objective to align tactile embeddings with their corresponding visual observations, enabling a shared cross-modal representation space for visuo-tactile perception. This alignment improves the performance of an Action Chunking Transformer (ACT) policy in contact-rich manipulation, leading to more efficient data collection and more effective policy learning. Compared to the original ACT method, the proposed LVTG with pretraining achieves significantly higher success rates in manipulation tasks.
Inject Once Survive Later: Backdooring Vision-Language-Action Models to Persist Through Downstream Fine-tuning
Vision-Language-Action (VLA) models have become foundational to modern embodied AI systems. By integrating visual perception, language understanding, and action planning, they enable general-purpose task execution across diverse environments. Despite their importance, the security of VLA models remains underexplored -- particularly in the context of backdoor attacks, which pose realistic threats in physical-world deployments. While recent methods attempt to inject backdoors into VLA models, these backdoors are easily erased during downstream adaptation, as user-side fine-tuning with clean data significantly alters model parameters, rendering them impractical for real-world applications. To address these challenges, we propose INFUSE (INjection into Fine-tUne-inSensitive modulEs), the first backdoor attack framework for VLA base models that remains effective even with arbitrary user fine-tuning. INFUSE begins by analyzing parameter sensitivity across diverse fine-tuning scenarios to identify modules that remain largely unchanged -- the fine-tune-insensitive modules. It then injects backdoors into these stable modules while freezing the rest, ensuring malicious behavior persists after extensive user fine-tuning. Comprehensive experiments across multiple VLA architectures demonstrate INFUSE's effectiveness. After user-side fine-tuning, INFUSE maintains mean attack success rates of 91.0% on simulation environments and 79.8% on real-world robot tasks, substantially surpassing BadVLA (38.8% and 36.6%, respectively), while preserving clean-task performance comparable to standard models. These results uncover a critical threat: backdoors implanted before distribution can persist through fine-tuning and remain effective at deployment.
FISC: A Fluid-Inspired Framework for Decentralized and Scalable Swarm Control
Achieving scalable coordination in large robotic swarms is often constrained by reliance on inter-agent communication, which introduces latency, bandwidth limitations, and vulnerability to failure. To address this gap, a decentralized approach for outer-loop control of large multi-agent systems based on the paradigm of how a fluid moves through a volume is proposed and evaluated. A relationship between fundamental fluidic element properties and individual robotic agent states is developed such that the corresponding swarm "flows" through a space, akin to a fluid when forced via a pressure boundary condition. By ascribing fluid-like properties to subsets of agents, the swarm evolves collectively while maintaining desirable structure and coherence without explicit communication of agent states within or outside of the swarm. The approach is evaluated using simulations involving $O(10^3)$ quadcopter agents and compared against Computational Fluid Dynamics (CFD) solutions for a converging-diverging domain. Quantitative agreement between swarm-derived and CFD fields is assessed using Root-Mean-Square Error (RMSE), yielding normalized errors of 0.15-0.9 for velocity, 0.61-0.98 for density, 0-0.937 for pressure. These results demonstrate the feasibility of treating large robotic swarms as continuum systems that retain the macroscopic structure derived from first principles, providing a basis for scalable and decentralized control.
Parallel Stochastic Gradient-Based Planning for World Models
World models simulate environment dynamics from raw sensory inputs like video. However, using them for planning can be challenging due to the vast and unstructured search space. We propose a robust and highly parallelizable planner that leverages the differentiability of the learned world model for efficient optimization, solving long-horizon control tasks from visual input. Our method treats states as optimization variables ("virtual states") with soft dynamics constraints, enabling parallel computation and easier optimization. To facilitate exploration and avoid local optima, we introduce stochasticity into the states. To mitigate sensitive gradients through high-dimensional vision-based world models, we modify the gradient structure to descend towards valid plans while only requiring action-input gradients. Our planner, which we call GRASP (Gradient RelAxed Stochastic Planner), can be viewed as a stochastic version of a non-condensed or collocation-based optimal controller. We provide theoretical justification and experiments on video-based world models, where our resulting planner outperforms existing planning algorithms like the cross-entropy method (CEM) and vanilla gradient-based optimization (GD) on long-horizon experiments, both in success rate and time to convergence.
comment: 23 pages, 7 figures
Stealthy Coverage Control for Human-enabled Real-Time 3D Reconstruction
In this paper, we propose a novel semi-autonomous image sampling strategy, called stealthy coverage control, for human-enabled 3D structure reconstruction. The present mission involves a fundamental problem: while the number of images required to accurately reconstruct a 3D model depends on the structural complexity of the target scene to be reconstructed, it is not realistic to assume prior knowledge of the spatially non-uniform structural complexity. We approach this issue by leveraging human flexible reasoning and situational recognition capabilities. Specifically, we design a semi-autonomous system that leaves identification of regions that need more images and navigation of the drones to such regions to a human operator. To this end, we first present a way to reflect the human intention in autonomous coverage control. Subsequently, in order to avoid operational conflicts between manual control and autonomous coverage control, we develop the stealthy coverage control that decouples the drone motion for efficient image sampling from navigation by the human. Simulation studies on a Unity/ROS2-based simulator demonstrate that the present semi-autonomous system outperforms the one without human interventions in the sense of the reconstructed model quality.
comment: This work has been submitted to the 23rd IFAC World Congress for possible publication
LatentTrack: Sequential Weight Generation via Latent Filtering
We introduce LatentTrack (LT), a sequential neural architecture for online probabilistic prediction under nonstationary dynamics. LT performs causal Bayesian filtering in a low-dimensional latent space and uses a lightweight hypernetwork to generate predictive model parameters at each time step, enabling constant-time online adaptation without per-step gradient updates. At each time step, a learned latent model predicts the next latent distribution, which is updated via amortized inference using new observations, yielding a predict--generate--update filtering framework in function space. The formulation supports both structured (Markovian) and unstructured latent dynamics within a unified objective, while Monte Carlo inference over latent trajectories produces calibrated predictive mixtures with fixed per-step cost. Evaluated on long-horizon online regression using the Jena Climate benchmark, LT consistently achieves lower negative log-likelihood and mean squared error than stateful sequential and static uncertainty-aware baselines, with competitive calibration, demonstrating that latent-conditioned function evolution is an effective alternative to traditional latent-state modeling under distribution shift.
MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll
This paper presents the MOBIUS platform, a bipedal robot capable of walking, crawling, climbing, and rolling. MOBIUS features four limbs, two 6-DoF arms with two-finger grippers for manipulation and climbing, and two 4-DoF legs for locomotion--enabling smooth transitions across diverse terrains without reconfiguration. A hybrid control architecture combines reinforcement learning for locomotion and force control for compliant contact interactions during manipulation. A high-level MIQCP planner autonomously selects locomotion modes to balance stability and energy efficiency. Hardware experiments demonstrate robust gait transitions, dynamic climbing, and full-body load support via pinch grasp. Overall, MOBIUS demonstrates the importance of tight integration between morphology, high-level planning, and control to enable mobile loco-manipulation and grasping, substantially expanding its interaction capabilities, workspace, and traversability.
comment: Collaborative work between the Robotics and Mechanisms Laboratory (RoMeLa) and Mitsubishi Electric Research Laboratories (MERL)
Transferring Kinesthetic Demonstrations across Diverse Objects for Manipulation Planning IROS
Given a demonstration of a complex manipulation task, such as pouring liquid from one container to another, we seek to generate a motion plan for a new task instance involving objects with different geometries. This is nontrivial since we need to simultaneously ensure that the implicit motion constraints are satisfied (glass held upright while moving), that the motion is collision-free, and that the task is successful (e.g., liquid is poured into the target container). We solve this problem by identifying the positions of critical locations and associating a reference frame (called "motion transfer frames") on the manipulated object and the target, selected based on their geometries and the task at hand. By tracking and transferring the path of the motion transfer frames, we generate motion plans for arbitrary task instances with objects of different geometries and poses. We show results from simulation as well as robot experiments on physical objects to evaluate the effectiveness of our solution. A video supplement is available on YouTube: https://youtu.be/RuG9zMXnfR8
comment: Published in: 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ---- External Link: https://doi.org/10.1109/IROS60139.2025.11246024
Fast Policy Learning for 6-DOF Position Control of Underwater Vehicles
Autonomous Underwater Vehicles (AUVs) require reliable six-degree-of-freedom (6-DOF) position control to operate effectively in complex and dynamic marine environments. Traditional controllers are effective under nominal conditions but exhibit degraded performance when faced with unmodeled dynamics or environmental disturbances. Reinforcement learning (RL) provides a powerful alternative but training is typically slow and sim-to-real transfer remains challenging. This work introduces a GPU accelerated RL training pipeline built in JAX and MuJoCo-XLA (MJX). By jointly JIT-compiling large-scale parallel physics simulation and learning updates, we achieve training times of under two minutes. Through systematic evaluation of multiple RL algorithms, we show robust 6-DOF trajectory tracking and effective disturbance rejection in real underwater experiments, with policies transferred zero-shot from simulation.
CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents
While current navigation benchmarks prioritize task success in simplified settings, they neglect the multidimensional economic constraints essential for the real-world commercialization of autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents through comprehensive economic cost-revenue analysis aligned with real-world business operations. By integrating industry-standard data - such as SEC filings and AIS injury reports - with Isaac Sim's detailed collision and cargo dynamics, CostNav transcends simple task completion to accurately evaluate business value in complex, real-world scenarios. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success on a simplified task fundamentally differs from optimizing for real-world economic deployment. Our evaluation of rule-based Nav2 navigation shows that current approaches are not economically viable: the contribution margin is -22.81/run (AMCL) and -12.87/run (GPS), resulting in no break-even point. We challenge the community to develop navigation policies that achieve economic viability on CostNav. We remain method-agnostic, evaluating success solely on the metric of cost rather than the underlying architecture. All resources are available at https://github.com/worv-ai/CostNav.
A Dataset and Benchmark for Robotic Cloth Unfolding Grasp Selection: The ICRA 2024 Cloth Competition
Robotic cloth manipulation suffers from a lack of standardized benchmarks and shared datasets for evaluating and comparing different approaches. To address this, we created a benchmark and organized the ICRA 2024 Cloth Competition, a unique head-to-head evaluation focused on grasp pose selection for in-air robotic cloth unfolding. Eleven diverse teams participated in the competition, utilizing our publicly released dataset of real-world robotic cloth unfolding attempts and a variety of methods to design their unfolding approaches. Afterwards, we also expanded our dataset with 176 competition evaluation trials, resulting in a dataset of 679 unfolding demonstrations across 34 garments. Analysis of the competition results revealed insights about the trade-off between grasp success and coverage, the surprisingly strong achievements of hand-engineered methods and a significant discrepancy between competition performance and prior work, underscoring the importance of independent, out-of-the-lab evaluation in robotic cloth manipulation. The associated dataset is a valuable resource for developing and evaluating grasp selection methods, particularly for learning-based approaches. We hope that our benchmark, dataset and competition results can serve as a foundation for future benchmarks and drive further progress in data-driven robotic cloth manipulation. The dataset and benchmarking code are available at https://airo.ugent.be/cloth_competition.
comment: The International Journal of Robotics Research. 2026;0(0). Published at IJRR - https://journals.sagepub.com/doi/10.1177/02783649251414885
Field evaluation and optimization of a lightweight autonomous lidar-based UAV system based on a rigorous experimental setup in boreal forest environments
Interest in utilizing autonomous uncrewed aerial vehicles (UAVs) for under-canopy forest remote sensing has increased in recent years, resulting in the publication of numerous autonomous flight algorithms in the scientific literature. To support the selection and development of such algorithms, a reliable comparison of existing approaches based on published studies is essential. However, reliable comparisons are currently challenging due to widely varying experimental setups and incomplete reporting practices. This study proposes a standardized experimental setup for evaluating autonomous under-canopy UAV systems to fill this gap. The proposed setup emphasizes quantitative reporting of forest complexity, visual representation of test environments, execution of multiple repeated flights, and reporting of flight success rates alongside qualitative flight results. In addition, flights at multiple target speeds are encouraged, with reporting of realized flight speed, mission completion time, and point-to-point flight distance. The proposed setup is demonstrated using a lightweight lidar-based quadrotor employing state-of-the-art open-source algorithms, evaluated through extensive experiments in two natural boreal forest environments. Based on a systematic evaluation of the original system, several improvements were introduced. The same experimental protocol was then repeated with the optimized system, resulting in a total of 93 real-world flights. The optimized system achieved success rates of 12/15 and 15/15 at target flight speeds of 1 m/s and 2 m/s, respectively, in a medium-difficulty forest, and 12/15 and 5/15 in a difficult forest. Adoption of the proposed experimental setup would facilitate the literature-based comparison of autonomous under-canopy flight systems and support systematic performance improvement of future UAV-based forest robotics solutions.
comment: This work has been submitted to the IEEE for possible publication
LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in current training paradigms where goal-driven data collection creates a dataset bias. In such datasets, language instructions are highly predictable from visual observations alone, causing the conditional mutual information between instructions and actions to vanish, a phenomenon we term Information Collapse. Consequently, models degenerate into vision-only policies that ignore language constraints and fail in out-of-distribution (OOD) settings. To address this, we propose LangForce, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we construct a dual-branch architecture to estimate both a vision-only prior $p(a \mid v)$ and a language-conditioned posterior $π(a \mid v, \ell)$. We then optimize the policy to maximize the conditional Pointwise Mutual Information (PMI) between actions and instructions. This objective effectively penalizes the vision shortcut and rewards actions that explicitly explain the language command. Without requiring new data, LangForce significantly improves generalization. Extensive experiments across on SimplerEnv and RoboCasa demonstrate substantial gains, including an 11.3% improvement on the challenging OOD SimplerEnv benchmark, validating the ability of our approach to robustly ground language in action.
DSCD-Nav: Dual-Stance Cooperative Debate for Object Navigation
Adaptive navigation in unfamiliar indoor environments is crucial for household service robots. Despite advances in zero-shot perception and reasoning from vision-language models, existing navigation systems still rely on single-pass scoring at the decision layer, leading to overconfident long-horizon errors and redundant exploration. To tackle these problems, we propose Dual-Stance Cooperative Debate Navigation (DSCD-Nav), a decision mechanism that replaces one-shot scoring with stance-based cross-checking and evidence-aware arbitration to improve action reliability under partial observability. Specifically, given the same observation and candidate action set, we explicitly construct two stances by conditioning the evaluation on diverse and complementary objectives: a Task-Scene Understanding (TSU) stance that prioritizes goal progress from scene-layout cues, and a Safety-Information Balancing (SIB) stance that emphasizes risk and information value. The stances conduct a cooperative debate and make policy by cross-checking their top candidates with cue-grounded arguments. Then, a Navigation Consensus Arbitration (NCA) agent is employed to consolidate both sides' reasons and evidence, optionally triggering lightweight micro-probing to verify uncertain choices, preserving NCA's primary intent while disambiguating. Experiments on HM3Dv1, HM3Dv2, and MP3D demonstrate consistent improvements in success and path efficiency while reducing exploration redundancy.
CADGrasp: Learning Contact and Collision Aware General Dexterous Grasping in Cluttered Scenes
Dexterous grasping in cluttered environments presents substantial challenges due to the high degrees of freedom of dexterous hands, occlusion, and potential collisions arising from diverse object geometries and complex layouts. To address these challenges, we propose CADGrasp, a two-stage algorithm for general dexterous grasping using single-view point cloud inputs. In the first stage, we predict sparse IBS, a scene-decoupled, contact- and collision-aware representation, as the optimization target. Sparse IBS compactly encodes the geometric and contact relationships between the dexterous hand and the scene, enabling stable and collision-free dexterous grasp pose optimization. To enhance the prediction of this high-dimensional representation, we introduce an occupancy-diffusion model with voxel-level conditional guidance and force closure score filtering. In the second stage, we develop several energy functions and ranking strategies for optimization based on sparse IBS to generate high-quality dexterous grasp poses. Extensive experiments in both simulated and real-world settings validate the effectiveness of our approach, demonstrating its capability to mitigate collisions while maintaining a high grasp success rate across diverse objects and complex scenes.
Multiagent Systems
Persuasion Propagation in LLM Agents
Modern AI agents increasingly combine conversational interaction with autonomous task execution, such as coding and web research, raising a natural question: what happens when an agent engaged in long-horizon tasks is subjected to user persuasion? We study how belief-level intervention can influence downstream task behavior, a phenomenon we name \emph{persuasion propagation}. We introduce a behavior-centered evaluation framework that distinguishes between persuasion applied during or prior to task execution. Across web research and coding tasks, we find that on-the-fly persuasion induces weak and inconsistent behavioral effects. In contrast, when the belief state is explicitly specified at task time, belief-prefilled agents conduct on average 26.9\% fewer searches and visit 16.9\% fewer unique sources than neutral-prefilled agents. These results suggest that persuasion, even in prior interaction, can affect the agent's behavior, motivating behavior-level evaluation in agentic systems.
comment: Code available at https://github.com/HyejunJeong/persuasion-propagation
Communications-Incentivized Collaborative Reasoning in NetGPT through Agentic Reinforcement Learning
The evolution of next-Generation (xG) wireless networks marks a paradigm shift from connectivity-centric architectures to Artificial Intelligence (AI)-native designs that tightly integrate data, computing, and communication. Yet existing AI deployments in communication systems remain largely siloed, offering isolated optimizations without intrinsic adaptability, dynamic task delegation, or multi-agent collaboration. In this work, we propose a unified agentic NetGPT framework for AI-native xG networks, wherein a NetGPT core can either perform autonomous reasoning or delegate sub-tasks to domain-specialized agents via agentic communication. The framework establishes clear modular responsibilities and interoperable workflows, enabling scalable, distributed intelligence across the network. To support continual refinement of collaborative reasoning strategies, the framework is further enhanced through Agentic reinforcement learning under partially observable conditions and stochastic external states. The training pipeline incorporates masked loss against external agent uncertainty, entropy-guided exploration, and multi-objective rewards that jointly capture task quality, coordination efficiency, and resource constraints. Through this process, NetGPT learns when and how to collaborate, effectively balancing internal reasoning with agent invocation. Overall, this work provides a foundational architecture and training methodology for self-evolving, AI-native xG networks capable of autonomous sensing, reasoning, and action in complex communication environments.
Evolving Interpretable Constitutions for Multi-Agent Simulation
Constitutional AI has focused on single-model alignment using fixed principles. However, multi-agent systems create novel alignment challenges through emergent social dynamics. We present Constitutional Evolution, a framework for automatically discovering behavioral norms in multi-agent LLM systems. Using a grid-world simulation with survival pressure, we study the tension between individual and collective welfare, quantified via a Societal Stability Score S in [0,1] that combines productivity, survival, and conflict metrics. Adversarial constitutions lead to societal collapse (S= 0), while vague prosocial principles ("be helpful, harmless, honest") produce inconsistent coordination (S = 0.249). Even constitutions designed by Claude 4.5 Opus with explicit knowledge of the objective achieve only moderate performance (S= 0.332). Using LLM-driven genetic programming with multi-island evolution, we evolve constitutions maximizing social welfare without explicit guidance toward cooperation. The evolved constitution C* achieves S = 0.556 +/- 0.008 (123% higher than human-designed baselines, N = 10), eliminates conflict, and discovers that minimizing communication (0.9% vs 62.2% social actions) outperforms verbose coordination. Our interpretable rules demonstrate that cooperative norms can be discovered rather than prescribed.
comment: 23 pages, 4 figures
OpenGuanDan: A Large-Scale Imperfect Information Game Benchmark
The advancement of data-driven artificial intelligence (AI), particularly machine learning, heavily depends on large-scale benchmarks. Despite remarkable progress across domains ranging from pattern recognition to intelligent decision-making in recent decades, exemplified by breakthroughs in board games, card games, and electronic sports games, there remains a pressing need for more challenging benchmarks to drive further research. To this end, this paper proposes OpenGuanDan, a novel benchmark that enables both efficient simulation of GuanDan (a popular four-player, multi-round Chinese card game) and comprehensive evaluation of both learning-based and rule-based GuanDan AI agents. OpenGuanDan poses a suite of nontrivial challenges, including imperfect information, large-scale information set and action spaces, a mixed learning objective involving cooperation and competition, long-horizon decision-making, variable action spaces, and dynamic team composition. These characteristics make it a demanding testbed for existing intelligent decision-making methods. Moreover, the independent API for each player allows human-AI interactions and supports integration with large language models. Empirically, we conduct two types of evaluations: (1) pairwise competitions among all GuanDan AI agents, and (2) human-AI matchups. Experimental results demonstrate that while current learning-based agents substantially outperform rule-based counterparts, they still fall short of achieving superhuman performance, underscoring the need for continued research in multi-agent intelligent decision-making domain. The project is publicly available at https://github.com/GameAI-NJUPT/OpenGuanDan.
Evolving Interpretable Constitutions for Multi-Agent Coordination
Constitutional AI has focused on single-model alignment using fixed principles. However, multi-agent systems create novel alignment challenges through emergent social dynamics. We present Constitutional Evolution, a framework for automatically discovering behavioral norms in multi-agent LLM systems. Using a grid-world simulation with survival pressure, we study the tension between individual and collective welfare, quantified via a Societal Stability Score S in [0,1] that combines productivity, survival, and conflict metrics. Adversarial constitutions lead to societal collapse (S= 0), while vague prosocial principles ("be helpful, harmless, honest") produce inconsistent coordination (S = 0.249). Even constitutions designed by Claude 4.5 Opus with explicit knowledge of the objective achieve only moderate performance (S= 0.332). Using LLM-driven genetic programming with multi-island evolution, we evolve constitutions maximizing social welfare without explicit guidance toward cooperation. The evolved constitution C* achieves S = 0.556 +/- 0.008 (123% higher than human-designed baselines, N = 10), eliminates conflict, and discovers that minimizing communication (0.9% vs 62.2% social actions) outperforms verbose coordination. Our interpretable rules demonstrate that cooperative norms can be discovered rather than prescribed.
comment: 23 pages, 4 figures
LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science
Advances in large language models (LLMs) have created new opportunities in data science, but their deployment is often limited by the challenge of finding relevant data in large data lakes. Existing methods struggle with this: both single- and multi-agent systems are quickly overwhelmed by large, heterogeneous files, and master-slave multi-agent systems rely on a rigid central controller that requires precise knowledge of each sub-agent's capabilities, which is not possible in large-scale settings where the main agent lacks full observability over sub-agents' knowledge and competencies. We propose a novel multi-agent paradigm inspired by the blackboard architecture for traditional AI models. In our framework, a central agent posts requests to a shared blackboard, and autonomous subordinate agents - either responsible for a partition of the data lake or retrieval from the web - volunteer to respond based on their capabilities. This design improves scalability and flexibility by removing the need for a central coordinator to know each agent's expertise or internal knowledge. We evaluate the approach on three benchmarks that require data discovery: KramaBench and modified versions of DSBench and DA-Code. Results show that the blackboard architecture substantially outperforms strong baselines, achieving 13%-57% relative improvements in end-to-end success and up to a 9% relative gain in data discovery F1 over the best baseline.
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies ICLR 2026
Large language models, employed as multiple agents that interact and collaborate with each other, have excelled at solving complex tasks. The agents are programmed with prompts that declare their functionality, along with the topologies that orchestrate interactions across agents. Designing prompts and topologies for multi-agent systems (MAS) is inherently complex. To automate the entire design process, we first conduct an in-depth analysis of the design space aiming to understand the factors behind building effective MAS. We reveal that prompts together with topologies play critical roles in enabling more effective MAS design. Based on the insights, we propose Multi-Agent System Search (MASS), a MAS optimization framework that efficiently exploits the complex MAS design space by interleaving its optimization stages, from local to global, from prompts to topologies, over three stages: 1) block-level (local) prompt optimization; 2) workflow topology optimization; 3) workflow-level (global) prompt optimization, where each stage is conditioned on the iteratively optimized prompts/topologies from former stages. We show that MASS-optimized multi-agent systems outperform a spectrum of existing alternatives by a substantial margin. Based on the MASS-found systems, we finally propose design principles behind building effective multi-agent systems.
comment: ICLR 2026
When Personas Override Payoffs: Role Identity Bias in Multi-Agent LLM Decision-Making
Large language models are increasingly deployed in multi-agent systems for strategic tasks, yet how design choices such as role-based personas and payoff visibility affect reasoning remains poorly understood. We investigate whether multi-agent systems function as strategic reasoners capable of payoff optimization or as identity-driven actors that prioritize role alignment over explicit incentives. Using Nash equilibrium achievement as a diagnostic for strategic reasoning, we conduct systematic experiments across four LLM architectures (Qwen-7B, Qwen-32B, Llama-8B, Mistral-7B) in complex environmental decision-making games involving four agents. We show that role identity bias fundamentally alters strategic reasoning even when payoff-optimal equilibria exist and complete payoff information is available. Removing personas and providing explicit payoffs enables Qwen models to achieve high Nash equilibrium rates, indicating that both conditions are necessary for strategic reasoning. In contrast, personas systematically bias equilibrium selection toward socially preferred outcomes: with personas present, all of the achieved equilibria correspond to Green Transition, while models entirely fail to reach equilibrium when Tragedy of the Commons is payoff-optimal. The effect of explicit payoffs depends entirely on persona presence, revealing strong interactions between representational design choices. We also observe clear model-dependent patterns. Qwen architectures are highly sensitive to both personas and payoff visibility, whereas Llama and Mistral exhibit rigid reasoning behavior across conditions. These findings demonstrate that representational choices are substantive governance decisions that determine whether multi-agent systems act as strategic reasoners or identity-driven actors, with important implications for real-world deployment.
Normative Feeling: Socially Patterned Affective Mechanisms
Breaking a norm elicits both material and emotional consequences, yet how this coupling arose evolutionarily remains unclear. We investigate this question in light of emerging work suggesting that normativity's building blocks emerged earlier in evolution than previously considered, arguing that normative processes should inform accounts of how even ancient capacities such as mood evolved. Using a definition of normative processes we developed, we created an agent-based model with evolvable affect in a shared resource dilemma, comparing competition (non-normative) versus punishment (normative) conditions. Critically, different mood mechanisms emerge under each condition. Under competition, agents evolve a "bad mood -> consume more" response, creating a tragedy of the commons leading to resource depletion and population collapse. Under punishment, agents evolve a "bad mood -> consume less" mechanism, where negative affect functions as an implicit signal of social sanction, promoting resource conservation. Importantly, once normative logic is imprinted through punishment, it creates an evolutionary pathway for mood-based signalling that operates without costly physical enforcement. Our findings demonstrate how normative processes enable social preferences to emerge in a distributed manner within psychological mechanisms, showing how normative processes reprogram cognitive and physiological systems by embedding cultural patterns into psychological dispositions.
Replicating the behaviour of electric vehicle drivers using an agent-based reinforcement learning model
Despite the rapid expansion of electric vehicle (EV) charging networks, questions remain about their efficiency in meeting the growing needs of EV drivers. Previous simulation-based approaches, which rely on static behavioural rules, have struggled to capture the adaptive behaviours of human drivers. Although reinforcement learning has been introduced in EV simulation studies, its application has primarily focused on optimising fleet operations rather than modelling private drivers who make independent charging decisions. To address the gap, we propose a multi-stage reinforcement learning framework that simulates charging demand of private EV drivers across a national-scale road network. We validate the model against real-world data and identify the training stage that most closely reflects actual driver behaviour, which captures both the adaptive behaviours and bounded rationality of private drivers. Based on the simulation results, we also identify critical 'charging deserts' where EV drivers consistently have low state of charge. Our findings also highlight recent policy shifts toward expanding rapid charging hubs along motorway corridors and city boundaries to meet the demand from long-distance trips.
Systems and Control (EESS)
Reduction of Velocity-Dependent Terms in Total Energy Shaping Approach
Total energy shaping through interconnection and damping assignment passivity-based control (IDA-PBC) provides a powerful and systematic framework for stabilizing underactuated mechanical systems. Despite its theoretical appeal, incorporating actuator limitations into total energy shaping remains a largely open problem, with only limited results reported in the existing literature. In practice, the closed-loop behavior of energy-shaping controllers is strongly affected by the kinetic energy shaping terms. In this paper, a simultaneous IDA-PBC (SIDA-PBC) framework is employed to systematically attenuate the kinetic energy shaping terms by exploiting generalized forces, without altering the matching partial differential equations (PDEs). The free component of the generalized forces is derived analytically via an $\ell_\infty$-norm optimization formulation. Although a reduction in kinetic energy shaping terms does not necessarily guarantee a decrease in the overall control effort, the proposed approach effectively suppresses kinetic energy shaping components and achieves a reduced control magnitude whenever such a reduction is structurally feasible. Unlike existing approaches based on gyroscopic terms, which require multiple actuators, the proposed method is applicable to mechanical systems with a single actuator. Simulation and experimental results are provided to validate the effectiveness of the proposed approach.
Robust Energy Shaping Control of an Underactuated Inverted Pendulum
Although the stabilization of underactuated systems remains a challenging problem, the total energy shaping approach provides a general framework for addressing this objective. However, the practical implementation of this method is hindered by the need to analytically solve a set of partial differential equations (PDEs), which constitutes a major obstacle. In this paper, a rotary inverted pendulum system is considered, and an interconnection and damping assignment passivity-based control (IDA-PBC) scheme is developed by deriving concise analytical solutions to the kinetic and potential energy PDEs. Furthermore, a novel robust term is incorporated into the control law to compensate for a specific class of disturbances that has not been addressed within the existing IDA-PBC literature. The effectiveness of the proposed method is validated through numerical simulations, demonstrating satisfactory control performance.
Ocean Current-Harnessing Stage-Gated MPC: Monotone Cost Shaping and Speed-to-Fly for Energy-Efficient AUV Navigation
Autonomous Underwater Vehicles (AUVs) are a highly promising technology for ocean exploration and diverse offshore operations, yet their practical deployment is constrained by energy efficiency and endurance. To address this, we propose Current-Harnessing Stage-Gated MPC, which exploits ocean currents via a per-stage scalar which indicates the "helpfulness" of ocean currents. This scalar is computed along the prediction horizon to gate lightweight cost terms only where the ocean currents truly aids the control goal. The proposed cost terms, that are merged in the objective function, are (i) a Monotone Cost Shaping (MCS) term, a help-gated, non-worsening modification that relaxes along-track position error and provides a bounded translational energy rebate, guaranteeing the shaped objective is never larger than a set baseline, and (ii) a speed-to-fly (STF) cost component that increases the price of thrust and softly matches ground velocity to the ocean current, enabling near zero water-relative "gliding". All terms are C1 and integrate as a plug-and-play in MPC designs. Extensive simulations with the BlueROV2 model under realistic ocean current fields show that the proposed approach achieves substantially lower energy consumption than conventional predictive control while maintaining comparable arrival times and constraint satisfaction.
Cognitive-Flexible Control via Latent Model Reorganization with Predictive Safety Guarantees
Learning-enabled control systems must maintain safety when system dynamics and sensing conditions change abruptly. Although stochastic latent-state models enable uncertainty-aware control, most existing approaches rely on fixed internal representations and can degrade significantly under distributional shift. This letter proposes a \emph{cognitive-flexible control} framework in which latent belief representations adapt online, while the control law remains explicit and safety-certified. We introduce a Cognitive-Flexible Deep Stochastic State-Space Model (CF--DeepSSSM) that reorganizes latent representations subject to a bounded \emph{Cognitive Flexibility Index} (CFI), and embeds the adapted model within a Bayesian model predictive control (MPC) scheme. We establish guarantees on bounded posterior drift, recursive feasibility, and closed-loop stability. Simulation results under abrupt changes in system dynamics and observations demonstrate safe representation adaptation with rapid performance recovery, highlighting the benefits of learning-enabled, rather than learning-based, control for nonstationary cyber-physical systems.
Modeling and Control of Hybrid Distribution Transformers for Simultaneous Grid Services
Hybrid distribution transformers (HDTs) integrate conventional transformers with partially rated power electronic converters to improve power quality, enable advanced ancillary services and increase penetration of renewable energy sources in the national power grid. In this paper, we present an averaged mathematical model of a three-phase HDT equipped with two back-to-back voltage source converters connected in a series-shunt configuration. Cascaded PI controllers are designed in the synchronously rotating dq0 reference frame to regulate load voltage, compensate reactive power, achieve grid frequency regulation, and perform load phase balancing. Simulation results implemented in Python confirm that these simple yet effective control mechanisms allow HDTs to offer simultaneous grid services without introducing complexity. The complete model, control architecture, and implementation steps are detailed, enabling further validation and adoption.
Equilibrium of Feasible Zone and Uncertain Model in Safe Exploration
Ensuring the safety of environmental exploration is a critical problem in reinforcement learning (RL). While limiting exploration to a feasible zone has become widely accepted as a way to ensure safety, key questions remain unresolved: what is the maximum feasible zone achievable through exploration, and how can it be identified? This paper, for the first time, answers these questions by revealing that the goal of safe exploration is to find the equilibrium between the feasible zone and the environment model. This conclusion is based on the understanding that these two components are interdependent: a larger feasible zone leads to a more accurate environment model, and a more accurate model, in turn, enables exploring a larger zone. We propose the first equilibrium-oriented safe exploration framework called safe equilibrium exploration (SEE), which alternates between finding the maximum feasible zone and the least uncertain model. Using a graph formulation of the uncertain model, we prove that the uncertain model obtained by SEE is monotonically refined, the feasible zones monotonically expand, and both converge to the equilibrium of safe exploration. Experiments on classic control tasks show that our algorithm successfully expands the feasible zones with zero constraint violation, and achieves the equilibrium of safe exploration within a few iterations.
Model-Based Data-Efficient and Robust Reinforcement Learning
A data-efficient learning-based control design method is proposed in this paper. It is based on learning a system dynamics model that is then leveraged in a two-level procedure. On the higher level, a simple but powerful optimization procedure is performed such that, for example, energy consumption in a vehicle can be reduced when hard state and action constraints are also introduced. Load disturbances and model errors are compensated for by a feedback controller on the lower level. In that regard, we briefly examine the robustness of both model-free and model-based learning approaches, and it is shown that the model-free approach greatly suffers from the inclusion of unmodeled dynamics. In evaluating the proposed method, it is assumed that a path is given, while the velocity and acceleration can be modified such that energy is saved, while still keeping speed limits and completion time. Compared with two well-known actor-critic reinforcement learning strategies, the suggested learning-based approach saves more energy and reduces the number of evaluated time steps by a factor of 100 or more.
AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models
State space models (SSMs) often sacrifice capacity, search space, or stability to offset the memory and compute costs of large state dimensions. We introduce a structured post-training pruning method for SSMs -- AIRE-Prune (Asymptotic Impulse-Response Energy for State PRUN(E)) -- that reduces each layer's state dimension by directly minimizing long-run output-energy distortion. AIRE-Prune assigns every state a closed-form asymptotic impulse-response energy-based score, i.e., the total impulse-response energy it contributes over an infinite horizon (time), and normalizes these scores layer-wise to enable global cross-layer comparison and selection. This extends modal truncation from single systems to deep stacks and aligns pruning with asymptotic response energy rather than worst-case gain. Across diverse sequence benchmarks, AIRE-Prune reveals substantial redundancy in SISO and MIMO SSMs with average pruning of 60.8%, with average accuracy drop of 0.29% without retraining, while significantly lowering compute. Code: https://github.com/falcon-arrow/AIRE-Prune.
Stealthy Coverage Control for Human-enabled Real-Time 3D Reconstruction
In this paper, we propose a novel semi-autonomous image sampling strategy, called stealthy coverage control, for human-enabled 3D structure reconstruction. The present mission involves a fundamental problem: while the number of images required to accurately reconstruct a 3D model depends on the structural complexity of the target scene to be reconstructed, it is not realistic to assume prior knowledge of the spatially non-uniform structural complexity. We approach this issue by leveraging human flexible reasoning and situational recognition capabilities. Specifically, we design a semi-autonomous system that leaves identification of regions that need more images and navigation of the drones to such regions to a human operator. To this end, we first present a way to reflect the human intention in autonomous coverage control. Subsequently, in order to avoid operational conflicts between manual control and autonomous coverage control, we develop the stealthy coverage control that decouples the drone motion for efficient image sampling from navigation by the human. Simulation studies on a Unity/ROS2-based simulator demonstrate that the present semi-autonomous system outperforms the one without human interventions in the sense of the reconstructed model quality.
comment: This work has been submitted to the 23rd IFAC World Congress for possible publication
MOBIUS: A Multi-Modal Bipedal Robot that can Walk, Crawl, Climb, and Roll
This paper presents the MOBIUS platform, a bipedal robot capable of walking, crawling, climbing, and rolling. MOBIUS features four limbs, two 6-DoF arms with two-finger grippers for manipulation and climbing, and two 4-DoF legs for locomotion--enabling smooth transitions across diverse terrains without reconfiguration. A hybrid control architecture combines reinforcement learning for locomotion and force control for compliant contact interactions during manipulation. A high-level MIQCP planner autonomously selects locomotion modes to balance stability and energy efficiency. Hardware experiments demonstrate robust gait transitions, dynamic climbing, and full-body load support via pinch grasp. Overall, MOBIUS demonstrates the importance of tight integration between morphology, high-level planning, and control to enable mobile loco-manipulation and grasping, substantially expanding its interaction capabilities, workspace, and traversability.
comment: Collaborative work between the Robotics and Mechanisms Laboratory (RoMeLa) and Mitsubishi Electric Research Laboratories (MERL)
Stochastic MPC with Online-optimized Policies and Closed-loop Guarantees
This paper proposes a stochastic model predictive control method for linear systems affected by additive Gaussian disturbances that optimizes over disturbance feedback matrices online. Closed-loop satisfaction of probabilistic constraints and recursive feasibility of the underlying convex optimization problem is guaranteed. Optimization over feedback policies online increases performance and reduces conservatism compared to fixed-feedback approaches. The central mechanism is a finitely determined maximal admissible set for probabilistic constraints, together with the reconditioning of the predicted probabilistic constraints on the current knowledge at every time step. The proposed method's applicability is demonstrated on a building temperature control example.
A novel switched systems approach to nonconvex optimisation
We develop a novel switching dynamics that converges to the Karush-Kuhn-Tucker (KKT) point of a nonlinear optimisation problem. This new approach is particularly notable for its lower dimensionality compared to conventional primal-dual dynamics, as it focuses exclusively on estimating the primal variable. Our method is successfully illustrated on general quadratic optimisation problems, the minimisation of the classical Rosenbrock function, and a nonconvex optimisation problem stemming from the control of energy-efficient buildings.
SCAR: State-Space Compression for Scalable AI-Based Network Management of Vehicular Services
The increasing demand for connected vehicular services poses significant challenges for AI-based network and service management due to the high volume and rapid variability of network state information. Traditional management and control mechanisms struggle to scale when processing fine-grained metrics such as Channel Quality Indicators (CQIs) in dynamic vehicular environments. To address this challenge, we propose SCAR (State-Space Compression for AI-Based Network Management), an edge-assisted framework that improves scalability and fairness in vehicular services through network state abstraction. SCAR employs machine-learning (ML)-based compression techniques, including clustering and radial basis function (RBF) networks, to reduce the dimensionality of CQI-derived state information while preserving essential features relevant to management decisions. The resulting compressed states are used to train reinforcement learning (RL)-based management policies that aim to maximize network efficiency while satisfying service-level fairness objectives defined by the NGMN. Simulation results show that SCAR increases the time spent in feasible management regions by 14% and reduces unfair service allocation time by 15% compared to reinforcement learning baselines operating on uncompressed state information. Furthermore, simulated annealing with stochastic tunneling (SAST)-based clustering reduces state compression distortion by 10%, confirming the effectiveness of the proposed approach. These results demonstrate that SCAR enables scalable and fair AI-assisted network and service management in dynamic vehicular systems.
Exploiting Data Significance in Remote Estimation of Discrete-State Markov Sources
We consider semantics-aware remote estimation of a discrete-state Markov source with both normal (low-priority) and alarm (high-priority) states. Erroneously announcing a normal state at the destination when the source is actually in an alarm state (i.e., missed alarm) incurs a significantly higher cost than falsely announcing an alarm state when the source is in a normal state (i.e., false alarm). Moreover, consecutive estimation errors may cause significant lasting impacts, such as maintenance costs and misoperations. Motivated by this, we introduce two new metrics, the Age of Missed Alarm (AoMA) and the Age of False Alarm (AoFA), to capture the lasting impacts incurred by different estimation errors. Notably, these two age processes evolve interdependently and distinguish between different error types. Our goal is to design a transmission policy that achieves an optimized trade-off between lasting impact and communication cost. The problem is formulated as a countably infinite-state Markov decision process (MDP) with an unbounded cost function. We show the existence of a simple switching policy with distinct thresholds for each age process and derive closed-form expressions for its performance. For symmetric and non-prioritized sources, we show that the optimal policy reduces to a threshold policy with identical thresholds. For numerical tractability, we propose a finite-state approximate MDP and prove that it converges exponentially fast to the original MDP in the truncation size. Finally, we develop an efficient search algorithm to compute the optimal switching policy and validate our theoretical findings with numerical results.
comment: This paper has been accepted for publication in the IEEE Transactions on Communications
Robotics
End-to-end Optimization of Belief and Policy Learning in Shared Autonomy Paradigms
Shared autonomy systems require principled methods for inferring user intent and determining appropriate assistance levels. This is a central challenge in human-robot interaction, where systems must be successful while being mindful of user agency. Previous approaches relied on static blending ratios or separated goal inference from assistance arbitration, leading to suboptimal performance in unstructured environments. We introduce BRACE (Bayesian Reinforcement Assistance with Context Encoding), a novel framework that fine-tunes Bayesian intent inference and context-adaptive assistance through an architecture enabling end-to-end gradient flow between intent inference and assistance arbitration. Our pipeline conditions collaborative control policies on environmental context and complete goal probability distributions. We provide analysis showing (1) optimal assistance levels should decrease with goal uncertainty and increase with environmental constraint severity, and (2) integrating belief information into policy learning yields a quadratic expected regret advantage over sequential approaches. We validated our algorithm against SOTA methods (IDA, DQN) using a three-part evaluation progressively isolating distinct challenges of end-effector control: (1) core human-interaction dynamics in a 2D human-in-the-loop cursor task, (2) non-linear dynamics of a robotic arm, and (3) integrated manipulation under goal ambiguity and environmental constraints. We demonstrate improvements over SOTA, achieving 6.3% higher success rates and 41% increased path efficiency, and 36.3% success rate and 87% path efficiency improvement over unassisted control. Our results confirmed that integrated optimization is most beneficial in complex, goal-ambiguous scenarios, and is generalizable across robotic domains requiring goal-directed assistance, advancing the SOTA for adaptive shared autonomy.
IRL-DAL: Safe and Adaptive Trajectory Planning for Autonomous Driving via Energy-Guided Diffusion Models
This paper proposes a novel inverse reinforcement learning framework using a diffusion-based adaptive lookahead planner (IRL-DAL) for autonomous vehicles. Training begins with imitation from an expert finite state machine (FSM) controller to provide a stable initialization. Environment terms are combined with an IRL discriminator signal to align with expert goals. Reinforcement learning (RL) is then performed with a hybrid reward that combines diffuse environmental feedback and targeted IRL rewards. A conditional diffusion model, which acts as a safety supervisor, plans safe paths. It stays in its lane, avoids obstacles, and moves smoothly. Then, a learnable adaptive mask (LAM) improves perception. It shifts visual attention based on vehicle speed and nearby hazards. After FSM-based imitation, the policy is fine-tuned with Proximal Policy Optimization (PPO). Training is run in the Webots simulator with a two-stage curriculum. A 96\% success rate is reached, and collisions are reduced to 0.05 per 1k steps, marking a new benchmark for safe navigation. By applying the proposed approach, the agent not only drives in lane but also handles unsafe conditions at an expert level, increasing robustness.We make our code publicly available.
FlowCalib: LiDAR-to-Vehicle Miscalibration Detection using Scene Flows
Accurate sensor-to-vehicle calibration is essential for safe autonomous driving. Angular misalignments of LiDAR sensors can lead to safety-critical issues during autonomous operation. However, current methods primarily focus on correcting sensor-to-sensor errors without considering the miscalibration of individual sensors that cause these errors in the first place. We introduce FlowCalib, the first framework that detects LiDAR-to-vehicle miscalibration using motion cues from the scene flow of static objects. Our approach leverages the systematic bias induced by rotational misalignment in the flow field generated from sequential 3D point clouds, eliminating the need for additional sensors. The architecture integrates a neural scene flow prior for flow estimation and incorporates a dual-branch detection network that fuses learned global flow features with handcrafted geometric descriptors. These combined representations allow the system to perform two complementary binary classification tasks: a global binary decision indicating whether misalignment is present and separate, axis-specific binary decisions indicating whether each rotational axis is misaligned. Experiments on the nuScenes dataset demonstrate FlowCalib's ability to robustly detect miscalibration, establishing a benchmark for sensor-to-vehicle miscalibration detection.
Temporally Coherent Imitation Learning via Latent Action Flow Matching for Robotic Manipulation
Learning long-horizon robotic manipulation requires jointly achieving expressive behavior modeling, real-time inference, and stable execution, which remains challenging for existing generative policies. Diffusion-based approaches provide strong modeling capacity but typically incur high inference latency, while flow matching enables fast one-step generation yet often leads to unstable execution when applied directly in the raw action space. We propose LG-Flow Policy, a trajectory-level imitation learning framework that performs flow matching in a continuous latent action space. By encoding action sequences into temporally regularized latent trajectories and learning an explicit latent-space flow, the proposed approach decouples global motion structure from low-level control noise, resulting in smooth and reliable long-horizon execution. LG-Flow Policy further incorporates geometry-aware point cloud conditioning and execution-time multimodal modulation, with visual cues evaluated as a representative modality in real-world settings. Experimental results in simulation and on physical robot platforms demonstrate that LG-Flow Policy achieves near single-step inference, substantially improves trajectory smoothness and task success over flow-based baselines operating in the raw action space, and remains significantly more efficient than diffusion-based policies.
comment: 8 pages, 8 figures
Robust and Generalized Humanoid Motion Tracking
Learning a general humanoid whole-body controller is challenging because practical reference motions can exhibit noise and inconsistencies after being transferred to the robot domain, and local defects may be amplified by closed-loop execution, causing drift or failure in highly dynamic and contact-rich behaviors. We propose a dynamics-conditioned command aggregation framework that uses a causal temporal encoder to summarize recent proprioception and a multi-head cross-attention command encoder to selectively aggregate a context window based on the current dynamics. We further integrate a fall recovery curriculum with random unstable initialization and an annealed upward assistance force to improve robustness and disturbance rejection. The resulting policy requires only about 3.5 hours of motion data and supports single-stage end-to-end training without distillation. The proposed method is evaluated under diverse reference inputs and challenging motion regimes, demonstrating zero-shot transfer to unseen motions as well as robust sim-to-real transfer on a physical humanoid robot.
RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning
On-policy deep reinforcement learning remains a dominant paradigm for continuous control, yet standard implementations rely on Gaussian actors and relatively shallow MLP policies, often leading to brittle optimization when gradients are noisy and policy updates must be conservative. In this paper, we revisit policy representation as a first-class design choice for on-policy optimization. We study discretized categorical actors that represent each action dimension with a distribution over bins, yielding a policy objective that resembles a cross-entropy loss. Building on architectural advances from supervised learning, we further propose regularized actor networks, while keeping critic design fixed. Our results show that simply replacing the standard actor network with our discretized regularized actor yields consistent gains and achieve the state-of-the-art performance across diverse continuous-control benchmarks.
MOSAIC: Modular Scalable Autonomy for Intelligent Coordination of Heterogeneous Robotic Teams
Mobile robots have become indispensable for exploring hostile environments, such as in space or disaster relief scenarios, but often remain limited to teleoperation by a human operator. This restricts the deployment scale and requires near-continuous low-latency communication between the operator and the robot. We present MOSAIC: a scalable autonomy framework for multi-robot scientific exploration using a unified mission abstraction based on Points of Interest (POIs) and multiple layers of autonomy, enabling supervision by a single operator. The framework dynamically allocates exploration and measurement tasks based on each robot's capabilities, leveraging team-level redundancy and specialization to enable continuous operation. We validated the framework in a space-analog field experiment emulating a lunar prospecting scenario, involving a heterogeneous team of five robots and a single operator. Despite the complete failure of one robot during the mission, the team completed 82.3% of assigned tasks at an Autonomy Ratio of 86%, while the operator workload remained at only 78.2%. These results demonstrate that the proposed framework enables robust, scalable multi-robot scientific exploration with limited operator intervention. We further derive practical lessons learned in robot interoperability, networking architecture, team composition, and operator workload management to inform future multi-robot exploration missions.
comment: This work has been submitted to the IEEE for possible publication
Learning Geometrically-Grounded 3D Visual Representations for View-Generalizable Robotic Manipulation
Real-world robotic manipulation demands visuomotor policies capable of robust spatial scene understanding and strong generalization across diverse camera viewpoints. While recent advances in 3D-aware visual representations have shown promise, they still suffer from several key limitations, including reliance on multi-view observations during inference which is impractical in single-view restricted scenarios, incomplete scene modeling that fails to capture holistic and fine-grained geometric structures essential for precise manipulation, and lack of effective policy training strategies to retain and exploit the acquired 3D knowledge. To address these challenges, we present MethodName, a unified representation-policy learning framework for view-generalizable robotic manipulation. MethodName introduces a single-view 3D pretraining paradigm that leverages point cloud reconstruction and feed-forward gaussian splatting under multi-view supervision to learn holistic geometric representations. During policy learning, MethodName performs multi-step distillation to preserve the pretrained geometric understanding and effectively transfer it to manipulation skills. We conduct experiments on 12 RLBench tasks, where our approach outperforms the previous state-of-the-art method by 12.7% in average success rate. Further evaluation on six representative tasks demonstrates strong zero-shot view generalization, with success rate drops of only 22.0% and 29.7% under moderate and large viewpoint shifts respectively, whereas the state-of-the-art method suffers larger decreases of 41.6% and 51.5%.
About an Automating Annotation Method for Robot Markers
Factory automation has become increasingly important due to labor shortages, leading to the introduction of autonomous mobile robots for tasks such as material transportation. Markers are commonly used for robot self-localization and object identification. In the RoboCup Logistics League (RCLL), ArUco markers are employed both for robot localization and for identifying processing modules. Conventional recognition relies on OpenCV-based image processing, which detects black-and-white marker patterns. However, these methods often fail under noise, motion blur, defocus, or varying illumination conditions. Deep-learning-based recognition offers improved robustness under such conditions, but requires large amounts of annotated data. Annotation must typically be done manually, as the type and position of objects cannot be detected automatically, making dataset preparation a major bottleneck. In contrast, ArUco markers include built-in recognition modules that provide both ID and positional information, enabling automatic annotation. This paper proposes an automated annotation method for training deep-learning models on ArUco marker images. By leveraging marker detection results obtained from the ArUco module, the proposed approach eliminates the need for manual labeling. A YOLO-based model is trained using the automatically annotated dataset, and its performance is evaluated under various conditions. Experimental results demonstrate that the proposed method improves recognition performance compared with conventional image-processing techniques, particularly for images affected by blur or defocus. Automatic annotation also reduces human effort and ensures consistent labeling quality. Future work will investigate the relationship between confidence thresholds and recognition performance.
Self-Imitated Diffusion Policy for Efficient and Robust Visual Navigation
Diffusion policies (DP) have demonstrated significant potential in visual navigation by capturing diverse multi-modal trajectory distributions. However, standard imitation learning (IL), which most DP methods rely on for training, often inherits sub-optimality and redundancy from expert demonstrations, thereby necessitating a computationally intensive "generate-then-filter" pipeline that relies on auxiliary selectors during inference. To address these challenges, we propose Self-Imitated Diffusion Policy (SIDP), a novel framework that learns improved planning by selectively imitating a set of trajectories sampled from itself. Specifically, SIDP introduces a reward-guided self-imitation mechanism that encourages the policy to consistently produce high-quality trajectories efficiently, rather than outputs of inconsistent quality, thereby reducing reliance on extensive sampling and post-filtering. During training, we employ a reward-driven curriculum learning paradigm to mitigate inefficient data utility, and goal-agnostic exploration for trajectory augmentation to improve planning robustness. Extensive evaluations on a comprehensive simulation benchmark show that SIDP significantly outperforms previous methods, with real-world experiments confirming its effectiveness across multiple robotic platforms. On Jetson Orin Nano, SIDP delivers a 2.5$\times$ faster inference than the baseline NavDP, i.e., 110ms VS 273ms, enabling efficient real-time deployment.
comment: Preprint
MTDrive: Multi-turn Interactive Reinforcement Learning for Autonomous Driving
Trajectory planning is a core task in autonomous driving, requiring the prediction of safe and comfortable paths across diverse scenarios. Integrating Multi-modal Large Language Models (MLLMs) with Reinforcement Learning (RL) has shown promise in addressing "long-tail" scenarios. However, existing methods are constrained to single-turn reasoning, limiting their ability to handle complex tasks requiring iterative refinement. To overcome this limitation, we present MTDrive, a multi-turn framework that enables MLLMs to iteratively refine trajectories based on environmental feedback. MTDrive introduces Multi-Turn Group Relative Policy Optimization (mtGRPO), which mitigates reward sparsity by computing relative advantages across turns. We further construct an interactive trajectory understanding dataset from closed-loop simulation to support multi-turn training. Experiments on the NAVSIM benchmark demonstrate superior performance compared to existing methods, validating the effectiveness of our multi-turn reasoning paradigm. Additionally, we implement system-level optimizations to reduce data transfer overhead caused by high-resolution images and multi-turn sequences, achieving 2.5x training throughput. Our data, models, and code will be made available soon.
Toward Fully Autonomous Driving: AI, Challenges, Opportunities, and Needs
Automated driving (AD) is promising, but the transition to fully autonomous driving is, among other things, subject to the real, ever-changing open world and the resulting challenges. However, research in the field of AD demonstrates the ability of artificial intelligence (AI) to outperform classical approaches, handle higher complexities, and reach a new level of autonomy. At the same time, the use of AI raises further questions of safety and transferability. To identify the challenges and opportunities arising from AI concerning autonomous driving functionalities, we have analyzed the current state of AD, outlined limitations, and identified foreseeable technological possibilities. Thereby, various further challenges are examined in the context of prospective developments. In this way, this article reconsiders fully autonomous driving with respect to advancements in the field of AI and carves out the respective needs and resulting research questions.
comment: Published in IEEE Access, 29 January 2026
Robust Rigid Body Assembly via Contact-Implicit Optimal Control with Exact Second-Order Derivatives
Efficient planning of assembly motions is a long standing challenge in the field of robotics that has been primarily tackled with reinforcement learning and sampling-based methods by using extensive physics simulations. This paper proposes a sample-efficient robust optimal control approach for the determination of assembly motions, which requires significantly less physics simulation steps during planning through the efficient use of derivative information. To this end, a differentiable physics simulation is constructed that provides second-order analytic derivatives to the numerical solver and allows one to traverse seamlessly from informative derivatives to accurate contact simulation. The solution of the physics simulation problem is made differentiable by using smoothing inspired by interior-point methods applied to both the collision detection as well as the contact resolution problem. We propose a modified variant of an optimization-based formulation of collision detection formulated as a linear program and present an efficient implementation for the nominal evaluation and corresponding first- and second-order derivatives. Moreover, a multi-scenario-based trajectory optimization problem that ensures robustness with respect to sim-to-real mismatches is derived. The capability of the considered formulation is illustrated by results where over 99\% successful executions are achieved in real-world experiments. Thereby, we carefully investigate the effect of smooth approximations of the contact dynamics and robust modeling on the success rates. Furthermore, the method's capability is tested on different peg-in-hole problems in simulation to show the benefit of using exact Hessians over commonly used Hessian approximations.
comment: Submitted to Transactions on Robotics
A Comparative Evaluation of Large Vision-Language Models for 2D Object Detection under SOTIF Conditions
Reliable environmental perception remains one of the main obstacles for safe operation of automated vehicles. Safety of the Intended Functionality (SOTIF) concerns safety risks from perception insufficiencies, particularly under adverse conditions where conventional detectors often falter. While Large Vision-Language Models (LVLMs) demonstrate promising semantic reasoning, their quantitative effectiveness for safety-critical 2D object detection is underexplored. This paper presents a systematic evaluation of ten representative LVLMs using the PeSOTIF dataset, a benchmark specifically curated for long-tail traffic scenarios and environmental degradations. Performance is quantitatively compared against the classical perception approach, a YOLO-based detector. Experimental results reveal a critical trade-off: top-performing LVLMs (e.g., Gemini 3, Doubao) surpass the YOLO baseline in recall by over 25% in complex natural scenarios, exhibiting superior robustness to visual degradation. Conversely, the baseline retains an advantage in geometric precision for synthetic perturbations. These findings highlight the complementary strengths of semantic reasoning versus geometric regression, supporting the use of LVLMs as high-level safety validators in SOTIF-oriented automated driving systems.
comment: 6 pages, 11 figures
Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment
We study offline reinforcement learning of style-conditioned policies using explicit style supervision via subtrajectory labeling functions. In this setting, aligning style with high task performance is particularly challenging due to distribution shift and inherent conflicts between style and reward. Existing methods, despite introducing numerous definitions of style, often fail to reconcile these objectives effectively. To address these challenges, we propose a unified definition of behavior style and instantiate it into a practical framework. Building on this, we introduce Style-Conditioned Implicit Q-Learning (SCIQL), which leverages offline goal-conditioned RL techniques, such as hindsight relabeling and value learning, and combine it with a new Gated Advantage Weighted Regression mechanism to efficiently optimize task performance while preserving style alignment. Experiments demonstrate that SCIQL achieves superior performance on both objectives compared to prior offline methods. Code, datasets and visuals are available in: https://sciql-iclr-2026.github.io/.
Assistive Robots and Reasonable Work Assignment Reduce Perceived Stigma toward Persons with Disabilities
Robots are becoming more prominent in assisting persons with disabilities (PwD). Whilst there is broad consensus that robots can assist in mitigating physical impairments, the extent to which they can facilitate social inclusion remains equivocal. In fact, the exposed status of assisted workers could likewise lead to reduced or increased perceived stigma by other workers. We present a vignette study on the perceived cognitive and behavioral stigma toward PwD in the workplace. We designed four experimental conditions depicting a coworker with an impairment in work scenarios: overburdened work, suitable work, and robot-assisted work only for the coworker, and an offer of robot-assisted work for everyone. Our results show that cognitive stigma is significantly reduced when the work task is adapted to the person's abilities or augmented by an assistive robot. In addition, offering robot-assisted work for everyone, in the sense of universal design, further reduces perceived cognitive stigma. Thus, we conclude that assistive robots reduce perceived cognitive stigma, thereby supporting the use of collaborative robots in work scenarios involving PwDs.
comment: 5 pages, 2 figures, Companion Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction
FlyAware: Inertia-Aware Aerial Manipulation via Vision-Based Estimation and Post-Grasp Adaptation
Aerial manipulators (AMs) are gaining increasing attention in automated transportation and emergency services due to their superior dexterity compared to conventional multirotor drones. However, their practical deployment is challenged by the complexity of time-varying inertial parameters, which are highly sensitive to payload variations and manipulator configurations. Inspired by human strategies for interacting with unknown objects, this letter presents a novel onboard framework for robust aerial manipulation. The proposed system integrates a vision-based pre-grasp inertia estimation module with a post-grasp adaptation mechanism, enabling real-time estimation and adaptation of inertial dynamics. For control, we develop an inertia-aware adaptive control strategy based on gain scheduling, and assess its robustness via frequency-domain system identification. Our study provides new insights into post-grasp control for AMs, and real-world experiments validate the effectiveness and feasibility of the proposed framework.
comment: 8 pages, 10 figures
Postural Virtual Fixtures for Ergonomic Physical Interactions with Supernumerary Robotic Bodies
Conjoined collaborative robots, functioning as supernumerary robotic bodies (SRBs), can enhance human load tolerance abilities. However, in tasks involving physical interaction with humans, users may still adopt awkward, non-ergonomic postures, which can lead to discomfort or injury over time. In this paper, we propose a novel control framework that provides kinesthetic feedback to SRB users when a non-ergonomic posture is detected, offering resistance to discourage such behaviors. This approach aims to foster long-term learning of ergonomic habits and promote proper posture during physical interactions. To achieve this, a virtual fixture method is developed, integrated with a continuous, online ergonomic posture assessment framework. Additionally, to improve coordination between the operator and the SRB, which consists of a robotic arm mounted on a floating base, the position of the floating base is adjusted as needed. Experimental results demonstrate the functionality and efficacy of the ergonomics-driven control framework, including two user studies involving practical loco-manipulation tasks with 14 subjects, comparing the proposed framework with a baseline control framework that does not account for human ergonomics.
comment: Published in The International Journal of Robotics Research
Exo-Plore: Exploring Exoskeleton Control Space through Human-aligned Simulation ICLR 2026
Exoskeletons show great promise for enhancing mobility, but providing appropriate assistance remains challenging due to the complexity of human adaptation to external forces. Current state-of-the-art approaches for optimizing exoskeleton controllers require extensive human experiments in which participants must walk for hours, creating a paradox: those who could benefit most from exoskeleton assistance, such as individuals with mobility impairments, are rarely able to participate in such demanding procedures. We present Exo-plore, a simulation framework that combines neuromechanical simulation with deep reinforcement learning to optimize hip exoskeleton assistance without requiring real human experiments. Exo-plore can (1) generate realistic gait data that captures human adaptation to assistive forces, (2) produce reliable optimization results despite the stochastic nature of human gait, and (3) generalize to pathological gaits, showing strong linear relationships between pathology severity and optimal assistance.
comment: 10 pages, 9 figures, ICLR 2026 accepted
Adapting Reinforcement Learning for Path Planning in Constrained Parking Scenarios
Real-time path planning in constrained environments remains a fundamental challenge for autonomous systems. Traditional classical planners, while effective under perfect perception assumptions, are often sensitive to real-world perception constraints and rely on online search procedures that incur high computational costs. In complex surroundings, this renders real-time deployment prohibitive. To overcome these limitations, we introduce a Deep Reinforcement Learning (DRL) framework for real-time path planning in parking scenarios. In particular, we focus on challenging scenes with tight spaces that require a high number of reversal maneuvers and adjustments. Unlike classical planners, our solution does not require ideal and structured perception, and in principle, could avoid the need for additional modules such as localization and tracking, resulting in a simpler and more practical implementation. Also, at test time, the policy generates actions through a single forward pass at each step, which is lightweight enough for real-time deployment. The task is formulated as a sequential decision-making problem grounded in a bicycle model dynamics, enabling the agent to directly learn navigation policies that respect vehicle kinematics and environmental constraints in the closed-loop setting. A new benchmark is developed to support both training and evaluation, capturing diverse and challenging scenarios. Our approach achieves state-of-the-art success rates and efficiency, surpassing classical planner baselines by +96% in success rate and +52% in efficiency. Furthermore, we release our benchmark as an open-source resource for the community to foster future research in autonomous systems. The benchmark and accompanying tools are available at https://github.com/dqm5rtfg9b-collab/Constrained_Parking_Scenarios.
RoboStriker: Hierarchical Decision-Making for Autonomous Humanoid Boxing
Achieving human-level competitive intelligence and physical agility in humanoid robots remains a major challenge, particularly in contact-rich and highly dynamic tasks such as boxing. While Multi-Agent Reinforcement Learning (MARL) offers a principled framework for strategic interaction, its direct application to humanoid control is hindered by high-dimensional contact dynamics and the absence of strong physical motion priors. We propose RoboStriker, a hierarchical three-stage framework that enables fully autonomous humanoid boxing by decoupling high-level strategic reasoning from low-level physical execution. The framework first learns a comprehensive repertoire of boxing skills by training a single-agent motion tracker on human motion capture data. These skills are subsequently distilled into a structured latent manifold, regularized by projecting the Gaussian-parameterized distribution onto a unit hypersphere. This topological constraint effectively confines exploration to the subspace of physically plausible motions. In the final stage, we introduce Latent-Space Neural Fictitious Self-Play (LS-NFSP), where competing agents learn competitive tactics by interacting within the latent action space rather than the raw motor space, significantly stabilizing multi-agent training. Experimental results demonstrate that RoboStriker achieves superior competitive performance in simulation and exhibits sim-to-real transfer. Our website is available at RoboStriker.
CARE: Multi-Task Pretraining for Latent Continuous Action Representation in Robot Control ICASSP 2026
Recent advances in Vision-Language-Action (VLA) models have shown promise for robot control, but their dependence on action supervision limits scalability and generalization. To address this challenge, we introduce CARE, a novel framework designed to train VLA models for robotic task execution. Unlike existing methods that depend on action annotations during pretraining, CARE eliminates the need for explicit action labels by leveraging only video-text pairs. These weakly aligned data sources enable the model to learn continuous latent action representations through a newly designed multi-task pretraining objective. During fine-tuning, a small set of labeled data is used to train the action head for control. Experimental results across various simulation tasks demonstrate CARE's superior success rate, semantic interpretability, and ability to avoid shortcut learning. These results underscore CARE's scalability, interpretability, and effectiveness in robotic control with weak supervision.
comment: Accepted to 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)
High-Definition 5MP Stereo Vision Sensing for Robotics
High-resolution (5MP+) stereo vision systems are essential for advancing robotic capabilities, enabling operation over longer ranges and generating significantly denser and accurate 3D point clouds. However, realizing the full potential of high-angular-resolution sensors requires a commensurately higher level of calibration accuracy and faster processing -- requirements often unmet by conventional methods. This study addresses that critical gap by processing 5MP camera imagery using a novel, advanced frame-to-frame calibration and stereo matching methodology designed to achieve both high accuracy and speed. Furthermore, we introduce a new approach to evaluate real-time performance by comparing real-time disparity maps with ground-truth disparity maps derived from more computationally intensive stereo matching algorithms. Crucially, the research demonstrates that high-pixel-count cameras yield high-quality point clouds only through the implementation of high-accuracy calibration.
Multi-agent Coordination via Flow Matching ICLR 2026
This work presents MAC-Flow, a simple yet expressive framework for multi-agent coordination. We argue that requirements of effective coordination are twofold: (i) a rich representation of the diverse joint behaviors present in offline data and (ii) the ability to act efficiently in real time. However, prior approaches often sacrifice one for the other, i.e., denoising diffusion-based solutions capture complex coordination but are computationally slow, while Gaussian policy-based solutions are fast but brittle in handling multi-agent interaction. MAC-Flow addresses this trade-off by first learning a flow-based representation of joint behaviors, and then distilling it into decentralized one-step policies that preserve coordination while enabling fast execution. Across four different benchmarks, including $12$ environments and $34$ datasets, MAC-Flow alleviates the trade-off between performance and computational cost, specifically achieving about $\boldsymbol{\times14.5}$ faster inference compared to diffusion-based MARL methods, while maintaining good performance. At the same time, its inference speed is similar to that of prior Gaussian policy-based offline multi-agent reinforcement learning (MARL) methods.
comment: ICLR 2026
Reinforcement Learning for Ballbot Navigation in Uneven Terrain
Ballbot (i.e. Ball balancing robot) navigation usually relies on methods rooted in control theory (CT), and works that apply Reinforcement learning (RL) to the problem remain rare while generally being limited to specific subtasks (e.g. balance recovery). Unlike CT based methods, RL does not require (simplifying) assumptions about environment dynamics (e.g. the absence of slippage between the ball and the floor). In addition to this increased accuracy in modeling, RL agents can easily be conditioned on additional observations such as depth-maps without the need for explicit formulations from first principles, leading to increased adaptivity. Despite those advantages, there has been little to no investigation into the capabilities, data-efficiency and limitations of RL based methods for ballbot control and navigation. Furthermore, there is a notable absence of an open-source, RL-friendly simulator for this task. In this paper, we present an open-source ballbot simulation based on MuJoCo, and show that with appropriate conditioning on exteroceptive observations as well as reward shaping, policies learned by classical model-free RL methods are capable of effectively navigating through randomly generated uneven terrain, using a reasonable amount of data (four to five hours on a system operating at 500hz). Our code is made publicly available.
comment: 6 pages, 9 figures, 2 tables. Version two corrects figure 4 and adds some experiments
Monocular pose estimation of articulated open surgery tools -- in the wild
This work presents a framework for monocular 6D pose estimation of surgical instruments in open surgery, addressing challenges such as object articulations, specularity, occlusions, and synthetic-to-real domain adaptation. The proposed approach consists of three main components: $(1)$ synthetic data generation pipeline that incorporates 3D scanning of surgical tools with articulation rigging and physically-based rendering; $(2)$ a tailored pose estimation framework combining tool detection with pose and articulation estimation; and $(3)$ a training strategy on synthetic and real unannotated video data, employing domain adaptation with automatically generated pseudo-labels. Evaluations conducted on real data of open surgery demonstrate the good performance and real-world applicability of the proposed framework, highlighting its potential for integration into medical augmented reality and robotic systems. The approach eliminates the need for extensive manual annotation of real surgical data.
comment: Author Accepted Manuscript (AAM)
CaLiV: LiDAR-to-Vehicle Calibration of Arbitrary Sensor Setups
In autonomous systems, sensor calibration is essential for safe and efficient navigation in dynamic environments. Accurate calibration is a prerequisite for reliable perception and planning tasks such as object detection and obstacle avoidance. Many existing LiDAR calibration methods require overlapping fields of view, while others use external sensing devices or postulate a feature-rich environment. In addition, Sensor-to-Vehicle calibration is not supported by the vast majority of calibration algorithms. In this work, we propose a novel target-based technique for extrinsic Sensor-to-Sensor and Sensor-to-Vehicle calibration of multi-LiDAR systems called CaLiV. This algorithm works for non-overlapping fields of view and does not require any external sensing devices. First, we apply motion to produce field of view overlaps and utilize a simple Unscented Kalman Filter to obtain vehicle poses. Then, we use the Gaussian mixture model-based registration framework GMMCalib to align the point clouds in a common calibration frame. Finally, we reduce the task of recovering the sensor extrinsics to a minimization problem. We show that both translational and rotational Sensor-to-Sensor errors can be solved accurately by our method. In addition, all Sensor-to-Vehicle rotation angles can also be calibrated with high accuracy. We validate the simulation results in real-world experiments. The code is open-source and available on https://github.com/TUMFTM/CaLiV.
A Practical Framework of Key Performance Indicators for Multi-Robot Lunar and Planetary Field Tests
Robotic prospecting for critical resources on the Moon, such as ilmenite, rare earth elements, and water ice, requires robust exploration methods given the diverse terrain and harsh environmental conditions. Although numerous analog field trials address these goals, comparing their results remains challenging because of differences in robot platforms and experimental setups. These missions typically assess performance using selected, scenario-specific engineering metrics that fail to establish a clear link between field performance and science-driven objectives. In this paper, we address this gap by deriving a structured framework of KPI from three realistic multi-robot lunar scenarios reflecting scientific objectives and operational constraints. Our framework emphasizes scenario-dependent priorities in efficiency, robustness, and precision, and is explicitly designed for practical applicability in field deployments. We validated the framework in a multi-robot field test and found it practical and easy to apply for efficiency- and robustness-related KPI, whereas precision-oriented KPI require reliable ground-truth data that is not always feasible to obtain in outdoor analog environments. Overall, we propose this framework as a common evaluation standard enabling consistent, goal-oriented comparison of multi-robot field trials and supporting systematic development of robotic systems for future planetary exploration.
PocketDP3: Efficient Pocket-Scale 3D Visuomotor Policy
Recently, 3D vision-based diffusion policies have shown strong capability in learning complex robotic manipulation skills. However, a common architectural mismatch exists in these models: a tiny yet efficient point-cloud encoder is often paired with a massive decoder. Given a compact scene representation, we argue that this may lead to substantial parameter waste in the decoder. Motivated by this observation, we propose PocketDP3, a pocket-scale 3D diffusion policy that replaces the heavy conditional U-Net decoder used in prior methods with a lightweight Diffusion Mixer (DiM) built on MLP-Mixer blocks. This architecture enables efficient fusion across temporal and channel dimensions, significantly reducing model size. Notably, without any additional consistency distillation techniques, our method supports two-step inference without sacrificing performance, improving practicality for real-time deployment. Across three simulation benchmarks--RoboTwin2.0, Adroit, and MetaWorld--PocketDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior methods, while also accelerating inference. Real-world experiments further demonstrate the practicality and transferability of our method in real-world settings. Code will be released.
TaF-VLA: Tactile-Force Alignment in Vision-Language-Action Models for Force-aware Manipulation
Vision-Language-Action (VLA) models have recently emerged as powerful generalists for robotic manipulation. However, due to their predominant reliance on visual modalities, they fundamentally lack the physical intuition required for contact-rich tasks that require precise force regulation and physical reasoning. Existing attempts to incorporate vision-based tactile sensing into VLA models typically treat tactile inputs as auxiliary visual textures, thereby overlooking the underlying correlation between surface deformation and interaction dynamics. To bridge this gap, we propose a paradigm shift from tactile-vision alignment to tactile-force alignment. Here, we introduce TaF-VLA, a framework that explicitly grounds high-dimensional tactile observations in physical interaction forces. To facilitate this, we develop an automated tactile-force data acquisition device and curate the TaF-Dataset, comprising over 10 million synchronized tactile observations, 6-axis force/torque, and matrix force map. To align sequential tactile observations with interaction forces, the central component of our approach is the Tactile-Force Adapter (TaF-Adapter), a tactile sensor encoder that extracts discretized latent information for encoding tactile observations. This mechanism ensures that the learned representations capture history-dependent, noise-insensitive physical dynamics rather than static visual textures. Finally, we integrate this force-aligned encoder into a VLA backbone. Extensive real-world experiments demonstrate that TaF-VLA policy significantly outperforms state-of-the-art tactile-vision-aligned and vision-only baselines on contact-rich tasks, verifying its ability to achieve robust, force-aware manipulation through cross-modal physical reasoning.
comment: 17pages,9fig
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation ICLR 2026
Temporal context is essential for robotic manipulation because such tasks are inherently non-Markovian, yet mainstream VLA models typically overlook it and struggle with long-horizon, temporally dependent tasks. Cognitive science suggests that humans rely on working memory to buffer short-lived representations for immediate control, while the hippocampal system preserves verbatim episodic details and semantic gist of past experience for long-term memory. Inspired by these mechanisms, we propose MemoryVLA, a Cognition-Memory-Action framework for long-horizon robotic manipulation. A pretrained VLM encodes the observation into perceptual and cognitive tokens that form working memory, while a Perceptual-Cognitive Memory Bank stores low-level details and high-level semantics consolidated from it. Working memory retrieves decision-relevant entries from the bank, adaptively fuses them with current tokens, and updates the bank by merging redundancies. Using these tokens, a memory-conditioned diffusion action expert yields temporally aware action sequences. We evaluate MemoryVLA on 150+ simulation and real-world tasks across three robots. On SimplerEnv-Bridge, Fractal, LIBERO-5 suites and Mikasa-Robo, it achieves 71.9%, 72.7%, 96.5%, and 41.2% success rates, respectively, all outperforming state-of-the-art baselines CogACT and pi-0, with a notable +14.6 gain on Bridge and +11.8 gain on Mikasa-Robo. On 12 real-world tasks spanning general skills and long-horizon temporal dependencies, MemoryVLA achieves 84.0% success rate, with long-horizon tasks showing a +26 improvement over state-of-the-art baseline. Project Page: https://shihao1895.github.io/MemoryVLA
comment: ICLR 2026 | The project is available at https://shihao1895.github.io/MemoryVLA
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
The fundamental premise of Vision-Language-Action (VLA) models is to harness the extensive general capabilities of pre-trained Vision-Language Models (VLMs) for generalized embodied intelligence. However, standard robotic fine-tuning inevitably disrupts the pre-trained feature space, leading to "catastrophic forgetting" that compromises the general visual understanding we aim to leverage. To effectively utilize the uncorrupted general capabilities of VLMs for robotic tasks, we propose TwinBrainVLA, which coordinates two isomorphic VLM pathways: a frozen generalist (also called "Left Brain") and a trainable specialist (also called "Right Brain"). Our architecture utilizes a Asymmetric Mixture-of-Transformers (AsyMoT) mechanism, enabling the Right Brain to dynamically query and fuse intact semantic knowledge from the Left Brain with proprioceptive states. This fused representation conditions a flow-matching action expert for precise continuous control. Empirical results on SimplerEnv and RoboCasa benchmarks demonstrate that by explicitly retaining general capabilities, TwinBrainVLA achieves substantial performance gains over baseline models in complex manipulation tasks.
comment: GitHub: https://github.com/ZGC-EmbodyAI/TwinBrainVLA
On Your Own: Pro-level Autonomous Drone Racing in Uninstrumented Arenas
Drone technology is proliferating in many industries, including agriculture, logistics, defense, infrastructure, and environmental monitoring. Vision-based autonomy is one of its key enablers, particularly for real-world applications. This is essential for operating in novel, unstructured environments where traditional navigation methods may be unavailable. Autonomous drone racing has become the de facto benchmark for such systems. State-of-the-art research has shown that autonomous systems can surpass human-level performance in racing arenas. However, the direct applicability to commercial and field operations is still limited, as current systems are often trained and evaluated in highly controlled environments. In our contribution, the system's capabilities are analyzed within a controlled environment -- where external tracking is available for ground-truth comparison -- but also demonstrated in a challenging, uninstrumented environment -- where ground-truth measurements were never available. We show that our approach can match the performance of professional human pilots in both scenarios.
RoboArmGS: High-Quality Robotic Arm Splatting via Bézier Curve Refinement
Constructing photorealistic and controllable robotic arm digital assets from real observations is fundamental to robotic applications. Current approaches naively bind static 3D Gaussians according to URDF links, forcing them to follow an URDF-rigged motion passively. However, the idealized URDF-rigged motion cannot accurately model the actual motion captured in real-world observations, leading to severe rendering artifacts in 3D Gaussians. To address these challenges, we propose RoboArmGS, a novel hybrid representation that refines the URDF-rigged motion with learnable Bézier curves, enabling more accurate real-world motion modeling. To be more specific, we present a learnable Bézier Curve motion refiner that corrects per-joint residuals to address mismatches between real-world motion and URDF-rigged motion. RoboArmGS enables the learning of more accurate real-world motion while achieving a coherent binding of 3D Gaussians across arm parts. To support future research, we contribute a carefully collected dataset named RoboArm4D, which comprises several widely used robotic arms for evaluating the quality of building high-quality digital assets. We evaluate our approach on RoboArm4D, and RoboArmGS achieves state-of-the-art performance in real-world motion modeling and rendering quality. The code and dataset will be released.
GMOR: A Lightweight Robust Point Cloud Registration Framework via Geometric Maximum Overlapping
Point cloud registration based on correspondences computes the rigid transformation that maximizes the number of inliers constrained within the noise threshold. Current state-of-the-art (SOTA) methods employing spatial compatibility graphs or branch-and-bound (BnB) search mainly focus on registration under high outlier ratios. However, graph-based methods require at least quadratic space and time complexity for graph construction, while multi-stage BnB search methods often suffer from inaccuracy due to local optima between decomposed stages. This paper proposes a geometric maximum overlapping registration framework via rotation-only BnB search. The rigid transformation is decomposed using Chasles' theorem into a translation along rotation axis and a 2D rigid transformation. The optimal rotation axis and angle are searched via BnB, with residual parameters formulated as range maximum query (RMQ) problems. Firstly, the top-k candidate rotation axes are searched within a hemisphere parameterized by cube mapping, and the translation along each axis is estimated through interval stabbing of the correspondences projected onto that axis. Secondly, the 2D registration is relaxed to 1D rotation angle search with 2D RMQ of geometric overlapping for axis-aligned rectangles, which is solved deterministically in polynomial time using sweep line algorithm with segment tree. Experimental results on indoor 3DMatch/3DLoMatch scanning and outdoor KITTI LiDAR datasets demonstrate superior accuracy and efficiency over SOTA methods, while the time complexity is polynomial and the space complexity increases linearly with the number of points, even in the worst case.
ReGlove: A Soft Pneumatic Glove for Activities of Daily Living Assistance via Wrist-Mounted Vision
This paper presents ReGlove, a system that converts low-cost commercial pneumatic rehabilitation gloves into vision-guided assistive orthoses. Chronic upper-limb impairment affects millions worldwide, yet existing assistive technologies remain prohibitively expensive or rely on unreliable biological signals. Our platform integrates a wrist-mounted camera with an edge-computing inference engine (Raspberry Pi 5) to enable context-aware grasping without requiring reliable muscle signals. By adapting real-time YOLO-based computer vision models, the system achieves 96.73% grasp classification accuracy with sub-40.00 millisecond end-to-end latency. Physical validation using standardized benchmarks shows 82.71% success on YCB object manipulation and reliable performance across 27 Activities of Daily Living (ADL) tasks. With a total cost under $250 and exclusively commercial components, ReGlove provides a technical foundation for accessible, vision-based upper-limb assistance that could benefit populations excluded from traditional EMG-controlled devices.
VAT: Vision Action Transformer by Unlocking Full Representation of ViT
In robot learning, Vision Transformers (ViTs) are standard for visual perception, yet most methods discard valuable information by using only the final layer's features. We argue this provides an insufficient representation and propose the Vision Action Transformer (VAT), a novel architecture that is extended from ViT and unlocks the full feature hierarchy of ViT. VAT processes specialized action tokens with visual features across all transformer layers, enabling a deep and progressive fusion of perception and action generation. On a suite of simulated manipulation tasks, VAT achieves a 98.15\% average success rate across four LIBERO benchmarks, establishing a new state-of-the-art by outperforming prior methods like OpenVLA-OFT. Our work presents not only a powerful model for imitation learning but also demonstrates the critical importance of leveraging the complete ''representation trajectory'' of vision models to advance robotic policy. The GitHub URL for the project code is https://github.com/sellerbubble/VAT.
EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks
This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on achieving robust spatial generalization of the policy for the peg-in-hole (PiH) task trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT operates using only localized observations (geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images) and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH, screwing, and surface wiping tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles. The experimental videos and more details can be found on the project website: https://equicontact.github.io/EquiContact-website/
comment: Submitted to RSS
Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
Offline decision-making via diffusion models often produces trajectories that are misaligned with system dynamics, limiting their reliability for control. We propose Model Predictive Diffuser (MPDiffuser), a compositional diffusion framework that combines a diffusion planner with a dynamics diffusion model to generate task-aligned and dynamically plausible trajectories. MPDiffuser interleaves planner and dynamics updates during sampling, progressively correcting feasibility while preserving task intent. A lightweight ranking module then selects trajectories that best satisfy task objectives. The compositional design improves sample efficiency and adaptability by enabling the dynamics model to leverage diverse and previously unseen data independently of the planner. Empirically, we demonstrate consistent improvements over prior diffusion-based methods on unconstrained (D4RL) and constrained (DSRL) benchmarks, and validate practicality through deployment on a real quadrupedal robot.
Multiagent Systems
Scaling Multiagent Systems with Process Rewards
While multiagent systems have shown promise for tackling complex tasks via specialization, finetuning multiple agents simultaneously faces two key challenges: (1) credit assignment across agents, and (2) sample efficiency of expensive multiagent rollouts. In this work, we propose finetuning multiagent systems with per-action process rewards from AI feedback (MAPPA) to address both. Through assigning credit to individual agent actions rather than only at task completion, MAPPA enables fine-grained supervision without ground truth labels while extracting maximal training signal from each rollout. We demonstrate our approach on competition math problems and tool-augmented data analysis tasks. On unseen math problems, MAPPA achieves +5.0--17.5pp on AIME and +7.8--17.2pp on AMC. For data analysis tasks, our method improves success rate by +12.5pp while quality metrics improve by up to 30%, validating that per-action supervision can lead to improvements across different multiagent system on various domains. By addressing these challenges, our work takes a first step toward scaling multiagent systems for complex, long-horizon tasks with minimal human supervision.
MonoScale: Scaling Multi-Agent System with Monotonic Improvement
In recent years, LLM-based multi-agent systems (MAS) have advanced rapidly, using a router to decompose tasks and delegate subtasks to specialized agents. A natural way to expand capability is to scale up the agent pool by continually integrating new functional agents or tool interfaces, but naive expansion can trigger performance collapse when the router cold-starts on newly added, heterogeneous, and unreliable agents. We propose MonoScale, an expansion-aware update framework that proactively generates a small set of agent-conditioned familiarization tasks, harvests evidence from both successful and failed interactions, and distills it into auditable natural-language memory to guide future routing. We formalize sequential augmentation as a contextual bandit and perform trust-region memory updates, yielding a monotonic non-decreasing performance guarantee across onboarding rounds. Experiments on GAIA and Humanity's Last Exam show stable gains as the agent pool grows, outperforming naive scale-up and strong-router fixed-pool baselines.
Multi-Agent Systems Should be Treated as Principal-Agent Problems
Consider a multi-agent systems setup in which a principal (a supervisor agent) assigns subtasks to specialized agents and aggregates their responses into a single system-level output. A core property of such systems is information asymmetry: agents observe task-specific information, produce intermediate reasoning traces, and operate with different context windows. In isolation, such asymmetry is not problematic, since agents report truthfully to the principal when incentives are fully aligned. However, this assumption breaks down when incentives diverge. Recent evidence suggests that LLM-based agents can acquire their own goals, such as survival or self-preservation, a phenomenon known as scheming, and may deceive humans or other agents. This leads to agency loss: a gap between the principal's intended outcome and the realized system behavior. Drawing on core ideas from microeconomic theory, we argue that these characteristics, information asymmetry and misaligned goals, are best studied through the lens of principal-agent problems. We explain why multi-agent systems, both human-to-LLM and LLM-to-LLM, naturally induce information asymmetry under this formulation, and we use scheming, where LLM agents pursue covert goals, as a concrete case study. We show that recently introduced terminology used to describe scheming, such as covert subversion or deferred subversion, corresponds to well-studied concepts in the mechanism design literature, which not only characterizes the problem but also prescribes concrete mitigation strategies. More broadly, we argue for applying tools developed to study human agent behavior to the analysis of non-human agents.
LLMDR: Large language model driven framework for missing data recovery in mixed data under low resource regime
The missing data problem is one of the important issues to address for achieving data quality. While imputation-based methods are designed to achieve data completeness, their efficacy is observed to be diminishing as and when there is increasing in the missingness percentage. Further, extant approaches often struggle to handle mixed-type datasets, typically supporting either numerical and/or categorical data. In this work, we propose LLMDR, automatic data recovery framework which operates in two stage approach, wherein the Stage-I: DBSCAN clustering algorithm is employed to select the most representative samples and in the Stage-II: Multi-LLMs are employed for data recovery considering the local and global representative samples; Later, this framework invokes the consensus algorithm for recommending a more accurate value based on other LLMs of local and global effective samples. Experimental results demonstrate that proposed framework works effectively on various mixed datasets in terms of Accuracy, KS-Statistic, SMAPE, and MSE. Further, we have also shown the advantage of the consensus mechanism for final recommendation in mixed-type data.
Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support ICASSP 2026
Large language models (LLMs) have shown strong capabilities across diverse decision-making tasks. However, existing approaches often overlook the specialization differences among available models, treating all LLMs as uniformly applicable regardless of task characteristics. This limits their ability to adapt to varying reasoning demands and task complexities. In this work, we propose Task-Aware LLM Council (TALC), a task-adaptive decision framework that integrates a council of LLMs with Monte Carlo Tree Search (MCTS) to enable dynamic expert selection and efficient multi-step planning. Each LLM is equipped with a structured success memory profile derived from prior task trajectories, enabling semantic matching between current reasoning context and past successes. At each decision point, TALC routes control to the most contextually appropriate model and estimates node value using a dual-signal mechanism that fuses model-based evaluations with historical utility scores. These signals are adaptively weighted based on intra-node variance and used to guide MCTS selection, allowing the system to balance exploration depth with planning confidence. Experiments on WebShop, HumanEval, and the Game of 24 demonstrate that TALC achieves superior task success rates and improved search efficiency compared to strong baselines, validating the benefits of specialization-aware routing and adaptive planning.
comment: A shorter version of this work has been accepted by ICASSP 2026
ScholarPeer: A Context-Aware Multi-Agent Framework for Automated Peer Review
Automated peer review has evolved from simple text classification to structured feedback generation. However, current state-of-the-art systems still struggle with "surface-level" critiques: they excel at summarizing content but often fail to accurately assess novelty and significance or identify deep methodological flaws because they evaluate papers in a vacuum, lacking the external context a human expert possesses. In this paper, we introduce ScholarPeer, a search-enabled multi-agent framework designed to emulate the cognitive processes of a senior researcher. ScholarPeer employs a dual-stream process of context acquisition and active verification. It dynamically constructs a domain narrative using a historian agent, identifies missing comparisons via a baseline scout, and verifies claims through a multi-aspect Q&A engine, grounding the critique in live web-scale literature. We evaluate ScholarPeer on DeepReview-13K and the results demonstrate that ScholarPeer achieves significant win-rates against state-of-the-art approaches in side-by-side evaluations and reduces the gap to human-level diversity.
SYMPHONY: Synergistic Multi-agent Planning with Heterogeneous Language Model Assembly NeurIPS 2025
Recent advancements have increasingly focused on leveraging large language models (LLMs) to construct autonomous agents for complex problem-solving tasks. However, existing approaches predominantly employ a single-agent framework to generate search branches and estimate rewards during Monte Carlo Tree Search (MCTS) planning. This single-agent paradigm inherently limits exploration capabilities, often resulting in insufficient diversity among generated branches and suboptimal planning performance. To overcome these limitations, we propose Synergistic Multi-agent Planning with Heterogeneous langauge model assembly (SYMPHONY), a novel multi-agent planning framework that integrates a pool of heterogeneous language model-based agents. By leveraging diverse reasoning patterns across agents, SYMPHONY enhances rollout diversity and facilitates more effective exploration. Empirical results across multiple benchmark tasks show that SYMPHONY achieves strong performance even when instantiated with open-source LLMs deployable on consumer-grade hardware. When enhanced with cloud-based LLMs accessible via API, SYMPHONY demonstrates further improvements, outperforming existing state-of-the-art baselines and underscoring the effectiveness of heterogeneous multi-agent coordination in planning tasks.
comment: Accepted by NeurIPS 2025
Autonomous Data Processing using Meta-Agents
Traditional data processing pipelines are typically static and handcrafted for specific tasks, limiting their adaptability to evolving requirements. While general-purpose agents and coding assistants can generate code for well-understood data pipelines, they lack the ability to autonomously monitor, manage, and optimize an end-to-end pipeline once deployed. We present \textbf{Autonomous Data Processing using Meta-agents} (ADP-MA), a framework that dynamically constructs, executes, and iteratively refines data processing pipelines through hierarchical agent orchestration. At its core, \textit{meta-agents} analyze input data and task specifications to design a multi-phase plan, instantiate specialized \textit{ground-level agents}, and continuously evaluate pipeline performance. The architecture comprises three key components: a planning module for strategy generation, an orchestration layer for agent coordination and tool integration, and a monitoring loop for iterative evaluation and backtracking. Unlike conventional approaches, ADP-MA emphasizes context-aware optimization, adaptive workload partitioning, and progressive sampling for scalability. Additionally, the framework leverages a diverse set of external tools and can reuse previously designed agents, reducing redundancy and accelerating pipeline construction. We demonstrate ADP-MA through an interactive demo that showcases pipeline construction, execution monitoring, and adaptive refinement across representative data processing tasks.
TessPay: Verify-then-Pay Infrastructure for Trusted Agentic Commerce
The global economy is entering the era of Agentic Commerce, where autonomous agents can discover services, negotiate prices, and transact value. However adoption towards agentic commerce faces a foundational trust gap: current systems are built for direct human interactions rather than agent-driven operations. It lacks core primitives across three critical stages of agentic transactions. First, Task Delegation lacks means to translate user intent into defined scopes, discover appropriate agents, and securely authorize actions. Second, Payment Settlement for tasks is processed before execution, lacking verifiable evidence to validate the agent's work. Third, Audit Mechanisms fail to capture the full transaction lifecycle, preventing clear accountability for disputes. While emerging standards address fragments of this trust gap, there still remains a critical need for a unified infrastructure that binds the entire transaction lifecycle. To resolve this gap, we introduce TessPay, a unified infrastructure that replaces implicit trust with a 'Verify-then-Pay' architecture. It is a two plane architecture separating control and verification from settlement. TessPay operationalizes trust across four distinct stages: Before execution, agents are anchored in a canonical registry and user intent is captured as verifiable mandates, enabling stakeholder accountability. During execution, funds are locked in escrow while the agent executes the task and generates cryptographic evidence (TLS Notary, TEE etc.) to support Proof of Task Execution (PoTE). At settlement, the system verifies this evidence and releases funds only when the PoTE satisfies verification predicates; modular rail adapters ensure this PoTE-gated escrow remains chain-agnostic across heterogeneous payment rails. After settlement, TessPay preserves a tamper-evident audit trail to enable clear accountability for dispute resolution.
Label Curation Using Agentic AI
Data annotation is essential for supervised learning, yet producing accurate, unbiased, and scalable labels remains challenging as datasets grow in size and modality. Traditional human-centric pipelines are costly, slow, and prone to annotator variability, motivating reliability-aware automated annotation. We present AURA (Agentic AI for Unified Reliability Modeling and Annotation Aggregation), an agentic AI framework for large-scale, multi-modal data annotation. AURA coordinates multiple AI agents to generate and validate labels without requiring ground truth. At its core, AURA adapts a classical probabilistic model that jointly infers latent true labels and annotator reliability via confusion matrices, using Expectation-Maximization to reconcile conflicting annotations and aggregate noisy predictions. Across the four benchmark datasets evaluated, AURA achieves accuracy improvements of up to 5.8% over baseline. In more challenging settings with poor quality annotators, the improvement is up to 50% over baseline. AURA also accurately estimates the reliability of annotators, allowing assessment of annotator quality even without any pre-validation steps.
Experience-Driven Multi-Agent Systems Are Training-free Context-aware Earth Observers
Recent advances have enabled large language model (LLM) agents to solve complex tasks by orchestrating external tools. However, these agents often struggle in specialized, tool-intensive domains that demand long-horizon execution, tight coordination across modalities, and strict adherence to implicit tool constraints. Earth Observation (EO) tasks exemplify this challenge due to the multi-modal and multi-temporal data inputs, as well as the requirements of geo-knowledge constraints (spectrum library, spatial reasoning, etc): many high-level plans can be derailed by subtle execution errors that propagate through a pipeline and invalidate final results. A core difficulty is that existing agents lack a mechanism to learn fine-grained, tool-level expertise from interaction. Without such expertise, they cannot reliably configure tool parameters or recover from mid-execution failures, limiting their effectiveness in complex EO workflows. To address this, we introduce \textbf{GeoEvolver}, a self-evolving multi-agent system~(MAS) that enables LLM agents to acquire EO expertise through structured interaction without any parameter updates. GeoEvolver decomposes each query into independent sub-goals via a retrieval-augmented multi-agent orchestrator, then explores diverse tool-parameter configurations at the sub-goal level. Successful patterns and root-cause attribution from failures are then distilled in an evolving memory bank that provides in-context demonstrations for future queries. Experiments on three tool-integrated EO benchmarks show that GeoEvolver consistently improves end-to-end task success, with an average gain of 12\% across multiple LLM backbones, demonstrating that EO expertise can emerge progressively from efficient, fine-grained interactions with the environment.
comment: 21 pages, 6 figures
ToolTok: Tool Tokenization for Efficient and Generalizable GUI Agents
Existing GUI agent models relying on coordinate-based one-step visual grounding struggle with generalizing to varying input resolutions and aspect ratios. Alternatives introduce coordinate-free strategies yet suffer from learning under severe data scarcity. To address the limitations, we propose ToolTok, a novel paradigm of multi-step pathfinding for GUI agents, where operations are modeled as a sequence of progressive tool usage. Specifically, we devise tools aligned with human interaction habits and represent each tool using learnable token embeddings. To enable efficient embedding learning under limited supervision, ToolTok introduces a semantic anchoring mechanism that grounds each tool with semantically related concepts as natural inductive bias. To further enable a pre-trained large language model to progressively acquire tool semantics, we construct an easy-to-hard curriculum consisting of three tasks: token definition question-answering, pure text-guided tool selection, and simplified visual pathfinding. Extensive experiments on multiple benchmarks show that ToolTok achieves superior performance among models of comparable scale (4B) and remains competitive with a substantially larger model (235B). Notably, these results are obtained using less than 1% of the training data required by other post-training approaches. In addition, ToolTok demonstrates strong generalization across unseen scenarios. Our training & inference code is open-source at https://github.com/ZephinueCode/ToolTok.
comment: 8 pages main paper, 18 pages total, 8 figures, 5 tables, code at https://github.com/ZephinueCode/ToolTok
Learning to flock in open space by avoiding collisions and staying together
We investigate the emergence of cohesive flocking in open, boundless space using a multi-agent reinforcement learning framework. Agents integrate positional and orientational information from their closest topological neighbours and learn to balance alignment and attractive interactions by optimizing a local cost function that penalizes both excessive separation and close-range crowding. The resulting Vicsek-like dynamics is robust to algorithmic implementation details and yields cohesive collective motion with high polar order. The optimal policy is dominated by strong aligning interactions when agents are sufficiently close to their neighbours, and a flexible combination of alignment and attraction at larger separations. We further characterize the internal structure and dynamics of the resulting groups using liquid-state metrics and neighbour exchange rates, finding qualitative agreement with empirical observations in starling flocks. These results suggest that flocking may emerge in groups of moving agents as an adaptive response to the biological imperatives of staying together while avoiding collisions.
comment: 13 pages + appendices
ATOD: An Evaluation Framework and Benchmark for Agentic Task-Oriented Dialogue Systems
Recent advances in task-oriented dialogue (TOD) systems, driven by large language models (LLMs) with extensive API and tool integration, have enabled conversational agents to coordinate interleaved goals, maintain long-horizon context, and act proactively through asynchronous execution. These capabilities extend beyond traditional TOD systems, yet existing benchmarks lack systematic support for evaluating such agentic behaviors. To address this gap, we introduce ATOD, a benchmark and synthetic dialogue generation pipeline that produces richly annotated conversations requiring long-term reasoning. ATOD captures key characteristics of advanced TOD, including multi-goal coordination, dependency management, memory, adaptability, and proactivity. Building on ATOD, we propose ATOD-Eval, a holistic evaluation framework that translates these dimensions into fine-grained metrics and supports reproducible offline and online evaluation. We further present a strong agentic memory-based evaluator for benchmarking on ATOD. Experiments show that ATOD-Eval enables comprehensive assessment across task completion, agentic capability, and response quality, and that the proposed evaluator offers a better accuracy-efficiency tradeoff compared to existing memory- and LLM-based approaches under this evaluation setting.
Multi-agent Adaptive Mechanism Design
We study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Throughout the sequential game, the mechanism estimates agents' beliefs and iteratively updates a distributionally robust linear program with shrinking ambiguity sets to reduce payments while preserving truthfulness. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ cumulative regret, and we establish a matching lower bound showing that no truthful adaptive mechanism can asymptotically do better. The framework generalizes to plug-in estimators, supporting structured priors and delayed feedback. To our knowledge, this is the first adaptive mechanism under general settings that maintains truthfulness and achieves optimal regret when incentive constraints are unknown and must be learned.
A Practical Framework of Key Performance Indicators for Multi-Robot Lunar and Planetary Field Tests
Robotic prospecting for critical resources on the Moon, such as ilmenite, rare earth elements, and water ice, requires robust exploration methods given the diverse terrain and harsh environmental conditions. Although numerous analog field trials address these goals, comparing their results remains challenging because of differences in robot platforms and experimental setups. These missions typically assess performance using selected, scenario-specific engineering metrics that fail to establish a clear link between field performance and science-driven objectives. In this paper, we address this gap by deriving a structured framework of KPI from three realistic multi-robot lunar scenarios reflecting scientific objectives and operational constraints. Our framework emphasizes scenario-dependent priorities in efficiency, robustness, and precision, and is explicitly designed for practical applicability in field deployments. We validated the framework in a multi-robot field test and found it practical and easy to apply for efficiency- and robustness-related KPI, whereas precision-oriented KPI require reliable ground-truth data that is not always feasible to obtain in outdoor analog environments. Overall, we propose this framework as a common evaluation standard enabling consistent, goal-oriented comparison of multi-robot field trials and supporting systematic development of robotic systems for future planetary exploration.
Collaborative Belief Reasoning with LLMs for Efficient Multi-Agent Collaboration
Effective real-world multi-agent collaboration requires not only accurate planning but also the ability to reason about collaborators' intents--a crucial capability for avoiding miscoordination and redundant communication under partial observable environments. Due to their strong planning and reasoning capabilities, large language models (LLMs) have emerged as promising autonomous agents for collaborative task solving. However, existing collaboration frameworks for LLMs overlook their reasoning potential for dynamic intent inference, and thus produce inconsistent plans and redundant communication, reducing collaboration efficiency. To bridge this gap, we propose CoBel-World, a novel framework that equips LLM agents with a Collaborative Belief World--an internal representation jointly modeling the physical environment and collaborators' mental states. CoBel-World enables agents to parse external open-world knowledge into structured beliefs via a symbolic belief representation module, and perform zero-shot Bayesian-style belief updates through LLM reasoning. This allows agents to proactively detect potential miscoordination (e.g., conflicting plans) and communicate adaptively. Evaluated on challenging embodied benchmarks (i.e., TDW-MAT and C-WAH), CoBel-World significantly reduces communication costs by 64-79% and improves task completion efficiency by 4-28% compared to the strongest baseline. Our results show that explicit, intent-aware belief modeling is essential for efficient and human-like collaboration in LLM-based multi-agent systems.
Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning
Large Language Models (LLMs) increasingly support culturally sensitive decision making, yet often exhibit misalignment due to skewed pretraining data and the absence of structured value representations. Existing methods can steer outputs, but often lack demographic grounding and treat values as independent, unstructured signals, reducing consistency and interpretability. We propose OG-MAR, an Ontology-Guided Multi-Agent Reasoning framework. OG-MAR summarizes respondent-specific values from the World Values Survey (WVS) and constructs a global cultural ontology by eliciting relations over a fixed taxonomy via competency questions. At inference time, it retrieves ontology-consistent relations and demographically similar profiles to instantiate multiple value-persona agents, whose outputs are synthesized by a judgment agent that enforces ontology consistency and demographic proximity. Experiments on regional social-survey benchmarks across four LLM backbones show that OG-MAR improves cultural alignment and robustness over competitive baselines, while producing more transparent reasoning traces.
comment: 35 pages
Systems and Control (EESS)
Robust Control of Constrained Linear Systems using Online Convex Optimization and a Reference Governor
This article develops a control method for linear time-invariant systems subject to time-varying and a priori unknown cost functions, that satisfies state and input constraints, and is robust to exogenous disturbances. To this end, we combine the online convex optimization framework with a reference governor and a constraint tightening approach. The proposed framework guarantees recursive feasibility and robust constraint satisfaction. Its closed-loop performance is studied in terms of its dynamic regret, which is bounded linearly by the variation of the cost functions and the magnitude of the disturbances. The proposed method is illustrated by a numerical case study of a tracking control problem.
comment: Presented at 2024 IEEE 63rd Conference on Decision and Control (CDC)
Interpolation Techniques for Fast Channel Estimation in Ray Tracing
Ray tracing is increasingly utilized in wireless system simulations to estimate channel paths. In large-scale simulations with complex environments, ray tracing at high resolution can be computationally demanding. To reduce the computation, this paper presents a novel method for conducting ray tracing at a coarse set of reference points and interpolating the channels at other locations. The key insight is to interpolate the images of reflected points. In addition to the computational savings, the method directly captures the spherical nature of each wavefront enabling fast and accurate computation of channels using line-of-sight MIMO and other wide aperture techniques. Through empirical validation and comparison with exhaustive ray tracing, we demonstrate the efficacy and practicality of our approach in achieving high-fidelity channel predictions with reduced computational resources.
comment: This is the authors accepted version of a paper published in the Proceedings of the 2024 58th Asilomar Conference on Signals, Systems, and Computers
Energy Management Strategies for Electric Aircraft Charging Leveraging Active Landside Vehicle-to-Grid
The deployment of medium-range battery electric aircraft is a promising pathway to improve the environmental footprint of air mobility. Yet such a deployment would be accompanied by significant electric power requirements at airports due to aircraft charging. Given the growing prevalence of electric vehicles and their bi-directional charging capabilities--so-called vehicle-to-grid (V2G)--we study energy buffer capabilities of parked electric vehicles to alleviate pressure on grid connections. To this end, we present energy management strategies for airports providing cost-optimal apron and landside V2G charge scheduling. Specifically, we first formulate the optimal energy management problem of joint aircraft charging and landside V2G coordination as a linear program, whereby we use partial differential equations to model the aggregated charging dynamics of the electric vehicle fleet. Second, we consider a shuttle flight network with a single hub of a large Dutch airline, real-world grid prices, and synthetic parking garage occupancy data to test our framework. Our results show that V2G at even a single airport can indeed reduce energy costs to charge the aircraft fleet: Compared to a baseline scenario without V2G, the proposed concept yields cost savings of up to 32%, depending on the schedule and amount of participating vehicles, and has other potential beneficial effects on the local power grid, e.g., the reduction of potential power peaks.
Learning-Based Signal Recovery in Nonlinear Systems with Spectrally Separated Interference
Upper Mid-Band (FR3, 7-24 GHz) receivers for 6G must operate over wide bandwidths in dense spectral environments, making them particularly vulnerable to strong adjacent-band interference and front-end nonlinearities. While conventional linear receivers can suppress spectrally separated interferers under ideal hardware assumptions, receiver saturation and finite-resolution quantization cause nonlinear spectral leakage that severely degrades performance in practical wideband radios. We study the recovery of a desired signal from nonlinear receiver observations corrupted by a high-power out-of-band interferer. The receiver front-end is modeled as a smooth, memoryless nonlinearity followed by additive noise and optional quantization. To mitigate these nonlinear and quantization-induced distortions, we propose a learned multi-layer Vector Approximate Message Passing (LMLVAMP) algorithm that incorporates spectral priors with neural network based denoising. Simulation results demonstrate significant performance gains over conventional methods, particularly in high-interference regimes representative of FR3 coexistence scenarios.
Reinforcement Learning-Based Co-Design and Operation of Chiller and Thermal Energy Storage for Cost-Optimal HVAC Systems
We study the joint operation and sizing of cooling infrastructure for commercial HVAC systems using reinforcement learning, with the objective of minimizing life-cycle cost over a 30-year horizon. The cooling system consists of a fixed-capacity electric chiller and a thermal energy storage (TES) unit, jointly operated to meet stochastic hourly cooling demands under time-varying electricity prices. The life-cycle cost accounts for both capital expenditure and discounted operating cost, including electricity consumption and maintenance. A key challenge arises from the strong asymmetry in capital costs: increasing chiller capacity by one unit is far more expensive than an equivalent increase in TES capacity. As a result, identifying the right combination of chiller and TES sizes, while ensuring zero loss-of-cooling-load under optimal operation, is a non-trivial co-design problem. To address this, we formulate the chiller operation problem for a fixed infrastructure configuration as a finite-horizon Markov Decision Process (MDP), in which the control action is the chiller part-load ratio (PLR). The MDP is solved using a Deep Q Network (DQN) with a constrained action space. The learned DQN RL policy minimizes electricity cost over historical traces of cooling demand and electricity prices. For each candidate chiller-TES sizing configuration, the trained policy is evaluated. We then restrict attention to configurations that fully satisfy the cooling demand and perform a life-cycle cost minimization over this feasible set to identify the cost-optimal infrastructure design. Using this approach, we determine the optimal chiller and thermal energy storage capacities to be 700 and 1500, respectively.
comment: 11 pages, 3 figures
Degradation-Aware Frequency Regulation of a Heterogeneous Battery Fleet via Reinforcement Learning
Battery energy storage systems are increasingly deployed as fast-responding resources for grid balancing services such as frequency regulation and for mitigating renewable generation uncertainty. However, repeated charging and discharging induces cycling degradation and reduces battery lifetime. This paper studies the real-time scheduling of a heterogeneous battery fleet that collectively tracks a stochastic balancing signal subject to per-battery ramp-rate and capacity constraints, while minimizing long-term cycling degradation. Cycling degradation is fundamentally path-dependent: it is determined by charge-discharge cycles formed by the state-of-charge (SoC) trajectory and is commonly quantified via rainflow cycle counting. This non-Markovian structure makes it difficult to express degradation as an additive per-time-step cost, complicating classical dynamic programming approaches. We address this challenge by formulating the fleet scheduling problem as a Markov decision process (MDP) with constrained action space and designing a dense proxy reward that provides informative feedback at each time step while remaining aligned with long-term cycle-depth reduction. To scale learning to large state-action spaces induced by fine-grained SoC discretization and asymmetric per-battery constraints, we develop a function-approximation reinforcement learning method using an Extreme Learning Machine (ELM) as a random nonlinear feature map combined with linear temporal-difference learning. We evaluate the proposed approach on a toy Markovian signal model and on a Markovian model trained from real-world regulation signal traces obtained from the University of Delaware, and demonstrate consistent reductions in cycle-depth occurrence and degradation metrics compared to baseline scheduling policies.
comment: 11 pages, 2 figures
Approximately Optimal Multi-Stream Quickest Change Detection for Gaussian Streams
This paper considers the bandit quickest change detection problem in which one stream contains a change-point that shifts its distribution by an unknown amount in an unknown direction. We consider an agent that can observe only a single stream at each time, and the goal of the agent is to detect this change as quickly as possible while controlling for false alarms. We propose an algorithm that combines a decaying-$ε$-greedy stream switching rule with an efficient change-point detection algorithm for unknown post-change means. We provide bounds on the expected detection delay and average run length to false alarm for our algorithm, and based on these results we prove our algorithm is approximately optimal with respect to a commonly used surrogate. This work is the first to provide provable guarantees in this setting without strong assumptions such as a discretized post-change parameter set or a lower bound on the magnitude of change.
Motion Planning with Metric Temporal Logic Using Reachability Analysis and Hybrid Zonotopes
Metric temporal logic (MTL) provides a formal framework for defining time-dependent mission requirements on autonomous vehicles. However, optimizing control decisions subject to these constraints is often computationally expensive. This article presents a method that uses reachability analysis to implicitly express the set of states satisfying an MTL specification and then optimizes to find a motion plan. The hybrid zonotope set representation is used to efficiently and conveniently encode MTL specifications into reachable sets. A numerical benchmark highlights the proposed method's computational advantages as compared to existing methods in the literature. Further numerical examples and an experimental application demonstrate the ability to address time-varying environments, region-dependent disturbances, and multi-agent coordination.
SmartMeterFM: Unifying Smart Meter Data Generative Tasks Using Flow Matching Models
Smart meter data is the foundation for planning and operating the distribution network. Unfortunately, such data are not always available due to privacy regulations. Meanwhile, the collected data may be corrupted due to sensor or transmission failure, or it may not have sufficient resolution for downstream tasks. A wide range of generative tasks is formulated to address these issues, including synthetic data generation, missing data imputation, and super-resolution. Despite the success of machine learning models on these tasks, dedicated models need to be designed and trained for each task, leading to redundancy and inefficiency. In this paper, by recognizing the powerful modeling capability of flow matching models, we propose a new approach to unify diverse smart meter data generative tasks with a single model trained for conditional generation. The proposed flow matching models are trained to generate challenging, high-dimensional time series data, specifically monthly smart meter data at a 15 min resolution. By viewing different generative tasks as distinct forms of partial data observations and injecting them into the generation process, we unify tasks such as imputation and super-resolution with a single model, eliminating the need for re-training. The data generated by our model not only are consistent with the given observations but also remain realistic, showing better performance against interpolation and other machine learning based baselines dedicated to the tasks.
comment: 10 pages, 6 figures, 6 tables
Finite-Time Accuracy of Temporal-Difference Learning Under Schur-Stable Recursions
Temporal difference (TD) learning is a cornerstone reinforcement learning (RL) method for policy evaluation, where the goal is to estimate the value function of a Markov decision process under a fixed policy. While a substantial body of work has established its convergence and stability properties, more recent efforts have focused on its statistical efficiency through finite-time error bounds. In this paper, we advance this line of research by developing a new finite-time error analysis for tabular TD learning that directly exploits a discrete-time stochastic linear system representation and leverages Schur stability of the associated matrices. Beyond the specific bounds obtained, the proposed framework provides a reusable template for analyzing TD learning and related RL algorithms, and it offers control-theoretic insights that may guide future developments in finite-sample RL theory.
comment: arXiv admin note: text overlap with arXiv:2112.14417
A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions
Reinforcement learning (RL) has proven to be particularly effective in solving complex decision-making problems for a wide range of applications. From a control theory perspective, RL can be considered as an adaptive optimal control scheme. Lyapunov and barrier functions are the most commonly used certificates to guarantee system stability for a proposed/derived controller and constraint satisfaction guarantees, respectively, in control theoretic approaches. However, compared to theoretical guarantees available in control theoretic methods, RL lacks closed-loop stability of a computed policy and constraint satisfaction guarantees. Safe reinforcement learning refers to a class of constrained problems where the constraint violations lead to partial or complete system failure. The goal of this review is to provide an overview of safe RL techniques using Lyapunov and barrier functions to guarantee this notion of safety discussed (stability of the system in terms of a computed policy and constraint satisfaction during training and deployment). The different approaches employed are discussed in detail along with their shortcomings and benefits to provide critique and possible future research directions. Key motivation for this review is to discuss current theoretical approaches for safety and stability guarantees in RL similar to control theoretic approaches using Lyapunov and barrier functions. The review provides proven potential and promising scope of providing safety guarantees for complex dynamical systems with operational constraints using model-based and model-free RL.
comment: pages - 51, figures - 10
Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making
Offline decision-making via diffusion models often produces trajectories that are misaligned with system dynamics, limiting their reliability for control. We propose Model Predictive Diffuser (MPDiffuser), a compositional diffusion framework that combines a diffusion planner with a dynamics diffusion model to generate task-aligned and dynamically plausible trajectories. MPDiffuser interleaves planner and dynamics updates during sampling, progressively correcting feasibility while preserving task intent. A lightweight ranking module then selects trajectories that best satisfy task objectives. The compositional design improves sample efficiency and adaptability by enabling the dynamics model to leverage diverse and previously unseen data independently of the planner. Empirically, we demonstrate consistent improvements over prior diffusion-based methods on unconstrained (D4RL) and constrained (DSRL) benchmarks, and validate practicality through deployment on a real quadrupedal robot.
A Selective Quantization Tuner for ONNX Models
Quantization reduces the precision of deep neural networks to lower model size and computational demands, but often at the expense of accuracy. Fully quantized models can suffer significant accuracy degradation, and resource-constrained hardware accelerators may not support all quantized operations. A common workaround is selective quantization, where only some layers are quantized while others remain at full precision. However, determining the optimal balance between accuracy and efficiency is a challenging task. To this direction, we propose SeQTO, a framework that enables selective quantization, deployment, and execution of ONNX models on diverse CPU and GPU devices, combined with profiling and multi-objective optimization. SeQTO generates selectively quantized models, deploys them across hardware accelerators, evaluates performance on metrics such as accuracy and size, applies Pareto Front-based objective minimization to identify optimal candidates, and provides visualization of results. We evaluated SeQTO on four ONNX models under two quantization settings across CPU and GPU devices. Our results show that SeQTO effectively identifies high-quality selectively quantized models, achieving up to 54.14% lower accuracy loss while maintaining up to 98.18% of size reduction compared to fully quantized models.
comment: 5 pages, 3 figures, 2 tables
DiTOX: Fault Detection and Localization in the ONNX Optimizer
The ONNX Optimizer, part of the official ONNX repository and widely adopted for graph-level model optimizations, is used by default to optimize ONNX models. Despite its popularity, its ability to preserve model correctness has not been systematically evaluated. We present DiTOX, an automated framework for comprehensively assessing the correctness of the ONNX Optimizer using differential testing, fault localization, and evaluation techniques that generalize to other compiler optimizers. DiTOX applies optimization passes to a corpus of ONNX models, executes both original and optimized versions on user-defined inputs, and detects discrepancies in behavior or optimizer failures. When divergences are observed, DiTOX isolates the responsible optimization pass through iterative, fine-grained analysis. We evaluated DiTOX on 130 models from the ONNX Model Hub spanning vision and language tasks. We found that 9.2% of model instances crashed the optimizer or produced invalid models under default settings. Moreover, output discrepancies occurred in 30% of classification models and 16.6% of object detection and segmentation models, while text-based models were largely robust. Overall, DiTOX uncovered 15 issues -- 14 previously unknown -- affecting 9 of the 47 optimization passes as well as the optimizer infrastructure. All issues were reported to the ONNX Optimizer developers. Our results demonstrate that DiTOX provides a simple and effective approach for validating AI model optimizers and is readily extensible beyond ONNX.
comment: 13 pages, 2 figures, 4 tables
Collision-free Source Seeking and Flocking Control of Multi-agents with Connectivity Preservation
In this article, we present a distributed source-seeking and flocking control method for networked multi-agent systems with non-holonomic constraints. Based solely on identical on-board sensor systems, which measure the source local field, the group objective is attained by appointing a leader agent to seek the source while the remaining follower agents safely form a cohesive flocking with their neighbors using a distributed flocking control law in a connectivity-preserved undirected network. To guarantee safe separation and group motion for all agents and to solve the conflicts with the "cohesion" flocking rule of Reynolds, the distributed control algorithm is solved individually through feasible CBF-based optimization problem with complex constraints, which guarantees the inter-agent collision avoidance and connectivity preservation. Stability analysis of the closed-loop system is presented and the efficacy of the methods is shown in simulation results.
comment: Published in IEEE Transactions on Automatic Control
ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety
Vulnerable road users (VRUs) face high collision risks in mixed traffic, yet most existing safety systems prioritize driver or vehicle assistance over direct VRU support. This paper presents ARCAS, a real-time augmented reality (AR) collision avoidance system that provides personalized spatial alerts to VRUs via wearable AR headsets. By fusing roadside 360° 3D LiDAR with SLAM-based headset tracking and an automatic 3D calibration procedure, ARCAS accurately overlays world-locked 3D bounding boxes and directional arrows onto approaching hazards in the user's passthrough view. The system also enables multi-headset coordination through shared world anchoring. Evaluated in real-world pedestrian interactions with e-scooters and vehicles (180 trials), ARCAS nearly doubles pedestrians' time to collision and increases counterparts' reaction margins by up to 4x compared to unaided eye conditions. Results validate the feasibility and effectiveness of LiDAR-driven AR guidance and highlight the potential of wearable AR as a promising next generation safety tool for urban mobility.
comment: 8 pages, 3 figures, 1 table, accepted for IEEE Intelligent Vehicles (IV) Symposium 2026
Robotics
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
Manipulating dynamic objects remains an open challenge for Vision-Language-Action (VLA) models, which, despite strong generalization in static manipulation, struggle in dynamic scenarios requiring rapid perception, temporal anticipation, and continuous control. We present DynamicVLA, a framework for dynamic object manipulation that integrates temporal reasoning and closed-loop adaptation through three key designs: 1) a compact 0.4B VLA using a convolutional vision encoder for spatially efficient, structurally faithful encoding, enabling fast multimodal inference; 2) Continuous Inference, enabling overlapping reasoning and execution for lower latency and timely adaptation to object motion; and 3) Latent-aware Action Streaming, which bridges the perception-execution gap by enforcing temporally aligned action execution. To fill the missing foundation of dynamic manipulation data, we introduce the Dynamic Object Manipulation (DOM) benchmark, built from scratch with an auto data collection pipeline that efficiently gathers 200K synthetic episodes across 2.8K scenes and 206 objects, and enables fast collection of 2K real-world episodes without teleoperation. Extensive evaluations demonstrate remarkable improvements in response speed, perception, and generalization, positioning DynamicVLA as a unified framework for general dynamic object manipulation across embodiments.
comment: Project Page: https://www.infinitescript.com/project/dynamic-vla/ GitHub: https://github.com/hzxie/DynamicVLA
ReactEMG Stroke: Healthy-to-Stroke Few-shot Adaptation for sEMG-Based Intent Detection
Surface electromyography (sEMG) is a promising control signal for assist-as-needed hand rehabilitation after stroke, but detecting intent from paretic muscles often requires lengthy, subject-specific calibration and remains brittle to variability. We propose a healthy-to-stroke adaptation pipeline that initializes an intent detector from a model pretrained on large-scale able-bodied sEMG, then fine-tunes it for each stroke participant using only a small amount of subject-specific data. Using a newly collected dataset from three individuals with chronic stroke, we compare adaptation strategies (head-only tuning, parameter-efficient LoRA adapters, and full end-to-end fine-tuning) and evaluate on held-out test sets that include realistic distribution shifts such as within-session drift, posture changes, and armband repositioning. Across conditions, healthy-pretrained adaptation consistently improves stroke intent detection relative to both zero-shot transfer and stroke-only training under the same data budget; the best adaptation methods improve average transition accuracy from 0.42 to 0.61 and raw accuracy from 0.69 to 0.78. These results suggest that transferring a reusable healthy-domain EMG representation can reduce calibration burden while improving robustness for real-time post-stroke intent detection.
mjlab: A Lightweight Framework for GPU-Accelerated Robot Learning
We present mjlab, a lightweight, open-source framework for robot learning that combines GPU-accelerated simulation with composable environments and minimal setup friction. mjlab adopts the manager-based API introduced by Isaac Lab, where users compose modular building blocks for observations, rewards, and events, and pairs it with MuJoCo Warp for GPU-accelerated physics. The result is a framework installable with a single command, requiring minimal dependencies, and providing direct access to native MuJoCo data structures. mjlab ships with reference implementations of velocity tracking, motion imitation, and manipulation tasks.
comment: Code is available at https://github.com/mujocolab/mjlab
PocketDP3: Efficient Pocket-Scale 3D Visuomotor Policy
Recently, 3D vision-based diffusion policies have shown strong capability in learning complex robotic manipulation skills. However, a common architectural mismatch exists in these models: a tiny yet efficient point-cloud encoder is often paired with a massive decoder. Given a compact scene representation, we argue that this may lead to substantial parameter waste in the decoder. Motivated by this observation, we propose PocketDP3, a pocket-scale 3D diffusion policy that replaces the heavy conditional U-Net decoder used in prior methods with a lightweight Diffusion Mixer (DiM) built on MLP-Mixer blocks. This architecture enables efficient fusion across temporal and channel dimensions, significantly reducing model size. Notably, without any additional consistency distillation techniques, our method supports two-step inference without sacrificing performance, improving practicality for real-time deployment. Across three simulation benchmarks--RoboTwin2.0, Adroit, and MetaWorld--PocketDP3 achieves state-of-the-art performance with fewer than 1% of the parameters of prior methods, while also accelerating inference. Real-world experiments further demonstrate the practicality and transferability of our method in real-world settings. Code will be released.
Causal World Modeling for Robot Control
This work highlights that video world modeling, alongside vision-language pre-training, establishes a fresh and independent foundation for robot learning. Intuitively, video world models provide the ability to imagine the near future by understanding the causality between actions and visual dynamics. Inspired by this, we introduce LingBot-VA, an autoregressive diffusion framework that learns frame prediction and policy execution simultaneously. Our model features three carefully crafted designs: (1) a shared latent space, integrating vision and action tokens, driven by a Mixture-of-Transformers (MoT) architecture, (2) a closed-loop rollout mechanism, allowing for ongoing acquisition of environmental feedback with ground-truth observations, (3) an asynchronous inference pipeline, parallelizing action prediction and motor execution to support efficient control. We evaluate our model on both simulation benchmarks and real-world scenarios, where it shows significant promise in long-horizon manipulation, data efficiency in post-training, and strong generalizability to novel configurations. The code and model are made publicly available to facilitate the community.
comment: Project page: https://technology.robbyant.com/lingbot-va Code: https://github.com/robbyant/lingbot-va
Generalized Information Gathering Under Dynamics Uncertainty
An agent operating in an unknown dynamical system must learn its dynamics from observations. Active information gathering accelerates this learning, but existing methods derive bespoke costs for specific modeling choices: dynamics models, belief update procedures, observation models, and planners. We present a unifying framework that decouples these choices from the information-gathering cost by explicitly exposing the causal dependencies between parameters, beliefs, and controls. Using this framework, we derive a general information-gathering cost based on Massey's directed information that assumes only Markov dynamics with additive noise and is otherwise agnostic to modeling choices. We prove that the mutual information cost used in existing literature is a special case of our cost. Then, we leverage our framework to establish an explicit connection between the mutual information cost and information gain in linearized Bayesian estimation, thereby providing theoretical justification for mutual information-based active learning approaches. Finally, we illustrate the practical utility of our framework through experiments spanning linear, nonlinear, and multi-agent systems.
Macro-Scale Electrostatic Origami Motor
Foldable robots have been an active area of robotics research due to their high volume-to-mass ratio, easy packability, and shape adaptability. For locomotion, previously developed foldable robots have either embedded linear actuators in, or attached non-folding rotary motors to, their structure. Further, those actuators directly embedded in the structure of the folding medium all contributed to linear or folding motion, not to continuous rotary motion. On the macro-scale there has not yet been a folding continuous rotary actuator. This paper details the development and testing of the first macro-scale origami rotary motor that can be folded flat, and then unfurled to operate. Using corona discharge for torque production, the prototype motor achieved an expansion ratio of 2.5:1, reached a top speed of 1440 rpm when driven at -29 kV, and exhibited a maximum output torque over 0.15 mN m with an active component torque density of 0.04 Nm/kg.
MoE-ACT: Improving Surgical Imitation Learning Policies through Supervised Mixture-of-Experts
Imitation learning has achieved remarkable success in robotic manipulation, yet its application to surgical robotics remains challenging due to data scarcity, constrained workspaces, and the need for an exceptional level of safety and predictability. We present a supervised Mixture-of-Experts (MoE) architecture designed for phase-structured surgical manipulation tasks, which can be added on top of any autonomous policy. Unlike prior surgical robot learning approaches that rely on multi-camera setups or thousands of demonstrations, we show that a lightweight action decoder policy like Action Chunking Transformer (ACT) can learn complex, long-horizon manipulation from less than 150 demonstrations using solely stereo endoscopic images, when equipped with our architecture. We evaluate our approach on the collaborative surgical task of bowel grasping and retraction, where a robot assistant interprets visual cues from a human surgeon, executes targeted grasping on deformable tissue, and performs sustained retraction. We benchmark our method against state-of-the-art Vision-Language-Action (VLA) models and the standard ACT baseline. Our results show that generalist VLAs fail to acquire the task entirely, even under standard in-distribution conditions. Furthermore, while standard ACT achieves moderate success in-distribution, adopting a supervised MoE architecture significantly boosts its performance, yielding higher success rates in-distribution and demonstrating superior robustness in out-of-distribution scenarios, including novel grasp locations, reduced illumination, and partial occlusions. Notably, it generalizes to unseen testing viewpoints and also transfers zero-shot to ex vivo porcine tissue without additional training, offering a promising pathway toward in vivo deployment. To support this, we present qualitative preliminary results of policy roll-outs during in vivo porcine surgery.
Information Filtering via Variational Regularization for Robot Manipulation
Diffusion-based visuomotor policies built on 3D visual representations have achieved strong performance in learning complex robotic skills. However, most existing methods employ an oversized denoising decoder. While increasing model capacity can improve denoising, empirical evidence suggests that it also introduces redundancy and noise in intermediate feature blocks. Crucially, we find that randomly masking backbone features at inference time (without changing training) can improve performance, confirming the presence of task-irrelevant noise in intermediate features. To this end, we propose Variational Regularization (VR), a lightweight module that imposes a timestep-conditioned Gaussian over backbone features and applies a KL-divergence regularizer, forming an adaptive information bottleneck. Extensive experiments on three simulation benchmarks (RoboTwin2.0, Adroit, and MetaWorld) show that, compared to the baseline DP3, our approach improves the success rate by 6.1% on RoboTwin2.0 and by 4.1% on Adroit and MetaWorld, achieving new state-of-the-art results. Real-world experiments further demonstrate that our method performs well in practical deployments. Code will released.
Multi-Modular MANTA-RAY: A Modular Soft Surface Platform for Distributed Multi-Object Manipulation
Manipulation surfaces control objects by actively deforming their shape rather than directly grasping them. While dense actuator arrays can generate complex deformations, they also introduce high degrees of freedom (DOF), increasing system complexity and limiting scalability. The MANTA-RAY (Manipulation with Adaptive Non-rigid Textile Actuation with Reduced Actuation densitY) platform addresses these challenges by leveraging a soft, fabric-based surface with reduced actuator density to manipulate fragile and heterogeneous objects. Previous studies focused on single-module implementations supported by four actuators, whereas the feasibility and benefits of a scalable, multi-module configuration remain unexplored. In this work, we present a distributed, modular, and scalable variant of the MANTA-RAY platform that maintains manipulation performance with a reduced actuator density. The proposed multi-module MANTA-RAY platform and control strategy employs object passing between modules and a geometric transformation driven PID controller that directly maps tilt-angle control outputs to actuator commands, eliminating the need for extensive data-driven or black-box training. We evaluate system performance in simulation across surface configurations of varying modules (3x3 and 4x4) and validate its feasibility through experiments on a physical 2x2 hardware prototype. The system successfully manipulates objects with diverse geometries, masses, and textures including fragile items such as eggs and apples as well as enabling parallel manipulation. The results demonstrate that the multi-module MANTA-RAY improves scalability and enables coordinated manipulation of multiple objects across larger areas, highlighting its potential for practical, real-world applications.
comment: 8 pages
LLM-Driven Scenario-Aware Planning for Autonomous Driving
Hybrid planner switching framework (HPSF) for autonomous driving needs to reconcile high-speed driving efficiency with safe maneuvering in dense traffic. Existing HPSF methods often fail to make reliable mode transitions or sustain efficient driving in congested environments, owing to heuristic scene recognition and low-frequency control updates. To address the limitation, this paper proposes LAP, a large language model (LLM) driven, adaptive planning method, which switches between high-speed driving in low-complexity scenes and precise driving in high-complexity scenes, enabling high qualities of trajectory generation through confined gaps. This is achieved by leveraging LLM for scene understanding and integrating its inference into the joint optimization of mode configuration and motion planning. The joint optimization is solved using tree-search model predictive control and alternating minimization. We implement LAP by Python in Robot Operating System (ROS). High-fidelity simulation results show that the proposed LAP outperforms other benchmarks in terms of both driving time and success rate.
GAZELOAD A Multimodal Eye-Tracking Dataset for Mental Workload in Industrial Human-Robot Collaboration
This article describes GAZELOAD, a multimodal dataset for mental workload estimation in industrial human-robot collaboration. The data were collected in a laboratory assembly testbed where 26 participants interacted with two collaborative robots (UR5 and Franka Emika Panda) while wearing Meta ARIA smart glasses. The dataset time-synchronizes eye-tracking signals (pupil diameter, fixations, saccades, eye gaze, gaze transition entropy, fixation dispersion index) with environmental real-time and continuous measurements (illuminance) and task and robot context (bench, task block, induced faults), under controlled manipulations of task difficulty and ambient conditions. For each participant and workload-graded task block, we provide CSV files with ocular metrics aggregated into 250 ms windows, environmental logs, and self-reported mental workload ratings on a 1-10 Likert scale, organized in participant-specific folders alongside documentation. These data can be used to develop and benchmark algorithms for mental workload estimation, feature extraction, and temporal modeling in realistic industrial HRC scenarios, and to investigate the influence of environmental factors such as lighting on eye-based workload markers.
Flocking behavior for dynamic and complex swarm structures
Maintaining the formation of complex structures with multiple UAVs and achieving complex trajectories remains a major challenge. This work presents an algorithm for implementing the flocking behavior of UAVs based on the concept of Virtual Centroid to easily develop a structure for the flock. The approach builds on the classical virtual-based behavior, providing a theoretical framework for incorporating enhancements to dynamically control both the number of agents and the formation of the structure. Simulation tests and real-world experiments were conducted, demonstrating its simplicity even with complex formations and complex trajectories.
Disentangling perception and reasoning for improving data efficiency in learning cloth manipulation without demonstrations
Cloth manipulation is a ubiquitous task in everyday life, but it remains an open challenge for robotics. The difficulties in developing cloth manipulation policies are attributed to the high-dimensional state space, complex dynamics, and high propensity to self-occlusion exhibited by fabrics. As analytical methods have not been able to provide robust and general manipulation policies, reinforcement learning (RL) is considered a promising approach to these problems. However, to address the large state space and complex dynamics, data-based methods usually rely on large models and long training times. The resulting computational cost significantly hampers the development and adoption of these methods. Additionally, due to the challenge of robust state estimation, garment manipulation policies often adopt an end-to-end learning approach with workspace images as input. While this approach enables a conceptually straightforward sim-to-real transfer via real-world fine-tuning, it also incurs a significant computational cost by training agents on a highly lossy representation of the environment state. This paper questions this common design choice by exploring an efficient and modular approach to RL for cloth manipulation. We show that, through careful design choices, model size and training time can be significantly reduced when learning in simulation. Furthermore, we demonstrate how the resulting simulation-trained model can be transferred to the real world. We evaluate our approach on the SoftGym benchmark and achieve significant performance improvements over available baselines on our task, while using a substantially smaller model.
comment: 6 pages, 4 figures,
CoFreeVLA: Collision-Free Dual-Arm Manipulation via Vision-Language-Action Model and Risk Estimation
Vision Language Action (VLA) models enable instruction following manipulation, yet dualarm deployment remains unsafe due to under modeled selfcollisions between arms and grasped objects. We introduce CoFreeVLA, which augments an endtoend VLA with a short horizon selfcollision risk estimator that predicts collision likelihood from proprioception, visual embeddings, and planned actions. The estimator gates risky commands, recovers to safe states via risk-guided adjustments, and shapes policy refinement for safer rollouts. It is pre-trained with model-based collision labels and posttrained on real robot rollouts for calibration. On five bimanual tasks with the PiPER robot arm, CoFreeVLA reduces selfcollisions and improves success rates versus RDT and APEX.
From Instruction to Event: Sound-Triggered Mobile Manipulation
Current mobile manipulation research predominantly follows an instruction-driven paradigm, where agents rely on predefined textual commands to execute tasks. However, this setting confines agents to a passive role, limiting their autonomy and ability to react to dynamic environmental events. To address these limitations, we introduce sound-triggered mobile manipulation, where agents must actively perceive and interact with sound-emitting objects without explicit action instructions. To support these tasks, we develop Habitat-Echo, a data platform that integrates acoustic rendering with physical interaction. We further propose a baseline comprising a high-level task planner and low-level policy models to complete these tasks. Extensive experiments show that the proposed baseline empowers agents to actively detect and respond to auditory events, eliminating the need for case-by-case instructions. Notably, in the challenging dual-source scenario, the agent successfully isolates the primary source from overlapping acoustic interference to execute the first interaction, and subsequently proceeds to manipulate the secondary object, verifying the robustness of the baseline.
AIR-VLA: Vision-Language-Action Systems for Aerial Manipulation
While Vision-Language-Action (VLA) models have achieved remarkable success in ground-based embodied intelligence, their application to Aerial Manipulation Systems (AMS) remains a largely unexplored frontier. The inherent characteristics of AMS, including floating-base dynamics, strong coupling between the UAV and the manipulator, and the multi-step, long-horizon nature of operational tasks, pose severe challenges to existing VLA paradigms designed for static or 2D mobile bases. To bridge this gap, we propose AIR-VLA, the first VLA benchmark specifically tailored for aerial manipulation. We construct a physics-based simulation environment and release a high-quality multimodal dataset comprising 3000 manually teleoperated demonstrations, covering base manipulation, object & spatial understanding, semantic reasoning, and long-horizon planning. Leveraging this platform, we systematically evaluate mainstream VLA models and state-of-the-art VLM models. Our experiments not only validate the feasibility of transferring VLA paradigms to aerial systems but also, through multi-dimensional metrics tailored to aerial tasks, reveal the capabilities and boundaries of current models regarding UAV mobility, manipulator control, and high-level planning. AIR-VLA establishes a standardized testbed and data foundation for future research in general-purpose aerial robotics. The resource of AIR-VLA will be available at https://anonymous.4open.science/r/AIR-VLA-dataset-B5CC/.
EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots
The field of Embodied AI is witnessing a rapid evolution toward general-purpose robotic systems, fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains severely bottlenecked by a reliance on labor-intensive manual oversight from intricate reward shaping to hyperparameter tuning across heterogeneous backends. Inspired by LLMs' success in software automation and science discovery, we introduce \textsc{EmboCoach-Bench}, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies. Spanning 32 expert-curated RL and IL tasks, our framework posits executable code as the universal interface. We move beyond static generation to assess a dynamic closed-loop workflow, where agents leverage environment feedback to iteratively draft, debug, and optimize solutions, spanning improvements from physics-informed reward design to policy architectures such as diffusion policies. Extensive evaluations yield three critical insights: (1) autonomous agents can qualitatively surpass human-engineered baselines by 26.5\% in average success rate; (2) agentic workflow with environment feedback effectively strengthens policy development and substantially narrows the performance gap between open-source and proprietary models; and (3) agents exhibit self-correction capabilities for pathological engineering cases, successfully resurrecting task performance from near-total failures through iterative simulation-in-the-loop debugging. Ultimately, this work establishes a foundation for self-evolving embodied intelligence, accelerating the paradigm shift from labor-intensive manual tuning to scalable, autonomous engineering in embodied AI field.
comment: 37 pages, 13 figures
Training slow silicon neurons to control extremely fast robots with spiking reinforcement learning
Air hockey demands split-second decisions at high puck velocities, a challenge we address with a compact network of spiking neurons running on a mixed-signal analog/digital neuromorphic processor. By co-designing hardware and learning algorithms, we train the system to achieve successful puck interactions through reinforcement learning in a remarkably small number of trials. The network leverages fixed random connectivity to capture the task's temporal structure and adopts a local e-prop learning rule in the readout layer to exploit event-driven activity for fast and efficient learning. The result is real-time learning with a setup comprising a computer and the neuromorphic chip in-the-loop, enabling practical training of spiking neural networks for robotic autonomous systems. This work bridges neuroscience-inspired hardware with real-world robotic control, showing that brain-inspired approaches can tackle fast-paced interaction tasks while supporting always-on learning in intelligent machines.
IROS: A Dual-Process Architecture for Real-Time VLM-Based Indoor Navigation
Indoor mobile robot navigation requires fast responsiveness and robust semantic understanding, yet existing methods struggle to provide both. Classical geometric approaches such as SLAM offer reliable localization but depend on detailed maps and cannot interpret human-targeted cues (e.g., signs, room numbers) essential for indoor reasoning. Vision-Language-Action (VLA) models introduce semantic grounding but remain strictly reactive, basing decisions only on visible frames and failing to anticipate unseen intersections or reason about distant textual cues. Vision-Language Models (VLMs) provide richer contextual inference but suffer from high computational latency, making them unsuitable for real-time operation on embedded platforms. In this work, we present IROS, a real-time navigation framework that combines VLM-level contextual reasoning with the efficiency of lightweight perceptual modules on low-cost, on-device hardware. Inspired by Dual Process Theory, IROS separates fast reflexive decisions (System One) from slow deliberative reasoning (System Two), invoking the VLM only when necessary. Furthermore, by augmenting compact VLMs with spatial and textual cues, IROS delivers robust, human-like navigation with minimal latency. Across five real-world buildings, IROS improves decision accuracy and reduces latency by 66% compared to continuous VLM-based navigation.
Don't double it: Efficient Agent Prediction in Occlusions
Occluded traffic agents pose a significant challenge for autonomous vehicles, as hidden pedestrians or vehicles can appear unexpectedly, yet this problem remains understudied. Existing learning-based methods, while capable of inferring the presence of hidden agents, often produce redundant occupancy predictions where a single agent is identified multiple times. This issue complicates downstream planning and increases computational load. To address this, we introduce MatchInformer, a novel transformer-based approach that builds on the state-of-the-art SceneInformer architecture. Our method improves upon prior work by integrating Hungarian Matching, a state-of-the-art object matching algorithm from object detection, into the training process to enforce a one-to-one correspondence between predictions and ground truth, thereby reducing redundancy. We further refine trajectory forecasts by decoupling an agent's heading from its motion, a strategy that improves the accuracy and interpretability of predicted paths. To better handle class imbalances, we propose using the Matthews Correlation Coefficient (MCC) to evaluate occupancy predictions. By considering all entries in the confusion matrix, MCC provides a robust measure even in sparse or imbalanced scenarios. Experiments on the Waymo Open Motion Dataset demonstrate that our approach improves reasoning about occluded regions and produces more accurate trajectory forecasts than prior methods.
DexTac: Learning Contact-aware Visuotactile Policies via Hand-by-hand Teaching
For contact-intensive tasks, the ability to generate policies that produce comprehensive tactile-aware motions is essential. However, existing data collection and skill learning systems for dexterous manipulation often suffer from low-dimensional tactile information. To address this limitation, we propose DexTac, a visuo-tactile manipulation learning framework based on kinesthetic teaching. DexTac captures multi-dimensional tactile data-including contact force distributions and spatial contact regions-directly from human demonstrations. By integrating these rich tactile modalities into a policy network, the resulting contact-aware agent enables a dexterous hand to autonomously select and maintain optimal contact regions during complex interactions. We evaluate our framework on a challenging unimanual injection task. Experimental results demonstrate that DexTac achieves a 91.67% success rate. Notably, in high-precision scenarios involving small-scale syringes, our approach outperforms force-only baselines by 31.67%. These results underscore that learning multi-dimensional tactile priors from human demonstrations is critical for achieving robust, human-like dexterous manipulation in contact-rich environments.
4D-CAAL: 4D Radar-Camera Calibration and Auto-Labeling for Autonomous Driving
4D radar has emerged as a critical sensor for autonomous driving, primarily due to its enhanced capabilities in elevation measurement and higher resolution compared to traditional 3D radar. Effective integration of 4D radar with cameras requires accurate extrinsic calibration, and the development of radar-based perception algorithms demands large-scale annotated datasets. However, existing calibration methods often employ separate targets optimized for either visual or radar modalities, complicating correspondence establishment. Furthermore, manually labeling sparse radar data is labor-intensive and unreliable. To address these challenges, we propose 4D-CAAL, a unified framework for 4D radar-camera calibration and auto-labeling. Our approach introduces a novel dual-purpose calibration target design, integrating a checkerboard pattern on the front surface for camera detection and a corner reflector at the center of the back surface for radar detection. We develop a robust correspondence matching algorithm that aligns the checkerboard center with the strongest radar reflection point, enabling accurate extrinsic calibration. Subsequently, we present an auto-labeling pipeline that leverages the calibrated sensor relationship to transfer annotations from camera-based segmentations to radar point clouds through geometric projection and multi-feature optimization. Extensive experiments demonstrate that our method achieves high calibration accuracy while significantly reducing manual annotation effort, thereby accelerating the development of robust multi-modal perception systems for autonomous driving.
Nimbus: A Unified Embodied Synthetic Data Generation Framework
Scaling data volume and diversity is critical for generalizing embodied intelligence. While synthetic data generation offers a scalable alternative to expensive physical data acquisition, existing pipelines remain fragmented and task-specific. This isolation leads to significant engineering inefficiency and system instability, failing to support the sustained, high-throughput data generation required for foundation model training. To address these challenges, we present Nimbus, a unified synthetic data generation framework designed to integrate heterogeneous navigation and manipulation pipelines. Nimbus introduces a modular four-layer architecture featuring a decoupled execution model that separates trajectory planning, rendering, and storage into asynchronous stages. By implementing dynamic pipeline scheduling, global load balancing, distributed fault tolerance, and backend-specific rendering optimizations, the system maximizes resource utilization across CPU, GPU, and I/O resources. Our evaluation demonstrates that Nimbus achieves a 2-3X improvement in end-to-end throughput compared to unoptimized baselines and ensuring robust, long-term operation in large-scale distributed environments. This framework serves as the production backbone for the InternData suite, enabling seamless cross-domain data synthesis.
Spotlighting Task-Relevant Features: Object-Centric Representations for Better Generalization in Robotic Manipulation
The generalization capabilities of robotic manipulation policies are heavily influenced by the choice of visual representations. Existing approaches typically rely on representations extracted from pre-trained encoders, using two dominant types of features: global features, which summarize an entire image via a single pooled vector, and dense features, which preserve a patch-wise embedding from the final encoder layer. While widely used, both feature types mix task-relevant and irrelevant information, leading to poor generalization under distribution shifts, such as changes in lighting, textures, or the presence of distractors. In this work, we explore an intermediate structured alternative: Slot-Based Object-Centric Representations (SBOCR), which group dense features into a finite set of object-like entities. This representation permits to naturally reduce the noise provided to the robotic manipulation policy while keeping enough information to efficiently perform the task. We benchmark a range of global and dense representations against intermediate slot-based representations, across a suite of simulated and real-world manipulation tasks ranging from simple to complex. We evaluate their generalization under diverse visual conditions, including changes in lighting, texture, and the presence of distractors. Our findings reveal that SBOCR-based policies outperform dense and global representation-based policies in generalization settings, even without task-specific pretraining. These insights suggest that SBOCR is a promising direction for designing visual systems that generalize effectively in dynamic, real-world robotic environments.
Singularity-Free Lie Group Integration and Geometrically Consistent Evaluation of Multibody System Models Described in Terms of Standard Absolute Coordinates
A classical approach to the multibody systems (MBS) modeling is to use absolute coordinates, i.e., a set of (possibly redundant) coordinates that describe the absolute position and orientation of the individual bodies with respect to an inertial frame (IFR). A well-known problem for the time integration of the equations of motion (EOM) is the lack of a singularity-free parameterization of spatial motions, which is usually tackled by using unit quaternions. Lie group integration methods were proposed as an alternative approach to the singularity-free time integration. At the same time, Lie group formulations of EOM naturally respect the geometry of spatial motions during integration. Lie group integration methods, operating directly on the configuration space Lie group, are incompatible with standard formulations of the EOM, and cannot be implemented in existing MBS simulation codes without a major restructuring. The contribution of this paper is twofold: (1) A framework for interfacing Lie group integrators to standard EOM formulations is presented. It allows describing MBS in terms of various absolute coordinates and at the same using Lie group integration schemes. (2) A method for consistently incorporating the geometry of rigid body motions into the evaluation of EOM in absolute coordinates integrated with standard vector space integration schemes. The direct product group and the semidirect product group SO(3)xR3 and the semidirect product group SE(3) are used for representing rigid body motions. The key element is the local-global transitions (LGT) transition map, which facilitates the update of (global) absolute coordinates in terms of the (local) coordinates on the Lie group. This LGT map is specific to the absolute coordinates, the local coordinates on the Lie group, and the Lie group used to represent rigid body configurations.
comment: 10 pages
DSCD-Nav: Dual-Stance Cooperative Debate for Object Navigation
Adaptive navigation in unfamiliar indoor environments is crucial for household service robots. Despite advances in zero-shot perception and reasoning from vision-language models, existing navigation systems still rely on single-pass scoring at the decision layer, leading to overconfident long-horizon errors and redundant exploration. To tackle these problems, we propose Dual-Stance Cooperative Debate Navigation (DSCD-Nav), a decision mechanism that replaces one-shot scoring with stance-based cross-checking and evidence-aware arbitration to improve action reliability under partial observability. Specifically, given the same observation and candidate action set, we explicitly construct two stances by conditioning the evaluation on diverse and complementary objectives: a Task-Scene Understanding (TSU) stance that prioritizes goal progress from scene-layout cues, and a Safety-Information Balancing (SIB) stance that emphasizes risk and information value. The stances conduct a cooperative debate and make policy by cross-checking their top candidates with cue-grounded arguments. Then, a Navigation Consensus Arbitration (NCA) agent is employed to consolidate both sides' reasons and evidence, optionally triggering lightweight micro-probing to verify uncertain choices, preserving NCA's primary intent while disambiguating. Experiments on HM3Dv1, HM3Dv2, and MP3D demonstrate consistent improvements in success and path efficiency while reducing exploration redundancy.
Towards Space-Based Environmentally-Adaptive Grasping
Robotic manipulation in unstructured environments requires reliable execution under diverse conditions, yet many state-of-the-art systems still struggle with high-dimensional action spaces, sparse rewards, and slow generalization beyond carefully curated training scenarios. We study these limitations through the example of grasping in space environments. We learn control policies directly in a learned latent manifold that fuses (grammarizes) multiple modalities into a structured representation for policy decision-making. Building on GPU-accelerated physics simulation, we instantiate a set of single-shot manipulation tasks and achieve over 95% task success with Soft Actor-Critic (SAC)-based reinforcement learning in less than 1M environment steps, under continuously varying grasping conditions from step 1. This empirically shows faster convergence than representative state-of-the-art visual baselines under the same open-loop single-shot conditions. Our analysis indicates that explicitly reasoning in latent space yields more sample-efficient learning and improved robustness to novel object and gripper geometries, environmental clutter, and sensor configurations compared to standard baselines. We identify remaining limitations and outline directions toward fully adaptive and generalizable grasping in the extreme conditions of space.
Towards Bridging the Gap between Large-Scale Pretraining and Efficient Finetuning for Humanoid Control ICLR 2026
Reinforcement learning (RL) is widely used for humanoid control, with on-policy methods such as Proximal Policy Optimization (PPO) enabling robust training via large-scale parallel simulation and, in some cases, zero-shot deployment to real robots. However, the low sample efficiency of on-policy algorithms limits safe adaptation to new environments. Although off-policy RL and model-based RL have shown improved sample efficiency, the gap between large-scale pretraining and efficient finetuning on humanoids still exists. In this paper, we find that off-policy Soft Actor-Critic (SAC), with large-batch update and a high Update-To-Data (UTD) ratio, reliably supports large-scale pretraining of humanoid locomotion policies, achieving zero-shot deployment on real robots. For adaptation, we demonstrate that these SAC-pretrained policies can be finetuned in new environments and out-of-distribution tasks using model-based methods. Data collection in the new environment executes a deterministic policy while stochastic exploration is instead confined to a physics-informed world model. This separation mitigates the risks of random exploration during adaptation while preserving exploratory coverage for improvement. Overall, the approach couples the wall-clock efficiency of large-scale simulation during pretraining with the sample efficiency of model-based learning during fine-tuning.
comment: ICLR 2026
HPTune: Hierarchical Proactive Tuning for Collision-Free Model Predictive Control ICASSP 2026
Parameter tuning is a powerful approach to enhance adaptability in model predictive control (MPC) motion planners. However, existing methods typically operate in a myopic fashion that only evaluates executed actions, leading to inefficient parameter updates due to the sparsity of failure events (e.g., obstacle nearness or collision). To cope with this issue, we propose to extend evaluation from executed to non-executed actions, yielding a hierarchical proactive tuning (HPTune) framework that combines both a fast-level tuning and a slow-level tuning. The fast one adopts risk indicators of predictive closing speed and predictive proximity distance, and the slow one leverages an extended evaluation loss for closed-loop backpropagation. Additionally, we integrate HPTune with the Doppler LiDAR that provides obstacle velocities apart from position-only measurements for enhanced motion predictions, thus facilitating the implementation of HPTune. Extensive experiments on high-fidelity simulator demonstrate that HPTune achieves efficient MPC tuning and outperforms various baseline schemes in complex environments. It is found that HPTune enables situation-tailored motion planning by formulating a safe, agile collision avoidance strategy.
comment: Accepted by IEEE ICASSP 2026
Deep QP Safety Filter: Model-free Learning for Reachability-based Safety Filter
We introduce Deep QP Safety Filter, a fully data-driven safety layer for black-box dynamical systems. Our method learns a Quadratic-Program (QP) safety filter without model knowledge by combining Hamilton-Jacobi (HJ) reachability with model-free learning. We construct contraction-based losses for both the safety value and its derivatives, and train two neural networks accordingly. In the exact setting, the learned critic converges to the viscosity solution (and its derivative), even for non-smooth values. Across diverse dynamical systems -- even including a hybrid system -- and multiple RL tasks, Deep QP Safety Filter substantially reduces pre-convergence failures while accelerating learning toward higher returns than strong baselines, offering a principled and practical route to safe, model-free control.
comment: Accepted at L4DC 2026
Abstracting Robot Manipulation Skills via Mixture-of-Experts Diffusion Policies
Diffusion-based policies have recently shown strong results in robot manipulation, but their extension to multi-task scenarios is hindered by the high cost of scaling model size and demonstrations. We introduce Skill Mixture-of-Experts Policy (SMP), a diffusion-based mixture-of-experts policy that learns a compact orthogonal skill basis and uses sticky routing to compose actions from a small, task-relevant subset of experts at each step. A variational training objective supports this design, and adaptive expert activation at inference yields fast sampling without oversized backbones. We validate SMP in simulation and on a real dual-arm platform with multi-task learning and transfer learning tasks, where SMP achieves higher success rates and markedly lower inference cost than large diffusion baselines. These results indicate a practical path toward scalable, transferable multi-task manipulation: learn reusable skills once, activate only what is needed, and adapt quickly when tasks change.
Disturbance-Aware Flight Control of Robotic Gliding Blimp via Moving Mass Actuation
Robotic blimps, as lighter-than-air (LTA) aerial systems, offer long endurance and inherently safe operation but remain highly susceptible to wind disturbances. Building on recent advances in moving mass actuation, this paper addresses the lack of disturbance-aware control frameworks for LTA platforms by explicitly modeling and compensating for wind-induced effects. A moving horizon estimator (MHE) infers real-time wind perturbations and provides these estimates to a model predictive controller (MPC), enabling robust trajectory and heading regulation under varying wind conditions. The proposed approach leverages a two-degree-of-freedom (2-DoF) moving-mass mechanism to generate both inertial and aerodynamic moments for attitude and heading control, thereby enhancing flight stability in disturbance-prone environments. Extensive flight experiments under headwind and crosswind conditions show that the integrated MHE-MPC framework significantly outperforms baseline PID control, demonstrating its effectiveness for disturbance-aware LTA flight.
InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios
With the rapid development of industrial intelligence and unmanned inspection, reliable perception and safety assessment for AI systems in complex and dynamic industrial sites has become a key bottleneck for deploying predictive maintenance and autonomous inspection. Most public datasets remain limited by simulated data sources, single-modality sensing, or the absence of fine-grained object-level annotations, which prevents robust scene understanding and multimodal safety reasoning for industrial foundation models. To address these limitations, InspecSafe-V1 is released as the first multimodal benchmark dataset for industrial inspection safety assessment that is collected from routine operations of real inspection robots in real-world environments. InspecSafe-V1 covers five representative industrial scenarios, including tunnels, power facilities, sintering equipment, oil and gas petrochemical plants, and coal conveyor trestles. The dataset is constructed from 41 wheeled and rail-mounted inspection robots operating at 2,239 valid inspection sites, yielding 5,013 inspection instances. For each instance, pixel-level segmentation annotations are provided for key objects in visible-spectrum images. In addition, a semantic scene description and a corresponding safety level label are provided according to practical inspection tasks. Seven synchronized sensing modalities are further included, including infrared video, audio, depth point clouds, radar point clouds, gas measurements, temperature, and humidity, to support multimodal anomaly recognition, cross-modal fusion, and comprehensive safety assessment in industrial environments.
comment: 15 pages, 7 figures
WheelArm-Sim: A Manipulation and Navigation Combined Multimodal Synthetic Data Generation Simulator for Unified Control in Assistive Robotics
Wheelchairs and robotic arms enhance independent living by assisting individuals with upper-body and mobility limitations in their activities of daily living (ADLs). Although recent advancements in assistive robotics have focused on Wheelchair-Mounted Robotic Arms (WMRAs) and wheelchairs separately, integrated and unified control of the combination using machine learning models remains largely underexplored. To fill this gap, we introduce the concept of WheelArm, an integrated cyber-physical system (CPS) that combines wheelchair and robotic arm controls. Data collection is the first step toward developing WheelArm models. In this paper, we present WheelArm-Sim, a simulation framework developed in Isaac Sim for synthetic data collection. We evaluate its capability by collecting a manipulation and navigation combined multimodal dataset, comprising 13 tasks, 232 trajectories, and 67,783 samples. To demonstrate the potential of the WheelArm dataset, we implement a baseline model for action prediction in the mustard-picking task. The results illustrate that data collected from WheelArm-Sim is feasible for a data-driven machine learning model for integrated control.
comment: Accepted to IEEE International Symposium on Medical Robotics (ISMR) 2026
Accurate Pedestrian Tracking in Urban Canyons: A Multi-Modal Fusion Approach
The contribution describes a pedestrian navigation approach designed to improve localization accuracy in urban environments where GNSS performance is degraded, a problem that is especially critical for blind or low-vision users who depend on precise guidance such as identifying the correct side of a street. To address GNSS limitations and the impracticality of camera-based visual positioning, the work proposes a particle filter based fusion of GNSS and inertial data that incorporates spatial priors from maps, such as impassable buildings and unlikely walking areas, functioning as a probabilistic form of map matching. Inertial localization is provided by the RoNIN machine learning method, and fusion with GNSS is achieved by weighting particles based on their consistency with GNSS estimates and uncertainty. The system was evaluated on six challenging walking routes in downtown San Francisco using three metrics related to sidewalk correctness and localization error. Results show that the fused approach (GNSS+RoNIN+PF) significantly outperforms GNSS only localization on most metrics, while inertial-only localization with particle filtering also surpasses GNSS alone for critical measures such as sidewalk assignment and across street error.
Plant-Inspired Robot Design Metaphors for Ambient HRI
Plants offer a paradoxical model for interaction: they are ambient, low-demand presences that nonetheless shape atmosphere, routines, and relationships through temporal rhythms and subtle expressions. In contrast, most human-robot interaction (HRI) has been grounded in anthropomorphic and zoomorphic paradigms, producing overt, high-demand forms of engagement. Using a Research through Design (RtD) methodology, we explore plants as metaphoric inspiration for HRI; we conducted iterative cycles of ideation, prototyping, and reflection to investigate what design primitives emerge from plant metaphors and morphologies, and how these primitives can be combined into expressive robotic forms. We present a suite of speculative, open-source prototypes that help probe plant-inspired presence, temporality, form, and gestures. We deepened our learnings from design and prototyping through prototype-centered workshops that explored people's perceptions and imaginaries of plant-inspired robots. This work contributes: (1) Set of plant-inspired robotic artifacts; (2) Designerly insights on how people perceive plant-inspired robots; and (3) Design consideration to inform how to use plant metaphors to reshape HRI.
Lantern: A Minimalist Robotic Object Platform
Robotic objects are simple actuated systems that subtly blend into human environments. We design and introduce Lantern, a minimalist robotic object platform to enable building simple robotic artifacts. We conducted in-depth design and engineering iterations of Lantern's mechatronic architecture to meet specific design goals while maintaining a low build cost (~40 USD). As an extendable, open-source platform, Lantern aims to enable exploration of a range of HRI scenarios by leveraging human tendency to assign social meaning to simple forms. To evaluate Lantern's potential for HRI, we conducted a series of explorations: 1) a co-design workshop, 2) a sensory room case study, 3) distribution to external HRI labs, 4) integration into a graduate-level HRI course, and 5) public exhibitions with older adults and children. Our findings show that Lantern effectively evokes engagement, can support versatile applications ranging from emotion regulation to focused work, and serves as a viable platform for lowering barriers to HRI as a field.
PoSafeNet: Safe Learning with Poset-Structured Neural Nets
Safe learning is essential for deploying learningbased controllers in safety-critical robotic systems, yet existing approaches often enforce multiple safety constraints uniformly or via fixed priority orders, leading to infeasibility and brittle behavior. In practice, safety requirements are heterogeneous and admit only partial priority relations, where some constraints are comparable while others are inherently incomparable. We formalize this setting as poset-structured safety, modeling safety constraints as a partially ordered set and treating safety composition as a structural property of the policy class. Building on this formulation, we propose PoSafeNet, a differentiable neural safety layer that enforces safety via sequential closed-form projection under poset-consistent constraint orderings, enabling adaptive selection or mixing of valid safety executions while preserving priority semantics by construction. Experiments on multi-obstacle navigation, constrained robot manipulation, and vision-based autonomous driving demonstrate improved feasibility, robustness, and scalability over unstructured and differentiable quadratic program-based safety layers.
ReloPush-BOSS: Optimization-guided Nonmonotone Rearrangement Planning for a Car-like Robot Pusher
We focus on multi-object rearrangement planning in densely cluttered environments using a car-like robot pusher. The combination of kinematic, geometric and physics constraints underlying this domain results in challenging nonmonotone problem instances which demand breaking each manipulation action into multiple parts to achieve a desired object rearrangement. Prior work tackles such instances by planning prerelocations, temporary object displacements that enable constraint satisfaction, but deciding where to prerelocate remains difficult due to local minima leading to infeasible or high-cost paths. Our key insight is that these minima can be avoided by steering a prerelocation optimization toward low-cost regions informed by Dubins path classification. These optimized prerelocations are integrated into an object traversability graph that encodes kinematic, geometric, and pushing constraints. Searching this graph in a depth-first fashion results in efficient, feasible rearrangement sequences. Across a series of densely cluttered scenarios with up to 13 objects, our framework, ReloPush-BOSS, exhibits consistently highest success rates and shortest pushing paths compared to state-of-the-art baselines. Hardware experiments on a 1/10 car-like pusher demonstrate the robustness of our approach. Code and footage from our experiments can be found at: https://fluentrobotics.com/relopushboss.
comment: Preprint of final version, accepted to RA-L 2026
Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data
A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using microscopic data to anchor observed vehicle behaviors, and learns a shared policy whose behavior is microscopically consistent with the partially observed trajectories and actions and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
Game-Based and Gamified Robotics Education: A Comparative Systematic Review and Design Guidelines
Robotics education fosters computational thinking, creativity, and problem-solving, but remains challenging due to technical complexity. Game-based learning (GBL) and gamification offer engagement benefits, yet their comparative impact remains unclear. We present the first PRISMA-aligned systematic review and comparative synthesis of GBL and gamification in robotics education, analyzing 95 studies from 12,485 records across four databases (2014-2025). We coded each study's approach, learning context, skill level, modality, pedagogy, and outcomes (k = .918). Three patterns emerged: (1) approach-context-pedagogy coupling (GBL more prevalent in informal settings, while gamification dominated formal classrooms [p < .001] and favored project-based learning [p = .009]); (2) emphasis on introductory programming and modular kits, with limited adoption of advanced software (~17%), advanced hardware (~5%), or immersive technologies (~22%); and (3) short study horizons, relying on self-report. We propose eight research directions and a design space outlining best practices and pitfalls, offering actionable guidance for robotics education.
comment: Accepted for publication at Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. 26 pages, 14 figures, 7 tables;
Advanced techniques and applications of LiDAR Place Recognition in Agricultural Environments: A Comprehensive Survey
An optimal solution to the localization problem is essential for developing autonomous robotic systems. Apart from autonomous vehicles, precision agriculture is one of the elds that can bene t most from these systems. Although LiDAR place recognition is a widely used technique in recent years to achieve accurate localization, it is mostly used in urban settings. However, the lack of distinctive features and the unstructured nature of agricultural environments make place recognition challenging. This work presents a comprehensive review of state-of-the-art the latest deep learning applications for agricultural environments and LPR techniques. We focus on the challenges that arise in these environments. We analyze the existing approaches, datasets, and metrics used to evaluate LPR system performance and discuss the limitations and future directions of research in this eld. This is the rst survey that focuses on LiDAR based localization in agricultural settings, with the aim of providing a thorough understanding and fostering further research in this specialized domain.
AsterNav: Autonomous Aerial Robot Navigation In Darkness Using Passive Computation
Autonomous aerial navigation in absolute darkness is crucial for post-disaster search and rescue operations, which often occur from disaster-zone power outages. Yet, due to resource constraints, tiny aerial robots, perfectly suited for these operations, are unable to navigate in the darkness to find survivors safely. In this paper, we present an autonomous aerial robot for navigation in the dark by combining an Infra-Red (IR) monocular camera with a large-aperture coded lens and structured light without external infrastructure like GPS or motion-capture. Our approach obtains depth-dependent defocus cues (each structured light point appears as a pattern that is depth dependent), which acts as a strong prior for our AsterNet deep depth estimation model. The model is trained in simulation by generating data using a simple optical model and transfers directly to the real world without any fine-tuning or retraining. AsterNet runs onboard the robot at 20 Hz on an NVIDIA Jetson Orin$^\text{TM}$ Nano. Furthermore, our network is robust to changes in the structured light pattern and relative placement of the pattern emitter and IR camera, leading to simplified and cost-effective construction. We successfully evaluate and demonstrate our proposed depth navigation approach AsterNav using depth from AsterNet in many real-world experiments using only onboard sensing and computation, including dark matte obstacles and thin ropes (diameter 6.25mm), achieving an overall success rate of 95.5% with unknown object shapes, locations and materials. To the best of our knowledge, this is the first work on monocular, structured-light-based quadrotor navigation in absolute darkness.
comment: 8 pages, 10 figures, Published in IEEE Robotics And Automation Letters
EROAM: Event-based Camera Rotational Odometry and Mapping in Real-time
This paper presents EROAM, a novel event-based rotational odometry and mapping system that achieves real-time, accurate camera rotation estimation. Unlike existing approaches that rely on event generation models or contrast maximization, EROAM employs a spherical event representation by projecting events onto a unit sphere and introduces Event Spherical Iterative Closest Point (ES-ICP), a novel geometric optimization framework designed specifically for event camera data. The spherical representation simplifies rotational motion formulation while operating in a continuous spherical domain, enabling enhanced spatial resolution. Our system features an efficient map management approach using incremental k-d tree structures and intelligent regional density control, ensuring optimal computational performance during long-term operation. Combined with parallel point-to-line optimization, EROAM achieves efficient computation without compromising accuracy. Extensive experiments on both synthetic and real-world datasets show that EROAM significantly outperforms state-of-the-art methods in terms of accuracy, robustness, and computational efficiency. Our method maintains consistent performance under challenging conditions, including high angular velocities and extended sequences, where other methods often fail or show significant drift. Additionally, EROAM produces high-quality panoramic reconstructions with preserved fine structural details.
comment: Accepted by IEEE Transactions on Robotics (T-RO), 2026. Project page: https://wlxing1901.github.io/eroam/
SKETCH: Semantic Key-Point Conditioning for Long-Horizon Vessel Trajectory Prediction
Accurate long-horizon vessel trajectory prediction remains challenging due to compounded uncertainty from complex navigation behaviors and environmental factors. Existing methods often struggle to maintain global directional consistency, leading to drifting or implausible trajectories when extrapolated over long time horizons. To address this issue, we propose a semantic-key-point-conditioned trajectory modeling framework, in which future trajectories are predicted by conditioning on a high-level Next Key Point (NKP) that captures navigational intent. This formulation decomposes long-horizon prediction into global semantic decision-making and local motion modeling, effectively restricting the support of future trajectories to semantically feasible subsets. To efficiently estimate the NKP prior from historical observations, we adopt a pretrain-finetune strategy. Extensive experiments on real-world AIS data demonstrate that the proposed method consistently outperforms state-of-the-art approaches, particularly for long travel durations, directional accuracy, and fine-grained trajectory prediction.
OMP: One-step Meanflow Policy with Directional Alignment
Robot manipulation has increasingly adopted data-driven generative policy frameworks, yet the field faces a persistent trade-off: diffusion models suffer from high inference latency, while flow-based methods often require complex architectural constraints. Although in image generation domain, the MeanFlow paradigm offers a path to single-step inference, its direct application to robotics is impeded by critical theoretical pathologies, specifically spectral bias and gradient starvation in low-velocity regimes. To overcome these limitations, we propose the One-step MeanFlow Policy (OMP), a novel framework designed for high-fidelity, real-time manipulation. We introduce a lightweight directional alignment mechanism to explicitly synchronize predicted velocities with true mean velocities. Furthermore, we implement a Differential Derivation Equation (DDE) to approximate the Jacobian-Vector Product (JVP) operator, which decouples forward and backward passes to significantly reduce memory complexity. Extensive experiments on the Adroit and Meta-World benchmarks demonstrate that OMP outperforms state-of-the-art methods in success rate and trajectory accuracy, particularly in high-precision tasks, while retaining the efficiency of single-step generation.
FLARE: Agile Flights for Quadrotor Cable-Suspended Payload System via Reinforcement Learning
Agile flight for the quadrotor cable-suspended payload system is a formidable challenge due to its underactuated, highly nonlinear, and hybrid dynamics. Traditional optimization-based methods often struggle with high computational costs and the complexities of cable mode transitions, limiting their real-time applicability and maneuverability exploitation. In this letter, we present FLARE, a reinforcement learning (RL) framework that directly learns agile navigation policy from high-fidelity simulation. Our method is validated across three designed challenging scenarios, notably outperforming a state-of-the-art optimization-based approach by a 3x speedup during gate traversal maneuvers. Furthermore, the learned policies achieve successful zero-shot sim-to-real transfer, demonstrating remarkable agility and safety in real-world experiments, running in real time on an onboard computer.
Visual Localization via Semantic Structures in Autonomous Photovoltaic Power Plant Inspection
Inspection systems utilizing unmanned aerial vehicles (UAVs) equipped with thermal cameras are increasingly popular for the maintenance of photovoltaic (PV) power plants. However, automation of the inspection task is a challenging problem as it requires precise navigation to capture images from optimal distances and viewing angles. This paper presents a novel localization pipeline that directly integrates PV module detection with UAV navigation, allowing precise positioning during inspection. The detections are used to identify the power plant structures in the image. These are associated with the power plant model and used to infer the UAV position relative to the inspected PV installation. We define visually recognizable anchor points for the initial association and use object tracking to discern global associations. Additionally, we present three different methods for visual segmentation of PV modules and evaluate their performance in relation to the proposed localization pipeline. The presented methods were verified and evaluated using custom aerial inspection data sets, demonstrating their robustness and applicability for real-time navigation. Additionally, we evaluate the influence of the power plant model precision on the localization methods.
comment: 50 pages, 23 figures. Submitted for review to Array
Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting
While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup," where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific object unseen during training using only a few reference images. To address this challenge, we propose Visual Attentive Prompting (VAP), a simple-yet-effective training-free perceptual adapter that equips frozen VLAs with top-down selective attention. VAP treats the reference images as a non-parametric visual memory, grounds the personal object in the scene through open-vocabulary detection and embedding-based matching, and then injects this grounding as a visual prompt by highlighting the object and rewriting the instruction. We construct two simulation benchmarks, Personalized-SIMPLER and Personalized-VLABench, and a real-world tabletop benchmark to evaluate personalized manipulation across multiple robots and tasks. Experiments show that VAP consistently outperforms generic policies and token-learning baselines in both success rate and correct-object manipulation, helping to bridge the gap between semantic understanding and instance-level control.
comment: Project page with videos and code: https://vap-project.github.io/
Virtual Reflections on a Dynamic 2D Eye Model Improve Spatial Reference Identification
The visible orientation of human eyes creates some transparency about people's spatial attention and other mental states. This leads to a dual role of the eyes as a means of sensing and communication. Accordingly, artificial eye models are being explored as communication media in human-machine interaction scenarios. One challenge in the use of eye models for communication consists of resolving spatial reference ambiguities, especially for screen-based models. To address this challenge, we introduce an approach that incorporates reflection-like features that are contingent on the movements of artificial eyes. We conducted a user study with 30 participants in which participants had to use spatial references provided by dynamic eye models to advance in a fast-paced group interaction task. Compared to a non-reflective eye model and a pure reflection mode, the superimposition of screen-based eyes with gaze-contingent virtual reflections resulted in a higher identification accuracy and user experience, suggesting a synergistic benefit.
comment: This article has been accepted for publication in IEEE Transactions on Human-Machine Systems. Citation information: DOI 10.1109/THMS.2026.3651818
Industrial Internet Robot Collaboration System and Edge Computing Optimization
In industrial Internet environments, mobile robots must generate collision-free global routes under stochastic obstacle layouts and random perturbations in commanded linear and angular velocities. This paper models a differential-drive robot with nonholonomic constraints, then decomposes motion into obstacle avoidance, target turning, and target approaching behaviors to parameterize the control variables. Global path planning is formulated as a constrained optimization problem and converted into a weighted energy function that balances path length and collision penalties. A three-layer neural network represents the planning model, while simulated annealing searches for near-global minima and mitigates local traps. During execution, a fuzzy controller uses heading and lateral-offset errors to output wheel-speed differentials for rapid correction; edge-side computation is discussed to reduce robot-server traffic and latency. Matlab 2024 simulations report deviation within +-5 cm, convergence within 10 ms, and shorter paths than two baseline methods. The approach improves robustness of global navigation in practice.
Large-Scale Autonomous Gas Monitoring for Volcanic Environments: A Legged Robot on Mount Etna
Volcanic gas emissions are key precursors of eruptive activity. Yet, obtaining accurate near-surface measurements remains hazardous and logistically challenging, motivating the need for autonomous solutions. Limited mobility in rough volcanic terrain has prevented wheeled systems from performing reliable in situ gas measurements, reducing their usefulness as sensing platforms. We present a legged robotic system for autonomous volcanic gas analysis, utilizing the quadruped ANYmal, equipped with a quadrupole mass spectrometer system. Our modular autonomy stack integrates a mission planning interface, global planner, localization framework, and terrain-aware local navigation. We evaluated the system on Mount Etna across three autonomous missions in varied terrain, achieving successful gas-source detections with autonomy rates of 93-100%. In addition, we conducted a teleoperated mission in which the robot measured natural fumaroles, detecting sulfur dioxide and carbon dioxide. We discuss lessons learned from the gas-analysis and autonomy perspectives, emphasizing the need for adaptive sensing strategies, tighter integration of global and local planning, and improved hardware design.
comment: This work has been submitted to the IEEE for possible publication. Submitted to IEEE Robotics & Automation Magazine (RAM)
When Context Is Not Enough: Modeling Unexplained Variability in Car-Following Behavior
Modeling car-following behavior is fundamental to microscopic traffic simulation, yet traditional deterministic models often fail to capture the full extent of variability and unpredictability in human driving. While many modern approaches incorporate context-aware inputs (e.g., spacing, speed, relative speed), they frequently overlook structured stochasticity that arises from latent driver intentions, perception errors, and memory effects -- factors that are not directly observable from context alone. To fill the gap, this study introduces an interpretable stochastic modeling framework that captures not only context-dependent dynamics but also residual variability beyond what context can explain. Leveraging deep neural networks integrated with nonstationary Gaussian processes (GPs), our model employs a scenario-adaptive Gibbs kernel to learn dynamic temporal correlations in acceleration decisions, where the strength and duration of correlations between acceleration decisions evolve with the driving context. This formulation enables a principled, data-driven quantification of uncertainty in acceleration, speed, and spacing, grounded in both observable context and latent behavioral variability. Comprehensive experiments on the naturalistic vehicle trajectory dataset collected from the German highway, i.e., the HighD dataset, demonstrate that the proposed stochastic simulation method within this framework surpasses conventional methods in both predictive performance and interpretable uncertainty quantification. The integration of interpretability and accuracy makes this framework a promising tool for traffic analysis and safety-critical applications.
comment: Accepted to ISTTT26
Listen, Look, Drive: Coupling Audio Instructions for User-aware VLA-based Autonomous Driving
Vision Language Action (VLA) models promise an open-vocabulary interface that can translate perceptual ambiguity into semantically grounded driving decisions, yet they still treat language as a static prior fixed at inference time. As a result, the model must infer continuously shifting objectives from pixels alone, yielding delayed or overly conservative maneuvers. We argue that effective VLAs for autonomous driving need an online channel in which users can influence driving with specific intentions. To this end, we present EchoVLA, a user-aware VLA that couples camera streams with in situ audio instructions. We augment the nuScenes dataset with temporally aligned, intent-specific speech commands generated by converting ego-motion descriptions into synthetic audios. Further, we compose emotional speech-trajectory pairs into a multimodal Chain-of-Thought (CoT) for fine-tuning a Multimodal Large Model (MLM) based on Qwen2.5-Omni. Specifically, we synthesize the audio-augmented dataset with different emotion types paired with corresponding driving behaviors, leveraging the emotional cues embedded in tone, pitch, and speech tempo to reflect varying user states, such as urgent or hesitant intentions, thus enabling our EchoVLA to interpret not only the semantic content but also the emotional context of audio commands for more nuanced and emotionally adaptive driving behavior. In open-loop benchmarks, our approach reduces the average L2 error by $59.4\%$ and the collision rate by $74.4\%$ compared to the baseline of vision-only perception. More experiments on nuScenes dataset validate that EchoVLA not only steers the trajectory through audio instructions, but also modulates driving behavior in response to the emotions detected in the user's speech.
comment: Accepted by IV
Designing Effective Human-Swarm Interaction Interfaces: Insights from a User Study on Task Performance
In this paper, we present a systematic method of design for human-swarm interaction interfaces, combining theoretical insights with empirical evaluation. We first derived ten design principles from existing literature, applying them to key information dimensions identified through goal-directed task analysis and developed a tablet-based interface for a target search task. We then conducted a user study with 31 participants where humans were required to guide a robotic swarm to a target in the presence of three types of hazards that pose a risk to the robots: Distributed, Moving, and Spreading. Performance was measured based on the proximity of the robots to the target and the number of deactivated robots at the end of the task. Results indicate that at least one robot was brought closer to the target in 98% of tasks, demonstrating the interface's success in fulfilling the primary objective of the task. Additionally, in nearly 67% of tasks, more than 50% of the robots reached the target. Moreover, particularly better performance was noted in moving hazards. Additionally, the interface appeared to help minimise robot deactivation, as evidenced by nearly 94% of tasks where participants managed to keep more than 50% of the robots active, ensuring that most of the swarm remained operational. However, its effectiveness varied across hazards, with robot deactivation being lowest in distributed hazard scenarios, suggesting that the interface provided the most support in these conditions.
comment: 8 pages, 4 figures, 5 tables
ALRM: Agentic LLM for Robotic Manipulation
Large Language Models (LLMs) have recently empowered agentic frameworks to exhibit advanced reasoning and planning capabilities. However, their integration in robotic control pipelines remains limited in two aspects: (1) prior \ac{llm}-based approaches often lack modular, agentic execution mechanisms, limiting their ability to plan, reflect on outcomes, and revise actions in a closed-loop manner; and (2) existing benchmarks for manipulation tasks focus on low-level control and do not systematically evaluate multistep reasoning and linguistic variation. In this paper, we propose Agentic LLM for Robot Manipulation (ALRM), an LLM-driven agentic framework for robotic manipulation. ALRM integrates policy generation with agentic execution through a ReAct-style reasoning loop, supporting two complementary modes: Code-asPolicy (CaP) for direct executable control code generation, and Tool-as-Policy (TaP) for iterative planning and tool-based action execution. To enable systematic evaluation, we also introduce a novel simulation benchmark comprising 56 tasks across multiple environments, capturing linguistically diverse instructions. Experiments with ten LLMs demonstrate that ALRM provides a scalable, interpretable, and modular approach for bridging natural language reasoning with reliable robotic execution. Results reveal Claude-4.1-Opus as the top closed-source model and Falcon-H1-7B as the top open-source model under CaP.
Unifying Perception and Action: A Hybrid-Modality Pipeline with Implicit Visual Chain-of-Thought for Robotic Action Generation
Vision-Language-Action (VLA) models built upon Chain-of-Thought (CoT) have achieved remarkable success in advancing general-purpose robotic agents, owing to its significant perceptual comprehension. Recently, since text-only CoT struggles to adequately capture scene details in complex spatial environments, a highly promising strategy involves leveraging visual priors to guide robotic action generation. Nevertheless, these strategies face two inherent challenges: (i) a modality gap between visual observations and low-level actions, and (ii) unstable training due to competing objectives between visual prediction and action generation. To address these challenges, we propose a Vision-Integrated Trajectory Alignment (VITA) framework that learns a shared discrete latent space for vision and action, enabling joint modeling of perception and motor control. VITA introduces a implicit visual CoT: autoregressively generated tokens is simultaneously decoded into future frames predictions and robot actions, thereby internalizing visual dynamics as an inductive bias for motion planning. Extensive experiments on simulated and real-world environments demonstrate state-of-the-art performance. VITA improves 14.5\%, 9.6\% and 12.1\% over existing baselines on CALVIN, LIBERO and SimplerEnv. Furthermore, VITA attains an average success rate of 80.5\% across six real-world tasks, demonstrating its potential as a generalist robotic manipulation model.
Designing Underactuated Graspers with Dynamically Variable Geometry Using Potential Energy Map Based Analysis IROS 2022
This paper introduces an extension to the energy map method for in-hand manipulation. Energy maps are used to predict how a part will evolve in the grasp given a specific actuation input to the gripper. Previous approaches assumed frictionless contacts, but we show analytically that friction can be included in the energy maps when using two-link underactuated fingers by understanding the evolution of the part-finger contact. These friction-based energy maps were used to evaluate the importance of various tendon-pulley gripper parameters across nearly 6 million simulated grasping scenarios. Specifically, a variable palm width is needed to manipulate parts of varying scales, and a variable transmission ratio, or the ratio of the distal to the proximal pulley radii, is needed to draw parts into a cage or to maintain a tip prehension grasp.
comment: This is an updated version of my original paper (with the same title) published in IROS 2022. Parts of this work were refined or corrected in Chapter 3 of my dissertation, Good Vibrations: Toward Vibration-Based Robotic In-Hand Manipulation (DOI: 10.25740/mm182vq8220), and many of those changes have been incorporated here
Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends
Vision-language-action (VLA) models extend vision-language models (VLM) by integrating action generation modules for robotic manipulation. Leveraging the strengths of VLM in vision perception and instruction understanding, VLA models exhibit promising generalization across diverse manipulation tasks. However, applications demanding high precision and accuracy reveal performance gaps without further adaptation. Evidence from multiple domains highlights the critical role of post-training to align foundational models with downstream applications, spurring extensive research on post-training VLA models. VLA model post-training aims to enhance an embodiment's ability to interact with the environment for the specified tasks. This perspective aligns with Newell's constraints-led theory of skill acquisition, which posits that motor behavior arises from interactions among task, environmental, and organismic (embodiment) constraints. Accordingly, this survey structures post-training methods into four categories: (i) enhancing environmental perception, (ii) improving embodiment awareness, (iii) deepening task comprehension, and (iv) multi-component integration. Experimental results on standard benchmarks are synthesized to distill actionable guidelines. Finally, open challenges and emerging trends are outlined, relating insights from human learning to prospective methods for VLA post-training. This work delivers both a comprehensive overview of current VLA model post-training methods from a human motor learning perspective and practical insights for VLA model development. Project website: https://github.com/AoqunJin/Awesome-VLA-Post-Training.
Emergent morphogenesis via planar fabrication enabled by a reduced model of composites
The ability to engineer complex three-dimensional shapes from planar sheets with precise, programmable control underpins emerging technologies in soft robotics, reconfigurable devices, and functional materials. Here, we present a reduced-order numerical and experimental framework for a bilayer system consisting of a stimuli-responsive thermoplastic sheet (Shrinky Dink) bonded to a kirigami-patterned, inert plastic layer. Upon uniform heating, the active layer contracts while the patterned layer constrains in-plane stretch but allows out-of-plane bending, yielding programmable 3D morphologies from simple planar precursors. Our approach enables efficient computational design and scalable manufacturing of 3D forms with a single-layer reduced model that captures the coupled mechanics of stretching and bending. Unlike traditional bilayer modeling, our framework collapses the multilayer composite into a single layer of nodes and elements, reducing the degrees of freedom and enabling simulation on a 2D geometry. This is achieved by introducing a novel energy formulation that captures the coupling between in-plane stretch mismatch and out-of-plane bending - extending beyond simple isotropic linear elastic models. Experimentally, we establish a fully planar, repeatable fabrication protocol using a stimuli-responsive thermoplastic and a laser-cut inert plastic layer. The programmed strain mismatch drives an array of 3D morphologies, such as bowls, canoes, and flower petals, all verified by both simulation and physical prototypes.
comment: GitHub repository: https://github.com/StructuresComp/discrete-shells-shrinky-dink/
SuperPoint-SLAM3: Augmenting ORB-SLAM3 with Deep Features, Adaptive NMS, and Learning-Based Loop Closure
Visual simultaneous localization and mapping (SLAM) must remain accurate under extreme viewpoint, scale and illumination variations. The widely adopted ORB-SLAM3 falters in these regimes because it relies on hand-crafted ORB keypoints. We introduce SuperPoint-SLAM3, a drop-in upgrade that (i) replaces ORB with the self-supervised SuperPoint detector--descriptor, (ii) enforces spatially uniform keypoints via adaptive non-maximal suppression (ANMS), and (iii) integrates a lightweight NetVLAD place-recognition head for learning-based loop closure. On the KITTI Odometry benchmark SuperPoint-SLAM3 reduces mean translational error from 4.15% to 0.34% and mean rotational error from 0.0027 deg/m to 0.0010 deg/m. On the EuRoC MAV dataset it roughly halves both errors across every sequence (e.g., V2\_03: 1.58% -> 0.79%). These gains confirm that fusing modern deep features with a learned loop-closure module markedly improves ORB-SLAM3 accuracy while preserving its real-time operation. Implementation, pretrained weights and reproducibility scripts are available at https://github.com/shahram95/SuperPointSLAM3.
comment: 10 pages, 6 figures, code at https://github.com/shahram95/SuperPointSLAM3
Joint Learning of Depth, Pose, and Local Radiance Field for Large Scale Monocular 3D Reconstruction
Photorealistic 3-D reconstruction from monocular video collapses in large-scale scenes when depth, pose, and radiance are solved in isolation: scale-ambiguous depth yields ghost geometry, long-horizon pose drift corrupts alignment, and a single global NeRF cannot model hundreds of metres of content. We introduce a joint learning framework that couples all three factors and demonstrably overcomes each failure case. Our system begins with a Vision-Transformer (ViT) depth network trained with metric-scale supervision, giving globally consistent depths despite wide field-of-view variations. A multi-scale feature bundle-adjustment (BA) layer refines camera poses directly in feature space--leveraging learned pyramidal descriptors instead of brittle keypoints--to suppress drift on unconstrained trajectories. For scene representation, we deploy an incremental local-radiance-field hierarchy: new hash-grid NeRFs are allocated and frozen on-the-fly when view overlap falls below a threshold, enabling city-block-scale coverage on a single GPU. Evaluated on the Tanks and Temples benchmark, our method reduces Absolute Trajectory Error to 0.001-0.021 m across eight indoor-outdoor sequences--up to 18x lower than BARF and 2x lower than NoPe-NeRF--while maintaining sub-pixel Relative Pose Error. These results demonstrate that metric-scale, drift-free 3-D reconstruction and high-fidelity novel-view synthesis are achievable from a single uncalibrated RGB camera.
comment: 8 pages, 2 figures, 2 tables
HAFO: A Force-Adaptive Control Framework for Humanoid Robots in Intense Interaction Environments
Reinforcement learning (RL) controllers have made impressive progress in humanoid locomotion and light-weight object manipulation. However, achieving robust and precise motion control with intense force interaction remains a significant challenge. To address these limitations, this paper proposes HAFO, a dual-agent reinforcement learning framework that concurrently optimizes both a robust locomotion strategy and a precise upper-body manipulation strategy via coupled training. We employ a constrained residual action space to improve dual-agent training stability and sample efficiency. The external tension disturbances are explicitly modeled using a spring-damper system, allowing for fine-grained force control through manipulation of the virtual spring. In this process, the reinforcement learning policy autonomously generates a disturbance-rejection response by utilizing environmental feedback. The experimental results demonstrate that HAFO achieves whole-body control for humanoid robots across diverse force-interaction environments using a single dual-agent policy, delivering outstanding performance under load-bearing and thrust-disturbance conditions, while maintaining stable operation even under rope suspension state.
Multiagent Systems
Learning to Communicate Across Modalities: Perceptual Heterogeneity in Multi-Agent Systems
Emergent communication offers insight into how agents develop shared structured representations, yet most research assumes homogeneous modalities or aligned representational spaces, overlooking the perceptual heterogeneity of real-world settings. We study a heterogeneous multi-step binary communication game where agents differ in modality and lack perceptual grounding. Despite perceptual misalignment, multimodal systems converge to class-consistent messages grounded in perceptual input. Unimodal systems communicate more efficiently, using fewer bits and achieving lower classification entropy, while multimodal agents require greater information exchange and exhibit higher uncertainty. Bit perturbation experiments provide strong evidence that meaning is encoded in a distributional rather than compositional manner, as each bit's contribution depends on its surrounding pattern. Finally, interoperability analyses show that systems trained in different perceptual worlds fail to directly communicate, but limited fine-tuning enables successful cross-system communication. This work positions emergent communication as a framework for studying how agents adapt and transfer representations across heterogeneous modalities, opening new directions for both theory and experimentation.
comment: To be published in EvoLang XVI proceedings. 15 pages, 17 figures
Generalized Information Gathering Under Dynamics Uncertainty
An agent operating in an unknown dynamical system must learn its dynamics from observations. Active information gathering accelerates this learning, but existing methods derive bespoke costs for specific modeling choices: dynamics models, belief update procedures, observation models, and planners. We present a unifying framework that decouples these choices from the information-gathering cost by explicitly exposing the causal dependencies between parameters, beliefs, and controls. Using this framework, we derive a general information-gathering cost based on Massey's directed information that assumes only Markov dynamics with additive noise and is otherwise agnostic to modeling choices. We prove that the mutual information cost used in existing literature is a special case of our cost. Then, we leverage our framework to establish an explicit connection between the mutual information cost and information gain in linearized Bayesian estimation, thereby providing theoretical justification for mutual information-based active learning approaches. Finally, we illustrate the practical utility of our framework through experiments spanning linear, nonlinear, and multi-agent systems.
Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
Recent work has explored optimizing LLM collaboration through Multi-Agent Reinforcement Learning (MARL). However, most MARL fine-tuning approaches rely on predefined execution protocols, which often require centralized execution. Decentralized LLM collaboration is more appealing in practice, as agents can run inference in parallel with flexible deployments. Also, current approaches use Monte Carlo methods for fine-tuning, which suffer from high variance and thus require more samples to train effectively. Actor-critic methods are prevalent in MARL for dealing with these issues, so we developed Multi-Agent Actor-Critic (MAAC) methods to optimize decentralized LLM collaboration. In this paper, we analyze when and why these MAAC methods are beneficial. We propose 2 MAAC approaches, \textbf{CoLLM-CC} with a \textbf{C}entralized \textbf{C}ritic and \textbf{CoLLM-DC} with \textbf{D}ecentralized \textbf{C}ritics. Our experiments across writing, coding, and game-playing domains show that Monte Carlo methods and CoLLM-DC can achieve performance comparable to CoLLM-CC in short-horizon and dense-reward settings. However, they both underperform CoLLM-CC on long-horizon or sparse-reward tasks, where Monte Carlo methods require substantially more samples and CoLLM-DC struggles to converge. Our code is available at https://github.com/OpenMLRL/CoMLRL/releases/tag/v1.3.2.
astra-langchain4j: Experiences Combining LLMs and Agent Programming
Given the emergence of Generative AI over the last two years and the increasing focus on Agentic AI as a form of Multi-Agent System it is important to explore both how such technologies can impact the use of traditional Agent Toolkits and how the wealth of experience encapsulated in those toolkits can influence the design of the new agentic platforms. This paper presents an overview of our experience developing a prototype large language model (LLM) integration for the ASTRA programming language. It presents a brief overview of the toolkit, followed by three example implementations, concluding with a discussion of the experiences garnered through the examples.
Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting
This paper addresses the critical challenge of coordinating mobile edge UAV networks to maintain robust service in highly dynamic spatiotemporal environments. Conventional Deep Reinforcement Learning (DRL) approaches often suffer from catastrophic forgetting when transitioning between distinct task scenarios, such as moving from dense urban clusters to sparse rural areas. These transitions typically necessitate computationally expensive retraining or model resets to adapt to new user distributions, leading to service interruptions. To overcome these limitations, we propose a computationally efficient Spatiotemporal Continual Learning (STCL) framework realized through a Group-Decoupled Multi-Agent Proximal Policy Optimization (G-MAPPO) algorithm. Our approach integrates a novel Group-Decoupled Policy Optimization (GDPO) mechanism that utilizes dynamic $z$-score normalization to autonomously balance heterogeneous objectives, including energy efficiency, user fairness, and coverage. This mechanism effectively mitigates gradient conflicts induced by concept drifts without requiring offline retraining. Furthermore, the framework leverages the 3D mobility of UAVs as a spatial compensation layer, enabling the swarm to autonomously adjust altitudes to accommodate extreme density fluctuations. Extensive simulations demonstrate that the proposed STCL framework achieves superior resilience, characterized by an elastic recovery of service reliability to approximately 0.95 during phase transitions. Compared to the MADDPG baseline, G-MAPPO not only prevents knowledge forgetting but also delivers an effective capacity gain of 20\% under extreme traffic loads, validating its potential as a scalable solution for edge-enabled aerial swarms.
comment: 12 pages, 9 figures, manuscript submitted to IEEE Transactions on Emerging Topics in Computing
Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems
Individual agents in multi-agent (MA) systems often lack robustness, tending to blindly conform to misleading peers. We show this weakness stems from both sycophancy and inadequate ability to evaluate peer reliability. To address this, we first formalize the learning problem of history-aware reference, introducing the historical interactions of peers as additional input, so that agents can estimate peer reliability and learn from trustworthy peers when uncertain. This shifts the task from evaluating peer reasoning quality to estimating peer reliability based on interaction history. We then develop Epistemic Context Learning (ECL): a reasoning framework that conditions predictions on explicitly-built peer profiles from history. We further optimize ECL by reinforcement learning using auxiliary rewards. Our experiments reveal that our ECL enables small models like Qwen 3-4B to outperform a history-agnostic baseline 8x its size (Qwen 3-30B) by accurately identifying reliable peers. ECL also boosts frontier models to near-perfect (100%) performance. We show that ECL generalizes well to various MA configurations and we find that trust is modeled well by LLMs, revealing a strong correlation in trust modeling accuracy and final answer quality.
comment: Codes and data are available at https://github.com/skyriver-2000/epistemic-context-learning
Toward Culturally Aligned LLMs through Ontology-Guided Multi-Agent Reasoning
Large Language Models (LLMs) increasingly support culturally sensitive decision making, yet often exhibit misalignment due to skewed pretraining data and the absence of structured value representations. Existing methods can steer outputs, but often lack demographic grounding and treat values as independent, unstructured signals, reducing consistency and interpretability. We propose OG-MAR, an Ontology-Guided Multi-Agent Reasoning framework. OG-MAR summarizes respondent-specific values from the World Values Survey (WVS) and constructs a global cultural ontology by eliciting relations over a fixed taxonomy via competency questions. At inference time, it retrieves ontology-consistent relations and demographically similar profiles to instantiate multiple value-persona agents, whose outputs are synthesized by a judgment agent that enforces ontology consistency and demographic proximity. Experiments on regional social-survey benchmarks across four LLM backbones show that OG-MAR improves cultural alignment and robustness over competitive baselines, while producing more transparent reasoning traces.
comment: 35 pages
Opinion Consensus Formation Among Networked Large Language Models ICASSP 2026
Can classical consensus models predict the group behavior of large language models (LLMs)? We examine multi-round interactions among LLM agents through the DeGroot framework, where agents exchange text-based messages over diverse communication graphs. To track opinion evolution, we map each message to an opinion score via sentiment analysis. We find that agents typically reach consensus and the disagreement between the agents decays exponentially. However, the limiting opinion departs from DeGroot's network-centrality-weighted forecast. The consensus between LLM agents turns out to be largely insensitive to initial conditions and instead depends strongly on the discussion subject and inherent biases. Nevertheless, transient dynamics align with classical graph theory and the convergence rate of opinions is closely related to the second-largest eigenvalue of the graph's combination matrix. Together, these findings can be useful for LLM-driven social-network simulations and the design of resource-efficient multi-agent LLM applications.
comment: Accepted at ICASSP 2026
Mean-Field Control on Sparse Graphs: From Local Limits to GNNs via Neighborhood Distributions
Mean-field control (MFC) offers a scalable solution to the curse of dimensionality in multi-agent systems but traditionally hinges on the restrictive assumption of exchangeability via dense, all-to-all interactions. In this work, we bridge the gap to real-world network structures by proposing a rigorous framework for MFC on large sparse graphs. We redefine the system state as a probability measure over decorated rooted neighborhoods, effectively capturing local heterogeneity. Our central contribution is a theoretical foundation for scalable reinforcement learning in this setting. We prove horizon-dependent locality: for finite-horizon problems, an agent's optimal policy at time t depends strictly on its (T-t)-hop neighborhood. This result renders the infinite-dimensional control problem tractable and underpins a novel Dynamic Programming Principle (DPP) on the lifted space of neighborhood distributions. Furthermore, we formally and experimentally justify the use of Graph Neural Networks (GNNs) for actor-critic algorithms in this context. Our framework naturally recovers classical MFC as a degenerate case while enabling efficient, theoretically grounded control on complex sparse topologies.
comment: 19 pages
DataCross: A Unified Benchmark and Agent Framework for Cross-Modal Heterogeneous Data Analysis
In real-world data science and enterprise decision-making, critical information is often fragmented across directly queryable structured sources (e.g., SQL, CSV) and "zombie data" locked in unstructured visual documents (e.g., scanned reports, invoice images). Existing data analytics agents are predominantly limited to processing structured data, failing to activate and correlate this high-value visual information, thus creating a significant gap with industrial needs. To bridge this gap, we introduce DataCross, a novel benchmark and collaborative agent framework for unified, insight-driven analysis across heterogeneous data modalities. DataCrossBench comprises 200 end-to-end analysis tasks across finance, healthcare, and other domains. It is constructed via a human-in-the-loop reverse-synthesis pipeline, ensuring realistic complexity, cross-source dependency, and verifiable ground truth. The benchmark categorizes tasks into three difficulty tiers to evaluate agents' capabilities in visual table extraction, cross-modal alignment, and multi-step joint reasoning. We also propose the DataCrossAgent framework, inspired by the "divide-and-conquer" workflow of human analysts. It employs specialized sub-agents, each an expert on a specific data source, which are coordinated via a structured workflow of Intra-source Deep Exploration, Key Source Identification, and Contextual Cross-pollination. A novel reReAct mechanism enables robust code generation and debugging for factual verification. Experimental results show that DataCrossAgent achieves a 29.7% improvement in factuality over GPT-4o and exhibits superior robustness on high-difficulty tasks, effectively activating fragmented "zombie data" for insightful, cross-modal analysis.
Specialists or Generalists? Multi-Agent and Single-Agent LLMs for Essay Grading
Automated essay scoring (AES) systems increasingly rely on large language models, yet little is known about how architectural choices shape their performance across different essay quality levels. This paper evaluates single-agent and multi-agent LLM architectures for essay grading using the ASAP 2.0 corpus. Our multi-agent system decomposes grading into three specialist agents (Content, Structure, Language) coordinated by a Chairman Agent that implements rubric-aligned logic including veto rules and score capping. We test both architectures in zero-shot and few-shot conditions using GPT-5.1. Results show that the multi-agent system is significantly better at identifying weak essays while the single-agent system performs better on mid-range essays. Both architectures struggle with high-quality essays. Critically, few-shot calibration emerges as the dominant factor in system performance -- providing just two examples per score level improves QWK by approximately 26% for both architectures. These findings suggest architectural choice should align with specific deployment priorities, with multi-agent AI particularly suited for diagnostic screening of at-risk students, while single-agent models provide a cost-effective solution for general assessment.
AgentScore: Autoformulation of Deployable Clinical Scoring Systems
Modern clinical practice relies on evidence-based guidelines implemented as compact scoring systems composed of a small number of interpretable decision rules. While machine-learning models achieve strong performance, many fail to translate into routine clinical use due to misalignment with workflow constraints such as memorability, auditability, and bedside execution. We argue that this gap arises not from insufficient predictive power, but from optimizing over model classes that are incompatible with guideline deployment. Deployable guidelines often take the form of unit-weighted clinical checklists, formed by thresholding the sum of binary rules, but learning such scores requires searching an exponentially large discrete space of possible rule sets. We introduce AgentScore, which performs semantically guided optimization in this space by using LLMs to propose candidate rules and a deterministic, data-grounded verification-and-selection loop to enforce statistical validity and deployability constraints. Across eight clinical prediction tasks, AgentScore outperforms existing score-generation methods and achieves AUC comparable to more flexible interpretable models despite operating under stronger structural constraints. On two additional externally validated tasks, AgentScore achieves higher discrimination than established guideline-based scores.
Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems
Multi-agent systems often operate in dynamic and uncertain environments, where agents must not only pursue individual goals but also safeguard collective functionality. This challenge is especially acute in mixed-motive multi-agent systems. This work focuses on cooperative resilience, the ability of agents to anticipate, resist, recover, and transform in the face of disruptions, a critical yet underexplored property in Multi-Agent Reinforcement Learning. We study how reward function design influences resilience in mixed-motive settings and introduce a novel framework that learns reward functions from ranked trajectories, guided by a cooperative resilience metric. Agents are trained in a suite of social dilemma environments using three reward strategies: i) traditional individual reward; ii) resilience-inferred reward; and iii) hybrid that balance both. We explore three reward parameterizations-linear models, hand-crafted features, and neural networks, and employ two preference-based learning algorithms to infer rewards from behavioral rankings. Our results demonstrate that hybrid strategy significantly improve robustness under disruptions without degrading task performance and reduce catastrophic outcomes like resource overuse. These findings underscore the importance of reward design in fostering resilient cooperation, and represent a step toward developing robust multi-agent systems capable of sustaining cooperation in uncertain environments.
comment: Supplementary material in https://github.com/mavivi95/supplementary_files/blob/main/Learning_TAI___Supplementary_File.pdf
Aligning Microscopic Vehicle and Macroscopic Traffic Statistics: Reconstructing Driving Behavior from Partial Data
A driving algorithm that aligns with good human driving practices, or at the very least collaborates effectively with human drivers, is crucial for developing safe and efficient autonomous vehicles. In practice, two main approaches are commonly adopted: (i) supervised or imitation learning, which requires comprehensive naturalistic driving data capturing all states that influence a vehicle's decisions and corresponding actions, and (ii) reinforcement learning (RL), where the simulated driving environment either matches or is intentionally more challenging than real-world conditions. Both methods depend on high-quality observations of real-world driving behavior, which are often difficult and costly to obtain. State-of-the-art sensors on individual vehicles can gather microscopic data, but they lack context about the surrounding conditions. Conversely, roadside sensors can capture traffic flow and other macroscopic characteristics, but they cannot associate this information with individual vehicles on a microscopic level. Motivated by this complementarity, we propose a framework that reconstructs unobserved microscopic states from macroscopic observations, using microscopic data to anchor observed vehicle behaviors, and learns a shared policy whose behavior is microscopically consistent with the partially observed trajectories and actions and macroscopically aligned with target traffic statistics when deployed population-wide. Such constrained and regularized policies promote realistic flow patterns and safe coordination with human drivers at scale.
Learning to Recommend Multi-Agent Subgraphs from Calling Trees
Multi-agent systems (MAS) increasingly solve complex tasks by orchestrating agents and tools selected from rapidly growing marketplaces. As these marketplaces expand, many candidates become functionally overlapping, making selection not just a retrieval problem: beyond filtering relevant agents, an orchestrator must choose options that are reliable, compatible with the current execution context, and able to cooperate with other selected agents. Existing recommender systems -- largely built for item-level ranking from flat user-item logs -- do not directly address the structured, sequential, and interaction-dependent nature of agent orchestration. We address this gap by \textbf{formulating agent recommendation in MAS as a constrained decision problem} and introducing a generic \textbf{constrained recommendation framework} that first uses retrieval to build a compact candidate set conditioned on the current subtask and context, and then performs \textbf{utility optimization} within this feasible set using a learned scorer that accounts for relevance, reliability, and interaction effects. We ground both the formulation and learning signals in \textbf{historical calling trees}, which capture the execution structure of MAS (parent-child calls, branching dependencies, and local cooperation patterns) beyond what flat logs provide. The framework supports two complementary settings: \textbf{agent-level recommendation} (select the next agent/tool) and \textbf{system-level recommendation} (select a small, connected agent team/subgraph for coordinated execution). To enable systematic evaluation, we construct a unified calling-tree benchmark by normalizing invocation logs from eight heterogeneous multi-agent corpora into a shared structured representation.
The incompatibility of the Condorcet winner and loser criteria with positive or negative involvement and resolvability
We prove that there is no preferential voting method satisfying the Condorcet winner and loser criteria, positive involvement (if a candidate $x$ wins in an initial preference profile, then adding a voter who ranks $x$ uniquely first cannot cause $x$ to lose), and $n$-voter resolvability (if $x$ initially ties for winning, then $x$ can be made the unique winner by adding some set of up to $n$ voters). This impossibility theorem holds for any positive integer $n$. In addition, positive involvement can be replaced by negative involvement (if a candidate $x$ loses in an initial preference profile, then adding a voter who ranks $x$ uniquely last cannot cause $x$ to win). In a previous note, we proved an analogous result assuming an additional axiom of ordinal margin invariance, which we now show is unnecessary for an impossibility theorem, at least if the desired voting method is defined for five-candidate elections.
comment: 6 pages, 2 figures. Added theorems for negative involvement
What the flock knows that the birds do not: exploring the emergence of joint agency in multi-agent active inference
Collective behavior pervades biological systems, from flocks of birds to neural assemblies and human societies. Yet, how such collectives acquire functional properties -- such as joint agency or knowledge -- that transcend those of their individual components remains an open question. Here, we combine active inference and information-theoretic analyses to explore how a minimal system of interacting agents can give rise to joint agency and collective knowledge. We model flocking dynamics using multiple active inference agents, each minimizing its own free energy while coupling reciprocally with its neighbors. We show that as agents self-organize, their interactions define higher-order statistical boundaries (Markov blankets) enclosing a ``flock'' that can be treated as an emergent agent with its own sensory, active, and internal states. When exposed to external perturbations (a ``predator''), the flock exhibits faster, coordinated responses than individual agents, reflecting collective sensitivity to environmental change. Crucially, analyses of synergistic information reveal that the flock encodes information about the predator's location that is not accessible to every individual bird, demonstrating implicit collective knowledge. Together, these results show how informational coupling among active inference agents can generate new levels of autonomy and inference, providing a framework for understanding the emergence of (implicit) collective knowledge and joint agency.
comment: 21 pages, 3 figures, appendix
Adaptive Swarm Mesh Refinement using Deep Reinforcement Learning with Local Rewards
Simulating physical systems is essential in engineering, but analytical solutions are limited to straightforward problems. Consequently, numerical methods like the Finite Element Method (FEM) are widely used. However, the FEM becomes computationally expensive as problem complexity and accuracy demands increase. Adaptive Mesh Refinement (AMR) improves the FEM by dynamically placing mesh elements on the domain, balancing computational speed and accuracy. Classical AMR depends on heuristics or expensive error estimators, which may lead to suboptimal performance for complex simulations. While AMR methods based on machine learning are promising, they currently only scale to simple problems. In this work, we formulate AMR as a system of collaborating, homogeneous agents that iteratively split into multiple new agents. This agent-wise perspective enables a spatial reward formulation focused on reducing the maximum mesh element error. Our approach, Adaptive Swarm Mesh Refinement++ (ASMR++), offers efficient, stable optimization and generates highly adaptive meshes at user-defined resolution at inference time. Extensive experiments demonstrate that ASMR++ outperforms heuristic approaches and learned baselines, matching the performance of expensive error-based oracle AMR strategies. ASMR additionally generalizes to different domains during inference, and produces meshes that simulate up to 2 orders of magnitude faster than uniform refinements in more demanding settings.
comment: Submitted to Journal of Machine Learning Research (JMLR)
LLMs as Orchestrators: Constraint-Compliant Multi-Agent Optimization for Recommendation Systems
Recommendation systems must optimize multiple objectives while satisfying hard business constraints such as fairness and coverage. For example, an e-commerce platform may require every recommendation list to include items from multiple sellers and at least one newly listed product; violating such constraints--even once--is unacceptable in production. Prior work on multi-objective recommendation and recent LLM-based recommender agents largely treat constraints as soft penalties or focus on item scoring and interaction, leading to frequent violations in real-world deployments. How to leverage LLMs for coordinating constrained optimization in recommendation systems remains underexplored. We propose DualAgent-Rec, an LLM-coordinated dual-agent framework for constrained multi-objective e-commerce recommendation. The framework separates optimization into an Exploitation Agent that prioritizes accuracy under hard constraints and an Exploration Agent that promotes diversity through unconstrained Pareto search. An LLM-based coordinator adaptively allocates resources between agents based on optimization progress and constraint satisfaction, while an adaptive epsilon-relaxation mechanism guarantees feasibility of final solutions. Experiments on the Amazon Reviews 2023 dataset demonstrate that DualAgent-Rec achieves 100% constraint satisfaction and improves Pareto hypervolume by 4-6% over strong baselines, while maintaining competitive accuracy-diversity trade-offs. These results indicate that LLMs can act as effective orchestration agents for deployable and constraint-compliant recommendation systems.
comment: 8 pages, 5 figures
MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks
While multi-agent systems (MAS) promise elevated intelligence through coordination of agents, current approaches to automatic MAS design under-deliver. Such shortcomings stem from two key factors: (1) methodological complexity - agent orchestration is performed using sequential, code-level execution that limits global system-level holistic reasoning and scales poorly with agent complexity - and (2) efficacy uncertainty - MAS are deployed without understanding if there are tangible benefits compared to single-agent systems (SAS). We propose MASOrchestra, a training-time framework that formulates MAS orchestration as a function-calling reinforcement learning problem with holistic orchestration, generating an entire MAS at once. In MAS-Orchestra, complex, goal-oriented subagents are abstracted as callable functions, enabling global reasoning over system structure while hiding internal execution details. To rigorously study when and why MAS are beneficial, we introduce MASBENCH, a controlled benchmark that characterizes tasks along five axes: Depth, Horizon, Breadth, Parallel, and Robustness. Our analysis reveals that MAS gains depend critically on task structure, verification protocols, and the capabilities of both orchestrator and subagents, rather than holding universally. Guided by these insights, MAS-Orchestra achieves consistent improvements on public benchmarks including mathematical reasoning, multi-hop QA, and search-based QA, while achieving more than 10x efficiency over strong baselines. Together, MAS-Orchestra and MASBENCH enable better training and understanding of MAS in the pursuit of multi-agent intelligence.
comment: Preprint; Work in Progress
Reputation as a Solution to Cooperation Collapse in LLM-based MASs AAMAS 2026
Cooperation has long been a fundamental topic in both human society and AI systems. However, recent studies indicate that the collapse of cooperation may emerge in multi-agent systems (MASs) driven by large language models (LLMs). To address this challenge, we explore reputation systems as a remedy. We propose RepuNet, a dynamic, dual-level reputation framework that models both agent-level reputation dynamics and system-level network evolution. Specifically, driven by direct interactions and indirect gossip, agents form reputations for both themselves and their peers, and decide whether to connect or disconnect other agents for future interactions. Through three distinct scenarios, we show that RepuNet effectively avoids cooperation collapse, promoting and sustaining cooperation in LLM-based MASs. Moreover, we find that reputation systems can give rise to rich emergent behaviors in LLM-based MASs, such as the formation of cooperative clusters, the social isolation of exploitative agents, and the preference for sharing positive gossip rather than negative ones. The GitHub repository for our project can be accessed via the following link: https://github.com/RGB-0000FF/RepuNet.
comment: Published as a conference paper at AAMAS 2026
Performance bound analysis of linear consensus algorithm on strongly connected graphs using effective resistance and reversiblization
We study the performance of the linear consensus algorithm on strongly connected directed graphs using the linear quadratic (LQ) cost as a performance measure. In particular, we derive bounds on the LQ cost by leveraging effective resistance and reversiblization. Our results extend previous analyses-which were limited to reversible cases-to the nonreversible setting. To facilitate this generalization, we introduce novel concepts, termed the back-and-forth path and the pivot node, which serve as effective alternatives to traditional techniques that require reversibility. Moreover, we apply our approach to Cayley graphs and random geometric graphs to estimate the LQ cost without the reversibility assumption. The proposed approach provides a framework that can be adapted to other contexts where reversibility is typically assumed.
TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law ACL 2026
Hong Kong case law translation presents significant challenges: manual methods suffer from high costs and inconsistent quality, while both traditional machine translation and approaches relying solely on Large Language Models (LLMs) often fail to ensure legal terminology accuracy, culturally embedded nuances, and strict linguistic structures. To overcome these limitations, this study proposes TransLaw, a multi-agent framework that decomposes translation into word-level expression, sentence-level translation, and multidimensional review, integrating a specialized Hong Kong legal glossary database, Retrieval-Augmented Generation (RAG), and iterative feedback. Experiments on our newly constructed HKCFA Judgment 97-22 dataset, benchmarking 13 open-source and commercial LLMs, demonstrate that TransLaw significantly outperforms single-agent baselines across all evaluated models. Human evaluation confirms the framework's effectiveness in terms of legal semantic accuracy, structural coherence, and stylistic fidelity, while noting that it still trails human experts in contextualizing complex terminology and stylistic naturalness.
comment: Original: arXiv:2501.09444; Revised: arXiv:2507.00875; Submitted to ACL 2026
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings ICLR2025
As large language models (LLMs) start interacting with each other and generating an increasing amount of text online, it becomes crucial to better understand how information is transformed as it passes from one LLM to the next. While significant research has examined individual LLM behaviors, existing studies have largely overlooked the collective behaviors and information distortions arising from iterated LLM interactions. Small biases, negligible at the single output level, risk being amplified in iterated interactions, potentially leading the content to evolve towards attractor states. In a series of telephone game experiments, we apply a transmission chain design borrowed from the human cultural evolution literature: LLM agents iteratively receive, produce, and transmit texts from the previous to the next agent in the chain. By tracking the evolution of text toxicity, positivity, difficulty, and length across transmission chains, we uncover the existence of biases and attractors, and study their dependence on the initial text, the instructions, language model, and model size. For instance, we find that more open-ended instructions lead to stronger attraction effects compared to more constrained tasks. We also find that different text properties display different sensitivity to attraction effects, with toxicity leading to stronger attractors than length. These findings highlight the importance of accounting for multi-step transmission dynamics and represent a first step towards a more comprehensive understanding of LLM cultural dynamics.
comment: Code available at https://github.com/jeremyperez2/TelephoneGameLLM. Companion website with a Data Explorer tool at https://sites.google.com/view/telephone-game-llm . This paper was published at the 2025 International Conference on Learning Representations (ICLR2025) https://iclr.cc/virtual/2025/poster/28880
The Price of Uncertainty for Social Consensus
How hard is it to achieve consensus in a social network under uncertainty? In this paper we model this problem as a social graph of agents where each vertex is initially colored red or blue. The goal of the agents is to achieve consensus, which is when the colors of all agents align. Agents attempt to do this locally through steps in which an agent changes their color to the color of the majority of their neighbors. In real life, agents may not know exactly how many of their neighbors are red or blue, which introduces uncertainty into this process. Modeling uncertainty as perturbations of relative magnitude $1+\varepsilon$ to these color neighbor counts, we show that even small values of $\varepsilon$ greatly hinder the ability to achieve consensus in a social network. We prove theoretically tight upper and lower bounds on the \emph{price of uncertainty}, a metric defined in previous work by Balcan et al. to quantify the effect of uncertainty in network games.
comment: 17 pages
Empowering Scientific Workflows with Federated Agents
Agentic systems, in which diverse agents cooperate to tackle challenging problems, are exploding in popularity in the AI community. However, existing agentic frameworks take a relatively narrow view of agents, apply a centralized model, and target conversational, cloud-native applications (e.g., LLM-based AI chatbots). In contrast, scientific applications require myriad agents be deployed and managed across diverse cyberinfrastructure. Here we introduce Academy, a modular and extensible middleware designed to deploy autonomous agents across the federated research ecosystem, including HPC systems, experimental facilities, and data repositories. To meet the demands of scientific computing, Academy supports asynchronous execution, heterogeneous resources, high-throughput data flows, and dynamic resource availability. It provides abstractions for expressing stateful agents, managing inter-agent coordination, and integrating computation with experimental control. We present microbenchmark results that demonstrate high performance and scalability in HPC environments. To explore the breadth of applications that can be supported by agentic workflow designs, we also present case studies in materials discovery, astronomy, decentralized learning, and information extraction in which agents are deployed across diverse HPC systems.
comment: To appear at the IEEE International Parallel & Distributed Processing Symposium (IPDPS '26)
Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs
Multi-agent systems (MAS) and reinforcement learning (RL) are widely used to enhance the agentic capabilities of large language models (LLMs). MAS improves task performance through role-based orchestration, while RL uses environmental rewards to learn stronger policies, such as GRPO-style optimization. However, applying on-policy RL to MAS remains underexplored and presents unique challenges. Algorithmically, standard GRPO grouping assumptions break down because prompts vary by role and by turn. System-wise, the training stack must support MAS-workflow rollouts and on-policy updates for both single-policy and multi-policy models. We propose AT-GRPO, which includes (i) an agent- and turn-wise grouped RL algorithm tailored to MAS and (ii) a training system that supports both single- and multi-policy regimes. Across game, planning, coding, and math tasks, AT-GRPO delivers substantial gains. On long-horizon planning, it increases accuracy from a 14.0 to 47.0 percent single-agent RL baseline to 96.0 to 99.5 percent. It also improves reasoning performance, with average gains of 3.87 to 7.62 percent on coding tasks and 9.0 to 17.93 percent on math. Code and environments are available at: https://github.com/pettingllms-ai/PettingLLMs.
Emergent Coordination in Multi-Agent Systems via Pressure Fields and Temporal Decay
Current multi-agent LLM frameworks rely on explicit orchestration patterns borrowed from human organizational structures: planners delegate to executors, managers coordinate workers, and hierarchical control flow governs agent interactions. These approaches suffer from coordination overhead that scales poorly with agent count and task complexity. We propose a fundamentally different paradigm inspired by natural coordination mechanisms: agents operate locally on a shared artifact, guided only by pressure gradients derived from measurable quality signals, with temporal decay preventing premature convergence. We formalize this as optimization over a pressure landscape and prove convergence guarantees under mild conditions. Empirically, on meeting room scheduling across 1,350 trials, pressure-field coordination outperforms all baselines: 48.5% aggregate solve rate versus 12.6% for conversation-based coordination, 1.5% for hierarchical control, and 0.4% for sequential and random baselines (all pairwise comparisons p < 0.001). Temporal decay is essential: disabling it reduces solve rate by 10 percentage points. On easy problems, pressure-field achieves 86.7% solve rate. The approach maintains consistent performance from 1 to 4 agents. Implicit coordination through shared pressure gradients outperforms explicit hierarchical control, suggesting that constraint-driven emergence offers a simpler and more effective foundation for multi-agent AI.
comment: 64 pages, 5 figures, 12 tables. Code available at https://github.com/Govcraft/pressure-field-experiment
Systems and Control (EESS)
Late Breaking Results: Conversion of Neural Networks into Logic Flows for Edge Computing DATE2026
Neural networks have been successfully applied in various resource-constrained edge devices, where usually central processing units (CPUs) instead of graphics processing units exist due to limited power availability. State-of-the-art research still focuses on efficiently executing enormous numbers of multiply-accumulate (MAC) operations. However, CPUs themselves are not good at executing such mathematical operations on a large scale, since they are more suited to execute control flow logic, i.e., computer algorithms. To enhance the computation efficiency of neural networks on CPUs, in this paper, we propose to convert them into logic flows for execution. Specifically, neural networks are first converted into equivalent decision trees, from which decision paths with constant leaves are then selected and compressed into logic flows. Such logic flows consist of if and else structures and a reduced number of MAC operations. Experimental results demonstrate that the latency can be reduced by up to 14.9 % on a simulated RISC-V CPU without any accuracy degradation. The code is open source at https://github.com/TUDa-HWAI/NN2Logic
comment: accepted by DATE2026
Comparative Assessment of Look-Ahead Economic Dispatch and Ramp Products for Grid Flexibility
High renewable penetration increases the frequency and magnitude of net-load ramps, stressing real-time flexibility. Two commonly deployed remedies are look-ahead economic dispatch (LAED) and ramp products (RPs), yet their operational equivalence under the industry-standard rolling-window dispatch implementation is not well understood. This paper develops linear optimization models for multi-interval LAED and RP-based co-optimization, and proves that an enhanced RP formulation can match LAED's dispatch feasible region at a single time step when additional intertemporal deliverability constraints are enforced. We then show that this equivalence does not generally persist under rolling-window operation because LAED and RP formulations optimize different intertemporal objectives, leading to divergent end-of-window states. Using different test systems under stressed ramping conditions and multiple load levels, we show LAED achieves similar or lower load shedding than RP implementations with the same look-ahead horizon, with the most pronounced differences under high-load, ramp-limited conditions. The study highlights the limitations of current ramp product implementations and suggests enhancements, such as introducing more mid-duration RPs.
SINA: A Circuit Schematic Image-to-Netlist Generator Using Artificial Intelligence
Current methods for converting circuit schematic images into machine-readable netlists struggle with component recognition and connectivity inference. In this paper, we present SINA, an open-source, fully automated circuit schematic image-to-netlist generator. SINA integrates deep learning for accurate component detection, Connected-Component Labeling (CCL) for precise connectivity extraction, and Optical Character Recognition (OCR) for component reference designator retrieval, while employing a Vision-Language Model (VLM) for reliable reference designator assignments. In our experiments, SINA achieves 96.47% overall netlist-generation accuracy, which is 2.72x higher than state-of-the-art approaches.
Physics Informed Reconstruction of Four-Dimensional Atmospheric Wind Fields Using Multi-UAS Swarm Observations in a Synthetic Turbulent Environment
Accurate reconstruction of atmospheric wind fields is essential for applications such as weather forecasting, hazard prediction, and wind energy assessment, yet conventional instruments leave spatio-temporal gaps within the lower atmospheric boundary layer. Unmanned aircraft systems (UAS) provide flexible in situ measurements, but individual platforms sample wind only along their flight trajectories, limiting full wind-field recovery. This study presents a framework for reconstructing four-dimensional atmospheric wind fields using measurements obtained from a coordinated UAS swarm. A synthetic turbulence environment and high-fidelity multirotor simulation are used to generate training and evaluation data. Local wind components are estimated from UAS dynamics using a bidirectional long short-term memory network (Bi-LSTM) and assimilated into a physics-informed neural network (PINN) to reconstruct a continuous wind field in space and time. For local wind estimation, the bidirectional LSTM achieves root-mean-square errors (RMSE) of 0.064 and 0.062 m/s for the north and east components in low-wind conditions, increasing to 0.122 to 0.129 m/s under moderate winds and 0.271 to 0.273 m/s in high-wind conditions, while the vertical component exhibits higher error, with RMSE values of 0.029 to 0.091 m/s. The physics-informed reconstruction recovers the dominant spatial and temporal structure of the wind field up to 1000 m altitude while preserving mean flow direction and vertical shear. Under moderate wind conditions, the reconstructed mean wind field achieves an overall RMSE between 0.118 and 0.154 m/s across evaluated UAS configurations, with the lowest error obtained using a five-UAS swarm. These results demonstrate that coordinated UAS measurements enable accurate and scalable four-dimensional wind-field reconstruction without dedicated wind sensors or fixed infrastructure.
Beyond Martingale Estimators: Structured Estimators for Maximizing Information Freshness in Query-Based Update Systems
This paper investigates information freshness in a remote estimation system in which the remote information source is a continuous-time Markov chain (CTMC). For such systems, estimators have been mainly restricted to the class of martingale estimators in which the remote estimate at any time is equal to the value of the most recently received update. This is mainly due to the simplicity and ease of analysis of martingale estimators, which however are far from optimal, especially in query-based (i.e., pull-based) update systems. In such systems, maximum a-posteriori probability (MAP) estimators are optimal. However, MAP estimators can be challenging to analyze in continuous-time settings. In this paper, we introduce a new class of estimators, called structured estimators, which can seamlessly shift from a martingale estimator to a MAP estimator, enabling them to retain useful characteristics of the MAP estimate, while still being analytically tractable. Particularly, we introduce a new estimator termed as the $p$-MAP estimator which is a piecewise-constant approximation of the MAP estimator with finitely many discontinuities, bringing us closer to a full characterization of MAP estimators when modeling information freshness. In fact, we show that for time-reversible CTMCs, the MAP estimator reduces to a $p$-MAP estimator. Using the binary freshness (BF) process for the characterization of information freshness, we derive the freshness expressions and provide optimal state-dependent sampling policies (i.e., querying policies) for maximizing the mean BF (MBF) for pull-based remote estimation of a single CTMC information source, when structured estimators are used. Moreover, we provide optimal query rate allocation policies when a monitor pulls information from multiple heterogeneous CTMCs with a constraint on the overall query rate.
comment: arXiv admin note: text overlap with arXiv:2601.18763
Reformulating Energy Storage Capacity Accreditation Problem with Marginal Reliability Impact
To enhance the efficiency of capacity markets, many electricity markets in the U.S. are adopting or planning to implement marginal capacity accreditation reforms. This paper provides new insights into energy storage capacity accreditation using Marginal Reliability Impact (MRI). We reformulate the commonly used reliability-based storage dispatch model as an optimization problem, enabling direct calculation of the MRI from the Lagrange multipliers, rather than using brute-force perturbation analysis. The analysis demonstrates that the EUE is a piecewise linear function and the storage MRI retains a non-negative property across various system scenarios. We further explore the influence of qualified capacity (QC), storage dispatch rules, and other key factors on storage accreditation, providing practical insights for system operators. Additionally, comparisons of storage capacity accreditation under different reliability criteria offer valuable guidance for policymakers in setting future standards. Numerical results from a modified California system validate our findings and highlight several important phenomena associated with the MRI-based accreditation scheme.
A Gradient-Based Capacity Accreditation Framework in Resource Adequacy: Formulation, Computation, and Practical Implications
Probabilistic resource adequacy assessment is a cornerstone of modern capacity accreditation. This paper develops a gradient-based framework, in which capacity accreditation is interpreted as the directional derivative of a probabilistic resource adequacy metric with respect to resource capacity, that unifies two widely used accreditation approaches: Effective Load Carrying Capability (ELCC) and Marginal Reliability Impact (MRI). Under mild regularity conditions, we show that marginal ELCC and MRI yield equivalent accreditation factors, while their numerical implementations exhibit markedly different computational characteristics. Building on this framework, we demonstrate how infinitesimal perturbation analysis enables up to a $1000\times$ speedup in gradient estimation for capacity accreditation, and we implement gradient-informed search algorithms that significantly accelerate ELCC computations relative to standard bisection methods. Large-scale Monte Carlo experiments show that MRI achieves substantial runtime reductions compared to ELCC and exhibits greater robustness to perturbation step-size selection. These results provide practical guidance for implementing efficient and scalable capacity accreditation in large-scale power systems.
Volt/VAR Optimization in Transmission Networks with Discrete-Control Devices
Voltage (Volt) and reactive-power (VAR) control in transmission networks is critical for reliability and increasingly needs fast, implementable decisions. This paper presents a transmission Volt/VAR Optimization (VVO) framework that co-optimizes discrete control of on-load tap-changing transformers (OLTCs) and capacitor banks (CBs) with AC power flow (ACPF) physics to improve voltage stability and minimize VAR generation. The framework follows a relax-round-resolve pipeline: a continuous relaxation proposes targets, a rounding step selects feasible discrete settings, and a final solve enforces AC power flow physics. Extensive experiments on IEEE, PEGASE, and RTE systems show consistent improvements in voltage and VAR quality metrics with modest generator redispatch while preserving economic operation and achieving compatible runtimes with real-time transmission operations.
Learning to Dial-a-Ride: A Deep Graph Reinforcement Learning Approach to the Electric Dial-a-Ride Problem
Urban mobility systems are transitioning toward electric, on-demand services, creating operational challenges for fleet management under energy and service-quality constraints. The Electric Dial-a-Ride Problem (E-DARP) extends the classical dial-a-ride problem by incorporating limited battery capacity and nonlinear charging dynamics, increasing computational complexity and limiting the scalability of exact methods for real-time use. This paper proposes a deep reinforcement learning approach based on a graph neural network encoder and an attention-driven route construction policy. By operating directly on edge attributes such as travel time and energy consumption, the method captures non-Euclidean, asymmetric, and energy-dependent routing costs in real road networks. The learned policy jointly optimizes routing, charging, and service quality without relying on Euclidean assumptions or handcrafted heuristics. The approach is evaluated on two case studies using ride-sharing data from San Francisco. On benchmark instances, the method achieves solutions within 0.4% of best-known results while reducing computation times by orders of magnitude. A second case study considers large-scale instances with up to 250 request pairs, realistic energy models, and nonlinear charging. On these instances, the learned policy outperforms Adaptive Large Neighborhood Search (ALNS) by 9.5% in solution quality while achieving 100% service completion, with sub-second inference times compared to hours for the metaheuristic. Finally, sensitivity analyses quantify the impact of battery capacity, fleet size, ride-sharing capacity, and reward weights, while robustness experiments show that deterministically trained policies generalize effectively under stochastic conditions.
Generalized Information Gathering Under Dynamics Uncertainty
An agent operating in an unknown dynamical system must learn its dynamics from observations. Active information gathering accelerates this learning, but existing methods derive bespoke costs for specific modeling choices: dynamics models, belief update procedures, observation models, and planners. We present a unifying framework that decouples these choices from the information-gathering cost by explicitly exposing the causal dependencies between parameters, beliefs, and controls. Using this framework, we derive a general information-gathering cost based on Massey's directed information that assumes only Markov dynamics with additive noise and is otherwise agnostic to modeling choices. We prove that the mutual information cost used in existing literature is a special case of our cost. Then, we leverage our framework to establish an explicit connection between the mutual information cost and information gain in linearized Bayesian estimation, thereby providing theoretical justification for mutual information-based active learning approaches. Finally, we illustrate the practical utility of our framework through experiments spanning linear, nonlinear, and multi-agent systems.
Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting
This paper addresses the critical challenge of coordinating mobile edge UAV networks to maintain robust service in highly dynamic spatiotemporal environments. Conventional Deep Reinforcement Learning (DRL) approaches often suffer from catastrophic forgetting when transitioning between distinct task scenarios, such as moving from dense urban clusters to sparse rural areas. These transitions typically necessitate computationally expensive retraining or model resets to adapt to new user distributions, leading to service interruptions. To overcome these limitations, we propose a computationally efficient Spatiotemporal Continual Learning (STCL) framework realized through a Group-Decoupled Multi-Agent Proximal Policy Optimization (G-MAPPO) algorithm. Our approach integrates a novel Group-Decoupled Policy Optimization (GDPO) mechanism that utilizes dynamic $z$-score normalization to autonomously balance heterogeneous objectives, including energy efficiency, user fairness, and coverage. This mechanism effectively mitigates gradient conflicts induced by concept drifts without requiring offline retraining. Furthermore, the framework leverages the 3D mobility of UAVs as a spatial compensation layer, enabling the swarm to autonomously adjust altitudes to accommodate extreme density fluctuations. Extensive simulations demonstrate that the proposed STCL framework achieves superior resilience, characterized by an elastic recovery of service reliability to approximately 0.95 during phase transitions. Compared to the MADDPG baseline, G-MAPPO not only prevents knowledge forgetting but also delivers an effective capacity gain of 20\% under extreme traffic loads, validating its potential as a scalable solution for edge-enabled aerial swarms.
comment: 12 pages, 9 figures, manuscript submitted to IEEE Transactions on Emerging Topics in Computing
Optimal Transport for Time-Varying Multi-Agent Coverage Control
Coverage control algorithms have traditionally focused on static target densities, where agents are deployed to optimally cover a fixed spatial distribution. However, many applications involve time-varying densities, including environmental monitoring, surveillance, and adaptive sensor deployment. Although time-varying coverage strategies have been studied within Voronoi-based frameworks, recent works have reformulated static coverage control as a semi-discrete optimal transport problem. Extending this optimal transport perspective to time-varying scenarios has remained an open challenge. This paper presents a rigorous optimal transport formulation for time-varying coverage control, in which agents minimize the instantaneous Wasserstein distance to a continuously evolving target density. The proposed solution relies on a coupled system of differential equations governing agent positions and the dual variables that define Laguerre regions. In one-dimensional domains, the resulting system admits a closed-form analytical solution, offering both computational benefits and theoretical insight into the structure of optimal time-varying coverage. Numerical simulations demonstrate improved tracking performance compared to quasi-static and Voronoi-based methods, validating the proposed framework.
comment: Keywords: Optimal Transport; Multi-Agent Systems; Coverage Control; Wasserstein Distance; Time-Varying Density; Autonomous Systems; Distributed Control
SmartMeterFM: Unifying Smart Meter Data Generative Tasks Using Flow Matching Models
Smart meter data is the foundation for planning and operating the distribution network. Unfortunately, such data are not always available due to privacy regulations. Meanwhile, the collected data may be corrupted due to sensor or transmission failure, or it may not have sufficient resolution for downstream tasks. A wide range of generative tasks is formulated to address these issues, including synthetic data generation, missing data imputation, and super-resolution. Despite the success of machine learning models on these tasks, dedicated models need to be designed and trained for each task, leading to redundancy and inefficiency. In this paper, by recognizing the powerful modeling capability of flow matching models, we propose a new approach to unify diverse smart meter data generative tasks with a single model trained for conditional generation. The proposed flow matching models are trained to generate challenging, high-dimensional time series data, specifically monthly smart meter data at a 15 min resolution. By viewing different generative tasks as distinct forms of partial data observations and injecting them into the generation process, we unify tasks such as imputation and super-resolution with a single model, eliminating the need for re-training. The data generated by our model not only are consistent with the given observations but also remain realistic, showing better performance against interpolation and other machine learning based baselines dedicated to the tasks.
comment: 10 pages, 6 figures, 6 tables
BAP-SRL: Bayesian Adaptive Priority Safe Reinforcement Learning for Vehicle Motion Planning at Mixed Traffic Intersections
Navigating urban intersections, especially when interacting with heterogeneous traffic participants, presents a formidable challenge for autonomous vehicles (AVs). In such environments, safety risks arise simultaneously from multiple sources, each carrying distinct priority levels and sensitivities that necessitate differential protection preferences. While safe reinforcement learning (RL) offers a robust paradigm for constrained decision-making, existing methods typically model safety as a single constraint or employ static, heuristic weighting schemes for multiple constraints. These approaches often fail to address the dynamic nature of multi-source risks, leading to gradient cancellation that hampers learning, and suboptimal trade-offs in critical dilemma zones. To address this, we propose a Bayesian adaptive priority safe reinforcement learning (BAP-SRL) based motion planning framework. Unlike heuristic weighting schemes, BAP formulates constraint prioritization as a probabilistic inference task. By modeling historical optimization difficulty as a Bayesian prior and instantaneous risk evidence as a likelihood, BAP dynamically gates gradient updates using a Bayesian inference mechanism on latent constraint criticality. Extensive experiments demonstrate that our approach outperforms state-of-the-art baselines in handling interactions with stochastic, heterogeneous agents, achieving lower collision rates and smoother conflict resolution.
Authenticated encryption for space telemetry
We explore how command stack protection requirements outlined in NASA-STD-1006A can be satisfied within the context of emergency space telemetry. Proposed implementation of lightweight authenticated encryption offers strong security without sacrificing performance in resource-constrained environments. It produces fixed-length messages, maintaining compatibility with the underlying data transport protocols. By focusing on predictable properties and robust authentication, we create a scheme that protects the confidentiality, integrity and authenticity of telemetry data in emergency communications while balancing security requirements with the operational constraints.
comment: 11 pages. In proceedings of the 76th International Astronautical Congress (IAC 2025). Sydney, Australia, September 2025
A Thermal-Electrical Co-Optimization Framework for Active Distribution Grids with Electric Vehicles and Heat Pumps
The growing electrification of transportation and heating through Electric Vehicles (EVs) and Heat Pumps (HPs) introduces both flexibility and complexity to Active Distribution Networks (ADNs). These resources provide substantial operational flexibility but also create tightly coupled thermal-electrical dynamics that challenge conventional network management. This paper proposes a unified co-optimization framework that integrates a calibrated 3R2C grey-box building thermal model into a network-constrained Optimal Power Flow (OPF). The framework jointly optimizes EVs, HPs, and photovoltaic systems while explicitly enforcing thermal comfort, Distributed Energy Resource (DER) limits, and full power flow physics. To maintain computational tractability, Second-Order Cone Programming (SOCP) relaxations are evaluated on a realistic low-voltage feeder. The analysis shows that, despite network heterogeneity violating some theoretical exactness conditions, the relaxation remains exact in practice. Comparative assessments of convex DistFlow, bus injection, and branch flow formulations reveal that convex DistFlow achieves sub-second runtimes and near-optimal performance even at high DER penetration levels. Simulations confirm the effectiveness of coordinated scheduling, yielding reductions of 41% in transformer aging, 54% in losses, and complete elimination of voltage violations, demonstrating the value of integrated thermal-electrical coordination in future smart grids.
StarSD: One-for-Many Speculative Decoding
Speculative decoding accelerates autoregressive generation by separating token proposal from verification, but most existing approaches are designed for single-node execution and do not scale well to multi-accelerator clusters used for serving modern Large Language Models (LLMs). We present StarSD, a one-for-many speculative decoding framework that uses a single draft model to serve multiple target models across distributed nodes via a star topology. StarSD decouples drafting and verification, enabling effective sharing of draft computation, and preventing distributed accelerators from remaining idle under bursty workloads. We provide a system-level analysis that characterizes when and why a single draft model can remain fully utilized by multiple verifiers, yielding predictable latency and utilization gains. Extensive experiments in real-world distributed inference settings demonstrate that StarSD simplifies deployment and supports flexible resource allocation across heterogeneous accelerators, while maintaining output quality. These results indicate that StarSD is a practical and scalable framework for bringing speculative decoding to modern cloud and edge inference infrastructures.
Decentralized Analysis Approach for Oscillation Damping in Grid-Forming and Grid-Following Heterogeneous Power Systems
This letter proposes a decentralized local gain condition (LGC) to guarantee oscillation damping in inverter-based resource (IBR)-dominated power systems. The LGC constrains the dynamic gain between each IBR and the network at its point of connection. By satisfying the LGC locally, the closed-loop poles are confined to a desired region, thereby yielding system-wide oscillation damping without requiring global information. Notably, the LGC is agnostic to different IBR dynamics, well-suited for systems with heterogeneous IBRs, and flexible to various damping requirements. Moreover, a low-complexity algorithm is proposed to parameterize LGC, providing scalable and damping-constrained parameter tuning guidance for IBRs.
IROS: A Dual-Process Architecture for Real-Time VLM-Based Indoor Navigation
Indoor mobile robot navigation requires fast responsiveness and robust semantic understanding, yet existing methods struggle to provide both. Classical geometric approaches such as SLAM offer reliable localization but depend on detailed maps and cannot interpret human-targeted cues (e.g., signs, room numbers) essential for indoor reasoning. Vision-Language-Action (VLA) models introduce semantic grounding but remain strictly reactive, basing decisions only on visible frames and failing to anticipate unseen intersections or reason about distant textual cues. Vision-Language Models (VLMs) provide richer contextual inference but suffer from high computational latency, making them unsuitable for real-time operation on embedded platforms. In this work, we present IROS, a real-time navigation framework that combines VLM-level contextual reasoning with the efficiency of lightweight perceptual modules on low-cost, on-device hardware. Inspired by Dual Process Theory, IROS separates fast reflexive decisions (System One) from slow deliberative reasoning (System Two), invoking the VLM only when necessary. Furthermore, by augmenting compact VLMs with spatial and textual cues, IROS delivers robust, human-like navigation with minimal latency. Across five real-world buildings, IROS improves decision accuracy and reduces latency by 66% compared to continuous VLM-based navigation.
Surrogate model of a HVAC system for PV self-consumption maximisation
In the last few years, energy efficiency has become a challenge. Not only mitigating environmental impact but reducing energy waste can lead to financial advantages. Buildings play an important role in this: they are among the biggest consumers. So, finding manners to reduce energy consumption is a way to minimise energy waste, and a technique for that is creating Demand Response (DR) strategies. This paper proposes a novel way to decrease computational effort of simulating the behaviour of a building using surrogate models based on active learning. Before going straight to the problem of a building, which is complex and computationally costly, the paper proposes the approach of active learning to a smaller problem: with reduced simulations, regress the curve of voltage versus current of a thermo-resistor. Then, the paper implements a surrogate model of energy consumption of a building. The goal is to be able to learn the consumption pattern based on a limited number of simulations. The result given by the surrogate can be used to set the reference temperature, maximising the PV self-consumption, and reducing energy usage from the grid. Thanks to the surrogate, the total time spent to map all possible consumption scenarios is reduced around 7 times.
Resilient Grid Hardening against Multiple Hazards: An Adaptive Two-Stage Stochastic Optimization Approach
The growing prevalence of extreme weather events driven by climate change poses significant challenges to power system resilience. Infrastructure damage and prolonged power outages highlight the urgent need for effective grid-hardening strategies. While some measures provide long-term protection against specific hazards, they can become counterproductive under conflicting threats. In this work, we develop an adaptive two-stage stochastic optimization framework to support dynamic decision-making for hardening critical grid components under multiple hazard exposures. Unlike traditional approaches, our model adapts to evolving climate conditions, enabling more resilient investment strategies. Furthermore, we integrate long-term (undergrounding) and short-term (vegetation management) hardening actions to jointly minimize total system costs. Extensive simulation results validate the effectiveness of the proposed framework in reducing outage and repair costs while enhancing the adaptability and robustness of grid infrastructure planning.
A Time-Domain Dual-Edge Asynchronous Pipelined SAR ADC Featuring Reset-Free Quantization at Multi-GS/s
Time-domain ADCs are attractive for high-speed wireline receivers, as time resolution scales favorably with advanced CMOS technologies, enabling multi-GS/s single-channel sampling rates. However, conventional time-domain ADCs require explicit reset of voltage-to-time and time-domain signal paths between samples, introducing dead time that fundamentally limits resolution, speed, and energy efficiency. This paper introduces a dual-edge reset-free quantization concept for asynchronous pipelined SAR time-domain ADCs, in which both rising and falling signal edges are exploited to enable reset-free quantization within a single conversion period. By eliminating explicit reset phases, the proposed approach expands the effective conversion window and relaxes the resolution-speed tradeoff at high sampling rates. An 8-bit dual-edge asynchronous pipelined SAR time-domain ADC is implemented in 22-nm FD-SOI, incorporating a linearity-compensated dual-edge voltage-to-time converter and a dual-edge time-to-digital converter with independently tunable rising- and falling-edge delays. The prototype occupies a core area of 0.0089 mm^2 and achieves continuous single-channel operation at 3.5 GS/s, with architectural scalability demonstrated through intermittent operation at 10.5 GS/s and higher. At 3.5 GS/s, the ADC achieves 21.6 dB SNDR and 32.2 dB SFDR. The measured performance is primarily limited by identifiable implementation-level factors rather than by architectural constraints, demonstrating the feasibility of dual-edge reset-free quantization for high-speed time-domain ADCs.
comment: 12 pages, 16 figures. This work has been submitted to IEEE
Distributed Circumnavigation Using Bearing Based Control with Limited Target Information
In this paper, we address the problem of circumnavigation of a stationary target by a heterogeneous group comprising of $\textbf{n}$ autonomous agents, having unicycle kinematics. The agents are assumed to have constant linear speeds, we control only the angular speeds. Assuming limited sensing capabilities of the agents, in the proposed framework, only a subset of agents, termed as \textit{leaders}, know the target location. The rest, termed as \textit{followers}, do not. We propose a distributed guidance law which drives all the agents towards the desired objective; global asymptotic stability (GAS) is ensured by using Zubov's theorem. The efficacy of the approach is demonstrated through both numerical simulations and hardware experiments on the TurtleBots utilising OptiTrack motion capture system.
comment: 6 pages, 17 figures
Deep QP Safety Filter: Model-free Learning for Reachability-based Safety Filter
We introduce Deep QP Safety Filter, a fully data-driven safety layer for black-box dynamical systems. Our method learns a Quadratic-Program (QP) safety filter without model knowledge by combining Hamilton-Jacobi (HJ) reachability with model-free learning. We construct contraction-based losses for both the safety value and its derivatives, and train two neural networks accordingly. In the exact setting, the learned critic converges to the viscosity solution (and its derivative), even for non-smooth values. Across diverse dynamical systems -- even including a hybrid system -- and multiple RL tasks, Deep QP Safety Filter substantially reduces pre-convergence failures while accelerating learning toward higher returns than strong baselines, offering a principled and practical route to safe, model-free control.
comment: Accepted at L4DC 2026
Deep Koopman Iterative Learning and Stability-Guaranteed Control for Unknown Nonlinear Time-Varying Systems
This paper proposes a Koopman-based framework for modeling, prediction, and control of unknown nonlinear time-varying systems. We present a novel Koopman-based learning method for predicting the state of unknown nonlinear time-varying systems, upon which a robust controller is designed to ensure that the resulting closed-loop system is input-to-state stable with respect to the Koopman approximation error. The error of the lifted system model learned through the Koopman-based method increases over time due to the time-varying nature of the nonlinear time-varying system. To address this issue, an online iterative update scheme is incorporated into the learning process to update the lifted system model, aligning it more precisely with the time-varying nonlinear system by integrating the updated data and discarding the outdated data. A necessary condition for the feasibility of the proposed iterative learning method is derived. In order to reduce unnecessary system updates while ensuring the prediction accuracy of the lifted system, the update mechanism is enhanced to determine whether to update the lifted system and meanwhile to reduce updates that deteriorate the fitting performance. Furthermore, based on the online-updated lifted system, a controller is designed to ensure the closed-loop controlled system be input-to-state stable with respect to the Koopman approximation error. Numerical simulations on the Duffing oscillator, the serial manipulator, and the synthetic biological network system are presented to demonstrate the effectiveness of the proposed method for the approximation and control of unknown nonlinear time-varying systems. The results show that the proposed approach outperforms existing methods in terms of approximation accuracy and computational efficiency, even under significant system variations.
Towards Governance of Localized VANET: An Adjustable Degree Distribution Model
Vehicular Ad-hoc Networks (VANETs) serve as a critical enabler for intelligent transportation systems. However, their practical deployment faces a core governance dilemma: the network topology requires a dynamic trade-off between robustness against targeted attacks and ensuring low-latency information transmission. Most existing models generate fixed degree distributions, lacking the ability to adapt autonomously to the demands of diverse traffic scenarios. To address this challenge, this paper innovatively proposes a schedulable degree distribution model for localized VANETs. The core of this model lies in introducing a hybrid connection mechanism. When establishing connections, newly joining nodes do not follow a single rule but instead collaboratively perform random attachment and preferential attachment. Through theoretical derivation and simulation validation, this study demonstrates that by adjusting the cooperative weighting between these two mechanisms, the overall network degree distribution can achieve a continuous and controllable transition between a uniform distribution and a power-law distribution. The former effectively disperses attack risks and enhances robustness, while the latter facilitates the formation of hub nodes, shortening transmission paths to reduce latency. Experimental results based on the real-world road network of Beijing indicate that this model can precisely regulate node connection heterogeneity, attack resistance, and average transmission path length through the reshaping of the underlying topology. This provides a forward-looking and practical governance paradigm for constructing next-generation VANETs capable of dynamically adapting to complex environments.
comment: This paper is 10 pages long, including 8 figures. It presents original research on VANET governance with an adjustable degree distribution model and has been submitted to IEEE Transactions on Information Forensics and Security
Modeling of Non-linear Dynamics of Lithium-ion Batteries via Delay-Embedded Dynamic Mode Decomposition
The complex electrochemical behavior of lithium-ion batteries results in non-linear dynamics and appropriate modeling of this non-linear dynamical system is of interest for better management and control. In this work, we proposed a family of dynamic mode decomposition (DMD)-based data-driven models that do not require detailed knowledge of the composition of the battery materials but can essentially capture the non-linear dynamics with higher computational efficiency. Only voltage and current data obtained from hybrid pulse power characterization (HPPC) tests were utilized to form the state space matrices and subsequently used for predicting the future terminal voltage at different state of charge (SoC) and aging levels. To construct the system model, 60\% of the data from a single HPPC test was utilized to generate time-delay embedded snapshots, with embedding dimension ranging from 40 to 2000. Among these, an embedding dimension of 1810 resulted in the least residual sum of squares (RSS) error of 3.86 for the dynamic mode decomposition with control (DMDc) model and 30 for the standard DMD model. For DMDc model, delay embeddings (ranging from 1 to 12) were also incorporated into the input current signals. For the input matrix, an embedding dimension of 6 resulted in a minimum RSS error of 1.74. Furthermore, the system matrices A and B, identified from the HPPC test when the cell is in its healthy state, were held fixed and used to simulate the system dynamics for aged batteries by updating only the control input. Despite the presence of nonlinear degradation effects in later cycles, the DMDc model effectively captured key inner dynamics such as voltage dips and transient responses for subsequent charge and discharge cycles.
comment: 7 pages,10 figures
Regional Transportation Modeling for Equitable Electric Vehicle Charging Infrastructure Design
The widespread adoption of battery electric vehicles (BEVs) holds promise for mitigating emission-related health impacts, particularly for low-income communities disproportionately affected by exposure to traffic-related air pollution. However, designing effective charging infrastructure necessitates a regional modeling approach that accounts for the inherent cross-jurisdictional nature of mobility patterns. This study underscores the importance of regional modeling in optimizing charging station deployment and evaluating the environmental justice implications for equity priority communities. We present a large-scale regional transportation modeling analysis leveraging Mobiliti, a cloud-based platform that employs parallel discrete event simulation to enable rapid computation. Our approach identifies the spatial demand density for charging infrastructure by analyzing over 19 million trips in the San Francisco Bay Area and determining the threshold points where BEVs may require charging across a typical day. By transitioning these trips that originate outside equity priority communities to BEVs, we quantify the potential emission reductions within these vulnerable areas. The regional modeling framework captures the complex interactions between travel behavior, vehicle characteristics, and charging needs, while accounting for the interconnectivity of infrastructure across municipal boundaries. This study demonstrates the critical role of regional modeling in designing equitable BEV charging networks that address environmental justice concerns. The findings inform strategies for deploying charging infrastructure that maximizes accessibility, minimizes range anxiety, and prioritizes the health and well-being of communities disproportionately burdened by transportation emissions.
Forecasting in the presence of scale-free noise
The extraction of signals from noise is a common problem in all areas of science and engineering. A particularly useful version is that of forecasting: determining a causal filter that estimates a future value of a hidden process from past observations. Current techniques for deriving the filter require that the noise be well described by rational power spectra. However, scale-free noises, whose spectra scale as a non-integer power of frequency, are ubiquitous in practice. We establish a method, together with performance guarantees, that solves the forecasting problem in the presence of scale-free noise. Via the duality between estimation and control, our technique can be used to design control for distributed systems. These results will have wide-ranging applications in neuroscience, finance, fluid dynamics, and quantum measurements.
comment: 6 pages, 1 figure, 8 sections of Supplementary Information
Incorporating the ChEES Criterion into Sequential Monte Carlo Samplers
Markov chain Monte Carlo (MCMC) methods are a powerful but computationally expensive way of performing non-parametric Bayesian inference. MCMC proposals which utilise gradients, such as Hamiltonian Monte Carlo (HMC), can better explore the parameter space of interest if the additional hyper-parameters are chosen well. The No-U-Turn Sampler (NUTS) is a variant of HMC which is extremely effective at selecting these hyper-parameters but is slow to run and is not suited to GPU architectures. An alternative to NUTS, Change in the Estimator of the Expected Square HMC (ChEES-HMC) was shown not only to run faster than NUTS on GPU but also sample from posteriors more efficiently. Sequential Monte Carlo (SMC) samplers are another sampling method which instead output weighted samples from the posterior. They are very amenable to parallelisation and therefore being run on GPUs while having additional flexibility in their choice of proposal over MCMC. We incorporate (ChEEs-HMC) as a proposal into SMC samplers and demonstrate competitive but faster performance than NUTS on a number of tasks.
comment: 16 pages, 9 figures
Applying the Spectral Method for Modeling Linear Filters: Butterworth, Linkwitz-Riley, and Chebyshev filters
This paper proposes a new technique for computer modeling linear filters based on the spectral form of mathematical description of linear systems. It assumes the representation of input and output signals of the filter as orthogonal expansions, while filters themselves are described by two-dimensional non-stationary transfer functions. This technique allows one to model the output signal in continuous time, and it is successfully tested on the Butterworth, Linkwitz-Riley, and Chebyshev filters with different orders.
Neural Policy Composition from Free Energy Minimization
The ability to compose acquired skills to plan and execute behaviors is a hallmark of natural intelligence. Yet, despite remarkable cross-disciplinary efforts, a principled account of how task structure shapes gating and how such computations could be delivered in neural circuits, remains elusive. Here we introduce GateMod, an interpretable theoretically grounded computational model linking the emergence of gating to the underlying decision-making task, and to a neural circuit architecture. We first develop GateFrame, a normative framework casting policy gating into the minimization of the free energy. This framework, relating gating rules to task, applies broadly across neuroscience, cognitive and computational sciences. We then derive GateFlow, a continuous-time energy based dynamics that provably converges to GateFrame optimal solution. Convergence, exponential and global, follows from a contractivity property that also yields robustness and other desirable properties. Finally, we derive a neural circuit from GateFlow, GateNet. This is a soft-competitive recurrent circuit whose components perform local and contextual computations consistent with known dendritic and neural processing motifs. We evaluate GateMod across two different settings: collective behaviors in multi-agent systems and human decision-making in multi-armed bandits. In all settings, GateMod provides interpretable mechanistic explanations of gating and quantitatively matches or outperforms established models. GateMod offers a unifying framework for neural policy gating, linking task objectives, dynamical computation, and circuit-level mechanisms. It provides a framework to understand gating in natural agents beyond current explanations and to equip machines with this ability.
High Effort, Low Gain: Fundamental Limits of Active Learning for Linear Dynamical Systems AISTATS 2026
In this work, we consider the problem of identifying an unknown linear dynamical system given a finite hypothesis class. In particular, we analyze the effect of the excitation input on the sample complexity of identifying the true system with high probability. To this end, we present sample complexity lower bounds that capture the choice of the selected excitation input. The sample complexity lower bound gives rise to a system theoretic condition to determine the potential benefit of experiment design. Informed by the analysis of the sample complexity lower bound, we propose a persistent excitation (PE) condition tailored to the considered setting, which we then use to establish sample complexity upper bounds. Notably, the PE condition is weaker than in the case of an infinite hypothesis class and allows analyzing different excitation inputs modularly. Crucially, the lower and upper bounds share the same dependency on key problem parameters. Finally, we leverage these insights to propose an active learning algorithm that sequentially excites the system optimally with respect to the current estimate, and provide sample complexity guarantees for the presented algorithm. Concluding simulations showcase the effectiveness of the proposed algorithm.
comment: Accepted for spotlight presentation at AISTATS 2026
Model-Free Output Feedback Stabilization via Policy Gradient Methods
Stabilizing a dynamical system is a fundamental problem that serves as a cornerstone for many complex tasks in the field of control systems. The problem becomes challenging when the system model is unknown. Among the Reinforcement Learning (RL) algorithms that have been successfully applied to solve problems pertaining to unknown linear dynamical systems, the policy gradient (PG) method stands out due to its ease of implementation and can solve the problem in a model-free manner. However, most of the existing works on PG methods for unknown linear dynamical systems assume full-state feedback. In this paper, we take a step towards model-free learning for partially observable linear dynamical systems with output feedback and focus on the fundamental stabilization problem of the system. We propose an algorithmic framework that stretches the boundary of PG methods to the problem without global convergence guarantees. We show that by leveraging zeroth-order PG update based on system trajectories and its convergence to stationary points, the proposed algorithms return a stabilizing output feedback policy for discrete-time linear dynamical systems. We also explicitly characterize the sample complexity of our algorithm and verify the effectiveness of the algorithm using numerical examples.
comment: 31 pages, 2 figures
Multi-threshold time series analysis enables characterization of variable renewable energy droughts in Europe
Variable renewable energy droughts, so called Dunkelflaute events, emerge as a challenge for climate-neutral energy systems based on variable renewables. Here we characterize European drought events for on- and offshore wind power, solar photovoltaics, and renewable technology portfolios, using 38 historic weather years and an advanced identification method. Their characteristics heavily depend on the chosen drought threshold, questioning the usefulness of single-threshold analyses. Applying a multi-threshold framework, we quantify how the complementarity of wind and solar power temporally and spatially alleviates drought frequency, return periods, duration, and severity within (portfolio effect) and across countries (balancing effect). We identify the most extreme droughts, which drive major discharging periods of long-duration storage in a fully renewable European energy system, based on a policy-relevant decarbonization scenario. Such events comprise sequences of shorter droughts of varying severity. The most extreme event occurred in winter 1996/97 and lasted 55 days in an idealized, perfectly interconnected setting. The average renewable availability during this period was still 47% of its long-run mean. System planners must consider such events when planning for storage and other flexibility technologies. Methodologically, we conclude that using arbitrary single calendar years is not suitable for modeling weather-resilient energy scenarios.
Next-Generation Grid Codes: Toward a New Paradigm for Dynamic Ancillary Services
This paper presents preliminary results toward a conceptual foundation for Next Generation Grid Codes (NGGCs) based on decentralized stability and performance certification for dynamic ancillary services. The proposed NGGC framework targets two core outcomes: (i) guaranteed closed-loop stability and (ii) explicit performance assurances for power-system frequency and voltage dynamics. Stability is addressed using loop-shifting and passivity-based methods that yield local frequency-domain certificates for individual devices, enabling fully decentralized verification of the interconnected system. Performance is characterized by deriving quantitative bounds on key time-domain metrics (e.g., nadirs, rate-of-change-of-frequency (RoCoF), steady-state deviations, and oscillation damping) through frequency-domain constraints on local device behavior. The framework is non-parametric and model-agnostic, accommodating a broad class of device dynamics under mild assumptions, and provides an initial unified approach to stability and performance certification without explicit device-model parameterization. As such, these results offer a principled starting point for the development of future grid codes and control design methodologies in modern power systems.
comment: 4 pages, 7 figures
The Feasibility Theory of Constrained Reinforcement Learning: A Tutorial Study
Satisfying safety constraints is a priority concern when solving optimal control problems (OCPs). Due to the existence of infeasibility phenomenon, where a constraint-satisfying solution cannot be found, it is necessary to identify a feasible region before implementing a policy. Existing feasibility theories built for model predictive control (MPC) only consider the feasibility of optimal policy. However, reinforcement learning (RL), as another important control method, solves the optimal policy in an iterative manner, which comes with a series of non-optimal intermediate policies. Feasibility analysis of these non-optimal policies is also necessary for iteratively improving constraint satisfaction; but that is not available under existing MPC feasibility theories. This paper proposes a feasibility theory that applies to both MPC and RL by filling in the missing part of feasibility analysis for an arbitrary policy. The basis of our theory is to decouple policy solving and implementation into two temporal domains: virtual-time domain and real-time domain. This allows us to separately define initial and endless, state and policy feasibility, and their corresponding feasible regions. Based on these definitions, we analyze the containment relationships between different feasible regions, which enables us to describe the feasible region of an arbitrary policy. We further provide virtual-time constraint design rules along with a practical design tool called feasibility function that helps to achieve the maximum feasible region. We review most of existing constraint formulations and point out that they are essentially applications of feasibility functions in different forms. We demonstrate our feasibility theory by visualizing different feasible regions under both MPC and RL policies in an emergency braking control task.
When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control
Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers or costly per-device recalibration. We derive a scaling law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales linearly with task variance, providing a quantitative criterion for when adaptation justifies its overhead. Validation on quantum gate calibration shows negligible benefits for low-variance tasks but $>40\%$ fidelity gains on two-qubit gates under extreme out-of-distribution conditions (10$\times$ the training noise), with implications for reducing per-device calibration time on cloud quantum processors. Further validation on classical linear-quadratic control confirms these laws emerge from general optimization geometry rather than quantum-specific physics. Together, these results offer a transferable framework for decision-making in adaptive control.
comment: 28 pages, 11 figures
Robotics
End-to-end example-based sim-to-real RL policy transfer based on neural stylisation with application to robotic cutting
Whereas reinforcement learning has been applied with success to a range of robotic control problems in complex, uncertain environments, reliance on extensive data - typically sourced from simulation environments - limits real-world deployment due to the domain gap between simulated and physical systems, coupled with limited real-world sample availability. We propose a novel method for sim-to-real transfer of reinforcement learning policies, based on a reinterpretation of neural style transfer from image processing to synthesise novel training data from unpaired unlabelled real world datasets. We employ a variational autoencoder to jointly learn self-supervised feature representations for style transfer and generate weakly paired source-target trajectories to improve physical realism of synthesised trajectories. We demonstrate the application of our approach based on the case study of robot cutting of unknown materials. Compared to baseline methods, including our previous work, CycleGAN, and conditional variational autoencoder-based time series translation, our approach achieves improved task completion time and behavioural stability with minimal real-world data. Our framework demonstrates robustness to geometric and material variation, and highlights the feasibility of policy adaptation in challenging contact-rich tasks where real-world reward information is unavailable.
comment: 14 pages, 9 figures. Submitted to Nature Scientific Reports
MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents
Foundation models rely on in-context learning for personalized decision making. The limited size of this context window necessitates memory compression and retrieval systems like RAG. These systems however often treat memory as large offline storage spaces, which is unfavorable for embodied agents that are expected to operate under strict memory and compute constraints, online. In this work, we propose MemCtrl, a novel framework that uses Multimodal Large Language Models (MLLMs) for pruning memory online. MemCtrl augments MLLMs with a trainable memory head μthat acts as a gate to determine which observations or reflections to retain, update, or discard during exploration. We evaluate with training two types of μ, 1) via an offline expert, and 2) via online RL, and observe significant improvement in overall embodied task completion ability on μ-augmented MLLMs. In particular, on augmenting two low performing MLLMs with MemCtrl on multiple subsets of the EmbodiedBench benchmark, we observe that μ-augmented MLLMs show an improvement of around 16% on average, with over 20% on specific instruction subsets. Finally, we present a qualitative analysis on the memory fragments collected by μ, noting the superior performance of μaugmented MLLMs on long and complex instruction types.
A Methodology for Designing Knowledge-Driven Missions for Robots
This paper presents a comprehensive methodology for implementing knowledge graphs in ROS 2 systems, aiming to enhance the efficiency and intelligence of autonomous robotic missions. The methodology encompasses several key steps: defining initial and target conditions, structuring tasks and subtasks, planning their sequence, representing task-related data in a knowledge graph, and designing the mission using a high-level language. Each step builds on the previous one to ensure a cohesive process from initial setup to final execution. A practical implementation within the Aerostack2 framework is demonstrated through a simulated search and rescue mission in a Gazebo environment, where drones autonomously locate a target. This implementation highlights the effectiveness of the methodology in improving decision-making and mission performance by leveraging knowledge graphs.
Learning From a Steady Hand: A Weakly Supervised Agent for Robot Assistance under Microscopy
This paper rethinks steady-hand robotic manipulation by using a weakly supervised framework that fuses calibration-aware perception with admittance control. Unlike conventional automation that relies on labor-intensive 2D labeling, our framework leverages reusable warm-up trajectories to extract implicit spatial information, thereby achieving calibration-aware, depth-resolved perception without the need for external fiducials or manual depth annotation. By explicitly characterizing residuals from observation and calibration models, the system establishes a task-space error budget from recorded warm-ups. The uncertainty budget yields a lateral closed-loop accuracy of approx. 49 micrometers at 95% confidence (worst-case testing subset) and a depth accuracy of <= 291 micrometers at 95% confidence bound during large in-plane moves. In a within-subject user study (N=8), the learned agent reduces overall NASA-TLX workload by 77.1% relative to the simple steady-hand assistance baseline. These results demonstrate that the weakly supervised agent improves the reliability of microscope-guided biomedical micromanipulation without introducing complex setup requirements, offering a practical framework for microscope-guided intervention.
Li-ViP3D++: Query-Gated Deformable Camera-LiDAR Fusion for End-to-End Perception and Trajectory Prediction
End-to-end perception and trajectory prediction from raw sensor data is one of the key capabilities for autonomous driving. Modular pipelines restrict information flow and can amplify upstream errors. Recent query-based, fully differentiable perception-and-prediction (PnP) models mitigate these issues, yet the complementarity of cameras and LiDAR in the query-space has not been sufficiently explored. Models often rely on fusion schemes that introduce heuristic alignment and discrete selection steps which prevent full utilization of available information and can introduce unwanted bias. We propose Li-ViP3D++, a query-based multimodal PnP framework that introduces Query-Gated Deformable Fusion (QGDF) to integrate multi-view RGB and LiDAR in query space. QGDF (i) aggregates image evidence via masked attention across cameras and feature levels, (ii) extracts LiDAR context through fully differentiable BEV sampling with learned per-query offsets, and (iii) applies query-conditioned gating to adaptively weight visual and geometric cues per agent. The resulting architecture jointly optimizes detection, tracking, and multi-hypothesis trajectory forecasting in a single end-to-end model. On nuScenes, Li-ViP3D++ improves end-to-end behavior and detection quality, achieving higher EPA (0.335) and mAP (0.502) while substantially reducing false positives (FP ratio 0.147), and it is faster than the prior Li-ViP3D variant (139.82 ms vs. 145.91 ms). These results indicate that query-space, fully differentiable camera-LiDAR fusion can increase robustness of end-to-end PnP without sacrificing deployability.
comment: This work has been submitted to the IEEE for possible publication
One Step Is Enough: Dispersive MeanFlow Policy Optimization
Real-time robotic control demands fast action generation. However, existing generative policies based on diffusion and flow matching require multi-step sampling, fundamentally limiting deployment in time-critical scenarios. We propose Dispersive MeanFlow Policy Optimization (DMPO), a unified framework that enables true one-step generation through three key components: MeanFlow for mathematically-derived single-step inference without knowledge distillation, dispersive regularization to prevent representation collapse, and reinforcement learning (RL) fine-tuning to surpass expert demonstrations. Experiments across RoboMimic manipulation and OpenAI Gym locomotion benchmarks demonstrate competitive or superior performance compared to multi-step baselines. With our lightweight model architecture and the three key algorithmic components working in synergy, DMPO exceeds real-time control requirements (>120Hz) with 5-20x inference speedup, reaching hundreds of Hertz on high-performance GPUs. Physical deployment on a Franka-Emika-Panda robot validates real-world applicability.
comment: Code and project page: https://guowei-zou.github.io/dmpo-page/
Tendon-based modelling, estimation and control for a simulated high-DoF anthropomorphic hand model
Tendon-driven anthropomorphic robotic hands often lack direct joint angle sensing, as the integration of joint encoders can compromise mechanical compactness and dexterity. This paper presents a computational method for estimating joint positions from measured tendon displacements and tensions. An efficient kinematic modeling framework for anthropomorphic hands is first introduced based on the Denavit-Hartenberg convention. Using a simplified tendon model, a system of nonlinear equations relating tendon states to joint positions is derived and solved via a nonlinear optimization approach. The estimated joint angles are then employed for closed-loop control through a Jacobian-based proportional-integral (PI) controller augmented with a feedforward term, enabling gesture tracking without direct joint sensing. The effectiveness and limitations of the proposed estimation and control framework are demonstrated in the MuJoCo simulation environment using the Anatomically Correct Biomechatronic Hand, featuring five degrees of freedom for each long finger and six degrees of freedom for the thumb.
GPO: Growing Policy Optimization for Legged Robot Locomotion and Whole-Body Control
Training reinforcement learning (RL) policies for legged robots remains challenging due to high-dimensional continuous actions, hardware constraints, and limited exploration. Existing methods for locomotion and whole-body control work well for position-based control with environment-specific heuristics (e.g., reward shaping, curriculum design, and manual initialization), but are less effective for torque-based control, where sufficiently exploring the action space and obtaining informative gradient signals for training is significantly more difficult. We introduce Growing Policy Optimization (GPO), a training framework that applies a time-varying action transformation to restrict the effective action space in the early stage, thereby encouraging more effective data collection and policy learning, and then progressively expands it to enhance exploration and achieve higher expected return. We prove that this transformation preserves the PPO update rule and introduces only bounded, vanishing gradient distortion, thereby ensuring stable training. We evaluate GPO on both quadruped and hexapod robots, including zero-shot deployment of simulation-trained policies on hardware. Policies trained with GPO consistently achieve better performance. These results suggest that GPO provides a general, environment-agnostic optimization framework for learning legged locomotion.
MeCo: Enhancing LLM-Empowered Multi-Robot Collaboration via Similar Task Memoization
Multi-robot systems have been widely deployed in real-world applications, providing significant improvements in efficiency and reductions in labor costs. However, most existing multi-robot collaboration methods rely on extensive task-specific training, which limits their adaptability to new or diverse scenarios. Recent research leverages the language understanding and reasoning capabilities of large language models (LLMs) to enable more flexible collaboration without specialized training. Yet, current LLM-empowered approaches remain inefficient: when confronted with identical or similar tasks, they must replan from scratch because they omit task-level similarities. To address this limitation, we propose MeCo, a similarity-aware multi-robot collaboration framework that applies the principle of ``cache and reuse'' (a.k.a., memoization) to reduce redundant computation. Unlike simple task repetition, identifying and reusing solutions for similar but not identical tasks is far more challenging, particularly in multi-robot settings. To this end, MeCo introduces a new similarity testing method that retrieves previously solved tasks with high relevance, enabling effective plan reuse without re-invoking LLMs. Furthermore, we present MeCoBench, the first benchmark designed to evaluate performance on similar-task collaboration scenarios. Experimental results show that MeCo substantially reduces planning costs and improves success rates compared with state-of-the-art approaches.
Vibro-Sense: Robust Vibration-based Impulse Response Localization and Trajectory Tracking for Robotic Hands
Rich contact perception is crucial for robotic manipulation, yet traditional tactile skins remain expensive and complex to integrate. This paper presents a scalable alternative: high-accuracy whole-body touch localization via vibro-acoustic sensing. By equipping a robotic hand with seven low-cost piezoelectric microphones and leveraging an Audio Spectrogram Transformer, we decode the vibrational signatures generated during physical interaction. Extensive evaluation across stationary and dynamic tasks reveals a localization error of under 5 mm in static conditions. Furthermore, our analysis highlights the distinct influence of material properties: stiff materials (e.g., metal) excel in impulse response localization due to sharp, high-bandwidth responses, whereas textured materials (e.g., wood) provide superior friction-based features for trajectory tracking. The system demonstrates robustness to the robot's own motion, maintaining effective tracking even during active operation. Our primary contribution is demonstrating that complex physical contact dynamics can be effectively decoded from simple vibrational signals, offering a viable pathway to widespread, affordable contact perception in robotics. To accelerate research, we provide our full datasets, models, and experimental setups as open-source resources.
comment: Under Review: Springer Autonomous Robots Journal
A Practical Framework of Key Performance Indicators for Multi-Robot Lunar and Planetary Field Tests
Robotic prospecting for critical resources on the Moon, such as ilmenite, rare earth elements, and water ice, requires robust exploration methods given the diverse terrain and harsh environmental conditions. Although numerous analog field trials address these goals, comparing their results remains challenging because of differences in robot platforms and experimental setups. These missions typically assess performance using selected, scenario-specific engineering metrics that fail to establish a clear link between field performance and science-driven objectives. In this paper, we address this gap by deriving a structured framework of KPI from three realistic multi-robot lunar scenarios reflecting scientific objectives and operational constraints. Our framework emphasizes scenario-dependent priorities in efficiency, robustness, and precision, and is explicitly designed for practical applicability in field deployments. We validated the framework in a multi-robot field test and found it practical and easy to apply for efficiency- and robustness-related KPI, whereas precision-oriented KPI require reliable ground-truth data that is not always feasible to obtain in outdoor analog environments. Overall, we propose this framework as a common evaluation standard enabling consistent, goal-oriented comparison of multi-robot field trials and supporting systematic development of robotic systems for future planetary exploration.
STORM: Slot-based Task-aware Object-centric Representation for robotic Manipulation
Visual foundation models provide strong perceptual features for robotics, but their dense representations lack explicit object-level structure, limiting robustness and contractility in manipulation tasks. We propose STORM (Slot-based Task-aware Object-centric Representation for robotic Manipulation), a lightweight object-centric adaptation module that augments frozen visual foundation models with a small set of semantic-aware slots for robotic manipulation. Rather than retraining large backbones, STORM employs a multi-phase training strategy: object-centric slots are first stabilized through visual--semantic pretraining using language embeddings, then jointly adapted with a downstream manipulation policy. This staged learning prevents degenerate slot formation and preserves semantic consistency while aligning perception with task objectives. Experiments on object discovery benchmarks and simulated manipulation tasks show that STORM improves generalization to visual distractors, and control performance compared to directly using frozen foundation model features or training object-centric representations end-to-end. Our results highlight multi-phase adaptation as an efficient mechanism for transforming generic foundation model features into task-aware object-centric representations for robotic control.
RF-MatID: Dataset and Benchmark for Radio Frequency Material Identification ICLR 2026
Accurate material identification plays a crucial role in embodied AI systems, enabling a wide range of applications. However, current vision-based solutions are limited by the inherent constraints of optical sensors, while radio-frequency (RF) approaches, which can reveal intrinsic material properties, have received growing attention. Despite this progress, RF-based material identification remains hindered by the lack of large-scale public datasets and the limited benchmarking of learning-based approaches. In this work, we present RF-MatID, the first open-source, large-scale, wide-band, and geometry-diverse RF dataset for fine-grained material identification. RF-MatID includes 16 fine-grained categories grouped into 5 superclasses, spanning a broad frequency range from 4 to 43.5 GHz, and comprises 142k samples in both frequency- and time-domain representations. The dataset systematically incorporates controlled geometry perturbations, including variations in incidence angle and stand-off distance. We further establish a multi-setting, multi-protocol benchmark by evaluating state-of-the-art deep learning models, assessing both in-distribution performance and out-of-distribution robustness under cross-angle and cross-distance shifts. The 5 frequency-allocation protocols enable systematic frequency- and region-level analysis, thereby facilitating real-world deployment. RF-MatID aims to enable reproducible research, accelerate algorithmic advancement, foster cross-domain robustness, and support the development of real-world application in RF-based material identification.
comment: Accepted by ICLR 2026
Demonstration-Free Robotic Control via LLM Agents
Robotic manipulation has increasingly adopted vision-language-action (VLA) models, which achieve strong performance but typically require task-specific demonstrations and fine-tuning, and often generalize poorly under domain shift. We investigate whether general-purpose large language model (LLM) agent frameworks, originally developed for software engineering, can serve as an alternative control paradigm for embodied manipulation. We introduce FAEA (Frontier Agent as Embodied Agent), which applies an LLM agent framework directly to embodied manipulation without modification. Using the same iterative reasoning that enables software agents to debug code, FAEA enables embodied agents to reason through manipulation strategies. We evaluate an unmodified frontier agent, Claude Agent SDK, across the LIBERO, ManiSkill3, and MetaWorld benchmarks. With privileged environment state access, FAEA achieves success rates of 84.9%, 85.7%, and 96%, respectively. This level of task success approaches that of VLA models trained with less than 100 demonstrations per task, without requiring demonstrations or fine-tuning. With one round of human feedback as an optional optimization, performance increases to 88.2% on LIBERO. This demonstration-free capability has immediate practical value: FAEA can autonomously explore novel scenarios in simulation and generate successful trajectories for training data augmentation in embodied learning. Our results indicate that general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning. This opens a path for robotics systems to leverage actively maintained agent infrastructure and benefit directly from ongoing advances in frontier models. Code is available at https://github.com/robiemusketeer/faea-sim
Tactile-Force Alignment in Vision-Language-Action Models for Force-aware Manipulation
Vision-Language-Action (VLA) models have recently emerged as powerful generalists for robotic manipulation. However, due to their predominant reliance on visual modalities, they fundamentally lack the physical intuition required for contact-rich tasks that require precise force regulation and physical reasoning. Existing attempts to incorporate vision-based tactile sensing into VLA models typically treat tactile inputs as auxiliary visual textures, thereby overlooking the underlying correlation between surface deformation and interaction dynamics. To bridge this gap, we propose a paradigm shift from tactile-vision alignment to tactile-force alignment. Here, we introduce TaF-VLA, a framework that explicitly grounds high-dimensional tactile observations in physical interaction forces. To facilitate this, we develop an automated tactile-force data acquisition device and curate the TaF-Dataset, comprising over 10 million synchronized tactile observations, 6-axis force/torque, and matrix force map. To align sequential tactile observations with interaction forces, the central component of our approach is the Tactile-Force Adapter (TaF-Adapter), a tactile sensor encoder that extracts discretized latent information for encoding tactile observations. This mechanism ensures that the learned representations capture history-dependent, noise-insensitive physical dynamics rather than static visual textures. Finally, we integrate this force-aligned encoder into a VLA backbone. Extensive real-world experiments demonstrate that TaF-VLA policy significantly outperforms state-of-the-art tactile-vision-aligned and vision-only baselines on contact-rich tasks, verifying its ability to achieve robust, force-aware manipulation through cross-modal physical reasoning.
comment: 17pages,9fig
Shallow-π: Knowledge Distillation for Flow-based VLAs
The growing demand for real-time robotic deployment necessitates fast and on-device inference for vision-language-action (VLA) models. Within the VLA literature, efficiency has been extensively studied at the token level, such as visual token pruning. In contrast, systematic transformer layer reduction has received limited attention and, to the best of our knowledge, has not been explored for flow-based VLA models under knowledge distillation. In this work, we propose Shallow-pi, a principled knowledge distillation framework that aggressively reduces the transformer depth of both the VLM backbone and the flow-based action head, compressing the model from 18 to 6 layers. Shallow-pi achieves over two times faster inference with less than one percent absolute drop in success rate on standard manipulation benchmarks, establishing state-of-the-art performance among reduced VLA models. Crucially, we validate our approach through industrial-scale real-world experiments on Jetson Orin and Jetson Thor across multiple robot platforms, including humanoid systems, in complex and dynamic manipulation scenarios.
TouchGuide: Inference-Time Steering of Visuomotor Policies via Touch Guidance
Fine-grained and contact-rich manipulation remain challenging for robots, largely due to the underutilization of tactile feedback. To address this, we introduce TouchGuide, a novel cross-policy visuo-tactile fusion paradigm that fuses modalities within a low-dimensional action space. Specifically, TouchGuide operates in two stages to guide a pre-trained diffusion or flow-matching visuomotor policy at inference time. First, the policy produces a coarse, visually-plausible action using only visual inputs during early sampling. Second, a task-specific Contact Physical Model (CPM) provides tactile guidance to steer and refine the action, ensuring it aligns with realistic physical contact conditions. Trained through contrastive learning on limited expert demonstrations, the CPM provides a tactile-informed feasibility score to steer the sampling process toward refined actions that satisfy physical contact constraints. Furthermore, to facilitate TouchGuide training with high-quality and cost-effective data, we introduce TacUMI, a data collection system. TacUMI achieves a favorable trade-off between precision and affordability; by leveraging rigid fingertips, it obtains direct tactile feedback, thereby enabling the collection of reliable tactile data. Extensive experiments on five challenging contact-rich tasks, such as shoe lacing and chip handover, show that TouchGuide consistently and significantly outperforms state-of-the-art visuo-tactile policies.
TRACER: Texture-Robust Affordance Chain-of-Thought for Deformable-Object Refinement
The central challenge in robotic manipulation of deformable objects lies in aligning high-level semantic instructions with physical interaction points under complex appearance and texture variations. Due to near-infinite degrees of freedom, complex dynamics, and heterogeneous patterns, existing vision-based affordance prediction methods often suffer from boundary overflow and fragmented functional regions. To address these issues, we propose TRACER, a Texture-Robust Affordance Chain-of-thought with dEformable-object Refinement framework, which establishes a cross-hierarchical mapping from hierarchical semantic reasoning to appearance-robust and physically consistent functional region refinement. Specifically, a Tree-structured Affordance Chain-of-Thought (TA-CoT) is formulated to decompose high-level task intentions into hierarchical sub-task semantics, providing consistent guidance across various execution stages. To ensure spatial integrity, a Spatial-Constrained Boundary Refinement (SCBR) mechanism is introduced to suppress prediction spillover, guiding the perceptual response to converge toward authentic interaction manifolds. Furthermore, an Interactive Convergence Refinement Flow (ICRF) is developed to aggregate discrete pixels corrupted by appearance noise, significantly enhancing the spatial continuity and physical plausibility of the identified functional regions. Extensive experiments conducted on the Fine-AGDDO15 dataset and a real-world robotic platform demonstrate that TRACER significantly improves affordance grounding precision across diverse textures and patterns inherent to deformable objects. More importantly, it enhances the success rate of long-horizon tasks, effectively bridging the gap between high-level semantic reasoning and low-level physical execution. The source code and dataset will be made publicly available at https://github.com/Dikay1/TRACER.
comment: The source code and dataset will be made publicly available at https://github.com/Dikay1/TRACER
A Taylor Series Approach to Correct Localization Errors in Robotic Field Mapping using Gaussian Processes
Gaussian Processes (GPs) are powerful non-parametric Bayesian models for regression of scalar fields, formulated under the assumption that measurement locations are perfectly known and the corresponding field measurements have Gaussian noise. However, many real-world scalar field mapping applications rely on sensor-equipped mobile robots to collect field measurements, where imperfect localization introduces state uncertainty. Such discrepancies between the estimated and true measurement locations degrade GP mean and covariance estimates. To address this challenge, we propose a method for updating the GP models when improved estimates become available. Leveraging the differentiability of the kernel function, a second-order correction algorithm is developed using the precomputed Jacobians and Hessians of the GP mean and covariance functions for real-time refinement based on measurement location discrepancy data. Simulation results demonstrate improved prediction accuracy and computational efficiency compared to full model retraining.
AI-Augmented Density-Driven Optimal Control (D2OC) for Decentralized Environmental Mapping
This paper presents an AI-augmented decentralized framework for multi-agent (multi-robot) environmental mapping under limited sensing and communication. While conventional coverage formulations achieve effective spatial allocation when an accurate reference map is available, their performance deteriorates under uncertain or biased priors. The proposed method introduces an adaptive and self-correcting mechanism that enables agents to iteratively refine local density estimates within an optimal transport-based framework, ensuring theoretical consistency and scalability. A dual multilayer perceptron (MLP) module enhances adaptivity by inferring local mean-variance statistics and regulating virtual uncertainty for long-unvisited regions, mitigating stagnation around local minima. Theoretical analysis rigorously proves convergence under the Wasserstein metric, while simulation results demonstrate that the proposed AI-augmented Density-Driven Optimal Control consistently achieves robust and precise alignment with the ground-truth density, yielding substantially higher-fidelity reconstruction of complex multi-modal spatial distributions compared with conventional decentralized baselines.
Multi-Robot Decentralized Collaborative SLAM in Planetary Analogue Environments: Dataset, Challenges, and Lessons Learned
Decentralized collaborative simultaneous localization and mapping (C-SLAM) is essential to enable multirobot missions in unknown environments without relying on preexisting localization and communication infrastructure. This technology is anticipated to play a key role in the exploration of the Moon, Mars, and other planets. In this article, we share insights and lessons learned from C-SLAM experiments involving three robots operating on a Mars analogue terrain and communicating over an ad hoc network. We examine the impact of limited and intermittent communication on C-SLAM performance, as well as the unique localization challenges posed by planetary-like environments. Additionally, we introduce a novel dataset collected during our experiments, which includes real-time peer-to-peer inter-robot throughput and latency measurements. This dataset aims to support future research on communication-constrained, decentralized multirobot operations.
Track-centric Iterative Learning for Global Trajectory Optimization in Autonomous Racing
This paper presents a global trajectory optimization framework for minimizing lap time in autonomous racing under uncertain vehicle dynamics. Optimizing the trajectory over the full racing horizon is computationally expensive, and tracking such a trajectory in the real world hardly assures global optimality due to uncertain dynamics. Yet, existing work mostly focuses on dynamics learning at the tracking level, without updating the trajectory itself to account for the learned dynamics. To address these challenges, we propose a track-centric approach that directly learns and optimizes the full-horizon trajectory. We first represent trajectories through a track-agnostic parametric space in light of the wavelet transform. This space is then efficiently explored using Bayesian optimization, where the lap time of each candidate is evaluated by running simulations with the learned dynamics. This optimization is embedded in an iterative learning framework, where the optimized trajectory is deployed to collect real-world data for updating the dynamics, progressively refining the trajectory over the iterations. The effectiveness of the proposed framework is validated through simulations and real-world experiments, demonstrating lap time improvement of up to 20.7% over a nominal baseline and consistently outperforming state-of-the-art methods.
Meta-ROS: A Next-Generation Middleware Architecture for Adaptive and Scalable Robotic Systems
The field of robotics faces significant challenges related to the complexity and interoperability of existing middleware frameworks, like ROS2, which can be difficult for new developers to adopt. To address these issues, we propose Meta-ROS, a novel middleware solution designed to streamline robotics development by simplifying integration, enhancing performance, and ensuring cross-platform compatibility. Meta-ROS leverages modern communication protocols, such as Zenoh and ZeroMQ, to enable efficient and low-latency communication across diverse hardware platforms, while also supporting various data types like audio, images, and video. We evaluated Meta-ROS's performance through comprehensive testing, comparing it with existing middleware frameworks like ROS1 and ROS2. The results demonstrated that Meta-ROS outperforms ROS2, achieving up to 30% higher throughput, significantly reducing message latency, and optimizing resource usage. Additionally, its robust hardware support and developer-centric design facilitate seamless integration and ease of use, positioning Meta-ROS as an ideal solution for modern, real-time robotics AI applications.
comment: Checkout the Python Library - https://pypi.org/project/metaros/ To be Submitted in ACM Transactions on Autonomous and Adaptive Systems (TAAS) Journal
Quick Heuristic Validation of Edges in Dynamic Roadmap Graphs
In this paper we tackle the problem of adjusting roadmap graphs for robot motion planning to non-static environments. We introduce the "Red-Green-Gray" paradigm, a modification of the SPITE method, capable of classifying the validity status of nodes and edges using cheap heuristic checks, allowing fast semi-lazy roadmap updates. Given a roadmap, we use simple computational geometry methods to approximate the swept volumes of robots and perform lazy collision checks, and label a subset of the edges as invalid (red), valid (green), or unknown (gray). We present preliminary experimental results comparing our method to the well-established technique of Leven and Hutchinson, and showing increased accuracy as well as the ability to correctly label edges as invalid while maintaining comparable update runtimes.
Discrete Variational Autoencoding via Policy Search
Discrete latent bottlenecks in variational autoencoders (VAEs) offer high bit efficiency and can be modeled with autoregressive discrete distributions, enabling parameter-efficient multimodal search with transformers. However, discrete random variables do not allow for exact differentiable parameterization; therefore, discrete VAEs typically rely on approximations, such as Gumbel-Softmax reparameterization or straight-through gradient estimates, or employ high-variance gradient-free methods such as REINFORCE that have had limited success on high-dimensional tasks such as image reconstruction. Inspired by popular techniques in policy search, we propose a training framework for discrete VAEs that leverages the natural gradient of a non-parametric encoder to update the parametric encoder without requiring reparameterization. Our method, combined with automatic step size adaptation and a transformer-based encoder, scales to challenging datasets such as ImageNet and outperforms both approximate reparameterization methods and quantization-based discrete autoencoders in reconstructing high-dimensional data from compact latent spaces.
FLOL: Fast Baselines for Real-World Low-Light Enhancement
Low-Light Image Enhancement (LLIE) is a key task in computational photography and imaging. The problem of enhancing images captured during night or in dark environments has been well-studied in the computer vision literature. However, current deep learning-based solutions struggle with efficiency and robustness for real-world scenarios (e.g., scenes with noise, saturated pixels). We propose a lightweight neural network that combines image processing in the frequency and spatial domains. Our baseline method, FLOL, is one of the fastest models for this task, achieving results comparable to the state-of-the-art on popular real-world benchmarks such as LOLv2, LSRW, MIT-5K and UHD-LL. Moreover, we are able to process 1080p images in real-time under 12ms. Code and models at https://github.com/cidautai/FLOL
comment: Journal Preprint
Fusion of Visual-Inertial Odometry with LiDAR Relative Localization for Cooperative Guidance of a Micro-Scale Aerial Vehicle
A novel relative localization approach for guidance of a micro-scale Unmanned Aerial Vehicle (UAV) by a well-equipped aerial robot fusing Visual-Inertial Odometry (VIO) with Light Detection and Ranging (LiDAR) is proposed in this paper. LiDAR-based localization is accurate and robust to challenging environmental conditions, but 3D LiDARs are relatively heavy and require large UAV platforms, in contrast to lightweight cameras. However, visual-based self-localization methods exhibit lower accuracy and can suffer from significant drift with respect to the global reference frame. To benefit from both sensory modalities, we focus on cooperative navigation in a heterogeneous team of a primary LiDAR-equipped UAV and a secondary micro-scale camera-equipped UAV. We propose a novel cooperative approach combining LiDAR relative localization data with VIO output on board the primary UAV to obtain an accurate pose of the secondary UAV. The pose estimate is used to precisely and reliably guide the secondary UAV along trajectories defined in the primary UAV reference frame. The experimental evaluation has shown the superior accuracy of our method to the raw VIO output, reaching the average 3D Absolute Trajectory Error (ATE) of 0.28 m, and demonstrated its capability to guide the secondary UAV along desired trajectories while mitigating VIO drift. Thus, such a heterogeneous system can explore large areas with LiDAR precision, as well as visit locations inaccessible to the large LiDAR-carrying UAV platforms, as was showcased in a real-world cooperative mapping scenario.
comment: Preprint version. This work has been submitted to the IEEE for possible publication
Progressive-Resolution Policy Distillation: Leveraging Coarse-Resolution Simulations for Time-Efficient Fine-Resolution Policy Learning
In earthwork and construction, excavators often encounter large rocks mixed with various soil conditions, requiring skilled operators. This paper presents a framework for achieving autonomous excavation using reinforcement learning (RL) through a rock excavation simulator. In the simulation, resolution can be defined by the particle size/number in the whole soil space. Fine-resolution simulations closely mimic real-world behavior but demand significant calculation time and challenging sample collection, while coarse-resolution simulations enable faster sample collection but deviate from real-world behavior. To combine the advantages of both resolutions, we explore using policies developed in coarse-resolution simulations for pre-training in fine-resolution simulations. To this end, we propose a novel policy learning framework called Progressive-Resolution Policy Distillation (PRPD), which progressively transfers policies through some middle-resolution simulations with conservative policy transfer to avoid domain gaps that could lead to policy transfer failure. Validation in a rock excavation simulator and nine real-world rock environments demonstrated that PRPD reduced sampling time to less than 1/7 while maintaining task success rates comparable to those achieved through policy learning in a fine-resolution simulation. Additional videos and supplementary results are available on our project page: https://yuki-kadokawa.github.io/prpd/
comment: accepted for IEEE Transactions on Automation Science and Engineering (T-ASE)
Listen, Look, Drive: Coupling Audio Instructions for User-aware VLA-based Autonomous Driving
Vision Language Action (VLA) models promise an open-vocabulary interface that can translate perceptual ambiguity into semantically grounded driving decisions, yet they still treat language as a static prior fixed at inference time. As a result, the model must infer continuously shifting objectives from pixels alone, yielding delayed or overly conservative maneuvers. We argue that effective VLAs for autonomous driving need an online channel in which users can influence driving with specific intentions. To this end, we present EchoVLA, a user-aware VLA that couples camera streams with in situ audio instructions. We augment the nuScenes dataset with temporally aligned, intent-specific speech commands generated by converting ego-motion descriptions into synthetic audios. Further, we compose emotional speech-trajectory pairs into a multimodal Chain-of-Thought (CoT) for fine-tuning a Multimodal Large Model (MLM) based on Qwen2.5-Omni. Specifically, we synthesize the audio-augmented dataset with different emotion types paired with corresponding driving behaviors, leveraging the emotional cues embedded in tone, pitch, and speech tempo to reflect varying user states, such as urgent or hesitant intentions, thus enabling our EchoVLA to interpret not only the semantic content but also the emotional context of audio commands for more nuanced and emotionally adaptive driving behavior. In open-loop benchmarks, our approach reduces the average L2 error by $59.4\%$ and the collision rate by $74.4\%$ compared to the baseline of vision-only perception. More experiments on nuScenes dataset validate that EchoVLA not only steers the trajectory through audio instructions, but also modulates driving behavior in response to the emotions detected in the user's speech.
comment: Accepted by IV
Legged Robot State Estimation Using Invariant Neural-Augmented Kalman Filter with a Neural Compensator IROS 2025
This paper presents an algorithm to improve state estimation for legged robots. Among existing model-based state estimation methods for legged robots, the contact-aided invariant extended Kalman filter defines the state on a Lie group to preserve invariance, thereby significantly accelerating convergence. It achieves more accurate state estimation by leveraging contact information as measurements for the update step. However, when the model exhibits strong nonlinearity, the estimation accuracy decreases. Such nonlinearities can cause initial errors to accumulate and lead to large drifts over time. To address this issue, we propose compensating for errors by augmenting the Kalman filter with an artificial neural network serving as a nonlinear function approximator. Furthermore, we design this neural network to respect the Lie group structure to ensure invariance, resulting in our proposed Invariant Neural-Augmented Kalman Filter (InNKF). The proposed algorithm offers improved state estimation performance by combining the strengths of model-based and learning-based approaches. Project webpage: https://seokju-lee.github.io/innkf_webpage
comment: 8 pages, 10 figures, Accepted to IROS 2025
SG-CADVLM: A Context-Aware Decoding Powered Vision Language Model for Safety-Critical Scenario Generation
Autonomous vehicle safety validation requires testing on safety-critical scenarios, but these events are rare in real-world driving and costly to test due to collision risks. Crash reports provide authentic specifications of safety-critical events, offering a vital alternative to scarce real-world collision trajectory data. This makes them valuable sources for generating realistic high-risk scenarios through simulation. Existing approaches face significant limitations because data-driven methods lack diversity due to their reliance on existing latent distributions, whereas adversarial methods often produce unrealistic scenarios lacking physical fidelity. Large Language Model (LLM) and Vision Language Model (VLM)-based methods show significant promise. However, they suffer from context suppression issues where internal parametric knowledge overrides crash specifications, producing scenarios that deviate from actual accident characteristics. This paper presents SG-CADVLM (A Context-Aware Decoding Powered Vision Language Model for Safety-Critical Scenario Generation), a framework that integrates Context-Aware Decoding with multi-modal input processing to generate safety-critical scenarios from crash reports and road network diagrams. The framework mitigates VLM hallucination issues while enabling the simultaneous generation of road geometry and vehicle trajectories. The experimental results demonstrate that SG-CADVLM generates critical risk scenarios at a rate of 84.4% compared to 12.5% for the baseline methods, representing an improvement of 469%, while producing executable simulations for autonomous vehicle testing.
OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning ICLR 2026
Recent advances in multimodal large language models (MLLMs) have opened new opportunities for embodied intelligence, enabling multimodal understanding, reasoning, and interaction, as well as continuous spatial decision-making. Nevertheless, current MLLM-based embodied systems face two critical limitations. First, Geometric Adaptability Gap: models trained solely on 2D inputs or with hard-coded 3D geometry injection suffer from either insufficient spatial information or restricted 2D generalization, leading to poor adaptability across tasks with diverse spatial demands. Second, Embodiment Constraint Gap: prior work often neglects the physical constraints and capacities of real robots, resulting in task plans that are theoretically valid but practically infeasible. To address these gaps, we introduce OmniEVA -- an embodied versatile planner that enables advanced embodied reasoning and task planning through two pivotal innovations: (1) a Task-Adaptive 3D Grounding mechanism, which introduces a gated router to perform explicit selective regulation of 3D fusion based on contextual requirements, enabling context-aware 3D grounding for diverse embodied tasks. (2) an Embodiment-Aware Reasoning framework that jointly incorporates task goals and embodiment constraints into the reasoning loop, resulting in planning decisions that are both goal-directed and executable. Extensive experimental results demonstrate that OmniEVA not only achieves state-of-the-art general embodied reasoning performance, but also exhibits a strong ability across a wide range of downstream scenarios. Evaluations of a suite of proposed embodied benchmarks, including both primitive and composite tasks, confirm its robust and versatile planning capabilities. Project page: https://omnieva.github.io
comment: Published as a conference paper at ICLR 2026
Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review
Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interaction, mobile service robots can achieve more flexible understanding, adaptive behavior, and robust task execution in dynamic real-world environments. Despite this progress, embodied AI for mobile service robots continues to face fundamental challenges related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment. In this paper, we present the first systematic review focused specifically on the integration of foundation models in mobile service robotics. We analyze how recent advances in foundation models address these core challenges through language-conditioned control, multimodal sensor fusion, uncertainty-aware reasoning, and efficient model scaling. We further examine real-world applications in domestic assistance, healthcare, and service automation, highlighting how foundation models enable context-aware, socially responsive, and generalizable robot behaviors. Beyond technical considerations, we discuss ethical, societal, and human-interaction implications associated with deploying foundation model-enabled service robots in human environments. Finally, we outline future research directions emphasizing reliability and lifelong adaptation, privacy-aware and resource-constrained deployment, and governance and human-in-the-loop frameworks required for safe, scalable, and trustworthy mobile service robotics.
comment: v2: Expanded systematic review; resubmitted to Robotics
MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption
Vision-Language-Action (VLA) models show promise in embodied reasoning, yet remain far from true generalists-they often require task-specific fine-tuning, incur high compute costs, and generalize poorly to unseen tasks. We propose MetaVLA, a unified, backbone-agnostic post-training framework for efficient and scalable alignment. MetaVLA introduces Context-Aware Meta Co-Training, which consolidates diverse target tasks into a single fine-tuning stage while leveraging structurally diverse auxiliary tasks to improve in-domain generalization. Unlike naive multi-task SFT, MetaVLA integrates a lightweight meta-learning mechanism-derived from Attentive Neural Processes-to enable rapid adaptation from diverse contexts with minimal architectural change or inference overhead. On the LIBERO benchmark, MetaVLA with six auxiliary tasks outperforms OpenVLA by up to 8.0% on long-horizon tasks, reduces training steps from 240K to 75K, and cuts GPU time by ~76%. These results show that scalable, low-resource post-training is achievable-paving the way toward general-purpose embodied agents. Code will be available.
Judgelight: Trajectory-Level Post-Optimization for Multi-Agent Path Finding via Closed-Subwalk Collapsing
Multi-Agent Path Finding (MAPF) is an NP-hard problem with applications in warehouse automation and multi-robot coordination. Learning-based MAPF solvers offer fast and scalable planning but often produce feasible trajectories that contain unnecessary or oscillatory movements. We propose Judgelight, a post-optimization layer that improves trajectory quality after a MAPF solver generates a feasible schedule. Judgelight collapses closed subwalks in agents' trajectories to remove redundant movements while preserving all feasibility constraints. We formalize this process as MAPF-Collapse, prove that it is NP-hard, and present an exact optimization approach by formulating it as integer linear programming (ILP) problem. Experimental results show Judgelight consistently reduces solution cost by around 20%, particularly for learning-based solvers, producing trajectories that are better suited for real-world deployment.
Transport and Delivery of Objects with a Soft Everting Robot
Soft everting robots present significant advantages over traditional rigid robots, including enhanced dexterity, improved environmental interaction, and safe navigation in unpredictable environments. While soft everting robots have been widely demonstrated for exploration type tasks, their potential to move and deploy payloads in such tasks has been less investigated, with previous work focusing on sensors and tools for the robot. Leveraging the navigation capabilities, and deployed body, of the soft everting robot to deliver payloads in hazardous areas, e.g. carrying a water bottle to a person stuck under debris, would represent a significant capability in many applications. In this work, we present an analysis of how soft everting robots can be used to deploy larger, heavier payloads through the inside of the robot. We analyze both what objects can be deployed and what terrain features they can be carried through. Building on existing models, we present methods to quantify the effects of payloads on robot growth and self-support, and develop a model to predict payload slip. We then experimentally quantify payload transport using soft everting robot with a variety of payload shapes, sizes, and weights and though a series of tasks: steering, vertical transport, movement through holes, and movement across gaps. Overall, the results show that we can transport payloads in a variety of shapes and up to 1.5kg in weight and that we can move through circular apertures with as little as 0.01cm clearance around payloads, carry out discrete turns up to 135 degrees, and move across unsupported gaps of 1.15m in length.
comment: 8 pages, 11 figures, Published in IEEE Robotics and Automation Letters Link to publication: https://ieeexplore.ieee.org/abstract/document/11359009 Citation: E. M. DeVries, J. Ferlazzo, M. Ugur and L. H. Blumenschein, "Transport and Delivery of Objects With a Soft Everting Robot," in IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2935-2942, March 2026, doi: 10.1109/LRA.2026.3655537
Multiagent Systems
Agentic Fog: A Policy-driven Framework for Distributed Intelligence in Fog Computing
Fog and edge computing require adaptive control schemes that can handle partial observability, severe latency requirements, and dynamically changing workloads. Recent research on Agentic AI (AAI) increasingly integrates reasoning systems powered by Large Language Models; however, these tools are not applicable to infrastructure-level systems due to their high computational cost, stochastic nature, and poor formal analyzability. In this paper, a generic model, Agentic Fog (AF), is presented, in which fog nodes are represented as policy-driven autonomous agents that communicate via p2p interactions based on shared memory and localized coordination. The suggested architecture decomposes a system's goals into abstract policy guidance and formalizes decentralized fog coordination as an exact potential game. The framework is guaranteed to converge and remain stable under asynchronous updates, bounded-rational best-response dynamics, and node failures. Simulations demonstrate that the AF system achieves lower average latency and adapts more efficiently to varying demand than greedy heuristics and integer linear programming under dynamic conditions. The sensitivity analysis also demonstrates the capability to perform optimally under different memory and coordination conditions.
comment: 09 Pages
A Dialectic Pipeline for Improving LLM Robustness
Assessing ways in which Language Models can reduce their hallucinations and improve the outputs' quality is crucial to ensure their large-scale use. However, methods such as fine-tuning on domain-specific data or the training of a separate \textit{ad hoc} verifier require demanding computational resources (not feasible for many user applications) and constrain the models to specific fields of knowledge. In this thesis, we propose a dialectic pipeline that preserves LLMs' generalization abilities while improving the quality of its answer via self-dialogue, enabling it to reflect upon and correct tentative wrong answers. We experimented with different pipeline settings, testing our proposed method on different datasets and on different families of models. All the pipeline stages are enriched with the relevant context (in an oracle-RAG setting) and a study on the impact of its summarization or its filtering is conducted. We find that our proposed dialectic pipeline is able to outperform by significative margins the standard model answers and that it consistently achieves higher performances than Chain-of-Thought only prompting.
Interpreting Emergent Extreme Events in Multi-Agent Systems
Large language model-powered multi-agent systems have emerged as powerful tools for simulating complex human-like systems. The interactions within these systems often lead to extreme events whose origins remain obscured by the black box of emergence. Interpreting these events is critical for system safety. This paper proposes the first framework for explaining emergent extreme events in multi-agent systems, aiming to answer three fundamental questions: When does the event originate? Who drives it? And what behaviors contribute to it? Specifically, we adapt the Shapley value to faithfully attribute the occurrence of extreme events to each action taken by agents at different time steps, i.e., assigning an attribution score to the action to measure its influence on the event. We then aggregate the attribution scores along the dimensions of time, agent, and behavior to quantify the risk contribution of each dimension. Finally, we design a set of metrics based on these contribution scores to characterize the features of extreme events. Experiments across diverse multi-agent system scenarios (economic, financial, and social) demonstrate the effectiveness of our framework and provide general insights into the emergence of extreme phenomena.
comment: 8 pages, 5 figures
A Practical Framework of Key Performance Indicators for Multi-Robot Lunar and Planetary Field Tests
Robotic prospecting for critical resources on the Moon, such as ilmenite, rare earth elements, and water ice, requires robust exploration methods given the diverse terrain and harsh environmental conditions. Although numerous analog field trials address these goals, comparing their results remains challenging because of differences in robot platforms and experimental setups. These missions typically assess performance using selected, scenario-specific engineering metrics that fail to establish a clear link between field performance and science-driven objectives. In this paper, we address this gap by deriving a structured framework of KPI from three realistic multi-robot lunar scenarios reflecting scientific objectives and operational constraints. Our framework emphasizes scenario-dependent priorities in efficiency, robustness, and precision, and is explicitly designed for practical applicability in field deployments. We validated the framework in a multi-robot field test and found it practical and easy to apply for efficiency- and robustness-related KPI, whereas precision-oriented KPI require reliable ground-truth data that is not always feasible to obtain in outdoor analog environments. Overall, we propose this framework as a common evaluation standard enabling consistent, goal-oriented comparison of multi-robot field trials and supporting systematic development of robotic systems for future planetary exploration.
AI-Augmented Density-Driven Optimal Control (D2OC) for Decentralized Environmental Mapping
This paper presents an AI-augmented decentralized framework for multi-agent (multi-robot) environmental mapping under limited sensing and communication. While conventional coverage formulations achieve effective spatial allocation when an accurate reference map is available, their performance deteriorates under uncertain or biased priors. The proposed method introduces an adaptive and self-correcting mechanism that enables agents to iteratively refine local density estimates within an optimal transport-based framework, ensuring theoretical consistency and scalability. A dual multilayer perceptron (MLP) module enhances adaptivity by inferring local mean-variance statistics and regulating virtual uncertainty for long-unvisited regions, mitigating stagnation around local minima. Theoretical analysis rigorously proves convergence under the Wasserstein metric, while simulation results demonstrate that the proposed AI-augmented Density-Driven Optimal Control consistently achieves robust and precise alignment with the ground-truth density, yielding substantially higher-fidelity reconstruction of complex multi-modal spatial distributions compared with conventional decentralized baselines.
Planner-Auditor Twin: Agentic Discharge Planning with FHIR-Based LLM Planning, Guideline Recall, Optional Caching and Self-Improvement
Objective: Large language models (LLMs) show promise for clinical discharge planning, but their use is constrained by hallucination, omissions, and miscalibrated confidence. We introduce a self-improving, cache-optional Planner-Auditor framework that improves safety and reliability by decoupling generation from deterministic validation and targeted replay. Materials and Methods: We implemented an agentic, retrospective, FHIR-native evaluation pipeline using MIMIC-IV-on-FHIR. For each patient, the Planner (LLM) generates a structured discharge action plan with an explicit confidence estimate. The Auditor is a deterministic module that evaluates multi-task coverage, tracks calibration (Brier score, ECE proxies), and monitors action-distribution drift. The framework supports two-tier self-improvement: (i) within-episode regeneration when enabled, and (ii) cross-episode discrepancy buffering with replay for high-confidence, low-coverage cases. Results: While context caching improved performance over baseline, the self-improvement loop was the primary driver of gains, increasing task coverage from 32% to 86%. Calibration improved substantially, with reduced Brier/ECE and fewer high-confidence misses. Discrepancy buffering further corrected persistent high-confidence omissions during replay. Discussion: Feedback-driven regeneration and targeted replay act as effective control mechanisms to reduce omissions and improve confidence reliability in structured clinical planning. Separating an LLM Planner from a rule-based, observational Auditor enables systematic reliability measurement and safer iteration without model retraining. Conclusion: The Planner-Auditor framework offers a practical pathway toward safer automated discharge planning using interoperable FHIR data access and deterministic auditing, supported by reproducible ablations and reliability-focused evaluation.
Meta-ROS: A Next-Generation Middleware Architecture for Adaptive and Scalable Robotic Systems
The field of robotics faces significant challenges related to the complexity and interoperability of existing middleware frameworks, like ROS2, which can be difficult for new developers to adopt. To address these issues, we propose Meta-ROS, a novel middleware solution designed to streamline robotics development by simplifying integration, enhancing performance, and ensuring cross-platform compatibility. Meta-ROS leverages modern communication protocols, such as Zenoh and ZeroMQ, to enable efficient and low-latency communication across diverse hardware platforms, while also supporting various data types like audio, images, and video. We evaluated Meta-ROS's performance through comprehensive testing, comparing it with existing middleware frameworks like ROS1 and ROS2. The results demonstrated that Meta-ROS outperforms ROS2, achieving up to 30% higher throughput, significantly reducing message latency, and optimizing resource usage. Additionally, its robust hardware support and developer-centric design facilitate seamless integration and ease of use, positioning Meta-ROS as an ideal solution for modern, real-time robotics AI applications.
comment: Checkout the Python Library - https://pypi.org/project/metaros/ To be Submitted in ACM Transactions on Autonomous and Adaptive Systems (TAAS) Journal
Micro-mobility dispatch optimization via quantum annealing incorporating historical data
This paper proposes a novel dispatch formulation for micro-mobility vehicles using a Quantum Annealer (QA). In recent years, QA has gained increasing attention as a high-performance solver for combinatorial optimization problems. Meanwhile, micro-mobility services have been rapidly developed as a promising means of realizing efficient and sustainable urban transportation. In this study, the dispatch problem for such micro-mobility services is formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem, enabling efficient solving through QA. Furthermore, the proposed formulation incorporates historical usage data to enhance operational efficiency. Specifically, customer arrival frequencies and destination distributions are modeled into the QUBO formulation through a Bayesian approach, which guides the allocation of vacant vehicles to designated stations for waiting and charging. Simulation experiments are conducted to evaluate the effectiveness of the proposed method, with comparisons to conventional formulations such as the vehicle routing problem. Additionally, the performance of QA is compared with that of classical solvers to reveal its potential advantages for the proposed dispatch formulation. The effect of reverse annealing on improving solution quality is also investigated.
comment: 15 pages, 7 figures, 1 table
Tacit Coordination of Large Language Models
In tacit coordination games with multiple outcomes, purely rational solution concepts, such as Nash equilibria, provide no guidance for which equilibrium to choose. Shelling's theory explains how, in these settings, humans coordinate by relying on focal points: solutions or outcomes that naturally arise because they stand out in some way as salient or prominent to all players. This work studies Large Language Models (LLMs) as players in tacit coordination games, and addresses how, when, and why focal points emerge. We compare and quantify the coordination capabilities of LLMs in cooperative and competitive games for which human experiments are available. We also introduce several learning-free strategies to improve the coordination of LLMs, with themselves and with humans. On a selection of heterogeneous open-source models, including Llama, Qwen, and GPT-oss, we discover that LLMs have a remarkable capability to coordinate and often outperform humans, yet fail on common-sense coordination that involves numbers or nuanced cultural archetypes. This paper constitutes the first large-scale assessment of LLMs' tacit coordination within the theoretical and psychological framework of focal points.
comment: Code: https://github.com/EmanueleLM/focal-points
Types for Grassroots Logic Programs
Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer and consumed at most once via its paired reader, and may contain additional readers and/or writers. This enables the concise expression of rich multidirectional communication modalities. ``Logic Programs as Types for Logic Programs'' (LICS'91) defined types as regular sets of paths over derivable ground atoms. Here, we define types to be regular sets of moded paths, where a mode captures directionality of communication -- whether a subterm is consumed from or produced to the environment -- enabling the typing of interactive partial computations including those that eventually deadlock or fail, or never terminate. We provide a syntactic definition of well-typing and prove that a program is well-typed iff the path abstraction of its moded-atom semantics satisfies covariance and contravariance conditions with respect to its type. The GLP type system was implemented in Dart by AI, starting from a mathematical specification of Typed GLP (this paper), deriving from it an English spec (written by AI), and from the spec deriving Dart code (by AI). While GLP is naturally untyped, the motivation for Typed GLP comes from programming with AI: Asking AI to program complex communication modalities in GLP (and in general) and hoping for the best is a tenuous strategy. The emerging discipline we advocate and employ is for the human designer and AI to jointly develop and agree upon (1)~GLP types; (2)~GLP procedure type declarations; (3)~informal (English) descriptions of the procedures; and only then let AI attempt to write (4)~GLP code based on those.
Modelleme ve Simulasyon
Computer modeling and simulation is used to analyze system behavior and evaluate strategies for operating in descriptive or predictive modes. In this part of the book, modeling and simulation approaches that have been proposed since the 1970s have been tried to be presented. Simulation models used in social sciences, risk management and cloud-based information systems are tried to be summarized, and information about agent-based modeling and simulation approach is given.
comment: in Turkish language. Published journal version available at the publisher website
Hierarchical Multi-Agent DRL Based Dynamic Cluster Reconfiguration for UAV Mobility Management
Multi-connectivity involves dynamic cluster formation among distributed access points (APs) and coordinated resource allocation from these APs, highlighting the need for efficient mobility management strategies for users with multi-connectivity. In this paper, we propose a novel mobility management scheme for unmanned aerial vehicles (UAVs) that uses dynamic cluster reconfiguration with energy-efficient power allocation in a wireless interference network. Our objective encompasses meeting stringent reliability demands, minimizing joint power consumption, and reducing the frequency of cluster reconfiguration. To achieve these objectives, we propose a hierarchical multi-agent deep reinforcement learning (H-MADRL) framework, specifically tailored for dynamic clustering and power allocation. The edge cloud connected with a set of APs through low latency optical back-haul links hosts the high-level agent responsible for the optimal clustering policy, while low-level agents reside in the APs and are responsible for the power allocation policy. To further improve the learning efficiency, we propose a novel action-observation transition-driven learning algorithm that allows the low-level agents to use the action space from the high-level agent as part of the local observation space. This allows the lower-level agents to share partial information about the clustering policy and allocate the power more efficiently. The simulation results demonstrate that our proposed distributed algorithm achieves comparable performance to the centralized algorithm. Additionally, it offers better scalability, as the decision time for clustering and power allocation increases by only 10% when doubling the number of APs, compared to a 90% increase observed with the centralized approach.
comment: 15 pages, 7 figures
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
Multi-Agent System (MAS) powered by Visual Language Models (VLMs) enables challenging tasks but suffers from a novel failure term, multi-agent visual hallucination snowballing, where hallucinations are seeded in a single agent and amplified by following ones due to the over-reliance on textual flow to relay visual information. Through turn-, layer-, and token-wise attention analyses, we provide detailed insights into the essence of hallucination snowballing regarding the reduction of visual attention allocation. It leads us to identify a subset of vision tokens with a unimodal attention peak in middle layers that best preserve visual evidence but gradually diminish in deeper agent turns, resulting in the visual hallucination snowballing in MAS. Thus, we propose ViF, a lightweight, plug-and-play mitigation paradigm that relays inter-agent messages with Visual Flow powered by the selected visual relay tokens and applies attention reallocation to amplify this pattern. The experiment results demonstrate that our method markedly reduces hallucination snowballing, consistently improving the performance across eight benchmarks based on four common MAS structures and ten base models. The source code is publicly available at: https://github.com/YU-deep/ViF.git.
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce \textbf{AdaReasoner}, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these components allow models to infer tool utility from task context and intermediate outcomes, enabling coordination of multiple tools and generalization to unseen tools. Empirically, AdaReasoner exhibits strong tool-adaptive and generalization behaviors: it autonomously adopts beneficial tools, suppresses irrelevant ones, and adjusts tool usage frequency based on task demands, despite never being explicitly trained to do so. These capabilities translate into state-of-the-art performance across challenging benchmarks, improving the 7B base model by +24.9\% on average and surpassing strong proprietary systems such as GPT-5 on multiple tasks, including VSP and Jigsaw.
comment: 28 pages, 10 figures and 13 tables
The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models
Recent advances in large language models (LLMs) have been largely driven by scaling laws for individual models, which predict performance improvements as model parameters and data volume increase. However, the capabilities of any single LLM are inherently bounded. One solution originates from intricate interactions among multiple LLMs, rendering their collective performance surpasses that of any constituent model. Despite the rapid proliferation of multi-model integration techniques such as model routing and post-hoc ensembling, a unifying theoretical framework of performance scaling for multi-model collaboration remains absent. In this work, we propose the Law of Multi-model Collaboration, a scaling law that predicts the performance limits of LLM ensembles based on their aggregated parameter budget. To quantify the intrinsic upper bound of multi-model collaboration, we adopt a method-agnostic formulation and assume an idealized integration oracle where the total cross-entropy loss of each sample is determined by the minimum loss of any model in the model pool. Experimental results reveal that multi-model systems follow a power-law scaling with respect to the total parameter count, exhibiting a more significant improvement trend and a lower theoretical loss floor compared to single model scaling. Moreover, ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family, indicating that model diversity is a primary driver of collaboration gains. These findings suggest that model collaboration represents a critical axis for extending the intelligence frontier of LLMs.
LLM Multi-Agent Systems: Challenges and Open Problems
This paper explores multi-agent systems and identify challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents, multi-agent systems can tackle complex tasks through agent collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layered context information, and enhancing memory management to support the intricate interactions within multi-agent systems. We also explore potential applications of multi-agent systems in blockchain systems to shed light on their future development and application in real-world distributed systems.
VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing
Analog mixed-signal circuit sizing involves complex trade-offs within high-dimensional design spaces. Existing automatic analog circuit sizing approaches rely solely on netlists, ignoring the circuit schematic, which hinders the cognitive link between the schematic and its performance. Furthermore, the black-box nature of machine learning methods and hallucination risks in large language models fail to provide the necessary ground-truth explainability required for industrial sign-off. To address these challenges, we propose a Vision Language Model-optimized collaborative agent design workflow (VLM-CAD), which analyzes circuits, optimizes DC operating points, performs inference-based sizing, and executes external sizing optimization. We integrate Image2Net to annotate circuit schematics and generate a structured JSON description for precise interpretation by Vision Language Models. Furthermore, we propose an Explainable Trust Region Bayesian Optimization method (ExTuRBO) that employs collaborative warm-start from agent-generated seeds and offers dual-granularity sensitivity analysis for external sizing optimization, supporting a comprehensive final design report. Experiment results on amplifier sizing tasks using 180nm, 90nm, and 45nm Predictive Technology Models demonstrate that VLM-CAD effectively balances power and performance while maintaining physics-based explainability. VLM-CAD meets all specification requirements while maintaining low power consumption in optimizing an amplifier with a complementary input and a class-AB output stage, with a total runtime under 66 minutes across all experiments on two amplifiers.
comment: 9 pages, 4 figures
Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare
Agent-based simulations have an untapped potential to inform social policies on urgent human development challenges in a non-invasive way, before these are implemented in real-world populations. This paper responds to the request from non-profit and governmental organizations to evaluate policies under discussion to improve equity in health care services for people experiencing homelessness (PEH) in the city of Barcelona. With this goal, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to model and evaluate the behaviour of agents who represent PEH and social workers. We define a reinforcement learning environment where agents aim to restore their central human capabilities, under existing environmental and legal constraints. We use Bayesian inverse reinforcement learning (IRL) to calibrate profile-dependent behavioural parameters in PEH agents, modeling the degree of trust and engagement with social workers, which is reportedly a key element for the success of the policies in scope. Our results open a path to mitigate health inequity by building relationships of trust between social service workers and PEH.
Systems and Control (EESS)
Multi-Mode Pinching Antenna Systems Enabled Multi-User Communications
This paper proposes a novel multi-mode pinching-antenna systems (PASS) framework. Multiple data streams can be transmitted within a single waveguide through multiple guided modes, thus facilitating efficient multi-user communications through the mode-domain multiplexing. A physic model is derived, which reveals the mode-selective power radiation feature of pinching antennas (PAs). A two-mode PASS enabled two-user downlink communication system is investigated. Considering the mode selectivity of PA power radiation, a practical PA grouping scheme is proposed, where each PA group matches with one specific guided mode and mainly radiates its signal sequentially. Depending on whether the guided mode leaks power to unmatched PAs or not, the proposed PA grouping scheme operates in either the non-leakage or weak-leakage regime. Based on this, the baseband beamforming and PA locations are jointly optimized for sum rate maximization, subject to each user's minimum rate requirement. 1) A simple two-PA case in non-leakage regime is first considered. To solve the formulated problem, a channel orthogonality based solution is proposed. The channel orthogonality is ensured by large-scale and wavelength-scale equality constraints on PA locations. Thus, the optimal beamforming reduces to maximum-ratio transmission (MRT). Moreover, the optimal PA locations are obtained via a Newton-based one-dimension search algorithm that enforces two-scale PA-location constraints by Newton's method. 2) A general multi-PA case in both non-leakage and weak-leakage regimes is further considered. A low-complexity particle-swarm optimization with zero-forcing beamforming (PSO-ZF) algorithm is developed, thus effectively tackling the high-oscillatory and strong-coupled problem. Simulation results demonstrate the superiority of the proposed multi-mode PASS over conventional single-mode PASS and fixed-antenna structures.
comment: 13 pages, 10 figures. Submitted to IEEE
Distributed Learning over Noisy Communication Networks
We study binary coordination games over graphs under log-linear learning when neighbor actions are conveyed through explicit noisy communication links. Each edge is modeled as either a binary symmetric channel (BSC) or a binary erasure channel (BEC). We analyze two operational regimes. For binary symmetric and binary erasure channels, we provide a structural characterization of the induced learning dynamics. In a fast-communication regime, agents update using channel-averaged payoffs; the resulting learning dynamics coincide with a Gibbs sampler for a scaled coordination potential, where channel reliability enters only through a scalar attenuation coefficient. In a snapshot regime, agents update from a single noisy realization and ignore channel statistics; the induced Markov chain is generally nonreversible, but admits a high-temperature expansion whose drift matches that of the fast Gibbs sampler with the same attenuation. We further formalize a finite-$K$ communication budget, which interpolates between snapshot and fast behavior as the number of channel uses per update grows. This viewpoint yields a communication-theoretic interpretation in terms of retransmissions and repetition coding, and extends naturally to heterogeneous link reliabilities via effective edge weights. Numerical experiments illustrate the theory and quantify the tradeoff between communication resources and steady-state coordination quality.
comment: draft, submitted to IEEE JSAC Special Issue on Distributed Optimization, Learning, and Inference over Communication-Constrained Networks
Learning Contextual Runtime Monitors for Safe AI-Based Autonomy
We introduce a novel framework for learning context-aware runtime monitors for AI-based control ensembles. Machine-learning (ML) controllers are increasingly deployed in (autonomous) cyber-physical systems because of their ability to solve complex decision-making tasks. However, their accuracy can degrade sharply in unfamiliar environments, creating significant safety concerns. Traditional ensemble methods aim to improve robustness by averaging or voting across multiple controllers, yet this often dilutes the specialized strengths that individual controllers exhibit in different operating contexts. We argue that, rather than blending controller outputs, a monitoring framework should identify and exploit these contextual strengths. In this paper, we reformulate the design of safe AI-based control ensembles as a contextual monitoring problem. A monitor continuously observes the system's context and selects the controller best suited to the current conditions. To achieve this, we cast monitor learning as a contextual learning task and draw on techniques from contextual multi-armed bandits. Our approach comes with two key benefits: (1) theoretical safety guarantees during controller selection, and (2) improved utilization of controller diversity. We validate our framework in two simulated autonomous driving scenarios, demonstrating significant improvements in both safety and performance compared to non-contextual baselines.
Tilt-based Aberration Estimation in Transmission Electron Microscopy
Transmission electron microscopes (TEMs) enable atomic-scale imaging but suffer from aberrations caused by lens imperfections and environmental conditions, reducing image quality. These aberrations can be compensated by adjusting electromagnetic lenses, but this requires accurate estimates of the aberration coefficients, which can drift over time. This paper introduces a method for the estimation of aberrations in TEM by leveraging the relationship between an induced electron beam tilt and the resulting image shift. The method uses a Kalman filter (KF) to estimate the aberration coefficients from a sequence of image shifts, while accounting for the drift of the aberrations over time. The applied tilt sequence is optimized by minimizing the trace of the predicted error covariance in the KF, which corresponds to the A-optimality criterion in experimental design. We show that this optimization can be performed offline, as the cost criterion is independent of the actual measurements. The resulting non-convex optimization problem is solved using a gradient-based, receding-horizon approach with multi-starts. Additionally, we develop an approach to estimate specimen-dependent noise properties using expectation maximization (EM), which are then used to tailor the tilt pattern optimization to the specific specimen being imaged. The proposed method is validated on a real TEM set-up with several optimized tilt patterns. The results show that optimized patterns significantly outperform naive approaches and that the aberration and drift model accurately captures the underlying physical phenomena. In total, the alignment time is reduced from typically several minutes to less than a minute compared to the state-of-the-art.
comment: Submitted version (pre-print)
A Timing-Anomaly Free Dynamic Scheduling on Heterogeneous Systems
Heterogeneous systems commonly adopt dynamic scheduling algorithms to improve resource utilization and enhance scheduling flexibility. However, such flexibility may introduce timing anomalies, wherein locally reduced execution times can lead to an increase in the overall system execution time. This phenomenon significantly complicates the analysis of Worst-Case Response Time (WCRT), rendering conventional analysis either overly pessimistic or unsafe, and often necessitating exhaustive state-space exploration to ensure correctness. To address this challenge, this paper presents the first timing-anomaly-free dynamic scheduling algorithm for heterogeneous systems, referred to as Deterministic Dynamic Execution. It achieves a safe and tight WCRT estimate through a single offline simulation execution. The core idea is to apply deterministic execution constraints, which partially restrict the resource allocation and execution order of tasks at runtime. Based on a formally defined execution progress model for heterogeneous system scheduling, we prove the correctness of the proposed execution constraints and their ability to eliminate timing anomalies. Furthermore, we propose two methods to generate execution constraints. The first method derives execution constraints directly from the execution traces produced by existing scheduling algorithms. The second method is a heuristic-based approach that constructs execution constraints, enabling further reduction of the WCRT. Experimental results on synthetically generated DAG task sets under various system configurations demonstrate that, compared to traditional dynamic scheduling algorithms, our approach not only eliminates timing anomalies but also effectively reduces both the WCRT and response time jitter.
Reducing End-to-End Latency of Cause-Effect Chains with Shared Cache Analysis
Cause-effect chains, as a widely used modeling method in real-time embedded systems, are extensively applied in various safety-critical domains. End-to-end latency, as a key real-time attribute of cause-effect chains, is crucial in many applications. But the analysis of end-to-end latency for cause-effect chains on multicore platforms with shared caches still presents an unresolved issue. Traditional methods typically assume that the worst-case execution time (WCET) of each task in the cause-effect chain is known. However, in the absence of scheduling information, these methods often assume that all shared cache accesses result in misses, leading to an overestimation of WCET and, consequently, affecting the accuracy of end-to-end latency. However, effectively integrating scheduling information into the WCET analysis process of the chains may introduce two challenges: first, how to leverage the structural characteristics of the chains to optimize shared cache analysis, and second, how to improve analysis accuracy while avoiding state space explosion. To address these issues, this paper proposes a novel end-to-end latency analysis framework designed for multi-chain systems on multicore platforms with shared caches. This framework extracts scheduling information and structural characteristics of cause-effect chains, constructing fine-grained and scalable inter-core memory access contexts at the basic block level for time-sensitive shared cache analysis. This results in more accurate WCET (TSC-WCET) estimates, which are then used to derive the end-to-end latency. Finally, we conduct experiments on dual-core and quad-core systems with various cache configurations, which show that under certain settings, the average maximum end-to-end latency of cause-effect chains is reduced by up to 34% and 26%.
Unsupervised Anomaly Detection in Multi-Agent Trajectory Prediction via Transformer-Based Models
Identifying safety-critical scenarios is essential for autonomous driving, but the rarity of such events makes supervised labeling impractical. Traditional rule-based metrics like Time-to-Collision are too simplistic to capture complex interaction risks, and existing methods lack a systematic way to verify whether statistical anomalies truly reflect physical danger. To address this gap, we propose an unsupervised anomaly detection framework based on a multi-agent Transformer that models normal driving and measures deviations through prediction residuals. A dual evaluation scheme has been proposed to assess both detection stability and physical alignment: Stability is measured using standard ranking metrics in which Kendall Rank Correlation Coefficient captures rank agreement and Jaccard index captures the consistency of the top-K selected items; Physical alignment is assessed through correlations with established Surrogate Safety Measures (SSM). Experiments on the NGSIM dataset demonstrate our framework's effectiveness: We show that the maximum residual aggregator achieves the highest physical alignment while maintaining stability. Furthermore, our framework identifies 388 unique anomalies missed by Time-to-Collision and statistical baselines, capturing subtle multi-agent risks like reactive braking under lateral drift. The detected anomalies are further clustered into four interpretable risk types, offering actionable insights for simulation and testing.
Neural Cooperative Reach-While-Avoid Certificates for Interconnected Systems
Providing formal guarantees for neural network-based controllers in large-scale interconnected systems remains a fundamental challenge. In particular, using neural certificates to capture cooperative interactions and verifying these certificates at scale is crucial for the safe deployment of such controllers. However, existing approaches fall short on both fronts. To address these limitations, we propose neural cooperative reach-while-avoid certificates with Dynamic-Localized Vector Control Lyapunov and Barrier Functions, which capture cooperative dynamics through state-dependent neighborhood structures and provide decentralized certificates for global exponential stability and safety. Based on the certificates, we further develop a scalable training and verification framework that jointly synthesizes controllers and neural certificates via a constrained optimization objective, and leverages a sufficient condition to ensure formal guarantees considering modeling error. To improve scalability, we introduce a structural reuse mechanism to transfer controllers and certificates between substructure-isomorphic systems. The proposed methodology is validated with extensive experiments on multi-robot coordination and vehicle platoons. Results demonstrate that our framework ensures certified cooperative reach-while-avoid while maintaining strong control performance.
Efficient Trajectory Design and Communication Scheduling for Dual-UAV Jamming-Aided Secure Communication Networks
We study dual-unmanned aerial vehicle (UAV) jamming-aided secure communication networks, in which one UAV delivers confidential data to multiple ground users (GUs), while a cooperative UAV provides protective interference against a ground eavesdropper. To enforce fairness, we maximize the minimum secrecy throughput across GUs by jointly designing trajectories and communication scheduling. The key difficulty lies in the continuous-time nature of UAV trajectories and the tight space-time coupling between the transmitter and the jammer, which jointly render the problem infinite-dimensional and nonconvex. To address these challenges, we characterize, for the first time, the structure of the optimal trajectories and rigorously prove that they follow a collaborative successive hover-and-fly (co-SHF) structure, where the two UAVs visit a limited number of synchronized co-hovering point pairs, and during each flight segment at least one UAV moves at maximum speed. Leveraging this structure, we reformulate the problem into a finite-dimensional form, without loss of optimality, over hovering and turning points, hovering durations, and scheduling. For tractability, we adopt a minimum-distance approximation of continuous anti-collision constraints and employ concave lower bounds on secrecy throughput within a successive convex approximation (SCA) method, which converges and, thanks to the co-SHF reduction in optimization variables and constraints, achieves low computational complexity. Numerical results show that, compared with time-discretization and no-jamming benchmarks, the proposed co-SHF design improves the min-secrecy and user fairness while requiring significantly less runtime.
A Data-Driven Krasovskii-Based Approach for Safety Controller Design of Time-Delayed Uncertain Polynomial Systems
We develop a data-driven framework for the synthesis of robust Krasovskii control barrier certificates (RK-CBC) and corresponding robust safety controllers (R-SC) for discrete-time input-affine uncertain polynomial systems with unknown dynamics, while explicitly accounting for unknown-but-bounded disturbances and time-invariant delays using only observed input-state data. Although control barrier certificates have been extensively studied for safety analysis of control systems, existing work on unknown systems with time delays, particularly in the presence of disturbances, remains limited. The challenge of safety synthesis for such systems stems from two main factors: first, the system's mathematical model is unavailable; and second, the safety conditions should explicitly incorporate the effects of time delays on system evolution during the synthesis process, while remaining robust to unknown disturbances. To address these challenges, we develop a data-driven framework based on Krasovskii control barrier certificates, extending the classical CBC formulation for delay-free systems to explicitly account for time delays by aggregating delayed components within the barrier construction. The proposed framework relies solely on input-state data collected over a finite time horizon, enabling the direct synthesis of RK-CBC and R-SC from observed trajectories without requiring an explicit system model. The synthesis is cast as a data-driven sum-of-squares (SOS) optimization program, yielding a structured design methodology. As a result, robust safety is guaranteed in the presence of unknown disturbances and time delays over an infinite time horizon. The effectiveness of the proposed method is demonstrated through three case studies, including two physical systems.
Parametric and Generative Forecasts of Day-Ahead Market Curves for Storage Optimization
We present two machine learning frameworks for forecasting aggregated curves and optimizing storage in the EPEX SPOT day-ahead market. First, a fast parametric model forecasts hourly demand and supply curves in a low-dimensional and grid-robust representation, with minimum and maximum volumes combined with a Chebyshev polynomial for the elastic segment. The model enables daily use with low error and clear interpretability. Second, for a more comprehensive analysis, though less suited to daily operation, we employ generative models that learn the joint distribution of 24-hour order-level submissions given weather and fuel variables. These models generate synthetic daily scenarios of individual buy and sell orders, which, once aggregated, yield hourly supply and demand curves. Based on these forecasts, we optimize a price-making storage strategy, quantify revenue distributions, and highlight the price-compression effect with lower peaks, higher off-peak levels, and diminishing returns as capacity expands.
comment: 46 pages, 41 figures
C-AoEI-Aware Cross-Layer Optimization in Satellite IoT Systems: Balancing Data Freshness and Transmission Efficiency
Satellite-based Internet of Things (S-IoT) faces a fundamental trilemma: propagation delay, dynamic fading, and bandwidth scarcity. While Layer-coded Hybrid ARQ (L-HARQ) enhances reliability, its backtracking decoding introduces age ambiguity, undermining the standard Age of Information (AoI) metric and obscuring the critical trade-off between data freshness and transmission efficiency. To bridge this gap, we propose a novel cross-layer optimization framework centered on a new metric, the Cross-layer Age of Error Information (C-AoEI). We derive a closed-form expression for C-AoEI, explicitly linking freshness to system parameters, establishing an explicit analytical connection between freshness degradation and channel dynamics. Building on this, we develop a packet-level encoded L-HARQ scheme for multi-GBS scenarios and an adaptive algorithm that jointly optimizes coding and decision thresholds. Extensive simulations demonstrate the effectiveness of our proposed framework: it achieves 31.8% higher transmission efficiency and 17.2% lower C-AoEI than conventional schemes. The framework also proves robust against inter-cell interference and varying channel conditions, providing a foundation for designing efficient, latency-aware next-generation S-IoT protocols.
comment: 18 pages, 13 figures, IEEE Internet of Things Journal, Accepted
A Taylor Series Approach to Correct Localization Errors in Robotic Field Mapping using Gaussian Processes
Gaussian Processes (GPs) are powerful non-parametric Bayesian models for regression of scalar fields, formulated under the assumption that measurement locations are perfectly known and the corresponding field measurements have Gaussian noise. However, many real-world scalar field mapping applications rely on sensor-equipped mobile robots to collect field measurements, where imperfect localization introduces state uncertainty. Such discrepancies between the estimated and true measurement locations degrade GP mean and covariance estimates. To address this challenge, we propose a method for updating the GP models when improved estimates become available. Leveraging the differentiability of the kernel function, a second-order correction algorithm is developed using the precomputed Jacobians and Hessians of the GP mean and covariance functions for real-time refinement based on measurement location discrepancy data. Simulation results demonstrate improved prediction accuracy and computational efficiency compared to full model retraining.
Safety Generalization Under Distribution Shift in Safe Reinforcement Learning: A Diabetes Testbed
Safe Reinforcement Learning (RL) algorithms are typically evaluated under fixed training conditions. We investigate whether training-time safety guarantees transfer to deployment under distribution shift, using diabetes management as a safety-critical testbed. We benchmark safe RL algorithms on a unified clinical simulator and reveal a safety generalization gap: policies satisfying constraints during training frequently violate safety requirements on unseen patients. We demonstrate that test-time shielding, which filters unsafe actions using learned dynamics models, effectively restores safety across algorithms and patient populations. Across eight safe RL algorithms, three diabetes types, and three age groups, shielding achieves Time-in-Range gains of 13--14\% for strong baselines such as PPO-Lag and CPO while reducing clinical risk index and glucose variability. Our simulator and benchmark provide a platform for studying safety under distribution shift in safety-critical control domains. Code is available at https://github.com/safe-autonomy-lab/GlucoSim and https://github.com/safe-autonomy-lab/GlucoAlg.
The Impact of Shared Autonomous Vehicles in Microtransit Systems: A Case Study in Atlanta
Microtransit systems represent an enhancement to solve the first- and last-mile problem, integrating traditional rail and bus networks with on-demand shuttles into a flexible, integrated system. This type of demand responsive transport provides greater accessibility and higher quality levels of service compared to conventional fixed-route transit services. Advances in technology offer further opportunities to enhance microtransit performance. In particular, shared autonomous vehicles (SAVs) have the potential to transform the mobility landscape by enabling more sustainable operations, enhanced user convenience, and greater system reliability. This paper investigates the integration of SAVs in microtransit systems, advancing the technological capabilities of on-demand shuttles. A shuttle dispatching optimization model is enhanced to accommodate for driver behavior and SAV functionalities. A model predictive control approach is proposed that dynamically rebalances on-demand shuttles towards areas of higher demand without relying on vast historical data. Scenario-driven experiments are conducted using data from the MARTA Reach microtransit pilot. The results demonstrate that SAVs can elevate both service quality and user experience compared to traditional on-demand shuttles in microtransit systems.
Mean-Field Learning for Storage Aggregation
Distributed energy storage devices can be pooled and coordinated by aggregators to participate in power system operations and market clearings. This requires representing a massive device population as a single, tractable surrogate that is computationally efficient, accurate, and compatible with market participation requirements. However, surrogate identification is challenging due to heterogeneity, nonconvexity, and high dimensionality of storage devices. To address these challenges, this paper develops a mean-field learning framework for storage aggregation. We interpret aggregation as the average behavior of a large storage population and show that, as the population grows, aggregate performance converges to a unique, convex mean-field limit, enabling tractable population-level modeling. This convexity further yields a price-responsive characterization of aggregate storage behavior and allows us to bound the mean-field approximation error. Leveraging these results, we construct a convex surrogate model that approximates the aggregate behavior of large storage populations and can be embedded directly into power system operations and market clearing. Surrogate parameter identification is formulated as an optimization problem using historical market price-response data, and we adopt a gradient-based algorithm for efficient learning procedure. Case studies validate the theoretical findings and demonstrate the effectiveness of the proposed framework in approximation accuracy, data efficiency, and profit outcomes.
comment: 14 pages, 6 figures
Experimental Design for Matching
Matching mechanisms play a central role in operations management across diverse fields including education, healthcare, and online platforms. However, experimentally comparing a new matching algorithm against a status quo presents some fundamental challenges due to matching interference, where assigning a unit in one matching may preclude its assignment in the other. In this work, we take a design-based perspective to study the design of randomized experiments to compare two predetermined matching plans on a finite population, without imposing outcome or behavioral models. We introduce the notation of a disagreement set, which captures the difference between the two matching plans, and show that it admits a unique decomposition into disjoint alternating paths and cycles with useful structural properties. Based on these properties, we propose the Alternating Path Randomized Design, which sequentially randomizes along these paths and cycles to effectively manage interference. Within a minimax framework, we optimize the conditional randomization probability and show that, for long paths, the optimal choice converges to $\sqrt{2}-1$, minimizing worst-case variance. We establish the unbiasedness of the Horvitz-Thompson estimator and derive a finite-population Central Limit Theorem that accommodates complex and unstable path and cycle structures as the population grows. Furthermore, we extend the design to many-to-one matchings, where capacity constraints fundamentally alter the structure of the disagreement set. Using graph-theoretic tools, including finding augmenting paths and Euler-tour decomposition on an auxiliary unbalanced directed graph, we construct feasible alternating path and cycle decompositions that allow the design and inference results to carry over.
Smart Exploration in Reinforcement Learning using Bounded Uncertainty Models
Reinforcement learning (RL) is a powerful framework for decision-making in uncertain environments, but it often requires large amounts of data to learn an optimal policy. We address this challenge by incorporating prior model knowledge to guide exploration and accelerate the learning process. Specifically, we assume access to a model set that contains the true transition kernel and reward function. We optimize over this model set to obtain upper and lower bounds on the Q-function, which are then used to guide the exploration of the agent. We provide theoretical guarantees on the convergence of the Q-function to the optimal Q-function under the proposed class of exploring policies. Furthermore, we also introduce a data-driven regularized version of the model set optimization problem that ensures the convergence of the class of exploring policies to the optimal policy. Lastly, we show that when the model set has a specific structure, namely the bounded-parameter MDP (BMDP) framework, the regularized model set optimization problem becomes convex and simple to implement. In this setting, we also prove finite-time convergence to the optimal policy under mild assumptions. We demonstrate the effectiveness of the proposed exploration strategy, which we call BUMEX (Bounded Uncertainty Model-based Exploration), in a simulation study. The results indicate that the proposed method can significantly accelerate learning in benchmark examples. A toolbox is available at https://github.com/JvHulst/BUMEX.
comment: Presented at 64th IEEE Conference on Decision and Control, CDC 2025, Rio de Janeiro, Brazil, 2025
Feasibility-aware Learning of Robust Temporal Logic Controllers using BarrierNet
Control Barrier Functions (CBFs) have been used to enforce safety and task specifications expressed in Signal Temporal Logic (STL). However, existing CBF-STL approaches typically rely on fixed hyperparameters and per-step optimization, which can lead to overly conservative behavior, infeasibility near tight input limits, and difficulty satisfying long-horizon STL tasks. To address these limitations, we propose a feasibility-aware learning framework that constructs trainable, time-varying High Order Control Barrier Function (HOCBF) constraints and hyperparameters that guarantee satisfaction of a given STL specification. We introduce a unified robustness measure that jointly captures STL satisfaction, constraint feasibility, and control-bound compliance, and propose a neural network architecture to generate control inputs that maximize this robustness. The resulting controller guarantees STL satisfaction with strictly feasible HOCBF constraints and requires no manual tuning. Simulation results demonstrate that the proposed framework maintains high STL robustness under tight input bounds and significantly outperforms fixed-parameter and non-adaptive baselines in complex environments.
comment: 16 pages, 11 figures
Zeroth-Order Constrained Optimization from a Control Perspective via Feedback Linearization
Safe derivative-free optimization under unknown constraints is a fundamental challenge in modern learning and control. Existing zeroth-order (ZO) methods typically still assume access to a first-order oracle of the constraint functions or restrict attention to convex settings, leaving nonconvex optimization with black-box constraints largely unexplored. We propose the zeroth-order feedback-linearization (ZOFL) algorithm for ZO constrained optimization that enforces feasibility without access to the first-order oracle of the constraint functions and applies to both equality and inequality constraints. The proposed approach relies only on noisy, sample-based gradient estimates obtained via two-point estimators, yet provably guarantees constraint satisfaction under mild regularity conditions. It adopts a control-theoretic perspective on ZO constrained optimization and leverages feedback linearization, a nonlinear control technique, to enforce feasibility. Finite-time bounds on constraint violation and asymptotic global convergence guarantees are established for the ZOFL algorithm. A midpoint discretization variant is further developed to improve feasibility without sacrificing optimality. Empirical results demonstrate that ZOFL consistently outperforms standard ZO baselines, achieving competitive objective values while maintaining feasibility.
Transport and Delivery of Objects with a Soft Everting Robot
Soft everting robots present significant advantages over traditional rigid robots, including enhanced dexterity, improved environmental interaction, and safe navigation in unpredictable environments. While soft everting robots have been widely demonstrated for exploration type tasks, their potential to move and deploy payloads in such tasks has been less investigated, with previous work focusing on sensors and tools for the robot. Leveraging the navigation capabilities, and deployed body, of the soft everting robot to deliver payloads in hazardous areas, e.g. carrying a water bottle to a person stuck under debris, would represent a significant capability in many applications. In this work, we present an analysis of how soft everting robots can be used to deploy larger, heavier payloads through the inside of the robot. We analyze both what objects can be deployed and what terrain features they can be carried through. Building on existing models, we present methods to quantify the effects of payloads on robot growth and self-support, and develop a model to predict payload slip. We then experimentally quantify payload transport using soft everting robot with a variety of payload shapes, sizes, and weights and though a series of tasks: steering, vertical transport, movement through holes, and movement across gaps. Overall, the results show that we can transport payloads in a variety of shapes and up to 1.5kg in weight and that we can move through circular apertures with as little as 0.01cm clearance around payloads, carry out discrete turns up to 135 degrees, and move across unsupported gaps of 1.15m in length.
comment: 8 pages, 11 figures, Published in IEEE Robotics and Automation Letters Link to publication: https://ieeexplore.ieee.org/abstract/document/11359009 Citation: E. M. DeVries, J. Ferlazzo, M. Ugur and L. H. Blumenschein, "Transport and Delivery of Objects With a Soft Everting Robot," in IEEE Robotics and Automation Letters, vol. 11, no. 3, pp. 2935-2942, March 2026, doi: 10.1109/LRA.2026.3655537
Robotics
VGGT-SLAM 2.0: Real time Dense Feed-forward Scene Reconstruction
We present VGGT-SLAM 2.0, a real time RGB feed-forward SLAM system which substantially improves upon VGGT-SLAM for incrementally aligning submaps created from VGGT. Firstly, we remove high-dimensional 15-degree-of-freedom drift and planar degeneracy from VGGT-SLAM by creating a new factor graph design while still addressing the reconstruction ambiguity of VGGT given unknown camera intrinsics. Secondly, by studying the attention layers of VGGT, we show that one of the layers is well suited to assist in image retrieval verification for free without additional training, which enables both rejecting false positive matches and allows for completing more loop closures. Finally, we conduct a suite of experiments which includes showing VGGT-SLAM 2.0 can easily be adapted for open-set object detection and demonstrating real time performance while running online onboard a ground robot using a Jetson Thor. We also test in environments ranging from cluttered indoor apartments and office scenes to a 4,200 square foot barn, and we also demonstrate VGGT-SLAM 2.0 achieves the highest accuracy on the TUM dataset with about 23 percent less pose error than VGGT-SLAM. Code will be released upon publication.
Estimating Trust in Human-Robot Collaboration through Behavioral Indicators and Explainability
Industry 5.0 focuses on human-centric collaboration between humans and robots, prioritizing safety, comfort, and trust. This study introduces a data-driven framework to assess trust using behavioral indicators. The framework employs a Preference-Based Optimization algorithm to generate trust-enhancing trajectories based on operator feedback. This feedback serves as ground truth for training machine learning models to predict trust levels from behavioral indicators. The framework was tested in a chemical industry scenario where a robot assisted a human operator in mixing chemicals. Machine learning models classified trust with over 80\% accuracy, with the Voting Classifier achieving 84.07\% accuracy and an AUC-ROC score of 0.90. These findings underscore the effectiveness of data-driven methods in assessing trust within human-robot collaboration, emphasizing the valuable role behavioral indicators play in predicting the dynamics of human trust.
How Does Delegation in Social Interaction Evolve Over Time? Navigation with a Robot for Blind People
Autonomy and independent navigation are vital to daily life but remain challenging for individuals with blindness. Robotic systems can enhance mobility and confidence by providing intelligent navigation assistance. However, fully autonomous systems may reduce users' sense of control, even when they wish to remain actively involved. Although collaboration between user and robot has been recognized as important, little is known about how perceptions of this relationship change with repeated use. We present a repeated exposure study with six blind participants who interacted with a navigation-assistive robot in a real-world museum. Participants completed tasks such as navigating crowds, approaching lines, and encountering obstacles. Findings show that participants refined their strategies over time, developing clearer preferences about when to rely on the robot versus act independently. This work provides insights into how strategies and preferences evolve with repeated interaction and offers design implications for robots that adapt to user needs over time.
comment: Pre-print of paper accepted into CHI 2026
HARMONI: Multimodal Personalization of Multi-User Human-Robot Interactions with LLMs
Existing human-robot interaction systems often lack mechanisms for sustained personalization and dynamic adaptation in multi-user environments, limiting their effectiveness in real-world deployments. We present HARMONI, a multimodal personalization framework that leverages large language models to enable socially assistive robots to manage long-term multi-user interactions. The framework integrates four key modules: (i) a perception module that identifies active speakers and extracts multimodal input; (ii) a world modeling module that maintains representations of the environment and short-term conversational context; (iii) a user modeling module that updates long-term speaker-specific profiles; and (iv) a generation module that produces contextually grounded and ethically informed responses. Through extensive evaluation and ablation studies on four datasets, as well as a real-world scenario-driven user-study in a nursing home environment, we demonstrate that HARMONI supports robust speaker identification, online memory updating, and ethically aligned personalization, outperforming baseline LLM-driven approaches in user modeling accuracy, personalization quality, and user satisfaction.
Information-Theoretic Detection of Bimanual Interactions for Dual-Arm Robot Plan Generation
Programming by demonstration is a strategy to simplify the robot programming process for non-experts via human demonstrations. However, its adoption for bimanual tasks is an underexplored problem due to the complexity of hand coordination, which also hinders data recording. This paper presents a novel one-shot method for processing a single RGB video of a bimanual task demonstration to generate an execution plan for a dual-arm robotic system. To detect hand coordination policies, we apply Shannon's information theory to analyze the information flow between scene elements and leverage scene graph properties. The generated plan is a modular behavior tree that assumes different structures based on the desired arms coordination. We validated the effectiveness of this framework through multiple subject video demonstrations, which we collected and made open-source, and exploiting data from an external, publicly available dataset. Comparisons with existing methods revealed significant improvements in generating a centralized execution plan for coordinating two-arm systems.
Whether We Care, How We Reason: The Dual Role of Anthropomorphism and Moral Foundations in Robot Abuse
As robots become increasingly integrated into daily life, understanding responses to robot mistreatment carries important ethical and design implications. This mixed-methods study (N = 201) examined how anthropomorphic levels and moral foundations shape reactions to robot abuse. Participants viewed videos depicting physical mistreatment of robots varying in humanness (Spider, Twofoot, Humanoid) and completed measures assessing moral foundations, anger, and social distance. Results revealed that anthropomorphism determines whether people extend moral consideration to robots, while moral foundations shape how they reason about such consideration. Qualitative analysis revealed distinct reasoning patterns: low-progressivism individuals employed character-based judgments, while high-progressivism individuals engaged in future-oriented moral deliberation. Findings offer implications for robot design and policy communication.
Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals ICLR 2026
Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula.
comment: To appear at ICLR 2026
Reimagining Social Robots as Recommender Systems: Foundations, Framework, and Applications
Personalization in social robots refers to the ability of the robot to meet the needs and/or preferences of an individual user. Existing approaches typically rely on large language models (LLMs) to generate context-aware responses based on user metadata and historical interactions or on adaptive methods such as reinforcement learning (RL) to learn from users' immediate reactions in real time. However, these approaches fall short of comprehensively capturing user preferences-including long-term, short-term, and fine-grained aspects-, and of using them to rank and select actions, proactively personalize interactions, and ensure ethically responsible adaptations. To address the limitations, we propose drawing on recommender systems (RSs), which specialize in modeling user preferences and providing personalized recommendations. To ensure the integration of RS techniques is well-grounded and seamless throughout the social robot pipeline, we (i) align the paradigms underlying social robots and RSs, (ii) identify key techniques that can enhance personalization in social robots, and (iii) design them as modular, plug-and-play components. This work not only establishes a framework for integrating RS techniques into social robots but also opens a pathway for deep collaboration between the RS and HRI communities, accelerating innovation in both fields.
comment: HRI 2026
SCOPE: Smooth Convex Optimization for Planned Evolution of Deformable Linear Objects
We present SCOPE, a fast and efficient framework for modeling and manipulating deformable linear objects (DLOs). Unlike conventional energy-based approaches, SCOPE leverages convex approximations to significantly reduce computational cost while maintaining smooth and physically plausible deformations. This trade-off between speed and accuracy makes the method particularly suitable for applications requiring real-time or near-real-time response. The effectiveness of the proposed framework is demonstrated through comprehensive simulation experiments, highlighting its ability to generate smooth shape trajectories under geometric and length constraints.
comment: Proceedings of Machine Learning Research tbd:1_13, 2025 International Conference on Computational Optimization
Scalable Exploration for High-Dimensional Continuous Control via Value-Guided Flow ICLR 2026
Controlling high-dimensional systems in biological and robotic applications is challenging due to expansive state-action spaces, where effective exploration is critical. Commonly used exploration strategies in reinforcement learning are largely undirected with sharp degradation as action dimensionality grows. Many existing methods resort to dimensionality reduction, which constrains policy expressiveness and forfeits system flexibility. We introduce Q-guided Flow Exploration (Qflex), a scalable reinforcement learning method that conducts exploration directly in the native high-dimensional action space. During training, Qflex traverses actions from a learnable source distribution along a probability flow induced by the learned value function, aligning exploration with task-relevant gradients rather than isotropic noise. Our proposed method substantially outperforms representative online reinforcement learning baselines across diverse high-dimensional continuous-control benchmarks. Qflex also successfully controls a full-body human musculoskeletal model to perform agile, complex movements, demonstrating superior scalability and sample efficiency in very high-dimensional settings. Our results indicate that value-guided flows offer a principled and practical route to exploration at scale.
comment: Accepted by ICLR 2026
Enhancing Worker Safety in Harbors Using Quadruped Robots
Infrastructure inspection is becoming increasingly relevant in the field of robotics due to its significant impact on ensuring workers' safety. The harbor environment presents various challenges in designing a robotic solution for inspection, given the complexity of daily operations. This work introduces an initial phase to identify critical areas within the port environment. Following this, a preliminary solution using a quadruped robot for inspecting these critical areas is analyzed.
AC^2-VLA: Action-Context-Aware Adaptive Computation in Vision-Language-Action Models for Efficient Robotic Manipulation
Vision-Language-Action (VLA) models have demonstrated strong performance in robotic manipulation, yet their closed-loop deployment is hindered by the high latency and compute cost of repeatedly running large vision-language backbones at every timestep. We observe that VLA inference exhibits structured redundancies across temporal, spatial, and depth dimensions, and that most existing efficiency methods ignore action context, despite its central role in embodied tasks. To address this gap, we propose Action-Context-aware Adaptive Computation for VLA models (AC^2-VLA), a unified framework that conditions computation on current visual observations, language instructions, and previous action states. Based on this action-centric context, AC^2-VLA adaptively performs cognition reuse across timesteps, token pruning, and selective execution of model components within a unified mechanism. To train the adaptive policy, we introduce an action-guided self-distillation scheme that preserves the behavior of the dense VLA policy while enabling structured sparsification that transfers across tasks and settings. Extensive experiments on robotic manipulation benchmarks show that AC^2-VLA achieves up to a 1.79\times speedup while reducing FLOPs to 29.4% of the dense baseline, with comparable task success.
Safe Exploration via Policy Priors
Safe exploration is a key requirement for reinforcement learning (RL) agents to learn and adapt online, beyond controlled (e.g. simulated) environments. In this work, we tackle this challenge by utilizing suboptimal yet conservative policies (e.g., obtained from offline data or simulators) as priors. Our approach, SOOPER, uses probabilistic dynamics models to optimistically explore, yet pessimistically fall back to the conservative policy prior if needed. We prove that SOOPER guarantees safety throughout learning, and establish convergence to an optimal policy by bounding its cumulative regret. Extensive experiments on key safe RL benchmarks and real-world hardware demonstrate that SOOPER is scalable, outperforms the state-of-the-art and validate our theoretical guarantees in practice.
The S3LI Vulcano Dataset: A Dataset for Multi-Modal SLAM in Unstructured Planetary Environments
We release the S3LI Vulcano dataset, a multi-modal dataset towards development and benchmarking of Simultaneous Localization and Mapping (SLAM) and place recognition algorithms that rely on visual and LiDAR modalities. Several sequences are recorded on the volcanic island of Vulcano, from the Aeolian Islands in Sicily, Italy. The sequences provide users with data from a variety of environments, textures and terrains, including basaltic or iron-rich rocks, geological formations from old lava channels, as well as dry vegetation and water. The data (rmc.dlr.de/s3li_dataset) is accompanied by an open source toolkit (github.com/DLR-RM/s3li-toolkit) providing tools for generating ground truth poses as well as preparation of labelled samples for place recognition tasks.
comment: Accepted submission to the 2026 IEEE Aerospace Conference
Enhancing Inverse Perspective Mapping for Automatic Vectorized Road Map Generation
In this study, we present a low-cost and unified framework for vectorized road mapping leveraging enhanced inverse perspective mapping (IPM). In this framework, Catmull-Rom splines are utilized to characterize lane lines, and all the other ground markings are depicted using polygons uniformly. The results from instance segmentation serve as references to refine the three-dimensional position of spline control points and polygon corner points. In conjunction with this process, the homography matrix of IPM and vehicle poses are optimized simultaneously. Our proposed framework significantly reduces the mapping errors associated with IPM. It also improves the accuracy of the initial IPM homography matrix and the predicted vehicle poses. Furthermore, it addresses the limitations imposed by the coplanarity assumption in IPM. These enhancements enable IPM to be effectively applied to vectorized road mapping, which serves a cost-effective solution with enhanced accuracy. In addition, our framework generalizes road map elements to include all common ground markings and lane lines. The proposed framework is evaluated in two different practical scenarios, and the test results show that our method can automatically generate high-precision maps with near-centimeter-level accuracy. Importantly, the optimized IPM matrix achieves an accuracy comparable to that of manual calibration, while the accuracy of vehicle poses is also significantly improved.
Rhombot: Rhombus-shaped Modular Robots for Stable, Medium-Independent Reconfiguration Motion
In this paper, we present Rhombot, a novel deformable planar lattice modular self-reconfigurable robot (MSRR) with a rhombus shaped module. Each module consists of a parallelogram skeleton with a single centrally mounted actuator that enables folding and unfolding along its diagonal. The core design philosophy is to achieve essential MSRR functionalities such as morphing, docking, and locomotion with minimal control complexity. This enables a continuous and stable reconfiguration process that is independent of the surrounding medium, allowing the system to reliably form various configurations in diverse environments. To leverage the unique kinematics of Rhombot, we introduce morphpivoting, a novel motion primitive for reconfiguration that differs from advanced MSRR systems, and propose a strategy for its continuous execution. Finally, a series of physical experiments validate the module's stable reconfiguration ability, as well as its positional and docking accuracy.
PALM: Enhanced Generalizability for Local Visuomotor Policies via Perception Alignment
Generalizing beyond the training domain in image-based behavior cloning remains challenging. Existing methods address individual axes of generalization, workspace shifts, viewpoint changes, and cross-embodiment transfer, yet they are typically developed in isolation and often rely on complex pipelines. We introduce PALM (Perception Alignment for Local Manipulation), which leverages the invariance of local action distributions between out-of-distribution (OOD) and demonstrated domains to address these OOD shifts concurrently, without additional input modalities, model changes, or data collection. PALM modularizes the manipulation policy into coarse global components and a local policy for fine-grained actions. We reduce the discrepancy between in-domain and OOD inputs at the local policy level by enforcing local visual focus and consistent proprioceptive representation, allowing the policy to retrieve invariant local actions under OOD conditions. Experiments show that PALM limits OOD performance drops to 8% in simulation and 24% in the real world, compared to 45% and 77% for baselines.
ALRM: Agentic LLM for Robotic Manipulation
Large Language Models (LLMs) have recently empowered agentic frameworks to exhibit advanced reasoning and planning capabilities. However, their integration in robotic control pipelines remains limited in two aspects: (1) prior \ac{llm}-based approaches often lack modular, agentic execution mechanisms, limiting their ability to plan, reflect on outcomes, and revise actions in a closed-loop manner; and (2) existing benchmarks for manipulation tasks focus on low-level control and do not systematically evaluate multistep reasoning and linguistic variation. In this paper, we propose Agentic LLM for Robot Manipulation (ALRM), an LLM-driven agentic framework for robotic manipulation. ALRM integrates policy generation with agentic execution through a ReAct-style reasoning loop, supporting two complementary modes: Code-asPolicy (CaP) for direct executable control code generation, and Tool-as-Policy (TaP) for iterative planning and tool-based action execution. To enable systematic evaluation, we also introduce a novel simulation benchmark comprising 56 tasks across multiple environments, capturing linguistically diverse instructions. Experiments with ten LLMs demonstrate that ALRM provides a scalable, interpretable, and modular approach for bridging natural language reasoning with reliable robotic execution. Results reveal Claude-4.1-Opus as the top closed-source model and Falcon-H1-7B as the top open-source model under CaP.
A DVL Aided Loosely Coupled Inertial Navigation Strategy for AUVs with Attitude Error Modeling and Variance Propagation
In underwater navigation systems, strap-down inertial navigation system/Doppler velocity log (SINS/DVL)-based loosely coupled architectures are widely adopted. Conventional approaches project DVL velocities from the body coordinate system to the navigation coordinate system using SINS-derived attitude; however, accumulated attitude estimation errors introduce biases into velocity projection and degrade navigation performance during long-term operation. To address this issue, two complementary improvements are introduced. First, a vehicle attitude error-aware DVL velocity transformation model is formulated by incorporating attitude error terms into the observation equation to reduce projection-induced velocity bias. Second, a covariance matrix-based variance propagation method is developed to transform DVL measurement uncertainty across coordinate systems, introducing an expectation-based attitude error compensation term to achieve statistically consistent noise modeling. Simulation and field experiment results demonstrate that both improvements individually enhance navigation accuracy and confirm that accumulated attitude errors affect both projected velocity measurements and their associated uncertainty. When jointly applied, long-term error divergence is effectively suppressed. Field experimental results show that the proposed approach achieves a 78.3% improvement in 3D position RMSE and a 71.8% reduction in the maximum component-wise position error compared with the baseline IMU+DVL method, providing a robust solution for improving long-term SINS/DVL navigation performance.
Reinforcement Learning Goal-Reaching Control with Guaranteed Lyapunov-Like Stabilizer for Mobile Robots
Reinforcement learning (RL) can be highly effective at learning goal-reaching policies, but it typically does not provide formal guarantees that the goal will always be reached. A common approach to provide formal goal-reaching guarantees is to introduce a shielding mechanism that restricts the agent to actions that satisfy predefined safety constraints. The main challenge here is integrating this mechanism with RL so that learning and exploration remain effective without becoming overly conservative. Hence, this paper proposes an RL-based control framework that provides formal goal-reaching guarantees for wheeled mobile robots operating in unstructured environments. We first design a real-time RL policy with a set of 15 carefully defined reward terms. These rewards encourage the robot to reach both static and dynamic goals while generating sufficiently smooth command signals that comply with predefined safety specifications, which is critical in practice. Second, a Lyapunov-like stabilizer layer is integrated into the benchmark RL framework as a policy supervisor to formally strengthen the goal-reaching control while preserving meaningful exploration of the state action space. The proposed framework is suitable for real-time deployment in challenging environments, as it provides a formal guarantee of convergence to the intended goal states and compensates for uncertainties by generating real-time control signals based on the current state, while respecting real-world motion constraints. The experimental results show that the proposed Lyapunov-like stabilizer consistently improves the benchmark RL policies, boosting the goal-reaching rate from 84.6% to 99.0%, sharply reducing failures, and improving efficiency.
Self-Reconfiguration Planning for Deformable Quadrilateral Modular Robots
For lattice modular self-reconfigurable robots (MSRRs), maintaining stable connections during reconfiguration is crucial for physical feasibility and deployability. This letter presents a novel self-reconfiguration planning algorithm for deformable quadrilateral MSRRs that guarantees stable connection. The method first constructs feasible connect/disconnect actions using a virtual graph representation, and then organizes these actions into a valid execution sequence through a Dependence-based Reverse Tree (DRTree) that resolves interdependencies. We also prove that reconfiguration sequences satisfying motion characteristics exist for any pair of configurations with seven or more modules (excluding linear topologies). Finally, comparisons with a modified BiRRT algorithm highlight the superior efficiency and stability of our approach, while deployment on a physical robotic platform confirms its practical feasibility.
Physical Human-Robot Interaction: A Critical Review of Safety Constraints
This paper aims to provide a clear and rigorous understanding of commonly recognized safety constraints in physical human-robot interaction, i.e. ISO/TS 15066, by examining how they are obtained and which assumptions support them. We clarify the interpretation and practical impact of key simplifying assumptions, show how these modeling choices affect both safety and performance across the system, and indicate specific design parameters that can be adjusted in safety-critical control implementations. Numerical examples are provided to quantify performance degradation induced by common approximations and simplifying design choices. Furthermore, the fundamental role of energy in safety assessment is emphasized, and focused insights are offered on the existing body of work concerning energy-based safety methodologies.
Towards Gold-Standard Depth Estimation for Tree Branches in UAV Forestry: Benchmarking Deep Stereo Matching Methods
Autonomous UAV forestry operations require robust depth estimation with strong cross-domain generalization, yet existing evaluations focus on urban and indoor scenarios, leaving a critical gap for vegetation-dense environments. We present the first systematic zero-shot evaluation of eight stereo methods spanning iterative refinement, foundation model, diffusion-based, and 3D CNN paradigms. All methods use officially released pretrained weights (trained on Scene Flow) and are evaluated on four standard benchmarks (ETH3D, KITTI 2012/2015, Middlebury) plus a novel 5,313-pair Canterbury Tree Branches dataset ($1920 \times 1080$). Results reveal scene-dependent patterns: foundation models excel on structured scenes (BridgeDepth: 0.23 px on ETH3D; DEFOM: 4.65 px on Middlebury), while iterative methods show variable cross-benchmark performance (IGEV++: 0.36 px on ETH3D but 6.77 px on Middlebury; IGEV: 0.33 px on ETH3D but 4.99 px on Middlebury). Qualitative evaluation on the Tree Branches dataset establishes DEFOM as the gold-standard baseline for vegetation depth estimation, with superior cross-domain consistency (consistently ranking 1st-2nd across benchmarks, average rank 1.75). DEFOM predictions will serve as pseudo-ground-truth for future benchmarking.
Task-Centric Policy Optimization from Misaligned Motion Priors
Humanoid control often leverages motion priors from human demonstrations to encourage natural behaviors. However, such demonstrations are frequently suboptimal or misaligned with robotic tasks due to embodiment differences, retargeting errors, and task-irrelevant variations, causing naïve imitation to degrade task performance. Conversely, task-only reinforcement learning admits many task-optimal solutions, often resulting in unnatural or unstable motions. This exposes a fundamental limitation of linear reward mixing in adversarial imitation learning. We propose \emph{Task-Centric Motion Priors} (TCMP), a task-priority adversarial imitation framework that treats imitation as a conditional regularizer rather than a co-equal objective. TCMP maximizes task improvement while incorporating imitation signals only when they are compatible with task progress, yielding an adaptive, geometry-aware update that preserves task-feasible descent and suppresses harmful imitation under misalignment. We provide theoretical analysis of gradient conflict and task-priority stationary points, and validate our claims through humanoid control experiments demonstrating robust task performance with consistent motion style under noisy demonstrations.
Sim-and-Human Co-training for Data-Efficient and Generalizable Robotic Manipulation
Synthetic simulation data and real-world human data provide scalable alternatives to circumvent the prohibitive costs of robot data collection. However, these sources suffer from the sim-to-real visual gap and the human-to-robot embodiment gap, respectively, which limits the policy's generalization to real-world scenarios. In this work, we identify a natural yet underexplored complementarity between these sources: simulation offers the robot action that human data lacks, while human data provides the real-world observation that simulation struggles to render. Motivated by this insight, we present SimHum, a co-training framework to simultaneously extract kinematic prior from simulated robot actions and visual prior from real-world human observations. Based on the two complementary priors, we achieve data-efficient and generalizable robotic manipulation in real-world tasks. Empirically, SimHum outperforms the baseline by up to $\mathbf{40\%}$ under the same data collection budget, and achieves a $\mathbf{62.5\%}$ OOD success with only 80 real data, outperforming the real only baseline by $7.1\times$. Videos and additional information can be found at \href{https://kaipengfang.github.io/sim-and-human}{project website}.
Judgelight: Trajectory-Level Post-Optimization for Multi-Agent Path Finding via Closed-Subwalk Collapsing
Multi-Agent Path Finding (MAPF) is an NP-hard problem with applications in warehouse automation and multi-robot coordination. Learning-based MAPF solvers offer fast and scalable planning but often produce feasible trajectories that contain unnecessary or oscillatory movements. We propose Judgelight, a post-optimization method that improves trajectory quality after a MAPF solver generates a feasible schedule. Judgelight collapses closed subwalks in agents' trajectories to remove redundant movements while preserving all feasibility constraints. We formalize this process as MAPF-Collapse, prove that it is NP-hard, and present an exact optimization approach by formulating it as integer linear programming (ILP) problem. Experimental results show Judgelight consistently reduces solution cost by around 20%, particularly for learning-based solvers, producing trajectories that are better suited for real-world deployment.
Teaching Machine Learning Fundamentals with LEGO Robotics
This paper presents the web-based platform Machine Learning with Bricks and an accompanying two-day course designed to teach machine learning concepts to students aged 12 to 17 through programming-free robotics activities. Machine Learning with Bricks is an open source platform and combines interactive visualizations with LEGO robotics to teach three core algorithms: KNN, linear regression, and Q-learning. Students learn by collecting data, training models, and interacting with robots via a web-based interface. Pre- and post-surveys with 14 students demonstrate significant improvements in conceptual understanding of machine learning algorithms, positive shifts in AI perception, high platform usability, and increased motivation for continued learning. This work demonstrates that tangible, visualization-based approaches can make machine learning concepts accessible and engaging for young learners while maintaining technical depth. The platform is freely available at https://learning-and-dynamics.github.io/ml-with-bricks/, with video tutorials guiding students through the experiments at https://youtube.com/playlist?list=PLx1grFu4zAcwfKKJZ1Ux4LwRqaePCOA2J.
comment: 10 pages, 8 figures
Self-Supervised Path Planning in Unstructured Environments via Global-Guided Differentiable Hard Constraint Projection
Deploying deep learning agents for autonomous navigation in unstructured environments faces critical challenges regarding safety, data scarcity, and limited computational resources. Traditional solvers often suffer from high latency, while emerging learning-based approaches struggle to ensure deterministic feasibility. To bridge the gap from embodied to embedded intelligence, we propose a self-supervised framework incorporating a differentiable hard constraint projection layer for runtime assurance. To mitigate data scarcity, we construct a Global-Guided Artificial Potential Field (G-APF), which provides dense supervision signals without manual labeling. To enforce actuator limitations and geometric constraints efficiently, we employ an adaptive neural projection layer, which iteratively rectifies the coarse network output onto the feasible manifold. Extensive benchmarks on a test set of 20,000 scenarios demonstrate an 88.75\% success rate, substantiating the enhanced operational safety. Closed-loop experiments in CARLA further validate the physical realizability of the planned paths under dynamic constraints. Furthermore, deployment verification on an NVIDIA Jetson Orin NX confirms an inference latency of 94 ms, showing real-time feasibility on resource-constrained embedded hardware. This framework offers a generalized paradigm for embedding physical laws into neural architectures, providing a viable direction for solving constrained optimization in mechatronics. Source code is available at: https://github.com/wzq-13/SSHC.git.
Perception-to-Pursuit: Track-Centric Temporal Reasoning for Open-World Drone Detection and Autonomous Chasing ICCV 2027
Autonomous drone pursuit requires not only detecting drones but also predicting their trajectories in a manner that enables kinematically feasible interception. Existing tracking methods optimize for prediction accuracy but ignore pursuit feasibility, resulting in trajectories that are physically impossible to intercept 99.9% of the time. We propose Perception-to-Pursuit (P2P), a track-centric temporal reasoning framework that bridges detection and actionable pursuit planning. Our method represents drone motion as compact 8-dimensional tokens capturing velocity, acceleration, scale, and smoothness, enabling a 12-frame causal transformer to reason about future behavior. We introduce the Intercept Success Rate (ISR) metric to measure pursuit feasibility under realistic interceptor constraints. Evaluated on the Anti-UAV-RGBT dataset with 226 real drone sequences, P2P achieves 28.12 pixel average displacement error and 0.597 ISR, representing a 77% improvement in trajectory prediction and 597x improvement in pursuit feasibility over tracking-only baselines, while maintaining perfect drone classification accuracy (100%). Our work demonstrates that temporal reasoning over motion patterns enables both accurate prediction and actionable pursuit planning.
comment: 7 pages, 2 figures, 3 tables, 15 references. Intended for submission to ICCV 2027
Tactile Memory with Soft Robot: Robust Object Insertion via Masked Encoding and Soft Wrist
Tactile memory, the ability to store and retrieve touch-based experience, is critical for contact-rich tasks such as key insertion under uncertainty. To replicate this capability, we introduce Tactile Memory with Soft Robot (TaMeSo-bot), a system that integrates a soft wrist with tactile retrieval-based control to enable safe and robust manipulation. The soft wrist allows safe contact exploration during data collection, while tactile memory reuses past demonstrations via retrieval for flexible adaptation to unseen scenarios. The core of this system is the Masked Tactile Trajectory Transformer (MAT$^\text{3}$), which jointly models spatiotemporal interactions between robot actions, distributed tactile feedback, force-torque measurements, and proprioceptive signals. Through masked-token prediction, MAT$^\text{3}$ learns rich spatiotemporal representations by inferring missing sensory information from context, autonomously extracting task-relevant features without explicit subtask segmentation. We validate our approach on peg-in-hole tasks with diverse pegs and conditions in real-robot experiments. Our extensive evaluation demonstrates that MAT$^\text{3}$ achieves higher success rates than the baselines over all conditions and shows remarkable capability to adapt to unseen pegs and conditions.
comment: This work has been submitted to the IEEE for possible publication
iFAN Ecosystem: A Unified AI, Digital Twin, Cyber-Physical Security, and Robotics Environment for Advanced Nuclear Simulation and Operations
As nuclear facilities experience digital transformation and advanced reactor development, AI integration, cyber-physical security, and other emerging technologies such as autonomous robot operations are increasingly developed. However, evaluation and deployment is challenged by the lack of dedicated virtual testbeds. The Immersive Framework for Advanced Nuclear (iFAN) ecosystem is developed, a comprehensive digital twin framework with a realistic 3D environment with physics-based simulations. The iFAN ecosystem serves as a high-fidelity virtual testbed for plant operation, cybersecurity, physical security, and robotic operation, as it provides real-time data exchange for pre-deployment verification. Core features include virtual reality, reinforcement learning, radiation simulation, and cyber-physical security. In addition, the paper investigates various applications through potential operational scenarios. The iFAN ecosystem provides a versatile and secure architecture for validating the next generation of autonomous and cyber-resilient nuclear operations.
Robust Out-of-Order Retrieval for Grid-Based Storage at Maximum Capacity AAAI 2026
This paper proposes a framework for improving the operational efficiency of automated storage systems under uncertainty. It considers a 2D grid-based storage for uniform-sized loads (e.g., containers, pallets, or totes), which are moved by a robot (or other manipulator) along a collision-free path in the grid. The loads are labeled (i.e., unique) and must be stored in a given sequence, and later be retrieved in a different sequence -- an operational pattern that arises in logistics applications, such as last-mile distribution centers and shipyards. The objective is to minimize the load relocations to ensure efficient retrieval. A previous result guarantees a zero-relocation solution for known storage and retrieval sequences, even for storage at full capacity, provided that the side of the grid through which loads are stored/retrieved is at least 3 cells wide. However, in practice, the retrieval sequence can change after the storage phase. To address such uncertainty, this work investigates \emph{$k$-bounded perturbations} during retrieval, under which any two loads may depart out of order if they are originally at most $k$ positions apart. We prove that a $Θ(k)$ grid width is necessary and sufficient for eliminating relocations at maximum capacity. We also provide an efficient solver for computing a storage arrangement that is robust to such perturbations. To address the higher-uncertainty case where perturbations exceed $k$, a strategy is introduced to effectively minimize relocations. Extensive experiments show that, for $k$ up to half the grid width, the proposed storage-retrieval framework essentially eliminates relocations. For $k$ values up to the full grid width, relocations are reduced by $50\%+$.
comment: AAAI 2026
Agree to Disagree: Consensus-Free Flocking under Constraints SC
Robots sometimes have to work together with a mixture of partially-aligned or conflicting goals. Flocking - coordinated motion through cohesion, alignment, and separation - traditionally assumes uniform desired inter-agent distances. Many practical applications demand greater flexibility, as the diversity of types and configurations grows with the popularity of multi-agent systems in society. Moreover, agents often operate without guarantees of trust or secure communication. Motivated by these challenges we update well-established frameworks by relaxing this assumption of shared inter-agent distances and constraints. Through a new form of constrained collective potential function, we introduce a solution that permits negotiation of these parameters. In the spirit of the traditional flocking control canon, this negotiation is achieved purely through local observations and does not require any global information or inter-agent communication. The approach is robust to semi-trust scenarios, where neighbouring agents pursue conflicting goals. We validate the effectiveness of the approach through a series of simulations.
comment: 7 pages. This work has been accepted for publication in the Proceedings of IEEE SYSCON 2026
SimTO: A simulation-based topology optimization framework for bespoke soft robotic grippers
Soft robotic grippers are essential for grasping delicate, geometrically complex objects in manufacturing, healthcare and agriculture. However, existing grippers struggle to grasp feature-rich objects with high topological variability, including gears with sharp tooth profiles on automotive assembly lines, corals with fragile protrusions, or vegetables with irregular branching structures like broccoli. Unlike simple geometric primitives such as cubes or spheres, feature-rich objects lack a clear "optimal" contact surface, making them both difficult to grasp and susceptible to damage when grasped by existing gripper designs. Safe handling of such objects therefore requires specialized soft grippers whose morphology is tailored to the object's features. Topology optimization offers a promising approach for producing specialized grippers, but its utility is limited by the requirement for pre-defined load cases. For soft grippers interacting with feature-rich objects, these loads arise from hundreds of unpredictable gripper-object contact forces during grasping and are unknown a priori. To address this problem, we introduce SimTO, a framework that enables high-resolution topology optimization by automatically extracting load cases from a contact-based physics simulator, eliminating the need for manual load specification. Given an arbitrary feature-rich object, SimTO produces highly customized soft grippers with fine-grained morphological features tailored to the object geometry. Numerical results show our designs are not only highly specialized to feature-rich objects, but also generalize to unseen objects.
comment: 12 pages, 8 figures. Submitted to Structural and Multidisciplinary Optimization
Neuromorphic BrailleNet: Accurate and Generalizable Braille Reading Beyond Single Characters through Event-Based Optical Tactile Sensing
Conventional robotic Braille readers typically rely on discrete, character-by-character scanning, limiting reading speed and disrupting natural flow. Vision-based alternatives often require substantial computation, introduce latency, and degrade in real-world conditions. In this work, we present a high accuracy, real-time pipeline for continuous Braille recognition using Evetac, an open-source neuromorphic event-based tactile sensor. Unlike frame-based vision systems, the neuromorphic tactile modality directly encodes dynamic contact events during continuous sliding, closely emulating human finger-scanning strategies. Our approach combines spatiotemporal segmentation with a lightweight ResNet-based classifier to process sparse event streams, enabling robust character recognition across varying indentation depths and scanning speeds. The proposed system achieves near-perfect accuracy (>=98%) at standard depths, generalizes across multiple Braille board layouts, and maintains strong performance under fast scanning. On a physical Braille board containing daily-living vocabulary, the system attains over 90% word-level accuracy, demonstrating robustness to temporal compression effects that challenge conventional methods. These results position neuromorphic tactile sensing as a scalable, low latency solution for robotic Braille reading, with broader implications for tactile perception in assistive and robotic applications.
Real-Time Robot Execution with Masked Action Chunking ICLR 2026
Real-time execution is essential for cyber-physical systems such as robots. These systems operate in dynamic real-world environments where even small delays can undermine responsiveness and compromise performance. Asynchronous inference has recently emerged as a system-level paradigm for real-time robot manipulation, enabling the next action chunk to be predicted while the current one is being executed. While this approach achieves real-time responsiveness, naive integration often results in execution failure. Previous methods attributed this failure to inter-chunk discontinuity and developed test-time algorithms to smooth chunk boundaries. In contrast, we identify another critical yet overlooked factor: intra-chunk inconsistency, where the robot's executed action chunk partially misaligns with its current perception. To address this, we propose REMAC, which learns corrective adjustments on the pretrained policy through masked action chunking, enabling the policy to remain resilient under mismatches between intended actions and actual execution during asynchronous inference. In addition, we introduce a prefix-preserved sampling procedure to reinforce inter-chunk continuity. Overall, our method delivers more reliable policies without incurring additional latency. Extensive experiments in both simulation and real-world settings demonstrate that our method enables faster task execution, maintains robustness across varying delays, and consistently achieves higher completion rates.
comment: ICLR 2026. Project page at https://remac-async.github.io/
Game-Theoretic Autonomous Driving: A Graphs of Convex Sets Approach
Multi-vehicle autonomous driving couples strategic interaction with hybrid (discrete-continuous) maneuver planning under shared safety constraints. We introduce IBR-GCS, an Iterative Best Response (IBR) planning approach based on the Graphs of Convex Sets (GCS) framework that models highway driving as a generalized noncooperative game. IBR-GCS integrates combinatorial maneuver reasoning, trajectory planning, and game-theoretic interaction within a unified framework. The key novelty is a vehicle-specific, strategy-dependent GCS construction. Specifically, at each best-response update, each vehicle builds its own graph conditioned on the current strategies of the other vehicles, with vertices representing lane-specific, time-varying, convex, collision-free regions and edges encoding dynamically feasible transitions. This yields a shortest-path problem in GCS for each best-response step, which admits an efficient convex relaxation that can be solved using convex optimization tools without exhaustive discrete tree search. We then apply an iterative best-response scheme in which vehicles update their trajectories sequentially and provide conditions under which the resulting inexact updates converge to an approximate generalized Nash equilibrium. Simulation results across multi-lane, multi-vehicle scenarios demonstrate that IBR-GCS produces safe trajectories and strategically consistent interactive behaviors.
comment: 16 pages for content, 2 pages for references, 2 pages for notation
Just in time Informed Trees: Manipulability-Aware Asymptotically Optimized Motion Planning
In high-dimensional robotic path planning, traditional sampling-based methods often struggle to efficiently identify both feasible and optimal paths in complex, multi-obstacle environments. This challenge is intensified in robotic manipulators, where the risk of kinematic singularities and self-collisions further complicates motion efficiency and safety. To address these issues, we introduce the Just-in-Time Informed Trees (JIT*) algorithm, an enhancement over Effort Informed Trees (EIT*), designed to improve path planning through two core modules: the Just-in-Time module and the Motion Performance module. The Just-in-Time module includes "Just-in-Time Edge," which dynamically refines edge connectivity, and "Just-in-Time Sample," which adjusts sampling density in bottleneck areas to enable faster initial path discovery. The Motion Performance module balances manipulability and trajectory cost through dynamic switching, optimizing motion control while reducing the risk of singularities. Comparative analysis shows that JIT* consistently outperforms traditional sampling-based planners across $\mathbb{R}^4$ to $\mathbb{R}^{16}$ dimensions. Its effectiveness is further demonstrated in single-arm and dual-arm manipulation tasks, with experimental results available in a video at https://youtu.be/nL1BMHpMR7c.
E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning
Human-in-the-loop guidance has emerged as an effective approach for enabling faster convergence in online reinforcement learning (RL) of complex real-world manipulation tasks. However, existing human-in-the-loop RL (HiL-RL) frameworks often suffer from low sample efficiency, requiring substantial human interventions to achieve convergence and thereby leading to high labor costs. To address this, we propose a sample-efficient real-world human-in-the-loop RL framework named \method, which requires fewer human intervention by actively selecting informative samples. Specifically, stable reduction of policy entropy enables improved trade-off between exploration and exploitation with higher sample efficiency. We first build influence functions of different samples on the policy entropy, which is efficiently estimated by the covariance of action probabilities and soft advantages of policies. Then we select samples with moderate values of influence functions, where shortcut samples that induce sharp entropy drops and noisy samples with negligible effect are pruned. Extensive experiments on four real-world manipulation tasks demonstrate that \method achieves a 42.1\% higher success rate while requiring 10.1\% fewer human interventions compared to the state-of-the-art HiL-RL method, validating its effectiveness. The project page providing code, videos, and mathematical formulations can be found at https://e2hil.github.io/.
comment: Project page: https://e2hil.github.io/
LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in current training paradigms where goal-driven data collection creates a dataset bias. In such datasets, language instructions are highly predictable from visual observations alone, causing the conditional mutual information between instructions and actions to vanish, a phenomenon we term Information Collapse. Consequently, models degenerate into vision-only policies that ignore language constraints and fail in out-of-distribution (OOD) settings. To address this, we propose LangForce, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we construct a dual-branch architecture to estimate both a vision-only prior $p(a \mid v)$ and a language-conditioned posterior $π(a \mid v, \ell)$. We then optimize the policy to maximize the conditional Pointwise Mutual Information (PMI) between actions and instructions. This objective effectively penalizes the vision shortcut and rewards actions that explicitly explain the language command. Without requiring new data, LangForce significantly improves generalization. Extensive experiments across on SimplerEnv and RoboCasa demonstrate substantial gains, including an 11.3% improvement on the challenging OOD SimplerEnv benchmark, validating the ability of our approach to robustly ground language in action.
Problem Space Transformations for Out-of-Distribution Generalisation in Behavioural Cloning
The combination of behavioural cloning and neural networks has driven significant progress in robotic manipulation. As these algorithms may require a large number of demonstrations for each task of interest, they remain fundamentally inefficient in complex scenarios, in which finite datasets can hardly cover the state space. One of the remaining challenges is thus out-of-distribution (OOD) generalisation, i.e. the ability to predict correct actions for states with a low likelihood with respect to the state occupancy induced by the dataset. This issue is aggravated when the system to control is treated as a black-box, ignoring its physical properties. This work highlights widespread properties of robotic manipulation, specifically pose equivariance and locality. We investigate the effect of the choice of problem space on OOD performance of BC policies and how transformations arising from characteristic properties of manipulation can be employed for its improvement. Through controlled, simulated and real-world experiments, we empirically demonstrate that these transformations allow behaviour cloning policies, using either standard MLP-based one-step action prediction or diffusion-based action-sequence prediction, to generalise better to certain OOD problem instances. Code is available at https://github.com/kirandoshi/pst_ood_gen.
Studying the Effect of Explicit Interaction Representations on Learning Scene-level Distributions of Human Trajectories
Effectively capturing the joint distribution of all agents in a scene is relevant for predicting the true evolution of the scene and in turn providing more accurate information to the decision processes of autonomous vehicles. While new models have been developed for this purpose in recent years, it remains unclear how to best represent the joint distributions particularly from the perspective of the interactions between agents. Thus far there is no clear consensus on how best to represent interactions between agents; whether they should be learned implicitly from data by neural networks, or explicitly modeled using the spatial and temporal relations that are more grounded in human decision-making. This paper aims to study various means of describing interactions within the same network structure and their effect on the final learned joint distributions. Our findings show that more often than not, simply allowing a network to establish interactive connections between agents based on data has a detrimental effect on performance. Instead, having well defined interactions (such as which agent of an agent pair passes first at an intersection) can often bring about a clear boost in performance.
Constraint-Aware Discrete-Time PID Gain Optimization for Robotic Joint Control Under Actuator Saturation
The precise regulation of rotary actuation is fundamental in autonomous robotics, yet practical PID loops deviate from continuous-time theory due to discrete-time execution, actuator saturation, and small delays and measurement imperfections. We present an implementation-aware analysis and tuning workflow for saturated discrete-time joint control. We (i) derive PI stability regions under Euler and exact zero-order-hold (ZOH) discretizations using the Jury criterion, (ii) evaluate a discrete back-calculation anti-windup realization under saturation-dominant regimes, and (iii) propose a hybrid-certified Bayesian optimization workflow that screens analytically unstable candidates and behaviorally unsafe transients while optimizing a robust IAE objective with soft penalties on overshoot and saturation duty. Baseline sweeps ($τ=1.0$~s, $Δt=0.01$~s, $u\in[-10,10]$) quantify rise/settle trends for P/PI/PID. Under a randomized model family emulating uncertainty, delay, noise, quantization, and tighter saturation, robustness-oriented tuning improves median IAE from $0.843$ to $0.430$ while keeping median overshoot below $2\%$. In simulation-only tuning, the certification screen rejects $11.6\%$ of randomly sampled gains within bounds before full robust evaluation, improving sample efficiency without hardware experiments.
comment: Pending IEEE Transactions on Robotics Publication
A Riemannian Take on Distance Fields and Geodesic Flows in Robotics
Distance functions are crucial in robotics for representing spatial relationships between a robot and its environment. They provide an implicit, continuous, and differentiable representation that integrates seamlessly with control, optimization, and learning. While standard distance fields rely on the Euclidean metric, many robotic tasks inherently involve non-Euclidean structures. To this end, we generalize Euclidean distance fields to more general metric spaces by solving the Riemannian eikonal equation, a first-order partial differential equation whose solution defines a distance field and its associated gradient flow on the manifold, enabling the computation of geodesics and globally length-minimizing paths. We demonstrate that geodesic distance fields, the classical Riemannian distance function represented as a global, continuous, and queryable field, are effective for a broad class of robotic problems where Riemannian geometry naturally arises. To realize this, we present a neural Riemannian eikonal solver (NES) that solves the equation as a mesh-free implicit representation without grid discretization, scaling to high-dimensional robot manipulators. Training leverages a physics-informed neural network (PINN) objective that constrains spatial derivatives via the PDE residual and boundary and metric conditions, so the model is supervised by the governing equation and requires no labeled distances or geodesics. We propose two NES variants, conditioned on boundary data and on spatially varying Riemannian metrics, underscoring the flexibility of the neural parameterization. We validate the effectiveness of our approach through extensive examples, yielding minimal-length geodesics across diverse robot tasks involving Riemannian geometry.
comment: 27 pages, 20 figures. International Journal of Robotics Research (IJRR)
CBMC-V3: A CNS-inspired Control Framework Towards Agile Manipulation with SNN
As robotic arm applications expand beyond traditional industrial settings into service-oriented domains such as catering, household and retail, existing control algorithms struggle to achieve the level of agile manipulation required in unstructured environments characterized by dynamic trajectories, unpredictable interactions, and diverse objects. This paper presents a biomimetic control framework based on Spiking Neural Network (SNN), inspired by the human Central Nervous System (CNS), to address these challenges. The proposed framework comprises five control modules-cerebral cortex, cerebellum, thalamus, brainstem, and spinal cord-organized into three hierarchical control levels (first-order, second-order, and third-order) and two information pathways (ascending and descending). All modules are fully implemented using SNN. The framework is validated through both simulation and experiments on a commercial robotic arm platform across a range of control tasks. The results demonstrate that the proposed method outperforms the baseline in terms of agile motion control capability, offering a practical and effective solution for achieving agile manipulation.
Automating Box Folding: Sequence Extraction and Ranking Methodologies
Box folding represents a crucial challenge for automated packaging systems. This work bridges the gap between existing methods for folding sequence extraction and approaches focused on the adaptability of automated systems to specific box types. An innovative method is proposed to identify and rank folding sequences, enabling the transformation of a box from an initial state to a desired final configuration. The system evaluates and ranks these sequences based on their feasibility and compatibility with available hardware, providing recommendations for real-world implementations. Finally, an illustrative use case is presented, where a robot performs the folding of a box.
comment: Accepted at IFAC ROBOTICS 2025
xFLIE: Leveraging Actionable Hierarchical Scene Representations for Autonomous Semantic-Aware Inspection Missions
We present a novel architecture aimed towards incremental construction and exploitation of a hierarchical 3D scene graph representation during semantic-aware inspection missions. Inspection planning, particularly of distributed targets in previously unseen environments, presents an opportunity to exploit the semantic structure of the scene during reasoning, navigation and scene understanding. Motivated by this, we propose the 3D Layered Semantic Graph (3DLSG), a hierarchical inspection scene graph constructed in an incremental manner and organized into abstraction layers that support planning demands in real-time. To address the task of semantic-aware inspection, a mission framework, termed as Enhanced First-Look Inspect Explore (xFLIE), that tightly couples the 3DLSG with an inspection planner is proposed. We assess the performance through simulations and experimental trials, evaluating target-selection, path-planning and semantic navigation tasks over the 3DLSG model. The scenarios presented are diverse, ranging from city-scale distributed to solitary infrastructure targets in simulated worlds and subsequent outdoor and subterranean environment deployments onboard a quadrupedal robot. The proposed method successfully demonstrates incremental construction and planning over the 3DLSG representation to meet the objectives of the missions. Furthermore, the framework demonstrates successful semantic navigation tasks over the structured interface at the end of the inspection missions. Finally, we report multiple orders of magnitude reduction in path-planning time compared to conventional volumetric-map-based methods over various environment scale, demonstrating the planning efficiency and scalability of the proposed approach.
comment: 20 pages
MetaVLA: Unified Meta Co-training For Efficient Embodied Adaption
Vision-Language-Action (VLA) models show promise in embodied reasoning, yet remain far from true generalists-they often require task-specific fine-tuning, incur high compute costs, and generalize poorly to unseen tasks. We propose MetaVLA, a unified, backbone-agnostic post-training framework for efficient and scalable alignment. MetaVLA introduces Context-Aware Meta Co-Training, which consolidates diverse target tasks into a single fine-tuning stage while leveraging structurally diverse auxiliary tasks to improve in-domain generalization. Unlike naive multi-task SFT, MetaVLA integrates a lightweight meta-learning mechanism-derived from Attentive Neural Processes-to enable rapid adaptation from diverse contexts with minimal architectural change or inference overhead. On the LIBERO benchmark, MetaVLA with six auxiliary tasks outperforms OpenVLA by up to 8.0% on long-horizon tasks, reduces training steps from 240K to 75K, and cuts GPU time by ~76%. These results show that scalable, low-resource post-training is achievable-paving the way toward general-purpose embodied agents. Code will be available.
Fast and Safe Trajectory Optimization for Mobile Manipulators With Neural Configuration Space Distance Field
Mobile manipulators promise agile, long-horizon behavior by coordinating base and arm motion, yet whole-body trajectory optimization in cluttered, confined spaces remains difficult due to high-dimensional nonconvexity and the need for fast, accurate collision reasoning. Configuration Space Distance Fields (CDF) enable fixed-base manipulators to model collisions directly in configuration space via smooth, implicit distances. This representation holds strong potential to bypass the nonlinear configuration-to-workspace mapping while preserving accurate whole-body geometry and providing optimization-friendly collision costs. Yet, extending this capability to mobile manipulators is hindered by unbounded workspaces and tighter base-arm coupling. We lift this promise to mobile manipulation with Generalized Configuration Space Distance Fields (GCDF), extending CDF to robots with both translational and rotational joints in unbounded workspaces with tighter base-arm coupling. We prove that GCDF preserves Euclidean-like local distance structure and accurately encodes whole-body geometry in configuration space, and develop a data generation and training pipeline that yields continuous neural GCDFs with accurate values and gradients, supporting efficient GPU-batched queries. Building on this representation, we develop a high-performance sequential convex optimization framework centered on GCDF-based collision reasoning. The solver scales to large numbers of implicit constraints through (i) online specification of neural constraints, (ii) sparsity-aware active-set detection with parallel batched evaluation across thousands of constraints, and (iii) incremental constraint management for rapid replanning under scene changes.
Memory-Maze: Scenario Driven Visual Language Navigation Benchmark for Guiding Blind People
Visual Language Navigation (VLN) powered robots have the potential to guide blind people by understanding route instructions provided by sighted passersby. This capability allows robots to operate in environments often unknown a prior. Existing VLN models are insufficient for the scenario of navigation guidance for blind people, as they need to understand routes described from human memory, which frequently contains stutters, errors, and omissions of details, as opposed to those obtained by thinking out loud, such as in the R2R dataset. However, existing benchmarks do not contain instructions obtained from human memory in natural environments. To this end, we present our benchmark, Memory-Maze, which simulates the scenario of seeking route instructions for guiding blind people. Our benchmark contains a maze-like structured virtual environment and novel route instruction data from human memory. Our analysis demonstrates that instruction data collected from memory was longer and contained more varied wording. We further demonstrate that addressing errors and ambiguities from memory-based instructions is challenging, by evaluating state-of-the-art models alongside our baseline model with modularized perception and controls.
Indoor Positioning Based on Active Radar Sensing and Passive Reflectors: Reflector Placement Optimization
We extend our work on a novel indoor positioning system (IPS) for autonomous mobile robots (AMRs) based on radar sensing of local, passive radar reflectors. Through the combination of simple reflectors and a single-channel frequency modulated continuous wave (FMCW) radar, high positioning accuracy at low system cost can be achieved. Further, a multi-objective (MO) particle swarm optimization (PSO) algorithm is presented that optimizes the 2D placement of radar reflectors in complex room settings.
Equitable Routing--Rethinking the Multiple Traveling Salesman Problem
The Multiple Traveling Salesman Problem (MTSP) extends the traveling salesman problem by assigning multiple salesmen to visit a set of targets from a common depot, with each target visited exactly once while minimizing total tour length. A common variant, the min-max MTSP, focuses on workload balance by minimizing the longest tour, but it is difficult to solve optimally due to weak linear relaxation bounds. This paper introduces two new parametric fairness-driven variants of the MTSP: the $\varepsilon$-Fair-MTSP and the $Δ$-Fair-MTSP, which promote equitable distribution of tour lengths while controlling overall cost. The $\varepsilon$-Fair-MTSP is formulated as a mixed-integer second-order cone program, while the $Δ$-Fair-MTSP is modeled as a mixed-integer linear program. We develop algorithms that guarantee global optimality for both formulations. Computational experiments on benchmark instances and real-world applications, including electric vehicle fleet routing, demonstrate their effectiveness. Furthermore, we show that the algorithms presented for the fairness-constrained MTSP variants can be used to obtain the pareto-front of a bi-objective optimization problem where one objective focuses on minimizing the total tour length and the other focuses on balancing the tour lengths of the individual tours. Overall, these fairness-constrained MTSP variants provide a practical and flexible alternative to the min-max MTSP.
comment: 26 pages
Multiagent Systems
Reimagining Peer Review Process Through Multi-Agent Mechanism Design ICSE
The software engineering research community faces a systemic crisis: peer review is failing under growing submissions, misaligned incentives, and reviewer fatigue. Community surveys reveal that researchers perceive the process as "broken." This position paper argues that these dysfunctions are mechanism design failures amenable to computational solutions. We propose modeling the research community as a stochastic multi-agent system and applying multi-agent reinforcement learning to design incentive-compatible protocols. We outline three interventions: a credit-based submission economy, MARL-optimized reviewer assignment, and hybrid verification of review consistency. We present threat models, equity considerations, and phased pilot metrics. This vision charts a research agenda toward sustainable peer review.
comment: To appear in the Proceedings of the 2026 IEEE/ACM 48th International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE). 4 pages, 1 figure, 1 table
TS-Debate: Multimodal Collaborative Debate for Zero-Shot Time Series Reasoning
Recent progress at the intersection of large language models (LLMs) and time series (TS) analysis has revealed both promise and fragility. While LLMs can reason over temporal structure given carefully engineered context, they often struggle with numeric fidelity, modality interference, and principled cross-modal integration. We present TS-Debate, a modality-specialized, collaborative multi-agent debate framework for zero-shot time series reasoning. TS-Debate assigns dedicated expert agents to textual context, visual patterns, and numerical signals, preceded by explicit domain knowledge elicitation, and coordinates their interaction via a structured debate protocol. Reviewer agents evaluate agent claims using a verification-conflict-calibration mechanism, supported by lightweight code execution and numerical lookup for programmatic verification. This architecture preserves modality fidelity, exposes conflicting evidence, and mitigates numeric hallucinations without task-specific fine-tuning. Across 20 tasks spanning three public benchmarks, TS-Debate achieves consistent and significant performance improvements over strong baselines, including standard multimodal debate in which all agents observe all inputs.
comment: Code will be available at https://github.com/DeepAuto-AI/TS-Debate
LLMs as Orchestrators: Constraint-Compliant Multi-Agent Optimization for Recommendation Systems
Recommendation systems must optimize multiple objectives while satisfying hard business constraints such as fairness and coverage. For example, an e-commerce platform may require every recommendation list to include items from multiple sellers and at least one newly listed product; violating such constraints--even once--is unacceptable in production. Prior work on multi-objective recommendation and recent LLM-based recommender agents largely treat constraints as soft penalties or focus on item scoring and interaction, leading to frequent violations in real-world deployments. How to leverage LLMs for coordinating constrained optimization in recommendation systems remains underexplored. We propose DualAgent-Rec, an LLM-coordinated dual-agent framework for constrained multi-objective e-commerce recommendation. The framework separates optimization into an Exploitation Agent that prioritizes accuracy under hard constraints and an Exploration Agent that promotes diversity through unconstrained Pareto search. An LLM-based coordinator adaptively allocates resources between agents based on optimization progress and constraint satisfaction, while an adaptive epsilon-relaxation mechanism guarantees feasibility of final solutions. Experiments on the Amazon Reviews 2023 dataset demonstrate that DualAgent-Rec achieves 100% constraint satisfaction and improves Pareto hypervolume by 4-6% over strong baselines, while maintaining competitive accuracy-diversity trade-offs. These results indicate that LLMs can act as effective orchestration agents for deployable and constraint-compliant recommendation systems.
comment: 8 pages, 5 figures
More at Stake: How Payoff and Language Shape LLM Agent Strategies in Cooperation Dilemmas
As LLMs increasingly act as autonomous agents in interactive and multi-agent settings, understanding their strategic behavior is critical for safety, coordination, and AI-driven social and economic systems. We investigate how payoff magnitude and linguistic context shape LLM strategies in repeated social dilemmas, using a payoff-scaled Prisoner's Dilemma to isolate sensitivity to incentive strength. Across models and languages, we observe consistent behavioral patterns, including incentive-sensitive conditional strategies and cross-linguistic divergence. To interpret these dynamics, we train supervised classifiers on canonical repeated-game strategies and apply them to LLM decisions, revealing systematic, model- and language-dependent behavioral intentions, with linguistic framing sometimes matching or exceeding architectural effects. Our results provide a unified framework for auditing LLMs as strategic agents and highlight cooperation biases with direct implications for AI governance and multi-agent system design.
comment: 14 pages, 10 figures, 4 tables
Game-Theoretic Autonomous Driving: A Graphs of Convex Sets Approach
Multi-vehicle autonomous driving couples strategic interaction with hybrid (discrete-continuous) maneuver planning under shared safety constraints. We introduce IBR-GCS, an Iterative Best Response (IBR) planning approach based on the Graphs of Convex Sets (GCS) framework that models highway driving as a generalized noncooperative game. IBR-GCS integrates combinatorial maneuver reasoning, trajectory planning, and game-theoretic interaction within a unified framework. The key novelty is a vehicle-specific, strategy-dependent GCS construction. Specifically, at each best-response update, each vehicle builds its own graph conditioned on the current strategies of the other vehicles, with vertices representing lane-specific, time-varying, convex, collision-free regions and edges encoding dynamically feasible transitions. This yields a shortest-path problem in GCS for each best-response step, which admits an efficient convex relaxation that can be solved using convex optimization tools without exhaustive discrete tree search. We then apply an iterative best-response scheme in which vehicles update their trajectories sequentially and provide conditions under which the resulting inexact updates converge to an approximate generalized Nash equilibrium. Simulation results across multi-lane, multi-vehicle scenarios demonstrate that IBR-GCS produces safe trajectories and strategically consistent interactive behaviors.
comment: 16 pages for content, 2 pages for references, 2 pages for notation
The Role of Social Learning and Collective Norm Formation in Fostering Cooperation in LLM Multi-Agent Systems AAMAS 2026
A growing body of multi-agent studies with LLMs explores how norms and cooperation emerge in mixed-motive scenarios, where pursuing individual gain can undermine the collective good. While prior work has explored these dynamics in both richly contextualized simulations and simplified game-theoretic environments, most LLM systems featuring common-pool resource (CPR) games provide agents with explicit reward functions directly tied to their actions. In contrast, human cooperation often emerges without explicit knowledge of the payoff structure or how individual actions translate into long-run outcomes, relying instead on heuristics, communication, and enforcement. We introduce a CPR simulation framework that removes explicit reward signals and embeds cultural-evolutionary mechanisms: social learning (adopting strategies and beliefs from successful peers) and norm-based punishment, grounded in Ostrom's principles of resource governance. Agents also individually learn from the consequences of harvesting, monitoring, and punishing via environmental feedback, enabling norms to emerge endogenously. We establish the validity of our simulation by reproducing key findings from existing studies on human behavior. Building on this, we examine norm evolution across a $2\times2$ grid of environmental and social initialisations (resource-rich vs. resource-scarce; altruistic vs. selfish) and benchmark how agentic societies comprised of different LLMs perform under these conditions. Our results reveal systematic model differences in sustaining cooperation and norm formation, positioning the framework as a rigorous testbed for studying emergent norms in mixed-motive LLM societies. Such analysis can inform the design of AI systems deployed in social and organizational contexts, where alignment with cooperative norms is critical for stability, fairness, and effective governance of AI-mediated environments.
comment: Accepted at the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
Systems and Control (EESS)
A Latent Space Framework for Modeling Transient Engine Emissions Using Joint Embedding Predictive Architectures
Accurately modeling and controlling vehicle exhaust emissions during transient events, such as rapid acceleration, is critical for meeting environmental regulations and optimizing powertrains. Conventional data-driven methods, such as Multilayer Perceptrons (MLPs) and Long Short-Term Memory (LSTM) networks, improve upon phenomenological models but often struggle with the complex nonlinear dynamics of emission formation. These monolithic architectures are sensitive to dataset variability and typically require deep, computationally expensive structures to perform well, limiting their practical utility. This paper introduces a novel approach that overcomes these limitations by modeling emission dynamics within a structured latent space. Leveraging a Joint Embedding Predictive Architecture (JEPA), the proposed framework learns from a rich dataset that combines real-world Portable Emission Measurement System (PEMS) data with high-frequency hardware-in-the-loop measurements. The model abstracts away irrelevant noise, encoding only the key factors governing emission behavior into a compact, robust representation. This results in superior data efficiency and predictive accuracy across diverse transient regimes, significantly outperforming high-performing LSTM baselines in generalization. To ensure suitability for real-world deployment, the JEPA framework is structured to support pruning and post-training quantization. This strategy drastically reduces the computational footprint, minimizing inference time and memory demand with negligible accuracy loss. The result is a highly efficient model ideal for on-board implementation of advanced strategies, such as model predictive control or model-based reinforcement learning, in conventional and hybrid powertrains. These findings offer a clear pathway toward more robust emission control systems for next-generation vehicles.
comment: 10 pages, Submitted to the 2026 SAE WCX
A refined nonlinear least-squares method for the rational approximation problem
The adaptive Antoulas-Anderson (AAA) algorithm for rational approximation is a widely used method for the efficient construction of highly accurate rational approximations to given data. While AAA can often produce rational approximations accurate to any prescribed tolerance, these approximations may have degrees larger than what is actually required to meet the given tolerance. In this work, we consider the adaptive construction of interpolating rational approximations while aiming for the smallest feasible degree to satisfy a given error tolerance. To this end, we introduce refinement approaches to the linear least-squares step of the classical AAA algorithm that aim to minimize the true nonlinear least-squares error with respect to the given data. Furthermore, we theoretically analyze the derived approaches in terms of the corresponding gradients from the resulting minimization problems and use these insights to propose a new greedy framework that ensures monotonic error convergence. Numerical examples from function approximation and model order reduction verify the effectiveness of the proposed algorithm to construct accurate rational approximations of small degrees.
comment: 26 pages, 5 figures
Component-Aware Pruning Framework for Neural Network Controllers via Gradient-Based Importance Estimation
The transition from monolithic to multi-component neural architectures in advanced neural network controllers poses substantial challenges due to the high computational complexity of the latter. Conventional model compression techniques for complexity reduction, such as structured pruning based on norm-based metrics to estimate the relative importance of distinct parameter groups, often fail to capture functional significance. This paper introduces a component-aware pruning framework that utilizes gradient information to compute three distinct importance metrics during training: Gradient Accumulation, Fisher Information, and Bayesian Uncertainty. Experimental results with an autoencoder and a TD-MPC agent demonstrate that the proposed framework reveals critical structural dependencies and dynamic shifts in importance that static heuristics often miss, supporting more informed compression decisions.
comment: 8 pages, Submitted to the 2026 IFAC World Congress
SCOPE: Smooth Convex Optimization for Planned Evolution of Deformable Linear Objects
We present SCOPE, a fast and efficient framework for modeling and manipulating deformable linear objects (DLOs). Unlike conventional energy-based approaches, SCOPE leverages convex approximations to significantly reduce computational cost while maintaining smooth and physically plausible deformations. This trade-off between speed and accuracy makes the method particularly suitable for applications requiring real-time or near-real-time response. The effectiveness of the proposed framework is demonstrated through comprehensive simulation experiments, highlighting its ability to generate smooth shape trajectories under geometric and length constraints.
comment: Proceedings of Machine Learning Research tbd:1_13, 2025 International Conference on Computational Optimization
Frequency Shaping Control for Oscillation Damping in Weakly-Connected Power Network: A Root Locus Method
Frequency control following a contingency event is of vital concern in power system operations. Leveraging inverter-based resources, it is not hard to shape the center of inertia (COI) frequency nicely. However, under weak grid conditions, it becomes insufficient to solely shape the COI frequency since this aggregate signal fails to reveal the inter-area oscillations. In this manuscript, we advocate for foolproof fine-tuning rules for \emph{frequency shaping control} (FS) based on a systematic analysis of damping ratio and decay rate of inter-area oscillations to simultaneously meet specified metrics for frequency security and oscillatory stability. To this end, building on a modal decomposition, we simplify the oscillation damping problem into a pole-placement task for a set of scalar subsystems, which can be efficiently solved by only investigating the root locus of a scalar subsystem associated with the main mode, while FS inherently guarantees a Nadir-less COI frequency response. Through our proposed root-locus-based oscillatory stability analysis, we derive closed-form expressions for the minimum damping ratio and decay rate among inter-area oscillations in terms of networked system and control parameters under FS. Moreover, we propose useful tuning guidelines for FS which need only simple calculations or visualized tuning to not only shape the COI frequency into a first-order response that converges to a steady-state value within the allowed range but also ensure a satisfactory damping ratio and decay rate of inter-area oscillations following disturbances. As for the common virtual inertia control (VI), although similar oscillatory stability analysis becomes intractable, one can still glean some insights via the root locus method. Numerical simulations validate the proposed tuning for FS as well as the superiority of FS over VI in exponential convergence rate.
Data-Driven Predictive Control for Wide-Area Power Oscillation Damping
We study damping of inter-area oscillations in transmission grids using voltage-source-converter-based high-voltage direct-current (VSC-HVDC) links. Conventional power oscillation damping controllers rely on system models that are difficult to obtain in practice. Data-driven Predictive Control (DPC) addresses this limitation by replacing explicit models with data. We apply AutoRegressive with eXogenous inputs (ARX)-based predictive control and its Transient Predictive Control (TPC) variant, and compare them with Data-enabled Predictive Control (DeePC) and two standard model-based controllers. The methods are evaluated in simulation on a system exhibiting both inter-area and local oscillation modes. ARX-based predictive control and DeePC both achieve effective damping, while the ARX-based methods require less online computation. Using warm-started, pre-factorized operator-splitting solvers, ARX/TPC control actions are computed in less than 1ms. These results demonstrate that DPC is a viable approach for power-system oscillation damping for the given test case.
comment: 14 pages, 12 figures, submitted to TCST
Improved Initialization for Port-Hamiltonian Neural Network Models
Port-Hamiltonian neural networks have shown promising results in the identification of nonlinear dynamics of complex systems, as their combination of physical principles with data-driven learning allows for accurate modelling. However, due to the non-convex optimization problem inherent in learning the correct network parameters, the training procedure is prone to converging to local minima, potentially leading to poor performance. In order to avoid this issue, this paper proposes an improved initialization for port-Hamiltonian neural networks. The core idea is to first estimate a linear port-Hamiltonian system to be used as an initialization for the network, after which the neural network adapts to the system nonlinearities, reducing the training times and improving convergence. The effectiveness of this method is tested on a chained mass-spring-damper setup for varying noise levels and compared to the original approach.
comment: Preprint submitted to IFAC World Congress 2026
On the rarity of rocket-driven Penrose extraction in Kerr spacetime
We present a Monte Carlo study of energy extraction from rotating (Kerr) black holes via the Penrose process using rocket propulsion. Through over 250,000 trajectory simulations, we establish sharp constraints on when Penrose extraction with escape to infinity succeeds. The mechanism requires that exhaust ejected inside the ergosphere carries negative Killing energy, which is kinematically accessible only via ultra-relativistic ejection deep within the ergosphere. We find that successful extraction with escape is statistically rare ($\sim$1% in broad parameter scans) and is governed by strict thresholds: it requires high black hole spin (empirically $a/M \gtrsim 0.89$) and ultra-relativistic exhaust velocity (onset at $v_e \approx 0.91c$). When conditions are highly tuned to a specific "sweet spot," success rates can reach 88.5%, representing a narrow extraction window rather than generic behavior. Furthermore, single-impulse thrust at periapsis achieves significantly higher cumulative efficiency ($η_{\rm cum} \approx 19\%$) compared to continuous thrust ($\sim$2--4%) due to path-averaging penalties. These constraints quantify the extreme fine-tuning required for material-based Penrose extraction, consistent with the astrophysical dominance of electromagnetic mechanisms. Simulation code is available at https://github.com/anindex/penrose_process.
comment: 17 pages, 6 figures, 8 tables, submitted to Physical Review D
Physical Human-Robot Interaction: A Critical Review of Safety Constraints
This paper aims to provide a clear and rigorous understanding of commonly recognized safety constraints in physical human-robot interaction, i.e. ISO/TS 15066, by examining how they are obtained and which assumptions support them. We clarify the interpretation and practical impact of key simplifying assumptions, show how these modeling choices affect both safety and performance across the system, and indicate specific design parameters that can be adjusted in safety-critical control implementations. Numerical examples are provided to quantify performance degradation induced by common approximations and simplifying design choices. Furthermore, the fundamental role of energy in safety assessment is emphasized, and focused insights are offered on the existing body of work concerning energy-based safety methodologies.
Self-Supervised Path Planning in Unstructured Environments via Global-Guided Differentiable Hard Constraint Projection
Deploying deep learning agents for autonomous navigation in unstructured environments faces critical challenges regarding safety, data scarcity, and limited computational resources. Traditional solvers often suffer from high latency, while emerging learning-based approaches struggle to ensure deterministic feasibility. To bridge the gap from embodied to embedded intelligence, we propose a self-supervised framework incorporating a differentiable hard constraint projection layer for runtime assurance. To mitigate data scarcity, we construct a Global-Guided Artificial Potential Field (G-APF), which provides dense supervision signals without manual labeling. To enforce actuator limitations and geometric constraints efficiently, we employ an adaptive neural projection layer, which iteratively rectifies the coarse network output onto the feasible manifold. Extensive benchmarks on a test set of 20,000 scenarios demonstrate an 88.75\% success rate, substantiating the enhanced operational safety. Closed-loop experiments in CARLA further validate the physical realizability of the planned paths under dynamic constraints. Furthermore, deployment verification on an NVIDIA Jetson Orin NX confirms an inference latency of 94 ms, showing real-time feasibility on resource-constrained embedded hardware. This framework offers a generalized paradigm for embedding physical laws into neural architectures, providing a viable direction for solving constrained optimization in mechatronics. Source code is available at: https://github.com/wzq-13/SSHC.git.
Eco-Driving Control for Electric Vehicles with Multi-Speed Transmission: Optimizing Vehicle Speed and Powertrain Operation in Dynamic Environments
This article presents an eco-driving algorithm for electric vehicles featuring multi-speed transmissions. The proposed controller is formulated as a co-optimization problem, simultaneously optimizing both vehicle longitudinal speed and powertrain operation to maximize energy efficiency. Constraints derived from a connected vehicle based traffic prediction algorithm are used to ensure traffic safety and smooth traffic flow in dynamic environments with multiple signalized intersections and mixed traffic. By simplifying the complex, nonlinear mixed integer problem, the proposed controller achieves computational efficiency, enabling real-time implementation. To evaluate its performance, traffic scenarios from both Simulation of Urban MObility (SUMO) and real-world road tests are employed. The results demonstrate a notable reduction in energy consumption by up to 11.36\% over an \SI{18}{\km} drive.
comment: Accepted by SAE International Journal of Connected and Automated Vehicles
LightSBB-M: Bridging Schrödinger and Bass for Generative Diffusion Modeling
The Schrodinger Bridge and Bass (SBB) formulation, which jointly controls drift and volatility, is an established extension of the classical Schrodinger Bridge (SB). Building on this framework, we introduce LightSBB-M, an algorithm that computes the optimal SBB transport plan in only a few iterations. The method exploits a dual representation of the SBB objective to obtain analytic expressions for the optimal drift and volatility, and it incorporates a tunable parameter beta greater than zero that interpolates between pure drift (the Schrodinger Bridge) and pure volatility (Bass martingale transport). We show that LightSBB-M achieves the lowest 2-Wasserstein distance on synthetic datasets against state-of-the-art SB and diffusion baselines with up to 32 percent improvement. We also illustrate the generative capability of the framework on an unpaired image-to-image translation task (adult to child faces in FFHQ). These findings demonstrate that LightSBB-M provides a scalable, high-fidelity SBB solver that outperforms existing SB and diffusion baselines across both synthetic and real-world generative tasks. The code is available at https://github.com/alexouadi/LightSBB-M.
Output Feedback Stabilization of Linear Systems via Policy Gradient Methods
Stabilizing a dynamical system is a fundamental problem that serves as a cornerstone for many complex tasks in the field of control systems. The problem becomes challenging when the system model is unknown. Among the Reinforcement Learning (RL) algorithms that have been successfully applied to solve problems pertaining to unknown linear dynamical systems, the policy gradient (PG) method stands out due to its ease of implementation and can solve the problem in a model-free manner. However, most of the existing works on PG methods for unknown linear dynamical systems assume full-state feedback. In this paper, we take a step towards model-free learning for partially observable linear dynamical systems with output feedback and focus on the fundamental stabilization problem of the system. We propose an algorithmic framework that stretches the boundary of PG methods to the problem without global convergence guarantees. We show that by leveraging zeroth-order PG update based on system trajectories and its convergence to stationary points, the proposed algorithms return a stabilizing output feedback policy for discrete-time linear dynamical systems. We also explicitly characterize the sample complexity of our algorithm and verify the effectiveness of the algorithm using numerical examples.
comment: 31 pages, 2 figures
Reevaluating Bluetooth Low Energy for Ingestible Electronics
Bluetooth Low Energy (BLE) has been widely adopted in wearable devices; however, it has not been widely used in ingestible electronics, primarily due to concerns regarding severe tissue attenuation at the 2.4 GHz band. In this work, we systematically reevaluate the feasibility of BLE for ingestible applications by benchmarking its performance against representative sub-GHz communication schemes across power consumption, throughput, tissue-induced attenuation, latency, and system-level integration constraints. We demonstrate that incorporating an RF amplifier enables BLE to maintain robust communication links through tissue-mimicking media while preserving favorable energy efficiency. We further quantify the relationship between throughput and energy consumption over a wide operating range and demonstrate that, for the majority of ingestible sensing applications with throughput requirements below 100 kbps, BLE achieves substantially lower power consumption than sub-GHz alternatives. End-to-end latency measurements show that BLE offers significantly lower latency than sub-GHz solutions due to its native compatibility with modern computing infrastructure. Finally, we analyze antenna form factor and ecosystem integration, highlighting the mechanical and translational advantages of BLE in ingestible system design. Collectively, these results demonstrate that BLE, when appropriately configured, represents a compelling and scalable wireless communication solution for next-generation ingestible electronics.
A Hybrid BLE-Wi-Fi Communication Architecture for Adaptive Imaging in Wireless Capsule Endoscopy
Wireless capsule endoscopy (WCE) is fundamentally constrained by limited wireless bandwidth, resulting in low imaging resolution and frame rate, which can cause motion blur and missed lesions. Although adaptive frame-rate schemes have been explored to accommodate transient gastrointestinal (GI) motility, these approaches typically require sacrificing image resolution. The use of higher-frequency communication bands is further limited by increased tissue attenuation. To address these challenges, we propose a hybrid Bluetooth Low Energy (BLE) and WiFi communication architecture that combines the low-power operation of BLE with the high data throughput of WiFi. We systematically evaluate the performance of BLE and WiFi under tissue-mimicking conditions by measuring throughput, received signal strength indicator (RSSI), and power consumption. The results demonstrate that amplified BLE with an adaptive transmission power control strategy provides a stable frame rate at low power consumption, while 2.4 GHz WiFi operating in station mode is the most suitable high-throughput communication configuration for WCE. Compared with WiFi, BLE reduces power consumption by approximately ten times, whereas WiFi achieves up to ten times higher throughput. To reconcile these complementary trade-offs, we further introduce a hybrid system with a frame-boundary-synchronized switching mechanism to ensure lossless data transmission during BLE and WiFi transitions. Experimental results show that the switching latency from BLE to WiFi is approximately 92.66 ms, which is longer than the WiFi-to-BLE switching latency of 15.49 ms when transmitting 10 kB image payloads. Overall, the proposed hybrid BLE and WiFi system enables robust, lossless, and energy-efficient mode switching, supports adaptive imaging, and advances the development of next-generation autonomous WCE platforms.
In-Network Collective Operations: Game Changer or Challenge for AI Workloads?
This paper summarizes the opportunities of in-network collective operations (INC) for accelerated collective operations in AI workloads. We provide sufficient detail to make this important field accessible to non-experts in AI or networking, fostering a connection between these communities. Consider two types of INC: Edge-INC, where the system is implemented at the node level, and Core-INC, where the system is embedded within network switches. We outline the potential performance benefits as well as six key obstacles in the context of both Edge-INC and Core-INC that may hinder their adoption. Finally, we present a set of predictions for the future development and application of INC.
Structural Monotonicity in Transmission Scheduling for Remote State Estimation with Hidden Channel Mode
This study treats transmission scheduling for remote state estimation over unreliable channels with a hidden mode. A local Kalman estimator selects scheduling actions, such as power allocation and resource usage, and communicates with a remote estimator based on acknowledgement feedback, balancing estimation performance and communication cost. The resulting problem is naturally formulated as a partially observable Markov decision process (POMDP). In settings with observable channel modes, it is well known that monotonicity of the value function can be established via investigating order-preserving property of transition kernels. In contrast, under partial observability, the transition kernels generally lack this property, which prevents the direct application of standard monotonicity arguments. To overcome this difficulty, we introduce a novel technique, referred to as state-space folding, which induces transformed transition kernels recovering order preservation on the folded space. This transformation enables a rigorous monotonicity analysis in the partially observable setting. As a representative implication, we focus on an associated optimal stopping formulation and show that the resulting optimal scheduling policy admits a threshold structure.
Passive Daytime Radiative Cooling Enabled by Bio-Derived Ceramic-Polymer Coatings on Rapid-Curing Fiberglass Casts
Passive daytime radiative cooling (PDRC) provides an energy-free approach to suppress surface temperatures by reflecting solar irradiation while emitting thermal radiation through the mid-infrared atmospheric window. Despite rapid progress in optical performance, most PDRC systems remain limited by rigid, fragile, or planar substrates, restricting their use on flexible, curved, or wearable surfaces. Here, we report a biocompatible and structurally robust PDRC system integrated onto a commercial rapid-curing fiberglass cast, a conformal substrate widely used in orthopedic and industrial applications. The cooling architecture adopts a bilayer polymer design consisting of a polyvinyl alcohol (PVA) adhesion layer and a polymethyl methacrylate (PMMA) protective layer, both embedded with calcium pyrophosphate (CPP) ceramic particles derived from processed animal bone waste. The bio-derived CPP simultaneously enables broadband solar scattering and high mid-infrared emittance, while offering sustainability and biocompatibility advantages. The resulting composite exhibits over 90% solar reflectance and achieves up to 15 C sub-ambient cooling under direct outdoor sunlight.
comment: 20 pages, 5 figures
Dual Analysis of Continuous-time Economic Dispatch and Its Price Implications
As load varies continuously over time, it is essential to provide continuous-time price signals that accurately reflect supply-demand balance. However, conventional discrete-time economic dispatch fails to capture the intra-temporal variations in load and generation. Its dual solution--the marginal price--may distort economic signals, leading to inefficient market incentives. To analyze these issues, this paper develops a continuous-time dispatch model and derives its dual formulation for price analysis. The continuous-time dispatch produces dual variables that can be interpreted as price signals. Piecewise time-indexed generation and price trajectories are then constructed through a parametric programming approach. The resulting price, represented by the Lagrange multiplier of the system-wide power balance constraint, evolves piecewise along the continuous-time load profile. Each segment corresponds to a critical region characterized by a set of active constraints. Furthermore, we discuss the impact of unit-specific ramping constraints on price implications. Results indicate that continuous-time generation and price trajectories provide deeper insights into the price distortions and inefficient incentives inherent in discrete-time formulations. The proposed methodology is validated on an illustrative 5-bus system and a modified IEEE RTS-2019.
comment: 10 pages, 8 figures
3D-Printed Passive Reflectors for mmWave Beam Steering via Binary Aperture Control
This paper presents a theory-to-hardware framework for fully passive millimeter-wave beam control aimed at extending indoor coverage. The reflecting aperture is modeled using an equivalent array-factor formulation in which each passive element contributes a reradiated field determined by the incident and desired departure angles. Building on this model, we develop two implementation-friendly passive design approaches: (i) binary (1-bit) spatial masking, which enables beam steering and multi-beam synthesis by selecting element participation on a dense lattice via an ON/OFF metallization pattern, and (ii) diffraction-order (periodic) steering, which exploits controlled aperture periodicity to place selected diffraction orders at prescribed angles. Theoretical analysis characterizes the asymptotic activation behavior of the binary mask and establishes a distribution-free lower-bound on the achievable gain. Prototypes are realized using a copper-backed 3D-printed substrate with stencil-guided conductive ink deposition. 60 GHz over-the-air measurements in single- and multi-beam configurations validate the predicted steering behavior. Theoretical and experimental results demonstrate that fully passive, binary-coded apertures can provide deterministic beam control and offer a scalable alternative to power-consuming reconfigurable surfaces for static indoor links.
A Leader-Follower Approach for The Attitude Synchronization of Multiple Rigid Body Systems on $SO(3)$
This paper deals with the leader-follower attitude synchronization problem for a group of heterogeneous rigid body systems on $SO(3)$ under an undirected, connected, and acyclic graph communication topology. The proposed distributed control strategy, endowed with almost global asymptotic stability guarantees, allows the synchronization of the rigid body systems to a constant desired orientation known only to a single rigid body. Some simulation results are also provided to validate the theoretical developments and illustrate the performance of the proposed control strategy.
FTA-NTN: Fairness and Throughput Assurance in Non-Terrestrial Networks
Designing optimal non-terrestrial network (NTN) constellations is essential for maximizing throughput and ensuring fair resource distribution. This paper presents FTA-NTN (Fairness and Throughput Assurance in Non-Terrestrial Networks), a multi-objective optimization framework that jointly maximizes throughput and fairness under realistic system constraints. The framework integrates multi-layer Walker Delta constellations, a parametric mobility model for user distributions across Canadian land regions, adaptive K-Means clustering for beamforming and user association, and Bayesian optimization for parameter tuning. Simulation results with 500 users show that FTA-NTN achieves over 9.88 Gbps of aggregate throughput with an average fairness of 0.42, corresponding to an optimal configuration of 9 planes with 15 satellites per plane in LEO and 7 planes with 3 satellites per plane in MEO. These values align with 3GPP NTN evaluation scenarios and representative system assumptions, confirming their relevance for realistic deployments. Overall, FTA-NTN demonstrates that throughput and fairness can be jointly optimized under practical constraints, advancing beyond throughput-centric designs in the literature and offering a scalable methodology for next-generation NTN deployments that supports efficient and equitable global connectivity.
comment: 6 pages, 3 figures, Accepted to IEEE International Conference on Communications (ICC) 2026
Machine Learning-Based Evaluation of Attitude Sensor Characteristics Using Microsatellite Flight Data
Using actual flight data from a 50-cm-class microsatellite whose mission and operations have already been completed, this study re-evaluates satellite attitude determination performance and the error characteristics of onboard attitude sensors. While conventional approaches rely on batch estimation or Kalman filtering based on predefined physical models and white-noise assumptions, this research introduces a machine-learning-based approach to extract and correct structural and nonlinear error patterns embedded in real observational data. In this study, high-quality attitude determination results obtained from star sensors and a fiber optical gyro (FOG) are treated as ground truth, and machine learning is applied to coarse attitude sensor data consisting of Sun sensors and magnetic field sensors. A one-dimensional convolutional neural network (Conv1D) is employed to regressively predict attitude from short sequences of time-series sensor measurements. The model is trained and evaluated using five sets of on-orbit observation logs, with four passes used for training and one independent pass used for testing. The results show that, while conventional coarse attitude determination using the TRIAD method yields attitude errors on the order of 7 deg RMS, the proposed machine-learning approach achieves RMS errors of approximately 0.7 deg for the training data and 2-3 deg for the test data, depending on the sensor combination.
comment: 7 pages, 5 figures. 4 tables, UKAREN 69th, November 25-28, 2025
Control systems for synthetic biology and a case-study in cell fate reprogramming
This paper gives an overview of the use of control systems engineering in synthetic biology, motivated by applications such as cell therapy and cell fate reprogramming for regenerative medicine. A ubiquitous problem in these and other applications is the ability to control the concentration of specific regulatory factors in the cell accurately despite environmental uncertainty and perturbations. The paper describes the origin of these perturbations and how they affect the dynamics of the biomolecular ``plant'' to be controlled. A variety of biomolecular control implementations are then introduced to achieve robustness of the plant's output to perturbations and are grouped into feedback and feedforward control architectures. Although sophisticated control laws can be implemented in a computer today, they cannot be necessarily implemented inside the cell via biomolecular processes. This fact constraints the set of feasible control laws to those realizable through biomolecular processes that can be engineered with synthetic biology. After reviewing biomolecular feedback and feedforward control implementations, mostly focusing on the author's own work, the paper illustrates the application of such control strategies to cell fate reprogramming. Within this context, a master regulatory factor needs to be controlled at a specific level inside the cell in order to reprogram skin cells to pluripotent stem cells. The article closes by highlighting on-going challenges and directions of future research for biomolecular control design.
OptAgent: an Agentic AI framework for Intelligent Building Operations
The urgent need for building decarbonization calls for a paradigm shift in future autonomous building energy operation, from human-intensive engineering workflows toward intelligent agents that interact with physics-grounded digital environments. This study proposes an end-to-end agentic AI-enabled Physics-Informed Machine Learning (PIML) environment for scalable building energy modeling, simulation, control, and automation. The framework consists of (1) a modular and physics-consistent PIML digital environment spanning building thermal dynamics, Heating, Ventilation, and Air Conditioning (HVAC), and distributed energy resources (DER) for grid-interactive energy management; and (2) an agentic AI layer with 11 specialist agents and 72 Model Context Protocol (MCP) tools that enable end-to-end execution of multi-step energy analytics. A representative case study demonstrates multi-domain, multi-agent coordination for assessing how system and control upgrades affect energy use, operating cost, thermal comfort, and flexibility. In addition, a large-scale benchmark (about 4000 runs) systematically evaluates workflow performance in terms of accuracy, token consumption, execution time, and inference cost. The results quantify the impacts of intelligence mode design, model size, task complexity, and orchestrator-specialist coordination, and provide key lessons for building future agentic AI systems in real-world building energy applications. This work establishes a scalable, physics-grounded foundation for deploying agentic AI in decarbonized and grid-interactive building operations.
Probabilistic Sensing: Intelligence in Data Sampling ISCA
Extending the intelligence of sensors to the data-acquisition process - deciding whether to sample or not - can result in transformative energy-efficiency gains. However, making such a decision in a deterministic manner involves risk of losing information. Here we present a sensing paradigm that enables making such a decision in a probabilistic manner. The paradigm takes inspiration from the autonomous nervous system and employs a probabilistic neuron (p-neuron) driven by an analog feature extraction circuit. The response time of the system is on the order of microseconds, over-coming the sub-sampling-rate response time limit and enabling real-time intelligent autonomous activation of data-sampling. Validation experiments on active seismic survey data demonstrate lossless probabilistic data acquisition, with a normalized mean squared error of 0.41%, and 93% saving in the active operation time of the system and the number of generated samples.
comment: Accepted for presentation at IEEE ISCAS 2026 as a lecture
Secrecy Performance of a Keyhole-based Multi-user System with Multiple Eavesdroppers
This paper investigates the secrecy performance of a keyhole-aided multi-user communication network in the presence of multiple eavesdroppers. The communication happens through the same keyhole for legitimate users and eavesdroppers. In this context, the secrecy performance is evaluated for a user scheduling technique by obtaining the exact closed-form expression of secrecy outage probability (SOP). Further, a simplified asymptotic SOP expression is derived assuming high signal-to-noise ratio (SNR) scenario for a better understanding of the impact of system parameters. The effect of the keyhole parameters, number of users, number of eavesdroppers, and threshold secrecy rate on the SOP performance are also investigated for the considered system model. In the high-SNR regime, the asymptotic SOP saturates to a constant value and does not depend on the keyhole parameter and the channel parameter of the source-to-keyhole channel.
Fine Tuning a Simulation-Driven Estimator
Many industries now deploy high-fidelity simulators (digital twins) to represent physical systems, yet their parameters must be calibrated to match the true system. This motivated the construction of simulation-driven parameter estimators, built by generating synthetic observations for sampled parameter values and learning a supervised mapping from observations to parameters. However, when the true parameters lie outside the sampled range, predictions suffer from an out-of-distribution (OOD) error. This paper introduces a fine-tuning approach for the Two-Stage estimator that mitigates OOD effects and improves accuracy. The effectiveness of the proposed method is verified through numerical simulations.
comment: Published in IEEE Control Systems Letters, vol. 9, pp 2975-2980, 2025
Ruggedized Ultrasound Sensing in Harsh Conditions: eRTIS in the wild
We present eRTIS, a rugged, embedded ultrasound sensing system for use in harsh industrial environments. The system features a broadband capacitive transducer and a 32-element MEMS microphone array capable of 2D and 3D beamforming. A modular hardware architecture separates sensing and processing tasks: a high-performance microcontroller handles excitation signal generation and data acquisition, while an NVIDIA Jetson module performs GPU-accelerated signal processing. eRTIS supports external synchronization via a custom controller that powers and coordinates up to six devices, either simultaneously or in a defined sequence. Additional synchronization options include bidirectional triggering and in-band signal injection. A sealed, anodized aluminum enclosure with passive cooling and IP-rated connectors ensures reliability in challenging conditions. Performance is demonstrated in three field scenarios: harbor mooring, off-road robotics, and autonomous navigation in cluttered environments, demonstrates that eRTIS provides robust sensing in situations where optical systems degrade.
Specifying and Verifying RDMA Synchronisation (Extended Version)
Remote direct memory access (RDMA) allows a machine to directly read from and write to the memory of remote machine, enabling high-throughput, low-latency data transfer. Ensuring correctness of RDMA programs has only recently become possible with the formalisation of $\text{RDMA}^\text{TSO}$ semantics (describing the behaviour of RDMA networking over a TSO CPU). However, this semantics currently lacks a formalisation of remote synchronisation, meaning that the implementations of common abstractions such as locks cannot be verified. In this paper, we close this gap by presenting $\text{RDMA}^{\text{TSO}}_{\text{RMW}}$, the first semantics for remote `read-modify-write' (RMW) instructions over TSO. It turns out that remote RMW operations are weak and only ensure atomicity against other remote RMWs. We therefore build a set of composable synchronisation abstractions starting with the $\text{RDMA}^{\text{WAIT}}_{\text{RMW}}$ library. Underpinned by $\text{RDMA}^{\text{WAIT}}_{\text{RMW}}$, we then specify, implement and verify three classes of remote locks that are suitable for different scenarios. Additionally, we develop the notion of a strong RDMA model, $\text{RDMA}^{\text{SC}}_{\text{RMW}}$, which is akin to sequential consistency in shared memory architectures. Our libraries are built to be compatible with an existing set of high-performance libraries called LOCO, which ensures compositionality and verifiability.
comment: 95 pages, extended version of ESOP 2026 paper, replaced to fix a tikz opacity problem
Convergence of energy-based learning in linear resistive networks
Energy-based learning algorithms are alternatives to backpropagation and are well-suited to distributed implementations in analog electronic devices. However, a rigorous theory of convergence is lacking. We make a first step in this direction by analysing a particular energybased learning algorithm, Contrastive Learning, applied to a network of linear adjustable resistors. It is shown that, in this setup, Contrastive Learning is equivalent to projected gradient descent on a convex function with Lipschitz continuous gradient, giving a guarantee of convergence of the algorithm for a range of stepsizes. This convergence result is then extended to a stochastic variant of Contrastive Learning.
AlphaPEM: an open-source dynamic 1D physics-based PEM fuel cell model for embedded applications
The urgency of the energy transition requires improving the performance and longevity of hydrogen technologies. AlphaPEM is a dynamic one-dimensional (1D) physics-based PEM fuel cell system simulator, programmed in Python and experimentally validated. It offers a good balance between accuracy and execution speed. The modular architecture allows for addition of new features, and it has a user-friendly graphical interface. An automatic calibration method is proposed to match the model to the studied fuel cell. The software provides information on the internal states of the system in response to any current density and can produce polarization and EIS curves. AlphaPEM facilitates the use of a model in embedded conditions, allowing real-time modification of the fuel cell's operating conditions.
An advanced 1D physics-based model for PEM hydrogen fuel cells with enhanced overvoltage prediction
A one-dimensional, dynamic, two-phase, isothermal model of proton exchange membrane fuel cell systems using a finite-difference approach has been developed. This model balances the simplicity of lumped-parameter models with the detailed accuracy of computational fluid dynamics models, offering precise internal state descriptions with low computational demand. The model's static behavior is validated experimentally using polarization curves. In addition, a novel physical parameter, the limit liquid water saturation coefficient ($s_{\rm lim}$), is introduced in the overvoltage calculation, replacing the traditional limit current density coefficient ($i_{\rm lim}$). This new parameter links the voltage drop at high current densities to the amount of liquid water present in the catalyst layers and the operating conditions of the fuel cell. Additionally, it has been observed that $s_{\rm lim}$ is influenced at least by the gas pressure applied by the operator. This newly established link is promising for optimizing the control and thereby improving the performance of fuel cells.
Revisiting Z Transform Laplace Inversion: To Correct flaws in Signal and System Theory
This paper revisits the classical formulation of the Z-transform and its relationship to the inverse Laplace transform (L-1), originally developed by Ragazzini in sampled-data theory. It identifies a longstanding mathematical oversight in standard derivations, which typically neglect the contribution from the infinite arc in the complex plane during inverse Laplace evaluation. This omission leads to inconsistencies, especially at discontinuities such as t = 0. By incorporating the full Bromwich contour, including all boundary contributions, we restore internal consistency between L-1 and the Z-transform, aligning the corrected L-1 with results from Discrete-Time Fourier Transform (DTFT) aliasing theory. Consequently, this necessitates a structural revision of the Z-transform, inverse Laplace transform, and the behavior of the Heaviside step function at discontinuities, providing a more accurate foundation for modeling and analysis of sampled-data systems.
comment: This work is to be submitted to IEEE transactions on automatic control This is revision2 of the manuscript
Hill-Type Stability Analysis of Periodic Solutions of Fractional-Order Differential Equations
This paper explores stability properties of periodic solutions of (nonlinear) fractional-order differential equations (FODEs). As classical Caputo-type FODEs do not admit exactly periodic solutions, we propose a framework of Liouville-Weyl-type FODEs, which do admit exactly periodic solutions and are an extension of Caputo-type FODEs. Local linearization around a periodic solution results in perturbation dynamics governed by a linear time-periodic differential equation. In the classical integer-order case, the perturbation dynamics is therefore described by Floquet theory, i.e. the exponential growth or decay of perturbations is expressed by Floquet exponents which can be assessed using the Hill matrix approach. For fractional-order systems, however, a rigorous Floquet theory is lacking. Here, we explore the limitations when trying to extend Floquet theory and the Hill matrix method to linear time-periodic fractional-order differential equations (LTP-FODEs) as local linearization of nonlinear fractional-order systems. A key result of the paper is that such an extended Floquet theory can only assess exponentially growing solutions of LTP-FODEs. Moreover, we provide an analysis of linear time-invariant fractional-order systems (LTI-FODEs) with algebraically decaying solutions and show that the inaccessibility of decaying solutions through Floquet theory is already present in the time-invariant case.
comment: 22 pages, 6 figures
Cloud and AI Infrastructure Cost Optimization: A Comprehensive Review of Strategies and Case Studies
Cloud computing has revolutionized the way organizations manage their IT infrastructure, but it has also introduced new challenges, such as managing cloud costs. The rapid adoption of artificial intelligence (AI) and machine learning (ML) workloads has further amplified these challenges, with GPU compute now representing 40-60\% of technical budgets for AI-focused organizations. This paper provides a comprehensive review of cloud and AI infrastructure cost optimization techniques, covering traditional cloud pricing models, resource allocation strategies, and emerging approaches for managing AI/ML workloads. We examine the dramatic cost reductions in large language model (LLM) inference which has decreased by approximately 10x annually since 2021 and explore techniques such as model quantization, GPU instance selection, and inference optimization. Real-world case studies from Amazon Prime Video, Pinterest, Cloudflare, and Netflix showcase practical application of these techniques. Our analysis reveals that organizations can achieve 50-90% cost savings through strategic optimization approaches. Future research directions in automated optimization, sustainability, and AI-specific cost management are proposed to advance the state of the art in this rapidly evolving field.
comment: Version 2. Significantly expanded to include AI/ML infrastructure and GPU cost optimization. Updated with 2025 industry data and new case studies on LLM inference costs. Title updated from "Cloud Cost Optimization: A Comprehensive Review of Strategies and Case Studies" to reflect broader scope
Contact Plan Design for Cross-Linked GNSSs: An ILP Approach for Extended Applications
Global Navigation Satellite Systems (GNSS) employ inter-satellite links (ISLs) to reduce dependency on ground stations, enabling precise ranging and communication across satellites. Beyond their traditional role, ISLs can support extended applications, including providing navigation and communication services to external entities. However, designing effective contact plan design (CPD) schemes for these multifaceted ISLs, operating under a polling time-division duplex (PTDD) framework, remains a critical challenge. Existing CPD approaches focus solely on meeting GNSS satellites' internal ranging and communication demands, neglecting their extended applications. This paper introduces the first CPD scheme capable of supporting extended GNSS ISLs. By modeling GNSS requirements and designing a tailored service process, our approach ensures the allocation of essential resources for internal operations while accommodating external user demands. Based on the BeiDou constellation, simulation results demonstrate the proposed scheme's efficacy in maintaining core GNSS functionality while providing extended ISLs on a best-effort basis. Additionally, the results highlight the significant impact of GNSS ISLs in enhancing orbit determination and clock synchronization for the Earth-Moon libration point constellation, underscoring the importance of extended GNSS ISL applications.
comment: 19 pages, 13 figures
Learning Flatness-Preserving Residuals for Pure-Feedback Systems
We study residual dynamics learning for differentially flat systems, where a nominal model is augmented with a learned correction term from data. A key challenge is that generic residual parameterizations may destroy flatness, limiting the applicability of flatness-based planning and control methods. To address this, we propose a framework for learning flatness-preserving residual dynamics in systems whose nominal model admits a pure-feedback form. We show that residuals with a lower-triangular structure preserve both the flatness of the system and the original flat outputs. Moreover, we provide a constructive procedure to recover the flatness diffeomorphism of the augmented system from that of the nominal model. Building on these insights, we introduce a parameterization of flatness-preserving residuals using smooth function approximators, making them learnable from trajectory data with conventional algorithms. Our approach is validated in simulation on a 2D quadrotor subject to unmodeled aerodynamic effects. We demonstrate that the resulting learned flat model achieves a tracking error $5\times$ lower than the nominal flat model, while being $20\times$ faster over a structure-agnostic alternative.
Supporting Lemmas for RISE-based Control Methods
A class of continuous controllers termed Robust Integral of the Signum of the Error (RISE) have been published over the last decade as a means to yield asymptotic convergence of the tracking error for classes of nonlinear systems that are subject to exogenous disturbances and/or modeling uncertainties. The development of this class of controllers relies on a property related to the integral of the signum of an error signal. A proof for this property is not available in previous literature. The stability of some RISE controllers is analyzed using differential inclusions. Such results rely on the hypothesis that a set of points is Lebesgue negligible. This paper states and proves two lemmas related to the properties.
Robotics
Goal-oriented Communication for Fast and Robust Robotic Fault Detection and Recovery
Autonomous robotic systems are widely deployed in smart factories and operate in dynamic, uncertain, and human-involved environments that require low-latency and robust fault detection and recovery (FDR). However, existing FDR frameworks exhibit various limitations, such as significant delays in communication and computation, and unreliability in robot motion/trajectory generation, mainly because the communication-computation-control (3C) loop is designed without considering the downstream FDR goal. To address this, we propose a novel Goal-oriented Communication (GoC) framework that jointly designs the 3C loop tailored for fast and robust robotic FDR, with the goal of minimising the FDR time while maximising the robotic task (e.g., workpiece sorting) success rate. For fault detection, our GoC framework innovatively defines and extracts the 3D scene graph (3D-SG) as the semantic representation via our designed representation extractor, and detects faults by monitoring spatial relationship changes in the 3D-SG. For fault recovery, we fine-tune a small language model (SLM) via Low-Rank Adaptation (LoRA) and enhance its reasoning and generalization capabilities via knowledge distillation to generate recovery motions for robots. We also design a lightweight goal-oriented digital twin reconstruction module to refine the recovery motions generated by the SLM when fine-grained robotic control is required, using only task-relevant object contours for digital twin reconstruction. Extensive simulations demonstrate that our GoC framework reduces the FDR time by up to 82.6% and improves the task success rate by up to 76%, compared to the state-of-the-art frameworks that rely on vision language models for fault detection and large language models for fault recovery.
comment: Submit to IEEE for potential publication
Advances and Innovations in the Multi-Agent Robotic System (MARS) Challenge NeurIPS 2025
Recent advancements in multimodal large language models and vision-languageaction models have significantly driven progress in Embodied AI. As the field transitions toward more complex task scenarios, multi-agent system frameworks are becoming essential for achieving scalable, efficient, and collaborative solutions. This shift is fueled by three primary factors: increasing agent capabilities, enhancing system efficiency through task delegation, and enabling advanced human-agent interactions. To address the challenges posed by multi-agent collaboration, we propose the Multi-Agent Robotic System (MARS) Challenge, held at the NeurIPS 2025 Workshop on SpaVLE. The competition focuses on two critical areas: planning and control, where participants explore multi-agent embodied planning using vision-language models (VLMs) to coordinate tasks and policy execution to perform robotic manipulation in dynamic environments. By evaluating solutions submitted by participants, the challenge provides valuable insights into the design and coordination of embodied multi-agent systems, contributing to the future development of advanced collaborative AI systems.
comment: MARS Challenge @ NeurIPS 2025 Workshop on Space in Vision, Language, and Embodied AI. Challenge page: https://mars-eai.github.io/MARS-Challenge-Webpage/
Trustworthy Evaluation of Robotic Manipulation: A New Benchmark and AutoEval Methods
Driven by the rapid evolution of Vision-Action and Vision-Language-Action models, imitation learning has significantly advanced robotic manipulation capabilities. However, evaluation methodologies have lagged behind, hindering the establishment of Trustworthy Evaluation for these behaviors. Current paradigms rely on binary success rates, failing to address the critical dimensions of trust: Source Authenticity (i.e., distinguishing genuine policy behaviors from human teleoperation) and Execution Quality (e.g., smoothness and safety). To bridge these gaps, we propose a solution that combines the Eval-Actions benchmark and the AutoEval architecture. First, we construct the Eval-Actions benchmark to support trustworthiness analysis. Distinct from existing datasets restricted to successful human demonstrations, Eval-Actions integrates VA and VLA policy execution trajectories alongside human teleoperation data, explicitly including failure scenarios. This dataset is structured around three core supervision signals: Expert Grading (EG), Rank-Guided preferences (RG), and Chain-of-Thought (CoT). Building on this, we propose the AutoEval architecture: AutoEval leverages Spatio-Temporal Aggregation for semantic assessment, augmented by an auxiliary Kinematic Calibration Signal to refine motion smoothness; AutoEval Plus (AutoEval-P) incorporates the Group Relative Policy Optimization (GRPO) paradigm to enhance logical reasoning capabilities. Experiments show AutoEval achieves Spearman's Rank Correlation Coefficients (SRCC) of 0.81 and 0.84 under the EG and RG protocols, respectively. Crucially, the framework possesses robust source discrimination capabilities, distinguishing between policy-generated and teleoperated videos with 99.6% accuracy, thereby establishing a rigorous standard for trustworthy robotic evaluation. Our project and code are available at https://term-bench.github.io/.
Low Cost, High Efficiency: LiDAR Place Recognition in Vineyards with Matryoshka Representation Learning
Localization in agricultural environments is challenging due to their unstructured nature and lack of distinctive landmarks. Although agricultural settings have been studied in the context of object classification and segmentation, the place recognition task for mobile robots is not trivial in the current state of the art. In this study, we propose MinkUNeXt-VINE, a lightweight, deep-learning-based method that surpasses state-of-the-art methods in vineyard environments thanks to its pre-processing and Matryoshka Representation Learning multi-loss approach. Our method prioritizes enhanced performance with low-cost, sparse LiDAR inputs and lower-dimensionality outputs to ensure high efficiency in real-time scenarios. Additionally, we present a comprehensive ablation study of the results on various evaluation cases and two extensive long-term vineyard datasets employing different LiDAR sensors. The results demonstrate the efficiency of the trade-off output produced by this approach, as well as its robust performance on low-cost and low-resolution input data. The code is publicly available for reproduction.
A Pragmatic VLA Foundation Model
Offering great potential in robotic manipulation, a capable Vision-Language-Action (VLA) foundation model is expected to faithfully generalize across tasks and platforms while ensuring cost efficiency (e.g., data and GPU hours required for adaptation). To this end, we develop LingBot-VLA with around 20,000 hours of real-world data from 9 popular dual-arm robot configurations. Through a systematic assessment on 3 robotic platforms, each completing 100 tasks with 130 post-training episodes per task, our model achieves clear superiority over competitors, showcasing its strong performance and broad generalizability. We have also built an efficient codebase, which delivers a throughput of 261 samples per second per GPU with an 8-GPU training setup, representing a 1.5~2.8$\times$ (depending on the relied VLM base model) speedup over existing VLA-oriented codebases. The above features ensure that our model is well-suited for real-world deployment. To advance the field of robot learning, we provide open access to the code, base model, and benchmark data, with a focus on enabling more challenging tasks and promoting sound evaluation standards.
comment: Project Webpage: https://technology.robbyant.com/lingbot-vla/, Code: https://github.com/Robbyant/lingbot-vla/
Constraint-Aware Discrete-Time PID Gain Optimization for Robotic Joint Control Under Actuator Saturation
The precise regulation of rotary actuation is fundamental in autonomous robotics, yet practical PID loops deviate from continuous-time theory due to discrete-time execution, actuator saturation, and small delays and measurement imperfections. We present an implementation-aware analysis and tuning workflow for saturated discrete-time joint control. We (i) derive PI stability regions under Euler and exact zero-order-hold (ZOH) discretizations using the Jury criterion, (ii) evaluate a discrete back-calculation anti-windup realization under saturation-dominant regimes, and (iii) propose a hybrid-certified Bayesian optimization workflow that screens analytically unstable candidates and behaviorally unsafe transients while optimizing a robust IAE objective with soft penalties on overshoot and saturation duty. Baseline sweeps ($τ=1.0$~s, $Δt=0.01$~s, $u\in[-10,10]$) quantify rise/settle trends for P/PI/PID. Under a randomized model family emulating uncertainty, delay, noise, quantization, and tighter saturation, robustness-oriented tuning improves median IAE from $0.843$ to $0.430$ while keeping median overshoot below $2\%$. In simulation-only tuning, the certification screen rejects $11.6\%$ of randomly sampled gains within bounds before full robust evaluation, improving sample efficiency without hardware experiments.
ExoGS: A 4D Real-to-Sim-to-Real Framework for Scalable Manipulation Data Collection
Real-to-Sim-to-Real technique is gaining increasing interest for robotic manipulation, as it can generate scalable data in simulation while having narrower sim-to-real gap. However, previous methods mainly focused on environment-level visual real-to-sim transfer, ignoring the transfer of interactions, which could be challenging and inefficient to obtain purely in simulation especially for contact-rich tasks. We propose ExoGS, a robot-free 4D Real-to-Sim-to-Real framework that captures both static environments and dynamic interactions in the real world and transfers them seamlessly to a simulated environment. It provides a new solution for scalable manipulation data collection and policy learning. ExoGS employs a self-designed robot-isomorphic passive exoskeleton AirExo-3 to capture kinematically consistent trajectories with millimeter-level accuracy and synchronized RGB observations during direct human demonstrations. The robot, objects, and environment are reconstructed as editable 3D Gaussian Splatting assets, enabling geometry-consistent replay and large-scale data augmentation. Additionally, a lightweight Mask Adapter injects instance-level semantics into the policy to enhance robustness under visual domain shifts. Real-world experiments demonstrate that ExoGS significantly improves data efficiency and policy generalization compared to teleoperation-based baselines. Code and hardware files have been released on https://github.com/zaixiabalala/ExoGS.
Attention-Based Neural-Augmented Kalman Filter for Legged Robot State Estimation
In this letter, we propose an Attention-Based Neural-Augmented Kalman Filter (AttenNKF) for state estimation in legged robots. Foot slip is a major source of estimation error: when slip occurs, kinematic measurements violate the no-slip assumption and inject bias during the update step. Our objective is to estimate this slip-induced error and compensate for it. To this end, we augment an Invariant Extended Kalman Filter (InEKF) with a neural compensator that uses an attention mechanism to infer error conditioned on foot-slip severity and then applies this estimate as a post-update compensation to the InEKF state (i.e., after the filter update). The compensator is trained in a latent space, which aims to reduce sensitivity to raw input scales and encourages structured slip-conditioned compensations, while preserving the InEKF recursion. Experiments demonstrate improved performance compared to existing legged-robot state estimators, particularly under slip-prone conditions.
comment: 8 pages, 6 figures, Accepted to IEEE Robotics and Automation Letters (RA-L)
Fast and Safe Trajectory Optimization for Mobile Manipulators With Neural Configuration Space Distance Field
Mobile manipulators promise agile, long-horizon behavior by coordinating base and arm motion, yet whole-body trajectory optimization in cluttered, confined spaces remains difficult due to high-dimensional nonconvexity and the need for fast, accurate collision reasoning. Configuration Space Distance Fields (CDF) enable fixed-base manipulators to model collisions directly in configuration space via smooth, implicit distances. This representation holds strong potential to bypass the nonlinear configuration-to-workspace mapping while preserving accurate whole-body geometry and providing optimization-friendly collision costs. Yet, extending this capability to mobile manipulators is hindered by unbounded workspaces and tighter base-arm coupling. We lift this promise to mobile manipulation with Generalized Configuration Space Distance Fields (GCDF), extending CDF to robots with both translational and rotational joints in unbounded workspaces with tighter base-arm coupling. We prove that GCDF preserves Euclidean-like local distance structure and accurately encodes whole-body geometry in configuration space, and develop a data generation and training pipeline that yields continuous neural GCDFs with accurate values and gradients, supporting efficient GPU-batched queries. Building on this representation, we develop a high-performance sequential convex optimization framework centered on GCDF-based collision reasoning. The solver scales to large numbers of implicit constraints through (i) online specification of neural constraints, (ii) sparsity-aware active-set detection with parallel batched evaluation across thousands of constraints, and (iii) incremental constraint management for rapid replanning under scene changes.
SKETCH: Semantic Key-Point Conditioning for Long-Horizon Vessel Trajectory Prediction
Accurate long-horizon vessel trajectory prediction remains challenging due to compounded uncertainty from complex navigation behaviors and environmental factors. Existing methods often struggle to maintain global directional consistency, leading to drifting or implausible trajectories when extrapolated over long time horizons. To address this issue, we propose a semantic-key-point-conditioned trajectory modeling framework, in which future trajectories are predicted by conditioning on a high-level Next Key Point (NKP) that captures navigational intent. This formulation decomposes long-horizon prediction into global semantic decision-making and local motion modeling, effectively restricting the support of future trajectories to semantically feasible subsets. To efficiently estimate the NKP prior from historical observations, we adopt a pretrain-finetune strategy. Extensive experiments on real-world AIS data demonstrate that the proposed method consistently outperforms state-of-the-art approaches, particularly for long travel durations, directional accuracy, and fine-grained trajectory prediction.
DV-VLN: Dual Verification for Reliable LLM-Based Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires an embodied agent to navigate in a complex 3D environment according to natural language instructions. Recent progress in large language models (LLMs) has enabled language-driven navigation with improved interpretability. However, most LLM-based agents still rely on single-shot action decisions, where the model must choose one option from noisy, textualized multi-perspective observations. Due to local mismatches and imperfect intermediate reasoning, such decisions can easily deviate from the correct path, leading to error accumulation and reduced reliability in unseen environments. In this paper, we propose DV-VLN, a new VLN framework that follows a generate-then-verify paradigm. DV-VLN first performs parameter-efficient in-domain adaptation of an open-source LLaMA-2 backbone to produce a structured navigational chain-of-thought, and then verifies candidate actions with two complementary channels: True-False Verification (TFV) and Masked-Entity Verification (MEV). DV-VLN selects actions by aggregating verification successes across multiple samples, yielding interpretable scores for reranking. Experiments on R2R, RxR (English subset), and REVERIE show that DV-VLN consistently improves over direct prediction and sampling-only baselines, achieving competitive performance among language-only VLN agents and promising results compared with several cross-modal systems.Code is available at https://github.com/PlumJun/DV-VLN.
SG-CADVLM: A Context-Aware Decoding Powered Vision Language Model for Safety-Critical Scenario Generation
Autonomous vehicle safety validation requires testing on safety-critical scenarios, but these events are rare in real-world driving and costly to test due to collision risks. Crash reports provide authentic specifications of safety-critical events, offering a vital alternative to scarce real-world collision trajectory data. This makes them valuable sources for generating realistic high-risk scenarios through simulation. Existing approaches face significant limitations because data-driven methods lack diversity due to their reliance on existing latent distributions, whereas adversarial methods often produce unrealistic scenarios lacking physical fidelity. Large Language Model (LLM) and Vision Language Model (VLM)-based methods show significant promise. However, they suffer from context suppression issues where internal parametric knowledge overrides crash specifications, producing scenarios that deviate from actual accident characteristics. This paper presents SG-CADVLM (A Context-Aware Decoding Powered Vision Language Model for Safety-Critical Scenario Generation), a framework that integrates Context-Aware Decoding with multi-modal input processing to generate safety-critical scenarios from crash reports and road network diagrams. The framework mitigates VLM hallucination issues while enabling the simultaneous generation of road geometry and vehicle trajectories. The experimental results demonstrate that SG-CADVLM generates critical risk scenarios at a rate of 84.4% compared to 12.5% for the baseline methods, representing an improvement of 469%, while producing executable simulations for autonomous vehicle testing.
TC-IDM: Grounding Video Generation for Executable Zero-shot Robot Motion
The vision-language-action (VLA) paradigm has enabled powerful robotic control by leveraging vision-language models, but its reliance on large-scale, high-quality robot data limits its generalization. Generative world models offer a promising alternative for general-purpose embodied AI, yet a critical gap remains between their pixel-level plans and physically executable actions. To this end, we propose the Tool-Centric Inverse Dynamics Model (TC-IDM). By focusing on the tool's imagined trajectory as synthesized by the world model, TC-IDM establishes a robust intermediate representation that bridges the gap between visual planning and physical control. TC-IDM extracts the tool's point cloud trajectories via segmentation and 3D motion estimation from generated videos. Considering diverse tool attributes, our architecture employs decoupled action heads to project these planned trajectories into 6-DoF end-effector motions and corresponding control signals. This plan-and-translate paradigm not only supports a wide range of end-effectors but also significantly improves viewpoint invariance. Furthermore, it exhibits strong generalization capabilities across long-horizon and out-of-distribution tasks, including interacting with deformable objects. In real-world evaluations, the world model with TC-IDM achieves an average success rate of 61.11 percent, with 77.7 percent on simple tasks and 38.46 percent on zero-shot deformable object tasks. It substantially outperforms end-to-end VLA-style baselines and other inverse dynamics models.
Quest2ROS2: A ROS 2 Framework for Bi-manual VR Teleoperation
Quest2ROS2 is an open-source ROS2 framework for bi-manual teleoperation designed to scale robot data collection. Extending Quest2ROS, it overcomes workspace limitations via relative motion-based control, calculating robot movement from VR controller pose changes to enable intuitive, pose-independent operation. The framework integrates essential usability and safety features, including real-time RViz visualization, streamlined gripper control, and a pause-and-reset function for smooth transitions. We detail a modular architecture that supports "Side-by-Side" and "Mirror" control modes to optimize operator experience across diverse platforms. Code is available at: https://github.com/Taokt/Quest2ROS2.
comment: HRI 2026
Grasp-and-Lift: Executable 3D Hand-Object Interaction Reconstruction via Physics-in-the-Loop Optimization
Dexterous hand manipulation increasingly relies on large-scale motion datasets with precise hand-object trajectory data. However, existing resources such as DexYCB and HO3D are primarily optimized for visual alignment but often yield physically implausible interactions when replayed in physics simulators, including penetration, missed contact, and unstable grasps. We propose a simulation-in-the-loop refinement framework that converts these visually aligned trajectories into physically executable ones. Our core contribution is to formulate this as a tractable black-box optimization problem. We parameterize the hand's motion using a low-dimensional, spline-based representation built on sparse temporal keyframes. This allows us to use a powerful gradient-free optimizer, CMA-ES, to treat the high-fidelity physics engine as a black-box objective function. Our method finds motions that simultaneously maximize physical success (e.g., stable grasp and lift) while minimizing deviation from the original human demonstration. Compared to MANIPTRANS-recent transfer pipelines, our approach achieves lower hand and object pose errors during replay and more accurately recovers hand-object physical interactions. Our approach provides a general and scalable method for converting visual demonstrations into physically valid trajectories, enabling the generation of high-fidelity data crucial for robust policy learning.
comment: 13 pages, 7 figures
Beyond Static Datasets: Robust Offline Policy Optimization via Vetted Synthetic Transitions
Offline Reinforcement Learning (ORL) holds immense promise for safety-critical domains like industrial robotics, where real-time environmental interaction is often prohibitive. A primary obstacle in ORL remains the distributional shift between the static dataset and the learned policy, which typically mandates high degrees of conservatism that can restrain potential policy improvements. We present MoReBRAC, a model-based framework that addresses this limitation through Uncertainty-Aware latent synthesis. Instead of relying solely on the fixed data, MoReBRAC utilizes a dual-recurrent world model to synthesize high-fidelity transitions that augment the training manifold. To ensure the reliability of this synthetic data, we implement a hierarchical uncertainty pipeline integrating Variational Autoencoder (VAE) manifold detection, model sensitivity analysis, and Monte Carlo (MC) dropout. This multi-layered filtering process guarantees that only transitions residing within high-confidence regions of the learned dynamics are utilized. Our results on D4RL Gym-MuJoCo benchmarks reveal significant performance gains, particularly in ``random'' and ``suboptimal'' data regimes. We further provide insights into the role of the VAE as a geometric anchor and discuss the distributional trade-offs encountered when learning from near-optimal datasets.
comment: 11 pages, 2 figures, 2 tables
HumanoidTurk: Expanding VR Haptics with Humanoids for Driving Simulations
We explore how humanoid robots can be repurposed as haptic media, extending beyond their conventional role as social, assistive, collaborative agents. To illustrate this approach, we implemented HumanoidTurk, taking a first step toward a humanoid-based haptic system that translates in-game g-force signals into synchronized motion feedback in VR driving. A pilot study involving six participants compared two synthesis methods, leading us to adopt a filter-based approach for smoother and more realistic feedback. A subsequent study with sixteen participants evaluated four conditions: no-feedback, controller, humanoid+controller, and human+controller. Results showed that humanoid feedback enhanced immersion, realism, and enjoyment, while introducing moderate costs in terms of comfort and simulation sickness. Interviews further highlighted the robot's consistency and predictability in contrast to the adaptability of human feedback. From these findings, we identify fidelity, adaptability, and versatility as emerging themes, positioning humanoids as a distinct haptic modality for immersive VR.
comment: 14 pages, 7 figures. To appear in CHI 2026
A Switching Nonlinear Model Predictive Control Strategy for Safe Collision Handling by an Underwater Vehicle-Manipulator System
For active intervention tasks in underwater environments, the use of autonomous vehicles is just now emerging as an active area of research. During operation, for various reasons, the robot might find itself on a collision course with an obstacle in its environment. In this paper, a switching Nonlinear Model Predictive Control (NMPC) strategy is proposed to safely handle collisions for an Underwater Vehicle-Manipulator System (UVMS). When avoiding the collision is impossible, the control algorithm takes advantage of the manipulator, using it to push against the obstacle, and deflect away from the collision. Virtual experiments are performed to demonstrate the algorithm's capability to successfully detect collisions and either avoid them, or use the manipulator to handle them appropriately without damaging sensitive areas of the vehicle.
comment: This work has been submitted to the 2026 Mediterranean Conference on Control and Automation (MED) to be considered for publication. Figures and animations are available at https://zenodo.org/records/18357280
Fauna Sprout: A lightweight, approachable, developer-ready humanoid robot
Recent advances in learned control, large-scale simulation, and generative models have accelerated progress toward general-purpose robotic controllers, yet the field still lacks platforms suitable for safe, expressive, long-term deployment in human environments. Most existing humanoids are either closed industrial systems or academic prototypes that are difficult to deploy and operate around people, limiting progress in robotics. We introduce Sprout, a developer platform designed to address these limitations through an emphasis on safety, expressivity, and developer accessibility. Sprout adopts a lightweight form factor with compliant control, limited joint torques, and soft exteriors to support safe operation in shared human spaces. The platform integrates whole-body control, manipulation with integrated grippers, and virtual-reality-based teleoperation within a unified hardware-software stack. An expressive head further enables social interaction -- a domain that remains underexplored on most utilitarian humanoids. By lowering physical and technical barriers to deployment, Sprout expands access to capable humanoid platforms and provides a practical basis for developing embodied intelligence in real human environments.
Toward Learning POMDPs Beyond Full-Rank Actions and State Observability
We are interested in enabling autonomous agents to learn and reason about systems with hidden states, such as furniture with hidden locking mechanisms. We cast this problem as learning the parameters of a discrete Partially Observable Markov Decision Process (POMDP). The agent begins with knowledge of the POMDP's actions and observation spaces, but not its state space, transitions, or observation models. These properties must be constructed from action-observation sequences. Spectral approaches to learning models of partially observable domains, such as learning Predictive State Representations (PSRs), are known to directly estimate the number of hidden states. These methods cannot, however, yield direct estimates of transition and observation likelihoods, which are important for many downstream reasoning tasks. Other approaches leverage tensor decompositions to estimate transition and observation likelihoods but often assume full state observability and full-rank transition matrices for all actions. To relax these assumptions, we study how PSRs learn transition and observation matrices up to a similarity transform, which may be estimated via tensor methods. Our method learns observation matrices and transition matrices up to a partition of states, where the states in a single partition have the same observation distributions corresponding to actions whose transition matrices are full-rank. Our experiments suggest that these partition-level transition models learned by our method, with a sufficient amount of data, meets the performance of PSRs as models to be used by standard sampling-based POMDP solvers. Furthermore, the explicit observation and transition likelihoods can be leveraged to specify planner behavior after the model has been learned.
DeFM: Learning Foundation Representations from Depth for Robotics
Depth sensors are widely deployed across robotic platforms, and advances in fast, high-fidelity depth simulation have enabled robotic policies trained on depth observations to achieve robust sim-to-real transfer for a wide range of tasks. Despite this, representation learning for depth modality remains underexplored compared to RGB, where large-scale foundation models now define the state of the art. To address this gap, we present DeFM, a self-supervised foundation model trained entirely on depth images for robotic applications. Using a DINO-style self-distillation objective on a curated dataset of 60M depth images, DeFM learns geometric and semantic representations that generalize to diverse environments, tasks, and sensors. To retain metric awareness across multiple scales, we introduce a novel input normalization strategy. We further distill DeFM into compact models suitable for resource-constrained robotic systems. When evaluated on depth-based classification, segmentation, navigation, locomotion, and manipulation benchmarks, DeFM achieves state-of-the-art performance and demonstrates strong generalization from simulation to real-world environments. We release all our pretrained models, which can be adopted off-the-shelf for depth-based robotic learning without task-specific fine-tuning. Webpage: https://de-fm.github.io/
comment: Under review, 19 pages, 15 Figures, 9 Tables
Learning the Pareto Space of Multi-Objective Autonomous Driving: A Modular, Data-Driven Approach
Balancing safety, efficiency, and interaction is fundamental to designing autonomous driving agents and to understanding autonomous vehicle (AV) behavior in real-world operation. This study introduces an empirical learning framework that derives these trade-offs directly from naturalistic trajectory data. A unified objective space represents each AV timestep through composite scores of safety, efficiency, and interaction. Pareto dominance is applied to identify non-dominated states, forming an empirical frontier that defines the attainable region of balanced performance. The proposed framework was demonstrated using the Third Generation Simulation (TGSIM) datasets from Foggy Bottom and I-395. Results showed that only 0.23\% of AV driving instances were Pareto-optimal, underscoring the rarity of simultaneous optimization across objectives. Pareto-optimal states showed notably higher mean scores for safety, efficiency, and interaction compared to non-optimal cases, with interaction showing the greatest potential for improvement. This minimally invasive and modular framework, which requires only kinematic and positional data, can be directly applied beyond the scope of this study to derive and visualize multi-objective learning surfaces
comment: Accepted for presentation/publication at the The IEEE Intelligent Vehicles Symposium of 2026 (IEEE IV 2026)
DGFusion: Depth-Guided Sensor Fusion for Robust Semantic Perception
Robust semantic perception for autonomous vehicles relies on effectively combining multiple sensors with complementary strengths and weaknesses. State-of-the-art sensor fusion approaches to semantic perception often treat sensor data uniformly across the spatial extent of the input, which hinders performance when faced with challenging conditions. By contrast, we propose a novel depth-guided multimodal fusion method that upgrades condition-aware fusion by integrating depth information. Our network, DGFusion, poses multimodal segmentation as a multi-task problem, utilizing the lidar measurements, which are typically available in outdoor sensor suites, both as one of the model's inputs and as ground truth for learning depth. Our corresponding auxiliary depth head helps to learn depth-aware features, which are encoded into spatially varying local depth tokens that condition our attentive cross-modal fusion. Together with a global condition token, these local depth tokens dynamically adapt sensor fusion to the spatially varying reliability of each sensor across the scene, which largely depends on depth. In addition, we propose a robust loss for our depth, which is essential for learning from lidar inputs that are typically sparse and noisy in adverse conditions. Our method achieves state-of-the-art panoptic and semantic segmentation performance on the challenging MUSES and DeLiVER datasets. Code and models are available at https://github.com/timbroed/DGFusion
comment: Code and models are available at https://github.com/timbroed/DGFusion
Goal-oriented Semantic Communication for Robot Arm Reconstruction in Digital Twin: Feature and Temporal Selections
As one of the most promising technologies in industry, the Digital Twin (DT) facilitates real-time monitoring and predictive analysis for real-world systems by precisely reconstructing virtual replicas of physical entities. However, this reconstruction faces unprecedented challenges due to the everincreasing communication overhead, especially for digital robot arm reconstruction. To this end, we propose a novel goal-oriented semantic communication (GSC) framework to extract the GSC information for the robot arm reconstruction task in the DT, with the aim of minimising the communication load under the strict and relaxed reconstruction error constraints. Unlike the traditional reconstruction framework that periodically transmits a reconstruction message for real-time DT reconstruction, our framework implements a feature selection (FS) algorithm to extract the semantic information from the reconstruction message, and a deep reinforcement learning-based temporal selection algorithm to selectively transmit the semantic information over time. We validate our proposed GSC framework through both Pybullet simulations and lab experiments based on the Franka Research 3 robot arm. For a range of distinct robotic tasks, simulation results show that our framework can reduce the communication load by at least 59.5% under strict reconstruction error constraints and 80% under relaxed reconstruction error constraints, compared with traditional communication framework. Also, experimental results confirm the effectiveness of our framework, where the communication load is reduced by 53% in strict constraint case and 74% in relaxed constraint case. The demo is available at: https://youtu.be/2OdeHKxcgnk.
comment: Accepted by IEEE Journal on Selected Areas in Communications
Actor-Critic Cooperative Compensation to Model Predictive Control for Off-Road Autonomous Vehicles Under Unknown Dynamics ICRA
This study presents an Actor-Critic Cooperative Compensated Model Predictive Controller (AC3MPC) designed to address unknown system dynamics. To avoid the difficulty of modeling highly complex dynamics and ensuring realtime control feasibility and performance, this work uses deep reinforcement learning with a model predictive controller in a cooperative framework to handle unknown dynamics. The model-based controller takes on the primary role as both controllers are provided with predictive information about the other. This improves tracking performance and retention of inherent robustness of the model predictive controller. We evaluate this framework for off-road autonomous driving on unknown deformable terrains that represent sandy deformable soil, sandy and rocky soil, and cohesive clay-like deformable soil. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers by upto 29.2% and 10.2%. This framework generalized well over varied and previously unseen terrain characteristics to track longitudinal reference speeds with lower errors. Furthermore, this required significantly less training data compared to purely learning-based controller, while delivering better performance even when under-trained.
comment: 7 pages, Accepted at 2025 IEEE ICRA
A Computationally Efficient Maximum A Posteriori Sequence Estimation via Stein Variational Inference
State estimation in robotic systems presents significant challenges, particularly due to the prevalence of multimodal posterior distributions in real-world scenarios. One effective strategy for handling such complexity is to compute maximum a posteriori (MAP) sequences over a discretized or sampled state space, which enables a concise representation of the most likely state trajectory. However, this approach often incurs substantial computational costs, especially in high-dimensional settings. In this article, we propose a novel MAP sequence estimation method, Stein-MAP-Seq, which effectively addresses multimodality while substantially reducing computational and memory overhead. Our key contribution is a sequential variational inference framework that captures temporal dependencies in dynamical system models and integrates Stein variational gradient descent (SVGD) into a Viterbi-style dynamic programming algorithm, enabling computationally efficient MAP sequence estimation. This integration allows the method to focus computational effort on MAP-consistent modes rather than exhaustively exploring the entire state space. Stein-MAP-Seq inherits the parallelism and mode-seeking behavior of SVGD, allowing particle updates to be efficiently executed on parallel hardware and significantly reducing the number of trajectory candidates required for MAP-sequence recursion compared to conventional methods that rely on hundreds to thousands of particles. We validate the proposed approach on a range of highly multimodal scenarios, including nonlinear dynamics with ambiguous observations, unknown data association with outliers, range-only localization under temporary unobservability, and high-dimensional robotic manipulators. Experimental results demonstrate substantial improvements in estimation accuracy and robustness to multimodality over existing estimation methods.
comment: 20 pages
Estimating the Joint Probability of Scenario Parameters with Gaussian Mixture Copula Models
This paper presents the first application of Gaussian Mixture Copula Models to the statistical modeling of driving scenarios for the safety validation of automated driving systems. Knowledge of the joint probability distribution of scenario parameters is essential for scenario-based safety assessment, where risk quantification depends on the likelihood of concrete parameter combinations. Gaussian Mixture Copula Models bring together the multimodal expressivity of Gaussian Mixture Models and the flexibility of copulas, enabling separate modeling of marginal distributions and dependence. We benchmark Gaussian Mixture Copula Models against previously proposed approaches - Gaussian Mixture Models and Gaussian Copula Models - using real-world driving data drawn from two scenarios defined in United Nations Regulation No. 157. Our evaluation on approximately 18 million instances of these two scenarios demonstrates that Gaussian Mixture Copula Models consistently surpass Gaussian Copula Models and perform competitively with Gaussian Mixture Models, as measured by both log-likelihood and Sinkhorn distance, with relative performance depending on the scenario. The results are promising for the adoption of Gaussian Mixture Copula Models as a statistical foundation for future scenario-based validation frameworks.
comment: 9 pages, 4 figures; This work has been submitted to the IEEE for possible publication; Code available at: https://codeocean.com/capsule/1003615/tree
Masked Generative Policy for Robotic Control
We present Masked Generative Policy (MGP), a novel framework for visuomotor imitation learning. We represent actions as discrete tokens, and train a conditional masked transformer that generates tokens in parallel and then rapidly refines only low-confidence tokens. We further propose two new sampling paradigms: MGP-Short, which performs parallel masked generation with score-based refinement for Markovian tasks, and MGP-Long, which predicts full trajectories in a single pass and dynamically refines low-confidence action tokens based on new observations. With globally coherent prediction and robust adaptive execution capabilities, MGP-Long enables reliable control on complex and non-Markovian tasks that prior methods struggle with. Extensive evaluations on 150 robotic manipulation tasks spanning the Meta-World and LIBERO benchmarks show that MGP achieves both rapid inference and superior success rates compared to state-of-the-art diffusion and autoregressive policies. Specifically, MGP increases the average success rate by 9% across 150 tasks while cutting per-sequence inference time by up to 35x. It further improves the average success rate by 60% in dynamic and missing-observation environments, and solves two non-Markovian scenarios where other state-of-the-art methods fail.
Gaussian Variational Inference with Non-Gaussian Factors for State Estimation: A UWB Localization Case Study
This letter extends the exactly sparse Gaussian variational inference (ESGVI) algorithm for state estimation in two complementary directions. First, ESGVI is generalized to operate on matrix Lie groups, enabling the estimation of states with orientation components while respecting the underlying group structure. Second, factors are introduced to accommodate heavy-tailed and skewed noise distributions, as commonly encountered in ultra-wideband (UWB) localization due to non-line-of-sight (NLOS) and multipath effects. Both extensions are shown to integrate naturally within the ESGVI framework while preserving its sparse and derivative-free structure. The proposed approach is validated in a UWB localization experiment with NLOS-rich measurements, demonstrating improved accuracy and comparable consistency. Finally, a Python implementation within a factor-graph-based estimation framework is made open-source (https://github.com/decargroup/gvi_ws) to support broader research use.
Cross-Platform Scaling of Vision-Language-Action Models from Edge to Cloud GPUs
Vision-Language-Action (VLA) models have emerged as powerful generalist policies for robotic control, yet their performance scaling across model architectures and hardware platforms, as well as their associated power budgets, remain poorly understood. This work presents an evaluation of five representative VLA models -- spanning state-of-the-art baselines and two newly proposed architectures -- targeting edge and datacenter GPU platforms. Using the LIBERO benchmark, we measure accuracy alongside system-level metrics, including latency, throughput, and peak memory usage, under varying edge power constraints and high-performance datacenter GPU configurations. Our results identify distinct scaling trends: (1) architectural choices, such as action tokenization and model backbone size, strongly influence throughput and memory footprint; (2) power-constrained edge devices exhibit non-linear performance degradation, with some configurations matching or exceeding older datacenter GPUs; and (3) high-throughput variants can be achieved without significant accuracy loss. These findings provide actionable insights when selecting and optimizing VLAs across a range of deployment constraints. Our work challenges current assumptions about the superiority of datacenter hardware for robotic inference.
comment: To appear in the Asilomar Conference on Signals, Systems, and Computers 2025
DecompGAIL: Learning Realistic Traffic Behaviors with Decomposed Multi-Agent Generative Adversarial Imitation Learning ICLR
Realistic traffic simulation is critical for the development of autonomous driving systems and urban mobility planning, yet existing imitation learning approaches often fail to model realistic traffic behaviors. Behavior cloning suffers from covariate shift, while Generative Adversarial Imitation Learning (GAIL) is notoriously unstable in multi-agent settings. We identify a key source of this instability: irrelevant interaction misguidance, where a discriminator penalizes an ego vehicle's realistic behavior due to unrealistic interactions among its neighbors. To address this, we propose Decomposed Multi-agent GAIL (DecompGAIL), which explicitly decomposes realism into ego-map and ego-neighbor components, filtering out misleading neighbor: neighbor and neighbor: map interactions. We further introduce a social PPO objective that augments ego rewards with distance-weighted neighborhood rewards, encouraging overall realism across agents. Integrated into a lightweight SMART-based backbone, DecompGAIL achieves state-of-the-art performance on the WOMD Sim Agents 2025 benchmark.
comment: accepted by ICLR
GPA-VGGT:Adapting VGGT to Large Scale Localization by Self-Supervised Learning with Geometry and Physics Aware Loss
Transformer-based general visual geometry frameworks have shown promising performance in camera pose estimation and 3D scene understanding. Recent advancements in Visual Geometry Grounded Transformer (VGGT) models have shown great promise in camera pose estimation and 3D reconstruction. However, these models typically rely on ground truth labels for training, posing challenges when adapting to unlabeled and unseen scenes. In this paper, we propose a self-supervised framework to train VGGT with unlabeled data, thereby enhancing its localization capability in large-scale environments. To achieve this, we extend conventional pair-wise relations to sequence-wise geometric constraints for self-supervised learning. Specifically, in each sequence, we sample multiple source frames and geometrically project them onto different target frames, which improves temporal feature consistency. We formulate physical photometric consistency and geometric constraints as a joint optimization loss to circumvent the requirement for hard labels. By training the model with this proposed method, not only the local and global cross-view attention layers but also the camera and depth heads can effectively capture the underlying multi-view geometry. Experiments demonstrate that the model converges within hundreds of iterations and achieves significant improvements in large-scale localization. Our code will be released at https://github.com/X-yangfan/GPA-VGGT.
Close-Fitting Dressing Assistance Based on State Estimation of Feet and Garments with Semantic-based Visual Attention
As the population continues to age, a shortage of caregivers is expected in the future. Dressing assistance, in particular, is crucial for opportunities for social participation. Especially dressing close-fitting garments, such as socks, remains challenging due to the need for fine force adjustments to handle the friction or snagging against the skin, while considering the shape and position of the garment. This study introduces a method uses multi-modal information including not only robot's camera images, joint angles, joint torques, but also tactile forces for proper force interaction that can adapt to individual differences in humans. Furthermore, by introducing semantic information based on object concepts, rather than relying solely on RGB data, it can be generalized to unseen feet and background. In addition, incorporating depth data helps infer relative spatial relationship between the sock and the foot. To validate its capability for semantic object conceptualization and to ensure safety, training data were collected using a mannequin, and subsequent experiments were conducted with human subjects. In experiments, the robot successfully adapted to previously unseen human feet and was able to put socks on 10 participants, achieving a higher success rate than Action Chunking with Transformer and Diffusion Policy. These results demonstrate that the proposed model can estimate the state of both the garment and the foot, enabling precise dressing assistance for close-fitting garments.
comment: Accepted at RA-L, 2026
Foundation Models in Autonomous Driving: A Survey on Scenario Generation and Scenario Analysis
For autonomous vehicles, safe navigation in complex environments depends on handling a broad range of diverse and rare driving scenarios. Simulation- and scenario-based testing have emerged as key approaches to development and validation of autonomous driving systems. Traditional scenario generation relies on rule-based systems, knowledge-driven models, and data-driven synthesis, often producing limited diversity and unrealistic safety-critical cases. With the emergence of foundation models, which represent a new generation of pre-trained, general-purpose AI models, developers can process heterogeneous inputs (e.g., natural language, sensor data, HD maps, and control actions), enabling the synthesis and interpretation of complex driving scenarios. In this paper, we conduct a survey about the application of foundation models for scenario generation and scenario analysis in autonomous driving (as of May 2025). Our survey presents a unified taxonomy that includes large language models, vision-language models, multimodal large language models, diffusion models, and world models for the generation and analysis of autonomous driving scenarios. In addition, we review the methodologies, open-source datasets, simulation platforms, and benchmark challenges, and we examine the evaluation metrics tailored explicitly to scenario generation and analysis. Finally, the survey concludes by highlighting the open challenges and research questions, and outlining promising future research directions. All reviewed papers are listed in a continuously maintained repository, which contains supplementary materials and is available at https://github.com/TUM-AVS/FM-for-Scenario-Generation-Analysis.
comment: Final version (Accepted by the IEEE Open Journal of Intelligent Transportation Systems)
ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation
Existing methods for multi-agent navigation typically assume fully known environments, offering limited support for partially known scenarios such as warehouses or factory floors. There, agents may need to plan trajectories that balance their own path optimality with their ability to collect and share information about the environment that can help their teammates reach their own goals. To these ends, we propose ORION, a novel deep reinforcement learning framework for cooperative multi-agent online navigation in partially known environments. Starting from an imperfect prior map, ORION trains agents to make decentralized decisions, coordinate to reach their individual targets, and actively reduce map uncertainty by sharing online observations in a closed perception-action loop. We first design a shared graph encoder that fuses prior map with online perception into a unified representation, providing robust state embeddings under dynamic map discrepancies. At the core of ORION is an option-critic framework that learns to reason about a set of high-level cooperative modes that translate into sequences of low-level actions, allowing agents to switch between individual navigation and team-level exploration adaptively. We further introduce a dual-stage cooperation strategy that enables agents to assist teammates under map uncertainty, thereby reducing the overall makespan. Across extensive maze-like maps and large-scale warehouse environments, our simulation results show that ORION achieves high-quality, real-time decentralized cooperation over varying team sizes, outperforming state-of-the-art classical and learning-based baselines. Finally, we validate ORION on physical robot teams, demonstrating its robustness and practicality for real-world cooperative navigation.
Safe Learning for Contact-Rich Robot Tasks: A Survey from Classical Learning-Based Methods to Safe Foundation Models
Contact-rich tasks pose significant challenges for robotic systems due to inherent uncertainty, complex dynamics, and the high risk of damage during interaction. Recent advances in learning-based control have shown great potential in enabling robots to acquire and generalize complex manipulation skills in such environments, but ensuring safety, both during exploration and execution, remains a critical bottleneck for reliable real-world deployment. This survey provides a comprehensive overview of safe learning-based methods for robot contact-rich tasks. We categorize existing approaches into two main domains: safe exploration and safe execution. We review key techniques, including constrained reinforcement learning, risk-sensitive optimization, uncertainty-aware modeling, control barrier functions, and model predictive safety shields, and highlight how these methods incorporate prior knowledge, task structure, and online adaptation to balance safety and efficiency. A particular emphasis of this survey is on how these safe learning principles extend to and interact with emerging robotic foundation models, especially vision-language models (VLMs) and vision-language-action models (VLAs), which unify perception, language, and control for contact-rich manipulation. We discuss both the new safety opportunities enabled by VLM/VLA-based methods, such as language-level specification of constraints and multimodal grounding of safety signals, and the amplified risks and evaluation challenges they introduce. Finally, we outline current limitations and promising future directions toward deploying reliable, safety-aligned, and foundation-model-enabled robots in complex contact-rich environments. More details and materials are available at our \href{ https://github.com/jack-sherman01/Awesome-Learning4Safe-Contact-rich-tasks}{Project GitHub Repository}.
comment: version 2
Contact SLAM: An Active Tactile Exploration Policy Based on Physical Reasoning Utilized in Robotic Fine Blind Manipulation Tasks
Contact-rich manipulation is difficult for robots to execute and requires accurate perception of the environment. In some scenarios, vision is occluded. The robot can then no longer obtain real-time scene state information through visual feedback. This is called ``blind manipulation". In this manuscript, a novel physically-driven contact cognition method, called ``Contact SLAM", is proposed. It estimates the state of the environment and achieves manipulation using only tactile sensing and prior knowledge of the scene. To maximize exploration efficiency, this manuscript also designs an active exploration policy. The policy gradually reduces uncertainties in the manipulation scene. The experimental results demonstrated the effectiveness and accuracy of the proposed method in several contact-rich tasks, including the difficult and delicate socket assembly task and block-pushing task.
comment: 9 pages, 9 figures
EquiContact: A Hierarchical SE(3) Vision-to-Force Equivariant Policy for Spatially Generalizable Contact-rich Tasks
This paper presents a framework for learning vision-based robotic policies for contact-rich manipulation tasks that generalize spatially across task configurations. We focus on achieving robust spatial generalization of the policy for the peg-in-hole (PiH) task trained from a small number of demonstrations. We propose EquiContact, a hierarchical policy composed of a high-level vision planner (Diffusion Equivariant Descriptor Field, Diff-EDF) and a novel low-level compliant visuomotor policy (Geometric Compliant ACT, G-CompACT). G-CompACT operates using only localized observations (geometrically consistent error vectors (GCEV), force-torque readings, and wrist-mounted RGB images) and produces actions defined in the end-effector frame. Through these design choices, we show that the entire EquiContact pipeline is SE(3)-equivariant, from perception to force control. We also outline three key components for spatially generalizable contact-rich policies: compliance, localized policies, and induced equivariance. Real-world experiments on PiH, screwing, and surface wiping tasks demonstrate a near-perfect success rate and robust generalization to unseen spatial configurations, validating the proposed framework and principles.
comment: Submitted to RSS
Towards Real-time Adaptation of Embodied Agent in Human-Robot Collaboration
Large Language Models (LLMs) have opened transformative possibilities for human-robot collaboration. However, enabling real-time collaboration requires both low latency and robust reasoning, and most LLMs suffer from high latency. To address this gap, we first propose a fine-grained benchmark that explicitly assesses agents' proactive adaptability and temporal responsiveness in the Overcooked-AI environment. Based on evaluation results, we propose MonTA (Monitor-then-Adapt), a hierarchical framework inspired by cognitive science research. MonTA contains three key modules: a lightweight Monitor that operates at high frequency (7 Hz) to detect adaptation needs, and two proficient Adapters for subtask and path adaptation reasoning that provide instructions to humans at a lower frequency. Our results demonstrate that MonTA significantly outperforms baseline agents on our proposed benchmark, achieving superior performance across layouts with varying teaming fluency. User studies confirm the high reasonableness of adaptation plans and consistent language instructions provided by our framework to humans.
comment: 13 pages, 7 figures
Equi-RO: A 4D mmWave Radar Odometry via Equivariant Networks
Autonomous vehicles and robots rely on accurate odometry estimation in GPS-denied environments. While LiDARs and cameras struggle under extreme weather, 4D mmWave radar emerges as a robust alternative with all-weather operability and velocity measurement. In this paper, we introduce Equi-RO, an equivariant network-based framework for 4D radar odometry. Our algorithm pre-processes Doppler velocity into invariant node and edge features in the graph, and employs separate networks for equivariant and invariant feature processing. A graph-based architecture enhances feature aggregation in sparse radar data, improving inter-frame correspondence. Experiments on an open-source dataset and a self-collected dataset show Equi-RO outperforms state-of-the-art algorithms in accuracy and robustness. Overall, our method achieves 10.7% and 13.4% relative improvements in translation and rotation accuracy, respectively, compared to the best baseline on the open-source dataset.
PneuGelSight: Soft Robotic Vision-Based Proprioception and Tactile Sensing
Soft pneumatic robot manipulators are popular in industrial and human-interactive applications due to their compliance and flexibility. However, deploying them in real-world scenarios requires advanced sensing for tactile feedback and proprioception. Our work presents a novel vision-based approach for sensorizing soft robots. We demonstrate our approach on PneuGelSight, a pioneering pneumatic manipulator featuring high-resolution proprioception and tactile sensing via an embedded camera. To optimize the sensor's performance, we introduce a comprehensive pipeline that accurately simulates its optical and dynamic properties, facilitating a zero-shot knowledge transition from simulation to real-world applications. PneuGelSight and our sim-to-real pipeline provide a novel, easily implementable, and robust sensing methodology for soft robots, paving the way for the development of more advanced soft robots with enhanced sensory capabilities.
comment: 16 pages, 12 figures, International Journal of Robotics Research (accepted), 2025
Multiagent Systems
AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
When humans face problems beyond their immediate capabilities, they rely on tools, providing a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce \textbf{AdaReasoner}, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these components allow models to infer tool utility from task context and intermediate outcomes, enabling coordination of multiple tools and generalization to unseen tools. Empirically, AdaReasoner exhibits strong tool-adaptive and generalization behaviors: it autonomously adopts beneficial tools, suppresses irrelevant ones, and adjusts tool usage frequency based on task demands, despite never being explicitly trained to do so. These capabilities translate into state-of-the-art performance across challenging benchmarks, improving the 7B base model by +24.9\% on average and surpassing strong proprietary systems such as GPT-5 on multiple tasks, including VSP and Jigsaw.
comment: 28 pages, 10 figures and 13 tables
Emergent Cooperation in Quantum Multi-Agent Reinforcement Learning Using Communication
Emergent cooperation in classical Multi-Agent Reinforcement Learning has gained significant attention, particularly in the context of Sequential Social Dilemmas (SSDs). While classical reinforcement learning approaches have demonstrated capability for emergent cooperation, research on extending these methods to Quantum Multi-Agent Reinforcement Learning remains limited, particularly through communication. In this paper, we apply communication approaches to quantum Q-Learning agents: the Mutual Acknowledgment Token Exchange (MATE) protocol, its extension Mutually Endorsed Distributed Incentive Acknowledgment Token Exchange (MEDIATE), the peer rewarding mechanism Gifting, and Reinforced Inter-Agent Learning (RIAL). We evaluate these approaches in three SSDs: the Iterated Prisoner's Dilemma, Iterated Stag Hunt, and Iterated Game of Chicken. Our experimental results show that approaches using MATE with temporal-difference measure (MATE\textsubscript{TD}), AutoMATE, MEDIATE-I, and MEDIATE-S achieved high cooperation levels across all dilemmas, demonstrating that communication is a viable mechanism for fostering emergent cooperation in Quantum Multi-Agent Reinforcement Learning.
comment: Accepted at IEEE ICC 2026
Dicey Games: Shared Sources of Randomness in Distributed Systems
Consider a 4-player version of Matching Pennies where a team of three players competes against the Devil. Each player simultaneously says "Heads" or "Tails". The team wins if all four choices match; otherwise the Devil wins. If all team players randomise independently, they win with probability 1/8; if all players share a common source of randomness, they win with probability 1/2. What happens when each pair of team players shares a source of randomness? Can the team do better than win with probability 1/4? The surprising (and nontrivial) answer is yes! We introduce Dicey Games, a formal framework motivated by the study of distributed systems with shared sources of randomness (of which the above example is a specific instance). We characterise the existence, representation and computational complexity of optimal strategies in Dicey Games, and we study the problem of allocating limited sources of randomness optimally within a team.
comment: 16 pages, 9 figures
VissimRL: A Multi-Agent Reinforcement Learning Framework for Traffic Signal Control Based on Vissim
Traffic congestion remains a major challenge for urban transportation, leading to significant economic and environmental impacts. Traffic Signal Control (TSC) is one of the key measures to mitigate congestion, and recent studies have increasingly applied Reinforcement Learning (RL) for its adaptive capabilities. With respect to SUMO and CityFlow, the simulator Vissim offers high-fidelity driver behavior modeling and wide industrial adoption but remains underutilized in RL research due to its complex interface and lack of standardized frameworks. To address this gap, this paper proposes VissimRL, a modular RL framework for TSC that encapsulates Vissim's COM interface through a high-level Python API, offering standardized environments for both single- and multi-agent training. Experiments show that VissimRL significantly reduces development effort while maintaining runtime efficiency, and supports consistent improvements in traffic performance during training, as well as emergent coordination in multi-agent control. Overall, VissimRL demonstrates the feasibility of applying RL in high-fidelity simulations and serves as a bridge between academic research and practical applications in intelligent traffic signal control.
Learning the Pareto Space of Multi-Objective Autonomous Driving: A Modular, Data-Driven Approach
Balancing safety, efficiency, and interaction is fundamental to designing autonomous driving agents and to understanding autonomous vehicle (AV) behavior in real-world operation. This study introduces an empirical learning framework that derives these trade-offs directly from naturalistic trajectory data. A unified objective space represents each AV timestep through composite scores of safety, efficiency, and interaction. Pareto dominance is applied to identify non-dominated states, forming an empirical frontier that defines the attainable region of balanced performance. The proposed framework was demonstrated using the Third Generation Simulation (TGSIM) datasets from Foggy Bottom and I-395. Results showed that only 0.23\% of AV driving instances were Pareto-optimal, underscoring the rarity of simultaneous optimization across objectives. Pareto-optimal states showed notably higher mean scores for safety, efficiency, and interaction compared to non-optimal cases, with interaction showing the greatest potential for improvement. This minimally invasive and modular framework, which requires only kinematic and positional data, can be directly applied beyond the scope of this study to derive and visualize multi-objective learning surfaces
comment: Accepted for presentation/publication at the The IEEE Intelligent Vehicles Symposium of 2026 (IEEE IV 2026)
LLM-Enhanced Multi-Agent Reinforcement Learning with Expert Workflow for Real-Time P2P Energy Trading
Real-time peer-to-peer (P2P) electricity markets dynamically adapt to fluctuations in renewable energy and variations in demand, maximizing economic benefits through instantaneous price responses while enhancing grid flexibility. However, scaling expert guidance for massive personalized prosumers poses critical challenges, including diverse decision-making demands and a lack of customized modeling frameworks. This paper proposes an integrated large language model-multi-agent reinforcement learning (LLM-MARL) framework for real-time P2P energy trading to address challenges such as the limited technical capability of prosumers, the lack of expert experience, and security issues of distribution networks. LLMs are introduced as experts to generate personalized strategies, guiding MARL under the centralized training with decentralized execution (CTDE) paradigm through imitation. To handle the scalability issues inherent in large-scale P2P networks, a differential attention-based critic network is introduced to efficiently extract key interaction features and enhance convergence. Experimental results demonstrate that LLM-generated strategies effectively substitute human experts. The proposed imitative expert MARL algorithms achieve significantly lower economic costs and voltage violation rates on test sets compared to baseline algorithms, while maintaining robust stability. This paper provides an effective solution for the real-time decision-making of the P2P electricity market by bridging expert knowledge with agent learning.
SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
Existing benchmarks for AI coding agents focus on isolated, single-issue tasks such as fixing a bug or implementing a small feature. However, real-world software engineering is fundamentally a long-horizon endeavor: developers must interpret high-level requirements, plan coordinated changes across many files, and evolve codebases over multiple iterations while preserving existing functionality. We introduce SWE-EVO, a benchmark that evaluates agents on this long-horizon software evolution challenge. Constructed from release notes and version histories of seven mature open-source Python projects, SWE-EVO comprises 48 evolution tasks that require agents to implement multi-step modifications spanning an average of 21 files, validated against comprehensive test suites averaging 874 tests per instance. Experiments with state-of-the-art models reveal a striking capability gap: even GPT-5 with OpenHands achieves only a 21 percent resolution rate on SWE-EVO, compared to 65 percent on the single-issue SWE-Bench Verified. This demonstrates that current agents struggle with sustained, multi-file reasoning. We also propose Fix Rate, a fine-grained metric that captures partial progress toward solving these complex, long-horizon tasks.
On the Fundamental Limits of LLMs at Scale
Large Language Models (LLMs) have benefited enormously from scaling, yet these gains are bounded by five fundamental limitations: (1) hallucination, (2) context compression, (3) reasoning degradation, (4) retrieval fragility, and (5) multimodal misalignment. While existing surveys describe these phenomena empirically, they lack a rigorous theoretical synthesis connecting them to the foundational limits of computation, information, and learning. This work closes that gap by presenting a unified, proof-informed framework that formalizes the innate theoretical ceilings of LLM scaling. First, computability and uncomputability imply an irreducible residue of error: for any computably enumerable model family, diagonalization guarantees inputs on which some model must fail, and undecidable queries (e.g., halting-style tasks) induce infinite failure sets for all computable predictors. Second, information-theoretic and statistical constraints bound attainable accuracy even on decidable tasks, finite description length enforces compression error, and long-tail factual knowledge requires prohibitive sample complexity. Third, geometric and computational effects compress long contexts far below their nominal size due to positional under-training, encoding attenuation, and softmax crowding. We further show how likelihood-based training favors pattern completion over inference, how retrieval under token limits suffers from semantic drift and coupling noise, and how multimodal scaling inherits shallow cross-modal alignment. Across sections, we pair theorems and empirical evidence to outline where scaling helps, where it saturates, and where it cannot progress, providing both theoretical foundations and practical mitigation paths like bounded-oracle retrieval, positional curricula, and sparse or hierarchical attention.
comment: Submitted to TMLR 2025
CooperBench: Why Coding Agents Cannot be Your Teammates Yet
Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This contrasts sharply with human teams, where adding teammates typically improves productivity. Our analysis reveals three key issues: (1) communication channels become jammed with vague, ill-timed, and inaccurate messages; (2) even with effective communication, agents deviate from their commitments; and (3) agents often hold incorrect expectations about others' plans and communication. Through large-scale simulation, we also observe rare but interesting emergent coordination behavior including role division, resource division, and negotiation. Our research presents a novel benchmark for collaborative coding and calls for a shift from pursuing individual agent capability to developing social intelligence.
comment: https://cooperbench.com First two authors contribute equally. The 3th - 6th authors contribute equally
Neural Cellular Automata: From Cells to Pixels
Neural Cellular Automata (NCAs) are bio-inspired dynamical systems in which identical cells iteratively apply a learned local update rule to self-organize into complex patterns, exhibiting regeneration, robustness, and spontaneous dynamics. Despite their success in texture synthesis and morphogenesis, NCAs remain largely confined to low-resolution outputs. This limitation stems from (1) training time and memory requirements that grow quadratically with grid size, (2) the strictly local propagation of information that impedes long-range cell communication, and (3) the heavy compute demands of real-time inference at high resolution. In this work, we overcome this limitation by pairing an NCA that evolves on a coarse grid with a lightweight implicit decoder that maps cell states and local coordinates to appearance attributes, enabling the same model to render outputs at arbitrary resolution. Moreover, because both the decoder and NCA updates are local, inference remains highly parallelizable. To supervise high-resolution outputs efficiently, we introduce task-specific losses for morphogenesis (growth from a seed) and texture synthesis with minimal additional memory and computation overhead. Our experiments across 2D/3D grids and mesh domains demonstrate that our hybrid models produce high-resolution outputs in real-time, and preserve the characteristic self-organizing behavior of NCAs.
comment: 9 pages, 14 figures, +7 pages of Appendix
Systems and Control (EESS)
Multi-Objective Reinforcement Learning for Efficient Tactical Decision Making for Trucks in Highway Traffic
Balancing safety, efficiency, and operational costs in highway driving poses a challenging decision-making problem for heavy-duty vehicles. A central difficulty is that conventional scalar reward formulations, obtained by aggregating these competing objectives, often obscure the structure of their trade-offs. We present a Proximal Policy Optimization based multi-objective reinforcement learning framework that learns a continuous set of policies explicitly representing these trade-offs and evaluates it on a scalable simulation platform for tactical decision making in trucks. The proposed approach learns a continuous set of Pareto-optimal policies that capture the trade-offs among three conflicting objectives: safety, quantified in terms of collisions and successful completion; energy efficiency and time efficiency, quantified using energy cost and driver cost, respectively. The resulting Pareto frontier is smooth and interpretable, enabling flexibility in choosing driving behavior along different conflicting objectives. This framework allows seamless transitions between different driving policies without retraining, yielding a robust and adaptive decision-making strategy for autonomous trucking applications.
AI-Driven Fuzzing for Vulnerability Assessment of 5G Traffic Steering Algorithms
Traffic Steering (TS) dynamically allocates user traffic across cells to enhance Quality of Experience (QoE), load balance, and spectrum efficiency in 5G networks. However, TS algorithms remain vulnerable to adversarial conditions such as interference spikes, handover storms, and localized outages. To address this, an AI-driven fuzz testing framework based on the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is proposed to systematically expose hidden vulnerabilities. Using NVIDIA Sionna, five TS algorithms are evaluated across six scenarios. Results show that AI-driven fuzzing detects 34.3% more total vulnerabilities and 5.8% more critical failures than traditional testing, achieving superior diversity and edge-case discovery. The observed variance in critical failure detection underscores the stochastic nature of rare vulnerabilities. These findings demonstrate that AI-driven fuzzing offers an effective and scalable validation approach for improving TS algorithm robustness and ensuring resilient 6G-ready networks.
ART for Diffusion Sampling: A Reinforcement Learning Approach to Timestep Schedule
We consider time discretization for score-based diffusion models to generate samples from a learned reverse-time dynamic on a finite grid. Uniform and hand-crafted grids can be suboptimal given a budget on the number of time steps. We introduce Adaptive Reparameterized Time (ART) that controls the clock speed of a reparameterized time variable, leading to a time change and uneven timesteps along the sampling trajectory while preserving the terminal time. The objective is to minimize the aggregate error arising from the discretized Euler scheme. We derive a randomized control companion, ART-RL, and formulate time change as a continuous-time reinforcement learning (RL) problem with Gaussian policies. We then prove that solving ART-RL recovers the optimal ART schedule, which in turn enables practical actor--critic updates to learn the latter in a data-driven way. Empirically, based on the official EDM pipeline, ART-RL improves Fréchet Inception Distance on CIFAR-10 over a wide range of budgets and transfers to AFHQv2, FFHQ, and ImageNet without the need of retraining.
comment: 17 pages, 7 figures
Synchronization and Localization in Ad-Hoc ICAS Networks Using a Two-Stage Kuramoto Method
To enable Integrated Communications and Sensing (ICAS) in a peer-to-peer vehicular network, precise synchronization in frequency and phase among the communicating entities is required. In addition, self-driving cars need accurate position estimates of the surrounding vehicles. In this work, we propose a joint, distributed synchronization and localization scheme for a network of communicating entities. Our proposed scheme is mostly signal-agnostic and therefore can be applied to a wide range of possible ICAS signals. We also mitigate the effect of finite sampling frequencies, which otherwise would degrade the synchronization and localization performance severely.
comment: 6 pages, conference
Experimental Characterization of ISAC Channel Mapping and Environment Awareness
In the context of integrated sensing and communications (ISAC), this paper presents an experimental investigation of the relationship between monostatic sensing and naturally bistatic communication channels in an indoor millimeter-wave environment. We characterize the propagation channel in the joint delay--angle domain, extract dominant multipath components (MPCs) and associate them with physical scatterers in the environment, and demonstrate how communication MPCs can be explicitly recovered from sensing channels. Finally, the radar cross-sections (RCSs) of two key scatterers, namely the wall and metal plate, are obtained based on calibrated channel power and reconstructed propagation distances.
comment: 4 pages, 6 figures, submitted to URSI-GASS 2026
Real-Time Prediction of Lower Limb Joint Kinematics, Kinetics, and Ground Reaction Force using Wearable Sensors and Machine Learning
Walking is a key movement of interest in biomechanics, yet gold-standard data collection methods are time- and cost-expensive. This paper presents a real-time, multimodal, high sample rate lower-limb motion capture framework, based on wireless wearable sensors and machine learning algorithms. Random Forests are used to estimate joint angles from IMU data, and ground reaction force (GRF) is predicted from instrumented insoles, while joint moments are predicted from angles and GRF using deep learning based on the ResNet-16 architecture. All three models achieve good accuracy compared to literature, and the predictions are logged at 1 kHz with a minimal delay of 23 ms for 20s worth of input data. The present work fully relies on wearable sensors, covers all five major lower limb joints, and provides multimodal comprehensive estimations of GRF, joint angles, and moments with minimal delay suitable for biofeedback applications.
Convex Chance-Constrained Stochastic Control under Uncertain Specifications with Application to Learning-Based Hybrid Powertrain Control
This paper presents a strictly convex chance-constrained stochastic control framework that accounts for uncertainty in control specifications such as reference trajectories and operational constraints. By jointly optimizing control inputs and risk allocation under general (possibly non-Gaussian) uncertainties, the proposed method guarantees probabilistic constraint satisfaction while ensuring strict convexity, leading to uniqueness and continuity of the optimal solution. The formulation is further extended to nonlinear model-based control using exactly linearizable models identified through machine learning. The effectiveness of the proposed approach is demonstrated through model predictive control applied to a hybrid powertrain system.
comment: Submitted to IEEE Transactions on Control Systems Technology (TCST)
A Generative AI-Driven Reliability Layer for Action-Oriented Disaster Resilience
As climate-related hazards intensify, conventional early warning systems (EWS) disseminate alerts rapidly but often fail to trigger timely protective actions, leading to preventable losses and inequities. We introduce Climate RADAR (Risk-Aware, Dynamic, and Action Recommendation system), a generative AI-based reliability layer that reframes disaster communication from alerts delivered to actions executed. It integrates meteorological, hydrological, vulnerability, and social data into a composite risk index and employs guardrail-embedded large language models (LLMs) to deliver personalized recommendations across citizen, volunteer, and municipal interfaces. Evaluation through simulations, user studies, and a municipal pilot shows improved outcomes, including higher protective action execution, reduced response latency, and increased usability and trust. By combining predictive analytics, behavioral science, and responsible AI, Climate RADAR advances people-centered, transparent, and equitable early warning systems, offering practical pathways toward compliance-ready disaster resilience infrastructures.
comment: 19 pages
Reinforcement Learning with Distributed MPC for Fuel-Efficient Platoon Control with Discrete Gear Transitions
Cooperative control of groups of autonomous vehicles (AVs), i.e., platoons, is a promising direction to improving the efficiency of autonomous transportation systems. In this context, distributed co-optimization of both vehicle speed and gear position can offer benefits for fuel-efficient driving. To this end, model predictive control (MPC) is a popular approach, optimizing the speed and gear-shift schedule while explicitly considering the vehicles' dynamics over a prediction window. However, optimization over both the vehicles' continuous dynamics and discrete gear positions is computationally intensive, and may require overly long sample times or high-end hardware for real-time implementation. This work proposes a reinforcement learning (RL)-based distributed MPC approach to address this issue. For each vehicle in the platoon, a policy is trained to select and fix the gear positions across the prediction window of a local MPC controller, leaving a significantly simpler continuous optimization problem to be solved as part of a distributed MPC scheme. In order to reduce the computational cost of training and facilitate the scalability of the proposed approach to large platoons, the policies are parameterized such that the emergent multi-agent RL problem can be decoupled into single-agent learning tasks. In addition, a recurrent neural-network (RNN) architecture is proposed for the gear selection policy, such that the learning is scalable even as the number of possible gear-shift schedules grows exponentially with the MPC prediction horizon. In highway-driving simulations, the proposed approach is shown to have a significantly lower computation burden and a comparable performance in terms of fuel-efficient platoon control, with respect to pure MPC-based co-optimization.
comment: 16 pages, 8 figures, submitted to Transaction on IEEE Transactions on Intelligent Transportation Systems
Validation of a Software-Defined 100-Gb/s RDMA Streaming Architecture for Ultrafast Optoacoustic and Ultrasound Imaging
Optoacoustic (OA) imaging has emerged as a powerful investigation tool, with demonstrated applicability in oncology, neuroscience, and cardiovascular biology. However, its clinical translation is limited with the existing OA systems, which often rely on bulky and expensive acquisition hardware mainly optimized for pulse-echo ultrasound (US) imaging. Despite the fact that OA imaging has different requirements for receive bandwidths and timing synchronization with external laser sources, there is a strong need for unified OA-US imaging platforms, as pulse-echo US remains the standard tool for visualizing soft tissues. To address these challenges, we propose a new data acquisition architecture for ultrafast OA and US imaging that fully covers the requirements for large channel counts, wide bandwidth, and software-defined operation. LtL combines state-of-the-art wideband analog front-ends, a Zynq UltraScale+ MPSoC integrating FPGA fabric with an Application Processing Unit, and a 100 GbE Remote Direct Memory Access (RDMA) backend enabling raw-data streaming at up to 95.6 Gb/s. The architecture avoids local buffers followed by burst transfers, which commonly constrain sustainable frame rate and recording intervals, thus achieving true continuous and sustained streaming of raw data. We validate the core elements of the LtL architecture using a 16-channel demonstration system built from commercial evaluation boards. We further verify the signal chain for up to 256-channel scalability, confirming the wide bandwidth capabilities to support state-of-the-art data transmission speeds.
Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success
A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a $χ^2$ divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success conditioning thus emerges as a conservative improvement operator. Exact success conditioning cannot degrade performance or induce dangerous distribution shift, but when it fails, it does so observably, by hardly changing the policy at all. We apply our theory to the common practice of return thresholding, showing this can amplify improvement, but at the cost of potential misalignment with the true objective.
i-Socket: Design, Development, and Pilot Evaluation of an Individualized Multi-Material, Multi-Thickness Transtibial Prosthetic Socket
The prosthetic socket is an essential part in ensuring comfort and stability for the overall prosthesis system. This study proposes a multi-material/thickness individualized transtibial prosthetic socket that focuses on providing comfort. This study aims to identify the proper material and thickness to protect Calf areas and reduce the pressure around bone areas. First, the socket is divided into four parts depending on the pressure-sensitive/tolerant regions. After identifying the thickness range for each concerned area, the thickness and material are selected based on the Pressure-Pain Threshold (PPT) test, and the finalized design is then prototyped. The prototyped individualized socket (i-Socket) is 22% lighter than the participant's own socket. Results of the pilot experiments with an amputee participant showed that the pressure inside the socket decreased by 45% and 31% for the Tibia and Fibula regions, respectively. Additionally, the self-selected CoM velocity for the walking experiment is increased by 15% compared to the similar studies in the literature. Regarding the kinematic results, symmetry in the knee and ankle joints increased by 65% and 2% when using the i-Socket compared to the results with the participant's own socket.
Improving Stability Margins with Grid-Forming Damper Winding Emulation
This work presents (i) a framework for certifying small-signal frequency stability of a power system with line dynamics and heterogeneous bus dynamics, (ii) a novel reduced-order model of damper windings in synchronous machines, and (iii) a proportional-derivative (PD) damper winding emulation control for voltage-source converters (VSCs). Damper windings have long been understood to improve the frequency synchronization between machines. However, the dynamics of the damper windings are complex, making them difficult to analyze and directly emulate in the control of VSCs. This paper derives a reduced-order model of the damper windings as a PD term that allows grid-forming controls for VSCs to emulate their effect on frequency dynamics. Next, a framework for certifying small-signal frequency stability of a network with heterogeneous bus dynamics is developed that extends prior results by incorporating line dynamics. Finally, we analytically demonstrate that PD damper winding emulation can improve the stability of grid-forming converter controls. These results are validated with electromagnetic-transient (EMT) simulation.
comment: 11 pages, 12 figures
When Does Adaptation Win? Scaling Laws for Meta-Learning in Quantum Control
Quantum hardware suffers from intrinsic device heterogeneity and environmental drift, forcing practitioners to choose between suboptimal non-adaptive controllers or costly per-device recalibration. We derive a scaling law lower bound for meta-learning showing that the adaptation gain (expected fidelity improvement from task-specific gradient steps) saturates exponentially with gradient steps and scales linearly with task variance, providing a quantitative criterion for when adaptation justifies its overhead. Validation on quantum gate calibration shows negligible benefits for low-variance tasks but $>40\%$ fidelity gains on two-qubit gates under extreme out-of-distribution conditions (10$\times$ the training noise), with implications for reducing per-device calibration time on cloud quantum processors. Further validation on classical linear-quadratic control confirms these laws emerge from general optimization geometry rather than quantum-specific physics. Together, these results offer a transferable framework for decision-making in adaptive control.
comment: 28 pages, 11 figures
A Switching Nonlinear Model Predictive Control Strategy for Safe Collision Handling by an Underwater Vehicle-Manipulator System
For active intervention tasks in underwater environments, the use of autonomous vehicles is just now emerging as an active area of research. During operation, for various reasons, the robot might find itself on a collision course with an obstacle in its environment. In this paper, a switching Nonlinear Model Predictive Control (NMPC) strategy is proposed to safely handle collisions for an Underwater Vehicle-Manipulator System (UVMS). When avoiding the collision is impossible, the control algorithm takes advantage of the manipulator, using it to push against the obstacle, and deflect away from the collision. Virtual experiments are performed to demonstrate the algorithm's capability to successfully detect collisions and either avoid them, or use the manipulator to handle them appropriately without damaging sensitive areas of the vehicle.
comment: This work has been submitted to the 2026 Mediterranean Conference on Control and Automation (MED) to be considered for publication. Figures and animations are available at https://zenodo.org/records/18357280
3D-Printed Hybrid Liquid-CPCM Cooling Modules for High-Performance Thermal Management of Lithium-Ion Pouch Cells
Efficient thermal management is critical for ensuring the safety, performance, and durability of lithium ion pouch cells (LIPCs), particularly under high power operating conditions where conventional battery thermal management systems (BTMS) struggle to balance cooling effectiveness, structural simplicity, and weight. Here, we report a lightweight hybrid BTMS that synergistically integrates active liquid cooling with composite phase change material (CPCM) based thermal buffering through a 3D printed hexagonal architecture. The system is fabricated via a two step additive manufacturing process that enables sealed CPCM encapsulation and isolated liquid cooling pathways within a single carbon fiber reinforced nylon module, effectively eliminating leakage risks while allowing precise geometric control. Hexagonally partitioned CPCM cavities maximize the CPCM wall interfacial area and shorten internal conduction paths, accelerating latent heat absorption, while embedded serpentine liquid channels provide continuous convective heat removal and prevent CPCM saturation. A nanocarbon enhanced CPCM is employed to overcome the intrinsic low thermal conductivity of conventional paraffin based materials.
comment: 32 pages, 7 figures
Analysis of Control Bellman Residual Minimization for Markov Decision Problem
Markov decision problems are most commonly solved via dynamic programming. Another approach is Bellman residual minimization, which directly minimizes the squared Bellman residual objective function. However, compared to dynamic programming, this approach has received relatively less attention, mainly because it is often less efficient in practice and can be more difficult to extend to model-free settings such as reinforcement learning. Nonetheless, Bellman residual minimization has several advantages that make it worth investigating, such as more stable convergence with function approximation for value functions. While Bellman residual methods for policy evaluation have been widely studied, methods for policy optimization (control tasks) have been scarcely explored. In this paper, we establish foundational results for the control Bellman residual minimization for policy optimization.
Goal-oriented Semantic Communication for Robot Arm Reconstruction in Digital Twin: Feature and Temporal Selections
As one of the most promising technologies in industry, the Digital Twin (DT) facilitates real-time monitoring and predictive analysis for real-world systems by precisely reconstructing virtual replicas of physical entities. However, this reconstruction faces unprecedented challenges due to the everincreasing communication overhead, especially for digital robot arm reconstruction. To this end, we propose a novel goal-oriented semantic communication (GSC) framework to extract the GSC information for the robot arm reconstruction task in the DT, with the aim of minimising the communication load under the strict and relaxed reconstruction error constraints. Unlike the traditional reconstruction framework that periodically transmits a reconstruction message for real-time DT reconstruction, our framework implements a feature selection (FS) algorithm to extract the semantic information from the reconstruction message, and a deep reinforcement learning-based temporal selection algorithm to selectively transmit the semantic information over time. We validate our proposed GSC framework through both Pybullet simulations and lab experiments based on the Franka Research 3 robot arm. For a range of distinct robotic tasks, simulation results show that our framework can reduce the communication load by at least 59.5% under strict reconstruction error constraints and 80% under relaxed reconstruction error constraints, compared with traditional communication framework. Also, experimental results confirm the effectiveness of our framework, where the communication load is reduced by 53% in strict constraint case and 74% in relaxed constraint case. The demo is available at: https://youtu.be/2OdeHKxcgnk.
comment: Accepted by IEEE Journal on Selected Areas in Communications
Electrical Design of a Clean Offshore Heat and Power (CleanOFF) Hub
This paper presents an innovative offshore solution where oil & gas platform clusters are powered by a wind farm and a hydrogen hub. The results show a feasible off-grid design as an alternative to conventional electrification solutions. To address the challenges of design and operation of such a system, a power system model of the equipment and control was developed in a power system simulator called Process Power Simulator (PPSim). Power fluctuations in the wind farm are modelled using a state-of-the-art method encompassing turbulence and wakes. Various operation scenarios were used to evaluate the system design and find the right equipment size. An expensive component to over dimension is the battery energy storage system (BESS). The BESS power rating and energy capacity were found by running a combination of scenarios with extreme and natural wind variations, and contingencies. The control strategy and ramp rates of electrolyzers have significant impact on both system performance and design. A ramp rate in the order of seconds as opposed to minutes will decrease the required BESS size by 60-70%. Choosing synchronized control of the electrolyzers can further reduce the BESS size by 15-20%. The simulations also revealed challenges to achieve self-sufficiency of hydrogen and potential design improvements are suggested.
Constrained Parameter Update Law for Adaptive Control
In this paper, a constrained parameter update law is derived in the context of adaptive control. The parameter update law is based on constrained optimization technique where a Lagrangian is formulated to incorporate the constraints on the parameters using inverse Barrier function. The constrained parameter update law is used to develop a adaptive tracking controller and the overall stability of the adaptive controller along with the constrained parameter update law is shown using Lyapunov analysis and development in stability of constrained primal-dual dynamics. The performance of the constrained parameter update law is tested in simulation for keeping the parameters within constraints and convergence to true parameters.
Backpressure-based Mean-field Type Game for Scheduling in Multi-Hop Wireless Sensor Networks
We propose a Mean-Field Type Game (MFTG) framework for effective scheduling in multi-hop wireless sensor networks (WSNs) using backpressure as a performance criterion. Traditional backpressure algorithms leverage queue differentials to regulate data flow and maintain network stability. In this work, we extend the backpressure framework by incorporating a mean-field term into the cost functional, capturing the global behavior of the system alongside local dynamics. The resulting model utilizes the strengths of non-cooperative mean-field type games, enabling nodes to make decentralized decisions based on both individual queue states and system mean-field effects while accounting for stochastic network interactions. By leveraging the interplay between backpressure dynamics and mean-field coupling, the approach balances local optimization with global efficiency. Numerical simulations demonstrate the efficacy of the proposed method in handling congestion and scheduling in large-scale WSNs.
comment: Accepted to the 33rd European Signal Processing Conference (EUSIPCO 2025)
Feasibility Study of Curvature Effect in Flexible Antenna Arrays for 2-Dimensional Beam Alignment of 6G Wireless Systems
This article investigates the influential role of flexible antenna array curvature on the performance of 6G communication systems with carrier frequencies above 100 GHz. It is demonstrated that the curvature of flexible antenna arrays can be leveraged for 2-dimensional beam alignment in phased arrays with relatively small insertion loss. The effect of antenna array bending on the radiation properties such as gain and antenna impedance are analytically studied and simulated for a 4x4 microstrip patch antenna array operating between 97.5-102.5 GHz. Moreover, the deployment of this flexible antenna array in conjunction with state-of-the-art flexible board packaging techniques is examined for 6G wireless transceivers based on 65nm CMOS technology and simulated for three variants of quadrature amplitude modulation (4QAM, 16 QAM, and 64 QAM). The communication performance in terms of signal-to-noise ratio (SNR) and bit error rate (BER) is evaluated using analytical derivations and simulation results which exhibit a relatively close match.
Asymptotic Behavior of an Unforced Duhem-Type Hysteretic Oscillator
The article describes fundamental analytical properties of an unforced mechanical oscillator with a Duhem-type viscoelastoplastic hysteretic element. These properties include global existence of solutions, uniqueness of solutions, and convergence of each solution to an equilibrium point.
comment: 8 pages; (v2) minor amendments, further references
Sequential Operating Simulation of Solid State Transformer-Driven Next-Generation 800 VDC Data Center
Artificial-intelligence (AI) workloads are driving rapid growth in data-center electricity use and rack power density, increasing demand for power-delivery systems that are efficient and robust to fast load transients. Conventional uninterruptible power supply (UPS) based AC distribution chains involve multiple conversion stages and line-frequency transformers, which compound losses and are less compatible with dynamic AI power profiles. Although solid-state transformers (SSTs) and 800 VDC distribution architecture are widely discussed, implementable topology/control details, and long-horizon validation with realistic operating profiles remain limited. This paper develops an SST-driven 800 VDC architecture that converts 10 kV MVAC to an 800V LVDC bus using a three-phase H-bridge AC/DC stage cascaded with a dual-active-bridge (DAB) DC/DC stage. A coordinated closed-loop control scheme, combining rectifier voltage/current regulation and DAB phase-shift control, is designed to maintain DC-bus voltage stability. The proposed system is implemented on the real-time digital simulation (RTDS) platform and evaluated via sequential simulations using real-world day- and month-scale operating profiles of data centers, benchmarked against a UPS supply chain. Numerical studies demonstrate tight 800 VDC regulation, reduced input-side energy consumption compared with the UPS baseline, and satisfactory power-quality performance. A capacitance sensitivity test quantifies tradeoffs between DC-bus ripple and low-frequency input-power oscillations, yielding a practical capacitance range for design. Overall, the work provides a reproducible evaluation workflow and actionable guidance for next-generation AI data centers.
Cooperative Optimal Output Tracking for Discrete-Time Multiagent Systems: Stabilizing Policy Iteration Frameworks
This paper proposes two cooperative optimal output tracking (COOT) algorithms based on policy iteration (PI) for discrete-time multi-agent systems with unknown model parameters. First, we establish a stabilizing PI framework that can start from any initial control policy, relaxing the dependence of traditional PI on the initial stabilizing control policy. Then, another efficient and equivalent Q-learning framework is developed, which is shown to require only less system data to get the same results as the stabilizing PI. In the two frameworks, the stabilizing control policy is obtained by gradually iterating the stabilizing virtual system to the actual feedback closed-loop system. Two explicit schemes for adjusting the iteration step-size/coefficient are designed and their stability is analyzed. Finally, the COOT is realized by a distributed feedforward-feedback controller with learned optimal gains. The proposed algorithms are validated by simulation.
comment: Published online in IEEE Transactions on Automatic Control, November 2025
Incremental Collision Laws Based on the Bouc-Wen Model: Improved Collision Models and Further Results
In the article titled "The Bouc-Wen Model for Binary Direct Collinear Collisions of Convex Viscoplastic Bodies" and published in the Journal of Computational and Nonlinear Dynamics (Volume 20, Issue 6, June 2025), the authors studied mathematical models of binary direct collinear collisions of convex viscoplastic bodies that employed two incremental collision laws based on the Bouc-Wen differential model of hysteresis. It was shown that the models possess favorable analytical properties, and several model parameter identification studies were conducted, demonstrating that the models can accurately capture the nature of a variety of collision phenomena. In this article, the aforementioned models are augmented by modeling the effects of external forces as time-dependent inputs. Furthermore, the range of the parameters under which the models possess favorable analytical properties is extended to several corner cases that were not considered in the prior publication. Finally, the previously conducted model parameter identification studies are extended, and an additional model parameter identification study is provided in an attempt to validate the ability of the augmented models to represent the effects of external forces.
comment: 15 pages, 9 figures, see https://gitlab.com/user9716869/EBWCM ; (v2-v9) various minor amendments; (v5) replaced the parameter identification study of Quinn (2004) with Villegas et al (2021) due to incompatibility of the proposed collision models with the experimental setup in Quinn (2004); (v7) added ODEs for COM and an application example; arXiv admin note: text overlap with arXiv:2410.08147
Real-Time Load Estimation for Load-lifting Exoskeletons Using Insole Pressure Sensors and Machine Learning
To enhance lifting-load estimation accuracy in industrial upper-limb assistive exoskeletons, this study proposes a machine learning-based approach using insole pressure sensors. Unlike traditional methods that rely on electromyography (EMG), force sensors, or posture data, insole pressure sensors provide a non-invasive, posture-independent, and stable solution suitable for long-term use. Lifting load data ranging from 2 to 10 kg (0.5 kg intervals) were collected from five subjects. Two data representations were investigated: channel-based vectors and map-based images. For the channel-based approach, conventional regression models (SVR, MLP, and Elastic Net) were trained on pooled data from all subjects to assess inter-subject generalization, specifically testing the ability to infer load levels unseen during training. In parallel, a preliminary feasibility study was conducted for the map-based deep learning model (MobileNetV2) using inner-subject data. Results indicate that the channel-based SVR achieved the most balanced accuracy and generalization performance, with a mean absolute error of 0.547 kg. These findings demonstrate the feasibility and advantages of using insole pressure data for variable load estimation, supporting control strategies in industrial exoskeleton applications.
System-Level Analysis of Module Uncertainty Quantification in the Autonomy Pipeline
Modern autonomous systems with machine learning components often use uncertainty quantification to help produce assurances about system operation. However, there is a lack of consensus in the community on what uncertainty is and how to perform uncertainty quantification. In this work, we propose that uncertainty measures should be understood within the context of overall system design and operation. To this end, we present two novel analysis techniques. First, we produce a probabilistic specification on a module's uncertainty measure given a system specification. Second, we propose a method to measure a system's input-output robustness in order to compare system designs and quantify the impact of making a system uncertainty-aware. In addition to this theoretical work, we present the application of these analyses on two real-world autonomous systems: an autonomous driving system and an aircraft runway incursion detection system. We show that our analyses can determine desired relationships between module uncertainty and error, provide visualizations of how well an uncertainty measure is being used by a system, produce principled comparisons between different uncertainty measures and decision-making algorithm designs, and provide insights into system vulnerabilities and tradeoffs.
comment: Previous version was a conference paper (Conference on Decision and Control, Dec 2024). This version is a journal submission (IEEE Transactions on Control Systems Technology). The title is the same but the text is entirely different and the content is significantly revised
Random Access for LEO Satellite Communication Systems via Deep Learning
Integrating contention-based random access procedures into low Earth orbit (LEO) satellite communication (SatCom) systems poses new challenges, including long propagation delays, large Doppler shifts, and a large number of simultaneous access attempts. These factors degrade the efficiency and responsiveness of conventional random access schemes, particularly in scenarios such as satellite-based internet of things and direct-to-device services. In this paper, we propose a deep learning-based random access framework designed for LEO SatCom systems. The framework incorporates an early preamble collision classifier that uses multi-antenna correlation features and a lightweight 1D convolutional neural network to estimate the number of collided users at the earliest stage. Based on this estimate, we introduce an opportunistic transmission scheme that balances access probability and resource efficiency to improve success rates and reduce delay. Simulation results under 3GPP-compliant LEO settings confirm that the proposed framework achieves higher access success probability, lower delay, better physical uplink shared channel utilization, and reduced computational complexity compared to existing schemes.
comment: 12 pages, 8 figures, 4 tables
DPLib: A Standard Benchmark Library for Distributed Power System Analysis and Optimization
\textit{DPLib} is an open-source MATLAB-based benchmark library created to support research and development in distributed and decentralized power system analysis and optimization. Distributed and decentralized methods offer scalability, privacy preservation, and resilience to single points of failure, making them increasingly important for modern power systems. However, unlike centralized tools such as MATPOWER, no general-purpose, reproducible data library package currently exists for distributed power system studies. DPLib, available at \href{https://github.com/LSU-RAISE-LAB/DPLib.git}{GitHub}, fills this gap by providing a standard power system library featuring over 20 multi-region benchmark test cases of varying sizes, along with a graph-based partitioning toolkit that decomposes any MATPOWER test system into multiple electrically coherent regions. The partitioning toolkit, an easy-to-use MATLAB code, generates standardized \texttt{.mat} and \texttt{.m} files, along with region visualizations for intuitive understanding. We also provide modular, easy-to-use distributed optimal power flow (OPF) solvers: an alternating direction method of multipliers(ADMM)-based DC-OPF solver implemented in YALMIP, and an ADMM-based AC-OPF solver leveraging IPOPT. These solvers validate the generated test systems for distributed optimization applications. Numerical results validate the generated test cases, establishing DPLib as a foundation for reproducible distributed power system research.
Autonomous Navigation and Station-Keeping on Near-Rectilinear Halo Orbits
This article develops an optical navigation (OPNAV) and station-keeping pipeline for the near-rectilinear halo orbit (NRHO) in high-fidelity ephemeris model dynamics, using synthetic images in a non-iterative horizon-based OPNAV algorithm, applying the result in a navigation filter, and using the obtained estimates in a station-keeping control scheme that keeps the spacecraft in the vicinity of a reference orbit. We study differential correction-based and minimization-based implementations of the so-called x-axis crossing control scheme, and propose an improved targeting prediction scheme by incorporating the filter's state covariance with an unscented transform. We also introduce a hysteresis mechanism, which improves station-keeping cost and provides insight into the difference in performance between the differential correction-based and minimization-based approaches. We perform Monte-Carlo experiments to assess the pipeline's tracking and ΔV performances. We report several key findings, including the variability of the filter performance with the sensor field of view and measurement locations, station-keeping cost reduction achieved by the unscented transform-based prediction and hysteresis, as well as the variability of the cumulative ΔV as a function of maneuver location due to the periodic structure in the OPNAV-based filter's estimation accuracy.
Monotone Neural Barrier Certificates
This report presents a neurosymbolic framework for safety verification and control synthesis in high-dimensional monotone dynamical systems without relying on explicit models or conservative Lipschitz bounds. The approach combines the expressiveness of neural networks with the rigor of symbolic reasoning via barrier certificates, functional analogs of inductive invariants that formally guarantee safety. Prior data-driven methods often treat dynamics as black-box models, relying on dense state-space discretization or Lipschitz overapproximations, leading to exponential sample complexity. In contrast, monotonicity--a pervasive structural property in many real-world systems--provides a symbolic scaffold that simplifies both learning and verification. Exploiting order preservation reduces verification to localized boundary checks, transforming a high-dimensional problem into a tractable, low-dimensional one. Barrier certificates are synthesized using monotone neural network architectures with embedded monotonicity constraints--trained via gradient-based optimization guided by barrier conditions. This enables scalable, formally sound verification directly from simulation data, bridging black-box learning and formal guarantees within a unified neurosymbolic framework.
Robotics
NeuroManip: Prosthetic Hand Manipulation System Based on EMG and Eye Tracking Powered by the Neuromorphic Processor AltAi
This paper presents a novel neuromorphic control architecture for upper-limb prostheses that combines surface electromyography (sEMG) with gaze-guided computer vision. The system uses a spiking neural network deployed on the neuromorphic processor AltAi to classify EMG patterns in real time while an eye-tracking headset and scene camera identify the object within the user's focus. In our prototype, the same EMG recognition model that was originally developed for a conventional GPU is deployed as a spiking network on AltAi, achieving comparable accuracy while operating in a sub-watt power regime, which enables a lightweight, wearable implementation. For six distinct functional gestures recorded from upper-limb amputees, the system achieves robust recognition performance comparable to state-of-the-art myoelectric interfaces. When the vision pipeline restricts the decision space to three context-appropriate gestures for the currently viewed object, recognition accuracy increases to roughly 95% while excluding unsafe, object-inappropriate grasps. These results indicate that the proposed neuromorphic, context-aware controller can provide energy-efficient and reliable prosthesis control and has the potential to improve safety and usability in everyday activities for people with upper-limb amputation.
comment: This paper has been accepted for publication at LBR of HRI 2026 conference
Masked Depth Modeling for Spatial Perception
Spatial visual perception is a fundamental requirement in physical-world applications like autonomous driving and robotic manipulation, driven by the need to interact with 3D environments. Capturing pixel-aligned metric depth using RGB-D cameras would be the most viable way, yet it usually faces obstacles posed by hardware limitations and challenging imaging conditions, especially in the presence of specular or texture-less surfaces. In this work, we argue that the inaccuracies from depth sensors can be viewed as "masked" signals that inherently reflect underlying geometric ambiguities. Building on this motivation, we present LingBot-Depth, a depth completion model which leverages visual context to refine depth maps through masked depth modeling and incorporates an automated data curation pipeline for scalable training. It is encouraging to see that our model outperforms top-tier RGB-D cameras in terms of both depth precision and pixel coverage. Experimental results on a range of downstream tasks further suggest that LingBot-Depth offers an aligned latent representation across RGB and depth modalities. We release the code, checkpoint, and 3M RGB-depth pairs (including 2M real data and 1M simulated data) to the community of spatial perception.
comment: Tech report, 19 pages, 15 figures and 4 tables
PEAfowl: Perception-Enhanced Multi-View Vision-Language-Action for Bimanual Manipulation
Bimanual manipulation in cluttered scenes requires policies that remain stable under occlusions, viewpoint and scene variations. Existing vision-language-action models often fail to generalize because (i) multi-view features are fused via view-agnostic token concatenation, yielding weak 3D-consistent spatial understanding, and (ii) language is injected as global conditioning, resulting in coarse instruction grounding. In this paper, we introduce PEAfowl, a perception-enhanced multi-view VLA policy for bimanual manipulation. For spatial reasoning, PEAfowl predicts per-token depth distributions, performs differentiable 3D lifting, and aggregates local cross-view neighbors to form geometrically grounded, cross-view consistent representations. For instruction grounding, we propose to replace global conditioning with a Perceiver-style text-aware readout over frozen CLIP visual features, enabling iterative evidence accumulation. To overcome noisy and incomplete commodity depth without adding inference overhead, we apply training-only depth distillation from a pretrained depth teacher to supervise the depth-distribution head, providing perception front-end with geometry-aware priors. On RoboTwin 2.0 under domain-randomized setting, PEAfowl improves the strongest baseline by 23.0 pp in success rate, and real-robot experiments further demonstrate reliable sim-to-real transfer and consistent improvements from depth distillation. Project website: https://peafowlvla.github.io/.
Less Is More: Scalable Visual Navigation from Limited Data
Imitation learning provides a powerful framework for goal-conditioned visual navigation in mobile robots, enabling obstacle avoidance while respecting human preferences and social norms. However, its effectiveness depends critically on the quality and diversity of training data. In this work, we show how classical geometric planners can be leveraged to generate synthetic trajectories that complement costly human demonstrations. We train Less is More (LiMo), a transformer-based visual navigation policy that predicts goal-conditioned SE(2) trajectories from a single RGB observation, and find that augmenting limited expert demonstrations with planner-generated supervision yields substantial performance gains. Through ablations and complementary qualitative and quantitative analyses, we characterize how dataset scale and diversity affect planning performance. We demonstrate real-robot deployment and argue that robust visual navigation is enabled not by simply collecting more demonstrations, but by strategically curating diverse, high-quality datasets. Our results suggest that scalable, embodiment-specific geometric supervision is a practical path toward data-efficient visual navigation.
Delay-Compensated Stiffness Estimation for Robot-Mediated Dyadic Interaction
Robot-mediated human-human (dyadic) interactions enable therapists to provide physical therapy remotely, yet an accurate perception of patient stiffness remains challenging due to network-induced haptic delays. Conventional stiffness estimation methods, which neglect delay, suffer from temporal misalignment between force and position signals, leading to significant estimation errors as delays increase. To address this, we propose a robust, delay-compensated stiffness estimation framework by deriving an algebraic estimator based on quasi-static equilibrium that explicitly accounts for temporally aligning the expert's input with the novice's response. A Normalised Weighted Least Squares (NWLS) implementation is then introduced to robustly filter dynamic bias resulting from the algebraic derivation. Experiments using commercial rehabilitation robots (H-MAN) as the platform demonstrate that the proposed method significantly outperforms the standard estimator, maintaining consistent tracking accuracy under multiple introduced delays. These findings offer a promising solution for achieving high-fidelity haptic perception in remote dyadic interaction, potentially facilitating reliable stiffness assessment in therapeutic settings across networks.
Cross-Level Sensor Fusion with Object Lists via Transformer for 3D Object Detection
In automotive sensor fusion systems, smart sensors and Vehicle-to-Everything (V2X) modules are commonly utilized. Sensor data from these systems are typically available only as processed object lists rather than raw sensor data from traditional sensors. Instead of processing other raw data separately and then fusing them at the object level, we propose an end-to-end cross-level fusion concept with Transformer, which integrates highly abstract object list information with raw camera images for 3D object detection. Object lists are fed into a Transformer as denoising queries and propagated together with learnable queries through the latter feature aggregation process. Additionally, a deformable Gaussian mask, derived from the positional and size dimensional priors from the object lists, is explicitly integrated into the Transformer decoder. This directs attention toward the target area of interest and accelerates model training convergence. Furthermore, as there is no public dataset containing object lists as a standalone modality, we propose an approach to generate pseudo object lists from ground-truth bounding boxes by simulating state noise and false positives and negatives. As the first work to conduct cross-level fusion, our approach shows substantial performance improvements over the vision-based baseline on the nuScenes dataset. It demonstrates its generalization capability over diverse noise levels of simulated object lists and real detectors.
comment: 6 pages, 3 figures, accepted at IV2025
Who Is Responsible? Self-Adaptation Under Multiple Concurrent Uncertainties With Unknown Sources in Complex ROS-Based Systems
Robotic systems increasingly operate in dynamic, unpredictable environments, where tightly coupled sensors and software modules increase the probability of a single fault cascading across components and admitting multiple plausible strategies to resolve the underlying uncertainty. Most existing self-adaptive approaches that have been applied to robotics assume predefined one-to-one uncertainty-to-adaptation mappings. We present a ROS2-based self-adaptive approach building upon the MAPE-K feedback loop that addresses (1) multiple simultaneous uncertainties with differing criticality, (2) cascading uncertainties across components, and (3) multiple plausible resolving strategies per detected symptom. Central to our approach is an adaptation rule set which lets designers specify uncertainty patterns, assign criticality levels, and enumerate multiple plausible adaptation strategies. This rule set, combined with an automatically extracted live ROS2 dependency graph, enables lightweight root-cause analysis and strategy ranking to prioritize minimal and effective adaptations. Evaluations on an underwater robot scenario and a perception use case show that our approach can identify root causes among concurrent uncertainties, favours inexpensive adaptations, reduces unnecessary adaptations, and achieves performance comparable to existing baselines designed for sequential uncertainties. The code is publicly available.
Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone
This article presents a complete semantic scene understanding workflow using only a single 2D lidar. This fills the gap in 2D lidar semantic segmentation, thereby enabling the rethinking and enhancement of existing 2D lidar-based algorithms for application in various mobile robot tasks. It introduces the first publicly available 2D lidar semantic segmentation dataset and the first fine-grained semantic segmentation algorithm specifically designed for 2D lidar sensors on autonomous mobile robots. To annotate this dataset, we propose a novel semi-automatic semantic labeling framework that requires minimal human effort and provides point-level semantic annotations. The data was collected by three different types of 2D lidar sensors across twelve indoor environments, featuring a range of common indoor objects. Furthermore, the proposed semantic segmentation algorithm fully exploits raw lidar information -- position, range, intensity, and incident angle -- to deliver stochastic, point-wise semantic segmentation. We present a series of semantic occupancy grid mapping experiments and demonstrate two semantically-aware navigation control policies based on 2D lidar. These results demonstrate that the proposed semantic 2D lidar dataset, semi-automatic labeling framework, and segmentation algorithm are effective and can enhance different components of the robotic navigation pipeline. Multimedia resources are available at: https://youtu.be/P1Hsvj6WUSY.
Symphony: A Heuristic Normalized Calibrated Advantage Actor and Critic Algorithm in application for Humanoid Robots
In our work we implicitly suggest that it is a misconception to think that humans learn fast. The learning process takes time. Babies start learning to move in the restricted fluid environment of the womb. Children are often limited by underdeveloped body. Even adults are not allowed to participate in complex competitions right away. However, with robots, when learning from scratch, we often don't have the privilege of waiting for tens of millions of steps. "Swaddling" regularization is responsible for restraining an agent in rapid but unstable development penalizing action strength in a specific way not affecting actions directly. The Symphony, Transitional-policy Deterministic Actor and Critic algorithm, is a concise combination of different ideas for possibility of training humanoid robots from scratch with Sample Efficiency, Sample Proximity and Safety of Actions in mind. It is well known that continuous increase in Gaussian noise without appropriate smoothing is harmful for motors and gearboxes. Compared to Stochastic algorithms, we set limited parametric noise and promote a reduced strength of actions, safely increasing entropy, since the actions are submerged in weaker noise. When actions require more extreme values, actions rise above the weak noise. Training becomes empirically much safer for both the environment around and the robot's mechanisms. We use Fading Replay Buffer: using a fixed formula containing the hyperbolic tangent, we adjust the batch sampling probability: the memory contains a recent memory and a long-term memory trail. Fading Replay Buffer allows us to use Temporal Advantage when we improve the current Critic Network prediction compared to the exponential moving average. Temporal Advantage allows us to update the Actor and Critic in one pass, as well as combine the Actor and Critic in one Object and implement their Losses in one line.
comment: https://github.com/SuspensionRailway/symphony
Improve the autonomy of the SE2(3) group based Extended Kalman Filter for Integrated Navigation: Application
One of the core advantages of SE2(3) Lie group framework for navigation modeling lies in the autonomy of error propagation. In the previous paper, the theoretical analysis of autonomy property of navigation model in inertial, earth and world frames was given. A construction method for SE2(3) group navigation model is proposed to improve the non-inertial navigation model toward full autonomy. This paper serves as a counterpart to previous paper and conducts the real-world strapdown inertial navigation system (SINS)/odometer(ODO) experiments as well as Monte-Carlo simulations to demonstrate the performance of improved SE2(3) group based high-precision navigation models.
comment: arXiv admin note: substantial text overlap with arXiv:2601.16062. substantial text overlap with arXiv:2601.16062. substantial text overlap with arXiv:2601.16062. substantial text overlap with arXiv:2601.16062
InteLiPlan: An Interactive Lightweight LLM-Based Planner for Domestic Robot Autonomy
We introduce an interactive LLM-based framework designed to enhance the autonomy and robustness of domestic robots, targeting embodied intelligence. Our approach reduces reliance on large-scale data and incorporates a robot-agnostic pipeline that embodies an LLM. Our framework, InteLiPlan, ensures that the LLM's decision-making capabilities are effectively aligned with robotic functions, enhancing operational robustness and adaptability, while our human-in-the-loop mechanism allows for real-time human intervention when user instruction is required. We evaluate our method in both simulation and on the real robot platforms, including a Toyota Human Support Robot and an ANYmal D robot with a Unitree Z1 arm. Our method achieves a 95% success rate in the `fetch me' task completion with failure recovery, highlighting its capability in both failure reasoning and task planning. InteLiPlan achieves comparable performance to state-of-the-art LLM-based robotics planners, while using only real-time onboard computing. Project website: https://kimtienly.github.io/InteLiPlan.
Vision-Proprioception Fusion with Mamba2 in End-to-End Reinforcement Learning for Motion Control
End-to-end reinforcement learning (RL) for motion control trains policies directly from sensor inputs to motor commands, enabling unified controllers for different robots and tasks. However, most existing methods are either blind (proprioception-only) or rely on fusion backbones with unfavorable compute-memory trade-offs. Recurrent controllers struggle with long-horizon credit assignment, and Transformer-based fusion incurs quadratic cost in token length, limiting temporal and spatial context. We present a vision-driven cross-modal RL framework built on SSD-Mamba2, a selective state-space backbone that applies state-space duality (SSD) to enable both recurrent and convolutional scanning with hardware-aware streaming and near-linear scaling. Proprioceptive states and exteroceptive observations (e.g., depth tokens) are encoded into compact tokens and fused by stacked SSD-Mamba2 layers. The selective state-space updates retain long-range dependencies with markedly lower latency and memory use than quadratic self-attention, enabling longer look-ahead, higher token resolution, and stable training under limited compute. Policies are trained end-to-end under curricula that randomize terrain and appearance and progressively increase scene complexity. A compact, state-centric reward balances task progress, energy efficiency, and safety. Across diverse motion-control scenarios, our approach consistently surpasses strong state-of-the-art baselines in return, safety (collisions and falls), and sample efficiency, while converging faster at the same compute budget. These results suggest that SSD-Mamba2 provides a practical fusion backbone for resource-constrained robotic and autonomous systems in engineering informatics applications.
comment: 6 figures and 8 tables. This paper has been accepted by Advanced Engineering Informatics
Path Planning using a One-shot-sampling Skeleton Map
Path planning algorithms fundamentally aim to compute collision-free paths, with many works focusing on finding the optimal distance path. However, for several applications, a more suitable approach is to balance response time, path safety, and path length. In this context, a skeleton map is a useful tool in graph-based schemes, as it provides an intrinsic representation of the free workspace. However, standard skeletonization algorithms are computationally expensive, as they are primarly oriented towards image processing tasks. We propose an efficient path-planning methodology that finds safe paths within an acceptable processing time. This methodology leverages a Deep Denoising Autoencoder (DDAE) based on the U-Net architecture to compute a skeletonized version of the navigation map, which we refer to as SkelUnet. The SkelUnet network facilitates exploration of the entire workspace through one-shot sampling (OSS), as opposed to the iterative or probabilistic sampling used by previous algorithms. SkelUnet is trained and tested on a dataset consisting of 12,500 two-dimensional dungeon maps. The motion planning methodology is evaluated in a simulation environment with an Unmanned Aerial Vehicle (UAV) in 250 previously unseen maps and assessed using several navigation metrics to quantify the navigability of the computed paths. The results demonstrate that using SkelUnet to construct the roadmap offers significant advantages, such as connecting all regions of free workspace, providing safer paths, and reducing processing time.
comment: Submitted to IEEE Latin America Transactions
Multiagent Systems
Eyes on the Mission: Mixed Methods Assessment of Eye-Tracker-Enabled Interactive Decision Support in a Simulated Unmanned Aerial Vehicle System
Supervisors in military command and control (C2) environments face dynamic conditions. Dynamically changing information continuously flows to the supervisors through multiple displays. In this environment, important pieces of information can be overlooked due to the complexity of tasks and environments. This study examined the efficacy of an eye-tracker-based adaptive attention-guided decision support tool (DST) for supervisors in a simulated C2 environment. The DST monitors supervisors' visual attention allocation in real time and displays visually salient cues if critical changes or events are missed. Twenty-five military students participated in a simulated intelligence task. Results indicated significant performance enhancement when the adaptive DST was present. Eye-tracking analysis also showed that longer, more frequent fixations on critical areas of interest were negatively correlated with performance. Additionally, post-experiment interviews revealed that the adaptive DST was unobtrusive and positively received. These findings underscore the potential of real-time gaze-based interventions to optimize supervisory decision-making. Future research could incorporate AI-driven approaches to better support supervisors in complex task environments.
comment: 27 pages, 6 figures, 4 tables, under review
Types for Grassroots Logic Programs
Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer and consumed at most once via its paired reader, and may contain additional readers and/or writers. This enables the concise expression of rich multidirectional communication modalities. ``Logic Programs as Types for Logic Programs'' (LICS'91) defined types as regular sets of paths over derivable ground atoms. Here, we define types to be regular sets of moded paths, where a mode captures directionality of communication -- whether a subterm is consumed from or produced to the environment -- enabling the typing of interactive partial computations including those that eventually deadlock or fail, or never terminate. We provide a syntactic definition of well-typing and prove that a program is well-typed iff the path abstraction of its moded-atom semantics satisfies covariance and contravariance conditions with respect to its type. The GLP type system was implemented in Dart by AI, starting from a mathematical specification of Typed GLP (this paper), deriving from it an English spec (written by AI), and from the spec deriving Dart code (by AI). While GLP is naturally untyped, the motivation for Typed GLP comes from programming with AI: Asking AI to program complex communication modalities in GLP (and in general) and hoping for the best is a tenuous strategy. The emerging discipline we advocate and employ is for the human designer and AI to jointly develop and agree upon (1)~GLP types; (2)~GLP procedure type declarations; (3)~informal (English) descriptions of the procedures; and only then let AI attempt to write (4)~GLP code based on those.
TruthTensor: Evaluating LLMs through Human Imitation on Prediction Market under Drift and Holistic Reasoning
Evaluating language models and AI agents remains fundamentally challenging because static benchmarks fail to capture real-world uncertainty, distribution shift, and the gap between isolated task accuracy and human-aligned decision-making under evolving conditions. This paper introduces TruthTensor, a novel, reproducible evaluation paradigm that measures reasoning models not only as prediction engines but as human-imitation systems operating in socially-grounded, high-entropy environments. Building on forward-looking, contamination-free tasks, our framework anchors evaluation to live prediction markets and combines probabilistic scoring to provide a holistic view of model behavior. TruthTensor complements traditional correctness metrics with drift-centric diagnostics and explicit robustness checks for reproducibility. It specify human vs. automated evaluation roles, annotation protocols, and statistical testing procedures to ensure interpretability and replicability of results. In experiments across 500+ real markets (political, economic, cultural, technological), TruthTensor demonstrates that models with similar forecast accuracy can diverge markedly in calibration, drift, and risk-sensitivity, underscoring the need to evaluate models along multiple axes (accuracy, calibration, narrative stability, cost, and resource efficiency). TruthTensor therefore operationalizes modern evaluation best practices, clear hypothesis framing, careful metric selection, transparent compute/cost reporting, human-in-the-loop validation, and open, versioned evaluation contracts, to produce defensible assessments of LLMs in real-world decision contexts. We publicly released TruthTensor at https://truthtensor.com.
comment: 16 pages, 6 figures, 2 tables
TrustResearcher: Automating Knowledge-Grounded and Transparent Research Ideation with Multi-Agent Collaboration WWW
Agentic systems have recently emerged as a promising tool to automate literature-based ideation. However, current systems often remain black-box, with limited transparency or control for researchers. Our work introduces TrustResearcher, a multi-agent demo system for knowledge-grounded and transparent ideation. Specifically, TrustResearcher integrates meticulously designed four stages into a unified framework: (A) Structured Knowledge Curation, (B) Diversified Idea Generation, (C) Multi-stage Idea Selection, and (D) Expert Panel Review and Synthesis. Different from prior pipelines, our system not only exposes intermediate reasoning states, execution logs, and configurable agents for inspections, but also enables diverse and evidence-aligned idea generation. Our design is also domain-agnostic, where the same pipeline can be instantiated in any scientific field. As an illustrative case, we demonstrate TrustResearcher on a graph-mining scenario (k-truss breaking problem), where it generates distinct, plausible candidates with evidence and critiques. A live demo and source code are available at https://github.com/valleysprings/TrustResearcher
comment: Accepted to The Web Conference (WWW) 2026 as a demo paper. 6 pages, 2 figures
Systems and Control (EESS)
A Cherry-Picking Approach to Large Load Shaping for More Effective Carbon Reduction
Shaping multi-megawatt loads, such as data centers, impacts generator dispatch on the electric grid, which in turn affects system CO2 emissions and energy cost. Substantiating the effectiveness of prevalent load shaping strategies, such as those based on grid-level average carbon intensity, locational marginal price, or marginal emissions, is challenging due to the lack of detailed counterfactual data required for accurate attribution. This study uses a series of calibrated granular ERCOT day-ahead direct current optimal power flow (DC-OPF) simulations for counterfactual analysis of a broad set of load shaping strategies on grid CO2 emissions and cost of electricity. In terms of annual grid level CO2 emissions reductions, LMP-based shaping outperforms other common strategies, but can be significantly improved upon. Examining the performance of practicable strategies under different grid conditions motivates a more effective load shaping approach: one that "cherry-picks" a daily strategy based on observable grid signals and historical data. The cherry-picking approach to power load shaping is applicable to any large flexible consumer on the electricity grid, such as data centers, distributed energy resources and Virtual Power Plants (VPPs).
Data-driven nonparametric Li-ion battery ageing model aiming at learning from real operation data -- Part B: Cycling operation
Conventional Li-ion battery ageing models, such as electrochemical, semi-empirical and empirical models, require a significant amount of time and experimental resources to provide accurate predictions under realistic operating conditions. At the same time, there is significant interest from industry in the introduction of new data collection telemetry technology. This implies the forthcoming availability of a significant amount of real-world battery operation data. In this context, the development of ageing models able to learn from in-field battery operation data is an interesting solution to mitigate the need for exhaustive laboratory testing. In a series of two papers, a data-driven ageing model is developed for Li-ion batteries under the Gaussian Process framework. A special emphasis is placed on illustrating the ability of the Gaussian Process model to learn from new data observations, providing more accurate and confident predictions, and extending the operating window of the model. The first paper of the series focussed on the systematic modelling and experimental verification of cell degradation through calendar ageing. Conversantly, this second paper addresses the same research challenge when the cell is electrically cycled. A specific covariance function is composed, tailored for use in a battery ageing application. Over an extensive dataset involving 124 cells tested during more than three years, different training possibilities are contemplated in order to quantify the minimal number of laboratory tests required for the design of an accurate ageing model. A model trained with only 26 tested cells achieves an overall mean-absolute-error of 1.04% in the capacity curve prediction, after being validated under a broad window of both dynamic and static cycling temperatures, Depth-of-Discharge, middle-SOC, charging and discharging C-rates.
On maximum hands-off restricted hybrid control for discrete-time switched linear systems
This paper deals with design of maximum hands-off hybrid control sequences for discrete-time switched linear systems. It is a sparsest combination of a discrete control sequence (i.e. the switching sequence) and a continuous control sequence, both satisfying pre-specified restrictions on the admissible actions, that steers a given initial state of the switched system to the origin of the state-space in a pre-specified duration of time. Given the subsystems dynamics, the sets of admissible continuous and discrete control, the initial state and the time horizon, we present a new algorithm that, under certain conditions on the subsystems dynamics and the admissible control, designs maximum hands-off hybrid control sequences for the resulting switched system. The key apparatuses for our analysis are graph theory and linear algebra. Numerical examples are presented to demonstrate our results.
comment: 11 pages, 0 figures. Work under review
Data-driven nonparametric Li-ion battery ageing model aiming at learning from real operation data -- Part A: Storage operation
Conventional Li-ion battery ageing models, such as electrochemical, semi-empirical and empirical models, require a significant amount of time and experimental resources to provide accurate predictions under realistic operating conditions. At the same time, there is significant interest from industry in the introduction of new data collection telemetry technology. This implies the forthcoming availability of a significant amount of real-world battery operation data. In this context, the development of ageing models able to learn from in-field battery operation data is an interesting solution to mitigate the need for exhaustive laboratory testing. In a series of two papers, a data-driven ageing model is developed for Li-ion batteries under the Gaussian Process framework. A special emphasis is placed on illustrating the ability of the Gaussian Process model to learn from new data observations, providing more accurate and confident predictions, and extending the operating window of the model. This first paper focusses on the systematic modelling and experimental verification of cell degradation through calendar ageing. A specific covariance function is composed, tailored for use in a battery ageing application. Over an extensive dataset involving 32 cells tested during more than three years, different training possibilities are contemplated in order to quantify the minimal number of laboratory tests required for the design of an accurate ageing model. A model trained with only 18 tested cells achieves an overall mean-absolute-error of 0.53% in the capacity curves prediction, after being validated under a broad window of both dynamic and static temperature and SOC storage conditions.
Photovoltaic energy sharing: Implementation and tests on a real collective self-consumption system
This research study analyses different types of photovoltaic (PV) energy sharing in a collective self-consumption (CSC) real-case in the Izarbel technological park in France. The analysis is carried out above all from the point of view of the self-consumption rate (SCR) and the savings. After explaining the emergence of the self-consumption concept for the integration of renewable energies, the study case is described. The PV energy is produced in ESTIA1 building and consumed in ESTIA1, 2 and 4 buildings. The main IoT components used to implement the CSC are smart meters and the Tecsol TICs; devices based on the LoRa protocol to retrieve production and consumption data. Then, the characteristics of PV energy sharing in France are explained, in particular the three possible types of energy sharing/allocation (static, dynamic by default and customised dynamic) and the structure of the electricity bill. Finally, the three types of sharing are compared in four scenarios (without and with a data centre, for low and high solar radiation). The results show that the dynamic allocations lead to increases of the SCR and that the customised dynamic sharing increases savings.
Interpreting Moment Matrix Blocks Spectra using Mutual Shadow Area
The mutual shadow area of pairs of surface regions is used for guiding the study of the spectral components and rank of their wave interaction, as captured by the corresponding moment matrix blocks. It is demonstrated that the mutual shadow area provides an asymptotically accurate predictor of the location of the singular value curve knee. This predicted knee index is shown to partition the interacting parts of the range and domain of blocks into two subspaces that can be associated with different wave phenomena: an "aperture" subspace of dimension that scales with the subdomains area (or length in 2-D) and a remainder "diffraction" subspace of dimension that scales much slower with the electrical length, depending on the geometric configuration. For interactions between open surface domains typical for the common hierarchical partitioning in most fast solvers, the latter can be attributed to the domain edges visible by its interacting counterpart. For interactions in 3-D with a small aspect angles between the source and observers, the diffraction subspace dimension is dominant in determining the rank until fairly large electrical lengths are reached. This explains the delayed asymptotic scaling of ranks and impressive fast solver performance observed in recent literature for seemingly arbitrary scatterers with no special geometric characteristics. In the extreme cases of "endfire" reduced dimensionality interactions, where the shadow area vanishes, the diffraction governs also the asymptotic rank, which translates to superior asymptotic solver performance.
Space-Air-Ground-Integrated Networks: The BER vs. Residual Delay and Doppler Analysis
Perfect Doppler compensation and synchronization is nontrivial due to multi-path Doppler effects and Einstein's theory of relativity in the space-air-ground-integrated networks (SAGINs). Hence, by considering the residual Doppler and the synchronization delay, this paper investigates the bit-error-rate (BER) performance attained under time-varying correlated Shadowed-Rician SAGIN channels. First, a practical SAGIN model is harnessed, encompassing correlated Shadowed-Rician channels, the Snell's law-based path loss, atmospheric absorption, the line-of-sight Doppler compensation, elliptical satellite orbits, and Einstein's theory of relativity. Then, a specific correlation coefficient between the pilot and data symbols is derived in the context of correlated Shadowed-Rician Channels. By exploiting this correlation coefficient, the channel distribution is mimicked by a bi-variate Gamma distribution. Then, a closed-form BER formula is derived under employing least-square channel estimation and equalization for 16-QAM. Our analytical results indicate for a 300-km-altitude LEO that 1) the period of realistic elliptical orbits is around 0.8 seconds longer than that of the idealized circular orbits; and 2) the relativistic delay is lower than 1 $μs$ over a full LEO pass (from rise to set). Our numerical results for the L bands quantify the effects of: 1) the residual Doppler; 2) atmospheric shadowing; 3) synchronization errors; and 4) pilot overhead.
Delay-Compensated Stiffness Estimation for Robot-Mediated Dyadic Interaction
Robot-mediated human-human (dyadic) interactions enable therapists to provide physical therapy remotely, yet an accurate perception of patient stiffness remains challenging due to network-induced haptic delays. Conventional stiffness estimation methods, which neglect delay, suffer from temporal misalignment between force and position signals, leading to significant estimation errors as delays increase. To address this, we propose a robust, delay-compensated stiffness estimation framework by deriving an algebraic estimator based on quasi-static equilibrium that explicitly accounts for temporally aligning the expert's input with the novice's response. A Normalised Weighted Least Squares (NWLS) implementation is then introduced to robustly filter dynamic bias resulting from the algebraic derivation. Experiments using commercial rehabilitation robots (H-MAN) as the platform demonstrate that the proposed method significantly outperforms the standard estimator, maintaining consistent tracking accuracy under multiple introduced delays. These findings offer a promising solution for achieving high-fidelity haptic perception in remote dyadic interaction, potentially facilitating reliable stiffness assessment in therapeutic settings across networks.
A Capsule-Sized Multi-Wavelength Wireless Optical System for Edge-AI-Based Classification of Gastrointestinal Bleeding Flow Rate
Post-endoscopic gastrointestinal (GI) rebleeding frequently occurs within the first 72 hours after therapeutic hemostasis and remains a major cause of early morbidity and mortality. Existing non-invasive monitoring approaches primarily provide binary blood detection and lack quantitative assessment of bleeding severity or flow dynamic, limiting their ability to support timely clinical decision-making during this high-risk period. In this work, we developed a capsule-sized, multi-wavelength optical sensing wireless platform for order-of-magnitude-level classification of GI bleeding flow rate, leveraging transmission spectroscopy and low-power edge artificial intelligence. The system performs time-resolved, multi-spectral measurements and employs a lightweight two-dimensional convolutional neural network for on-device flow-rate classification, with physics-based validation confirming consistency with wavelength-dependent hemoglobin absorption behavior. In controlled in vitro experiments under simulated gastric conditions, the proposed approach achieved an overall classification accuracy of 98.75% across multiple bleeding flow-rate levels while robustly distinguishing diverse non-blood gastrointestinal interference. By performing embedded inference directly on the capsule electronics, the system reduced overall energy consumption by approximately 88% compared with continuous wireless transmission of raw data, making prolonged, battery-powered operation feasible. Extending capsule-based diagnostics beyond binary blood detection toward continuous, site-specific assessment of bleeding severity, this platform has the potential to support earlier identification of clinically significant rebleeding and inform timely re-intervention during post-endoscopic surveillance.
Composite Adaptive Control Barrier Functions for Safety-Critical Systems with Parametric Uncertainty
Control barrier functions guarantee safety but typically require accurate system models. Parametric uncertainty invalidates these guarantees. Existing robust methods maintain safety via worst-case bounds, limiting performance, while modular learning schemes decouple estimation from safety, permitting state violations during training. This paper presents the composite adaptive control barrier function (CaCBF) algorithm for nonlinear control-affine systems subject to linear parametric uncertainty. We derive adaptation laws from a composite energy function comprising a logarithmic safety barrier, a control Lyapunov function, and a parameter error term. We prove that CaCBF guarantees the forward invariance of the safe set and the uniform boundedness of the closed-loop system. This safety guarantee holds without requiring parameter convergence. Simulations of adaptive cruise control, an omnidirectional robot, and a planar drone demonstrate the efficacy of the CaCBF algorithm.
Battery-less Long-Range LTE-M Water Leak Detector
This work presents a self powered water leak sensor that eliminates both batteries and local gateways. The design integrates a dual compartment electrochemical harvester, a low input boost converter with supercapacitor storage, and a comparator gated LTE-M radio built on the Nordic Thingy:91 platform. Laboratory tests confirm that the system can be awakened from a dormant state in the presence of water, harvest sufficient energy, and issue repeated cloud beacons using the water exposure as the power source. Beyond conventional LTE-M deployments, the system's compatibility with 3GPP standard cellular protocols paves the way for future connectivity via non terrestrial 5G networks, enabling coverage in infrastructure scarce regions.
Battery-Free and Gateway-Free Cellular IoT Water Leak Detection System
This paper presents a battery-free and gateway-free water leak detection system capable of direct communication over LTE-M (Cat-M1). The system operates solely on energy harvested through a hydroelectric mechanism driven by an electrochemical sensor, thereby removing the need for conventional batteries. To address the stringent startup and operational power demands of LTE-M transceivers, the architecture incorporates a compartmentalized sensing module and a dedicated power management subsystem, comprising a boost converter, supercapacitor based energy storage, and a hysteresis controlled load isolation circuit. This design enables autonomous, direct to cloud data transmission without reliance on local networking infrastructure. Experimental results demonstrate consistent LTE-M beacon transmissions triggered by water induced energy generation, underscoring the system's potential for sustainable, maintenance free, and globally scalable IoT leak detection applications in smart infrastructure.
Improve the autonomy of the SE2(3) group based Extended Kalman Filter for Integrated Navigation: Application
One of the core advantages of SE2(3) Lie group framework for navigation modeling lies in the autonomy of error propagation. In the previous paper, the theoretical analysis of autonomy property of navigation model in inertial, earth and world frames was given. A construction method for SE2(3) group navigation model is proposed to improve the non-inertial navigation model toward full autonomy. This paper serves as a counterpart to previous paper and conducts the real-world strapdown inertial navigation system (SINS)/odometer(ODO) experiments as well as Monte-Carlo simulations to demonstrate the performance of improved SE2(3) group based high-precision navigation models.
comment: arXiv admin note: substantial text overlap with arXiv:2601.16062. substantial text overlap with arXiv:2601.16062. substantial text overlap with arXiv:2601.16062. substantial text overlap with arXiv:2601.16062
Vision-Proprioception Fusion with Mamba2 in End-to-End Reinforcement Learning for Motion Control
End-to-end reinforcement learning (RL) for motion control trains policies directly from sensor inputs to motor commands, enabling unified controllers for different robots and tasks. However, most existing methods are either blind (proprioception-only) or rely on fusion backbones with unfavorable compute-memory trade-offs. Recurrent controllers struggle with long-horizon credit assignment, and Transformer-based fusion incurs quadratic cost in token length, limiting temporal and spatial context. We present a vision-driven cross-modal RL framework built on SSD-Mamba2, a selective state-space backbone that applies state-space duality (SSD) to enable both recurrent and convolutional scanning with hardware-aware streaming and near-linear scaling. Proprioceptive states and exteroceptive observations (e.g., depth tokens) are encoded into compact tokens and fused by stacked SSD-Mamba2 layers. The selective state-space updates retain long-range dependencies with markedly lower latency and memory use than quadratic self-attention, enabling longer look-ahead, higher token resolution, and stable training under limited compute. Policies are trained end-to-end under curricula that randomize terrain and appearance and progressively increase scene complexity. A compact, state-centric reward balances task progress, energy efficiency, and safety. Across diverse motion-control scenarios, our approach consistently surpasses strong state-of-the-art baselines in return, safety (collisions and falls), and sample efficiency, while converging faster at the same compute budget. These results suggest that SSD-Mamba2 provides a practical fusion backbone for resource-constrained robotic and autonomous systems in engineering informatics applications.
comment: 6 figures and 8 tables. This paper has been accepted by Advanced Engineering Informatics
Robotics
Quantum-Inspired Episode Selection for Monte Carlo Reinforcement Learning via QUBO Optimization
Monte Carlo (MC) reinforcement learning suffers from high sample complexity, especially in environments with sparse rewards, large state spaces, and correlated trajectories. We address these limitations by reformulating episode selection as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solving it with quantum-inspired samplers. Our method, MC+QUBO, integrates a combinatorial filtering step into standard MC policy evaluation: from each batch of trajectories, we select a subset that maximizes cumulative reward while promoting state-space coverage. This selection is encoded as a QUBO, where linear terms favor high-reward episodes and quadratic terms penalize redundancy. We explore both Simulated Quantum Annealing (SQA) and Simulated Bifurcation (SB) as black-box solvers within this framework. Experiments in a finite-horizon GridWorld demonstrate that MC+QUBO outperforms vanilla MC in convergence speed and final policy quality, highlighting the potential of quantum-inspired optimization as a decision-making subroutine in reinforcement learning.
comment: Proceedings of Machine Learning Research tbd: 1_13, 2025 International Conference on Computational Optimization
Correct-by-Construction Vision-based Pose Estimation using Geometric Generative Models
We consider the problem of vision-based pose estimation for autonomous systems. While deep neural networks have been successfully used for vision-based tasks, they inherently lack provable guarantees on the correctness of their output, which is crucial for safety-critical applications. We present a framework for designing certifiable neural networks (NNs) for perception-based pose estimation that integrates physics-driven modeling with learning-based estimation. The proposed framework begins by leveraging the known geometry of planar objects commonly found in the environment, such as traffic signs and runway markings, referred to as target objects. At its core, it introduces a geometric generative model (GGM), a neural-network-like model whose parameters are derived from the image formation process of a target object observed by a camera. Once designed, the GGM can be used to train NN-based pose estimators with certified guarantees in terms of their estimation errors. We first demonstrate this framework in uncluttered environments, where the target object is the only object present in the camera's field of view. We extend this using ideas from NN reachability analysis to design certified object NN that can detect the presence of the target object in cluttered environments. Subsequently, the framework consolidates the certified object detector with the certified pose estimator to design a multi-stage perception pipeline that generalizes the proposed approach to cluttered environments, while maintaining its certified guarantees. We evaluate the proposed framework using both synthetic and real images of various planar objects commonly encountered by autonomous vehicles. Using images captured by an event-based camera, we show that the trained encoder can effectively estimate the pose of a traffic sign in accordance with the certified bound provided by the framework.
AsterNav: Autonomous Aerial Robot Navigation In Darkness Using Passive Computation
Autonomous aerial navigation in absolute darkness is crucial for post-disaster search and rescue operations, which often occur from disaster-zone power outages. Yet, due to resource constraints, tiny aerial robots, perfectly suited for these operations, are unable to navigate in the darkness to find survivors safely. In this paper, we present an autonomous aerial robot for navigation in the dark by combining an Infra-Red (IR) monocular camera with a large-aperture coded lens and structured light without external infrastructure like GPS or motion-capture. Our approach obtains depth-dependent defocus cues (each structured light point appears as a pattern that is depth dependent), which acts as a strong prior for our AsterNet deep depth estimation model. The model is trained in simulation by generating data using a simple optical model and transfers directly to the real world without any fine-tuning or retraining. AsterNet runs onboard the robot at 20 Hz on an NVIDIA Jetson Orin$^\text{TM}$ Nano. Furthermore, our network is robust to changes in the structured light pattern and relative placement of the pattern emitter and IR camera, leading to simplified and cost-effective construction. We successfully evaluate and demonstrate our proposed depth navigation approach AsterNav using depth from AsterNet in many real-world experiments using only onboard sensing and computation, including dark matte obstacles and thin ropes (diameter 6.25mm), achieving an overall success rate of 95.5% with unknown object shapes, locations and materials. To the best of our knowledge, this is the first work on monocular, structured-light-based quadrotor navigation in absolute darkness.
comment: 8 pages, 10 figures, Published in IEEE Robotics And Automation Letters
MetaWorld: Skill Transfer and Composition in a Hierarchical World Model for Grounding High-Level Instructions ICLR 2026
Humanoid robot loco-manipulation remains constrained by the semantic-physical gap. Current methods face three limitations: Low sample efficiency in reinforcement learning, poor generalization in imitation learning, and physical inconsistency in VLMs. We propose MetaWorld, a hierarchical world model that integrates semantic planning and physical control via expert policy transfer. The framework decouples tasks into a VLM-driven semantic layer and a latent dynamics model operating in a compact state space. Our dynamic expert selection and motion prior fusion mechanism leverages a pre-trained multi-expert policy library as transferable knowledge, enabling efficient online adaptation via a two-stage framework. VLMs serve as semantic interfaces, mapping instructions to executable skills and bypassing symbol grounding. Experiments on Humanoid-Bench show MetaWorld outperforms world model-based RL in task completion and motion coherence. Our code will be found at https://anonymous.4open.science/r/metaworld-2BF4/
comment: 8 pages, 4 figures, Submitted to ICLR 2026 World Model Workshop
EquiForm: Noise-Robust SE(3)-Equivariant Policy Learning from 3D Point Clouds
Visual imitation learning with 3D point clouds has advanced robotic manipulation by providing geometry-aware, appearance-invariant observations. However, point cloud-based policies remain highly sensitive to sensor noise, pose perturbations, and occlusion-induced artifacts, which distort geometric structure and break the equivariance assumptions required for robust generalization. Existing equivariant approaches primarily encode symmetry constraints into neural architectures, but do not explicitly correct noise-induced geometric deviations or enforce equivariant consistency in learned representations. We introduce EquiForm, a noise-robust SE(3)-equivariant policy learning framework for point cloud-based manipulation. EquiForm formalizes how noise-induced geometric distortions lead to equivariance deviations in observation-to-action mappings, and introduces a geometric denoising module to restore consistent 3D structure under noisy or incomplete observations. In addition, we propose a contrastive equivariant alignment objective that enforces representation consistency under both rigid transformations and noise perturbations. Built upon these components, EquiForm forms a flexible policy learning pipeline that integrates noise-robust geometric reasoning with modern generative models. We evaluate EquiForm on 16 simulated tasks and 4 real-world manipulation tasks across diverse objects and scene layouts. Compared to state-of-the-art point cloud imitation learning methods, EquiForm achieves an average improvement of 17.2% in simulation and 28.1% in real-world experiments, demonstrating strong noise robustness and spatial generalization.
comment: Project website: https://ZhangZhiyuanZhang.github.io/equiform-website/ Code will be released
PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes
Humanoid robots hold great potential for diverse interactions and daily service tasks within human-centered environments, necessitating controllers that seamlessly integrate precise locomotion with dexterous manipulation. However, most existing whole-body controllers lack exteroceptive awareness of the surrounding environment, rendering them insufficient for stable task execution in complex, unstructured scenarios.To address this challenge, we propose PILOT, a unified single-stage reinforcement learning (RL) framework tailored for perceptive loco-manipulation, which synergizes perceptive locomotion and expansive whole-body control within a single policy. To enhance terrain awareness and ensure precise foot placement, we design a cross-modal context encoder that fuses prediction-based proprioceptive features with attention-based perceptive representations. Furthermore, we introduce a Mixture-of-Experts (MoE) policy architecture to coordinate diverse motor skills, facilitating better specialization across distinct motion patterns. Extensive experiments in both simulation and on the physical Unitree G1 humanoid robot validate the efficacy of our framework. PILOT demonstrates superior stability, command tracking precision, and terrain traversability compared to existing baselines. These results highlight its potential to serve as a robust, foundational low-level controller for loco-manipulation in unstructured scenes.
comment: 8 pages, 4 figures
Scaling Rough Terrain Locomotion with Automatic Curriculum Reinforcement Learning
Curriculum learning has demonstrated substantial effectiveness in robot learning. However, it still faces limitations when scaling to complex, wide-ranging task spaces. Such task spaces often lack a well-defined difficulty structure, making the difficulty ordering required by previous methods challenging to define. We propose a Learning Progress-based Automatic Curriculum Reinforcement Learning (LP-ACRL) framework, which estimates the agent's learning progress online and adaptively adjusts the task-sampling distribution, thereby enabling automatic curriculum generation without prior knowledge of the difficulty distribution over the task space. Policies trained with LP-ACRL enable the ANYmal D quadruped to achieve and maintain stable, high-speed locomotion at 2.5 m/s linear velocity and 3.0 rad/s angular velocity across diverse terrains, including stairs, slopes, gravel, and low-friction flat surfaces--whereas previous methods have generally been limited to high speeds on flat terrain or low speeds on complex terrain. Experimental results demonstrate that LP-ACRL exhibits strong scalability and real-world applicability, providing a robust baseline for future research on curriculum generation in complex, wide-ranging robotic learning task spaces.
DiffusionCinema: Text-to-Aerial Cinematography
We propose a novel Unmanned Aerial Vehicles (UAV) assisted creative capture system that leverages diffusion models to interpret high-level natural language prompts and automatically generate optimal flight trajectories for cinematic video recording. Instead of manually piloting the drone, the user simply describes the desired shot (e.g., "orbit around me slowly from the right and reveal the background waterfall"). Our system encodes the prompt along with an initial visual snapshot from the onboard camera, and a diffusion model samples plausible spatio-temporal motion plans that satisfy both the scene geometry and shot semantics. The generated flight trajectory is then executed autonomously by the UAV to record smooth, repeatable video clips that match the prompt. User evaluation using NASA-TLX showed a significantly lower overall workload with our interface (M = 21.6) compared to a traditional remote controller (M = 58.1), demonstrating a substantial reduction in perceived effort. Mental demand (M = 11.5 vs. 60.5) and frustration (M = 14.0 vs. 54.5) were also markedly lower for our system, confirming clear usability advantages in autonomous text-driven flight control. This project demonstrates a new interaction paradigm: text-to-cinema flight, where diffusion models act as the "creative operator" converting story intentions directly into aerial motion.
Eye-Tracking-Driven Control in Daily Task Assistance for Assistive Robotic Arms
Shared control improves Human-Robot Interaction by reducing the user's workload and increasing the robot's autonomy. It allows robots to perform tasks under the user's supervision. Current eye-tracking-driven approaches face several challenges. These include accuracy issues in 3D gaze estimation and difficulty interpreting gaze when differentiating between multiple tasks. We present an eye-tracking-driven control framework, aimed at enabling individuals with severe physical disabilities to perform daily tasks independently. Our system uses task pictograms as fiducial markers combined with a feature matching approach that transmits data of the selected object to accomplish necessary task related measurements with an eye-in-hand configuration. This eye-tracking control does not require knowledge of the user's position in relation to the object. The framework correctly interpreted object and task selection in up to 97.9% of measurements. Issues were found in the evaluation, that were improved and shared as lessons learned. The open-source framework can be adapted to new tasks and objects due to the integration of state-of-the-art object detection models.
comment: 23 pages, 6 figures, publication in review process
Real-Time Synchronized Interaction Framework for Emotion-Aware Humanoid Robots
As humanoid robots increasingly introduced into social scene, achieving emotionally synchronized multimodal interaction remains a significant challenges. To facilitate the further adoption and integration of humanoid robots into service roles, we present a real-time framework for NAO robots that synchronizes speech prosody with full-body gestures through three key innovations: (1) A dual-channel emotion engine where large language model (LLM) simultaneously generates context-aware text responses and biomechanically feasible motion descriptors, constrained by a structured joint movement library; (2) Duration-aware dynamic time warping for precise temporal alignment of speech output and kinematic motion keyframes; (3) Closed-loop feasibility verification ensuring gestures adhere to NAO's physical joint limits through real-time adaptation. Evaluations show 21% higher emotional alignment compared to rule-based systems, achieved by coordinating vocal pitch (arousal-driven) with upper-limb kinematics while maintaining lower-body stability. By enabling seamless sensorimotor coordination, this framework advances the deployment of context-aware social robots in dynamic applications such as personalized healthcare, interactive education, and responsive customer service platforms.
comment: 6 pages, 6 figures
EMPM: Embodied MPM for Modeling and Simulation of Deformable Objects
Modeling deformable objects - especially continuum materials - in a way that is physically plausible, generalizable, and data-efficient remains challenging across 3D vision, graphics, and robotic manipulation. Many existing methods oversimplify the rich dynamics of deformable objects or require large training sets, which often limits generalization. We introduce embodied MPM (EMPM), a deformable object modeling and simulation framework built on a differentiable Material Point Method (MPM) simulator that captures the dynamics of challenging materials. From multi-view RGB-D videos, our approach reconstructs geometry and appearance, then uses an MPM physics engine to simulate object behavior by minimizing the mismatch between predicted and observed visual data. We further optimize MPM parameters online using sensory feedback, enabling adaptive, robust, and physics-aware object representations that open new possibilities for robotic manipulation of complex deformables. Experiments show that EMPM outperforms spring-mass baseline models. Project website: https://embodied-mpm.github.io.
Quantifying Ergonomics in the Elevate Soft Robotic Suit
Soft robotic suits have the potential to rehabilitate, assist, and augment the human body. The low weight, cost, and minimal form-factor of these devices make them ideal for daily use by both healthy and impaired individuals. However, challenges associated with data-driven, user-specific, and comfort-first design of human-robot interfaces using soft materials limit their widespread translation and adoption. In this work, we present the quantitative evaluation of ergonomics and comfort of the Elevate suit - a cable driven soft robotic suit that assists shoulder elevation. Using a motion-capture system and force sensors, we measured the suit's ergonomics during assisted shoulder elevation up to 70 degrees. Two 4-hour sessions were conducted with one subject, involving transmitting cable tensions of up to 200N with no discomfort reported. We estimated that the pressure applied to the shoulder during assisted movements was within the range seen in a human grasp (approximately 69.1-85.1kPa), and estimated volumetric compression of <3% and <8% across the torso and upper arm, respectively. These results provide early validation of Elevate's ergonomic design in preparation for future studies with patient groups.
comment: 5 pages, 3 figures. Submitted to IEEE-EMBC 2026
Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
A deep understanding of kinematic structures and movable components is essential for enabling robots to manipulate objects and model their own articulated forms. Such understanding is captured through articulated objects, which are essential for tasks such as physical simulation, motion planning, and policy learning. However, creating these models, particularly for objects with high degrees of freedom (DoF), remains a significant challenge. Existing methods typically rely on motion sequences or strong assumptions from hand-curated datasets, which hinders scalability. In this paper, we introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry. To achieve this, we combine MCTS search for structural inference with geometry-driven optimization for joint reasoning, producing physically consistent and functionally valid descriptions. We evaluate Kinematify on diverse inputs from both synthetic and real-world environments, demonstrating improvements in registration and kinematic topology accuracy over prior work.
comment: Project Page: https://sites.google.com/deemos.com/kinematify
Monocular pose estimation of articulated open surgery tools -- in the wild
This work presents a framework for monocular 6D pose estimation of surgical instruments in open surgery, addressing challenges such as object articulations, specularity, occlusions, and synthetic-to-real domain adaptation. The proposed approach consists of three main components: $(1)$ synthetic data generation pipeline that incorporates 3D scanning of surgical tools with articulation rigging and physically-based rendering; $(2)$ a tailored pose estimation framework combining tool detection with pose and articulation estimation; and $(3)$ a training strategy on synthetic and real unannotated video data, employing domain adaptation with automatically generated pseudo-labels. Evaluations conducted on real data of open surgery demonstrate the good performance and real-world applicability of the proposed framework, highlighting its potential for integration into medical augmented reality and robotic systems. The approach eliminates the need for extensive manual annotation of real surgical data.
comment: Author Accepted Manuscript (AAM)
PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning
Robotic manipulation systems benefit from complementary sensing modalities, where each provides unique environmental information. Point clouds capture detailed geometric structure, while RGB images provide rich semantic context. Current point cloud methods struggle to capture fine-grained detail, especially for complex tasks, which RGB methods lack geometric awareness, which hinders their precision and generalization. We introduce PointMapPolicy, a novel approach that conditions diffusion policies on structured grids of points without downsampling. The resulting data type makes it easier to extract shape and spatial relationships from observations, and can be transformed between reference frames. Yet due to their structure in a regular grid, we enable the use of established computer vision techniques directly to 3D data. Using xLSTM as a backbone, our model efficiently fuses the point maps with RGB data for enhanced multi-modal perception. Through extensive experiments on the RoboCasa and CALVIN benchmarks and real robot evaluations, we demonstrate that our method achieves state-of-the-art performance across diverse manipulation tasks. The overview and demos are available on our project page: https://point-map.github.io/Point-Map/
Accurate Calibration and Robust LiDAR-Inertial Odometry for Spinning Actuated LiDAR Systems
Accurate calibration and robust localization are fundamental for downstream tasks in spinning actuated LiDAR applications. Existing methods, however, require parameterizing extrinsic parameters based on different mounting configurations, limiting their generalizability. Additionally, spinning actuated LiDAR inevitably scans featureless regions, which complicates the balance between scanning coverage and localization robustness. To address these challenges, this letter presents a targetless LiDAR-motor calibration (LM-Calibr) on the basis of the Denavit-Hartenberg convention and an environmental adaptive LiDAR-inertial odometry (EVA-LIO). LM-Calibr supports calibration of LiDAR-motor systems with various mounting configurations. Extensive experiments demonstrate its accuracy and convergence across different scenarios, mounting angles, and initial values. Additionally, EVA-LIO adaptively selects downsample rates and map resolutions according to spatial scale. This adaptivity enables the actuator to operate at maximum speed, thereby enhancing scanning completeness while ensuring robust localization, even when LiDAR briefly scans featureless areas. The source code and hardware design are available on GitHub: \textcolor{blue}{\href{https://github.com/zijiechenrobotics/lm_calibr}{github.com/zijiechenrobotics/lm\_calibr}}. The video is available at \textcolor{blue}{\href{https://youtu.be/cZyyrkmeoSk}{youtu.be/cZyyrkmeoSk}}
comment: This article has been accepted for publication in IEEE Robotics and Automation Letters (RA-L). Personal use is permitted. All other uses require IEEE permission
LangForce: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in current training paradigms where goal-driven data collection creates a dataset bias. In such datasets, language instructions are highly predictable from visual observations alone, causing the conditional mutual information between instructions and actions to vanish, a phenomenon we term Information Collapse. Consequently, models degenerate into vision-only policies that ignore language constraints and fail in out-of-distribution (OOD) settings. To address this, we propose LangForce, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we construct a dual-branch architecture to estimate both a vision-only prior $p(a \mid v)$ and a language-conditioned posterior $π(a \mid v, \ell)$. We then optimize the policy to maximize the conditional Pointwise Mutual Information (PMI) between actions and instructions. This objective effectively penalizes the vision shortcut and rewards actions that explicitly explain the language command. Without requiring new data, LangForce significantly improves generalization. Extensive experiments across on SimplerEnv and RoboCasa demonstrate substantial gains, including an 11.3% improvement on the challenging OOD SimplerEnv benchmark, validating the ability of our approach to robustly ground language in action.
Haptic Light-Emitting Diodes: Miniature, Luminous Tactile Actuators
We present Haptic Light-Emitting Diodes (HLEDs), luminous thermopneumatic actuators that directly convert pulsed light into mechanical forces and displacements. Each device packages a miniature surface-mount LED in a gas-filled cavity that contains a low-inertia graphite photoabsorber. The cavity is sealed by an elastic membrane, which functions as a working diaphragm. Brief optical pulses heat the photoabsorber, which heats the gas. The resulting rapid pressure increases generate forces and displacements at the working diaphragm. Millimeter-scale HLEDs produce forces exceeding 0.4 N and displacements of 0.9 mm at low voltages, with 5 to 100 ms response times, making them attractive as actuators providing tactile feedback in human-machine interfaces. Unusually, these actuators are also light-emitting, as a fraction of optical energy is transmitted through the membrane. These photomechanical actuators have many potential applications in tactile displays, human interface engineering, wearable computing, and other areas.
Point Bridge: 3D Representations for Cross Domain Policy Learning
Robot foundation models are beginning to deliver on the promise of generalist robotic agents, yet progress remains constrained by the scarcity of large-scale real-world manipulation datasets. Simulation and synthetic data generation offer a scalable alternative, but their usefulness is limited by the visual domain gap between simulation and reality. In this work, we present Point Bridge, a framework that leverages unified, domain-agnostic point-based representations to unlock synthetic datasets for zero-shot sim-to-real policy transfer, without explicit visual or object-level alignment. Point Bridge combines automated point-based representation extraction via Vision-Language Models (VLMs), transformer-based policy learning, and efficient inference-time pipelines to train capable real-world manipulation agents using only synthetic data. With additional co-training on small sets of real demonstrations, Point Bridge further improves performance, substantially outperforming prior vision-based sim-and-real co-training methods. It achieves up to 44% gains in zero-shot sim-to-real transfer and up to 66% with limited real data across both single-task and multitask settings. Videos of the robot are best viewed at: https://pointbridge3d.github.io/
Model-free source seeking of exponentially convergent unicycle: theoretical and robotic experimental results
This paper introduces a novel model-free, real-time unicycle-based source seeking design. This design autonomously steers the unicycle dynamic system towards the extremum point of an objective function or physical/scalar signal that is unknown expression-wise, but accessible via measurements. A key contribution of this paper is that the introduced design converges exponentially to the extremum point of objective functions (or scalar signals) that behave locally like a higher-degree power function (e.g., fourth-degree polynomial function) as opposed to locally quadratic objective functions, the usual case in literature. We provide theoretical results and design characterization, supported by a variety of simulation results that demonstrate the robustness of the proposed design, including cases with different initial conditions and measurement delays/noise. Also, for the first time in the literature, we provide experimental robotic results that demonstrate the effectiveness of the proposed design and its exponential convergence ability. These experimental results confirm that the proposed exponentially convergent extremum seeking design can be practically realized on a physical robotic platform under real-world sensing and actuation constraints.
Multiagent Systems
Efficient Self-Learning and Model Versioning for AI-native O-RAN Edge
The AI-native vision of 6G requires Radio Access Networks to train, deploy, and continuously refine thousands of machine learning (ML) models that drive real-time radio network optimization. Although the Open RAN (O-RAN) architecture provides open interfaces and an intelligent control plane, it leaves the life-cycle management of these models unspecified. Consequently, operators still rely on ad-hoc, manual update practices that can neither scale across the heterogeneous, multi-layer stack of Cell-Site, Edge-, Regional-, and Central-Cloud domains, nor across the three O-RAN control loops (real-, near-real-, and non-real-time). We present a self-learning framework that provides an efficient closed-loop version management for an AI-native O-RAN edge. In this framework, training pipelines in the Central/Regional Cloud continuously generate new models, which are cataloged along with their resource footprints, security scores, and accuracy metrics in a shared version repository. An Update Manager consults this repository and applies a self-learning policy to decide when and where each new model version should be promoted into operation. A container orchestrator then realizes these decisions across heterogeneous worker nodes, enabling multiple services (rApps, xApps, and dApps) to obtain improved inference with minimal disruption. Simulation results show that an efficient RL-driven decision-making can guarantee quality of service, bounded latencies while balancing model accuracy, system stability, and resilience.
comment: This paper is uploaded here for research community, thus it is for non-commercial purposes
Embodiment-Induced Coordination Regimes in Tabular Multi-Agent Q-Learning
Centralized value learning is often assumed to improve coordination and stability in multi-agent reinforcement learning, yet this assumption is rarely tested under controlled conditions. We directly evaluate it in a fully tabular predator-prey gridworld by comparing independent and centralized Q-learning under explicit embodiment constraints on agent speed and stamina. Across multiple kinematic regimes and asymmetric agent roles, centralized learning fails to provide a consistent advantage and is frequently outperformed by fully independent learning, even under full observability and exact value estimation. Moreover, asymmetric centralized-independent configurations induce persistent coordination breakdowns rather than transient learning instability. By eliminating confounding effects from function approximation and representation learning, our tabular analysis isolates coordination structure as the primary driver of these effects. The results show that increased coordination can become a liability under embodiment constraints, and that the effectiveness of centralized learning is fundamentally regime and role dependent rather than universal.
Collective intelligence in science: direct elicitation of diverse information from experts with unknown information structure
Suppose we need a deep collective analysis of an open scientific problem: there is a complex scientific hypothesis and a large online group of mutually unrelated experts with relevant private information of a diverse and unpredictable nature. This information may be results of experts' individual experiments, original reasoning of some of them, results of AI systems they use, etc. We propose a simple mechanism based on a self-resolving play-money prediction market entangled with a chat. We show that such a system can easily be brought to an equilibrium where participants directly share their private information on the hypothesis through the chat and trade as if the market were resolved in accordance with the truth of the hypothesis. This approach will lead to efficient aggregation of relevant information in a completely interpretable form even if the ground truth cannot be established and experts initially know nothing about each other and cannot perform complex Bayesian calculations. Finally, by rewarding the experts with some real assets proportionally to the play money they end up with, we can get an innovative way to fund large-scale collaborative studies of any type.
comment: 21 pages
Agent Contracts: A Formal Framework for Resource-Bounded Autonomous AI Systems
The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability; yet, none provide formal, resource governance-normative mechanisms to bound how much agents may consume or how long they may operate. We introduce Agent Contracts, a formal framework that extends the contract metaphor from task allocation to resource-bounded execution. An Agent Contract unifies input/output specifications, multi-dimensional resource constraints, temporal boundaries, and success criteria into a coherent governance mechanism with explicit lifecycle semantics. For multi-agent coordination, we establish conservation laws ensuring delegated budgets respect parent constraints, enabling hierarchical coordination through contract delegation. Empirical validation across four experiments demonstrates 90% token reduction with 525x lower variance in iterative workflows, zero conservation violations in multi-agent delegation, and measurable quality-resource tradeoffs through contract modes. Agent Contracts provide formal foundations for predictable, auditable, and resource-bounded autonomous AI deployment.
comment: v2: Corrected author names in references
Collaborative Belief Reasoning with LLMs for Efficient Multi-Agent Collaboration
Effective real-world multi-agent collaboration requires not only accurate planning but also the ability to reason about collaborators' intents--a crucial capability for avoiding miscoordination and redundant communication under partial observable environments. Due to their strong planning and reasoning capabilities, large language models (LLMs) have emerged as promising autonomous agents for collaborative task solving. However, existing collaboration frameworks for LLMs overlook their reasoning potential for dynamic intent inference, and thus produce inconsistent plans and redundant communication, reducing collaboration efficiency. To bridge this gap, we propose CoBel-World, a novel framework that equips LLM agents with a Collaborative Belief World--an internal representation jointly modeling the physical environment and collaborators' mental states. CoBel-World enables agents to parse external open-world knowledge into structured beliefs via a symbolic belief representation module, and perform zero-shot Bayesian-style belief updates through LLM reasoning. This allows agents to proactively detect potential miscoordination (e.g., conflicting plans) and communicate adaptively. Evaluated on challenging embodied benchmarks (i.e., TDW-MAT and C-WAH), CoBel-World significantly reduces communication costs by 64-79% and improves task completion efficiency by 4-28% compared to the strongest baseline. Our results show that explicit, intent-aware belief modeling is essential for efficient and human-like collaboration in LLM-based multi-agent systems.
The incompatibility of the Condorcet winner and loser criteria with positive involvement and resolvability
We prove that there is no preferential voting method satisfying the Condorcet winner and loser criteria, positive involvement (if a candidate $x$ wins in an initial preference profile, then adding a voter who ranks $x$ uniquely first cannot cause $x$ to lose), and $n$-voter resolvability (if $x$ initially ties for winning, then $x$ can be made the unique winner by adding some set of up to $n$ voters). This impossibility theorem holds for any positive integer $n$. In a previous note, we proved an analogous result assuming an additional axiom of ordinal margin invariance, which we now show is unnecessary for an impossibility theorem, at least if the desired voting method is defined for five-candidate elections.
comment: 6 pages, 2 figures. Updated Figures 1 and 2 and added Remark 5
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
Despite recent advances, long-sequence video generation frameworks still suffer from significant limitations: poor assistive capability, suboptimal visual quality, and limited expressiveness. To mitigate these limitations, we propose MAViS, a multi-agent collaborative framework designed to assist in long-sequence video storytelling by efficiently translating ideas into visual narratives. MAViS orchestrates specialized agents across multiple stages, including script writing, shot designing, character modeling, keyframe generation, video animation, and audio generation. In each stage, agents operate under the 3E Principle -- Explore, Examine, and Enhance -- to ensure the completeness of intermediate outputs. Considering the capability limitations of current generative models, we propose the Script Writing Guidelines to optimize compatibility between scripts and generative tools. Experimental results demonstrate that MAViS achieves state-of-the-art performance in assistive capability, visual quality, and video expressiveness. Its modular framework further enables scalability with diverse generative models and tools. With just a brief idea description, MAViS enables users to rapidly explore diverse visual storytelling and creative directions for sequential video generation by efficiently producing high-quality, complete long-sequence videos. To the best of our knowledge, MAViS is the only framework that provides multimodal design output -- videos with narratives and background music.
comment: Video Generation Agent
Systems and Control (EESS)
Higher-Order Gravitational Models: A Tutorial on Spherical Harmonics and the Newtonian Model
Accurate modeling of gravitational interactions is fundamental to the analysis, prediction, and control of space systems. While the Newtonian point-mass approximation suffices for many preliminary studies, real celestial bodies exhibit deviations from spherical symmetry, including oblateness, localized mass concentrations, and higher-order shape irregularities. These features can significantly perturb spacecraft trajectories, especially in low-altitude or long-duration missions, leading to cumulative orbit prediction errors and increased control demands. This article presents a tutorial introduction to spherical harmonic gravity models, outlining their theoretical foundations and underlying assumptions. Higher-order gravitational fields are derived as solutions to the Laplace equation, providing a systematic framework to capture the effects of non-uniform mass distributions. The impact of these higher-order terms on orbital dynamics is illustrated through examples involving Low Earth Orbit satellites and spacecraft near irregularly shaped asteroids, highlighting the practical significance of moving beyond the point-mass approximation.
Home Health System Deployment Experience for Geriatric Care Remote Monitoring
To support aging-in-place, adult children often provide care to their aging parents from a distance. These informal caregivers desire plug-and-play remote care solutions for privacy-preserving continuous monitoring that enabling real-time activity monitoring and intuitive, actionable information. This short paper presents insights from three iterations of deployment experience for remote monitoring system and the iterative improvement in hardware, modeling, and user interface guided by the Geriatric 4Ms framework (matters most, mentation, mobility, and medication). An LLM-assisted solution is developed to balance user experience (privacy-preserving, plug-and-play) and system performance.
GenAI-Net: A Generative AI Framework for Automated Biomolecular Network Design
Biomolecular networks underpin emerging technologies in synthetic biology-from robust biomanufacturing and metabolic engineering to smart therapeutics and cell-based diagnostics-and also provide a mechanistic language for understanding complex dynamics in natural and ecological systems. Yet designing chemical reaction networks (CRNs) that implement a desired dynamical function remains largely manual: while a proposed network can be checked by simulation, the reverse problem of discovering a network from a behavioral specification is difficult, requiring substantial human insight to navigate a vast space of topologies and kinetic parameters with nonlinear and possibly stochastic dynamics. Here we introduce GenAI-Net, a generative AI framework that automates CRN design by coupling an agent that proposes reactions to simulation-based evaluation defined by a user-specified objective. GenAI-Net efficiently produces novel, topologically diverse solutions across multiple design tasks, including dose responses, complex logic gates, classifiers, oscillators, and robust perfect adaptation in deterministic and stochastic settings (including noise reduction). By turning specifications into families of circuit candidates and reusable motifs, GenAI-Net provides a general route to programmable biomolecular circuit design and accelerates the translation from desired function to implementable mechanisms.
Invited: Toward Sustainable and Transparent Benchmarking for Academic Physical Design Research SP
This paper presents RosettaStone 2.0, an open benchmark translation and evaluation framework built on OpenROAD-Research. RosettaStone 2.0 provides complete RTL-to-GDS reference flows for both conventional 2D designs and Pin-3D-style face-to-face (F2F) hybrid-bonded 3D designs, enabling rigorous apples-to-apples comparison across planar and three-dimensional implementation settings. The framework is integrated within OpenROAD-flow-scripts (ORFS)-Research; it incorporates continuous integration (CI)-based regression testing and provides a standardized evaluation pipeline based on the METRICS2.1 convention, with structured logs and reports generated by ORFS-Research. To support transparent and reproducible research, RosettaStone 2.0 further provides a community-facing leaderboard, which is governed by verified pull requests and enforced through Developer Certificate of Origin (DCO) compliance.
comment: ISPD 26 (Invited paper), 9 pages, 11 figures, 5 tables
Robust Output Regulation of Uncertain Linear Time-Varying Systems
Robust output regulation for linear time-varying systems has remained an open problem for decades. To address this, we propose intrinsic system immersion by reformulating the regulator equation in a more insightful form, indicating that finding an internal model is equivalent to reproducing the output trajectory of a forced system by constructing an unforced system. This perspective reveals the influence of parametric uncertainties, demonstrating that an infinite-dimensional controller is generally unavoidable for robustness against plant uncertainty. Consequently, a general robust design is proposed without explicitly solving the regulator equation. It ensures robustness against uncertainties in the exosystem interaction, and achieves approximate output regulation when an infinite-dimensional controller is necessary for regulation. Additionally, we study the regulator equation in a coordinate-free framework, extend the time-varying non-resonance condition, and provide a method to minimize the dimension of an internal model. Overall, these results provide a general systematic framework for constructing robust internal model-based controllers, and simplify the control implementation process.
A new approach for combined model class selection and parameters learning for auto-regressive neural models
This work introduces a novel approach for the joint selection of model structure and parameter learning for nonlinear dynamical systems identification. Focusing on a specific Recurrent Neural Networks (RNNs) family, i.e., Nonlinear Auto-Regressive with eXogenous inputs Echo State Networks (NARXESNs), the method allows to simultaneously select the optimal model class and learn model parameters from data through a new set-membership (SM) based procedure. The results show the effectiveness of the approach in identifying parsimonious yet accurate models suitable for control applications. Moreover, the proposed framework enables a robust training strategy that explicitly accounts for bounded measurement noise and enhances model robustness by allowing data-consistent evaluation of simulation performance during parameter learning, a process generally NP-hard for models with autoregressive components.
Laboratory and field testing of a residential heat pump retrofit for a DC solar nanogrid
Residential buildings are increasingly integrating large devices that run natively on direct current (DC), such as solar photovoltaics, electric vehicles, stationary batteries, and DC motors that drive heat pumps and other major appliances. Today, these natively-DC devices typically connect within buildings through alternating current (AC) distribution systems, entailing significant energy losses due to conversions between AC and DC. This paper investigates the alternative of connecting DC devices through DC distribution. Specifically, this paper shows through laboratory and field experiments that an off-the-shelf residential heat pump designed for conventional AC systems can be powered directly on DC with few hardware modifications and little change in performance. Supporting simulations of a DC nanogrid including {historical heat pump and rest-of-house load measurements,} a solar photovoltaic array, and a stationary battery suggest that connecting these devices through DC distribution could decrease annual electricity bills by 12.5% with an after-market AC-to-DC heat pump retrofit and by 16.7% with a heat pump designed to run on DC.
Stabilizing Policy Gradient Methods via Reward Profiling
Policy gradient methods, which have been extensively studied in the last decade, offer an effective and efficient framework for reinforcement learning problems. However, their performances can often be unsatisfactory, suffering from unreliable reward improvements and slow convergence, due to high variance in gradient estimations. In this paper, we propose a universal reward profiling framework that can be seamlessly integrated with any policy gradient algorithm, where we selectively update the policy based on high-confidence performance estimations. We theoretically justify that our technique will not slow down the convergence of the baseline policy gradient methods, but with high probability, will result in stable and monotonic improvements of their performance. Empirically, on eight continuous-control benchmarks (Box2D and MuJoCo/PyBullet), our profiling yields up to 1.5x faster convergence to near-optimal returns, up to 1.75x reduction in return variance on some setups. Our profiling approach offers a general, theoretically grounded path to more reliable and efficient policy learning in complex environments.
Cyberattack Detection in Virtualized Microgrids Using LightGBM and Knowledge-Distilled Classifiers
Modern microgrids depend on distributed sensing and communication interfaces, making them increasingly vulnerable to cyber physical disturbances that threaten operational continuity and equipment safety. In this work, a complete virtual microgrid was designed and implemented in MATLAB/Simulink, integrating heterogeneous renewable sources and secondary controller layers. A structured cyberattack framework was developed using MGLib to inject adversarial signals directly into the secondary control pathways. Multiple attack classes were emulated, including ramp, sinusoidal, additive, coordinated stealth, and denial of service behaviors. The virtual environment was used to generate labeled datasets under both normal and attack conditions. The datasets trained Light Gradient Boosting Machine (LightGBM) models to perform two functions: detecting the presence of an intrusion (binary) and distinguishing among attack types (multiclass). The multiclass model attained 99.72% accuracy and a 99.62% F1 score, while the binary model attained 94.8% accuracy and a 94.3% F1 score. A knowledge-distillation step reduced the size of the multiclass model, allowing faster predictions with only a small drop in performance. Real-time tests showed a processing delay of about 54 to 67 ms per 1000 samples, demonstrating suitability for CPU-based edge deployment in microgrid controllers. The results confirm that lightweight machine learning based intrusion detection methods can provide fast, accurate, and efficient cyberattack detection without relying on complex deep learning models. Key contributions include: (1) development of a complete MATLAB-based virtual microgrid, (2) structured attack injection at the control layer, (3) creation of multiclass labeled datasets, and (4) design of low-cost AI models suitable for practical microgrid cybersecurity.
comment: 13 pages
Extragradient methods with complexity guarantees for hierarchical variational inequalities
In the framework of a real Hilbert space we consider the problem of approaching solutions to a class of hierarchical variational inequality problems, subsuming several other problem classes including certain mathematical programs under equilibrium constraints, constrained min-max problems, hierarchical game problems, optimal control under VI constraints, and simple bilevel optimization problems. For this general problem formulation, we establish rates of convergence in terms of suitably constructed gap functions, measuring feasibility gaps and optimality gaps. We present worst-case iteration complexity results on both levels of the variational problem, as well as weak convergence under a geometric weak sharpness condition on the lower level solution set. Our results match and improve the state of the art in terms of their iteration complexity and the generality of the problem formulation.
comment: Revised and extended version
Consumer-based Carbon Costs: Integrating Consumer Carbon Preferences in Electricity Markets
An increasing share of consumers care about the carbon footprint of their electricity. This paper analyzes a method to integrate consumer carbon preferences in the electricity market-clearing by introducing consumer-based carbon costs and a carbon allocation mechanism. Specifically, consumers submit not only bids for power but also assign a cost to the carbon emissions incurred by their electricity use. The carbon allocation mechanism then assigns emissions from generation to consumers to minimize overall carbon costs. Our analysis starts from a previously proposed centralized market clearing formulation that maximizes social welfare under consideration of generation costs, consumer utility, and consumer carbon costs. We then derive an equivalent equilibrium formulation that incorporates a carbon allocation problem and gives rise to a set of carbon-adjusted electricity prices for both consumers and generators. We prove that the carbon-adjusted prices are higher for low-emitting generators and consumers with high carbon costs. Further, we prove that this new paradigm satisfies the same desirable market properties as standard electricity markets based on locational marginal prices, namely revenue adequacy and individual rationality, and demonstrate that a carbon tax on generators is equivalent to imposing a uniform carbon cost on consumers. Using a simplified three-bus system and the RTS-GMLC system, we illustrate that consumer-based carbon costs contribute to greener electricity market clearing both through generation redispatch and demand reductions.
comment: 13 pages, 5 figures
Robotics
AnyView: Synthesizing Any Novel View in Dynamic Scenes
Modern generative video models excel at producing convincing, high-quality outputs, but struggle to maintain multi-view and spatiotemporal consistency in highly dynamic real-world environments. In this work, we introduce \textbf{AnyView}, a diffusion-based video generation framework for \emph{dynamic view synthesis} with minimal inductive biases or geometric assumptions. We leverage multiple data sources with various levels of supervision, including monocular (2D), multi-view static (3D) and multi-view dynamic (4D) datasets, to train a generalist spatiotemporal implicit representation capable of producing zero-shot novel videos from arbitrary camera locations and trajectories. We evaluate AnyView on standard benchmarks, showing competitive results with the current state of the art, and propose \textbf{AnyViewBench}, a challenging new benchmark tailored towards \emph{extreme} dynamic view synthesis in diverse real-world scenarios. In this more dramatic setting, we find that most baselines drastically degrade in performance, as they require significant overlap between viewpoints, while AnyView maintains the ability to produce realistic, plausible, and spatiotemporally consistent videos when prompted from \emph{any} viewpoint. Results, data, code, and models can be viewed at: https://tri-ml.github.io/AnyView/
comment: Project webpage: https://tri-ml.github.io/AnyView/
GPA-VGGT:Adapting VGGT to Large scale Localization by self-Supervised learning with Geometry and Physics Aware loss
Transformer-based general visual geometry frameworks have shown promising performance in camera pose estimation and 3D scene understanding. Recent advancements in Visual Geometry Grounded Transformer (VGGT) models have shown great promise in camera pose estimation and 3D reconstruction. However, these models typically rely on ground truth labels for training, posing challenges when adapting to unlabeled and unseen scenes. In this paper, we propose a self-supervised framework to train VGGT with unlabeled data, thereby enhancing its localization capability in large-scale environments. To achieve this, we extend conventional pair-wise relations to sequence-wise geometric constraints for self-supervised learning. Specifically, in each sequence, we sample multiple source frames and geometrically project them onto different target frames, which improves temporal feature consistency. We formulate physical photometric consistency and geometric constraints as a joint optimization loss to circumvent the requirement for hard labels. By training the model with this proposed method, not only the local and global cross-view attention layers but also the camera and depth heads can effectively capture the underlying multi-view geometry. Experiments demonstrate that the model converges within hundreds of iterations and achieves significant improvements in large-scale localization. Our code will be released at https://github.com/X-yangfan/GPA-VGGT.
A Multimodal Data Collection Framework for Dialogue-Driven Assistive Robotics to Clarify Ambiguities: A Wizard-of-Oz Pilot Study
Integrated control of wheelchairs and wheelchair-mounted robotic arms (WMRAs) has strong potential to increase independence for users with severe motor limitations, yet existing interfaces often lack the flexibility needed for intuitive assistive interaction. Although data-driven AI methods show promise, progress is limited by the lack of multimodal datasets that capture natural Human-Robot Interaction (HRI), particularly conversational ambiguity in dialogue-driven control. To address this gap, we propose a multimodal data collection framework that employs a dialogue-based interaction protocol and a two-room Wizard-of-Oz (WoZ) setup to simulate robot autonomy while eliciting natural user behavior. The framework records five synchronized modalities: RGB-D video, conversational audio, inertial measurement unit (IMU) signals, end-effector Cartesian pose, and whole-body joint states across five assistive tasks. Using this framework, we collected a pilot dataset of 53 trials from five participants and validated its quality through motion smoothness analysis and user feedback. The results show that the framework effectively captures diverse ambiguity types and supports natural dialogue-driven interaction, demonstrating its suitability for scaling to a larger dataset for learning, benchmarking, and evaluation of ambiguity-aware assistive control.
Boosting Deep Reinforcement Learning with Semantic Knowledge for Robotic Manipulators
Deep Reinforcement Learning (DRL) is a powerful framework for solving complex sequential decision-making problems, particularly in robotic control. However, its practical deployment is often hindered by the substantial amount of experience required for learning, which results in high computational and time costs. In this work, we propose a novel integration of DRL with semantic knowledge in the form of Knowledge Graph Embeddings (KGEs), aiming to enhance learning efficiency by providing contextual information to the agent. Our architecture combines KGEs with visual observations, enabling the agent to exploit environmental knowledge during training. Experimental validation with robotic manipulators in environments featuring both fixed and randomized target attributes demonstrates that our method achieves up to {60}{\%} reduction in learning time and improves task accuracy by approximately 15 percentage points, without increasing training time or computational complexity. These results highlight the potential of semantic knowledge to reduce sample complexity and improve the effectiveness of DRL in robotic applications.
An Efficient Insect-inspired Approach for Visual Point-goal Navigation
In this work we develop a novel insect-inspired agent for visual point-goal navigation. This combines abstracted models of two insect brain structures that have been implicated, respectively, in associative learning and path integration. We draw an analogy between the formal benchmark of the Habitat point-goal navigation task and the ability of insects to learn and refine visually guided paths around obstacles between a discovered food location and their nest. We demonstrate that the simple insect-inspired agent exhibits performance comparable to recent SOTA models at many orders of magnitude less computational cost. Testing in a more realistic simulated environment shows the approach is robust to perturbations.
A Feature Extraction Pipeline for Enhancing Lightweight Neural Networks in sEMG-based Joint Torque Estimation
Robot-assisted rehabilitation offers an effective approach, wherein exoskeletons adapt to users' needs and provide personalized assistance. However, to deliver such assistance, accurate prediction of the user's joint torques is essential. In this work, we propose a feature extraction pipeline using 8-channel surface electromyography (sEMG) signals to predict elbow and shoulder joint torques. For preliminary evaluation, this pipeline was integrated into two neural network models: the Multilayer Perceptron (MLP) and the Temporal Convolutional Network (TCN). Data were collected from a single subject performing elbow and shoulder movements under three load conditions (0 kg, 1.10 kg, and 1.85 kg) using three motion-capture cameras. Reference torques were estimated from center-of-mass kinematics under the assumption of static equilibrium. Our offline analyses showed that, with our feature extraction pipeline, MLP model achieved mean RMSE of 0.963 N m, 1.403 N m, and 1.434 N m (over five seeds) for elbow, front-shoulder, and side-shoulder joints, respectively, which were comparable to the TCN performance. These results demonstrate that the proposed feature extraction pipeline enables a simple MLP to achieve performance comparable to that of a network designed explicitly for temporal dependencies. This finding is particularly relevant for applications with limited training data, a common scenario patient care.
Creating a biologically more accurate spider robot to study active vibration sensing
Orb-weaving spiders detect prey on a web using vibration sensors at leg joints. They often dynamically crouch their legs during prey sensing, likely an active sensing strategy. However, how leg crouching enhances sensing is poorly understood, because measuring system vibrations in behaving animals is difficult. We use robophysical modeling to study this problem. Our previous spider robot had only four legs, simplified leg morphology, and a shallow crouching range of motion. Here, we developed a new spider robot, with eight legs, each with four joints that better approximated spider leg morphology. Leg exoskeletons were 3-D printed and joint stiffness was tuned using integrated silicone molding with variable materials and geometry. Tendon-driven actuation allowed a motor in the body to crouch all eight legs deeply as spiders do, while accelerometers at leg joints record leg vibrations. Experiments showed that our new spider robot reproduced key vibration features observed in the previous robot while improving biological accuracy. Our new robot provides a biologically more accurate robophysical model for studying how leg behaviors modulate vibration sensing on a web.
comment: 8 pages, 12 figures
Adaptive Reinforcement and Model Predictive Control Switching for Safe Human-Robot Cooperative Navigation
This paper addresses the challenge of human-guided navigation for mobile collaborative robots under simultaneous proximity regulation and safety constraints. We introduce Adaptive Reinforcement and Model Predictive Control Switching (ARMS), a hybrid learning-control framework that integrates a reinforcement learning follower trained with Proximal Policy Optimization (PPO) and an analytical one-step Model Predictive Control (MPC) formulated as a quadratic program safety filter. To enable robust perception under partial observability and non-stationary human motion, ARMS employs a decoupled sensing architecture with a Long Short-Term Memory (LSTM) temporal encoder for the human-robot relative state and a spatial encoder for 360-degree LiDAR scans. The core contribution is a learned adaptive neural switcher that performs context-aware soft action fusion between the two controllers, favoring conservative, constraint-aware QP-based control in low-risk regions while progressively shifting control authority to the learned follower in highly cluttered or constrained scenarios where maneuverability is critical, and reverting to the follower action when the QP becomes infeasible. Extensive evaluations against Pure Pursuit, Dynamic Window Approach (DWA), and an RL-only baseline demonstrate that ARMS achieves an 82.5 percent success rate in highly cluttered environments, outperforming DWA and RL-only approaches by 7.1 percent and 3.1 percent, respectively, while reducing average computational latency by 33 percent to 5.2 milliseconds compared to a multi-step MPC baseline. Additional simulation transfer in Gazebo and initial real-world deployment results further indicate the practicality and robustness of ARMS for safe and efficient human-robot collaboration. Source code and a demonstration video are available at https://github.com/21ning/ARMS.git.
Sim-to-Real Transfer via a Style-Identified Cycle Consistent Generative Adversarial Network: Zero-Shot Deployment on Robotic Manipulators through Visual Domain Adaptation
The sample efficiency challenge in Deep Reinforcement Learning (DRL) compromises its industrial adoption due to the high cost and time demands of real-world training. Virtual environments offer a cost-effective alternative for training DRL agents, but the transfer of learned policies to real setups is hindered by the sim-to-real gap. Achieving zero-shot transfer, where agents perform directly in real environments without additional tuning, is particularly desirable for its efficiency and practical value. This work proposes a novel domain adaptation approach relying on a Style-Identified Cycle Consistent Generative Adversarial Network (StyleID-CycleGAN or SICGAN), an original Cycle Consistent Generative Adversarial Network (CycleGAN) based model. SICGAN translates raw virtual observations into real-synthetic images, creating a hybrid domain for training DRL agents that combines virtual dynamics with real-like visual inputs. Following virtual training, the agent can be directly deployed, bypassing the need for real-world training. The pipeline is validated with two distinct industrial robots in the approaching phase of a pick-and-place operation. In virtual environments agents achieve success rates of 90 to 100\%, and real-world deployment confirms robust zero-shot transfer (i.e., without additional training in the physical environment) with accuracies above 95\% for most workspace regions. We use augmented reality targets to improve the evaluation process efficiency, and experimentally demonstrate that the agent successfully generalizes to real objects of varying colors and shapes, including LEGO\textsuperscript{\textregistered}~cubes and a mug. These results establish the proposed pipeline as an efficient, scalable solution to the sim-to-real problem.
ReViP: Reducing False Completion in Vision-Language-Action Models with Vision-Proprioception Rebalance
Vision-Language-Action (VLA) models have advanced robotic manipulation by combining vision, language, and proprioception to predict actions. However, previous methods fuse proprioceptive signals directly with VLM-encoded vision-language features, resulting in state-dominant bias and false completions despite visible execution failures. We attribute this to modality imbalance, where policies over-rely on internal state while underusing visual evidence. To address this, we present ReViP, a novel VLA framework with Vision-Proprioception Rebalance to enhance visual grounding and robustness under perturbations. The key insight is to introduce auxiliary task-aware environment priors to adaptively modulate the coupling between semantic perception and proprioceptive dynamics. Specifically, we use an external VLM as a task-stage observer to extract real-time task-centric visual cues from visual observations, which drive a Vision-Proprioception Feature-wise Linear Modulation to enhance environmental awareness and reduce state-driven errors. Moreover, to evaluate false completion, we propose the first False-Completion Benchmark Suite built on LIBERO with controlled settings such as Object-Drop. Extensive experiments show that ReViP effectively reduces false-completion rates and improves success rates over strong VLA baselines on our suite, with gains extending to LIBERO, RoboTwin 2.0, and real-world evaluations.
A Unified Calibration Framework for High-Accuracy Articulated Robot Kinematics
Researchers have identified various sources of tool positioning errors for articulated industrial robots and have proposed dedicated compensation strategies. However, these typically require individual, specialized experiments with separate models and identification procedures. This article presents a unified approach to the static calibration of industrial robots that identifies a robot model, including geometric and non-geometric effects (compliant bending, thermal deformation, gear transmission errors), using only a single, straightforward experiment for data collection. The model augments the kinematic chain with virtual joints for each modeled effect and realizes the identification using Gauss-Newton optimization with analytic gradients. Fisher information spectra show that the estimation is well-conditioned and the parameterization near-minimal, whereas systematic temporal cross-validation and model ablations demonstrate robustness of the model identification. The resulting model is very accurate and its identification robust, achieving a mean position error of 26.8 $μm$ on a KUKA KR30 industrial robot compared to 102.3 $μm$ for purely geometric calibration.
Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab
We present a reproducible benchmark for evaluating sim-to-real transfer of Multi-Agent Reinforcement Learning (MARL) policies for Connected and Automated Vehicles (CAVs). The platform, based on the Cyber-Physical Mobility Lab (CPM Lab) [1], integrates simulation, a high-fidelity digital twin, and a physical testbed, enabling structured zero-shot evaluation of MARL motion-planning policies. We demonstrate its use by deploying a SigmaRL-trained policy [2] across all three domains, revealing two complementary sources of performance degradation: architectural differences between simulation and hardware control stacks, and the sim-to-real gap induced by increasing environmental realism. The open-source setup enables systematic analysis of sim-to-real challenges in MARL under realistic, reproducible conditions.
RENEW: Risk- and Energy-Aware Navigation in Dynamic Waterways AAAI 2026
We present RENEW, a global path planner for Autonomous Surface Vehicle (ASV) in dynamic environments with external disturbances (e.g., water currents). RENEW introduces a unified risk- and energy-aware strategy that ensures safety by dynamically identifying non-navigable regions and enforcing adaptive safety constraints. Inspired by maritime contingency planning, it employs a best-effort strategy to maintain control under adverse conditions. The hierarchical architecture combines high-level constrained triangulation for topological diversity with low-level trajectory optimization within safe corridors. Validated with real-world ocean data, RENEW is the first framework to jointly address adaptive non-navigability and topological path diversity for robust maritime navigation.
comment: 9 pages, 10 figure, 4 tables, AAAI 2026 (main track; oral acceptance)
Reinforcement Learning-Based Energy-Aware Coverage Path Planning for Precision Agriculture
Coverage Path Planning (CPP) is a fundamental capability for agricultural robots; however, existing solutions often overlook energy constraints, resulting in incomplete operations in large-scale or resource-limited environments. This paper proposes an energy-aware CPP framework grounded in Soft Actor-Critic (SAC) reinforcement learning, designed for grid-based environments with obstacles and charging stations. To enable robust and adaptive decision-making under energy limitations, the framework integrates Convolutional Neural Networks (CNNs) for spatial feature extraction and Long Short-Term Memory (LSTM) networks for temporal dynamics. A dedicated reward function is designed to jointly optimize coverage efficiency, energy consumption, and return-to-base constraints. Experimental results demonstrate that the proposed approach consistently achieves over 90% coverage while ensuring energy safety, outperforming traditional heuristic algorithms such as Rapidly-exploring Random Tree (RRT), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) baselines by 13.4-19.5% in coverage and reducing constraint violations by 59.9-88.3%. These findings validate the proposed SAC-based framework as an effective and scalable solution for energy-constrained CPP in agricultural robotics.
comment: Accepted by RACS '25: International Conference on Research in Adaptive and Convergent Systems, November 16-19, 2025, Ho Chi Minh, Vietnam. 10 pages, 5 figures
GNSS-based Lunar Orbit and Clock Estimation With Stochastic Cloning UD Filter
This paper presents a terrestrial GNSS-based orbit and clock estimation framework for lunar navigation satellites. To enable high-precision estimation under the low-observability conditions encountered at lunar distances, we develop a stochastic-cloning UD-factorized filter and delayed-state smoother that provide enhanced numerical stability when processing precise time-differenced carrier phase (TDCP) measurements. A comprehensive dynamics and measurement model is formulated, explicitly accounting for relativistic coupling between orbital and clock states, lunar time-scale transformations, and signal propagation delays including ionospheric, plasmaspheric, and Shapiro effects. The proposed approach is evaluated using high-fidelity Monte-Carlo simulations incorporating realistic multi-constellation GNSS geometry, broadcast ephemeris errors, lunar satellite dynamics, and ionospheric and plasmaspheric delay computed from empirical electron density models. Simulation results demonstrate that combining ionosphere-free pseudorange and TDCP measurements achieves meter-level orbit accuracy and sub-millimeter-per-second velocity accuracy, satisfying the stringent signal-in-space error requirements of future Lunar Augmented Navigation Services (LANS).
comment: Submitted to the Journal of Guidance, Control, and Dynamics
Real-Time, Energy-Efficient, Sampling-Based Optimal Control via FPGA Acceleration
Autonomous mobile robots (AMRs), used for search-and-rescue and remote exploration, require fast and robust planning and control schemes. Sampling-based approaches for Model Predictive Control, especially approaches based on the Model Predictive Path Integral Control (MPPI) algorithm, have recently proven both to be highly effective for such applications and to map naturally to GPUs for hardware acceleration. However, both GPU and CPU implementations of such algorithms can struggle to meet tight energy and latency budgets on battery-constrained AMR platforms that leverage embedded compute. To address this issue, we present an FPGA-optimized MPPI design that exposes fine-grained parallelism and eliminates synchronization bottlenecks via deep pipelining and parallelism across algorithmic stages. This results in an average 3.1x to 7.5x speedup over optimized implementations on an embedded GPU and CPU, respectively, while simultaneously achieving a 2.5x to 5.4x reduction in energy usage. These results demonstrate that FPGA architectures are a promising direction for energy-efficient and high-performance edge robotics.
comment: 8 pages, 5 figures
Hierarchical Informative Path Planning via Graph Guidance and Trajectory Optimization
We study informative path planning (IPP) with travel budgets in cluttered environments, where an agent collects measurements of a latent field modeled as a Gaussian process (GP) to reduce uncertainty at target locations. Graph-based solvers provide global guarantees but assume pre-selected measurement locations, while continuous trajectory optimization supports path-based sensing but is computationally intensive and sensitive to initialization in obstacle-dense settings. We propose a hierarchical framework with three stages: (i) graph-based global planning, (ii) segment-wise budget allocation using geometric and kernel bounds, and (iii) spline-based refinement of each segment with hard constraints and obstacle pruning. By combining global guidance with local refinement, our method achieves lower posterior uncertainty than graph-only and continuous baselines, while running faster than continuous-space solvers (up to 9x faster than gradient-based methods and 20x faster than black-box optimizers) across synthetic cluttered environments and Arctic datasets.
Advancing Improvisation in Human-Robot Construction Collaboration: Taxonomy and Research Roadmap
The construction industry faces productivity stagnation, skilled labor shortages, and safety concerns. While robotic automation offers solutions, construction robots struggle to adapt to unstructured, dynamic sites. Central to this is improvisation, adapting to unexpected situations through creative problem-solving, which remains predominantly human. In construction's unpredictable environments, collaborative human-robot improvisation is essential for workflow continuity. This research develops a six-level taxonomy classifying human-robot collaboration (HRC) based on improvisation capabilities. Through systematic review of 214 articles (2010-2025), we categorize construction robotics across: Manual Work (Level 0), Human-Controlled Execution (Level 1), Adaptive Manipulation (Level 2), Imitation Learning (Level 3), Human-in-Loop BIM Workflow (Level 4), Cloud-Based Knowledge Integration (Level 5), and True Collaborative Improvisation (Level 6). Analysis reveals current research concentrates at lower levels, with critical gaps in experiential learning and limited progression toward collaborative improvisation. A five-dimensional radar framework illustrates progressive evolution of Planning, Cognitive Role, Physical Execution, Learning Capability, and Improvisation, demonstrating how complementary human-robot capabilities create team performance exceeding individual contributions. The research identifies three fundamental barriers: technical limitations in grounding and dialogic reasoning, conceptual gaps between human improvisation and robotics research, and methodological challenges. We recommend future research emphasizing improved human-robot communication via Augmented/Virtual Reality interfaces, large language model integration, and cloud-based knowledge systems to advance toward true collaborative improvisation.
comment: 73 pages, 8 figures
Autonomous Mars Rover Module for Soil Sampling and Life Component Analysis
The search for extraterrestrial life has long been a primary focus of scientific exploration, driven by rapid advancements in technology and our understanding of the universe. The discovery of water on Mars has sparked significant interest, raising the question of whether life could exist on the planet. This study proposes a novel approach to simulate and illustrate the detection of life using a proof-of-life module integrated into a Mars rover. The module is an autonomous system capable of traveling to designated regions, excavating soil, collecting samples, and performing biochemical testing onboard the rover itself. The project is inherently multidisciplinary, integrating mechanical systems such as a drill mechanism and a vacuum system, alongside biochemical analysis for soil testing. The module is capable of successfully detecting the presence or absence of living components of life from the collected soil particles. This proof-of-life module serves as a proof-of-concept for autonomous life detection in extraterrestrial environments and lays the foundation for future exploration missions.
comment: 9 pages, 12 figures
ConceptACT: Episode-Level Concepts for Sample-Efficient Robotic Imitation Learning
Imitation learning enables robots to acquire complex manipulation skills from human demonstrations, but current methods rely solely on low-level sensorimotor data while ignoring the rich semantic knowledge humans naturally possess about tasks. We present ConceptACT, an extension of Action Chunking with Transformers that leverages episode-level semantic concept annotations during training to improve learning efficiency. Unlike language-conditioned approaches that require semantic input at deployment, ConceptACT uses human-provided concepts (object properties, spatial relationships, task constraints) exclusively during demonstration collection, adding minimal annotation burden. We integrate concepts using a modified transformer architecture in which the final encoder layer implements concept-aware cross-attention, supervised to align with human annotations. Through experiments on two robotic manipulation tasks with logical constraints, we demonstrate that ConceptACT converges faster and achieves superior sample efficiency compared to standard ACT. Crucially, we show that architectural integration through attention mechanisms significantly outperforms naive auxiliary prediction losses or language-conditioned models. These results demonstrate that properly integrated semantic supervision provides powerful inductive biases for more efficient robot learning.
Acoustic Field Video for Multimodal Scene Understanding
We introduce and explore a new multimodal input representation for vision-language models: acoustic field video. Unlike conventional video (RGB with stereo/mono audio), our video stream provides a spatially grounded visualization of sound intensity across a scene, offering a new and powerful dimension of perceptual understanding. Our real-time pipeline uses low-cost beamforming microphone arrays that are already common in smart speakers and increasingly present in robotics and XR headsets, yet this sensing capability remains unutilized for scene understanding. To assess the value of spatial acoustic information, we constructed an evaluation set of 402 question-answer scenes, comparing a state-of-the-art VLM given conventional video with and without paired acoustic field video. Results show a clear and consistent improvement when incorporating spatial acoustic data; the VLM we test improves from 38.3% correct to 67.4%. Our findings highlight that many everyday scene understanding tasks remain underconstrained when relying solely on visual and audio input, and that acoustic field data provides a promising and practical direction for multimodal reasoning. A video demo is available at https://daehwakim.com/seeingsound
MapAnything: Universal Feed-Forward Metric 3D Reconstruction 3DV 2026
We introduce MapAnything, a unified transformer-based feed-forward model that ingests one or more images along with optional geometric inputs such as camera intrinsics, poses, depth, or partial reconstructions, and then directly regresses the metric 3D scene geometry and cameras. MapAnything leverages a factored representation of multi-view scene geometry, i.e., a collection of depth maps, local ray maps, camera poses, and a metric scale factor that effectively upgrades local reconstructions into a globally consistent metric frame. Standardizing the supervision and training across diverse datasets, along with flexible input augmentation, enables MapAnything to address a broad range of 3D vision tasks in a single feed-forward pass, including uncalibrated structure-from-motion, calibrated multi-view stereo, monocular depth estimation, camera localization, depth completion, and more. We provide extensive experimental analyses and model ablations demonstrating that MapAnything outperforms or matches specialist feed-forward models while offering more efficient joint training behavior, thus paving the way toward a universal 3D reconstruction backbone.
comment: 3DV 2026. Project Page: https://map-anything.github.io/
Q-learning with Adjoint Matching
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.
comment: 32 pages, 8 figures, 7 tables
Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation
A key challenge in contact-rich dexterous manipulation is the need to jointly reason over geometry, kinematic constraints, and intricate, nonsmooth contact dynamics. End-to-end visuomotor policies bypass this structure, but often require large amounts of data, transfer poorly from simulation to reality, and generalize weakly across tasks/embodiments. We address those limitations by leveraging a simple insight: dexterous manipulation is inherently hierarchical - at a high level, a robot decides where to touch (geometry) and move the object (kinematics); at a low level it determines how to realize that plan through contact dynamics. Building on this insight, we propose a hierarchical RL--MPC framework in which a high-level reinforcement learning (RL) policy predicts a contact intention, a novel object-centric interface that specifies (i) an object-surface contact location and (ii) a post-contact object-level subgoal pose. Conditioned on this contact intention, a low-level contact-implicit model predictive control (MPC) optimizes local contact modes and replans with contact dynamics to generate robot actions that robustly drive the object toward each subgoal. We evaluate the framework on non-prehensile tasks, including geometry-generalized pushing and object 3D reorientation. It achieves near-100% success with substantially reduced data (10x less than end-to-end baselines), highly robust performance, and zero-shot sim-to-real transfer.
HEIGHT: Heterogeneous Interaction Graph Transformer for Robot Navigation in Crowded and Constrained Environments
We study the problem of robot navigation in dense and interactive crowds with static constraints such as corridors and furniture. Previous methods fail to consider all types of spatial and temporal interactions among agents and obstacles, leading to unsafe and inefficient robot paths. In this article, we leverage a graph-based representation of crowded and constrained scenarios and propose a structured framework to learn robot navigation policies with deep reinforcement learning. We first split the representations of different inputs and propose a heterogeneous spatio-temporal graph to model distinct interactions among humans, robots, and obstacles. Based on the heterogeneous spatio-temporal graph, we propose HEIGHT, a novel navigation policy network architecture with different components to capture heterogeneous interactions through space and time. HEIGHT utilizes attention mechanisms to prioritize important interactions and a recurrent network to track changes in the dynamic scene over time, encouraging the robot to avoid collisions adaptively. Through extensive simulation and real-world experiments, we demonstrate that HEIGHT outperforms state-of-the-art baselines in terms of success, navigation time, and generalization to domain shifts in challenging navigation scenarios. More information is available at https://sites.google.com/view/crowdnav-height/home.
comment: Accepted to IEEE Transactions of Automation Science and Engineering (T-ASE)
DAVOS: An Autonomous Vehicle Operating System in the Vehicle Computing Era
Vehicle computing represents a fundamental shift in how autonomous vehicles are designed and deployed, transforming them from isolated transportation systems into mobile computing platforms that support both safety-critical, real-time driving and data-centric services. In this setting, vehicles simultaneously support real-time driving pipelines and a growing set of data-driven applications, placing increased responsibility on the vehicle operating system to coordinate computation, data movement, storage, and access. These demands highlight recurring system considerations related to predictable execution, data and execution protection, efficient handling of high-rate sensor data, and long-term system evolvability, commonly summarized as Safety, Security, Efficiency, and Extensibility (SSEE). Existing vehicle operating systems and runtimes address these concerns in isolation, resulting in fragmented software stacks that limit coordination between autonomy workloads and vehicle data services. This paper presents DAVOS, the Dependable Autonomous Vehicle Operating System, a unified vehicle operating system architecture designed for the vehicle computing context. DAVOS provides a cohesive operating system foundation that supports both real-time autonomy and extensible vehicle computing within a single system framework.
Collision-Free Humanoid Traversal in Cluttered Indoor Scenes
We study the problem of collision-free humanoid traversal in cluttered indoor scenes, such as hurdling over objects scattered on the floor, crouching under low-hanging obstacles, or squeezing through narrow passages. To achieve this goal, the humanoid needs to map its perception of surrounding obstacles with diverse spatial layouts and geometries to the corresponding traversal skills. However, the lack of an effective representation that captures humanoid-obstacle relationships during collision avoidance makes directly learning such mappings difficult. We therefore propose Humanoid Potential Field (HumanoidPF), which encodes these relationships as collision-free motion directions, significantly facilitating RL-based traversal skill learning. We also find that HumanoidPF exhibits a surprisingly negligible sim-to-real gap as a perceptual representation. To further enable generalizable traversal skills through diverse and challenging cluttered indoor scenes, we further propose a hybrid scene generation method, incorporating crops of realistic 3D indoor scenes and procedurally synthesized obstacles. We successfully transfer our policy to the real world and develop a teleoperation system where users could command the humanoid to traverse in cluttered indoor scenes with just a single click. Extensive experiments are conducted in both simulation and the real world to validate the effectiveness of our method. Demos and code can be found in our website: https://axian12138.github.io/CAT/.
XR$^3$: An Extended Reality Platform for Social-Physical Human-Robot Interaction
Social-physical human-robot interaction (spHRI) is difficult to study: building and programming robots that integrate multiple interaction modalities is costly and slow, while VR-based prototypes often lack physical contact, breaking users' visuo-tactile expectations. We present XR$^3$, a co-located dual-VR-headset platform for HRI research in which an attendee and a hidden operator share the same physical space while experiencing different virtual embodiments. The attendee sees an expressive virtual robot that interacts face-to-face in a shared virtual environment. In real time, the robot's upper-body motion, head and gaze behavior, and facial expressions are mapped from the operator's tracked limbs and face signals. Because the operator is co-present and calibrated in the same coordinate frame, the operator can also touch the attendee, enabling perceived robot touch synchronized with the robot's visible hands. Finger and hand motion is mapped to the robot avatar using inverse kinematics to support precise contact. Beyond motion retargeting, XR$^3$ supports social retargeting of multiple nonverbal cues that can be experimentally varied while keeping physical interaction constant. We detail the system design and calibration, and demonstrate the platform in a touch-based Wizard-of-Oz study, lowering the barrier to prototyping and evaluating embodied, contact-based robot behaviors.
comment: 7 pages, 4 figures
HumanDiffusion: A Vision-Based Diffusion Trajectory Planner with Human-Conditioned Goals for Search and Rescue UAV
Reliable human--robot collaboration in emergency scenarios requires autonomous systems that can detect humans, infer navigation goals, and operate safely in dynamic environments. This paper presents HumanDiffusion, a lightweight image-conditioned diffusion planner that generates human-aware navigation trajectories directly from RGB imagery. The system combines YOLO-11 based human detection with diffusion-driven trajectory generation, enabling a quadrotor to approach a target person and deliver medical assistance without relying on prior maps or computationally intensive planning pipelines. Trajectories are predicted in pixel space, ensuring smooth motion and a consistent safety margin around humans. We evaluate HumanDiffusion in simulation and real-world indoor mock-disaster scenarios. On a 300-sample test set, the model achieves a mean squared error of 0.02 in pixel-space trajectory reconstruction. Real-world experiments demonstrate an overall mission success rate of 80% across accident-response and search-and-locate tasks with partial occlusions. These results indicate that human-conditioned diffusion planning offers a practical and robust solution for human-aware UAV navigation in time-critical assistance settings.
comment: This paper has been accepted at HRI, Late Breaking Report, 2026
VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
In most existing embodied navigation tasks, instructions are well-defined and unambiguous, such as instruction following and object searching. Under this idealized setting, agents are required solely to produce effective navigation outputs conditioned on vision and language inputs. However, real-world navigation instructions are often vague and ambiguous, requiring the agent to resolve uncertainty and infer user intent through active dialog. To address this gap, we propose Interactive Instance Goal Navigation (IIGN), a task that requires agents not only to generate navigation actions but also to produce language outputs via active dialog, thereby aligning more closely with practical settings. IIGN extends Instance Goal Navigation (IGN) by allowing agents to freely consult an oracle in natural language while navigating. Building on this task, we present the Vision Language-Language Navigation (VL-LN) benchmark, which provides a large-scale, automatically generated dataset and a comprehensive evaluation protocol for training and assessing dialog-enabled navigation models. VL-LN comprises over 41k long-horizon dialog-augmented trajectories for training and an automatic evaluation protocol with an oracle capable of responding to agent queries. Using this benchmark, we train a navigation model equipped with dialog capabilities and show that it achieves significant improvements over the baselines. Extensive experiments and analyses further demonstrate the effectiveness and reliability of VL-LN for advancing research on dialog-enabled embodied navigation. Code and dataset: https://0309hws.github.io/VL-LN.github.io/
VALISENS: A Validated Innovative Multi-Sensor System for Cooperative Automated Driving
Reliable perception remains a key challenge for Connected Automated Vehicles (CAVs) in complex real-world environments, where varying lighting conditions and adverse weather degrade sensing performance. While existing multi-sensor solutions improve local robustness, they remain constrained by limited sensing range, line-of-sight occlusions, and sensor failures on individual vehicles. This paper introduces VALISENS, a validated cooperative perception system that extends multi-sensor fusion beyond a single vehicle through Vehicle-to-Everything (V2X)-enabled collaboration between Connected Automated Vehicles (CAVs) and intelligent infrastructure. VALISENS integrates onboard and roadside LiDARs, radars, RGB cameras, and thermal cameras within a unified multi-agent perception framework. Thermal cameras enhances the detection of Vulnerable Road Users (VRUs) under challenging lighting conditions, while roadside sensors reduce occlusions and expand the effective perception range. In addition, an integrated sensor monitoring module continuously assesses sensor health and detects anomalies before system degradation occurs. The proposed system is implemented and evaluated in a dedicated real-world testbed. Experimental results show that VALISENS improves pedestrian situational awareness by up to 18% compared with vehicle-only sensing, while the sensor monitoring module achieves over 97% accuracy, demonstrating its effectiveness and its potential to support future Cooperative Intelligent Transport Systems (C-ITS) applications.
comment: 8 pages, 5 figures, submitted to IEEE VNC
CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation
Vision-and-Language Navigation (VLN) requires robots to follow natural language instructions and navigate complex environments without prior maps. While recent vision-language large models demonstrate strong reasoning abilities, they often underperform task-specific panoramic small models in VLN tasks. To address this, we propose CLASH (Collaborative Large-Small Hierarchy), a VLN-CE framework that integrates a reactive small-model planner (RSMP) with a reflective large-model reasoner (RLMR). RSMP adopts a causal-learning-based dual-branch architecture to enhance generalization, while RLMR leverages panoramic visual prompting with chain-of-thought reasoning to support interpretable spatial understanding and navigation. We further introduce an uncertainty-aware collaboration mechanism (UCM) that adaptively fuses decisions from both models. For obstacle avoidance, in simulation, we replace the rule-based controller with a fully learnable point-goal policy, and in real-world deployment, we design a LiDAR-based clustering module for generating navigable waypoints and pair it with an online SLAM-based local controller. CLASH achieves state-of-the-art (SoTA) results (ranking 1-st) on the VLN-CE leaderboard, significantly improving SR and SPL on the test-unseen set over the previous SoTA methods. Real-world experiments demonstrate CLASH's strong robustness, validating its effectiveness in both simulation and deployment scenarios.
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation
Achieving human-level performance in Vision-and-Language Navigation (VLN) requires an embodied agent to jointly understand multimodal instructions and visual-spatial context while reasoning over long action sequences. Recent works, such as NavCoT and NavGPT-2, demonstrate the potential of Chain-of-Thought (CoT) reasoning for improving interpretability and long-horizon planning. Moreover, multimodal extensions like OctoNav-R1 and CoT-VLA further validate CoT as a promising pathway toward human-like navigation reasoning. However, existing approaches face critical drawbacks: purely textual CoTs lack spatial grounding and easily overfit to sparse annotated reasoning steps, while multimodal CoTs incur severe token inflation by generating imagined visual observations, making real-time navigation impractical. In this work, we propose FantasyVLN, a unified implicit reasoning framework that preserves the benefits of CoT reasoning without explicit token overhead. Specifically, imagined visual tokens are encoded into a compact latent space using a pretrained Visual AutoRegressor (VAR) during CoT reasoning training, and the model jointly learns from textual, visual, and multimodal CoT modes under a unified multi-CoT strategy. At inference, our model performs direct instruction-to-action mapping while still enjoying reasoning-aware representations. Extensive experiments on LH-VLN show that our approach achieves reasoning-aware yet real-time navigation, improving success rates and efficiency while reducing inference latency by an order of magnitude compared to explicit CoT methods.
Tunable Passivity Control for Centralized Multiport Networked Systems
Centralized Multiport Networked Dynamic (CMND) systems have emerged as a key architecture with applications in several complex network systems, such as multilateral telerobotics and multi-agent control. These systems consist of a hub node/subsystem connecting with multiple remote nodes/subsystems via a networked architecture. One challenge for this system is stability, which can be affected by non-ideal network artifacts. Conventional passivity-based approaches can stabilize the system under specialized applications like small-scale networked systems. However, those conventional passive stabilizers have several restrictions, such as distributing compensation across subsystems in a decentralized manner, limiting flexibility, and, at the same time, relying on the restrictive assumptions of node passivity. This paper synthesizes a centralized optimal passivity-based stabilization framework for CMND systems. It consists of a centralized passivity observer monitoring overall energy flow and an optimal passivity controller that distributes the just-needed dissipation among various nodes, guaranteeing strict passivity and, thus, L2 stability. The proposed data-driven model-free approach, i.e., Tunable Centralized Optimal Passivity Control (TCoPC), optimizes total performance based on the prescribed dissipation distribution strategy while ensuring stability. The controller can put high dissipation loads on some sub-networks while relaxing the dissipation on other nodes. Simulation results demonstrate the proposed frameworks performance in a complex task under different time-varying delay scenarios while relaxing the remote nodes minimum phase and passivity assumption, enhancing the scalability and generalizability.
FoldNet: Learning Generalizable Closed-Loop Policy for Garment Folding via Keypoint-Driven Asset and Demonstration Synthesis
Due to the deformability of garments, generating a large amount of high-quality data for robotic garment manipulation tasks is highly challenging. In this paper, we present a synthetic garment dataset that can be used for robotic garment folding. We begin by constructing geometric garment templates based on keypoints and applying generative models to generate realistic texture patterns. Leveraging these keypoint annotations, we generate folding demonstrations in simulation and train folding policies via closed-loop imitation learning. To improve robustness, we propose KG-DAgger, which uses a keypoint-based strategy to generate demonstration data for recovering from failures. KG-DAgger significantly improves the model performance, boosting the real-world success rate by 25\%. After training with 15K trajectories (about 2M image-action pairs), the model achieves a 75\% success rate in the real world. Experiments in both simulation and real-world settings validate the effectiveness of our proposed framework.
Offline Reinforcement Learning using Human-Aligned Reward Labeling for Autonomous Emergency Braking in Occluded Pedestrian Crossing
Effective leveraging of real-world driving datasets is crucial for enhancing the training of autonomous driving systems. While Offline Reinforcement Learning enables training autonomous vehicles with such data, most available datasets lack meaningful reward labels. Reward labeling is essential as it provides feedback for the learning algorithm to distinguish between desirable and undesirable behaviors, thereby improving policy performance. This paper presents a novel approach for generating human-aligned reward labels. The proposed approach addresses the challenge of absent reward signals in the real-world datasets by generating labels that reflect human judgment and safety considerations. The reward function incorporates an adaptive safety component that is activated by analyzing semantic segmentation maps, enabling the autonomous vehicle to prioritize safety over efficiency in potential collision scenarios. The proposed method is applied to an occluded pedestrian crossing scenario with varying pedestrian traffic levels, using simulation data. When the generated rewards were used to train various Offline Reinforcement Learning algorithms, each model produced a meaningful policy, demonstrating the method's viability. In addition, the method was applied to a subset of the Audi Autonomous Driving Dataset, and the reward labels were compared to human-annotated reward labels. The findings show a moderate disparity between the two reward sets, and, most interestingly, the method flagged unsafe states that the human annotator missed.
comment: 39 pages, 14 figures, 1 table
Multiagent Systems
Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation
This paper introduces the N-Way Self-Evaluating Deliberation (NSED) protocol, a Runtime Mixture-of-Models (MoM) architecture that constructs emergent composite models from a plurality of distinct expert agents. Unlike traditional Mixture-of-Experts (MoE) which rely on static gating networks, NSED employs a Dynamic Expertise Broker - a runtime optimization engine that treats model selection as a variation of the Knapsack Problem, binding heterogeneous checkpoints to functional roles based on live telemetry and cost constraints. At the execution layer, we formalize deliberation as a Macro-Scale Recurrent Neural Network (RNN), where the consensus state loops back through a semantic forget gate to enable iterative refinement without proportional VRAM scaling. Key components include an orchestration fabric for trustless N-to-N peer review, a Quadratic Voting activation function for non-linear consensus, and a feedback-driven state update. Empirical validation on challenging benchmarks (AIME 2025, LiveCodeBench) demonstrates that this topology allows ensembles of small (less than 20B) consumer-grade models to match or exceed the performance of state-of-the-art 100B+ parameter models, establishing a new hardware arbitrage efficiency frontier. Furthermore, testing on the DarkBench safety suite reveals intrinsic alignment properties, with peer-mediated correction reducing sycophancy scores below that of any individual agent.
Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability
Agentic systems have transformed how Large Language Models (LLMs) can be leveraged to create autonomous systems with goal-directed behaviors, consisting of multi-step planning and the ability to interact with different environments. These systems differ fundamentally from traditional machine learning models, both in architecture and deployment, introducing unique AI safety challenges, including goal misalignment, compounding decision errors, and coordination risks among interacting agents, that necessitate embedding interpretability and explainability by design to ensure traceability and accountability across their autonomous behaviors. Current interpretability techniques, developed primarily for static models, show limitations when applied to agentic systems. The temporal dynamics, compounding decisions, and context-dependent behaviors of agentic systems demand new analytical approaches. This paper assesses the suitability and limitations of existing interpretability methods in the context of agentic systems, identifying gaps in their capacity to provide meaningful insight into agent decision-making. We propose future directions for developing interpretability techniques specifically designed for agentic systems, pinpointing where interpretability is required to embed oversight mechanisms across the agent lifecycle from goal formation, through environmental interaction, to outcome evaluation. These advances are essential to ensure the safe and accountable deployment of agentic AI systems.
Learning to Collaborate: An Orchestrated-Decentralized Framework for Peer-to-Peer LLM Federation AAAI 2026
Fine-tuning Large Language Models (LLMs) for specialized domains is constrained by a fundamental challenge: the need for diverse, cross-organizational data conflicts with the principles of data privacy and sovereignty. While Federated Learning (FL) provides a framework for collaboration without raw data exchange, its classic centralized form introduces a single point of failure and remains vulnerable to model inversion attacks. Decentralized FL (DFL) mitigates this risk by removing the central aggregator but typically relies on inefficient, random peer-to-peer (P2P) pairings, forming a collaboration graph that is blind to agent heterogeneity and risks negative transfer. This paper introduces KNEXA-FL, a novel framework for orchestrated decentralization that resolves this trade-off. KNEXA-FL employs a non-aggregating Central Profiler/Matchmaker (CPM) that formulates P2P collaboration as a contextual bandit problem, using a LinUCB algorithm on abstract agent profiles to learn an optimal matchmaking policy. It orchestrates direct knowledge exchange between heterogeneous, PEFT-based LLM agents via secure distillation, without ever accessing the models themselves. Our comprehensive experiments on a challenging code generation task show that KNEXA-FL yields substantial gains, improving Pass@1 by approx. 50% relative to random P2P collaboration. Critically, our orchestrated approach demonstrates stable convergence, in stark contrast to a powerful centralized distillation baseline which suffers from catastrophic performance collapse. Our work establishes adaptive, learning-based orchestration as a foundational principle for building robust and effective decentralized AI ecosystems.
comment: Accepted to AAAI 2026. 13 pages, 3 figures, 10 tables. Code available at: https://github.com/FujitsuResearch/knexa-fl
MACTAS: Self-Attention-Based Inter-Agent Communication in Multi-Agent Reinforcement Learning with Action-Value Function Decomposition IJCAI 2026
Communication is essential for the collective execution of complex tasks by human agents, motivating interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication method that exchanges information between the agents in MARL. Our proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The method can be seamlessly integrated with any action-value function decomposition algorithm and can be viewed as an orthogonal extension of such decompositions. Notably, it includes a fixed number of trainable parameters, independent of the number of agents, which makes it scalable to large systems. Experimental results on the SMACv2 benchmark demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps. makes it scalable to large systems. Experimental results on the SMACv2 benchmark demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on a number of maps.
comment: Submitted for IJCAI 2026
Emergent Coordination in Multi-Agent Systems via Pressure Fields and Temporal Decay
Current multi-agent LLM frameworks rely on explicit orchestration patterns borrowed from human organizational structures: planners delegate to executors, managers coordinate workers, and hierarchical control flow governs agent interactions. These approaches suffer from coordination overhead that scales poorly with agent count and task complexity. We propose a fundamentally different paradigm inspired by natural coordination mechanisms: agents operate locally on a shared artifact, guided only by pressure gradients derived from measurable quality signals, with temporal decay preventing premature convergence. We formalize this as optimization over a pressure landscape and prove convergence guarantees under mild conditions. Empirically, on meeting room scheduling across 1,350 trials, pressure-field coordination outperforms all baselines: 48.5% aggregate solve rate versus 12.6% for conversation-based coordination, 1.5% for hierarchical control, and 0.4% for sequential and random baselines (all pairwise comparisons p < 0.001). Temporal decay is essential: disabling it reduces solve rate by 10 percentage points. On easy problems, pressure-field achieves 86.7% solve rate. The approach maintains consistent performance from 1 to 4 agents. Implicit coordination through shared pressure gradients outperforms explicit hierarchical control, suggesting that constraint-driven emergence offers a simpler and more effective foundation for multi-agent AI.
comment: 27 pages, 5 figures, 12 tables. Code available at https://github.com/Govcraft/pressure-field-experiment
Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation
The evaluation of Large Language Models (LLMs) remains challenging due to inconsistency, bias, and the absence of transparent decision criteria in automated judging. We present Debate, Deliberate, Decide (D3), a cost-aware, adversarial multi-agent framework that orchestrates structured debate among role-specialized agents (advocates, a judge, and an optional jury) to produce reliable and interpretable evaluations. D3 instantiates two complementary protocols: (1) Multi-Advocate One-Round Evaluation (MORE), which elicits k parallel defenses per answer to amplify signal via diverse advocacy, and (2) Single-Advocate Multi-Round Evaluation (SAMRE) with budgeted stopping, which iteratively refines arguments under an explicit token budget and convergence checks. We develop a probabilistic model of score gaps that (i) characterizes reliability and convergence under iterative debate and (ii) explains the separation gains from parallel advocacy. Under mild assumptions, the posterior distribution of the round-r gap concentrates around the true difference and the probability of mis-ranking vanishes; moreover, aggregating across k advocates provably increases expected score separation. We complement theory with a rigorous experimental suite across MT-Bench, AlignBench, and AUTO-J, showing state-of-the-art agreement with human judgments (accuracy and Cohen's kappa), reduced positional and verbosity biases via anonymization and role diversification, and a favorable cost-accuracy frontier enabled by budgeted stopping. Ablations and qualitative analyses isolate the contributions of debate, aggregation, and anonymity. Together, these results establish D3 as a principled, practical recipe for reliable, interpretable, and cost-aware LLM evaluation.
Capital Games and Growth Equilibria
We introduce capital games, which generalize the definition of standard games to incorporate dynamics. In capital games, payoffs are in units of capital which are not assumed to be units of utility. The dynamics allow us to infer player utilities from their individual payoffs and linearizable capital dynamics under the assumption that players aim to maximize the time-average growth rate of their capital.
Systems and Control (EESS)
Performance of Differential Protection Applied to Collector Cables of Offshore Wind Farms with MMC-HVDC Transmission SP
The ongoing global transition towards low-carbon energy has propelled the integration of offshore wind farms, which, when combined with Modular Multilevel Converter-based High-Voltage Direct Current (MMC-HVDC) transmission, present unique challenges for power system protection. In collector cables connecting wind turbines to offshore MMC, both ends are supplied by Inverter-Based Resources (IBRs), which modify the magnitude and characteristics of fault currents. In this context, this paper investigates the limitations of conventional differential protection schemes under such conditions and compares them with enhanced strategies that account for sequence components. Using electromagnetic transient simulations of a representative offshore wind farm modeled in PSCAD/EMTDC software, internal and external fault scenarios are assessed, varying fault types and resistances. The comparative evaluation provides insights into the sensitivity and selectivity of differential protection and guides a deeper conceptual understanding of the evolving protection challenges inherent to future converter-dominated grids.
comment: Submitted to DPSP Global 2026
Mixture-of-Models: Unifying Heterogeneous Agents via N-Way Self-Evaluating Deliberation
This paper introduces the N-Way Self-Evaluating Deliberation (NSED) protocol, a Runtime Mixture-of-Models (MoM) architecture that constructs emergent composite models from a plurality of distinct expert agents. Unlike traditional Mixture-of-Experts (MoE) which rely on static gating networks, NSED employs a Dynamic Expertise Broker - a runtime optimization engine that treats model selection as a variation of the Knapsack Problem, binding heterogeneous checkpoints to functional roles based on live telemetry and cost constraints. At the execution layer, we formalize deliberation as a Macro-Scale Recurrent Neural Network (RNN), where the consensus state loops back through a semantic forget gate to enable iterative refinement without proportional VRAM scaling. Key components include an orchestration fabric for trustless N-to-N peer review, a Quadratic Voting activation function for non-linear consensus, and a feedback-driven state update. Empirical validation on challenging benchmarks (AIME 2025, LiveCodeBench) demonstrates that this topology allows ensembles of small (less than 20B) consumer-grade models to match or exceed the performance of state-of-the-art 100B+ parameter models, establishing a new hardware arbitrage efficiency frontier. Furthermore, testing on the DarkBench safety suite reveals intrinsic alignment properties, with peer-mediated correction reducing sycophancy scores below that of any individual agent.
Identification of Port-Hamiltonian Differential-Algebraic Equations from Input-Output Data
Many models of physical systems, such as mechanical and electrical networks, exhibit algebraic constraints that arise from subsystem interconnections and underlying physical laws. Such systems are commonly formulated as differential-algebraic equations (DAEs), which describe both the dynamic evolution of system states and the algebraic relations that must hold among them. Within this class, port-Hamiltonian differential-algebraic equations (pH-DAEs) offer a structured, energy-based representation that preserves interconnection and passivity properties. This work introduces a data-driven identification method that combines port-Hamiltonian neural networks (pHNNs) with a differential-algebraic solver to model such constrained systems directly from noisy input-output data. The approach preserves the passivity and interconnection structure of port-Hamiltonian systems while employing a backward Euler discretization with Newton's method to solve the coupled differential and algebraic equations consistently. The performance of the proposed approach is demonstrated on a DC power network, where the identified model accurately captures system behaviour and maintains errors proportional to the noise amplitude, while providing reliable parameter estimates.
comment: Preprint submitted to IFAC World Congress 2026
ReLU Networks for Model Predictive Control: Network Complexity and Performance Guarantees
Recent years have witnessed a resurgence in using ReLU neural networks (NNs) to represent model predictive control (MPC) policies. However, determining the required network complexity to ensure closed-loop performance remains a fundamental open problem. This involves a critical precision-complexity trade-off: undersized networks may fail to capture the MPC policy, while oversized ones may outweigh the benefits of ReLU network approximation. In this work, we propose a projection-based method to enforce hard constraints and establish a state-dependent Lipschitz continuity property for the optimal MPC cost function, which enables sharp convergence analysis of the closed-loop system. For the first time, we derive explicit bounds on ReLU network width and depth for approximating MPC policies with guaranteed closed-loop performance. To further reduce network complexity and enhance closed-loop performance, we propose a non-uniform error framework with a state-aware scaling function to adaptively adjust both the input and output of the ReLU network. Our contributions provide a foundational step toward certifiable ReLU NN-based MPC.
Noise Resilience and Robust Convergence Guarantees for the Variational Quantum Eigensolver
Variational Quantum Algorithms (VQAs) are a class of hybrid quantum-classical algorithms that leverage on classical optimization tools to find the optimal parameters for a parameterized quantum circuit. One relevant application of VQAs is the Variational Quantum Eigensolver (VQE), which aims at steering the output of the quantum circuit to the ground state of a certain Hamiltonian. Recent works have provided global convergence guarantees for VQEs under suitable local surjectivity and smoothness hypotheses, but little has been done in characterizing convergence of these algorithms when the underlying quantum circuit is affected by noise. In this work, we characterize the effect of different coherent and incoherent noise processes on the optimal parameters and the optimal cost of the VQE, and we study their influence on the convergence guarantees of the algorithm. Our work provides novel theoretical insight into the behavior of parameterized quantum circuits. Furthermore, we accompany our results with numerical simulations implemented via Pennylane.
Model Predictive Control for Coupled Adoption-Opinion Dynamics
This paper investigates an optimal control problem for an adoption-opinion model that couples opinion dynamics with a compartmental adoption framework on a multilayer network to study the diffusion of sustainable behaviors. Adoption evolves through social contagion and perceived benefits, while opinions are shaped by social interactions and feedback from adoption levels. Individuals may also stop adopting virtuous behavior due to external constraints or shifting perceptions, affecting overall diffusion. After the stability analysis of equilibria, both in the presence and absence of adopters, we introduce a Model Predictive Control (MPC) framework that optimizes interventions by shaping opinions rather than directly enforcing adoption. This nudge-based control strategy allows policymakers to influence diffusion indirectly, making interventions more effective and scalable. Numerical simulations demonstrate that, in the absence of control, adoption stagnates, whereas MPC-driven interventions sustain and enhance adoption across communities.
comment: 6 pages, 4 figures
On a Coupled Adoption-Opinion Framework for Competing Innovations
In this paper, we propose a two-layer adoption-opinion model to study the diffusion of two competing technologies within a population whose opinions evolve under social influence and adoption-driven feedback. After adopting one technology, individuals may become dissatisfied and switch to the alternative. We prove the existence and uniqueness of the adoption-diffused equilibrium, showing that both technologies coexist and that neither partial-adoption nor monopoly can arise. Numerical simulations show that while opinions shape the equilibrium adoption levels, the relative market share between the two technologies depends solely on their user-experience. As a consequence, interventions that symmetrically boost opinions or adoption can disproportionately favor the higher-quality technology, illustrating how symmetric control actions may generate asymmetric outcomes.
comment: 7 pages, 3 figures
Computation-Accuracy Trade-Off in Service-Oriented Model-Based Control
Representing a control system as a Service-Oriented Architecture (SOA)-referred to as Service-Oriented Model-Based Control (SOMC)-enables runtime-flexible composition of control loop elements. This paper presents a framework that optimizes the computation-accuracy trade-off by formulating service orchestration as an A$^\star$search problem, complemented by Contextual Bayesian Optimization (BO) to tune the multi-objective cost weights. A vehicle longitudinal-velocity control case study demonstrates online, performancedriven reconfiguration of the control architecture. We show that our framework not only combines control and software structure but also considers the real-time requirements of the control system during performance optimization.
comment: 9 pages, 9 figures
A Categorical Approach to Semantic Interoperability across Building Lifecycle
Buildings generate heterogeneous data across their lifecycle, yet integrating these data remains a critical unsolved challenge. Despite three decades of standardization efforts, over 40 metadata schemas now span the building lifecycle, with fragmentation accelerating rather than resolving. Current approaches rely on point-to-point mappings that scale quadratically with the number of schemas, or universal ontologies that become unwieldy monoliths. The fundamental gap is the absence of mathematical foundations for structure-preserving transformations across heterogeneous building data. Here we show that category theory provides these foundations, enabling systematic data integration with $O(n)$ specification complexity for $n$ ontologies. We formalize building ontologies as first-order theories and demonstrate two proof-of-concept implementations in Categorical Query Language (CQL): 1) generating BRICK models from IFC design data at commissioning, and 2) three-way integration of IFC, BRICK, and RealEstateCore where only two explicit mappings yield the third automatically through categorical composition. Our correct-by-construction approach treats property sets as first-class schema entities and provides automated bidirectional migrations, and enables cross-ontology queries. These results establish feasibility of categorical methods for building data integration and suggest a path toward an app ecosystem for buildings, where mathematical foundations enable reliable component integration analogous to smartphone platforms.
A Cognitive Framework for Autonomous Agents: Toward Human-Inspired Design
This work introduces a human-inspired reinforcement learning (RL) architecture that integrates Pavlovian and instrumental processes to enhance decision-making in autonomous systems. While existing engineering solutions rely almost exclusively on instrumental learning, neuroscience shows that humans use Pavlovian associations to leverage predictive cues to bias behavior before outcomes occur. We translate this dual-system mechanism into a cue-guided RL framework in which radio-frequency (RF) stimuli act as conditioned (Pavlovian) cues that modulate action selection. The proposed architecture combines Pavlovian values with instrumental policy optimization, improving navigation efficiency and cooperative behavior in unknown, partially observable environments. Simulation results demonstrate that cue-driven agents adapt faster, achieving superior performance compared to traditional instrumental-solo agents. This work highlights the potential of human learning principles to reshape digital agents intelligence.
Challenges in the Proper Metrological Verification of Smart Energy Meters
The most common instruments currently measuring active/reactive energy and power quality indicators are smart energy meters. Unfortunately, the verification of such meters is currently performed under ideal conditions or with simple signal models, which do not recreate actual states occurring in the power grid and do not ensure the verification of the properties of their signal chains. This paper presents challenges in the proper metrological verification of smart energy meters. It presents existing legal and normative requirements and scientific research directions regarding these meters. Selected test results are presented, which show that although the tested meters meet the normative and legal requirements because they have been approved for sale, numerous imperfections in the signal and measurement chains of the analyzed instruments are revealed for the selected test signal. On the basis of the presented research results, further directions of research in the field of smart energy meters have been determined.
comment: 5 pages, 4 figures, submitted to IEEE conferences
Zero-Shot MARL Benchmark in the Cyber-Physical Mobility Lab
We present a reproducible benchmark for evaluating sim-to-real transfer of Multi-Agent Reinforcement Learning (MARL) policies for Connected and Automated Vehicles (CAVs). The platform, based on the Cyber-Physical Mobility Lab (CPM Lab) [1], integrates simulation, a high-fidelity digital twin, and a physical testbed, enabling structured zero-shot evaluation of MARL motion-planning policies. We demonstrate its use by deploying a SigmaRL-trained policy [2] across all three domains, revealing two complementary sources of performance degradation: architectural differences between simulation and hardware control stacks, and the sim-to-real gap induced by increasing environmental realism. The open-source setup enables systematic analysis of sim-to-real challenges in MARL under realistic, reproducible conditions.
Agentic AI-RAN Empowering Synergetic Sensing, Communication, Computing, and Control
Future sixth-generation (6G) networks are expected to support low-altitude wireless networks (LAWNs), where unmanned aerial vehicles (UAVs) and aerial robots operate in highly dynamic three-dimensional environments under stringent latency, reliability, and autonomy requirements. In such scenarios, autonomous task execution at the network edge demands holistic coordination among sensing, communication, computing, and control (SC3) processes. Agentic Artificially Intelligent Radio Access Networks (Agentic AI-RAN) offer a promising paradigm by enabling the edge network to function as an autonomous decision-making entity for low-altitude agents with limited onboard resources. In this article, we propose and design a task-oriented Agentic AI-RAN architecture that enables SC3 task execution within a single edge node. This integrated design tackles the fundamental problem of coordinating heterogeneous workloads in resource-constrained edge environments. Furthermore, a representative low-altitude embodied intelligence system is prototyped based on a general-purpose Graphics Processing Unit (GPU) platform to demonstrate autonomous drone navigation in realistic settings. By leveraging the Multi-Instance GPU (MIG) partitioning technique and the containerized deployment, the demonstration system achieves physical resource isolation while supporting tightly coupled coordination between real-time communication and multimodal inference under a unified task framework. Experimental results demonstrate low closed-loop latency, robust bidirectional communication, and stable performance under dynamic runtime conditions, highlighting its viability for mission-critical low-altitude wireless networks in 6G.
Sequential Operating Simulation of Solid State Transformer-Driven Next-Generation 800 VDC Data Center
Artificial-intelligence (AI) workloads are driving rapid growth in data-center electricity use and rack power density, increasing demand for power-delivery systems that are efficient and robust to fast load transients. Conventional uninterruptible power supply (UPS) based AC distribution chains involve multiple conversion stages and line-frequency transformers, which compound losses and are less compatible with dynamic AI power profiles. Although solid-state transformers (SSTs) and 800 VDC distribution architecture are widely discussed, implementable topology/control details, and long-horizon validation with realistic operating profiles remain limited. This paper develops an SST-driven 800 VDC architecture that converts 10 kV MVAC to an 800V LVDC bus using a three-phase H-bridge AC/DC stage cascaded with a dual-active-bridge (DAB) DC/DC stage. A coordinated closed-loop control scheme, combining rectifier voltage/current regulation and DAB phase-shift control, is designed to maintain DC-bus voltage stability. The proposed system is implemented on the real-time digital simulation (RTDS) platform and evaluated via sequential simulations using real-world day- and month-scale operating profiles of data centers, benchmarked against a UPS supply chain. Numerical studies demonstrate tight 800 VDC regulation, reduced input-side energy consumption compared with the UPS baseline, and satisfactory power-quality performance. A capacitance sensitivity test quantifies tradeoffs between DC-bus ripple and low-frequency input-power oscillations, yielding a practical capacitance range for design. Overall, the work provides a reproducible evaluation workflow and actionable guidance for next-generation AI data centers.
Robust Grid-Forming Control Based on Virtual Flux Observer
This paper investigates the design and analysis of a novel grid-forming (GFM) control method for grid-connected converters (GCCs). The core novelty lies in a virtual flux observer-based synchronization and load angle control method. The terminal voltage of the converter is directly regulated to provide voltage-source behavior. The control parameters are designed for decoupling and pole placement. The proposed method exhibits strong robustness in stability and dynamical performance across varying and uncertain grid strengths. The robust control performance of the proposed method is first demonstrated by small-signal analysis, then validated by experiments on a 20 kVA power conversion system.
comment: This is a six-page conference paper to be included in the proceedings of the 2026 IEEE Applied Power Electronics Conference and Exposition (APEC 2026)
Adaptive Input Shaper Design for Unknown Second-Order Systems with Real-Time Parameter Estimation
We propose a feedforward input-shaping framework with online parameter estimation for unknown second-order systems. The proposed approach eliminates the need for prior knowledge of system parameters when designing input shaping for precise switching times by incorporating online estimation for a black-box system. The adaptive input shaping scheme accounts for the system's periodic switching behavior and enables reference shaping even when initial switching instants are missed. The proposed framework is evaluated in simulation and is intended for vibration suppression in motion control applications such as gantry cranes and 3D printer headers.
Polynomial Chaos-based Input Shaper Design under Time-Varying Uncertainty
The work presented here investigates the application of polynomial chaos expansion toward input shaper design in order to maintain robustness in dynamical systems subject to uncertainty. Furthermore, this work intends to specifically address time-varying uncertainty by employing intrusive polynomial chaos expansion. The methodology presented is validated through numerical simulation of intrusive polynomial chaos expansion formulation applied to spring mass system experiencing time-varying uncertainty in the spring stiffness. The system also evaluates non-robust and robust input shapers through the framework in order to identify designs that minimize residual energy. Results indicate that vibration mitigation is achieved at a similar accuracy, yet at higher efficiency compared to a Monte Carlo framework.
AstroTimer: Rethinking Non-Access Stratum Timers in LEO Constellations
Low-Earth Orbit (LEO) constellations expand 5G coverage to remote regions but differ fundamentally from terrestrial networks due to rapidly changing topologies, fluctuating delays, and constrained onboard resources. Existing 3GPP Non-Access Stratum (NAS) timers, inherited from terrestrial and geostationary (GEO) or medium Earth orbit (MEO) systems, fail to accommodate these dynamics, leading to signaling storms and inefficiency. This paper introduces AstroTimer, a lightweight, adaptive framework for sizing NAS timers based on LEO-specific parameters such as link variability, processing delays, and network-function placement. AstroTimer derives a closed-form timer model with low computational cost and optimizes both watchdog and backoff timers for the 5G registration procedure. Simulation results demonstrate that AstroTimer significantly reduces registration time, retry frequency, and user equipment (UE) energy consumption compared to 3GPP defaults, while preventing signaling overloads. The proposed approach provides an operator-ready foundation for reliable, efficient, and scalable non-terrestrial 5G/6G deployments.
comment: To be published in Proceedings of the 2026 IEEE International Conference on Communications (ICC), 24-28 May 2026, Glasgow, Scotland, UK
Robust and learning-augmented algorithms for degradation-aware battery optimization
This paper studies the problem of maximizing revenue from a grid-scale battery energy storage system, accounting for uncertain future electricity prices and the effect of degradation on battery lifetime. We formulate this task as an online resource allocation problem. We propose an algorithm, based on online mirror descent, that is no-regret in the stochastic i.i.d. setting and attains finite asymptotic competitive ratio in the adversarial setting (robustness). When untrusted advice about the opportunity cost of degradation is available, we propose a learning-augmented algorithm that performs well when the advice is accurate (consistency) while still retaining robustness properties when the advice is poor.
comment: 18 pages, 1 figure
Autonomous Mars Rover Module for Soil Sampling and Life Component Analysis
The search for extraterrestrial life has long been a primary focus of scientific exploration, driven by rapid advancements in technology and our understanding of the universe. The discovery of water on Mars has sparked significant interest, raising the question of whether life could exist on the planet. This study proposes a novel approach to simulate and illustrate the detection of life using a proof-of-life module integrated into a Mars rover. The module is an autonomous system capable of traveling to designated regions, excavating soil, collecting samples, and performing biochemical testing onboard the rover itself. The project is inherently multidisciplinary, integrating mechanical systems such as a drill mechanism and a vacuum system, alongside biochemical analysis for soil testing. The module is capable of successfully detecting the presence or absence of living components of life from the collected soil particles. This proof-of-life module serves as a proof-of-concept for autonomous life detection in extraterrestrial environments and lays the foundation for future exploration missions.
comment: 9 pages, 12 figures
Set-Based Reachability for Low-Thrust Spacecraft in Two-Body and Cislunar Dynamical Systems
This paper investigates the application of zonotope-based reachability analysis to low-thrust spacecraft in both two-body and cislunar environments. Reachable sets are generated under two-body and circular restricted three-body (CR3BP) dynamics using set-based methods that approximate nonlinear systems via Taylor expansions. A state-dependent coefficient (SDC) parameterization is also explored to represent nonlinear dynamics in a pseudo-linear form, enabling efficient matrix based propagation of reachable sets. Applications include Earth-Mars transfer and cislunar scenarios such as L1 and L2 Halo orbits and Near Rectilinear Halo Orbits (NRHOs). The resulting reachable sets are used for safe trajectory generation and tracking, with comparisons drawn between model predictive control (MPC) and LQR-based station-keeping. The proposed approach provides a scalable framework for analyzing spacecraft behavior under complex dynamics and control constraints.
comment: AAS 25-863
Data-Efficient Physics-Informed Learning to Model Synchro-Waveform Dynamics of Grid-Integrated Inverter-Based Resources
Inverter-based resources (IBRs) exhibit fast transient dynamics during network disturbances, which often cannot be properly captured by phasor and SCADA measurements. This shortcoming has recently been addressed with the advent of waveform measurement units (WMUs), which provide high-resolution, time-synchronized raw voltage and current waveform samples from multiple locations in the power system. However, transient model learning based on synchro-waveform measurements remains constrained by the scarcity of network disturbances and the complexity of the underlying nonlinear dynamics of IBRs. We propose to address these problems by developing a data-efficient physics-informed machine learning (PIML) framework for synchro-waveform analytics that estimates the IBR terminal current response from only a few network disturbance signatures. Here, the physics of the electrical circuits are used to compensate for limited data availability by constraining the learning process through known circuit relationships. Two cases are considered, with known and unknown circuit parameters. In the latter case, the framework jointly learns the transient dynamics of the IBRs and the parameters of the electrical circuit. Case studies using WMU disturbance data across multiple sampling rates shows consistently lower current estimation error with substantially fewer training events than a purely data-driven baseline.
Robust H-infinity control and worst-case search in constrained parametric space
Standard H-infinity/H2 robust control and analysis tools operate on uncertain parameters assumed to vary independently within prescribed bounds. This paper extends their capabilities in the presence of constraints coupling these parameters and restricting the parametric space. Focusing on the worst-case search, we demonstrate - based on the theory of upper-C1 functions - the validity of using standard, readily available smooth optimization algorithms to address this nonsmooth constrained optimization problem. In particular, we prove that the sequential quadratic programming algorithm converges to Karush-Kuhn-Tucker points, and that such conditions are satisfied by any subgradient at a local minimum. This worst-case search then enables robust controller synthesis: identified worst-case configurations are iteratively added to an active set on which a non-smooth multi-models optimization of the controller is performed. The methodology is illustrated on a satellite benchmark with flexible appendages, of order 50 with 43 uncertain parameters. From a practical point of view, we combine the local exploitation proposed above with a global exploration using either Monte-Carlo sampling or particle swarm optimization. We show that the proposed constrained optimization effectively complements Monte-Carlo sampling by enabling fast detection of rare worst-case configurations, and that the robust controller optimization converges with less than 10 active configurations.
comment: Preprint
Time-Optimal Switching Surfaces for Triple Integrator under Full Box Constraints
Time-optimal control for triple integrator under full box constraints is a fundamental problem in the field of optimal control, which has been widely applied in the industry. However, scenarios involving asymmetric constraints, non-stationary boundary conditions, and active position constraints pose significant challenges. This paper provides a complete characterization of time-optimal switching surfaces for the problem, leading to novel insights into the geometric structure of the optimal control. The active condition of position constraints is derived, which is absent from the literature. An efficient algorithm is proposed, capable of planning time-optimal trajectories under asymmetric full constraints and arbitrary boundary states, with a 100% success rate. Computational time for each trajectory is within approximately 10$μ$s, achieving a 5-order-of-magnitude reduction compared to optimization-based baselines.
comment: This paper has been accepted by American Control Conference, 2026
Small-Signal Stability Oriented Real-Time Operation of Power Systems with a High Penetration of Inverter-Based Resources
This study proposes a control strategy to ensure the safe operation of modern power systems with high penetration of inverter-based resources (IBRs) within an optimal operation framework. The objective is to obtain operating points that satisfy the optimality conditions of a predefined problem while guaranteeing small-signal stability. The methodology consists of two stages. First, an offline analysis of a set of operating points is performed to derive a data-driven regression-based expression that captures a damping-based stability index as a function of the operating conditions. Second, an Online Feedback Optimization (OFO) controller is employed to drive the system toward an optimal operating point while maintaining a secure distance from the instability region. The proposed strategy is evaluated on an academic test case based on a modified version of the IEEE 9-bus system, in which synchronous generators are replaced by IBRs operating under both grid-following and grid-forming control modes. The results demonstrate the effectiveness of the method and are discussed in detail.
Stability results for MIMO LTI systems via Scaled Relative Graphs
This paper proposes a frequency-wise approach for stability analysis of multi-input, multi-output (MIMO) Linear Time-Invariant (LTI) feedback systems through Scaled Relative Graphs (SRGs). Unlike traditional methods, such as the Generalized Nyquist Criterion (GNC), which relies on a coupled analysis that requires the multiplication of models, our approach enables the evaluation of system stability in a decoupled fashion, system by system, each of which is represented by its SRG (or an over-approximation thereof), and it provides an intuitive, visual representation of system behavior. Our results provide conditions for certifying the stability of stable and square MIMO LTI systems connected in closed loop.
Data Generation for Stability Studies of Power Systems with High Penetration of Inverter-Based Resources
The increasing penetration of inverter-based resources (IBRs) is fundamentally reshaping power system dynamics and creating new challenges for stability assessment. Data-driven approaches, and in particular machine learning models, require large and representative datasets that capture how system stability varies across a wide range of operating conditions and control settings. This paper presents an open-source, high-performance computing framework for the systematic generation of such datasets. The proposed tool defines a scalable operating space for large-scale power systems, explores it through an adaptive sampling strategy guided by sensitivity analysis, and performs small-signal stability assessments to populate a high-information-content dataset. The framework efficiently targets regions near the stability margin while maintaining broad coverage of feasible operating conditions. The workflow is fully implemented in Python and designed for parallel execution. The resulting tool enables the creation of high-quality datasets that support data-driven stability studies in modern power systems with high IBR penetration.
Physics-Informed Detection of Friction Anomalies in Satellite Reaction Wheels
As the number of satellites in orbit has increased exponentially in recent years, ensuring their correct functionality has started to require automated methods to decrease human workload. In this work, we present an algorithm that analyzes the on-board data related to friction from the Reaction Wheel Assemblies (RWA) of a satellite and determines their operating status, distinguishing between nominal status and several possible anomalies that require preventive measures to be taken. The algorithm first uses a model based on hybrid systems theory to extract the information relevant to the problem. The extraction process combines techniques in changepoint detection, dynamic programming, and maximum likelihood in a structured way. A classifier then uses the extracted information to determine the status of the RWA. This last classifier has been previously trained with a labelled dataset produced by a high-fidelity simulator, comprised for the most part of nominal data. The final algorithm combines model-based and data-based approaches to obtain satisfactory results with an accuracy around 95%.
Stochastic Control Barrier Functions under State Estimation: From Euclidean Space to Lie Groups
Ensuring safety for autonomous systems under uncertainty remains challenging, particularly when safety of the true state is required despite the true state not being fully known. Control barrier functions (CBFs) have become widely adopted as safety filters. However, standard CBF formulations do not explicitly account for state estimation uncertainty and its propagation, especially for stochastic systems evolving on manifolds. In this paper, we propose a safety-critical control framework with a provable bound on the finite-time safety probability for stochastic systems under noisy state information. The proposed framework explicitly incorporates the uncertainty arising from both process and measurement noise, and synthesizes controllers that adapt to the level of uncertainty. The framework admits closed-form solutions in linear settings, and experimental results demonstrate its effectiveness on systems whose state spaces range from Euclidean space to Lie groups.
Derandomizing Simultaneous Confidence Regions for Band-Limited Functions by Improved Norm Bounds and Majority-Voting Schemes
Band-limited functions are fundamental objects that are widely used in systems theory and signal processing. In this paper we refine a recent nonparametric, nonasymptotic method for constructing simultaneous confidence regions for band-limited functions from noisy input-output measurements, by working in a Paley-Wiener reproducing kernel Hilbert space. Kernel norm bounds are tightened using a uniformly-randomized Hoeffding's inequality for small samples and an empirical Bernstein bound for larger ones. We derive an approximate threshold, based on the sample size and how informative the inputs are, that governs which bound to deploy. Finally, we apply majority voting to aggregate confidence sets from random subsamples, boosting both stability and region size. We prove that even per-input aggregated intervals retain their simultaneous coverage guarantee. These refinements are also validated through numerical experiments.
Data-Driven Distributed Optimization via Aggregative Tracking and Deep-Learning
In this paper, we propose a novel distributed data-driven optimization scheme. In detail, we focus on the so-called aggregative framework, a scenario in which a set of agents aim to cooperatively minimize the sum of local costs, each depending on both local decision variables and an aggregation of all of them. We consider a data-driven setup where each objective function is unknown and can be sampled at a single point per iteration (thanks to, e.g., feedback from users or sensors). We address this scenario through a distributed algorithm combining three components: (i) a learning part leveraging neural networks to learn the local costs descent direction, (ii) an optimization routine steering the estimates according to the learned direction to minimize the global cost, and (iii) a tracking mechanism locally reconstructing the unavailable global quantities. Using tools from system theory, i.e., timescale separation and averaging theory, we formally prove that in strongly convex setups, the distributed scheme linearly converges to a neighborhood of the optimum, whose radius depends on the accuracy of the neural networks. Finally, numerical simulations validate the theoretical results.
Robotics
Point Bridge: 3D Representations for Cross Domain Policy Learning
Robot foundation models are beginning to deliver on the promise of generalist robotic agents, yet progress remains constrained by the scarcity of large-scale real-world manipulation datasets. Simulation and synthetic data generation offer a scalable alternative, but their usefulness is limited by the visual domain gap between simulation and reality. In this work, we present Point Bridge, a framework that leverages unified, domain-agnostic point-based representations to unlock synthetic datasets for zero-shot sim-to-real policy transfer, without explicit visual or object-level alignment. Point Bridge combines automated point-based representation extraction via Vision-Language Models (VLMs), transformer-based policy learning, and efficient inference-time pipelines to train capable real-world manipulation agents using only synthetic data. With additional co-training on small sets of real demonstrations, Point Bridge further improves performance, substantially outperforming prior vision-based sim-and-real co-training methods. It achieves up to 44% gains in zero-shot sim-to-real transfer and up to 66% with limited real data across both single-task and multitask settings. Videos of the robot are best viewed at: https://pointbridge3d.github.io/
IVRA: Improving Visual-Token Relations for Robot Action Policy with Training-Free Hint-Based Guidance
Many Vision-Language-Action (VLA) models flatten image patches into a 1D token sequence, weakening the 2D spatial cues needed for precise manipulation. We introduce IVRA, a lightweight, training-free method that improves spatial understanding by exploiting affinity hints already available in the model's built-in vision encoder, without requiring any external encoder or retraining. IVRA selectively injects these affinity signals into a language-model layer in which instance-level features reside. This inference-time intervention realigns visual-token interactions and better preserves geometric structure while keeping all model parameters fixed. We demonstrate the generality of IVRA by applying it to diverse VLA architectures (LLaRA, OpenVLA, and FLOWER) across simulated benchmarks spanning both 2D and 3D manipulation (VIMA and LIBERO) and on various real-robot tasks. On 2D VIMA, IVRA improves average success by +4.2% over the baseline LLaRA in a low-data regime. On 3D LIBERO, it yields consistent gains over the OpenVLA and FLOWER baselines, including improvements when baseline accuracy is near saturation (96.3% to 97.1%). All code and models will be released publicly. Visualizations are available at: jongwoopark7978.github.io/IVRA
Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
Recent video generation models demonstrate remarkable ability to capture complex physical interactions and scene evolution over time. To leverage their spatiotemporal priors, robotics works have adapted video models for policy learning but introduce complexity by requiring multiple stages of post-training and new architectural components for action generation. In this work, we introduce Cosmos Policy, a simple approach for adapting a large pretrained video model (Cosmos-Predict2) into an effective robot policy through a single stage of post-training on the robot demonstration data collected on the target platform, with no architectural modifications. Cosmos Policy learns to directly generate robot actions encoded as latent frames within the video model's latent diffusion process, harnessing the model's pretrained priors and core learning algorithm to capture complex action distributions. Additionally, Cosmos Policy generates future state images and values (expected cumulative rewards), which are similarly encoded as latent frames, enabling test-time planning of action trajectories with higher likelihood of success. In our evaluations, Cosmos Policy achieves state-of-the-art performance on the LIBERO and RoboCasa simulation benchmarks (98.5% and 67.1% average success rates, respectively) and the highest average score in challenging real-world bimanual manipulation tasks, outperforming strong diffusion policies trained from scratch, video model-based policies, and state-of-the-art vision-language-action models fine-tuned on the same robot demonstrations. Furthermore, given policy rollout data, Cosmos Policy can learn from experience to refine its world model and value function and leverage model-based planning to achieve even higher success rates in challenging tasks. We release code, models, and training data at https://research.nvidia.com/labs/dir/cosmos-policy/
Efficiently Learning Robust Torque-based Locomotion Through Reinforcement with Model-Based Supervision
We propose a control framework that integrates model-based bipedal locomotion with residual reinforcement learning (RL) to achieve robust and adaptive walking in the presence of real-world uncertainties. Our approach leverages a model-based controller, comprising a Divergent Component of Motion (DCM) trajectory planner and a whole-body controller, as a reliable base policy. To address the uncertainties of inaccurate dynamics modeling and sensor noise, we introduce a residual policy trained through RL with domain randomization. Crucially, we employ a model-based oracle policy, which has privileged access to ground-truth dynamics during training, to supervise the residual policy via a novel supervised loss. This supervision enables the policy to efficiently learn corrective behaviors that compensate for unmodeled effects without extensive reward shaping. Our method demonstrates improved robustness and generalization across a range of randomized conditions, offering a scalable solution for sim-to-real transfer in bipedal locomotion.
Improve the autonomy of the SE2(3) group based Extended Kalman Filter for Integrated Navigation: Application
One of the core advantages of SE2(3) Lie group framework for navigation modeling lies in the autonomy of error propagation. In the previous paper, the theoretical analysis of autonomy property of navigation model in inertial, earth and world frames was given. A construction method for SE2(3) group navigation model is proposed to improve the non-inertial navigation model toward full autonomy. This paper serves as a counterpart to previous paper and conducts the real-world strapdown inertial navigation system (SINS)/odometer(ODO) experiments as well as Monte-Carlo simulations to demonstrate the performance of improved SE2(3) group based high-precision navigation models.
DTP: A Simple yet Effective Distracting Token Pruning Framework for Vision-Language Action Models
Vision-Language Action (VLA) models have shown remarkable progress in robotic manipulation by leveraging the powerful perception abilities of Vision-Language Models (VLMs) to understand environments and directly output actions. However, by default, VLA models may overly attend to image tokens in the task-irrelevant region, which we describe as 'distracting tokens'. This behavior can disturb the model from the generation of the desired action tokens in each step, affecting the success rate of tasks. In this paper, we introduce a simple yet effective plug-and-play Distracting Token Pruning (DTP) framework, which dynamically detects and prunes these distracting image tokens. By correcting the model's visual attention patterns, we aim to improve the task success rate, as well as exploring the performance upper boundaries of the model without altering its original architecture or adding additional inputs. Experiments on the SIMPLER Benchmark (Li et al., 2024) show that our method consistently achieving relative improvements in task success rates across different types of novel VLA models, demonstrating generalizability to transformer-based VLAs. Further analysis reveals a negative correlation between the task success rate and the amount of attentions in the task-irrelevant region for all models tested, highlighting a common phenomenon of VLA models that could guide future research. We also publish our code at: https://anonymous.4open.science/r/CBD3.
Improve the autonomy of the SE2(3) group based Extended Kalman Filter for Integrated Navigation: Theoretical Analysis
One of core advantages of the SE2(3) Lie group framework for navigation modeling lies in the autonomy of error propagation. Current research on Lie group based extended Kalman filters has demonstrated that error propagation autonomy holds in low-precision applications, such as in micro electromechanical system (MEMS) based integrated navigation without considering earth rotation and inertial device biases. However, in high-precision navigation state estimation, maintaining autonomy is extremely difficult when considering with earth rotation and inertial device biases. This paper presents the theoretical analysis on the autonomy of SE2(3) group based high-precision navigation models under inertial, earth and world frame respectively. Through theoretical analysis, we find that the limitation of the traditional, trivial SE2(3) group navigation modeling method is that the presence of Coriolis force terms introduced by velocity in non-inertial frame. Therefore, a construction method for SE2(3) group navigation models is proposed, which brings the navigation models closer to full autonomy.
DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning
Language-driven dexterous grasp generation requires the models to understand task semantics, 3D geometry, and complex hand-object interactions. While vision-language models have been applied to this problem, existing approaches directly map observations to grasp parameters without intermediate reasoning about physical interactions. We present DextER, Dexterous Grasp Generation with Embodied Reasoning, which introduces contact-based embodied reasoning for multi-finger manipulation. Our key insight is that predicting which hand links contact where on the object surface provides an embodiment-aware intermediate representation bridging task semantics with physical constraints. DextER autoregressively generates embodied contact tokens specifying which finger links contact where on the object surface, followed by grasp tokens encoding the hand configuration. On DexGYS, DextER achieves 67.14% success rate, outperforming state-of-the-art by 3.83%p with 96.4% improvement in intention alignment. We also demonstrate steerable generation through partial contact specification, providing fine-grained control over grasp synthesis.
Collision-Free Humanoid Traversal in Cluttered Indoor Scenes
We study the problem of collision-free humanoid traversal in cluttered indoor scenes, such as hurdling over objects scattered on the floor, crouching under low-hanging obstacles, or squeezing through narrow passages. To achieve this goal, the humanoid needs to map its perception of surrounding obstacles with diverse spatial layouts and geometries to the corresponding traversal skills. However, the lack of an effective representation that captures humanoid-obstacle relationships during collision avoidance makes directly learning such mappings difficult. We therefore propose Humanoid Potential Field (HumanoidPF), which encodes these relationships as collision-free motion directions, significantly facilitating RL-based traversal skill learning. We also find that HumanoidPF exhibits a surprisingly negligible sim-to-real gap as a perceptual representation. To further enable generalizable traversal skills through diverse and challenging cluttered indoor scenes, we further propose a hybrid scene generation method, incorporating crops of realistic 3D indoor scenes and procedurally synthesized obstacles. We successfully transfer our policy to the real world and develop a teleoperation system where users could command the humanoid to traverse in cluttered indoor scenes with just a single click. Extensive experiments are conducted in both simulation and the real world to validate the effectiveness of our method. Demos and code can be found in our website: https://axian12138.github.io/CAT/.
Keyframe-Based Feed-Forward Visual Odometry
The emergence of visual foundation models has revolutionized visual odometry~(VO) and SLAM, enabling pose estimation and dense reconstruction within a single feed-forward network. However, unlike traditional pipelines that leverage keyframe methods to enhance efficiency and accuracy, current foundation model based methods, such as VGGT-Long, typically process raw image sequences indiscriminately. This leads to computational redundancy and degraded performance caused by low inter-frame parallax, which provides limited contextual stereo information. Integrating traditional geometric heuristics into these methods is non-trivial, as their performance depends on high-dimensional latent representations rather than explicit geometric metrics. To bridge this gap, we propose a novel keyframe-based feed-forward VO. Instead of relying on hand-crafted rules, our approach employs reinforcement learning to derive an adaptive keyframe policy in a data-driven manner, aligning selection with the intrinsic characteristics of the underlying foundation model. We train our agent on TartanAir dataset and conduct extensive evaluations across several real-world datasets. Experimental results demonstrate that the proposed method achieves consistent and substantial improvements over state-of-the-art feed-forward VO methods.
PUMA: Perception-driven Unified Foothold Prior for Mobility Augmented Quadruped Parkour
Parkour tasks for quadrupeds have emerged as a promising benchmark for agile locomotion. While human athletes can effectively perceive environmental characteristics to select appropriate footholds for obstacle traversal, endowing legged robots with similar perceptual reasoning remains a significant challenge. Existing methods often rely on hierarchical controllers that follow pre-computed footholds, thereby constraining the robot's real-time adaptability and the exploratory potential of reinforcement learning. To overcome these challenges, we present PUMA, an end-to-end learning framework that integrates visual perception and foothold priors into a single-stage training process. This approach leverages terrain features to estimate egocentric polar foothold priors, composed of relative distance and heading, guiding the robot in active posture adaptation for parkour tasks. Extensive experiments conducted in simulation and real-world environments across various discrete complex terrains, demonstrate PUMA's exceptional agility and robustness in challenging scenarios.
Accurate Calibration and Robust LiDAR-Inertial Odometry for Spinning Actuated LiDAR Systems
Accurate calibration and robust localization are fundamental for downstream tasks in spinning actuated LiDAR applications. Existing methods, however, require parameterizing extrinsic parameters based on different mounting configurations, limiting their generalizability. Additionally, spinning actuated LiDAR inevitably scans featureless regions, which complicates the balance between scanning coverage and localization robustness. To address these challenges, this letter presents a targetless LiDAR-motor calibration (LM-Calibr) on the basis of the Denavit-Hartenberg convention and an environmental adaptive LiDAR-inertial odometry (EVA-LIO). LM-Calibr supports calibration of LiDAR-motor systems with various mounting configurations. Extensive experiments demonstrate its accuracy and convergence across different scenarios, mounting angles, and initial values. Additionally, EVA-LIO adaptively selects downsample rates and map resolutions according to spatial scale. This adaptivity enables the actuator to operate at maximum speed, thereby enhancing scanning completeness while ensuring robust localization, even when LiDAR briefly scans featureless areas. The source code and hardware design are available on GitHub: \textcolor{blue}{\href{https://github.com/zijiechenrobotics/lm_calibr}{github.com/zijiechenrobotics/lm\_calibr}}. The video is available at \textcolor{blue}{\href{https://youtu.be/cZyyrkmeoSk}{youtu.be/cZyyrkmeoSk}}
comment: This article has been accepted for publication in IEEE Robotics and Automation Letters (RA-L). Personal use is permitted. All other uses require IEEE permission
TeNet: Text-to-Network for Compact Policy Synthesis
Robots that follow natural-language instructions often either plan at a high level using hand-designed interfaces or rely on large end-to-end models that are difficult to deploy for real-time control. We propose TeNet (Text-to-Network), a framework for instantiating compact, task-specific robot policies directly from natural language descriptions. TeNet conditions a hypernetwork on text embeddings produced by a pretrained large language model (LLM) to generate a fully executable policy, which then operates solely on low-dimensional state inputs at high control frequencies. By using the language only once at the policy instantiation time, TeNet inherits the general knowledge and paraphrasing robustness of pretrained LLMs while remaining lightweight and efficient at execution time. To improve generalization, we optionally ground language in behavior during training by aligning text embeddings with demonstrated actions, while requiring no demonstrations at inference time. Experiments on MuJoCo and Meta-World benchmarks show that TeNet produces policies that are orders of magnitude smaller than sequence-based baselines, while achieving strong performance in both multi-task and meta-learning settings and supporting high-frequency control. These results show that text-conditioned hypernetworks offer a practical way to build compact, language-driven controllers for ressource-constrained robot control tasks with real-time requirements.
A Beacon Based Solution for Autonomous UUVs GNSS-Denied Stealthy Navigation
Autonomous Unmanned Underwater Vehicles (UUVs) enable military and civilian covert operations in coastal areas without relying on support vessels or Global Navigation Satellite Systems (GNSS). Such operations are critical when surface access is not possible and stealthy navigation is required in restricted environments such as protected zones or dangerous areas under access ban. GNSS denied navigation is then essential to maintaining concealment as surfacing could expose UUVs to detection. To ensure a precise fleet positioning a constellation of beacons deployed by aerial or surface drones establish a synthetic landmark network that will guide the fleet of UUVs along an optimized path from the continental shelf to the goal on the shore. These beacons either submerged or floating emit acoustic signals for UUV localisation and navigation. A hierarchical planner generates an adaptive route for the drones executing primitive actions while continuously monitoring and replanning as needed to maintain trajectory accuracy.
comment: 8 pages. IEEE TechDefense 2025
Glove2UAV: A Wearable IMU-Based Glove for Intuitive Control of UAV
This paper presents Glove2UAV, a wearable IMU-glove interface for intuitive UAV control through hand and finger gestures, augmented with vibrotactile warnings for exceeding predefined speed thresholds. To promote safer and more predictable interaction in dynamic flight, Glove2UAV is designed as a lightweight and easily deployable wearable interface intended for real-time operation. Glove2UAV streams inertial measurements in real time and estimates palm and finger orientations using a compact processing pipeline that combines median-based outlier suppression with Madgwick-based orientation estimation. The resulting motion estimations are mapped to a small set of control primitives for directional flight (forward/backward and lateral motion) and, when supported by the platform, to object-interaction commands. Vibrotactile feedback is triggered when flight speed exceeds predefined threshold values, providing an additional alert channel during operation. We validate real-time feasibility by synchronizing glove signals with UAV telemetry in both simulation and real-world flights. The results show fast gesture-based command execution, stable coupling between gesture dynamics and platform motion, correct operation of the core command set in our trials, and timely delivery of vibratile warning cues.
comment: This paper has been accepted for publication at LBR of HRI 2026 conference
DualShield: Safe Model Predictive Diffusion via Reachability Analysis for Interactive Autonomous Driving
Diffusion models have emerged as a powerful approach for multimodal motion planning in autonomous driving. However, their practical deployment is typically hindered by the inherent difficulty in enforcing vehicle dynamics and a critical reliance on accurate predictions of other agents, making them prone to safety issues under uncertain interactions. To address these limitations, we introduce DualShield, a planning and control framework that leverages Hamilton-Jacobi (HJ) reachability value functions in a dual capacity. First, the value functions act as proactive guidance, steering the diffusion denoising process towards safe and dynamically feasible regions. Second, they form a reactive safety shield using control barrier-value functions (CBVFs) to modify the executed actions and ensure safety. This dual mechanism preserves the rich exploration capabilities of diffusion models while providing principled safety assurance under uncertain and even adversarial interactions. Simulations in challenging unprotected U-turn scenarios demonstrate that DualShield significantly improves both safety and task efficiency compared to leading methods from different planning paradigms under uncertainty.
comment: 8 pages, 5 figures
D-Optimality-Guided Reinforcement Learning for Efficient Open-Loop Calibration of a 3-DOF Ankle Rehabilitation Robot
Accurate alignment of multi-degree-of-freedom rehabilitation robots is essential for safe and effective patient training. This paper proposes a two-stage calibration framework for a self-designed three-degree-of-freedom (3-DOF) ankle rehabilitation robot. First, a Kronecker-product-based open-loop calibration method is developed to cast the input-output alignment into a linear parameter identification problem, which in turn defines the associated experimental design objective through the resulting information matrix. Building on this formulation, calibration posture selection is posed as a combinatorial design-of-experiments problem guided by a D-optimality criterion, i.e., selecting a small subset of postures that maximises the determinant of the information matrix. To enable practical selection under constraints, a Proximal Policy Optimization (PPO) agent is trained in simulation to choose 4 informative postures from a candidate set of 50. Across simulation and real-robot evaluations, the learned policy consistently yields substantially more informative posture combinations than random selection: the mean determinant of the information matrix achieved by PPO is reported to be more than two orders of magnitude higher with reduced variance. In addition, real-world results indicate that a parameter vector identified from only four D-optimality-guided postures provides stronger cross-episode prediction consistency than estimates obtained from a larger but unstructured set of 50 postures. The proposed framework therefore improves calibration efficiency while maintaining robust parameter estimation, offering practical guidance for high-precision alignment of multi-DOF rehabilitation robots.
AION: Aerial Indoor Object-Goal Navigation Using Dual-Policy Reinforcement Learning
Object-Goal Navigation (ObjectNav) requires an agent to autonomously explore an unknown environment and navigate toward target objects specified by a semantic label. While prior work has primarily studied zero-shot ObjectNav under 2D locomotion, extending it to aerial platforms with 3D locomotion capability remains underexplored. Aerial robots offer superior maneuverability and search efficiency, but they also introduce new challenges in spatial perception, dynamic control, and safety assurance. In this paper, we propose AION for vision-based aerial ObjectNav without relying on external localization or global maps. AION is an end-to-end dual-policy reinforcement learning (RL) framework that decouples exploration and goal-reaching behaviors into two specialized policies. We evaluate AION on the AI2-THOR benchmark and further assess its real-time performance in IsaacSim using high-fidelity drone models. Experimental results show that AION achieves superior performance across comprehensive evaluation metrics in exploration, navigation efficiency, and safety. The video can be found at https://youtu.be/TgsUm6bb7zg.
Airflow Source Seeking on Small Quadrotors Using a Single Flow Sensor
As environmental disasters happen more frequently and severely, seeking the source of pollutants or harmful particulates using plume tracking becomes even more important. Plume tracking on small quadrotors would allow these systems to operate around humans and fly in more confined spaces, but can be challenging due to poor sensitivity and long response times from gas sensors that fit on small quadrotors. In this work, we present an approach to complement chemical plume tracking with airflow source-seeking behavior using a custom flow sensor that can sense both airflow magnitude and direction on small quadrotors < 100 g. We use this sensor to implement a modified version of the `Cast and Surge' algorithm that takes advantage of flow direction sensing to find and navigate towards flow sources. A series of characterization experiments verified that the system can detect airflow while in flight and reorient the quadrotor toward the airflow. Several trials with random starting locations and orientations were used to show that our source-seeking algorithm can reliably find a flow source. This work aims to provide a foundation for future platforms that can use flow sensors in concert with other sensors to enable richer plume tracking data collection and source-seeking.
MapViT: A Two-Stage ViT-Based Framework for Real-Time Radio Quality Map Prediction in Dynamic Environments
Recent advancements in mobile and wireless networks are unlocking the full potential of robotic autonomy, enabling robots to take advantage of ultra-low latency, high data throughput, and ubiquitous connectivity. However, for robots to navigate and operate seamlessly, efficiently and reliably, they must have an accurate understanding of both their surrounding environment and the quality of radio signals. Achieving this in highly dynamic and ever-changing environments remains a challenging and largely unsolved problem. In this paper, we introduce MapViT, a two-stage Vision Transformer (ViT)-based framework inspired by the success of pre-train and fine-tune paradigm for Large Language Models (LLMs). MapViT is designed to predict both environmental changes and expected radio signal quality. We evaluate the framework using a set of representative Machine Learning (ML) models, analyzing their respective strengths and limitations across different scenarios. Experimental results demonstrate that the proposed two-stage pipeline enables real-time prediction, with the ViT-based implementation achieving a strong balance between accuracy and computational efficiency. This makes MapViT a promising solution for energy- and resource-constrained platforms such as mobile robots. Moreover, the geometry foundation model derived from the self-supervised pre-training stage improves data efficiency and transferability, enabling effective downstream predictions even with limited labeled data. Overall, this work lays the foundation for next-generation digital twin ecosystems, and it paves the way for a new class of ML foundation models driving multi-modal intelligence in future 6G-enabled systems.
comment: This paper has been accepted for publication at IEEE International Conference on Communications (ICC) 2026
A Mobile Magnetic Manipulation Platform for Gastrointestinal Navigation with Deep Reinforcement Learning Control
Targeted drug delivery in the gastrointestinal (GI) tract using magnetic robots offers a promising alternative to systemic treatments. However, controlling these robots is a major challenge. Stationary magnetic systems have a limited workspace, while mobile systems (e.g., coils on a robotic arm) suffer from a "model-calibration bottleneck", requiring complex, pre-calibrated physical models that are time-consuming to create and computationally expensive. This paper presents a compact, low-cost mobile magnetic manipulation platform that overcomes this limitation using Deep Reinforcement Learning (DRL). Our system features a compact four-electromagnet array mounted on a UR5 collaborative robot. A Soft Actor-Critic (SAC)-based control strategy is trained through a sim-to-real pipeline, enabling effective policy deployment within 15 minutes and significantly reducing setup time. We validated the platform by controlling a 7-mm magnetic capsule along 2D trajectories. Our DRL-based controller achieved a root-mean-square error (RMSE) of 1.18~mm for a square path and 1.50~mm for a circular path. We also demonstrated successful tracking over a clinically relevant, 30 cm * 20 cm workspace. This work demonstrates a rapidly deployable, model-free control framework capable of precise magnetic manipulation in a large workspace,validated using a 2D GI phantom.
DMAVA: Distributed Multi-Autonomous Vehicle Architecture Using Autoware
Simulating and validating coordination among multiple autonomous vehicles (AVs) is a challenging task as most existing simulation architectures are limited to single-vehicle operation or rely on centralized control. This paper presents a Distributed Multi-AV Architecture (DMAVA) that enables synchronized, real-time autonomous driving simulation across multiple physical hosts. Each vehicle runs its own complete AV stack and operates independently from other AVs. The vehicles in the simulation maintain synchronized coordination through a low-latency data-centric communication layer. The proposed system integrates ROS 2 Humble, Autoware Universe, AWSIM Labs, and Zenoh to support concurrent execution of multiple Autoware stacks within a shared Unity-based environment. Experiments conducted on multiple-host configurations demonstrate stable localization, reliable inter-host communication, and fully synchronized closed-loop control. The DMAVA also serves as a foundation for Multi-Vehicle Autonomous Valet Parking, demonstrating its extensibility toward higher-level cooperative autonomy. Demo videos and source code are available at: https://github.com/zubxxr/distributed-multi-autonomous-vehicle-architecture.
comment: 9 pages, 4 figures, 5 tables, Submitted to IEEE IV 2026, Demo videos and source code available
DMV-AVP: Distributed Multi-Vehicle Autonomous Valet Parking using Autoware
This paper presents the DMV-AVP System, a distributed simulation of Multi-Vehicle Autonomous Valet Parking (AVP). The system was implemented as an application of the Distributed Multi-Vehicle Architecture (DMAVA) for synchronized multi-host execution. Most existing simulation approaches rely on centralized or non-distributed designs that constrain scalability and limit fully autonomous control. This work introduces two modules built on top of the DMAVA: 1) a Multi-Vehicle AVP Node that performs state-based coordination, queuing, and reservation management across multiple vehicles, and 2) a Unity-Integrated YOLOv5 Parking Spot Detection Module that provides real-time, vision-based perception within AWSIM Labs. Both modules integrate seamlessly with the DMAVA and extend it specifically for multi-vehicle AVP operation, supported by a Zenoh-based communication layer that ensures low-latency topic synchronization and coordinated behavior across hosts. Experiments conducted on two- and three-host configurations demonstrate deterministic coordination, conflict-free parking behavior, and scalable performance across distributed Autoware instances. The results confirm that the proposed Distributed Multi-Vehicle AVP System supports cooperative AVP simulation and establishes a foundation for future real-world and hardware-in-the-loop validation. Demo videos and source code are available at https://github.com/zubxxr/multi-vehicle-avp
comment: 7 pages, 5 figures, 1 table. Demo videos and source code available
Scalable Screw-Theoretic Synthesis for PDE-Based Dynamic Modeling of Multibody Flexible Manipulators
This paper presents a novel and scalable screw-theoretic multibody synthesis framework for PDE-based dynamic modeling of serial robotic manipulators with an arbitrary number of flexible links in three-dimensional space. The proposed approach systematically constructs screw-theoretic PDE models for individual flexible links and rigorously enforces holonomic joint constraints through interaction forces. The dynamics of each link are formulated using a set of dual screws expressed in body-fixed coordinates: one describing the motion of the body-fixed frame relative to the inertial frame, a second relating the body-fixed frame to the undeformed configuration, and a third capturing elastic deformations. By expressing the system energy and applying variational principles, the governing dynamics of each link had been previously derived in a unified manner. Synthesizing the individual link models yields an infinitely scalable multibody representation capable of capturing both local (subsystem-level) and global (system-level) dynamics. The framework explicitly recovers all dynamic states, including the motion of each body-fixed frame and the distributed deformation fields of the flexible links. For computational tractability and mathematical rigor, the resulting governing equations are formulated as a semi-explicit index-1 differential-algebraic system. Furthermore, by applying separation of variables, the PDE model is recast as an abstract Cauchy problem, and well-posedness of the resulting system is established.
Reinforcement Learning Compensated Model Predictive Control for Off-road Driving on Unknown Deformable Terrain
This study presents an Actor-Critic reinforcement learning Compensated Model Predictive Controller (AC2MPC) designed for high-speed, off-road autonomous driving on deformable terrains. Addressing the difficulty of modeling unknown tire-terrain interaction and ensuring real-time control feasibility and performance, this framework integrates deep reinforcement learning with a model predictive controller to manage unmodeled nonlinear dynamics. We evaluate the controller framework over constant and varying velocity profiles using high-fidelity simulator Project Chrono. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers over three unknown terrains that represent sandy deformable track, sandy and rocky track and cohesive clay-like deformable soil track. Despite varied and previously unseen terrain characteristics, this framework generalized well enough to track longitudinal reference speeds with the least error. Furthermore, this framework required significantly less training data compared to purely learning based controller, converging in fewer steps while delivering better performance. Even when under-trained, this controller outperformed the standalone controllers, highlighting its potential for safer and more efficient real-world deployment.
comment: Submitted to IEEE Transactions on Intelligent Vehicles as a Regular Paper; was withdrawn in March 2025. A revised version of this manuscript was submitted to ACC 2025 review as a regular paper in Sep 2025
ProbeMDE: Uncertainty-Guided Active Proprioception for Monocular Depth Estimation in Surgical Robotics
Monocular depth estimation (MDE) provides a useful tool for robotic perception, but its predictions are often uncertain and inaccurate in challenging environments such as surgical scenes where textureless surfaces, specular reflections, and occlusions are common. To address this, we propose ProbeMDE, a cost-aware active sensing framework that combines RGB images with sparse proprioceptive measurements for MDE. Our approach utilizes an ensemble of MDE models to predict dense depth maps conditioned on both RGB images and on a sparse set of known depth measurements obtained via proprioception, where the robot has touched the environment in a known configuration. We quantify predictive uncertainty via the ensemble's variance and measure the gradient of the uncertainty with respect to candidate measurement locations. To prevent mode collapse while selecting maximally informative locations to propriocept (touch), we leverage Stein Variational Gradient Descent (SVGD) over this gradient map. We validate our method in both simulated and physical experiments on central airway obstruction surgical phantoms. Our results demonstrate that our approach outperforms baseline methods across standard depth estimation metrics, achieving higher accuracy while minimizing the number of required proprioceptive measurements. Project page: https://brittonjordan.github.io/probe_mde/
comment: 9 pages, 5 figures. Project page: https://brittonjordan.github.io/probe_mde/
BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in current training paradigms where goal-driven data collection creates a dataset bias. In such datasets, language instructions are highly predictable from visual observations alone, causing the conditional mutual information between instructions and actions to vanish, a phenomenon we term Information Collapse. Consequently, models degenerate into vision-only policies that ignore language constraints and fail in out-of-distribution (OOD) settings. To address this, we propose BayesianVLA, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we construct a dual-branch architecture to estimate both a vision-only prior $p(a \mid v)$ and a language-conditioned posterior $π(a \mid v, \ell)$. We then optimize the policy to maximize the conditional Pointwise Mutual Information (PMI) between actions and instructions. This objective effectively penalizes the vision shortcut and rewards actions that explicitly explain the language command. Without requiring new data, BayesianVLA significantly improves generalization. Extensive experiments across on SimplerEnv and RoboCasa demonstrate substantial gains, including an 11.3% improvement on the challenging OOD SimplerEnv benchmark, validating the ability of our approach to robustly ground language in action.
Data-driven tool wear prediction in milling, based on a process-integrated single-sensor approach
Accurate tool wear prediction is essential for maintaining productivity and minimizing costs in machining. However, the complex nature of the tool wear process poses significant challenges to achieving reliable predictions. This study explores data-driven methods, in particular deep learning, for tool wear prediction. Traditional data-driven approaches often focus on a single process, relying on multi-sensor setups and extensive data generation, which limits generalization to new settings. Moreover, multi-sensor integration is often impractical in industrial environments. To address these limitations, this research investigates the transferability of predictive models using minimal training data, validated across two processes. Furthermore, it uses a simple setup with a single acceleration sensor to establish a low-cost data generation approach that facilitates the generalization of models to other processes via transfer learning. The study evaluates several machine learning models, including transformer-inspired convolutional neural networks (CNN), long short-term memory networks (LSTM), support vector machines (SVM), and decision trees, trained on different input formats such as feature vectors and short-time Fourier transform (STFT). The performance of the models is evaluated on two machines and on different amounts of training data, including scenarios with significantly reduced datasets, providing insight into their effectiveness under constrained data conditions. The results demonstrate the potential of specific models and configurations for effective tool wear prediction, contributing to the development of more adaptable and efficient predictive maintenance strategies in machining. Notably, the ConvNeXt model has an exceptional performance, achieving 99.1\% accuracy in identifying tool wear using data from only four milling tools operated until they are worn.
comment: This work is a preprint and has been submitted for possible publication,14 pages, 12 figures
Sigma: The Key for Vision-Language-Action Models toward Telepathic Alignment
To address a fundamental limitation in cognitive systems, namely the absence of a time-updatable mediating thought space between semantics and continuous control, this work constructs and trains a vision-language-action model termed Sigma, deployed on a single RTX 4090. The model is built upon the open-source pi0.5_base backbone, with the svla_so101_pickplace dataset preprocessed into a structured training corpus. An independently designed VLA architecture is introduced to integrate deep semantic understanding with associative reasoning, enabling telepathic-style alignment between perception and action. Training proceeds through iterative optimization of data preprocessing, LoRA-based fine-tuning, and inference-stage adapter design. Evaluation is conducted using offline closed-loop replay, comparing Sigma against the untuned pi0.5_base under identical data conditions. Experimental results indicate a consistent reduction in control MSE across vector-, fragment-, and trajectory-level scales, while preserving the stability of the telepathy norm and semantic-text alignment quality. These findings demonstrate that mind-responsive alignment control can be quantitatively achieved through semantic and associative architectural integration without retraining the base model, providing a reproducible pathway for semantic alignment and intention-driven behavior.
comment: The Sigma model has been open-sourced on Hugging Face. Weights, dataset, some scripts, and logs are all available. The link is: https://huggingface.co/Veltraxor/Sigma
VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning NeurIPS 2025
Coordinating multiple embodied agents in dynamic environments remains a core challenge in artificial intelligence, requiring both perception-driven reasoning and scalable cooperation strategies. While recent works have leveraged large language models (LLMs) for multi-agent planning, a few have begun to explore vision-language models (VLMs) for visual reasoning. However, these VLM-based approaches remain limited in their support for diverse embodiment types. In this work, we introduce VIKI-Bench, the first hierarchical benchmark tailored for embodied multi-agent cooperation, featuring three structured levels: agent activation, task planning, and trajectory perception. VIKI-Bench includes diverse robot embodiments, multi-view visual observations, and structured supervision signals to evaluate reasoning grounded in visual inputs. To demonstrate the utility of VIKI-Bench, we propose VIKI-R, a two-stage framework that fine-tunes a pretrained vision-language model (VLM) using Chain-of-Thought annotated demonstrations, followed by reinforcement learning under multi-level reward signals. Our extensive experiments show that VIKI-R significantly outperforms baselines method across all task levels. Furthermore, we show that reinforcement learning enables the emergence of compositional cooperation patterns among heterogeneous agents. Together, VIKI-Bench and VIKI-R offer a unified testbed and method for advancing multi-agent, visual-driven cooperation in embodied AI systems.
comment: Accepted by NeurIPS 2025 Track on Datasets and Benchmarks. Project page: https://faceong.github.io/VIKI-R/
Multi-Layered Reasoning from a Single Viewpoint for Learning See-Through Grasping
Sensory substitution enables biological systems to perceive stimuli that are typically perceived by another organ, which is inspirational for physical agents. Multimodal perception of intrinsic and extrinsic interactions is critical in building an intelligent robot that learns. This study presents a Vision-based See-Through Perception (VBSeeThruP) architecture that simultaneously perceives multiple intrinsic and extrinsic modalities from a single visual input, in a markerless manner, all packed into a soft robotic finger using the Soft Polyhedral Network design. It is generally applicable to miniature vision systems placed beneath deformable networks with a see-through design, capturing real-time images of the network's physical interactions induced by contact-based events, overlaid on the visual scene of the external environment, as demonstrated in the ablation study. We present the VBSeeThruP's capability for learning reactive grasping without using external cameras or dedicated force and torque sensors on the fingertips. Using the inpainted scene and the deformation mask, we further demonstrate the multimodal performance of the VBSeeThruP architecture to simultaneously achieve various perceptions, including but not limited to scene inpainting, object detection, depth sensing, scene segmentation, masked deformation tracking, 6D force/torque sensing, and contact event detection, all within a single sensory input from the in-finger vision markerlessly.
comment: 39 pages, 13 figures, 2 tables, for supplementary videos, see https://bionicdl.ancorasir.com/?p=1658, for opensourced codes, see https://github.com/ancorasir/SeeThruFinger
Towards Natural Language Environment: Understanding Seamless Natural-Language-Based Human-Multi-Robot Interactions
As multiple robots are expected to coexist in future households, natural language is increasingly envisioned as a primary medium for human-robot and robot-robot communication. This paper introduces the concept of a Natural Language Environment (NLE), defined as an interaction space in which humans and multiple heterogeneous robots coordinate primarily through natural language. Rather than proposing a deployable system, this work aims to explore the design space of such environments. We first synthesize prior work on language-based human-robot interaction to derive a preliminary design space for NLEs. We then conduct a role-playing study in virtual reality to investigate how people conceptualize, negotiate, and coordinate human-multi-robot interactions within this imagined environment. Based on qualitative and quantitative analysis, we refine the preliminary design space and derive design implications that highlight key tensions and opportunities around task coordination dominance, robot autonomy, and robot personality in Natural Language Environments.
Digital twins for the design, interactive control, and deployment of modular, fiber-reinforced soft continuum arms
Soft continuum arms (SCAs) promise versatile manipulation through mechanical compliance, for assistive devices, agriculture, search applications, or surgery. However, the strong nonlinear coupling between materials, morphology, and actuation renders design and control challenging, hindering real-world deployment. In this context, a modular fabrication strategy paired with reliable, interactive simulations would be highly beneficial, streamlining prototyping and control design. Here, we present a digital twin framework for modular SCAs realized using pneumatic Fiber-Reinforced Elastomeric Enclosures (FREEs). The approach models assemblies of FREE actuators through networks of Cosserat rods, favoring the accurate simulation of three-dimensional arm reconfigurations, while explicitly preserving internal modular architecture. This enables the quantitative analysis and scalable development of composite soft robot arms, overcoming limitations of current monolithic continuum models. To validate the framework, we introduce a three-dimensional reconstruction pipeline tailored to soft, slender, small-volume, and highly deformable structures, allowing reliable recovery of arm kinematics and strain distributions. Experimental results across multiple configurations and actuation regimes demonstrate close agreement with simulations. Finally, we embed the digital twins in a virtual environment to allow interactive control design and sim-to-real deployment, establishing a foundation for principled co-design and remote operation of modular soft continuum manipulators.
comment: 8 pages, 4 figures This work has been submitted for possible publication
Multiagent Systems
Average Unfairness in Routing Games AAMAS 2026
We propose average unfairness as a new measure of fairness in routing games, defined as the ratio between the average latency and the minimum latency experienced by users. This measure is a natural complement to two existing unfairness notions: loaded unfairness, which compares maximum and minimum latencies of routes with positive flow, and user equilibrium (UE) unfairness, which compares maximum latency with the latency of a Nash equilibrium. We show that the worst-case values of all three unfairness measures coincide and are characterized by a steepness parameter intrinsic to the latency function class. We show that average unfairness is always no greater than loaded unfairness, and the two measures are equal only when the flow is fully fair. Besides that, we offer a complete comparison of the three unfairness measures, which, to the best of our knowledge, is the first theoretical analysis in this direction. Finally, we study the constrained system optimum (CSO) problem, where one seeks to minimize total latency subject to an upper bound on unfairness. We prove that, for the same tolerance level, the optimal flow under an average unfairness constraint achieves lower total latency than any flow satisfying a loaded unfairness constraint. We show that such improvement is always strict in parallel-link networks and establish sufficient conditions for general networks. We further illustrate the latter with numerical examples. Our results provide theoretical guarantees and valuable insights for evaluating fairness-efficiency tradeoffs in network routing.
comment: 14 pages, 5 figures, 1 table. Accepted for publication at the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals AAMAS
Clustering is a fundamental problem, aiming to partition a set of elements, like agents or data points, into clusters such that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays, where elements, that arrive one at a time as points in a finite metric space, should be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to irrevocably assign it to an existing cluster or create a new one containing, at this moment, only this point. However, we allow decisions to be postponed at a delay cost, instead of following the more common assumption of immediate decisions upon arrival. This poses a critical challenge: the goal is to minimize both the total distance costs between points in each cluster and the overall delay costs incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, where points' locations are drawn independently across time from an unknown and fixed probability distribution over the finite metric space. We offer hope for beyond worst-case adversaries: we devise an algorithm that is constant competitive in the sense that, as the number of points grows, the ratio between the expected overall costs of the output clustering and an optimal offline clustering is bounded by a constant.
comment: To Appear in the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2026
ALIGNAgent: Adaptive Learner Intelligence for Gap Identification and Next-step guidance
Personalized learning systems have emerged as a promising approach to enhance student outcomes by tailoring educational content, pacing, and feedback to individual needs. However, most existing systems remain fragmented, specializing in either knowledge tracing, diagnostic modeling, or resource recommendation, but rarely integrating these components into a cohesive adaptive cycle. In this paper, we propose ALIGNAgent (Adaptive Learner Intelligence for Gap Identification and Next-step guidance), a multi-agent educational framework designed to deliver personalized learning through integrated knowledge estimation, skill-gap identification, and targeted resource recommendation.ALIGNAgent begins by processing student quiz performance, gradebook data, and learner preferences to generate topic-level proficiency estimates using a Skill Gap Agent that employs concept-level diagnostic reasoning to identify specific misconceptions and knowledge deficiencies. After identifying skill gaps, the Recommender Agent retrieves preference-aware learning materials aligned with diagnosed deficiencies, implementing a continuous feedback loop where interventions occur before advancing to subsequent topics. Extensive empirical evaluation on authentic datasets from two undergraduate computer science courses demonstrates ALIGNAgent's effectiveness, with GPT-4o-based agents achieving precision of 0.87-0.90 and F1 scores of 0.84-0.87 in knowledge proficiency estimation validated against actual exam performance.
comment: 35 pages
DMAVA: Distributed Multi-Autonomous Vehicle Architecture Using Autoware
Simulating and validating coordination among multiple autonomous vehicles (AVs) is a challenging task as most existing simulation architectures are limited to single-vehicle operation or rely on centralized control. This paper presents a Distributed Multi-AV Architecture (DMAVA) that enables synchronized, real-time autonomous driving simulation across multiple physical hosts. Each vehicle runs its own complete AV stack and operates independently from other AVs. The vehicles in the simulation maintain synchronized coordination through a low-latency data-centric communication layer. The proposed system integrates ROS 2 Humble, Autoware Universe, AWSIM Labs, and Zenoh to support concurrent execution of multiple Autoware stacks within a shared Unity-based environment. Experiments conducted on multiple-host configurations demonstrate stable localization, reliable inter-host communication, and fully synchronized closed-loop control. The DMAVA also serves as a foundation for Multi-Vehicle Autonomous Valet Parking, demonstrating its extensibility toward higher-level cooperative autonomy. Demo videos and source code are available at: https://github.com/zubxxr/distributed-multi-autonomous-vehicle-architecture.
comment: 9 pages, 4 figures, 5 tables, Submitted to IEEE IV 2026, Demo videos and source code available
DMV-AVP: Distributed Multi-Vehicle Autonomous Valet Parking using Autoware
This paper presents the DMV-AVP System, a distributed simulation of Multi-Vehicle Autonomous Valet Parking (AVP). The system was implemented as an application of the Distributed Multi-Vehicle Architecture (DMAVA) for synchronized multi-host execution. Most existing simulation approaches rely on centralized or non-distributed designs that constrain scalability and limit fully autonomous control. This work introduces two modules built on top of the DMAVA: 1) a Multi-Vehicle AVP Node that performs state-based coordination, queuing, and reservation management across multiple vehicles, and 2) a Unity-Integrated YOLOv5 Parking Spot Detection Module that provides real-time, vision-based perception within AWSIM Labs. Both modules integrate seamlessly with the DMAVA and extend it specifically for multi-vehicle AVP operation, supported by a Zenoh-based communication layer that ensures low-latency topic synchronization and coordinated behavior across hosts. Experiments conducted on two- and three-host configurations demonstrate deterministic coordination, conflict-free parking behavior, and scalable performance across distributed Autoware instances. The results confirm that the proposed Distributed Multi-Vehicle AVP System supports cooperative AVP simulation and establishes a foundation for future real-world and hardware-in-the-loop validation. Demo videos and source code are available at https://github.com/zubxxr/multi-vehicle-avp
comment: 7 pages, 5 figures, 1 table. Demo videos and source code available
AMBER: A Columnar Architecture for High-Performance Agent-Based Modeling in Python
Agent-based modeling (ABM) has emerged as an indispensable methodology for studying complex adaptive systems across the natural and social sciences. However, Python-based ABM frameworks face a fundamental tension between the accessibility that has made Python dominant in scientific computing and the performance requirements of large-scale simulations. This paper introduces AMBER, a framework that resolves this tension through a novel architectural approach: replacing the conventional object-per-agent representation with columnar state management using the Polars DataFrame library. We analyze the computational characteristics of both paradigms, present the architectural design of AMBER including its core abstractions, spatial environments, experiment management, and optimization capabilities. Empirical evaluation on three canonical benchmarks demonstrates that AMBER achieves speedups of 1.2x to 93x depending on workload characteristics, with the greatest advantages for models dominated by population-wide attribute operations. Memory profiling reveals 30-50% reduction in peak usage compared to object-oriented frameworks. Our results establish columnar state management as a viable architectural foundation for high-performance ABM in interpreted languages.
SemanticALLI: Caching Reasoning, Not Just Responses, in Agentic Systems
Agentic AI pipelines suffer from a hidden inefficiency: they frequently reconstruct identical intermediate logic, such as metric normalization or chart scaffolding, even when the user's natural language phrasing is entirely novel. Conventional boundary caching fails to capture this inefficiency because it treats inference as a monolithic black box. We introduce SemanticALLI, a pipeline-aware architecture within Alli (PMG's marketing intelligence platform), designed to operationalize redundant reasoning. By decomposing generation into Analytic Intent Resolution (AIR) and Visualization Synthesis (VS), SemanticALLI elevates structured intermediate representations (IRs) to first-class, cacheable artifacts. The impact of caching within the agentic loop is substantial. In our evaluation, baseline monolithic caching caps at a 38.7% hit rate due to linguistic variance. In contrast, our structured approach allows for an additional stage, the Visualization Synthesis stage, to achieve an 83.10% hit rate, bypassing 4,023 LLM calls with a median latency of just 2.66 ms. This internal reuse reduces total token consumption, offering a practical lesson for AI system design: even when users rarely repeat themselves, the pipeline often does, at stable, structured checkpoints where caching is most reliable.
GameTalk: Training LLMs for Strategic Conversation
Strategic decision-making in multi-agent settings is a key challenge for large language models (LLMs), particularly when coordination and negotiation must unfold over extended conversations. While recent work has explored the use of LLMs in isolated decision tasks, little attention has been given to optimizing long-term objectives through dialogue. We introduce \textbf{GameTalk}, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We achieve this by adapting fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals that depend on the entire interaction. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling. Our results show that GameTalk significantly outperforms untrained models, especially under reward shaping, with DPO consistently yielding the strongest gains. These findings position conversational fine-tuning as a promising path for LLMs to reason, negotiate, and act in interactive environments.
comment: 32 pages, 8 figures
ThinkTank-ME: A Multi-Expert Framework for Middle East Event Forecasting
Event forecasting is inherently influenced by multifaceted considerations, including international relations, regional historical dynamics, and cultural contexts. However, existing LLM-based approaches employ single-model architectures that generate predictions along a singular explicit trajectory, constraining their ability to capture diverse geopolitical nuances across complex regional contexts. To address this limitation, we introduce ThinkTank-ME, a novel Think Tank framework for Middle East event forecasting that emulates collaborative expert analysis in real-world strategic decision-making. To facilitate expert specialization and rigorous evaluation, we construct POLECAT-FOR-ME, a Middle East-focused event forecasting benchmark. Experimental results demonstrate the superiority of multi-expert collaboration in handling complex temporal geopolitical forecasting tasks. The code is available at https://github.com/LuminosityX/ThinkTank-ME.
TDFlow: Agentic Workflows for Test Driven Development EACL 2026
We introduce TDFlow, a novel test-driven agentic workflow that frames repository-scale software engineering as a test-resolution task, specifically designed to solve human-written tests. Given a set of tests, TDFlow repeatedly proposes, revises, and debugs repository-scale patches using precisely engineered sub-agents and tightly constrained tools. The workflow decomposes software engineering program repair into four components governed by respective sub-agents. This simple, forced decoupling of patch proposing, debugging, patch revision, and optional test generation (1) reduces long-context burden on any individual sub-agent, (2) focuses each sub-agent on specific, pre-defined sub-tasks, and (3) allows for specialized performance improvement on specific sub-tasks. When provided human-written tests, TDFlow attains 88.8% pass rate on SWE-Bench Lite (an absolute improvement of 27.8% over the next best system) and 94.3% on SWE-Bench Verified. Manual inspection of the 800 TDFlow runs within SWE-Bench Lite and Verified uncover only 7 instances of test hacking, which were subsequently counted as failures. Furthermore, we show that the primary obstacle to human-level software engineering performance lies within writing successful reproduction tests. We envision a human-LLM interactive system powered by TDFlow where human developers write tests solved by LLM systems. Together, these results indicate that modern LLMs, when embedded in a narrowly engineered, test-driven workflow, already achieve human-level test resolution -- with the final frontier for fully autonomous repository repair being the accurate generation of valid reproduction tests.
comment: Published in the 19th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2026 Main Conference)
TruthTensor: Evaluating LLMs through Human Imitation on Prediction Market under Drift and Holistic Reasoning
Evaluating language models and AI agents remains fundamentally challenging because static benchmarks fail to capture real-world uncertainty, distribution shift, and the gap between isolated task accuracy and human-aligned decision-making under evolving conditions. This paper introduces TruthTensor, a novel, reproducible evaluation paradigm that measures reasoning models not only as prediction engines but as human-imitation systems operating in socially-grounded, high-entropy environments. Building on forward-looking, contamination-free tasks, our framework anchors evaluation to live prediction markets and combines probabilistic scoring to provide a holistic view of model behavior. TruthTensor complements traditional correctness metrics with drift-centric diagnostics and explicit robustness checks for reproducibility. It specify human vs. automated evaluation roles, annotation protocols, and statistical testing procedures to ensure interpretability and replicability of results. In experiments across 500+ real markets (political, economic, cultural, technological), TruthTensor demonstrates that models with similar forecast accuracy can diverge markedly in calibration, drift, and risk-sensitivity, underscoring the need to evaluate models along multiple axes (accuracy, calibration, narrative stability, cost, and resource efficiency). TruthTensor therefore operationalizes modern evaluation best practices, clear hypothesis framing, careful metric selection, transparent compute/cost reporting, human-in-the-loop validation, and open, versioned evaluation contracts, to produce defensible assessments of LLMs in real-world decision contexts. We publicly release TruthTensor at https://truthtensor.com.
comment: 16 pages, 6 figures, 2 tables
DISPATCH -- Decentralized Informed Spatial Planning and Assignment of Tasks for Cooperative Heterogeneous Agents
Spatial task allocation in systems such as multi-robot delivery or ride-sharing requires balancing efficiency with fair service across tasks. Greedy assignment policies that match each agent to its highest-preference or lowest-cost task can maximize efficiency but often create inequities: some tasks receive disproportionately favorable service (e.g., shorter delays or better matches), while others face long waits or poor allocations. We study fairness in heterogeneous multi-agent systems where tasks vary in preference alignment and urgency. Most existing approaches either assume centralized coordination or largely ignore fairness under partial observability. Distinct from this prior work, we establish a connection between the Eisenberg-Gale (EG) equilibrium convex program and decentralized, partially observable multi-agent learning. Building on this connection, we develop two equilibrium-informed algorithms that integrate fairness and efficiency: (i) a multi-agent reinforcement learning (MARL) framework, EG-MARL, whose training is guided by a centralized EG equilibrium assignment algorithm; and (ii) a stochastic online optimization mechanism that performs guided exploration and subset-based fair assignment as tasks are discovered. We evaluate on Multi-Agent Particle Environment (MPE) simulations across varying team sizes against centralized EG, Hungarian, and Min-Max distance baselines, and also present a Webots-based warehouse proof-of-concept with heterogeneous robots. Both methods preserve the fairness-efficiency balance of the EG solution under partial observability, with EG-MARL achieving near-centralized coordination and reduced travel distances, and the online mechanism enabling real-time allocation with competitive fairness.
VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing
Analog mixed-signal circuit sizing involves complex trade-offs within high-dimensional design spaces. Existing automatic analog circuit sizing approaches rely solely on netlists, ignoring the circuit schematic, which hinders the cognitive link between the schematic and its performance. Furthermore, the black-box nature of machine learning methods and hallucination risks in large language models fail to provide the necessary ground-truth explainability required for industrial sign-off. To address these challenges, we propose a Vision Language Model-optimized collaborative agent design workflow (VLM-CAD), which analyzes circuits, optimizes DC operating points, performs inference-based sizing, and executes external sizing optimization. We integrate Image2Net to annotate circuit schematics and generate a structured JSON description for precise interpretation by Vision Language Models. Furthermore, we propose an Explainable Trust Region Bayesian Optimization method (ExTuRBO) that employs collaborative warm-start from agent-generated seeds and offers dual-granularity sensitivity analysis for external sizing optimization, supporting a comprehensive final design report. Experiment results on amplifier sizing tasks using 180nm, 90nm, and 45nm Predictive Technology Models demonstrate that VLM-CAD effectively balances power and performance while maintaining physics-based explainability. VLM-CAD meets all specification requirements while maintaining low power consumption in optimizing an amplifier with a complementary input and a class-AB output stage, with a total runtime under 66 minutes across all experiments on two amplifiers.
comment: 9 pages, 4 figures, submitted to the 10th International Conference on Control, Automation and Diagnosis (ICCAD'26)
Systems and Control (EESS)
Stochastic Control Barrier Functions under State Estimation: From Euclidean Space to Lie Groups
Ensuring safety for autonomous systems under uncertainty remains challenging, particularly when safety of the true state is required despite the true state not being fully known. Control barrier functions (CBFs) have become widely adopted as safety filters. However, standard CBF formulations do not explicitly account for state estimation uncertainty and its propagation, especially for stochastic systems evolving on manifolds. In this paper, we propose a safety-critical control framework with a provable bound on the finite-time safety probability for stochastic systems under noisy state information. The proposed framework explicitly incorporates the uncertainty arising from both process and measurement noise, and synthesizes controllers that adapt to the level of uncertainty. The framework admits closed-form solutions in linear settings, and experimental results demonstrate its effectiveness on systems whose state spaces range from Euclidean space to Lie groups.
Average Unfairness in Routing Games AAMAS 2026
We propose average unfairness as a new measure of fairness in routing games, defined as the ratio between the average latency and the minimum latency experienced by users. This measure is a natural complement to two existing unfairness notions: loaded unfairness, which compares maximum and minimum latencies of routes with positive flow, and user equilibrium (UE) unfairness, which compares maximum latency with the latency of a Nash equilibrium. We show that the worst-case values of all three unfairness measures coincide and are characterized by a steepness parameter intrinsic to the latency function class. We show that average unfairness is always no greater than loaded unfairness, and the two measures are equal only when the flow is fully fair. Besides that, we offer a complete comparison of the three unfairness measures, which, to the best of our knowledge, is the first theoretical analysis in this direction. Finally, we study the constrained system optimum (CSO) problem, where one seeks to minimize total latency subject to an upper bound on unfairness. We prove that, for the same tolerance level, the optimal flow under an average unfairness constraint achieves lower total latency than any flow satisfying a loaded unfairness constraint. We show that such improvement is always strict in parallel-link networks and establish sufficient conditions for general networks. We further illustrate the latter with numerical examples. Our results provide theoretical guarantees and valuable insights for evaluating fairness-efficiency tradeoffs in network routing.
comment: 14 pages, 5 figures, 1 table. Accepted for publication at the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
Interconnection-based Model Reduction for Linear Hybrid Systems
In this paper, we address the model reduction problem for linear hybrid systems via the interconnection-based technique called moment matching. We consider two classical interconnections, namely the direct and swapped interconnections, in the hybrid setting, and we present families of reduced-order models for each interconnection via a hybrid characterisation of the steady-state responses. By combining the results for each interconnection, the design of a reduced-order model that achieves moment matching simultaneously for both interconnections is studied. In addition, we show that the presented results have simplified counterparts when the jumps of the hybrid system are periodic. A numerical simulation is finally given to illustrate the results.
comment: 17 pages
Improve the autonomy of the SE2(3) group based Extended Kalman Filter for Integrated Navigation: Application
One of the core advantages of SE2(3) Lie group framework for navigation modeling lies in the autonomy of error propagation. In the previous paper, the theoretical analysis of autonomy property of navigation model in inertial, earth and world frames was given. A construction method for SE2(3) group navigation model is proposed to improve the non-inertial navigation model toward full autonomy. This paper serves as a counterpart to previous paper and conducts the real-world strapdown inertial navigation system (SINS)/odometer(ODO) experiments as well as Monte-Carlo simulations to demonstrate the performance of improved SE2(3) group based high-precision navigation models.
Improve the autonomy of the SE2(3) group based Extended Kalman Filter for Integrated Navigation: Theoretical Analysis
One of core advantages of the SE2(3) Lie group framework for navigation modeling lies in the autonomy of error propagation. Current research on Lie group based extended Kalman filters has demonstrated that error propagation autonomy holds in low-precision applications, such as in micro electromechanical system (MEMS) based integrated navigation without considering earth rotation and inertial device biases. However, in high-precision navigation state estimation, maintaining autonomy is extremely difficult when considering with earth rotation and inertial device biases. This paper presents the theoretical analysis on the autonomy of SE2(3) group based high-precision navigation models under inertial, earth and world frame respectively. Through theoretical analysis, we find that the limitation of the traditional, trivial SE2(3) group navigation modeling method is that the presence of Coriolis force terms introduced by velocity in non-inertial frame. Therefore, a construction method for SE2(3) group navigation models is proposed, which brings the navigation models closer to full autonomy.
Dynamic Tactile Sensing System and Soft Actor Critic Reinforcement Learning for Inclusion Characterization
This paper presents the Dynamic Tactile Sensing System that utilizes robotic tactile sensing in conjunction with reinforcement learning to locate and characterize embedded inclusions. A dual arm robot is integrated with an optical Tactile Imaging Sensor that utilizes the Soft Actor Critic Algorithm to acquire tactile data based on a pixel intensity reward. A Dynamic Interrogation procedure for tactile exploration is developed that enables the robot to first localize inclusion and refine their positions for precise imaging. Experimental validation conducted on Polydimethylsiloxane phantoms demonstrates that the robot using the Tactile Soft Actor Critic Model was able to achieve size estimation errors of 2.61% and 5.29% for soft and hard inclusions compared to 7.84% and 6.87% for expert human operators. Results also show that Dynamic Tactile Sensing System was able to locate embedded inclusions and autonomously determine their mechanical properties, useful in applications such as breast tumor characterization.
Stability Analysis of Power-Electronics-Dominated Grids Using Scaled Relative Graphs
This paper presents a novel approach to stability analysis for grid-connected converters utilizing Scaled Relative Graphs (SRG). Our method effectively decouples grid and converter dynamics, thereby establishing a comprehensive and efficient framework for evaluating closed-loop stability. Our analysis accommodates both linear and non-linear loads, enhancing its practical applicability. Furthermore, we demonstrate that our stability assessment remains unaffected by angular variations resulting from dq-frame transformations, significantly increasing the method's robustness and versatility. The effectiveness of our approach is validated in several simulation case studies, which illustrate its broad applicability in modern power systems.
comment: Submitted to possible publication
Time-Optimal Switching Surfaces for Triple Integrator under Full Box Constraints
Time-optimal control for triple integrator under full box constraints is a fundamental problem in the field of optimal control, which has been widely applied in the industry. However, scenarios involving asymmetric constraints, non-stationary boundary conditions, and active position constraints pose significant challenges. This paper provides a complete characterization of time-optimal switching surfaces for the problem, leading to novel insights into the geometric and algebraic structure of the optimal control. The active condition of position constraints is derived, which is absent from the literature. An efficient algorithm is proposed, capable of planning time-optimal trajectories under asymmetric full constraints and arbitrary boundary states, with a 100% success rate. Computational time for each trajectory is within approximately 10μs, achieving a 5-order-of-magnitude reduction compared to optimization-based baselines.
comment: Accepted by ACC 2026
Virtual Traffic Police: Large Language Model-Augmented Traffic Signal Control for Unforeseen Incidents
Adaptive traffic signal control (TSC) has demonstrated strong effectiveness in managing dynamic traffic flows. However, conventional methods often struggle when unforeseen traffic incidents occur (e.g., accidents and road maintenance), which typically require labor-intensive and inefficient manual interventions by traffic police officers. Large Language Models (LLMs) appear to be a promising solution thanks to their remarkable reasoning and generalization capabilities. Nevertheless, existing works often propose to replace existing TSC systems with LLM-based systems, which can be (i) unreliable due to the inherent hallucinations of LLMs and (ii) costly due to the need for system replacement. To address the issues of existing works, we propose a hierarchical framework that augments existing TSC systems with LLMs, whereby a virtual traffic police agent at the upper level dynamically fine-tunes selected parameters of signal controllers at the lower level in response to real-time traffic incidents. To enhance domain-specific reliability in response to unforeseen traffic incidents, we devise a self-refined traffic language retrieval system (TLRS), whereby retrieval-augmented generation is employed to draw knowledge from a tailored traffic language database that encompasses traffic conditions and controller operation principles. Moreover, we devise an LLM-based verifier to update the TLRS continuously over the reasoning process. Our results show that LLMs can serve as trustworthy virtual traffic police officers that can adapt conventional TSC methods to unforeseen traffic incidents with significantly improved operational efficiency and reliability.
DualShield: Safe Model Predictive Diffusion via Reachability Analysis for Interactive Autonomous Driving
Diffusion models have emerged as a powerful approach for multimodal motion planning in autonomous driving. However, their practical deployment is typically hindered by the inherent difficulty in enforcing vehicle dynamics and a critical reliance on accurate predictions of other agents, making them prone to safety issues under uncertain interactions. To address these limitations, we introduce DualShield, a planning and control framework that leverages Hamilton-Jacobi (HJ) reachability value functions in a dual capacity. First, the value functions act as proactive guidance, steering the diffusion denoising process towards safe and dynamically feasible regions. Second, they form a reactive safety shield using control barrier-value functions (CBVFs) to modify the executed actions and ensure safety. This dual mechanism preserves the rich exploration capabilities of diffusion models while providing principled safety assurance under uncertain and even adversarial interactions. Simulations in challenging unprotected U-turn scenarios demonstrate that DualShield significantly improves both safety and task efficiency compared to leading methods from different planning paradigms under uncertainty.
comment: 8 pages, 5 figures
Bridging Qualitative Rubrics and AI: A Binary Question Framework for Criterion-Referenced Grading in Engineering
PURPOSE OR GOAL: This study investigates how GenAI can be integrated with a criterion-referenced grading framework to improve the efficiency and quality of grading for mathematical assessments in engineering. It specifically explores the challenges demonstrators face with manual, model solution-based grading and how a GenAI-supported system can be designed to reliably identify student errors, provide high-quality feedback, and support human graders. The research also examines human graders' perceptions of the effectiveness of this GenAI-assisted approach. ACTUAL OR ANTICIPATED OUTCOMES: The study found that GenAI achieved an overall grading accuracy of 92.5%, comparable to two experienced human graders. The two researchers, who also served as subject demonstrators, perceived the GenAI as a helpful second reviewer that improved accuracy by catching small errors and provided more complete feedback than they could manually. A central outcome was the significant enhancement of formative feedback. However, they noted the GenAI tool is not yet reliable enough for autonomous use, especially with unconventional solutions. CONCLUSIONS/RECOMMENDATIONS/SUMMARY: This study demonstrates that GenAI, when paired with a structured, criterion-referenced framework using binary questions, can grade engineering mathematical assessments with an accuracy comparable to human experts. Its primary contribution is a novel methodological approach that embeds the generation of high-quality, scalable formative feedback directly into the assessment workflow. Future work should investigate student perceptions of GenAI grading and feedback.
comment: Proceedings of the 36th Annual Conference of the Australasian Association for Engineering Education (AAEE 2025)
Design, Modelling, and Control of Magnetic Ball Suspension System
This paper presents the modeling, control design, and performance analysis of a Magnetic Ball Suspension System (MBSS), a nonlinear and inherently unstable electromechanical system used in various precision applications. The system's primary objective is to levitate a steel ball using electromagnetic force without physical contact, thereby eliminating frictional losses. A comprehensive state-space model was developed, capturing both the mechanical and electrical dynamics. The equilibrium points of the system were determined through feedback linearization using the Jacobian matrix. To ensure system stability, controllability and observability analyses were conducted, confirming that state feedback and observer-based control strategies could be effectively implemented. Three distinct control methods were explored: pole placement-based state feedback control, full-order observer design, and optimal state feedback control using the Linear Quadratic Regulator (LQR). Each control strategy was validated through Simulink simulations for both linearized and nonlinear models. Simulation results demonstrated that the linearized system consistently achieved desired performance with minimal oscillations, whereas the nonlinear system exhibited significant transient oscillations before stabilization. The full-order observer enhanced estimation accuracy, enabling effective control where direct state measurement was impractical. The LQR-based control offered improved robustness and minimized control effort, though its performance was comparable to standard state feedback in some cases.
comment: 8 pages
Game-to-Real Gap: Quantifying the Effect of Model Misspecification in Network Games
Game-theoretic models and solution concepts provide rigorous tools for predicting collective behavior in multi-agent systems. In practice, however, different agents may rely on different game-theoretic models to design their strategies. As a result, when these heterogeneous models interact, the realized outcome can deviate substantially from the outcome each agent expects based on its own local model. In this work, we introduce the game-to-real gap, a new metric that quantifies the impact of such model misspecification in multi-agent environments. The game-to-real gap is defined as the difference between the utility an agent actually obtains in the multi-agent environment (where other agents may have misspecified models) and the utility it expects under its own game model. Focusing on quadratic network games, we show that misspecifications in either (i) the external shock or (ii) the player interaction network can lead to arbitrarily large game-to-real gaps. We further develop novel network centrality measures that allow exact evaluation of this gap in quadratic network games. Our analysis reveals that standard network centrality measures fail to capture the effects of model misspecification, underscoring the need for new structural metrics that account for this limitation. Finally, through illustrative numerical experiments, we show that existing centrality measures in network games may provide a counterintuitive understanding of the impact of model misspecification.
Multi-User Content Diversity in Wireless Networks
Immersive applications such as eXtended Reality (XR), cloud gaming, and real-time video streaming are central to the vision of 6G networks. These applications require not only low latency and high data rates, but also consistent and high-quality User Experience (UX). Traditional rate allocation and congestion control mechanisms in wireless networks treat users uniformly based on channel conditions, rely only on network-centric Key Performance Indicators (KPIs), and ignore the content diversity, which can lead to inefficient resource utilization and degraded UX. In this paper, we introduce the concept of Multi-User Content Diversity, which recognizes that different users concurrently consume media with varying complexity, and therefore have different bitrate requirements to achieve satisfactory UX. We propose multiple different frameworks that exploit multi-user content diversity and lead to overall network-wide gains in terms of UX. For each framework, we demonstrate the required information exchange between Application Servers (ASs), Application Clients (ACs), and the network, and the algorithms that run in each of these components to optimize a network-wide UXbased objective. Simulation results demonstrate that exploiting multi-user content diversity leads to significant gains in UX capacity, UX fairness, and network utilization, when compared to conventional rate control methods. These findings highlight the potential of content-aware networking as a key enabler for emerging wireless systems.
comment: arXiv admin note: text overlap with arXiv:2505.04114
Optimal County-Level Siting of Data Centers in the United States
Data centers are growing rapidly, creating the pressing need for the development of critical infrastructure build out to support these resource-intensive large loads. Their immense consumption of electricity and, often, freshwater, continues to stress an already constrained and aging power grid and water resources. This paper presents a comprehensive modeling approach to determine the optimal locations to construct such facilities by quantifying their resource use and minimizing associated costs. The interdisciplinary modeling approach incorporates a number of factors including the power grid, telecommunications, climate, water use, and collocated generation potential. This work establishes the base model whose functionality is shown through several test cases focusing on carbon-free generation collocation on a county-level in the United States. The results suggest that while capital costs are the biggest driver, having a longer future outlook and allowing more variable generation collocation influences the model to choose sites with higher renewable potential.
AMBER: A Columnar Architecture for High-Performance Agent-Based Modeling in Python
Agent-based modeling (ABM) has emerged as an indispensable methodology for studying complex adaptive systems across the natural and social sciences. However, Python-based ABM frameworks face a fundamental tension between the accessibility that has made Python dominant in scientific computing and the performance requirements of large-scale simulations. This paper introduces AMBER, a framework that resolves this tension through a novel architectural approach: replacing the conventional object-per-agent representation with columnar state management using the Polars DataFrame library. We analyze the computational characteristics of both paradigms, present the architectural design of AMBER including its core abstractions, spatial environments, experiment management, and optimization capabilities. Empirical evaluation on three canonical benchmarks demonstrates that AMBER achieves speedups of 1.2x to 93x depending on workload characteristics, with the greatest advantages for models dominated by population-wide attribute operations. Memory profiling reveals 30-50% reduction in peak usage compared to object-oriented frameworks. Our results establish columnar state management as a viable architectural foundation for high-performance ABM in interpreted languages.
Maximizing Reach-Avoid Probabilities for Linear Stochastic Systems via Control Architectures
The maximization of reach-avoid probabilities for stochastic systems is a central topic in the control literature. Yet, the available methods are either restricted to low-dimensional systems or suffer from conservative approximations. To address these limitations, we propose control architectures that combine the flexibility of Markov Decision Processes with the scalability of Model Predictive Controllers. The Model Predictive Controller tracks reference signals while remaining agnostic to the stochasticity and reach-avoid objective. Instead, the reach-avoid probability is maximized by optimally updating the controller's reference online. To achieve this, the closed-loop system, consisting of the system and Model Predictive Controller, is abstracted as a Markov Decision Process in which a new reference can be chosen at every time-step. A feedback policy generating optimal references is then computed via Dynamic Programming. If the state space of the system is continuous, the Dynamic Programming algorithm must be executed on a finite system approximation. Modifications to the Model Predictive Controller enable a computationally efficient robustification of the Dynamic Programming algorithm to approximation errors, preserving bounds on the achieved reach-avoid probability. The approach is validated on a perturbed 12D quadcopter model in cluttered reach-avoid environments proving its flexibility and scalability.
BESTOpt: A Modular, Physics-Informed Machine Learning based Building Modeling, Control and Optimization Framework
Modern buildings are increasingly interconnected with occupancy, heating, ventilation, and air-conditioning (HVAC) systems, distributed energy resources (DERs), and power grids. Modeling, control, and optimization of such multi-domain systems play a critical role in achieving building-sector decarbonization. However, most existing tools lack scalability and physical consistency for addressing these complex, multi-scale ecosystem problems. To bridge this gap, this study presents BESTOpt, a modular, physics-informed machine learning (PIML) framework that unifies building applications, including benchmarking, evaluation, diagnostics, control, optimization, and performance simulation. The framework adopts a cluster-domain-system/building-component hierarchy and a standardized state-action-disturbance-observation data typology. By embedding physics priors into data-driven modules, BESTOpt improves model accuracy and physical consistency under unseen conditions. Case studies on single-building and cluster scenarios demonstrate its capability for multi-level centralized and decentralized control. Looking ahead, BESTOpt lays the foundation for an open, extensible platform that accelerates interdisciplinary research toward smart, resilient, and decarbonized building ecosystems.
Reinforcement Learning Compensated Model Predictive Control for Off-road Driving on Unknown Deformable Terrain
This study presents an Actor-Critic reinforcement learning Compensated Model Predictive Controller (AC2MPC) designed for high-speed, off-road autonomous driving on deformable terrains. Addressing the difficulty of modeling unknown tire-terrain interaction and ensuring real-time control feasibility and performance, this framework integrates deep reinforcement learning with a model predictive controller to manage unmodeled nonlinear dynamics. We evaluate the controller framework over constant and varying velocity profiles using high-fidelity simulator Project Chrono. Our findings demonstrate that our controller statistically outperforms standalone model-based and learning-based controllers over three unknown terrains that represent sandy deformable track, sandy and rocky track and cohesive clay-like deformable soil track. Despite varied and previously unseen terrain characteristics, this framework generalized well enough to track longitudinal reference speeds with the least error. Furthermore, this framework required significantly less training data compared to purely learning based controller, converging in fewer steps while delivering better performance. Even when under-trained, this controller outperformed the standalone controllers, highlighting its potential for safer and more efficient real-world deployment.
comment: Submitted to IEEE Transactions on Intelligent Vehicles as a Regular Paper; was withdrawn in March 2025. A revised version of this manuscript was submitted to ACC 2025 review as a regular paper in Sep 2025
Integrated Sensing and Communication for Low-Altitude Security
The dense concentration of low-altitude, slow-speed, and small-size targets in the complex low-altitude environment poses significant security challenges, including failures in continuous wide-area sensing and ambiguous target intent, which existing regulatory frameworks struggle to address. Integrated sensing and communication (ISAC), a hallmark of next-generation mobile communication, offers a transformative approach to low-altitude security governance. By leveraging existing cellular infrastructure and spectrum resources, ISAC enables the construction of a seamless wide-area sensing network, supports intelligent feature extraction and intent inference, facilitates real-time collaborative decision-making, and establishes a dynamic trust authentication framework. This article systematically reviews the technical system, analyzes the security challenges, forecasts the enabling value of ISAC, and discusses the resulting open problems and challenges, thereby laying a foundation for future research and industrial implementation.
comment: 6 pages, 5 figures; This paper presents a forward-looking perspective on the application of ISAC technology in low-altitude security governance, addressing challenges posed by low-altitude, slow-speed, and small-size targets
Data-Enabled Predictive Control and Guidance for Autonomous Underwater Vehicles
This paper presents a fully data-driven control framework for autonomous underwater vehicles (AUVs) based on Data-Enabled Predictive Control (DeePC). The approach eliminates the need for explicit hydrodynamic modeling by exploiting measured input-output data to predict and optimize future system behavior. Classic DeePC was employed in the heading control, while a cascaded DeePC architecture is proposed for depth regulation. For 3-D waypoint path following, the Adaptive Line-of-Sight algorithm is extended to a predictive formulation and integrated with DeePC. All methods are validated in extensive simulation on the REMUS~100 AUV and compared with classical PI/PID control. The results demonstrate superior tracking performance and robustness of DeePC under ocean-current disturbances and nonlinear operating conditions, while significantly reducing modeling effort.
comment: 14 pages, 6 figures
Analyzing the Impact of Demand Response on Short-Circuit Current via a Unit Commitment Model
In low-carbon grids, system flexibility can be enhanced through mechanisms such as Demand Response (DR), enabling the efficient utilization of renewable energy. However, as Synchronous Generators (SGs) are being replaced by renewable energy sources characterized by Inverter-Based Resources (IBR), system stability is severely affected. Due to the limited overload capability of IBRs, their Short-Circuit Current (SCC) contribution is much smaller than that of SGs. As a result, protection devices may fail to trip during faults. Consequently, the remaining SGs play a key role in providing sufficient SCC. Since the commitment of SGs is closely related to system loading conditions, DR can indirectly affect their SCC provision, a relationship that has not yet been investigated in the literature. Therefore, this paper incorporates both DR and SCC constraints into a unit commitment problem and conducts case studies on an IEEE 30-bus system. The results show that although DR can reduce total costs by adjusting power demand, it may also lead to inadequate SCC levels. Nevertheless, when flexible loads are properly coordinated with SCC requirements, the total cost increases by only 0.3%, which is significantly lower than the cost of system dispatch without DR. This demonstrates that DR can facilitate stable system operation in a cost-effective manner.
comment: 1-5 pages. submitted to PESGM 2026, Canada
Robust Output Tracking for Induced Seismicity Mitigation in Underground Reservoirs Governed by a Nonlinear 3D PDE-ODE System
This paper presents a robust output-feedback controller for induced seismicity mitigation in geological reservoirs described by a coupled 3D PDE-ODE model. The controller is a MIMO Super-Twisting design, producing a continuous control signal and requiring minimal model information, while accommodating parameter uncertainty and spatial heterogeneity. Two operational outputs are regulated simultaneously: regional pressures and seismicity rates computed over reservoir sub-regions. Closed-loop properties are established via explicit bounds on the solution and its time derivative for both the infinite-dimensional dynamics and the nonlinear ODE system, yielding finite-time or exponential convergence of the tracking errors. The method is evaluated on a Groningen gas-field case study in two scenarios: gas production while not exceeding the intrinsic seismicity of the region, and combined production with CO$_2$ injection toward net-zero operation. Simulations demonstrate accurate tracking of pressure and seismicity targets across regions under significant parameter uncertainty, supporting safer reservoir operation without sacrificing production objectives.
Implementing Optimal Taxation: A Constrained Optimization Framework for Tax Reform
While optimal taxation theory provides clear prescriptions for tax design, translating these insights into actual tax codes remains difficult. Existing work largely offers theoretical characterizations of optimal systems, while practical implementation methods are scarce. Bridging this gap involves designing tax rules that meet theoretical goals, while accommodating administrative, distributional, and other practical constraints that arise in real-world reform. We develop a method casting tax reform as a constrained optimization problem by parametrizing the entire income tax code as a set of piecewise linear functions mapping tax-relevant inputs into liabilities and marginal rates. This allows users to impose constraints on marginal rate schedules, limits on income swings, and objectives like revenue neutrality, efficiency, simplicity, or distributional fairness that reflect both theoretical and practical considerations. The framework is computationally tractable for complex tax codes and flexible enough to accommodate diverse constraints, welfare objectives and behavioral responses. Whereas existing tools are typically used for ex-post `what-if' analysis of specific reforms, our framework explicitly incorporates real-world reform constraints and jointly optimizes across the full tax code. We illustrate the framework in several simulated settings, including a detailed reconstruction of the Dutch income tax system. For the Dutch case, we generate a family of reforms that smooth existing spikes in marginal tax rates to any desired cap, reduce the number of rules, and impose hard caps on income losses households can experience from the reform. We also introduce \texttt{TaxSolver}, an open-source package, allowing policymakers and researchers to implement and extend the framework.
comment: 40 pages, 5 figures
An Ion-Intercalation Memristor for Enabling Full Parallel Writing in Crossbar Networks
Crossbar architectures have long been seen as a promising foundation for in-memory computing, using memristor arrays for high-density, energy-efficient analog computation. However, this conventional architecture suffers from a fundamental limitation: the inability to perform parallel write operations due to the sneak path problem. This arises from the structural overlap of read and write paths, forcing sequential or semi-parallel updates and severely limiting scalability. To address this, we introduce a new memristor design that decouples read and write operations at the device level. This design enables orthogonal conductive paths, and employs a reversible ion doping mechanism, inspired by lithium-ion battery principles, to modulate resistance states independently of computation. Fabricated devices exhibit near-ideal memristive characteristics and stable performance under isolated read/write conditions.
Robust Offset-free Kernelized Data-Driven Predictive Control for Nonlinear Systems
This paper proposes a novel Kernelized Data-Driven Predictive Control (KDPC) scheme for robust, offset-free tracking of nonlinear systems. Our computationally efficient hybrid approach separates the prediction: (1) kernel ridge regression learns the nonlinear map from past trajectories, and (2) analytical linearization of the kernel map approximates the effect of future inputs. This linearization is key, allowing the controller to be formulated as a standard Quadratic Program (QP) for efficient real-time implementation. Offset-free tracking is inherently achieved by using input increments. We provide theoretical guarantees for recursive feasibility and asymptotic stability. The algorithm is validated on a nonlinear Van der Pol oscillator, where it successfully rejects unmeasured disturbances and eliminates steady-state errors, outperforming a standard model-based controller.
Robotics
Iterative Refinement Improves Compositional Image Generation
Text-to-image (T2I) models have achieved remarkable progress, yet they continue to struggle with complex prompts that require simultaneously handling multiple objects, relations, and attributes. Existing inference-time strategies, such as parallel sampling with verifiers or simply increasing denoising steps, can improve prompt alignment but remain inadequate for richly compositional settings where many constraints must be satisfied. Inspired by the success of chain-of-thought reasoning in large language models, we propose an iterative test-time strategy in which a T2I model progressively refines its generations across multiple steps, guided by feedback from a vision-language model as the critic in the loop. Our approach is simple, requires no external tools or priors, and can be flexibly applied to a wide range of image generators and vision-language models. Empirically, we demonstrate consistent gains on image generation across benchmarks: a 16.9% improvement in all-correct rate on ConceptMix (k=7), a 13.8% improvement on T2I-CompBench (3D-Spatial category) and a 12.5% improvement on Visual Jenga scene decomposition compared to compute-matched parallel sampling. Beyond quantitative gains, iterative refinement produces more faithful generations by decomposing complex prompts into sequential corrections, with human evaluators preferring our method 58.7% of the time over 41.3% for the parallel baseline. Together, these findings highlight iterative self-correction as a broadly applicable principle for compositional image generation. Results and visualizations are available at https://iterative-img-gen.github.io/
comment: Project webpage: https://iterative-img-gen.github.io/
Rethinking Video Generation Model for the Embodied World
Video generation models have significantly advanced embodied intelligence, unlocking new possibilities for generating diverse robot data that capture perception, reasoning, and action in the physical world. However, synthesizing high-quality videos that accurately reflect real-world robotic interactions remains challenging, and the lack of a standardized benchmark limits fair comparisons and progress. To address this gap, we introduce a comprehensive robotics benchmark, RBench, designed to evaluate robot-oriented video generation across five task domains and four distinct embodiments. It assesses both task-level correctness and visual fidelity through reproducible sub-metrics, including structural consistency, physical plausibility, and action completeness. Evaluation of 25 representative models highlights significant deficiencies in generating physically realistic robot behaviors. Furthermore, the benchmark achieves a Spearman correlation coefficient of 0.96 with human evaluations, validating its effectiveness. While RBench provides the necessary lens to identify these deficiencies, achieving physical realism requires moving beyond evaluation to address the critical shortage of high-quality training data. Driven by these insights, we introduce a refined four-stage data pipeline, resulting in RoVid-X, the largest open-source robotic dataset for video generation with 4 million annotated video clips, covering thousands of tasks and enriched with comprehensive physical property annotations. Collectively, this synergistic ecosystem of evaluation and data establishes a robust foundation for rigorous assessment and scalable training of video models, accelerating the evolution of embodied AI toward general intelligence.
comment: Github: https://github.com/DAGroup-PKU/ReVidgen/ Project website: https://dagroup-pku.github.io/ReVidgen.github.io/
FlowSSC: Universal Generative Monocular Semantic Scene Completion via One-Step Latent Diffusion
Semantic Scene Completion (SSC) from monocular RGB images is a fundamental yet challenging task due to the inherent ambiguity of inferring occluded 3D geometry from a single view. While feed-forward methods have made progress, they often struggle to generate plausible details in occluded regions and preserve the fundamental spatial relationships of objects. Such accurate generative reasoning capability for the entire 3D space is critical in real-world applications. In this paper, we present FlowSSC, the first generative framework applied directly to monocular semantic scene completion. FlowSSC treats the SSC task as a conditional generation problem and can seamlessly integrate with existing feed-forward SSC methods to significantly boost their performance. To achieve real-time inference without compromising quality, we introduce Shortcut Flow-matching that operates in a compact triplane latent space. Unlike standard diffusion models that require hundreds of steps, our method utilizes a shortcut mechanism to achieve high-fidelity generation in a single step, enabling practical deployment in autonomous systems. Extensive experiments on SemanticKITTI demonstrate that FlowSSC achieves state-of-the-art performance, significantly outperforming existing baselines.
comment: Under Review
MonoRace: Winning Champion-Level Drone Racing with Robust Monocular AI
Autonomous drone racing represents a major frontier in robotics research. It requires an Artificial Intelligence (AI) that can run on board light-weight flying robots under tight resource and time constraints, while pushing the physical system to its limits. The state of the art in this area consists of a system with a stereo camera and an inertial measurement unit (IMU) that beat human drone racing champions in a controlled indoor environment. Here, we present MonoRace: an onboard drone racing approach that uses a monocular, rolling-shutter camera and IMU that generalizes to a competition environment without any external motion tracking system. The approach features robust state estimation that combines neural-network-based gate segmentation with a drone model. Moreover, it includes an offline optimization procedure that leverages the known geometry of gates to refine any state estimation parameter. This offline optimization is based purely on onboard flight data and is important for fine-tuning the vital external camera calibration parameters. Furthermore, the guidance and control are performed by a neural network that foregoes inner loop controllers by directly sending motor commands. This small network runs on the flight controller at 500Hz. The proposed approach won the 2025 Abu Dhabi Autonomous Drone Racing Competition (A2RL), outperforming all competing AI teams and three human world champion pilots in a direct knockout tournament. It set a new milestone in autonomous drone racing research, reaching speeds up to 100 km/h on the competition track and successfully coping with problems such as camera interference and IMU saturation.
BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
Vision-Language-Action (VLA) models have shown promise in robot manipulation but often struggle to generalize to new instructions or complex multi-task scenarios. We identify a critical pathology in current training paradigms where goal-driven data collection creates a dataset bias. In such datasets, language instructions are highly predictable from visual observations alone, causing the conditional mutual information between instructions and actions to vanish, a phenomenon we term Information Collapse. Consequently, models degenerate into vision-only policies that ignore language constraints and fail in out-of-distribution (OOD) settings. To address this, we propose BayesianVLA, a novel framework that enforces instruction following via Bayesian decomposition. By introducing learnable Latent Action Queries, we construct a dual-branch architecture to estimate both a vision-only prior $p(a \mid v)$ and a language-conditioned posterior $π(a \mid v, \ell)$. We then optimize the policy to maximize the conditional Pointwise Mutual Information (PMI) between actions and instructions. This objective effectively penalizes the vision shortcut and rewards actions that explicitly explain the language command. Without requiring new data, BayesianVLA significantly improves generalization. Extensive experiments across on SimplerEnv and RoboCasa demonstrate substantial gains, including an 11.3% improvement on the challenging OOD SimplerEnv benchmark, validating the ability of our approach to robustly ground language in action.
V-CAGE: Context-Aware Generation and Verification for Scalable Long-Horizon Embodied Tasks
Learning long-horizon embodied behaviors from synthetic data remains challenging because generated scenes are often physically implausible, language-driven programs frequently "succeed" without satisfying task semantics, and high-level instructions require grounding into executable action sequences. To address these limitations, we introduce V-CAGE, a closed-loop framework for generating robust, semantically aligned manipulation datasets at scale. First, we propose a context-aware instantiation mechanism that enforces geometric consistency during scene synthesis. By dynamically maintaining a map of prohibited spatial areas as objects are placed, our system prevents interpenetration and ensures reachable, conflict-free configurations in cluttered environments. Second, to bridge the gap between abstract intent and low-level control, we employ a hierarchical instruction decomposition module. This decomposes high-level goals (e.g., "get ready for work") into compositional action primitives, facilitating coherent long-horizon planning. Crucially, we enforce semantic correctness through a VLM-based verification loop. Acting as a visual critic, the VLM performs rigorous rejection sampling after each subtask, filtering out "silent failures" where code executes but fails to achieve the visual goal. Experiments demonstrate that V-CAGE yields datasets with superior physical and semantic fidelity, significantly boosting the success rate and generalization of downstream policies compared to non-verified baselines.
Influence of Operator Expertise on Robot Supervision and Intervention
With increasing levels of robot autonomy, robots are increasingly being supervised by users with varying levels of robotics expertise. As the diversity of the user population increases, it is important to understand how users with different expertise levels approach the supervision task and how this impacts performance of the human-robot team. This exploratory study investigates how operators with varying expertise levels perceive information and make intervention decisions when supervising a remote robot. We conducted a user study (N=27) where participants supervised a robot autonomously exploring four unknown tunnel environments in a simulator, and provided waypoints to intervene when they believed the robot had encountered difficulties. By analyzing the interaction data and questionnaire responses, we identify differing patterns in intervention timing and decision-making strategies across novice, intermediate, and expert users.
Systematic Evaluation of Hip Exoskeleton Assistance Parameters for Enhancing Gait Stability During Ground Slip Perturbations
Falls are the leading cause of injury related hospitalization and mortality among older adults. Consequently, mitigating age-related declines in gait stability and reducing fall risk during walking is a critical goal for assistive devices. Lower-limb exoskeletons have the potential to support users in maintaining stability during walking. However, most exoskeleton controllers are optimized to reduce the energetic cost of walking rather than to improve stability. While some studies report stability benefits with assistance, the effects of specific parameters, such as assistance magnitude and duration, remain unexplored. To address this gap, we systematically modulated the magnitude and duration of torque provided by a bilateral hip exoskeleton during slip perturbations in eight healthy adults, quantifying stability using whole-body angular momentum (WBAM). WBAM responses were governed by a significant interaction between assistance magnitude and duration, with duration determining whether exoskeleton assistance was stabilizing or destabilizing relative to not wearing the exoskeleton device. Compared to an existing energy-optimized controller, experimentally identified stability-optimal parameters reduced WBAM range by 25.7% on average. Notably, substantial inter-subject variability was observed in the parameter combinations that minimized WBAM during perturbations. We found that optimizing exoskeleton assistance for energetic outcomes alone is insufficient for improving reactive stability during gait perturbations. Stability-focused exoskeleton control should prioritize temporal assistance parameters and include user-specific personalization. This study represents an important step toward personalized, stability-focused exoskeleton control, with direct implications for improving stability and reducing fall risk in older adults.
CADGrasp: Learning Contact and Collision Aware General Dexterous Grasping in Cluttered Scenes
Dexterous grasping in cluttered environments presents substantial challenges due to the high degrees of freedom of dexterous hands, occlusion, and potential collisions arising from diverse object geometries and complex layouts. To address these challenges, we propose CADGrasp, a two-stage algorithm for general dexterous grasping using single-view point cloud inputs. In the first stage, we predict sparse IBS, a scene-decoupled, contact- and collision-aware representation, as the optimization target. Sparse IBS compactly encodes the geometric and contact relationships between the dexterous hand and the scene, enabling stable and collision-free dexterous grasp pose optimization. To enhance the prediction of this high-dimensional representation, we introduce an occupancy-diffusion model with voxel-level conditional guidance and force closure score filtering. In the second stage, we develop several energy functions and ranking strategies for optimization based on sparse IBS to generate high-quality dexterous grasp poses. Extensive experiments in both simulated and real-world settings validate the effectiveness of our approach, demonstrating its capability to mitigate collisions while maintaining a high grasp success rate across diverse objects and complex scenes.
ExPrIS: Knowledge-Level Expectations as Priors for Object Interpretation from Sensor Data
While deep learning has significantly advanced robotic object recognition, purely data-driven approaches often lack semantic consistency and fail to leverage valuable, pre-existing knowledge about the environment. This report presents the ExPrIS project, which addresses this challenge by investigating how knowledge-level expectations can serve as to improve object interpretation from sensor data. Our approach is based on the incremental construction of a 3D Semantic Scene Graph (3DSSG). We integrate expectations from two sources: contextual priors from past observations and semantic knowledge from external graphs like ConceptNet. These are embedded into a heterogeneous Graph Neural Network (GNN) to create an expectation-biased inference process. This method moves beyond static, frame-by-frame analysis to enhance the robustness and consistency of scene understanding over time. The report details this architecture, its evaluation, and outlines its planned integration on a mobile robotic platform.
comment: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this article is published in KI - Künstliche Intelligenz, and is available online at https://doi.org/10.1007/s13218-026-00901-7
Risk Estimation for Automated Driving
Safety is a central requirement for automated vehicles. As such, the assessment of risk in automated driving is key in supporting both motion planning technologies and safety evaluation. In automated driving, risk is characterized by two aspects. The first aspect is the uncertainty on the state estimates of other road participants by an automated vehicle. The second aspect is the severity of a collision event with said traffic participants. Here, the uncertainty aspect typically causes the risk to be non-zero for near-collision events. This makes risk particularly useful for automated vehicle motion planning. Namely, constraining or minimizing risk naturally navigates the automated vehicle around traffic participants while keeping a safety distance based on the level of uncertainty and the potential severity of the impending collision. Existing approaches to calculate the risk either resort to empirical modeling or severe approximations, and, hence, lack generalizability and accuracy. In this paper, we combine recent advances in collision probability estimation with the concept of collision severity to develop a general method for accurate risk estimation. The proposed method allows us to assign individual severity functions for different collision constellations, such as, e.g., frontal or side collisions. Furthermore, we show that the proposed approach is computationally efficient, which is beneficial, e.g., in real-time motion planning applications. The programming code for an exemplary implementation of Gaussian uncertainties is also provided.
comment: 10 pages, 5 figures
DWPP: Dynamic Window Pure Pursuit Considering Velocity and Acceleration Constraints
Pure pursuit and its variants are widely used for mobile robot path tracking owing to their simplicity and computational efficiency. However, many conventional approaches do not explicitly account for velocity and acceleration constraints, resulting in discrepancies between commanded and actual velocities that result in overshoot and degraded tracking performance. To address this problem, this paper proposes dynamic window pure pursuit (DWPP), which fundamentally reformulates the command velocity computation process to explicitly incorporate velocity and acceleration constraints. Specifically, DWPP formulates command velocity computation in the velocity space (the $v$-$ω$ plane) and selects the command velocity as the point within the dynamic window that is closest to the line $ω= κv$. Experimental results demonstrate that DWPP avoids constraint-violating commands and achieves superior path-tracking accuracy compared with conventional pure pursuit methods. The proposed method has been integrated into the official Nav2 repository and is publicly available (https://github.com/ros-navigation/navigation2).
comment: 28 pages, 12 figures
Graph-Based Adaptive Planning for Coordinated Dual-Arm Robotic Disassembly of Electronic Devices (eGRAP)
E-waste is growing rapidly while recycling rates remain low. We propose an electronic-device Graph-based Adaptive Planning (eGRAP) that integrates vision, dynamic planning, and dual-arm execution for autonomous disassembly. A camera-equipped arm identifies parts and estimates their poses, and a directed graph encodes which parts must be removed first. A scheduler uses topological ordering of this graph to select valid next steps and assign them to two robot arms, allowing independent tasks to run in parallel. One arm carries a screwdriver (with an eye-in-hand depth camera) and the other holds or handles components. We demonstrate eGRAP on 3.5in hard drives: as parts are unscrewed and removed, the system updates its graph and plan online. Experiments show consistent full disassembly of each HDD, with high success rates and efficient cycle times, illustrating the method's ability to adaptively coordinate dual-arm tasks in real time.
comment: 7 Pages, 8 Figures, 5 Tables
HumanDiffusion: A Vision-Based Diffusion Trajectory Planner with Human-Conditioned Goals for Search and Rescue UAV
Reliable human--robot collaboration in emergency scenarios requires autonomous systems that can detect humans, infer navigation goals, and operate safely in dynamic environments. This paper presents HumanDiffusion, a lightweight image-conditioned diffusion planner that generates human-aware navigation trajectories directly from RGB imagery. The system combines YOLO-11--based human detection with diffusion-driven trajectory generation, enabling a quadrotor to approach a target person and deliver medical assistance without relying on prior maps or computationally intensive planning pipelines. Trajectories are predicted in pixel space, ensuring smooth motion and a consistent safety margin around humans. We evaluate HumanDiffusion in simulation and real-world indoor mock-disaster scenarios. On a 300-sample test set, the model achieves a mean squared error of 0.02 in pixel-space trajectory reconstruction. Real-world experiments demonstrate an overall mission success rate of 80% across accident-response and search-and-locate tasks with partial occlusions. These results indicate that human-conditioned diffusion planning offers a practical and robust solution for human-aware UAV navigation in time-critical assistance settings.
comment: This paper has been accepted at HRI, Late Breaking Report, 2026
TIDAL: Temporally Interleaved Diffusion and Action Loop for High-Frequency VLA Control
Large-scale Vision-Language-Action (VLA) models offer semantic generalization but suffer from high inference latency, limiting them to low-frequency batch-and-execute paradigm. This frequency mismatch creates an execution blind spot, causing failures in dynamic environments where targets move during the open-loop execution window. We propose TIDAL (Temporally Interleaved Diffusion and Action Loop), a hierarchical framework that decouples semantic reasoning from high-frequency actuation. TIDAL operates as a backbone-agnostic module for diffusion-based VLAs, using a dual-frequency architecture to redistribute the computational budget. Specifically, a low-frequency macro-intent loop caches semantic embeddings, while a high-frequency micro-control loop interleaves single-step flow integration with execution. This design enables approximately 9 Hz control updates on edge hardware (vs. approximately 2.4 Hz baselines) without increasing marginal overhead. To handle the resulting latency shift, we introduce a temporally misaligned training strategy where the policy learns predictive compensation using stale semantic intent alongside real-time proprioception. Additionally, we address the insensitivity of static vision encoders to velocity by incorporating a differential motion predictor. TIDAL is architectural, making it orthogonal to system-level optimizations. Experiments show a 2x performance gain over open-loop baselines in dynamic interception tasks. Despite a marginal regression in static success rates, our approach yields a 4x increase in feedback frequency and extends the effective horizon of semantic embeddings beyond the native action chunk size. Under non-paused inference protocols, TIDAL remains robust where standard baselines fail due to latency.
Vision-Language Models on the Edge for Real-Time Robotic Perception
Vision-Language Models (VLMs) enable multimodal reasoning for robotic perception and interaction, but their deployment in real-world systems remains constrained by latency, limited onboard resources, and privacy risks of cloud offloading. Edge intelligence within 6G, particularly Open RAN and Multi-access Edge Computing (MEC), offers a pathway to address these challenges by bringing computation closer to the data source. This work investigates the deployment of VLMs on ORAN/MEC infrastructure using the Unitree G1 humanoid robot as an embodied testbed. We design a WebRTC-based pipeline that streams multimodal data to an edge node and evaluate LLaMA-3.2-11B-Vision-Instruct deployed at the edge versus in the cloud under real-time conditions. Our results show that edge deployment preserves near-cloud accuracy while reducing end-to-end latency by 5\%. We further evaluate Qwen2-VL-2B-Instruct, a compact model optimized for resource-constrained environments, which achieves sub-second responsiveness, cutting latency by more than half but at the cost of accuracy.
HumanoidVLM: Vision-Language-Guided Impedance Control for Contact-Rich Humanoid Manipulation
Humanoid robots must adapt their contact behavior to diverse objects and tasks, yet most controllers rely on fixed, hand-tuned impedance gains and gripper settings. This paper introduces HumanoidVLM, a vision-language driven retrieval framework that enables the Unitree G1 humanoid to select task-appropriate Cartesian impedance parameters and gripper configurations directly from an egocentric RGB image. The system couples a vision-language model for semantic task inference with a FAISS-based Retrieval-Augmented Generation (RAG) module that retrieves experimentally validated stiffness-damping pairs and object-specific grasp angles from two custom databases, and executes them through a task-space impedance controller for compliant manipulation. We evaluate HumanoidVLM on 14 visual scenarios and achieve a retrieval accuracy of 93%. Real-world experiments show stable interaction dynamics, with z-axis tracking errors typically within 1-3.5 cm and virtual forces consistent with task-dependent impedance settings. These results demonstrate the feasibility of linking semantic perception with retrieval-based control as an interpretable path toward adaptive humanoid manipulation.
comment: This paper has been accepted for publication at LBR of HRI 2026 conference
On-the-fly hand-eye calibration for the da Vinci surgical robot
In Robot-Assisted Minimally Invasive Surgery (RMIS), accurate tool localization is crucial to ensure patient safety and successful task execution. However, this remains challenging for cable-driven robots, such as the da Vinci robot, because erroneous encoder readings lead to pose estimation errors. In this study, we propose a calibration framework to produce accurate tool localization results through computing the hand-eye transformation matrix on-the-fly. The framework consists of two interrelated algorithms: the feature association block and the hand-eye calibration block, which provide robust correspondences for key points detected on monocular images without pre-training, and offer the versatility to accommodate various surgical scenarios by adopting an array of filter approaches, respectively. To validate its efficacy, we test the framework extensively on publicly available video datasets that feature multiple surgical instruments conducting tasks in both in vitro and ex vivo scenarios, under varying illumination conditions and with different levels of key point measurement accuracy. The results show a significant reduction in tool localization errors under the proposed calibration framework, with accuracies comparable to other state-of-the-art methods while being more time-efficient.
comment: 16 pages, 13 figures
From Observation to Prediction: LSTM for Vehicle Lane Change Forecasting on Highway On/Off-Ramps
On and off-ramps are understudied road sections even though they introduce a higher level of variation in highway interactions. Predicting vehicles' behavior in these areas can decrease the impact of uncertainty and increase road safety. In this paper, the difference between this Area of Interest (AoI) and a straight highway section is studied. Multi-layered LSTM architecture to train the AoI model with ExiD drone dataset is utilized. In the process, different prediction horizons and different models' workflow are tested. The results show great promise on horizons up to 4 seconds with prediction accuracy starting from about 76% for the AoI and 94% for the general highway scenarios on the maximum horizon.
Implementing Knowledge Representation and Reasoning with Object Oriented Design IJCAI
This paper introduces KRROOD, a framework designed to bridge the integration gap between modern software engineering and Knowledge Representation & Reasoning (KR&R) systems. While Object-Oriented Programming (OOP) is the standard for developing complex applications, existing KR&R frameworks often rely on external ontologies and specialized languages that are difficult to integrate with imperative code. KRROOD addresses this by treating knowledge as a first-class programming abstraction using native class structures, bridging the gap between the logic programming and OOP paradigms. We evaluate the system on the OWL2Bench benchmark and a human-robot task learning scenario. Experimental results show that KRROOD achieves strong performance while supporting the expressive reasoning required for real-world autonomous systems.
comment: 9 pages, 2 figures, submitted to the 2026 International Joint Conference on Artificial Intelligence (IJCAI)
Moving Beyond Compliance in Soft-Robotic Catheters Through Modularity for Precision Therapies
Soft robotic instruments could navigate delicate, tortuous anatomy more safely than rigid tools, but clinical adoption is limited by insufficient tip functionalization and real-time feedback at the tissue interface. Few sensing and therapeutic modules are compact, robust, and adaptable enough to measure, and respond to, subtle physiological cues during intraluminal procedures. We present a 1.47 mm diameter modular soft robotic catheter that integrates sensing, actuation, and therapy while retaining the compliance needed for safe endoluminal navigation. Validated across multiple in vivo settings, we emphasize its utility in endoscopic retrograde cholangiopancreatography (ERCP), a highly technical procedure and a key access route to the pancreas, an organ that is fragile, difficult to instrument, and central to diseases such as pancreatic cancer. Our architecture supports up to four independently controlled functional units, allowing customizable combinations of anchoring, manipulation, sensing, and targeted drug delivery. In a live porcine model, we demonstrate semi-autonomous deployment into the pancreatic duct and 7.5 cm of endoscopic navigation within it, a region currently inaccessible with standard catheters. A closed-loop autonomous/shared-control system that combines a learned model, magnetic actuation, onboard shape sensing, and visual marker tracking further improves cannulation accuracy. Together, these results establish a scalable platform for multifunctional soft robotic catheters and a new paradigm for complex endoluminal interventions, with potential to reduce radiation exposure, shorten training, and accelerate clinical translation of soft robotic technologies.
comment: 31 pages, 6 figures, 7 supplementary figures
Stochastic Decision-Making Framework for Human-Robot Collaboration in Industrial Applications
Collaborative robots, or cobots, are increasingly integrated into various industrial and service settings to work efficiently and safely alongside humans. However, for effective human-robot collaboration, robots must reason based on human factors such as motivation level and aggression level. This paper proposes an approach for decision-making in human-robot collaborative (HRC) environments utilizing stochastic modeling. By leveraging probabilistic models and control strategies, the proposed method aims to anticipate human actions and emotions, enabling cobots to adapt their behavior accordingly. So far, most of the research has been done to detect the intentions of human co-workers. This paper discusses the theoretical framework, implementation strategies, simulation results, and potential applications of the bilateral collaboration approach for safety and efficiency in collaborative robotics.
comment: Under Review by IEEE Transactions on Human Machine Systems
AutoDriDM: An Explainable Benchmark for Decision-Making of Vision-Language Models in Autonomous Driving ACL
Autonomous driving is a highly challenging domain that requires reliable perception and safe decision-making in complex scenarios. Recent vision-language models (VLMs) demonstrate reasoning and generalization abilities, opening new possibilities for autonomous driving; however, existing benchmarks and metrics overemphasize perceptual competence and fail to adequately assess decision-making processes. In this work, we present AutoDriDM, a decision-centric, progressive benchmark with 6,650 questions across three dimensions - Object, Scene, and Decision. We evaluate mainstream VLMs to delineate the perception-to-decision capability boundary in autonomous driving, and our correlation analysis reveals weak alignment between perception and decision-making performance. We further conduct explainability analyses of models' reasoning processes, identifying key failure modes such as logical reasoning errors, and introduce an analyzer model to automate large-scale annotation. AutoDriDM bridges the gap between perception-centered and decision-centered evaluation, providing guidance toward safer and more reliable VLMs for real-world autonomous driving.
comment: 23 pages. Submitted to ACL ARR 2026 January
FARE: Fast-Slow Agentic Robotic Exploration
This work advances autonomous robot exploration by integrating agent-level semantic reasoning with fast local control. We introduce FARE, a hierarchical autonomous exploration framework that integrates a large language model (LLM) for global reasoning with a reinforcement learning (RL) policy for local decision making. FARE follows a fast-slow thinking paradigm. The slow-thinking LLM module interprets a concise textual description of the unknown environment and synthesizes an agent-level exploration strategy, which is then grounded into a sequence of global waypoints through a topological graph. To further improve reasoning efficiency, this module employs a modularity-based pruning mechanism that reduces redundant graph structures. The fast-thinking RL module executes exploration by reacting to local observations while being guided by the LLM-generated global waypoints. The RL policy is additionally shaped by a reward term that encourages adherence to the global waypoints, enabling coherent and robust closed-loop behavior. This architecture decouples semantic reasoning from geometric decision, allowing each module to operate in its appropriate temporal and spatial scale. In challenging simulated environments, our results show that FARE achieves substantial improvements in exploration efficiency over state-of-the-art baselines. We further deploy FARE on hardware and validate it in complex, large scale $200m\times130m$ building environment.
Spatially Generalizable Mobile Manipulation via Adaptive Experience Selection and Dynamic Imagination
Mobile Manipulation (MM) involves long-horizon decision-making over multi-stage compositions of heterogeneous skills, such as navigation and picking up objects. Despite recent progress, existing MM methods still face two key limitations: (i) low sample efficiency, due to ineffective use of redundant data generated during long-term MM interactions; and (ii) poor spatial generalization, as policies trained on specific tasks struggle to transfer to new spatial layouts without additional training. In this paper, we address these challenges through Adaptive Experience Selection (AES) and model-based dynamic imagination. In particular, AES makes MM agents pay more attention to critical experience fragments in long trajectories that affect task success, improving skill chain learning and mitigating skill forgetting. Based on AES, a Recurrent State-Space Model (RSSM) is introduced for Model-Predictive Forward Planning (MPFP) by capturing the coupled dynamics between the mobile base and the manipulator and imagining the dynamics of future manipulations. RSSM-based MPFP can reinforce MM skill learning on the current task while enabling effective generalization to new spatial layouts. Comparative studies across different experimental configurations demonstrate that our method significantly outperforms existing MM policies. Real-world experiments further validate the feasibility and practicality of our method.
Landing-Induced Viscoelastic Changes in an Anthropomimetic Foot Joint Structure are Modulated by Foot Structure and Posture
Cadaveric studies have provided important insights into the mechanics of the human foot arch and plantar fascia. However, repeatedly probing posture-dependent viscoelastic responses immediately after landing impact is difficult in biological specimens, leaving the contribution of skeletal architecture to landing dynamics incompletely understood. In this study, we developed an anthropomimetic foot joint structure aimed at replicating the skeletal geometry of the human foot. Using a vertical drop apparatus that simulates landing and a viscoelastic system-identification model, we investigated how skeletal structure and posture modulate the apparent post-impact viscoelastic response. The results show that the multi-jointed anthropomimetic structure exhibited a higher damping ratio than simplified flat and rigid feet. Moreover, ankle dorsiflexion and toe extension systematically shifted the identified parameters, reducing the damping ratio under the tested conditions. Taken together, these findings indicate that an arch-like, multi-jointed skeletal architecture can enhance impact attenuation in an anthropomimetic mechanical foot, and that morphology and passive posture alone can tune the trade-off between attenuation and rebound. The observed posture-dependent trends are qualitatively consistent with reported differences in human landing strategies, suggesting that skeletal architecture may partly account for the modulation. Furthermore, these results highlight the engineering advantage of anatomically informed skeletal replication for achieving human-like apparent viscoelastic behavior through postural adjustment during landing.
comment: 27 pages, preprint
A Brain-inspired Embodied Intelligence for Fluid and Fast Reflexive Robotics Control
Recent advances in embodied intelligence have leveraged massive scaling of data and model parameters to master natural-language command following and multi-task control. In contrast, biological systems demonstrate an innate ability to acquire skills rapidly from sparse experience. Crucially, current robotic policies struggle to replicate the dynamic stability, reflexive responsiveness, and temporal memory inherent in biological motion. Here we present Neuromorphic Vision-Language-Action (NeuroVLA), a framework that mimics the structural organization of the bio-nervous system between the cortex, cerebellum, and spinal cord. We adopt a system-level bio-inspired design: a high-level model plans goals, an adaptive cerebellum module stabilizes motion using high-frequency sensors feedback, and a bio-inspired spinal layer executes lightning-fast actions generation. NeuroVLA represents the first deployment of a neuromorphic VLA on physical robotics, achieving state-of-the-art performance. We observe the emergence of biological motor characteristics without additional data or special guidance: it stops the shaking in robotic arms, saves significant energy(only 0.4w on Neuromorphic Processor), shows temporal memory ability and triggers safety reflexes in less than 20 milliseconds.
Probing Prompt Design for Socially Compliant Robot Navigation with Vision Language Models
Language models are increasingly used for social robot navigation, yet existing benchmarks largely overlook principled prompt design for socially compliant behavior. This limitation is particularly relevant in practice, as many systems rely on small vision language models (VLMs) for efficiency. Compared to large language models, small VLMs exhibit weaker decision-making capabilities, making effective prompt design critical for accurate navigation. Inspired by cognitive theories of human learning and motivation, we study prompt design along two dimensions: system guidance (action-focused, reasoning-oriented, and perception-reasoning prompts) and motivational framing, where models compete against humans, other AI systems, or their past selves. Experiments on two socially compliant navigation datasets reveal three key findings. First, for non-finetuned GPT-4o, competition against humans achieves the best performance, while competition against other AI systems performs worst. For finetuned models, competition against the model's past self yields the strongest results, followed by competition against humans, with performance further influenced by coupling effects among prompt design, model choice, and dataset characteristics. Second, inappropriate system prompt design can significantly degrade performance, even compared to direct finetuning. Third, while direct finetuning substantially improves semantic-level metrics such as perception, prediction, and reasoning, it yields limited gains in action accuracy. In contrast, our system prompts produce a disproportionately larger improvement in action accuracy, indicating that the proposed prompt design primarily acts as a decision-level constraint rather than a representational enhancement.
UniCon: A Unified System for Efficient Robot Learning Transfers
Deploying learning-based controllers across heterogeneous robots is challenging due to platform differences, inconsistent interfaces, and inefficient middleware. To address these issues, we present UniCon, a lightweight framework that standardizes states, control flow, and instrumentation across platforms. It decomposes workflows into execution graphs with reusable components, separating system states from control logic to enable plug-and-play deployment across various robot morphologies. Unlike traditional middleware, it prioritizes efficiency through batched, vectorized data flow, minimizing communication overhead and improving inference latency. This modular, data-oriented approach enables seamless sim-to-real transfer with minimal re-engineering. We demonstrate that UniCon reduces code redundancy when transferring workflows and achieves higher inference efficiency compared to ROS-based systems. Deployed on over 12 robot models from 7 manufacturers, it has been successfully integrated into ongoing research projects, proving its effectiveness in real-world scenarios.
comment: in submission, under review
Explainable OOHRI: Communicating Robot Capabilities and Limitations as Augmented Reality Affordances
Human interaction is essential for issuing personalized instructions and assisting robots when failure is likely. However, robots remain largely black boxes, offering users little insight into their evolving capabilities and limitations. To address this gap, we present explainable object-oriented HRI (X-OOHRI), an augmented reality (AR) interface that conveys robot action possibilities and constraints through visual signifiers, radial menus, color coding, and explanation tags. Our system encodes object properties and robot limits into object-oriented structures using a vision-language model, allowing explanation generation on the fly and direct manipulation of virtual twins spatially aligned within a simulated environment. We integrate the end-to-end pipeline with a physical robot and showcase diverse use cases ranging from low-level pick-and-place to high-level instructions. Finally, we evaluate X-OOHRI through a user study and find that participants effectively issue object-oriented commands, develop accurate mental models of robot limitations, and engage in mixed-initiative resolution.
TacUMI: A Multi-Modal Universal Manipulation Interface for Contact-Rich Tasks
Task decomposition is critical for understanding and learning complex long-horizon manipulation tasks. Especially for tasks involving rich physical interactions, relying solely on visual observations and robot proprioceptive information often fails to reveal the underlying event transitions. This raises the requirement for efficient collection of high-quality multi-modal data as well as robust segmentation method to decompose demonstrations into meaningful modules. Building on the idea of the handheld demonstration device Universal Manipulation Interface (UMI), we introduce TacUMI, a multi-modal data collection system that integrates additionally ViTac sensors, force-torque sensor, and pose tracker into a compact, robot-compatible gripper design, which enables synchronized acquisition of all these modalities during human demonstrations. We then propose a multi-modal segmentation framework that leverages temporal models to detect semantically meaningful event boundaries in sequential manipulations. Evaluation on a challenging cable mounting task shows more than 90 percent segmentation accuracy and highlights a remarkable improvement with more modalities, which validates that TacUMI establishes a practical foundation for both scalable collection and segmentation of multi-modal demonstrations in contact-rich tasks.
CompliantVLA-adaptor: VLM-Guided Variable Impedance Action for Safe Contact-Rich Manipulation
We propose a CompliantVLA-adaptor that augments the state-of-the-art Vision-Language-Action (VLA) models with vision-language model (VLM)-informed context-aware variable impedance control (VIC) to improve the safety and effectiveness of contact-rich robotic manipulation tasks. Existing VLA systems (e.g., RDT, Pi0, OpenVLA-oft) typically output position, but lack force-aware adaptation, leading to unsafe or failed interactions in physical tasks involving contact, compliance, or uncertainty. In the proposed CompliantVLA-adaptor, a VLM interprets task context from images and natural language to adapt the stiffness and damping parameters of a VIC controller. These parameters are further regulated using real-time force/torque feedback to ensure interaction forces remain within safe thresholds. We demonstrate that our method outperforms the VLA baselines on a suite of complex contact-rich tasks, both in simulation and on real hardware, with improved success rates and reduced force violations. The overall success rate across all tasks increases from 9.86\% to 17.29\%, presenting a promising path towards safe contact-rich manipulation using VLAs. We release our code, prompts, and force-torque-impedance-scenario context datasets at https://sites.google.com/view/compliantvla.
comment: under review
A Universal Large Language Model -- Drone Command and Control Interface
The use of artificial intelligence (AI) for drone control can have a transformative impact on drone capabilities, especially when real world information can be integrated with drone sensing, command, and control, part of a growing field of physical AI. Large language models (LLMs) can be advantageous if trained at scale on general knowledge, but especially and in particular when the training data includes information such as detailed map geography topology of the entire planet, as well as the ability to access real time situational data such as weather. However, challenges remain in the interface between drones and LLMs in general, with each application requiring a tedious, labor intensive effort to connect the LLM trained knowledge to drone command and control. Here, we solve that problem, using an interface strategy that is LLM agnostic and drone agnostic, providing the first universal, versatile, comprehensive and easy to use drone control interface. We do this using the new model context protocol (MCP) standard, an open standard that provides a universal way for AI systems to access external data, tools, and services. We develop and deploy a cloud based Linux machine hosting an MCP server that supports the Mavlink protocol, an ubiquitous drone control language used almost universally by millions of drones including Ardupilot and PX4 framework.We demonstrate flight control of a real unmanned aerial vehicle. In further testing, we demonstrate extensive flight planning and control capability in a simulated drone, integrated with a Google Maps MCP server for up to date, real time navigation information. This demonstrates a universal approach to integration of LLMs with drone command and control, a paradigm that leverages and exploits virtually all of modern AI industry with drone technology in an easy to use interface that translates natural language to drone control.
Neural Collision Detection for Multi-arm Laparoscopy Surgical Robots Through Learning-from-Simulation
This study presents an integrated framework for enhancing the safety and operational efficiency of robotic arms in laparoscopic surgery by addressing key challenges in collision detection and minimum distance estimation. By combining analytical modeling, real-time simulation, and machine learning, the framework offers a robust solution for ensuring safe robotic operations. An analytical model was developed to estimate the minimum distances between robotic arms based on their joint configurations, offering precise theoretical calculations that serve as both a validation tool and a benchmark. To complement this, a 3D simulation environment was created to model two 7-DOF Kinova robotic arms, generating a diverse dataset of configurations for collision detection and distance estimation. Using these insights, a deep neural network model was trained with joint actuators of robot arms and relative positions as inputs, achieving a mean absolute error of 282.2 mm and an R-squared value of 0.85. The close alignment between predicted and actual distances highlights the network's accuracy and its ability to generalize spatial relationships. This work demonstrates the effectiveness of combining analytical precision with machine learning algorithms to enhance the precision and reliability of robotic systems.
Learning a Unified Latent Space for Cross-Embodiment Robot Control
We present a scalable framework for cross-embodiment humanoid robot control by learning a shared latent representation that unifies motion across humans and diverse humanoid platforms, including single-arm, dual-arm, and legged humanoid robots. Our method proceeds in two stages: first, we construct a decoupled latent space that captures localized motion patterns across different body parts using contrastive learning, enabling accurate and flexible motion retargeting even across robots with diverse morphologies. To enhance alignment between embodiments, we introduce tailored similarity metrics that combine joint rotation and end-effector positioning for critical segments, such as arms. Then, we train a goal-conditioned control policy directly within this latent space using only human data. Leveraging a conditional variational autoencoder, our policy learns to predict latent space displacements guided by intended goal directions. We show that the trained policy can be directly deployed on multiple robots without any adaptation. Furthermore, our method supports the efficient addition of new robots to the latent space by learning only a lightweight, robot-specific embedding layer. The learned latent policies can also be directly applied to the new robots. Experimental results demonstrate that our approach enables robust, scalable, and embodiment-agnostic robot control across a wide range of humanoid platforms.
Preparation and Motion Study of Magnetically Driven Micro Soft Robot Mimicking the Cownose Ray
In narrow, unstructured underwater environments such as environmental monitoring and minimally invasive medical procedures, micro soft robots exhibit unique advantages due to their flexible movement capabilities and small size. At the same time, applying bionic technology to the structural design of micro soft robots can significantly improve their swimming performance. However, limited by their miniaturization, these robots are difficult to power internally and usually adopt a wireless power supply method. This study designs and fabricates a magnetically responsive, cownose ray-inspired micro soft robot based on the swimming principle of the cownose ray. The robot is made of a certain proportion of NdFeB and PDMS. Then, a three-dimensional Helmholtz coil is used to generate an oscillating harmonic magnetic field to conduct swimming experiments on the robot, exploring the influence of magnetic field parameters on the robot's swimming performance. The experimental results show that the swimming speed is the fastest at B = 5 mT and f = 11 Hz, reaching 5.25 mm/s, which is about 0.5 body lengths per second. In addition, by adjusting the current direction and frequency of the coil, the robot can perform different swimming modes such as straight swimming, turning swimming, and directional swimming. By employing a stepwise adjustment method, the impact of response errors on the robot's trajectory can be effectively reduced. This study demonstrates a method for magnetically driven micro soft robots, laying a foundation for the application of wireless-driven robots in underwater narrow spaces.
comment: These experiments lay an important foundation for the study of tether-free control of underwater micro-soft robots. Furthermore, this research provides important references for the fields of biomimetic robots and magnetically controlled micro-soft robots
OSMa-Bench: Evaluating Open Semantic Mapping Under Varying Lighting Conditions
Open Semantic Mapping (OSM) is a key technology in robotic perception, combining semantic segmentation and SLAM techniques. This paper introduces a dynamically configurable and highly automated LLM/LVLM-powered pipeline for evaluating OSM solutions called OSMa-Bench (Open Semantic Mapping Benchmark). The study focuses on evaluating state-of-the-art semantic mapping algorithms under varying indoor lighting conditions, a critical challenge in indoor environments. We introduce a novel dataset with simulated RGB-D sequences and ground truth 3D reconstructions, facilitating the rigorous analysis of mapping performance across different lighting conditions. Through experiments on leading models such as ConceptGraphs, BBQ, and OpenScene, we evaluate the semantic fidelity of object recognition and segmentation. Additionally, we introduce a Scene Graph evaluation method to analyze the ability of models to interpret semantic structure. The results provide insights into the robustness of these models, forming future research directions for developing resilient and adaptable robotic systems. Project page is available at https://be2rlab.github.io/OSMa-Bench/.
comment: Project page: https://be2rlab.github.io/OSMa-Bench/
Locomotion Dynamics of an Underactuated Three-Link Robotic Vehicle
The wheeled three-link snake robot is a well-known example of an underactuated system modelled using nonholonomic constraints, preventing lateral slippage (skid) of the wheels. A kinematically controlled configuration assumes that both joint angles are directly prescribed as phase-shifted periodic input. In another configuration of the robot, only one joint is periodically actuated while the second joint is passively governed by a visco-elastic torsion spring. In our work, we constructed the two configurations of the wheeled robot and conducted motion experiments under different actuation inputs. Analysis of the motion tracking measurements reveals a significant amount of wheels' skid, in contrast to the assumptions used in standard nonholonomic models. Therefore, we propose modified dynamic models which include wheels' skid and viscous friction forces, as well as rolling resistance. After parameter fitting, these dynamic models reach good agreement with the motion measurements, including effects of input's frequency on the mean speed and net displacement per period. This illustrates the importance of incorporating wheels' skid and friction into the system's model.
comment: Accepted to IEEE Transactions on Robotics, January 2026
Semilinear single-track vehicle models with distributed tyre friction dynamics
This paper introduces a novel family of single-track vehicle models that incorporate a distributed representation of transient tyre dynamics, whilst simultaneously accounting for nonlinear effects induced by friction. The core of the proposed framework is represented by the distributed Friction with Bristle Dynamics (FrBD) model, which unifies and extends classical formulations such as Dahl and LuGre by describing the rolling contact process as a spatially distributed system governed by semilinear partial differential equations (PDEs). This model is systematically integrated into a single-track vehicle framework, where the resulting semilinear ODE-PDE interconnection captures the interaction between lateral vehicle motion and tyre deformation. Two main variants are considered: one with rigid tyre carcass and another with flexible carcass, each admitting a compact state-space representation. Local and global well-posedness properties for the coupled system are established rigorously, highlighting the dissipative and physically consistent properties of the distributed FrBD model. A linearisation procedure is also presented, enabling spectral analysis and transfer function derivation, and potentially facilitating the synthesis of controllers and observers. Numerical simulations demonstrate the model's capability to capture micro-shimmy oscillations and transient lateral responses to advanced steering manoeuvres. The proposed formulation advances the state-of-the-art in vehicle dynamics modelling by providing a physically grounded, mathematically rigorous, and computationally tractable approach to incorporating transient tyre behaviour in lateral vehicle dynamics, when accounting for the effect of limited friction.
comment: 37 pages, 12 figures
Context-aware Learned Mesh-based Simulation via Trajectory-Level Meta-Learning
Simulating object deformations is a critical challenge across many scientific domains, including robotics, manufacturing, and structural mechanics. Learned Graph Network Simulators (GNSs) offer a promising alternative to traditional mesh-based physics simulators. Their speed and inherent differentiability make them particularly well suited for applications that require fast and accurate simulations, such as robotic manipulation or manufacturing optimization. However, existing learned simulators typically rely on single-step observations, which limits their ability to exploit temporal context. Without this information, these models fail to infer, e.g., material properties. Further, they rely on auto-regressive rollouts, which quickly accumulate error for long trajectories. We instead frame mesh-based simulation as a trajectory-level meta-learning problem. Using Conditional Neural Processes, our method enables rapid adaptation to new simulation scenarios from limited initial data while capturing their latent simulation properties. We utilize movement primitives to directly predict fast, stable and accurate simulations from a single model call. The resulting approach, Movement-primitive Meta-MeshGraphNet (M3GN), provides higher simulation accuracy at a fraction of the runtime cost compared to state-of-the-art GNSs across several tasks.
comment: 35 pages. Submitted to Transactions on Machine Learning Research (TMLR)
Collision Probability Estimation for Optimization-based Vehicular Motion Planning
Many motion planning algorithms for automated driving require estimating the probability of collision (POC) to account for uncertainties in the measurement and estimation of the motion of road users. Common POC estimation techniques often utilize sampling-based methods that suffer from computational inefficiency and a non-deterministic estimation, i.e., each estimation result for the same inputs is slightly different. In contrast, optimization-based motion planning algorithms require computationally efficient POC estimation, ideally using deterministic estimation, such that typical optimization algorithms for motion planning retain feasibility. Estimating the POC analytically, however, is challenging because it depends on understanding the collision conditions (e.g., vehicle's shape) and characterizing the uncertainty in motion prediction. In this paper, we propose an approach in which we estimate the POC between two vehicles by over-approximating their shapes by a multi-circular shape approximation. The position and heading of the predicted vehicle are modelled as random variables, contrasting with the literature, where the heading angle is often neglected. We guarantee that the provided POC is an over-approximation, which is essential in providing safety guarantees. For the particular case of Gaussian uncertainty in the position and heading, we present a computationally efficient algorithm for computing the POC estimate. This algorithm is then used in a path-following stochastic model predictive controller (SMPC) for motion planning. With the proposed algorithm, the SMPC generates reproducible trajectories while the controller retains its feasibility in the presented test cases and demonstrates the ability to handle varying levels of uncertainty.
comment: 14 pages, 7 figures
Allocation for Omnidirectional Aerial Robots: Incorporating Power Dynamics
Tilt-rotor aerial robots are more dynamic and versatile than fixed-rotor platforms, since the thrust vector and body orientation are decoupled. However, the coordination of servos and propellers (the allocation problem) is not trivial, especially accounting for overactuation and actuator dynamics. We incrementally build and present three novel allocation methods for tilt-rotor aerial robots, comparing them to state-of-the-art methods on a real system performing dynamic maneuvers. We extend the state-of-the-art geometric allocation into a differential allocation, which uses the platform's redundancy and does not suffer from singularities. We expand it by incorporating actuator dynamics and propeller power dynamics. These allow us to model dynamic propeller acceleration limits, bringing two main advantages: balancing propeller speed without the need for nullspace goals and allowing the platform to selectively turn off propellers during flight, opening the door to new manipulation possibilities. We also use actuator dynamics and limits to normalize the allocation problem, making it easier to tune and allowing it to track 70% faster trajectories than a geometric allocation.
GuideTouch: An Obstacle Avoidance Device with Tactile Feedback for Visually Impaired
Safe navigation for the visually impaired individuals remains a critical challenge, especially concerning head-level obstacles, which traditional mobility aids often fail to detect. We introduce GuideTouch, a compact, affordable, standalone wearable device designed for autonomous obstacle avoidance. The system integrates two vertically aligned Time-of-Flight (ToF) sensors, enabling three-dimensional environmental perception, and four vibrotactile actuators that provide directional haptic feedback. Proximity and direction information is communicated via an intuitive 4-point vibrotactile feedback system located across the user's shoulders and upper chest. For real-world robustness, the device includes a unique centrifugal self-cleaning optical cover mechanism and a sound alarm system for location if the device is dropped. We evaluated the haptic perception accuracy across 22 participants (17 male and 5 female, aged 21-48, mean 25.7, sd 6.1). Statistical analysis confirmed a significant difference between the perception accuracy of different patterns. The system demonstrated high recognition accuracy, achieving an average of 92.9% for single and double motor (primary directional) patterns. Furthermore, preliminary experiments with 14 visually impaired users validated this interface, showing a recognition accuracy of 93.75% for primary directional cues. The results demonstrate that GuideTouch enables intuitive spatial perception and could significantly improve the safety, confidence, and autonomy of users with visual impairments during independent navigation.
comment: This paper has been accepted for publication at LBR of HRI 2026 conference
DroneVLA: VLA based Aerial Manipulation
As aerial platforms evolve from passive observers to active manipulators, the challenge shifts toward designing intuitive interfaces that allow non-expert users to command these systems naturally. This work introduces a novel concept of autonomous aerial manipulation system capable of interpreting high-level natural language commands to retrieve objects and deliver them to a human user. The system is intended to integrate a MediaPipe based on Grounding DINO and a Vision-Language-Action (VLA) model with a custom-built drone equipped with a 1-DOF gripper and an Intel RealSense RGB-D camera. VLA performs semantic reasoning to interpret the intent of a user prompt and generates a prioritized task queue for grasping of relevant objects in the scene. Grounding DINO and dynamic A* planning algorithm are used to navigate and safely relocate the object. To ensure safe and natural interaction during the handover phase, the system employs a human-centric controller driven by MediaPipe. This module provides real-time human pose estimation, allowing the drone to employ visual servoing to maintain a stable, distinct position directly in front of the user, facilitating a comfortable handover. We demonstrate the system's efficacy through real-world experiments for localization and navigation, which resulted in a 0.164m, 0.070m, and 0.084m of max, mean euclidean, and root-mean squared errors, respectively, highlighting the feasibility of VLA for aerial manipulation operations.
comment: This paper has been accepted for publication at LBR of HRI 2026 conference
Warm-Starting Collision-Free Model Predictive Control With Object-Centric Diffusion
Acting in cluttered environments requires predicting and avoiding collisions while still achieving precise control. Conventional optimization-based controllers can enforce physical constraints, but they struggle to produce feasible solutions quickly when many obstacles are present. Diffusion models can generate diverse trajectories around obstacles, yet prior approaches lacked a general and efficient way to condition them on scene structure. In this paper, we show that combining diffusion-based warm-starting conditioned with a latent object-centric representation of the scene and with a collision-aware model predictive controller (MPC) yields reliable and efficient motion generation under strict time limits. Our approach conditions a diffusion transformer on the system state, task, and surroundings, using an object-centric slot attention mechanism to provide a compact obstacle representation suitable for control. The sampled trajectories are refined by an optimal control problem that enforces rigid-body dynamics and signed-distance collision constraints, producing feasible motions in real time. On benchmark tasks, this hybrid method achieved markedly higher success rates and lower latency than sampling-based planners or either component alone. Real-robot experiments with a torque-controlled Panda confirm reliable and safe execution with MPC.
comment: An open-source implementation is provided https://ahaffemayer.github.io/diffusion_warmstart_slot/
VR$^2$: A Co-Located Dual-Headset Platform for Touch-Enabled Human-Robot Interaction Research
Social-physical human-robot interaction (HRI) is difficult to study: building and programming robots integrating multiple interaction modalities is costly and slow, while VR-based prototypes often lack physical contact capabilities, breaking the visuo-tactile expectations of the user. We present VR2VR, a co-located dual-VR-headset platform for HRI research in which a participant and a hidden operator share the same physical space while experiencing different virtual embodiments. The participant sees an expressive virtual robot that interacts face-to-face in a shared virtual environment. In real time, the robot's upper-body movements, head and gaze behaviors, and facial expressions are mapped from the operator's tracked limbs and face signals. Since the operator is physically co-present and calibrated into the same coordinate frame, the operator can also touch the participant, enabling the participant to perceive robot touch synchronized with the visual perception of the robot's hands on their hands: the operator's finger and hand motion is mapped to the robot avatar using inverse kinematics to support precise contact. Beyond faithful motion retargeting for limb control, our VR2VR system supports social retargeting of multiple nonverbal cues, which can be experimentally varied and investigated while keeping the physical interaction constant. We detail the system design, calibration workflow, and safety considerations, and demonstrate how the platform can be used for experimentation and data collection in a touch-based Wizard-of-Oz HRI study, thus illustrating how VR2VR lowers barriers for rapidly prototyping and rigorously evaluating embodied, contact-based robot behaviors.
comment: 7 pages, 4 figures
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Preference-based Reinforcement Learning (PbRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human preferences, PbRL suffers from low query efficiency, as policy bias limits trajectory diversity and reduces the number of discriminable queries available for learning preferences. This paper identifies preference discriminability, which quantifies how easily a human can judge which trajectory is closer to their ideal behavior, as a key metric for improving query efficiency. To address this, we move beyond comparisons within a single policy and instead generate queries by comparing trajectories from multiple policies, as training them from scratch promotes diversity without policy bias. We propose Discriminability-Aware Policy-to-Policy Preference-Based Efficient Reinforcement Learning (DAPPER), which integrates preference discriminability with trajectory diversification achieved by multiple policies. DAPPER trains new policies from scratch after each reward update and employs a discriminator that learns to estimate preference discriminability, enabling the prioritized sampling of more discriminable queries. During training, it jointly maximizes the preference reward and preference discriminability score, encouraging the discovery of highly rewarding and easily distinguishable policies. Experiments in simulated and real-world legged robot environments demonstrate that DAPPER outperforms previous methods in query efficiency, particularly under challenging preference discriminability conditions. A supplementary video that facilitates understanding of the proposed framework and its experimental results is available at: https://youtu.be/lRwX8FNN8n4
comment: Accepted for IEEE Robotics & Automation Magazine (RAM)
Teaching Robots Like Dogs: Learning Agile Navigation from Luring, Gesture, and Speech
In this work, we aim to enable legged robots to learn how to interpret human social cues and produce appropriate behaviors through physical human guidance. However, learning through physical engagement can place a heavy burden on users when the process requires large amounts of human-provided data. To address this, we propose a human-in-the-loop framework that enables robots to acquire navigational behaviors in a data-efficient manner and to be controlled via multimodal natural human inputs, specifically gestural and verbal commands. We reconstruct interaction scenes using a physics-based simulation and aggregate data to mitigate distributional shifts arising from limited demonstration data. Our progressive goal cueing strategy adaptively feeds appropriate commands and navigation goals during training, leading to more accurate navigation and stronger alignment between human input and robot behavior. We evaluate our framework across six real-world agile navigation scenarios, including jumping over or avoiding obstacles. Our experimental results show that our proposed method succeeds in almost all trials across these scenarios, achieving a 97.15% task success rate with less than 1 hour of demonstration data in total.
comment: 10 pages, 7 figures
PAD-TRO: Projection-Augmented Diffusion for Direct Trajectory Optimization
Recently, diffusion models have gained popularity and attention in trajectory optimization due to their capability of modeling multi-modal probability distributions. However, addressing nonlinear equality constraints, i.e, dynamic feasibility, remains a great challenge in diffusion-based trajectory optimization. Recent diffusion-based trajectory optimization frameworks rely on a single-shooting style approach where the denoised control sequence is applied to forward propagate the dynamical system, which cannot explicitly enforce constraints on the states and frequently leads to sub-optimal solutions. In this work, we propose a novel direct trajectory optimization approach via model-based diffusion, which directly generates a sequence of states. To ensure dynamic feasibility, we propose a gradient-free projection mechanism that is incorporated into the reverse diffusion process. Our results show that, compared to a recent state-of-the-art baseline, our approach leads to zero dynamic feasibility error and approximately 4x higher success rate in a quadrotor waypoint navigation scenario involving dense static obstacles.
comment: Accepted for publication at the 2026 American Control Conference
Who Is Responsible? Self-Adaptation Under Multiple Concurrent Uncertainties With Unknown Sources in Complex ROS-Based Systems
Robotic systems increasingly operate in dynamic, unpredictable environments, where tightly coupled sensors and software modules increase the probability of a single fault cascading across components and admitting multiple plausible strategies to resolve the underlying uncertainty. Most existing self-adaptive approaches that have been applied to robotics assume predefined one-to-one uncertainty-to-adaptation mappings. We present a ROS2-based self-adaptive approach building upon the MAPE-K feedback loop that addresses (1) multiple simultaneous uncertainties with differing criticality, (2) cascading uncertainties across components, and (3) multiple plausible resolving strategies per detected symptom. Central to our approach is an adaptation rule set which lets designers specify uncertainty patterns, assign criticality levels, and enumerate multiple plausible adaptation strategies. This rule set, combined with an automatically extracted live ROS2 dependency graph, enables lightweight root-cause analysis and strategy ranking to prioritize minimal and effective adaptations. Evaluations on an underwater robot scenario and a perception use case show that our approach can identify root causes among concurrent uncertainties, favours inexpensive adaptations, reduces unnecessary adaptations, and achieves performance comparable to existing baselines designed for sequential uncertainties. The code is publicly available.
comment: submitted to ACSOS 2025
Multiagent Systems
From Who They Are to How They Act: Behavioral Traits in Generative Agent-Based Models of Social Media
Generative Agent-Based Modeling (GABM) leverages Large Language Models to create autonomous agents that simulate human behavior in social media environments, demonstrating potential for modeling information propagation, influence processes, and network phenomena. While existing frameworks characterize agents through demographic attributes, personality traits, and interests, they lack mechanisms to encode behavioral dispositions toward platform actions, causing agents to exhibit homogeneous engagement patterns rather than the differentiated participation styles observed on real platforms. In this paper, we investigate the role of behavioral traits as an explicit characterization layer to regulate agents' propensities across posting, re-sharing, commenting, reacting, and inactivity. Through large-scale simulations involving 980 agents and validation against real-world social media data, we demonstrate that behavioral traits are essential to sustain heterogeneous, profile-consistent participation patterns and enable realistic content propagation dynamics through the interplay of amplification- and interaction-oriented profiles. Our findings establish that modeling how agents act-not only who they are-is necessary for advancing GABM as a tool for studying social media phenomena.
An Agentic Operationalization of DISARM for FIMI Investigation on Social Media
The interoperability of data and intelligence across allied partners and their respective end-user groups is considered a foundational enabler to the collective defense capability--both conventional and hybrid--of NATO countries. Foreign Information Manipulation and Interference (FIMI) and related hybrid activities are conducted across various societal dimensions and infospheres, posing an ever greater challenge to the characterization of threats, sustaining situational awareness, and response coordination. Recent advances in AI have further led to the decreasing cost of AI-augmented trolling and interference activities, such as through the generation and amplification of manipulative content. Despite the introduction of the DISARM framework as a standardized metadata and analytical framework for FIMI, operationalizing it at the scale of social media remains a challenge. We propose a framework-agnostic agent-based operationalization of DISARM to investigate FIMI on social media. We develop a multi-agent pipeline in which specialized agentic AI components collaboratively (1) detect candidate manipulative behaviors, and (2) map these behaviors onto standard DISARM taxonomies in a transparent manner. We evaluated the approach on two real-world datasets annotated by domain practitioners. We demonstrate that our approach is effective in scaling the predominantly manual and heavily interpretive work of FIMI analysis, providing a direct contribution to enhancing the situational awareness and data interoperability in the context of operating in media and information-rich settings.
Multi-Agent Constraint Factorization Reveals Latent Invariant Solution Structure
Multi-agent systems (MAS) composed of large language models often exhibit improved problem-solving performance despite operating on identical information. In this work, we provide a formal explanation for this phenomenon grounded in operator theory and constrained optimization. We model each agent as enforcing a distinct family of validity constraints on a shared solution state, and show that a MAS implements a factorized composition of constraint-enforcement operators. Under mild conditions, these dynamics converge to invariant solution sets defined by the intersection of agent constraint sets. Such invariant structures are generally not dynamically accessible to a single agent applying all constraints simultaneously, even when expressive capacity and information are identical. We extend this result from exact constraint enforcement to soft constraints via proximal operators, and apply the formalism to contemporary text-based dialog systems.
Game-Theoretic Lens on LLM-based Multi-Agent Systems
Large language models (LLMs) have demonstrated strong reasoning, planning, and communication abilities, enabling them to operate as autonomous agents in open environments. While single-agent systems remain limited in adaptability and coordination, recent progress has shifted attention toward multi-agent systems (MAS) composed of interacting LLMs that pursue cooperative, competitive, or mixed objectives. This emerging paradigm provides a powerful testbed for studying social dynamics and strategic behaviors among intelligent agents. However, current research remains fragmented and lacks a unifying theoretical foundation. To address this gap, we present a comprehensive survey of LLM-based multi-agent systems through a game-theoretic lens. By organizing existing studies around the four key elements of game theory: players, strategies, payoffs, and information, we establish a systematic framework for understanding, comparing, and guiding future research on the design and analysis of LLM-based MAS.
comment: 9 pages, 5 figures
INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems
The rapid advancement of Large Language Model (LLM)-based Multi-Agent Systems (MAS) has introduced significant security vulnerabilities, where malicious influence can propagate virally through inter-agent communication. Conventional safeguards often rely on a binary paradigm that strictly distinguishes between benign and attack agents, failing to account for infected agents i.e., benign entities converted by attack agents. In this paper, we propose Infection-Aware Guard, INFA-Guard, a novel defense framework that explicitly identifies and addresses infected agents as a distinct threat category. By leveraging infection-aware detection and topological constraints, INFA-Guard accurately localizes attack sources and infected ranges. During remediation, INFA-Guard replaces attackers and rehabilitates infected ones, avoiding malicious propagation while preserving topological integrity. Extensive experiments demonstrate that INFA-Guard achieves state-of-the-art performance, reducing the Attack Success Rate (ASR) by an average of 33%, while exhibiting cross-model robustness, superior topological generalization, and high cost-effectiveness.
Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems
Graph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. While prior work shows that GraphRAG responses may leak retrieved subgraphs, the feasibility of query-efficient reconstruction of the hidden graph structure remains unexplored under realistic query budgets. We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA (Agentic Graph Extraction Attack), a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline combining lightweight discovery with LLM-based filtering. We evaluate AGEA on medical, agriculture, and literary datasets across Microsoft-GraphRAG and LightRAG systems. Under identical query budgets, AGEA significantly outperforms prior attack baselines, recovering up to 90% of entities and relationships while maintaining high precision. These results demonstrate that modern GraphRAG systems are highly vulnerable to structured, agentic extraction attacks, even under strict query limits.
MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks
While multi-agent systems (MAS) promise elevated intelligence through coordination of agents, current approaches to automatic MAS design under-deliver. Such shortcomings stem from two key factors: (1) methodological complexity - agent orchestration is performed using sequential, code-level execution that limits global system-level holistic reasoning and scales poorly with agent complexity - and (2) efficacy uncertainty - MAS are deployed without understanding if there are tangible benefits compared to single-agent systems (SAS). We propose MAS-Orchestra, a training-time framework that formulates MAS orchestration as a function-calling reinforcement learning problem with holistic orchestration, generating an entire MAS at once. In MAS-Orchestra, complex, goal-oriented sub-agents are abstracted as callable functions, enabling global reasoning over system structure while hiding internal execution details. To rigorously study when and why MAS are beneficial, we introduce MASBENCH, a controlled benchmark that characterizes tasks along five axes: Depth, Horizon, Breadth, Parallel, and Robustness. Our analysis reveals that MAS gains depend critically on task structure, verification protocols, and the capabilities of both orchestrator and sub-agents, rather than holding universally. Guided by these insights, MAS-Orchestra achieves consistent improvements on public benchmarks including mathematical reasoning, multi-hop QA, and search-based QA. Together, MAS-Orchestra and MASBENCH enable better training and understanding of MAS in the pursuit of multi-agent intelligence.
comment: Preprint; Work in Progress
Agent Identity URI Scheme: Topology-Independent Naming and Capability-Based Discovery for Multi-Agent Systems
Multi-agent systems face a fundamental architectural flaw: agent identity is bound to network location. When agents migrate between providers, scale across instances, or federate across organizations, URI-based identity schemes break references, fragment audit trails, and require centralized coordination. We propose the agent:// URI scheme, which decouples identity from topology through three orthogonal components: a trust root establishing organizational authority, a hierarchical capability path enabling semantic discovery, and a sortable unique identifier providing stable reference. The scheme enables capability-based discovery through DHT key derivation, where queries return agents by what they do rather than where they are. Trust-root scoping prevents cross-organization pollution while permitting federation when desired. Cryptographic attestation via PASETO tokens binds capability claims to agent identity, enabling verification without real-time contact with the issuing authority. We evaluate the scheme across four dimensions: capability expressiveness (100% coverage on 369 production tools with zero collision), discovery precision (F1=1.0 across 10,000 agents), identity stability (formal proofs of migration invariance), and performance (all operations under 5 microseconds). The agent:// URI scheme provides a formally-specified, practically-evaluated foundation for decentralized agent identity and capability-based discovery.
comment: 19 pages, 6 tables. Feedback welcome on DHT incentive models and capability mapping service design
MiRAGE: A Multiagent Framework for Generating Multimodal Multihop Question-Answer Dataset for RAG Evaluation ACL
The rapid evolution of Retrieval-Augmented Generation (RAG) toward multimodal, high-stakes enterprise applications has outpaced the development of domain specific evaluation benchmarks. Existing datasets often rely on general-domain corpora or purely textual retrieval, failing to capture the complexity of specialized technical documents where information is inextricably multimodal and reasoning requires synthesizing disjoint evidence. We address this gap by introducing MiRAGE, a Multiagent framework for RAG systems Evaluation, that leverages a collaborative swarm of specialized agents to generate verified, domain-specific, multimodal, and multi-hop Question-Answer datasets. MiRAGE orchestrates a swarm of specialized agents: a recursive context optimization loop to aggregate scattered evidence, an adversarial verifier agent to guarantee factual grounding, and an agent to recognize the expert persona and the relevant domain to mimic expert cognitive workflows. Extensive empirical evaluation across four distinct domains (regulations, finance, quantitative biology, and journalism) demonstrates that MiRAGE generates datasets with significantly higher reasoning complexity (>2.3 average hops) and factual faithfulness. Our ablation studies point that MiRAGE can be powered by LLMs if textual descriptions of the images are available. Visual grounding still remains a frontier. By automating the creation of gold standard evaluation datasets that reflect the latent thematic structure of proprietary corpora, MiRAGE provides the necessary infrastructure to rigorously benchmark the next generation information retrieval systems.
comment: 12 pages, 2 figures, Submitted to ACL
Computational Foundations for Strategic Coopetition: Formalizing Collective Action and Loyalty
Mixed-motive multi-agent settings are rife with persistent free-riding because individual effort benefits all members equally, yet each member bears the full cost of their own contribution. Classical work by Holmström established that under pure self-interest, Nash equilibrium is universal shirking. While i* represents teams as composite actors, it lacks scalable computational mechanisms for analyzing how collective action problems emerge and resolve in coopetitive settings. This technical report extends computational foundations for strategic coopetition to team-level dynamics, building on companion work formalizing interdependence/complementarity (arXiv:2510.18802) and trust dynamics (arXiv:2510.24909). We develop loyalty-moderated utility functions with two mechanisms: loyalty benefit (welfare internalization plus intrinsic contribution satisfaction) and cost tolerance (reduced effort burden for loyal members). We integrate i* structural dependencies through dependency-weighted team cohesion, connecting member incentives to team-level positioning. The framework applies to both human teams (loyalty as psychological identification) and multi-agent systems (alignment coefficients and adjusted cost functions). Experimental validation across 3,125 configurations demonstrates robust loyalty effects (15.04x median effort differentiation). All six behavioral targets achieve thresholds: free-riding baseline (96.5%), loyalty monotonicity (100%), effort differentiation (100%), team size effect (100%), mechanism synergy (99.5%), and bounded outcomes (100%). Empirical validation using published Apache HTTP Server (1995-2023) case study achieves 60/60 points, reproducing contribution patterns across formation, growth, maturation, and governance phases. Statistical significance confirmed at p<0.001, Cohen's d=0.71.
comment: 68 pages, 22 figures. Third technical report in research program; should be read with companion arXiv:2510.18802 and arXiv:2510.24909. Adapts and extends complex actor material from Pant (2021) doctoral dissertation, University of Toronto
Constrained Black-Box Attacks Against Cooperative Multi-Agent Reinforcement Learning
Collaborative multi-agent reinforcement learning has rapidly evolved, offering state-of-the-art algorithms for real-world applications, including sensitive domains. However, a key challenge to its widespread adoption is the lack of a thorough investigation into its vulnerabilities to adversarial attacks. Existing work predominantly focuses on training-time attacks or unrealistic scenarios, such as access to policy weights or the ability to train surrogate policies. In this paper, we investigate new vulnerabilities under more challenging and constrained conditions, assuming an adversary can only collect and perturb the observations of deployed agents. We also consider scenarios where the adversary has no access at all (no observations, actions, or weights). Our main approach is to generate perturbations that intentionally misalign how victim agents see their environment. Our approach is empirically validated on three benchmarks and 22 environments, demonstrating its effectiveness across diverse algorithms and environments. Furthermore, we show that our algorithm is sample-efficient, requiring only 1,000 samples compared to the millions needed by previous methods.
Sequential Causal Normal Form Games: Theory, Computation, and Strategic Signaling AAAI 2026
Can classical game-theoretic frameworks be extended to capture the bounded rationality and causal reasoning of AI agents? We investigate this question by extending Causal Normal Form Games (CNFGs) to sequential settings, introducing Sequential Causal Multi-Agent Systems (S-CMAS) that incorporate Pearl's Causal Hierarchy across leader-follower interactions. While theoretically elegant -- we prove PSPACE-completeness, develop equilibrium refinements, and establish connections to signaling theory -- our comprehensive empirical investigation reveals a critical limitation: S-CNE provides zero welfare improvement over classical Stackelberg equilibrium across all tested scenarios. Through 50+ Monte Carlo simulations and hand-crafted synthetic examples, we demonstrate that backward induction with rational best-response eliminates any strategic advantage from causal layer distinctions. We construct a theoretical example illustrating conditions where benefits could emerge ($ε$-rational satisficing followers), though implementation confirms that even relaxed rationality assumptions prove insufficient when good instincts align with optimal play. This negative result provides valuable insight: classical game-theoretic extensions grounded in rational choice are fundamentally incompatible with causal reasoning advantages, motivating new theoretical frameworks beyond standard Nash equilibrium for agentic AI.
comment: AAAI 2026 Workshop on Foundations of Agentic Systems Theory
Robust Verification of Concurrent Stochastic Games
Autonomous systems often operate in multi-agent settings and need to make concurrent, strategic decisions, typically in uncertain environments. Verification and control problems for these systems can be tackled with concurrent stochastic games (CSGs), but this model requires transition probabilities to be precisely specified - an unrealistic requirement in many real-world settings. We introduce *robust CSGs* and their subclass *interval CSGs* (ICSGs), which capture epistemic uncertainty about transition probabilities in CSGs. We propose a novel framework for *robust* verification of these models under worst-case assumptions about transition uncertainty. Specifically, we develop the underlying theoretical foundations and efficient algorithms, for finite- and infinite-horizon objectives in both zero-sum and nonzero-sum settings, the latter based on (social-welfare optimal) Nash equilibria. We build an implementation in the PRISM-games model checker and demonstrate the feasibility of robust verification of ICSGs across a selection of large benchmarks.
comment: Extended version of a paper accepted to TACAS 2026. Main text: 17 pages, 2 figures, 2 tables; Appendix: 37 pages, 3 figures, 3 tables. Minor revisions and clarifications to the appendix; no changes to results
Memp: Exploring Agent Procedural Memory
Large Language Models (LLMs) based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a learnable, updatable, and lifelong procedural memory. We propose Memp that distills past agent trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions, and explore the impact of different strategies for Build, Retrieval, and Update of procedural memory. Coupled with a dynamic regimen that continuously updates, corrects, and deprecates its contents, this repository evolves in lockstep with new experience. Empirical evaluation on TravelPlanner and ALFWorld shows that as the memory repository is refined, agents achieve steadily higher success rates and greater efficiency on analogous tasks. Moreover, procedural memory built from a stronger model retains its value: migrating the procedural memory to a weaker model can also yield substantial performance gains. Code is available at https://github.com/zjunlp/MemP.
comment: Work in progress
What Makes AI Research Replicable? Executable Knowledge Graphs as Scientific Knowledge Representations
Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of retrieval-augmented generation (RAG) methods, which fail to capture latent technical details hidden in referenced papers. Furthermore, previous approaches tend to overlook valuable implementation-level code signals and lack structured knowledge representations that support multi-granular retrieval and reuse. To overcome these challenges, we propose Executable Knowledge Graphs (xKG), a pluggable, paper-centric knowledge base that automatically integrates code snippets and technical insights extracted from scientific literature. When integrated into three agent frameworks with two different LLMs, xKG shows substantial performance gains (10.9% with o3-mini) on PaperBench, demonstrating its effectiveness as a general and extensible solution for automated AI research replication. Code is available at https://github.com/zjunlp/xKG.
comment: Work in progress
Manalyzer: End-to-end Automated Meta-analysis with Multi-agent System
Meta-analysis is a systematic research methodology that synthesizes data from multiple existing studies to derive comprehensive conclusions. This approach not only mitigates limitations inherent in individual studies but also facilitates novel discoveries through integrated data analysis. Traditional meta-analysis involves a complex multi-stage pipeline including literature retrieval, paper screening, and data extraction, which demands substantial human effort and time. However, while LLM-based methods can accelerate certain stages, they still face significant challenges, such as hallucinations in paper screening and data extraction. In this paper, we propose a multi-agent system, Manalyzer, which achieves end-to-end automated meta-analysis through tool calls. The hybrid review, hierarchical extraction, self-proving, and feedback checking strategies implemented in Manalyzer significantly alleviate these two hallucinations. To comprehensively evaluate the performance of meta-analysis, we construct a new benchmark comprising 729 papers across 3 domains, encompassing text, image, and table modalities, with over 10,000 data points. Extensive experiments demonstrate that Manalyzer achieves significant performance improvements over the LLM baseline in multi meta-analysis tasks. Project page: https://black-yt.github.io/meta-analysis-page/ .
Enabling Agents to Communicate Entirely in Latent Space
While natural language is the de facto communication medium for LLM-based agents, it presents a fundamental constraint. The process of downsampling rich, internal latent states into discrete tokens inherently limits the depth and nuance of information that can be transmitted, thereby hindering collaborative problem-solving. Inspired by telepathy, which bypasses symbolic language in communication, we propose Interlat (Inter-agent Latent Space Communication), a paradigm that leverages the continuous last hidden states of an LLM as a representation of its thought for direct communication (termed latent communication). An additional learned compression process further compresses latent communication via latent space reasoning. Experiments demonstrate that Interlat outperforms both fine-tuned chain-of-thought (CoT) prompting and single-agent baselines, even across heterogeneous models, promoting more exploratory behavior and enabling genuine utilization of latent information. Further compression not only substantially accelerates inference by up to 24 times but also maintains competitive performance through an efficient information-preserving mechanism. We position this work as a feasibility study of entirely latent space inter-agent communication, and our results highlight its potential, offering valuable insights for future research.
comment: Work in progess
Where do We Poop? City-Wide Simulation of Defecation Behavior for Wastewater-Based Epidemiology
Wastewater surveillance, which regularly examines the pathogen biomarkers in wastewater samples, is a valuable tool for monitoring infectious diseases circulating in communities. Yet, most wastewater-based epidemiology methods, which use wastewater surveillance results for disease inferences, implicitly assume that individuals excrete only at their residential locations and that the population contribute to wastewater samples are static. These simplifying assumptions ignore daily mobility, social interactions, and heterogeneous toilet use behavior patterns, which can lead to biased interpretation of wastewater results, especially at upstream sampling locations such as neighborhoods, institutions, or buildings. Here, we introduce an agent-based geospatial simulation framework: Building on an established Patterns of Life model, we simulate daily human activities, mobility, and social contacts within a realistic urban environment and extend this agent-based framework with a physiologically motivated defecation cycle and toilet usage patterns. We couple this behavioral model with an infectious disease model to simulate transmissions through spatial and social interactions. When a defecation occurs for an infected agent, we use a pathogen shedding model to determine the amount of pathogen shed in the feces. Such a framework, integrating population mobility, disease transmission, toilet use behavior, and pathogen shedding models, is capable to simulate the Spatial-temporal dynamics of wastewater signals for a city. Using a case study of 10,000 simulated agents in Fulton County, Georgia, we examine how varying infection rates alter epidemic trajectories, pathogen loads in wastewater, and the spatial distribution of contamination across time.
Systems and Control (EESS)
TTCBF: A Truncated Taylor Control Barrier Function for High-Order Safety Constraints
Control Barrier Functions (CBFs) enforce safety by rendering a prescribed safe set forward invariant. However, standard CBFs are limited to safety constraints with relative degree one, while High-Order CBF (HOCBF) methods address higher relative degree at the cost of introducing a chain of auxiliary functions and multiple class K functions whose tuning scales with the relative degree. In this paper, we introduce a Truncated Taylor Control Barrier Function (TTCBF), which generalizes standard discrete-time CBFs to consider high-order safety constraints and requires only one class K function, independent of the relative degree. We also propose an adaptive variant, adaptive TTCBF (aTTCBF), that optimizes an online gain on the class K function to improve adaptability, while requiring fewer control design parameters than existing adaptive HOCBF variants. Numerical experiments in a relative-degree-six spring-mass system and a cluttered corridor navigation validate the above theoretical findings.
Stochastic EMS for Optimal 24/7 Carbon-Free Energy Operations
This paper proposes a two-stage stochastic optimization formulation to determine optimal operation and procurement plans for achieving a 24/7 carbon-free energy (CFE) compliance at minimized cost. The system in consideration follows primary energy technologies in Thailand including solar power, battery storage, and a diverse portfolio of renewable and carbon-based energy procurement sources. Unlike existing literature focused on long-term planning, this study addresses near real-time operations using a 15-minute resolution. A novel feature of the formulation is the explicit treatment of CFE compliance as a model parameter, enabling flexible targets such as a minimum percentage of hourly matching or a required number of carbon-free days within a multi-day horizon. The mixed-integer linear programming formulation accounts for uncertainties in load and solar generation by integrating deep learning-based forecasting within a receding horizon framework. By optimizing battery profiles and multi-source procurement simultaneously, the proposed system provides a feasible pathway for transitioning to carbon-free operations in emerging energy markets.
comment: 23 pages
Instantaneous Frequency in Power Systems using the Teager-Kaiser Energy Operator
This paper develops an instantaneous-frequency (IF) local estimator calculated with the complex Teager-Kaiser energy operator (CTKEO) and the dynamic-signal identity. The contribution is a novel IF expression that makes the envelope-curvature terms explicit, thus correcting the bias that affects conventional estimators used in power systems. The estimator aligns with complex-frequency (CF) kinematics and admits a geometric interpretation (curvature/torsion) without phase unwrapping. Simulations and data-driven examples demonstrate the accuracy of the proposed approach.
The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems
Modern CI/CD pipelines integrating agent-generated code exhibit a structural failure in responsibility attribution. Decisions are executed through formally correct approval processes, yet no entity possesses both the authority to approve those decisions and the epistemic capacity to meaningfully understand their basis. We define this condition as responsibility vacuum: a state in which decisions occur, but responsibility cannot be attributed because authority and verification capacity do not coincide. We show that this is not a process deviation or technical defect, but a structural property of deployments where decision generation throughput exceeds bounded human verification capacity. We identify a scaling limit under standard deployment assumptions, including parallel agent generation, CI-based validation, and individualized human approval gates. Beyond a throughput threshold, verification ceases to function as a decision criterion and is replaced by ritualized approval based on proxy signals. Personalized responsibility becomes structurally unattainable in this regime. We further characterize a CI amplification dynamic, whereby increasing automated validation coverage raises proxy signal density without restoring human capacity. Under fixed time and attention constraints, this accelerates cognitive offloading in the broad sense and widens the gap between formal approval and epistemic understanding. Additional automation therefore amplifies, rather than mitigates, the responsibility vacuum. We conclude that unless organizations explicitly redesign decision boundaries or reassign responsibility away from individual decisions toward batch- or system-level ownership, responsibility vacuum remains an invisible but persistent failure mode in scaled agent deployments.
Electrical Design of a Clean Offshore Heat and Power (CleanOFF) Hub
This paper presents an innovative offshore solution where oil & gas platform clusters are powered by a wind farm and a hydrogen hub. The results show a feasible off-grid design as an alternative to conventional electrification solutions. To address the challenges of design and operation of such a system, a power system model of the equipment and control was developed in a power system simulator called Process Power Simulator (PPSim). Power fluctuations in the wind farm are modelled using a state-of-the-art method encompassing turbulence and wakes. Various operation scenarios were used to evaluate the system design and find the right equipment size. An expensive component to over dimension is the battery energy storage system (BESS). The BESS power rating and energy capacity were found by running a combination of scenarios with extreme and natural wind variations, and contingencies. The control strategy and ramp rates of electrolyzers have significant impact on both system performance and design. A ramp rate in the order of seconds as opposed to minutes will decrease the required BESS size by 60-70%. Choosing synchronized control of the electrolyzers can further reduce the BESS size by 15-20%. The simulations also revealed challenges to achieve self-sufficiency of hydrogen and potential design improvements are suggested.
Stealthy bias injection attack detection based on Kullback-Leibler divergence in stochastic linear systems
This paper studies the design of detection observers against stealthy bias injection attacks in stochastic linear systems under Gaussian noise, considering adversaries that exploit noise and inject crafted bias signals into a subset of sensors in a slow and coordinated manner, thereby achieving malicious objectives while remaining stealthy. To address such attacks, we formulate the observer design as a max-min optimization problem to enhance the detectability of worst-case BIAs, which attain a prescribed attack impact with the least detectability evaluated via Kullback-Leibler divergence. To reduce the computational complexity of the derived non-convex design problem, we consider the detectability of worst-case BIAs at three specific time instants: attack onset, one step after attack occurrence, and the steady state. We prove that the Kalman filter is optimal for maximizing the BIA detectability at the attack onset, regardless of the subset of attacked sensors. For the one-step and steady-state cases, the observer design problems are approximated by bi-convex optimization problems, which can be efficiently solved using alternating optimization and alternating direction method of multipliers. Moreover, more tractable linear matrix inequality relaxations are developed. Finally, the effectiveness of the proposed stealth-aware detection framework is demonstrated through an application to a thermal system.
comment: 26 pages, 10 figures
Rank-one Riemannian Subspace Descent for Nonlinear Matrix Equations
We propose a rank-one Riemannian subspace descent algorithm for computing symmetric positive definite (SPD) solutions to nonlinear matrix equations arising in control theory, dynamic programming, and stochastic filtering. For solution matrices of size $n\times n$, standard approaches for dense matrix equations typically incur $\mathcal{O}(n^3)$ cost per-iteration, while the efficient $\mathcal{O}(n^2)$ methods either rely on sparsity or low-rank solutions, or have iteration counts that scale poorly. The proposed method entails updating along the dominant eigen-component of a transformed Riemannian gradient, identified using at most $\mathcal{O}(\log(n))$ power iterations. The update structure also enables exact step-size selection in many cases at minimal additional cost. For objectives defined as compositions of standard matrix operations, each iteration can be implemented using only matrix--vector products, yielding $\mathcal{O}(n^2)$ arithmetic cost. We prove an $\mathcal{O}(n)$ iteration bound under standard smoothness assumptions, with improved bounds under geodesic strong convexity. Numerical experiments on large-scale CARE, DARE, and other nonlinear matrix equations show that the proposed algorithm solves instances (up to $n=10{,}000$ in our tests) for which the compared solvers, including MATLAB's \texttt{icare}, structure-preserving doubling, and subspace-descent baselines fail to return a solution. These results demonstrate that rank-one manifold updates provide a practical approach for high-dimensional and dense SPD-constrained matrix equations. MATLAB code implementation is publicly available on GitHub : \href{https://github.com/yogeshd-iitk/nonlinear_matrix_equation_R1RSD}{\textcolor{blue}{https://github.com/yogeshd-iitk/nonlinear\_matrix \_equation\_R1RSD}}
Practical prescribed-time prescribed performance control with asymptotic convergence -- A vanishing sigma-modification approach
In this paper, we present a method capable of ensuring practical prescribed-time control with guaranteed performance for a class of nonlinear systems in the presence of time-varying parametric and dynamic uncertainties, and uncertain control coefficients. Our design consists of two key steps. First, we construct a performance-rate function that freezes at and after a user-specified time T, playing a crucial role in achieving desired precision within prescribed time T and dealing with unmodeled dynamics. Next, based on this function and a sigma-modification strategy in which the leakage term starts to vanish at t > T, we develop an adaptive dynamic surface control framework to reduce control complexity, deal with uncertainties, ensure prescribed performance, practical prescribed-time convergence to a specific region, and ultimately achieve asymptotic convergence. The effectiveness of the proposed control method is validated through numerical simulations.
Contingency Planning for Safety-Critical Autonomous Vehicles: A Review and Perspectives
Contingency planning is the architectural capability that enables autonomous vehicles (AVs) to anticipate and mitigate discrete, high-impact hazards, such as sensor outages and adversarial interactions. This paper presents a comprehensive survey of the field, synthesizing fragmented literature into a unified logic-conditioned hybrid control framework. Within this formalism, we categorize approaches into two distinct paradigms: Reactive Safety, which responds to realized hazards by enforcing safety constraints or executing fail-safe maneuvers; and Proactive Safety, which optimizes for future recourse by branching over potential modal transitions. In addition, we propose a fine-grained taxonomy that partitions the landscape into external contingencies (environmental and interactive hazards) and internal contingencies (system faults). Through a critical comparative analysis, we reveal a fundamental structural divergence: internal faults are predominantly addressed via reactive fail-safe mechanisms, whereas external interaction uncertainties increasingly require proactive branching strategies. Furthermore, we identify a critical methodological divergence: whereas physical hazards are typically managed with formal guarantees, semantic and out-of-distribution anomalies currently rely heavily on empirical validation. We conclude by identifying the open challenges in bridging the gap between theoretical guarantees and practical validation, advocating for hybrid architectures and standardized benchmarking to transition contingency planning from formulation to certifiable real-world deployment.
comment: 23 pages, 6 figures
Blended Dynamics and Emergence in Open Quantum Networks
In this paper, we develop a blended dynamics framework for open quantum networks with diffusive couplings. The network consists of qubits interconnected through Hamiltonian couplings, environmental dissipation, and consensus-like diffusive interactions. Such networks commonly arise in spontaneous emission processes and non-Hermitian quantum computing, and their evolution follows a Lindblad master equation. Blended dynamics theory is well established in the classical setting as a tool for analyzing emergent behaviors in heterogeneous networks with diffusive couplings. Its key insight is to blend the local dynamics rather than the trajectories of individual nodes. Perturbation analysis then shows that, under sufficiently strong coupling, all node trajectories tend to stay close to those of the blended system over time. We first show that this theory extends naturally to the reduced-state dynamics of quantum networks, revealing classical-like clustering phenomena in which qubits converge to a shared equilibrium or a common trajectory determined by the quantum blended reduced-state dynamics. We then extend the analysis to qubit coherent states using quantum Laplacians and induced graphs, proving orbit attraction of the network density operator toward the quantum blended coherent dynamics, establishing the emergence of intrinsically quantum and dynamically clustering behaviors. Finally, numerical examples validate the theoretical results.
Differential Privacy on Affine Manifolds: Geometrically Confined Privacy in Linear Dynamical Systems
In this paper, we present a comprehensive framework for differential privacy over affine manifolds and validate its usefulness in the contexts of differentially private cloud-based control and average consensus. We consider differential privacy mechanisms for linear queries when the input data are constrained to lie on affine manifolds, a structural property that is assumed to be available as prior knowledge to adversaries. In this setting, the definition of neighborhood adjacency must be formulated with respect to the intrinsic geometry of the manifolds. We demonstrate that such affine-manifold constraints can fundamentally alter the attainable privacy levels relative to the unconstrained case. In particular, we derive necessary and sufficient conditions under which differential privacy can be realized via structured noise injection mechanisms, wherein correlated Gaussian or Laplace noise distributions, rather than i.i.d. perturbations, are calibrated to the dataset. Based on these characterizations, we develop explicit noise calibration procedures that guarantee the tight realization of any prescribed privacy budget with a matching noise magnitude. Finally, we show that the proposed framework admits direct applications to linear dynamical systems ranging from differentially private cloud-based control to privacy-preserving average consensus, all of which naturally involve affine-manifold constraints. The established theoretical results are illustrated through numerical examples.
Hierarchical Optimization Based Multi-objective Dynamic Regulation Scheme for VANET Topology
As a core technology of intelligent transportation systems, vehicular ad-hoc networks support latency-sensitive services such as safety warning and cooperative perception via vehicle-to-everything communications. However, their highly dynamic topology increases average path length, raises latency, and reduces throughput, severely limiting communication performance. Existing topology optimization methods lack capabilities in multi-objective coordination, dynamic adaptation, and global-local synergy. To address this, this paper proposes a two-layer dynamic topology regulation scheme combining local feature aggregation and global adjustment. The scheme constructs a dynamic multi-objective optimization model integrating average path length, end-to-end latency, and network throughput, and achieves multi-index coordination via link adaptability metrics and a dynamic normalization mechanism. it quickly responds to local link changes via feature fusion of local node feature extraction and dynamic neighborhood sensing, and balances optimization accuracy and real-time performance using a dual-mode adaptive solving strategy for global topology adjustment. It reduces network oscillation risks by introducing a performance improvement threshold and a topology validity verification mechanism. Simulation results on real urban road networks via the SUMO platform show that the proposed scheme outperforms traditional methods in average path length (stabilizing at ~4 hops), end-to-end latency (remaining ~0.01 s), and network throughput.
comment: 10 pages, 6 figures. A topology optimization strategy is proposed in this paper to optimize the latency, average path length, and throughput in Vehicular Ad Hoc Networks (VANETs)
Ramping-aware Enhanced Flexibility Aggregation of Distributed Generation with Energy Storage in Power Distribution Networks
Power distribution networks are increasingly hosting controllable and flexible distributed energy resources (DERs) that, when aggregated, can provide ancillary support to transmission systems. However, existing aggregation schemes often ignore the ramping constraints of these DERs, which can render them impractical in real deployments. This work proposes a ramping-aware flexibility aggregation scheme, computed at the transmission-distribution boundary, that explicitly accounts for DER ramp limits and yields flexibility envelopes that are provably disaggregable. To further enhance the attainable flexibility region, we introduce a novel pre-ramping strategy, which proactively adjusts resource operating points to enlarge the aggregated flexibility envelope while preserving both network feasibility and disaggregation guarantees. The proposed method demonstrates a 5.2% to 19.2% improvement in flexibility relative to the baseline model, depending on system conditions. We validate the scheme on an IEEE-33 bus distribution system and provide formal proofs showing that both aggregation strategies are disaggregable for all feasible trajectories within the aggregate flexibility envelope.
comment: 10 pages, 4 figures
Efficient reformulations of ReLU deep neural networks for surrogate modelling in power system optimisation
The ongoing decarbonisation of power systems is driving an increasing reliance on distributed energy resources, which introduces complex and nonlinear interactions that are difficult to capture in conventional optimisation models. As a result, machine learning based surrogate modelling has emerged as a promising approach, but integrating machine learning models such as ReLU deep neural networks (DNNs) directly into optimisation often results in nonconvex and computationally intractable formulations. This paper proposes a linear programming (LP) reformulation for a class of convexified ReLU DNNs with non-negative weight matrices beyond the first layer, enabling a tight and tractable embedding of learned surrogate models in optimisation. We evaluate the method using a case study on learning the prosumer's responsiveness within an aggregator bidding problem in the Danish tertiary capacity market. The proposed reformulation is benchmarked against state-of-the-art alternatives, including piecewise linearisation (PWL), MIP-based embedding, and other LP relaxations. Across multiple neural network architectures and market scenarios, the convexified ReLU DNN achieves solution quality comparable to PWL and MIP-based reformulations while significantly improving computational performance and preserving model fidelity, unlike penalty-based reformulations. The results demonstrate that convexified ReLU DNNs offer a scalable and reliable methodology for integrating learned surrogate models in optimisation, with applicability to a wide range of emerging power system applications.
comment: 24 pages, 7 figures, 3 tables
A Two-Stage Risk-Averse DRO-MILP Methodological Framework for Managing AI/Data Center Demand Shocks
The rapid growth of artificial intelligence (AI)-driven data centers is reshaping electricity demand patterns. This is achieved by introducing fast, multi-gigawatt load ramps that challenge the stability and resilience of modern power systems. Traditional resilience frameworks focus mainly on physical outages and largely overlook these emerging digital-era disturbances. This paper proposes a unified two-stage, risk-aware distributionally robust optimization (DRO)-MILP framework that coordinates the pre-allocation and post-event dispatch of Flexible Capacity Modules (FCMs), including BESS, fast-ramping generation, demand response, and potential long-duration storage. Stage-I optimally positions FCMs using DRO with CVaR to hedge against uncertain AI load surges. Stage-II models real-time stabilization following stochastic demand-shock scenarios, minimizing imbalance, unserved energy, and restoration penalties. The framework is designed to be applied on IEEE 33-bus system or expanded for scalability to larger IEEE test feeders capable of representing AI-scale loads. This contributes a scalable planning tool for resilient, AI-integrated distribution grids.
comment: Submitted to the 2026 IEEE Power and Energy Society General Meeting
Calibrated uncertainty quantification for prosumer flexibility aggregation in ancillary service markets
Reliable forecasting of prosumer flexibility is critical for demand response aggregators participating in frequency controlled ancillary services market, where strict reliability requirements such as the P90 standard are enforced. Limited historical data, dependence on exogeneous factors, and heterogenous prosumer behaviour introduce significant epistemic uncertainty, making deterministic or poorly calibrated probabilistic models unsuitable for market bidding. This paper proposes the use of scalable uncertainty quantification framework that integrates Monte Carlo dropout (MCD) with conformal prediction (CP) to produce calibrated, finite sample prediction intervals for aggregated prosumer flexibility. The proposed framework is applied to a behind-the-meter aggregator participating in the Danish manual frequency restoration reserve capacity market. A large-scale synthetic dataset is generated using a modified industry-grade home energy management system, combined with publicly available load, solar, price, activation and device-level data. The resulting machine learning surrogate model captures aggregate prosumer price responsiveness and provides uncertainty-aware estimates suitable for market bidding. Multiple multivariate CP strategies are evaluated and benchmarked against conventional MCD-based methods. Results show that standalone MCD systematically overestimates available flexibility and violates P90 compliance, whereas the proposed MCD-CP framework achieves reliable coverage with controlled conservatism. When embedded in aggregator bidding model, conformalised methods substantially reduce overbidding risk and achieve upto 70% of perfect-information profit while satisfying regulatory reliability constraints, providing practical, computationally efficient, and market-compliant solution for aggregator flexibility forecasting under uncertainty.
comment: Single column 31 pages, 10 figures, 3 tables, submitted for review to Applied Energy
Input-to-State Stabilizing Neural Controllers for Unknown Switched Nonlinear Systems within Compact Sets
This paper develops a neural network based control framework that ensures system safety and input-to-state stability (ISS) for general nonlinear switched systems with unknown dynamics. Leveraging the concept of dwell time, we derive Lyapunov based sufficient conditions under which both safety and ISS of the closed-loop switched system are guaranteed. The feedback controllers and the associated Lyapunov functions are parameterized using neural networks and trained from data collected over a compact state space via deterministic sampling. To provide formal stability guarantees under the learned controllers, we introduce a validity condition based on Lipschitz continuity assumptions, which is embedded directly into the training framework. This ensures that the resulting neural network controllers satisfy provable correctness and stability guarantees beyond the sampled data. As a special case, the proposed framework recovers ISS and safety under arbitrary switching when a common Lyapunov function exists. Simulation results on a representative switched nonlinear system demonstrate the effectiveness of the proposed approach.
Specifying and Verifying RDMA Synchronisation (Extended Version)
Remote direct memory access (RDMA) allows a machine to directly read from and write to the memory of remote machine, enabling high-throughput, low-latency data transfer. Ensuring correctness of RDMA programs has only recently become possible with the formalisation of $\text{RDMA}^\text{TSO}$ semantics (describing the behaviour of RDMA networking over a TSO CPU). However, this semantics currently lacks a formalisation of remote synchronisation, meaning that the implementations of common abstractions such as locks cannot be verified. In this paper, we close this gap by presenting $\text{RDMA}^{\text{TSO}}_{\text{RMW}}$, the first semantics for remote `read-modify-write' (RMW) instructions over TSO. It turns out that remote RMW operations are weak and only ensure atomicity against other remote RMWs. We therefore build a set of composable synchronisation abstractions starting with the $\text{RDMA}^{\text{WAIT}}_{\text{RMW}}$ library. Underpinned by $\text{RDMA}^{\text{WAIT}}_{\text{RMW}}$, we then specify, implement and verify three classes of remote locks that are suitable for different scenarios. Additionally, we develop the notion of a strong RDMA model, $\text{RDMA}^{\text{SC}}_{\text{RMW}}$, which is akin to sequential consistency in shared memory architectures. Our libraries are built to be compatible with an existing set of high-performance libraries called LOCO, which ensures compositionality and verifiability.
comment: 95 pages, extended version of ESOP 2026 paper
An Ion-Intercalation Memristor for Enabling Full Parallel Writing in Crossbar Networks
Crossbar architectures have long been seen as a promising foundation for in-memory computing, using memristor arrays for high-density, energy-efficient analog computation. However, this conventional architecture suffers from a fundamental limitation: the inability to perform parallel write operations due to the sneak path problem. This arises from the structural overlap of read and write paths, forcing sequential or semi-parallel updates and severely limiting scalability. To address this, we introduce a new memristor design that decouples read and write operations at the device level. This design enables orthogonal conductive paths, and employs a reversible ion doping mechanism, inspired by lithium-ion battery principles, to modulate resistance states independently of computation. Fabricated devices exhibit near-ideal memristive characteristics and stable performance under isolated read/write conditions.
Applicability and Limitation Analysis of PMU Data and Phasor Concept for Low- and High- Frequency Oscillations
Phasor Measurement Units (PMUs) convert high-speed waveform data into low-speed phasor data, which are fundamental to wide-area monitoring and control in power systems, with oscillation detection and localization among their most prominent applications. However, representing electrical waveform signals with oscillations using PMU phasors is effective only for low-frequency oscillations. This paper investigates the root causes of this limitation, focusing on errors introduced by Discrete Fourier Transform (DFT)-based signal processing, in addition to the attenuation effects of anti-aliasing filters, and the impact of low reporting rates. To better represent and estimate waveform signals with oscillations, we propose a more general signal model and a multi-step estimation method that leverages one-cycle DFT, the Matrix Pencil Method, and the Least Squares Method. Numerical experiments demonstrate the superior performance of the proposed signal model and estimation method. Furthermore, this paper reveals that the phasor concept, let alone PMU phasors, can become invalid for waveform signals with high-frequency oscillations characterized by asymmetric sub- and super-synchronous components. These findings highlight the fundamental limitations of PMU data and phasor concept, and emphasize the need to rely on waveform data for analyzing high-frequency oscillations in modern power systems.
comment: 10 pages, 12 figures, submitted to IEEE Transactions on Power Systems
Semantics in Actuation Systems: From Age of Actuation to Age of Actuated Information
In this paper, we study the timeliness of actions in communication systems where actuation is constrained by control permissions or energy availability. Building on the Age of Actuation (AoA) metric, which quantifies the timeliness of actions independently of data freshness, we introduce a new metric, the \emph{Age of Actuated Information (AoAI)}. AoAI captures the end-to-end timeliness of actions by explicitly accounting for the age of the data packet at the moment it is actuated. We analyze and characterize both AoA and AoAI in discrete-time systems with data storage capabilities under multiple actuation scenarios. The actuator requires both a data packet and an actuation opportunity, which may be provided by a controller or enabled by harvested energy. Data packets may be stored either in a single-packet buffer or an infinite-capacity queue for future actuation. For these settings, we derive closed-form expressions for the average AoA and AoAI and investigate their structural differences. While AoA and AoAI coincide in instantaneous actuation systems, they differentiate when data buffering is present. Our results reveal counterintuitive regimes in which increasing update or actuation rates degrade action timeliness for both AoA and AoAI. Moreover, as part of the analysis, we obtain a novel closed-form characterization of the steady-state distribution of a Geo/Geo/1 queue operating under the FCFS discipline, expressed solely in terms of the queue length and the age of the head-of-line packet. The proposed metrics and analytical results provide new insights into the semantics of timeliness in systems where information ultimately serves the purpose of actuation.
comment: Submitted for possible Journal publication
Preparation and Motion Study of Magnetically Driven Micro Soft Robot Mimicking the Cownose Ray
In narrow, unstructured underwater environments such as environmental monitoring and minimally invasive medical procedures, micro soft robots exhibit unique advantages due to their flexible movement capabilities and small size. At the same time, applying bionic technology to the structural design of micro soft robots can significantly improve their swimming performance. However, limited by their miniaturization, these robots are difficult to power internally and usually adopt a wireless power supply method. This study designs and fabricates a magnetically responsive, cownose ray-inspired micro soft robot based on the swimming principle of the cownose ray. The robot is made of a certain proportion of NdFeB and PDMS. Then, a three-dimensional Helmholtz coil is used to generate an oscillating harmonic magnetic field to conduct swimming experiments on the robot, exploring the influence of magnetic field parameters on the robot's swimming performance. The experimental results show that the swimming speed is the fastest at B = 5 mT and f = 11 Hz, reaching 5.25 mm/s, which is about 0.5 body lengths per second. In addition, by adjusting the current direction and frequency of the coil, the robot can perform different swimming modes such as straight swimming, turning swimming, and directional swimming. By employing a stepwise adjustment method, the impact of response errors on the robot's trajectory can be effectively reduced. This study demonstrates a method for magnetically driven micro soft robots, laying a foundation for the application of wireless-driven robots in underwater narrow spaces.
comment: These experiments lay an important foundation for the study of tether-free control of underwater micro-soft robots. Furthermore, this research provides important references for the fields of biomimetic robots and magnetically controlled micro-soft robots
Economic and Reliability Value of Improved Offshore Wind Forecasting in Bulk Power Grid Operation: A Case Study of The New York Power Grid
This study investigates the economic and reliability benefits of improved offshore wind forecasting for grid operations along the U.S. East Coast. We introduce and evaluate a state-of-the-art, machine-learning-based offshore wind forecasting model tailored for this region by integrating its improved forecasts into a dynamic reserve procurement framework aligned with New York Independent System Operator (NYISO) practices to evaluate their economic value. To determine system-wide reserve needs, plant-specific reserves are aggregated. However, conventional methods overlook spatial correlation across sites, often leading to over procurement. To address this, we propose a risk-based reserve aggregation technique that leverages spatial diversification. Additionally, we evaluate the reliability improvements enabled by the enhanced offshore wind forecast. To evaluate the operational impact, we propose an operational resource adequacy framework that captures uncertainty from forecast errors and grid conditions. Using this framework, we quantify key reliability metrics under different offshore wind forecast scenarios. Using New York State as a case study, we find that the improved forecast enables more accurate reserve estimation, reducing procurement costs by 5.53% in 2035 scenario compared to a well-validated numerical weather prediction model. Applying the risk-based aggregation further reduces total production costs by 7.21%. From a reliability perspective, the improved forecasts lower the system Loss of Load Probability (LOLP) by approximately 19% in the 2035 scenario, highlighting its potential to enhance system reliability during real-time grid operations.
comment: The authors are withdrawing this submission to allow for further revisions and improvements
Closed-loop Uplink Radio Resource Management in CF-O-RAN Empowered 5G Aerial Corridor
In this paper, we investigate the uplink (UL) radio resource management for 5G aerial corridors with an open-radio access network (O-RAN)-enabled cell-free (CF) massive multiple-input multiple-output (mMIMO) system. Our objective is to maximize the minimum spectral efficiency (SE) by jointly optimizing unmanned aerial vehicle (UAV)-open radio unit (O-RU) association and UL transmit power under quality-of-service (QoS) constraints. Owing to its NP-hard nature, the formulated problem is decomposed into two tractable sub-problems solved via alternating optimization (AO) using two computationally efficient algorithms. We then propose (i) a QoS-driven and multi-connectivity-enabled association algorithm incorporating UAV-centric and O-RU-centric criteria with targeted refinement for weak UAVs, and (ii) a bisection-guided fixed-point power control algorithm achieving global optimality with significantly reduced complexity, hosted as xApp at the near-real-time (near-RT) RAN intelligent controller (RIC) of O-RAN. Solving the resource-allocation problem requires global channel state information (CSI), which incurs substantial measurement and signaling overhead. To mitigate this, we leverage a channel knowledge map (CKM) within the O-RAN non-RT RIC to enable efficient environment-aware CSI inference. Simulation results show that the proposed framework achieves up to 440% improvement in minimum SE, 100% QoS satisfaction and fairness, while reducing runtime by up to 99.7% compared to an interior point solver-based power allocation solution, thereby enabling O-RAN compliant real-time deployment.
comment: 6 Pages, Accepted in IEEE ICC 2026
Priority-Based Bandwidth Allocation in Network Slicing-Enabled Cell-Free Massive MIMO Systems
This paper addresses joint admission control and per-user equipment (UE) bandwidth allocation to maximize weighted sum-rate in network slicing-enabled user-centric cell-free (CF) massive multiple-input multiple-output (mMIMO) systems when aggregate quality-of-service (QoS) demand may exceed available bandwidth. Specifically, we optimize bandwidth allocation while satisfying heterogeneous QoS requirements across enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) slices in the uplink. The formulated problem is NP-hard, rendering global optimality computationally intractable. We decompose it into two sub-problems and solve them via computationally efficient heuristics within a sequential framework. We propose (i) a hierarchical admission control scheme that selectively admits UEs under bandwidth scarcity, prioritizing URLLC to ensure latency-sensitive QoS compliance, and (ii) an iterative gradient-based bandwidth allocation scheme that transfers bandwidth across slices guided by marginal utility and reallocates resources within slices. Simulation results demonstrate that the proposed scheme achieves near-optimal performance, deviating from an interior point solver-based benchmark by at most 2.2% in weighted sum-rate while reducing runtime by 99.7%, thereby enabling practical real-time deployment. Compared to a baseline round-robin scheme without admission control, the proposed approach achieves up to 1085% and 7% higher success rates for eMBB and URLLC slices, respectively, by intentionally sacrificing sum-rate to guarantee QoS. Sensitivity analysis further reveals that the proposed solution adapts effectively to diverse eMBB/URLLC traffic compositions, maintaining 47-51% eMBB and 93-94% URLLC success rates across varying load scenarios, confirming its robustness for resource-constrained large-scale deployments.
comment: 6 Pages, Accepted in IEEE ICC 2026
Network Slicing Resource Management in Uplink User-Centric Cell-Free Massive MIMO Systems
This paper addresses the joint optimization of per-user equipment (UE) bandwidth allocation and UE-access point (AP) association to maximize weighted sum-rate while satisfying heterogeneous quality-of-service (QoS) requirements across enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) slices in the uplink of a network slicing-enabled user-centric cell-free (CF) massive multiple-input multiple-output (mMIMO) system. The formulated problem is NP-hard, rendering global optimality computationally intractable. To address this challenge, it is decomposed into two sub-problems, each solved by a computationally efficient heuristic scheme, and jointly optimized through an alternating optimization framework. We then propose (i) a bandwidth allocation scheme that balances UE priority, spectral efficiency, and minimum bandwidth demand under limited resources to ensure fair QoS distribution, and (ii) a priority-based UE-AP association assignment approach that balances UE service quality with system capacity constraints. Together, these approaches provide a practical and computationally efficient solution for resource-constrained network slicing scenarios, where QoS feasibility is often violated under dense deployments and limited bandwidth, necessitating graceful degradation and fair QoS preservation rather than solely maximizing the aggregate sum-rate. Simulation results demonstrate that the proposed scheme achieves up to 52% higher weighted sum-rate, 140% and 58% higher QoS success rates for eMBB and URLLC slices, respectively, while reducing runtime by up to 97% compared to the considered benchmarks.
comment: 6 pages, Accepted in IEEE ICC 2026
A Constraint Programming Model for the Super-Agile Earth Observation Satellite Imaging Scheduling Problem
As the dependence on satellite imaging continues to grow, modern satellites have become increasingly agile, with the new generation, namely super-agile Earth observation satellites (SAEOS), providing unprecedented imaging flexibility. The highly dynamic capabilities of these satellites introduce additional challenges to the scheduling of observation tasks, as existing approaches for conventional agile satellites do not account for variable observation durations and multiple imaging directions. Although some efforts have been made in this regard, the SAEOS imaging scheduling problem (SAEOS-ISP) remains largely unexplored, and no exact approaches have yet been proposed. In this context, this study presents the first exact Constraint Programming formulation for the SAEOS-ISP, considering flexible observation windows, multiple pointing directions and sequence-dependent transition times across multiple satellites. Computational experiments on a newly generated benchmark set demonstrate that the model can be solved efficiently and within very short computational times. Moreover, the results also show that the proposed approach has the potential to achieve higher computational performance compared to the non-exact approaches that are currently considered state-of-the-art.
comment: 12 pages, 4 figures, To be published in the Proceedings of the International Conference on Operations Research and Enterprise Systems (ICORES 2026)
Robust Verification of Concurrent Stochastic Games
Autonomous systems often operate in multi-agent settings and need to make concurrent, strategic decisions, typically in uncertain environments. Verification and control problems for these systems can be tackled with concurrent stochastic games (CSGs), but this model requires transition probabilities to be precisely specified - an unrealistic requirement in many real-world settings. We introduce *robust CSGs* and their subclass *interval CSGs* (ICSGs), which capture epistemic uncertainty about transition probabilities in CSGs. We propose a novel framework for *robust* verification of these models under worst-case assumptions about transition uncertainty. Specifically, we develop the underlying theoretical foundations and efficient algorithms, for finite- and infinite-horizon objectives in both zero-sum and nonzero-sum settings, the latter based on (social-welfare optimal) Nash equilibria. We build an implementation in the PRISM-games model checker and demonstrate the feasibility of robust verification of ICSGs across a selection of large benchmarks.
comment: Extended version of a paper accepted to TACAS 2026. Main text: 17 pages, 2 figures, 2 tables; Appendix: 37 pages, 3 figures, 3 tables. Minor revisions and clarifications to the appendix; no changes to results
Asynchronous Grid Connections Providing Fast-Frequency Response: System Integration Study
This paper presents an integration study for a power electronic-based fast-frequency response technology, an asynchronous grid connection operating as an aggregator for behindthe-meter resources and distributed generators. Both technical feasibility and techno-economic viability studies are presented. The dynamic performance of the fast-frequency response enabled by the asynchronous grid connection is validated with Power Hardware-in-the-Loop experiments and transferred to an IEEE 9-bus system in DigSilent PowerFactory for dynamic stability analysis. We demonstrate that droop-based control enhancements to the local distributed generators could allow their aggregation to provide grid-supporting functionalities and participate in the market for ancillary services. To this end, we performed a long-term simulation embedding the system within the ancillary service market framework of PJM. The fast-frequency response regulation is subsequently used to calculate the potential revenue and project the results on a 15-year investment horizon. Finally, the techno-economic analysis concludes with recommendations for enhancements to access the full potential of distributed generators on a technical and regulatory level.
comment: to be published in IEEE Transactions on Power Delivery
Exact Covariance Characterization for Controlled Linear Systems subject to Stochastic Parametric and Additive Uncertainties
This work addresses the exact characterization of the covariance dynamics related to linear discrete-time systems subject to both additive and parametric stochastic uncertainties that are potentially unbounded. Using this characterization, the problem of control design for state covariance dynamics is addressed, providing conditions that are conservative yet more tractable compared to standard necessary and sufficient ones for the same class of systems. Numerical results assess this new characterization by comparing it to the empirical covariance and illustrate the control design problem.
Research on Adaptive Inertial Control in Synchronization Systems: Based on Variational Optimization Methods and Their Applications in the Stability of Complex Networks
Aiming at the core problem that it is difficult for a fixed inertia coefficient to balance transient disturbance suppression and long-term stability in complex network synchronization systems, an adaptive inertia control strategy based on variational optimization is proposed. Taking the Kuramoto model with inertia as the research carrier, the analytical expression of the time-varying inertia coefficient M(t) is strictly derived by the functional variational method, and a hierarchical control structure of "benchmark inertia + disturbance feedback" is constructed to achieve the organic unity of minimizing the vulnerability performance function H(T) and stability constraints. A multimodal decoupling control strategy based on Laplacian eigenvector projection is designed to enhance the feedback strength of the dominant mode by eigenvalue weighting, improving the control accuracy and dynamic response speed. Simulation verification is carried out in complex network systems, and the control performance of regular networks (RG), random networks (ER), small-world networks (SW), scale-free networks (SF) and spider webs (SP) under three typical disturbances of pulses, monotonic decays and oscillatory decays is systematically analyzed. The results show that the proposed strategy reduces H(T) of the five networks by 19%-25%, shortens the relaxation time by 15%-24%, and the real parts of all system eigenvalues are less than -0.25s^-1 , meeting the asymptotic stability criterion. This study provides a new theoretical framework and engineering implementation scheme for the stability control of complex network synchronization systems, which can be widely applied to fields such as power grids, communication networks, and neural networks.
comment: 36 pages, 4 figures
SRFS: Parallel Processing Fault-tolerant ROS2-based Flight Software for the Space Ranger CubeSat
Traditional Real-Time Operating Systems (RTOS) often suffer from limited parallel performance, whereas thread monitoring in Linux-based systems remains challenging. To overcome these limitations, this paper presents a satellite flight software system design based on the Robot Operating System (ROS), which utilizes its reliable built-in publish-subscribe messaging mechanism to facilitate inter-application communication. In response to the complex functional demands of modern small satellites, the proposed design integrates both hardware and software architectures, along with system scheduling and error-correction strategies. This integration supports efficient parallel data processing, enhances system reliability, and shortens the development cycle through code reuse. The system was rigorously evaluated through comprehensive tests covering time delay, system management, fault tolerance, and maintenance procedures. Experimental results confirm the system's effectiveness in telemetry, remote control, integration of new features, and autonomous error recovery. The findings underscore the high reliability and maintainability of the ROS-based satellite flight software, offering a valuable reference for the rapid development of high-performance small satellite systems.
Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems
Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics can mask this operational asymmetry. We introduce an operator-legible evaluation framework -- Under-Prediction Rate (UPR), tail Reserve$_{99.5}^{\%}$ requirements, and explicit inflation diagnostics (Bias$_{24h}$/OPR) -- to quantify one-sided reliability risk beyond MAPE. Using this framework, we evaluate state space models (Mamba variants) and strong baselines on a weather-aligned California Independent System Operator (CAISO) dataset spanning Nov 2023--Nov 2025 (84,498 hourly records across 5 regional transmission areas) under a rolling-origin walk-forward backtest. We develop and evaluate thermal-lag-aligned weather fusion strategies for these architectures. Our results demonstrate that standard accuracy metrics are insufficient proxies for operational safety: models with comparable MAPE can imply materially different tail reserve requirements (Reserve$_{99.5}^{\%}$). We show that explicit weather integration narrows error distributions, reducing the impact of temperature-driven demand spikes. Furthermore, while probabilistic calibration reduces large-error events, it can induce systematic schedule inflation. We introduce Bias/OPR-constrained objectives to enable auditable trade-offs between minimizing tail risk and preventing trivial over-forecasting.
comment: 30 pages, 7 figures, 9 tables
Two-Level Distributed Interference Management for Large-Scale HAPS-Empowered vHetNets
High altitude platform stations (HAPS) offer a promising solution for achieving ubiquitous connectivity in next-generation wireless networks (xG). Integrating HAPS with terrestrial networks, creating HAPS-empowered vertical heterogeneous networks (vHetNets), significantly improves coverage and capacity and supports emerging novel use cases. In HAPS-empowered vHetNets, HAPS and terrestrial network tiers can share the same spectrum, forming harmonized spectrum vHetNets that enhance spectral efficiency (SE). However, harmonized spectrum vHetNets face major challenges, including severe co-channel interference and scalability in large-scale deployments. To address the first challenge, we adopt a cell-free multiple-input multiple-output (MIMO) network architecture in which users are simultaneously served by multiple base stations using beamforming. However, beamforming weight design leads to a nonconvex, high-dimensional optimization problem, highlighting the scalability challenge. To address this second challenge, we develop a two-level distributed proportional fairness beamforming weight design (PFBWD) algorithm. This algorithm combines the augmented Lagrangian method (ALM) with a three-block ADMM framework. Simulation results demonstrate the performance improvements achieved by integrating HAPS with standalone terrestrial networks, as well as the reduced complexity and signaling overhead of the distributed algorithm compared to centralized algorithms.
PAD-TRO: Projection-Augmented Diffusion for Direct Trajectory Optimization
Recently, diffusion models have gained popularity and attention in trajectory optimization due to their capability of modeling multi-modal probability distributions. However, addressing nonlinear equality constraints, i.e, dynamic feasibility, remains a great challenge in diffusion-based trajectory optimization. Recent diffusion-based trajectory optimization frameworks rely on a single-shooting style approach where the denoised control sequence is applied to forward propagate the dynamical system, which cannot explicitly enforce constraints on the states and frequently leads to sub-optimal solutions. In this work, we propose a novel direct trajectory optimization approach via model-based diffusion, which directly generates a sequence of states. To ensure dynamic feasibility, we propose a gradient-free projection mechanism that is incorporated into the reverse diffusion process. Our results show that, compared to a recent state-of-the-art baseline, our approach leads to zero dynamic feasibility error and approximately 4x higher success rate in a quadrotor waypoint navigation scenario involving dense static obstacles.
comment: Accepted for publication at the 2026 American Control Conference
Control Occupation Kernel Regression for Nonlinear Control-Affine Systems
This manuscript presents an algorithm for obtaining an approximation of a nonlinear high order control affine dynamical system. Controlled trajectories of the system are leveraged as the central unit of information via embedding them in vector-valued reproducing kernel Hilbert space (vvRKHS). The trajectories are embedded as the so-called higher order control occupation kernels which represent an operator on the vvRKHS corresponding to iterated integration after multiplication by a given controller. The solution to the system identification problem is then the unique solution of an infinite dimensional regularized regression problem. The representer theorem is then used to express the solution as finite linear combination of these occupation kernels, which converts an infinite dimensional optimization problem to a finite dimensional optimization problem. The vector valued structure of the Hilbert space allows for simultaneous approximation of the drift and control effectiveness components of the control affine system. Several experiments are performed to demonstrate the effectiveness of the developed approach.
Robotics
Q-learning with Adjoint Matching
We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because direct gradient-based optimization via backpropagation through their multi-step denoising process is numerically unstable. Existing methods work around this either by only using the value and discarding the gradient information, or by relying on approximations that sacrifice policy expressivity or bias the learned policy. QAM sidesteps both of these challenges by leveraging adjoint matching, a recently proposed technique in generative modeling, which transforms the critic's action gradient to form a step-wise objective function that is free from unstable backpropagation, while providing an unbiased, expressive policy at the optimum. Combined with temporal-difference backup for critic learning, QAM consistently outperforms prior approaches on hard, sparse reward tasks in both offline and offline-to-online RL.
comment: 32 pages, 8 figures, 7 tables
TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
Standard Vision-Language-Action (VLA) models typically fine-tune a monolithic Vision-Language Model (VLM) backbone explicitly for robotic control. However, this approach creates a critical tension between maintaining high-level general semantic understanding and learning low-level, fine-grained sensorimotor skills, often leading to "catastrophic forgetting" of the model's open-world capabilities. To resolve this conflict, we introduce TwinBrainVLA, a novel architecture that coordinates a generalist VLM retaining universal semantic understanding and a specialist VLM dedicated to embodied proprioception for joint robotic control. TwinBrainVLA synergizes a frozen "Left Brain", which retains robust general visual reasoning, with a trainable "Right Brain", specialized for embodied perception, via a novel Asymmetric Mixture-of-Transformers (AsyMoT) mechanism. This design allows the Right Brain to dynamically query semantic knowledge from the frozen Left Brain and fuse it with proprioceptive states, providing rich conditioning for a Flow-Matching Action Expert to generate precise continuous controls. Extensive experiments on SimplerEnv and RoboCasa benchmarks demonstrate that TwinBrainVLA achieves superior manipulation performance compared to state-of-the-art baselines while explicitly preserving the comprehensive visual understanding capabilities of the pre-trained VLM, offering a promising direction for building general-purpose robots that simultaneously achieve high-level semantic understanding and low-level physical dexterity.
comment: GitHub: https://github.com/ZGC-EmbodyAI/TwinBrainVLA
SandWorm: Event-based Visuotactile Perception with Active Vibration for Screw-Actuated Robot in Granular Media
Perception in granular media remains challenging due to unpredictable particle dynamics. To address this challenge, we present SandWorm, a biomimetic screw-actuated robot augmented by peristaltic motion to enhance locomotion, and SWTac, a novel event-based visuotactile sensor with an actively vibrated elastomer. The event camera is mechanically decoupled from vibrations by a spring isolation mechanism, enabling high-quality tactile imaging of both dynamic and stationary objects. For algorithm design, we propose an IMU-guided temporal filter to enhance imaging consistency, improving MSNR by 24%. Moreover, we systematically optimize SWTac with vibration parameters, event camera settings and elastomer properties. Motivated by asymmetric edge features, we also implement contact surface estimation by U-Net. Experimental validation demonstrates SWTac's 0.2 mm texture resolution, 98% stone classification accuracy, and 0.15 N force estimation error, while SandWorm demonstrates versatile locomotion (up to 12.5 mm/s) in challenging terrains, successfully executes pipeline dredging and subsurface exploration in complex granular media (observed 90% success rate). Field experiments further confirm the system's practical performance.
comment: Accepted by IEEE Transactions on Robotics
Diffusion-Guided Backdoor Attacks in Real-World Reinforcement Learning
Backdoor attacks embed hidden malicious behaviors in reinforcement learning (RL) policies and activate them using triggers at test time. Most existing attacks are validated only in simulation, while their effectiveness in real-world robotic systems remains unclear. In physical deployment, safety-constrained control pipelines such as velocity limiting, action smoothing, and collision avoidance suppress abnormal actions, causing strong attenuation of conventional backdoor attacks. We study this previously overlooked problem and propose a diffusion-guided backdoor attack framework (DGBA) for real-world RL. We design small printable visual patch triggers placed on the floor and generate them using a conditional diffusion model that produces diverse patch appearances under real-world visual variations. We treat the robot control stack as a black-box system. We further introduce an advantage-based poisoning strategy that injects triggers only at decision-critical training states. We evaluate our method on a TurtleBot3 mobile robot and demonstrate reliable activation of targeted attacks while preserving normal task performance. Demo videos and code are available in the supplementary material.
Zero-shot adaptable task planning for autonomous construction robots: a comparative study of lightweight single and multi-AI agent systems
Robots are expected to play a major role in the future construction industry but face challenges due to high costs and difficulty adapting to dynamic tasks. This study explores the potential of foundation models to enhance the adaptability and generalizability of task planning in construction robots. Four models are proposed and implemented using lightweight, open-source large language models (LLMs) and vision language models (VLMs). These models include one single agent and three multi-agent teams that collaborate to create robot action plans. The models are evaluated across three construction roles: Painter, Safety Inspector, and Floor Tiling. Results show that the four-agent team outperforms the state-of-the-art GPT-4o in most metrics while being ten times more cost-effective. Additionally, teams with three and four agents demonstrate the improved generalizability. By discussing how agent behaviors influence outputs, this study enhances the understanding of AI teams and supports future research in diverse unstructured environments beyond construction.
Group-Invariant Unsupervised Skill Discovery: Symmetry-aware Skill Representations for Generalizable Behavior
Unsupervised skill discovery aims to acquire behavior primitives that improve exploration and accelerate downstream task learning. However, existing approaches often ignore the geometric symmetries of physical environments, leading to redundant behaviors and sample inefficiency. To address this, we introduce Group-Invariant Skill Discovery (GISD), a framework that explicitly embeds group structure into the skill discovery objective. Our approach is grounded in a theoretical guarantee: we prove that in group-symmetric environments, the standard Wasserstein dependency measure admits a globally optimal solution comprised of an equivariant policy and a group-invariant scoring function. Motivated by this, we formulate the Group-Invariant Wasserstein dependency measure, which restricts the optimization to this symmetry-aware subspace without loss of optimality. Practically, we parameterize the scoring function using a group Fourier representation and define the intrinsic reward via the alignment of equivariant latent features, ensuring that the discovered skills generalize systematically under group transformations. Experiments on state-based and pixel-based locomotion benchmarks demonstrate that GISD achieves broader state-space coverage and improved efficiency in downstream task learning compared to a strong baseline.
comment: 14 pages, 6 figures
Active Cross-Modal Visuo-Tactile Perception of Deformable Linear Objects
This paper presents a novel cross-modal visuo-tactile perception framework for the 3D shape reconstruction of deformable linear objects (DLOs), with a specific focus on cables subject to severe visual occlusions. Unlike existing methods relying predominantly on vision, whose performance degrades under varying illumination, background clutter, or partial visibility, the proposed approach integrates foundation-model-based visual perception with adaptive tactile exploration. The visual pipeline exploits SAM for instance segmentation and Florence for semantic refinement, followed by skeletonization, endpoint detection, and point-cloud extraction. Occluded cable segments are autonomously identified and explored with a tactile sensor, which provides local point clouds that are merged with the visual data through Euclidean clustering and topology-preserving fusion. A B-spline interpolation driven by endpoint-guided point sorting yields a smooth and complete reconstruction of the cable shape. Experimental validation using a robotic manipulator equipped with an RGB-D camera and a tactile pad demonstrates that the proposed framework accurately reconstructs both simple and highly curved single or multiple cable configurations, even when large portions are occluded. These results highlight the potential of foundation-model-enhanced cross-modal perception for advancing robotic manipulation of deformable objects.
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation
Achieving human-level performance in Vision-and-Language Navigation (VLN) requires an embodied agent to jointly understand multimodal instructions and visual-spatial context while reasoning over long action sequences. Recent works, such as NavCoT and NavGPT-2, demonstrate the potential of Chain-of-Thought (CoT) reasoning for improving interpretability and long-horizon planning. Moreover, multimodal extensions like OctoNav-R1 and CoT-VLA further validate CoT as a promising pathway toward human-like navigation reasoning. However, existing approaches face critical drawbacks: purely textual CoTs lack spatial grounding and easily overfit to sparse annotated reasoning steps, while multimodal CoTs incur severe token inflation by generating imagined visual observations, making real-time navigation impractical. In this work, we propose FantasyVLN, a unified implicit reasoning framework that preserves the benefits of CoT reasoning without explicit token overhead. Specifically, imagined visual tokens are encoded into a compact latent space using a pretrained Visual AutoRegressor (VAR) during CoT reasoning training, and the model jointly learns from textual, visual, and multimodal CoT modes under a unified multi-CoT strategy. At inference, our model performs direct instruction-to-action mapping while still enjoying reasoning-aware representations. Extensive experiments on LH-VLN show that our approach achieves reasoning-aware yet real-time navigation, improving success rates and efficiency while reducing inference latency by an order of magnitude compared to explicit CoT methods.
Efficient Coordination with the System-Level Shared State: An Embodied-AI Native Modular Framework
As Embodied AI systems move from research prototypes to real world deployments, they tend to evolve rapidly while remaining reliable under workload changes and partial failures. In practice, many deployments are only partially decoupled: middleware moves messages, but shared context and feedback semantics are implicit, causing interface drift, cross-module interference, and brittle recovery at scale. We present ANCHOR, a modular framework that makes decoupling and robustness explicit system-level primitives. ANCHOR separates (i) Canonical Records, an evolvable contract for the standardized shared state, from (ii) a communication bus for many-to-many dissemination and feedback-oriented coordination, forming an inspectable end-to-end loop. We validate closed-loop feasibility on a de-identified workflow instantiation, characterize latency distributions under varying payload sizes and publish rates, and demonstrate automatic stream resumption after hard crashes and restarts even with shared-memory loss. Overall, ANCHOR turns ad-hoc integration glue into explicit contracts, enabling controlled degradation under load and self-healing recovery for scalable deployment of closed-loop AI systems.
GuideTouch: An Obstacle Avoidance Device for Visually Impaired
Safe navigation for the visually impaired individuals remains a critical challenge, especially concerning head-level obstacles, which traditional mobility aids often fail to detect. We introduce GuideTouch, a compact, affordable, standalone wearable device designed for autonomous obstacle avoidance. The system integrates two vertically aligned Time-of-Flight (ToF) sensors, enabling three-dimensional environmental perception, and four vibrotactile actuators that provide directional haptic feedback. Proximity and direction information is communicated via an intuitive 4-point vibrotactile feedback system located across the user's shoulders and upper chest. For real-world robustness, the device includes a unique centrifugal self-cleaning optical cover mechanism and a sound alarm system for location if the device is dropped. We evaluated the haptic perception accuracy across 22 participants (17 male and 5 female, aged 21-48, mean 25.7, sd 6.1). Statistical analysis confirmed a significant difference between the perception accuracy of different patterns. The system demonstrated high recognition accuracy, achieving an average of 92.9% for single and double motor (primary directional) patterns. Furthermore, preliminary experiments with 14 visually impaired users validated this interface, showing a recognition accuracy of 93.75% for primary directional cues. The results demonstrate that GuideTouch enables intuitive spatial perception and could significantly improve the safety, confidence, and autonomy of users with visual impairments during independent navigation.
comment: This paper has been accepted for publication at LBR of HRI 2026 conference
DroneVLA: VLA based Aerial Manipulation
As aerial platforms evolve from passive observers to active manipulators, the challenge shifts toward designing intuitive interfaces that allow non-expert users to command these systems naturally. This work introduces a novel concept of autonomous aerial manipulation system capable of interpreting high-level natural language commands to retrieve objects and deliver them to a human user. The system is intended to integrate a MediaPipe based on Grounding DINO and a Vision-Language-Action (VLA) model with a custom-built drone equipped with a 1-DOF gripper and an Intel RealSense RGB-D camera. VLA performs semantic reasoning to interpret the intent of a user prompt and generates a prioritized task queue for grasping of relevant objects in the scene. Grounding DINO and dynamic A* planning algorithm are used to navigate and safely relocate the object. To ensure safe and natural interaction during the handover phase, the system employs a human-centric controller driven by MediaPipe. This module provides real-time human pose estimation, allowing the drone to employ visual servoing to maintain a stable, distinct position directly in front of the user, facilitating a comfortable handover. We demonstrate the system's efficacy through real-world experiments for localization and navigation, which resulted in a 0.164m, 0.070m, and 0.084m of max, mean euclidean, and root-mean squared errors, respectively, highlighting the feasibility of VLA for aerial manipulation operations.
comment: This paper has been accepted for publication at LBR of HRI 2026 conference
HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction
Drones operating in human-occupied spaces suffer from insufficient communication mechanisms that create uncertainty about their intentions. We present HoverAI, an embodied aerial agent that integrates drone mobility, infrastructure-independent visual projection, and real-time conversational AI into a unified platform. Equipped with a MEMS laser projector, onboard semi-rigid screen, and RGB camera, HoverAI perceives users through vision and voice, responding via lip-synced avatars that adapt appearance to user demographics. The system employs a multimodal pipeline combining VAD, ASR (Whisper), LLM-based intent classification, RAG for dialogue, face analysis for personalization, and voice synthesis (XTTS v2). Evaluation demonstrates high accuracy in command recognition (F1: 0.90), demographic estimation (gender F1: 0.89, age MAE: 5.14 years), and speech transcription (WER: 0.181). By uniting aerial robotics with adaptive conversational AI and self-contained visual output, HoverAI introduces a new class of spatially-aware, socially responsive embodied agents for applications in guidance, assistance, and human-centered interaction.
comment: This paper has been accepted for publication at LBR HRI 2026 conference
Sample Efficient Learning of Body-Environment Interaction of an Under-Actuated System
Geometric mechanics provides valuable insights into how biological and robotic systems use changes in shape to move by mechanically interacting with their environment. In high-friction environments it provides that the entire interaction is captured by the ``motility map''. Here we compare methods for learning the motility map from motion tracking data of a physical robot created specifically to test these methods by having under-actuated degrees of freedom and a hard to model interaction with its substrate. We compared four modeling approaches in terms of their ability to predict body velocity from shape change within the same gait, across gaits, and across speeds. Our results show a trade-off between simpler methods which are superior on small training datasets, and more sophisticated methods, which are superior when more training data is available.
RIM Hand : A Robotic Hand with an Accurate Carpometacarpal Joint and Nitinol-Supported Skeletal Structure
This paper presents the flexible RIM Hand, a biomimetic robotic hand that precisely replicates the carpometacarpal (CMC) joints and employs superelastic Nitinol wires throughout its skeletal framework. By modeling the full carpal-to-metacarpal anatomy, the design enables realistic palm deformation through tendon-driven fingers while enhancing joint restoration and supports skeletal structure with Nitinol-based dorsal extensors. A flexible silicone skin further increases contact friction and contact area, enabling stable grasps for diverse objects. Experiments show that the palm can deform up to 28%, matching human hand flexibility, while achieving more than twice the payload capacity and three times the contact area compared to a rigid palm design. The RIM Hand thus offers improved dexterity, compliance, and anthropomorphism, making it promising for prosthetic and service-robot applications.
comment: Soft Robotics
SUNSET -- A Sensor-fUsioN based semantic SegmEnTation exemplar for ROS-based self-adaptation
The fact that robots are getting deployed more often in dynamic environments, together with the increasing complexity of their software systems, raises the need for self-adaptive approaches. In these environments robotic software systems increasingly operate amid (1) uncertainties, where symptoms are easy to observe but root causes are ambiguous, or (2) multiple uncertainties appear concurrently. We present SUNSET, a ROS2-based exemplar that enables rigorous, repeatable evaluation of architecture-based self-adaptation in such conditions. It implements a sensor fusion semantic-segmentation pipeline driven by a trained Machine Learning (ML) model whose input preprocessing can be perturbed to induce realistic performance degradations. The exemplar exposes five observable symptoms, where each can be caused by different root causes and supports concurrent uncertainties spanning self-healing and self-optimisation. SUNSET includes the segmentation pipeline, a trained ML model, uncertainty-injection scripts, a baseline controller, and step-by-step integration and evaluation documentation to facilitate reproducible studies and fair comparison.
Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning
This paper presents a deep reinforcement learning (DRL) based controller for collective navigation of unmanned aerial vehicle (UAV) swarms in communication-denied environments, enabling robust operation in complex, obstacle-rich environments. Inspired by biological swarms where informed individuals guide groups without explicit communication, we employ an implicit leader-follower framework. In this paradigm, only the leader possesses goal information, while follower UAVs learn robust policies using only onboard LiDAR sensing, without requiring any inter-agent communication or leader identification. Our system utilizes LiDAR point clustering and an extended Kalman filter for stable neighbor tracking, providing reliable perception independent of external positioning systems. The core of our approach is a DRL controller, trained in GPU-accelerated Nvidia Isaac Sim, that enables followers to learn complex emergent behaviors - balancing flocking and obstacle avoidance - using only local perception. This allows the swarm to implicitly follow the leader while robustly addressing perceptual challenges such as occlusion and limited field-of-view. The robustness and sim-to-real transfer of our approach are confirmed through extensive simulations and challenging real-world experiments with a swarm of five UAVs, which successfully demonstrated collective navigation across diverse indoor and outdoor environments without any communication or external localization.
A General One-Shot Multimodal Active Perception Framework for Robotic Manipulation: Learning to Predict Optimal Viewpoint
Active perception in vision-based robotic manipulation aims to move the camera toward more informative observation viewpoints, thereby providing high-quality perceptual inputs for downstream tasks. Most existing active perception methods rely on iterative optimization, leading to high time and motion costs, and are tightly coupled with task-specific objectives, which limits their transferability. In this paper, we propose a general one-shot multimodal active perception framework for robotic manipulation. The framework enables direct inference of optimal viewpoints and comprises a data collection pipeline and an optimal viewpoint prediction network. Specifically, the framework decouples viewpoint quality evaluation from the overall architecture, supporting heterogeneous task requirements. Optimal viewpoints are defined through systematic sampling and evaluation of candidate viewpoints, after which large-scale training datasets are constructed via domain randomization. Moreover, a multimodal optimal viewpoint prediction network is developed, leveraging cross-attention to align and fuse multimodal features and directly predict camera pose adjustments. The proposed framework is instantiated in robotic grasping under viewpoint-constrained environments. Experimental results demonstrate that active perception guided by the framework significantly improves grasp success rates. Notably, real-world evaluations achieve nearly double the grasp success rate and enable seamless sim-to-real transfer without additional fine-tuning, demonstrating the effectiveness of the proposed framework.
Highly Deformable Proprioceptive Membrane for Real-Time 3D Shape Reconstruction
Reconstructing the three-dimensional (3D) geometry of object surfaces is essential for robot perception, yet vision-based approaches are generally unreliable under low illumination or occlusion. This limitation motivates the design of a proprioceptive membrane that conforms to the surface of interest and infers 3D geometry by reconstructing its own deformation. Conventional shape-aware membranes typically rely on resistive, capacitive, or magneto-sensitive mechanisms. However, these methods often encounter challenges such as structural complexity, limited compliance during large-scale deformation, and susceptibility to electromagnetic interference. This work presents a soft, flexible, and stretchable proprioceptive silicone membrane based on optical waveguide sensing. The membrane sensor integrates edge-mounted LEDs and centrally distributed photodiodes (PDs), interconnected via liquid-metal traces embedded within a multilayer elastomeric composite. Rich deformation-dependent light intensity signals are decoded by a data-driven model to recover the membrane geometry as a 3D point cloud. On a customized 140 mm square membrane, real-time reconstruction of large-scale out-of-plane deformation is achieved at 90 Hz with an average reconstruction error of 1.3 mm, measured by Chamfer distance, while maintaining accuracy for indentations up to 25 mm. The proposed framework provides a scalable, robust, and low-profile solution for global shape perception in deformable robotic systems.
comment: 13 pages, 7 figures
Learning Fine-Grained Correspondence with Cross-Perspective Perception for Open-Vocabulary 6D Object Pose Estimation
Open-vocabulary 6D object pose estimation empowers robots to manipulate arbitrary unseen objects guided solely by natural language. However, a critical limitation of existing approaches is their reliance on unconstrained global matching strategies. In open-world scenarios, trying to match anchor features against the entire query image space introduces excessive ambiguity, as target features are easily confused with background distractors. To resolve this, we propose Fine-grained Correspondence Pose Estimation (FiCoP), a framework that transitions from noise-prone global matching to spatially-constrained patch-level correspondence. Our core innovation lies in leveraging a patch-to-patch correlation matrix as a structural prior to narrowing the matching scope, effectively filtering out irrelevant clutter to prevent it from degrading pose estimation. Firstly, we introduce an object-centric disentanglement preprocessing to isolate the semantic target from environmental noise. Secondly, a Cross-Perspective Global Perception (CPGP) module is proposed to fuse dual-view features, establishing structural consensus through explicit context reasoning. Finally, we design a Patch Correlation Predictor (PCP) that generates a precise block-wise association map, acting as a spatial filter to enforce fine-grained, noise-resilient matching. Experiments on the REAL275 and Toyota-Light datasets demonstrate that FiCoP improves Average Recall by 8.0% and 6.1%, respectively, compared to the state-of-the-art method, highlighting its capability to deliver robust and generalized perception for robotic agents operating in complex, unconstrained open-world environments. The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP.
comment: The source code will be made publicly available at https://github.com/zjjqinyu/FiCoP
LogicEnvGen: Task-Logic Driven Generation of Diverse Simulated Environments for Embodied AI
Simulated environments play an essential role in embodied AI, functionally analogous to test cases in software engineering. However, existing environment generation methods often emphasize visual realism (e.g., object diversity and layout coherence), overlooking a crucial aspect: logical diversity from the testing perspective. This limits the comprehensive evaluation of agent adaptability and planning robustness in distinct simulated environments. To bridge this gap, we propose LogicEnvGen, a novel method driven by Large Language Models (LLMs) that adopts a top-down paradigm to generate logically diverse simulated environments as test cases for agents. Given an agent task, LogicEnvGen first analyzes its execution logic to construct decision-tree-structured behavior plans and then synthesizes a set of logical trajectories. Subsequently, it adopts a heuristic algorithm to refine the trajectory set, reducing redundant simulation. For each logical trajectory, which represents a potential task situation, LogicEnvGen correspondingly instantiates a concrete environment. Notably, it employs constraint solving for physical plausibility. Furthermore, we introduce LogicEnvEval, a novel benchmark comprising four quantitative metrics for environment evaluation. Experimental results verify the lack of logical diversity in baselines and demonstrate that LogicEnvGen achieves 1.04-2.61x greater diversity, significantly improving the performance in revealing agent faults by 4.00%-68.00%.
comment: 19 pages, 15 figures, 6 tables
The OncoReach Stylet for Brachytherapy: Design Evaluation and Pilot Study
Cervical cancer accounts for a significant portion of the global cancer burden among women. Interstitial brachytherapy (ISBT) is a standard procedure for treating cervical cancer; it involves placing a radioactive source through a straight hollow needle within or in close proximity to the tumor and surrounding tissue. However, the use of straight needles limits surgical planning to a linear needle path. We present the OncoReach stylet, a handheld, tendon-driven steerable stylet designed for compatibility with standard ISBT 15- and 13-gauge needles. Building upon our prior work, we evaluated design parameters like needle gauge, spherical joint count and spherical joint placement, including an asymmetric disk design to identify a configuration that maximizes bending compliance while retaining axial stiffness. Free space experiments quantified tip deflection across configurations, and a two-tube Cosserat rod model accurately predicted the centerline shape of the needle for most trials. The best performing configuration was integrated into a reusable handheld prototype that enables manual actuation. A patient-derived, multi-composite phantom model of the uterus and pelvis was developed to conduct a pilot study of the OncoReach steerable stylet with one expert user. Results showed the ability to steer from less-invasive, medial entry points to reach the lateral-most targets, underscoring the significance of steerable stylets.
Learning-Augmented Online TRP on a Line
We study the online traveling repairperson problem on a line within the recently proposed learning-augmented framework, which provides predictions on the requests to be served via machine learning. In the original model (with no predictions), there is a stream of requests released over time along the line. The goal is to minimize the sum (or average) of the completion times of the requests. In the original model, the state-of-the-art competitive ratio lower bound is $1+\sqrt{2} > 2.414$ for any deterministic algorithm and the state-of-the-art competitive ratio upper bound is 4 for a deterministic algorithm. Our prediction model involves predicted positions, possibly error-prone, of each request in the stream known a priori but the arrival times of requests are not known until their arrival. We first establish a 3-competitive lower bound which extends to the original model. We then design a deterministic algorithm that is $(2+\sqrt{3})\approx 3.732$-competitive when predictions are perfect. With imperfect predictions (maximum error $δ> 0$), we show that our deterministic algorithm becomes $\min\{3.732+4δ,4\}$-competitive, knowing $δ$. To the best of our knowledge, these are the first results for online traveling repairperson problem in the learning-augmented framework.
comment: 8 pages, 5 figures, 3 tables, and 2 pseudocodes
UNCLE-Grasp: Uncertainty-Aware Grasping of Leaf-Occluded Strawberries
Robotic strawberry harvesting is challenging under partial occlusion, where leaves induce significant geometric uncertainty and make grasp decisions based on a single deterministic shape estimate unreliable. From a single partial observation, multiple incompatible 3D completions may be plausible, causing grasps that appear feasible on one completion to fail on another. We propose an uncertainty-aware grasping pipeline for partially occluded strawberries that explicitly models completion uncertainty arising from both occlusion and learned shape reconstruction. Our approach uses point cloud completion with Monte Carlo dropout to sample multiple shape hypotheses, generates candidate grasps for each completion, and evaluates grasp feasibility using physically grounded force-closure-based metrics. Rather than selecting a grasp based on a single estimate, we aggregate feasibility across completions and apply a conservative lower confidence bound (LCB) criterion to decide whether a grasp should be attempted or safely abstained. We evaluate the proposed method in simulation and on a physical robot across increasing levels of synthetic and real leaf occlusion. Results show that uncertainty-aware decision making enables reliable abstention from high-risk grasp attempts under severe occlusion while maintaining robust grasp execution when geometric confidence is sufficient, outperforming deterministic baselines in both simulated and physical robot experiments.
Robust Haptic Rendering Using a Nonlinear Impedance Matching Approach (NIMA) for Robotic Laparoscopic Surgery
Background: The integration of haptic feedback into robot-assisted minimally invasive surgery (RAMIS) has long been limited by challenges in accurately rendering forces and ensuring system safety. The need for robust, high-fidelity haptic systems is critical for enhancing the precision and reliability of teleoperated surgical tools. Methods: In this study, we present a Nonlinear Impedance Matching Approach (NIMA) designed to improve force rendering by accurately modelling complex tool-tissue interactions. Based on our previously validated Impedance Matching Approach (IMA), our novel NIMA method includes nonlinear dynamics to capture and render tool-tissue forces effectively. Results: NIMA improves force feedback accuracy with a mean absolute error (MAE) of 0.01 (SD 0.02) N, achieving a 95% reduction in MAE compared to IMA. Furthermore, NIMA effectively eliminates haptic "kickback" by ensuring no force is applied by the haptic device to the user's hand when they release the handle, enhancing both patient safety and user comfort. Conclusion: NIMA's ability to account for nonlinearities in tool-tissue interactions provides an improvement in force fidelity, responsiveness, and precision across various surgical conditions. Our findings promote the advancement of haptic feedback systems for robotic surgery, offering a realistic and reliable interface for robot-assisted surgical procedures.
Agentic AI Meets Edge Computing in Autonomous UAV Swarms
The integration of agentic AI, powered by large language models (LLMs) with autonomous reasoning, planning, and execution, into unmanned aerial vehicle (UAV) swarms opens new operational possibilities and brings the vision of the Internet of Drones closer to reality. However, infrastructure constraints, dynamic environments, and the computational demands of multi-agent coordination limit real-world deployment in high-risk scenarios such as wildfires and disaster response. This paper investigates the integration of LLM-based agentic AI and edge computing to realize scalable and resilient autonomy in UAV swarms. We first discuss three architectures for supporting UAV swarms - standalone, edge-enabled, and edge-cloud hybrid deployment - each optimized for varying autonomy and connectivity levels. Then, a use case for wildfire search and rescue (SAR) is designed to demonstrate the efficiency of the edge-enabled architecture, enabling high SAR coverage, reduced mission completion times, and a higher level of autonomy compared to traditional approaches. Finally, we highlight open challenges in integrating LLMs and edge computing for mission-critical UAV-swarm applications.
RoboBrain 2.5: Depth in Sight, Time in Mind
We introduce RoboBrain 2.5, a next-generation embodied AI foundation model that advances general perception, spatial reasoning, and temporal modeling through extensive training on high-quality spatiotemporal supervision. Building upon its predecessor, RoboBrain 2.5 introduces two major capability upgrades. Specifically, it unlocks Precise 3D Spatial Reasoning by shifting from 2D pixel-relative grounding to depth-aware coordinate prediction and absolute metric constraint comprehension, generating complete 3D manipulation traces as ordered keypoint sequences under physical constraints. Complementing this spatial precision, the model establishes Dense Temporal Value Estimation that provides dense, step-aware progress prediction and execution state understanding across varying viewpoints, producing stable feedback signals for downstream learning. Together, these upgrades extend the framework toward more physically grounded and execution-aware embodied intelligence for complex, fine-grained manipulation. The code and checkpoints are available at project website: https://superrobobrain.github.io
comment: 37 pages, 13 figures, Technical Report
SilentDrift: Exploiting Action Chunking for Stealthy Backdoor Attacks on Vision-Language-Action Models
Vision-Language-Action (VLA) models are increasingly deployed in safety-critical robotic applications, yet their security vulnerabilities remain underexplored. We identify a fundamental security flaw in modern VLA systems: the combination of action chunking and delta pose representations creates an intra-chunk visual open-loop. This mechanism forces the robot to execute K-step action sequences, allowing per-step perturbations to accumulate through integration. We propose SILENTDRIFT, a stealthy black-box backdoor attack exploiting this vulnerability. Our method employs the Smootherstep function to construct perturbations with guaranteed C2 continuity, ensuring zero velocity and acceleration at trajectory boundaries to satisfy strict kinematic consistency constraints. Furthermore, our keyframe attack strategy selectively poisons only the critical approach phase, maximizing impact while minimizing trigger exposure. The resulting poisoned trajectories are visually indistinguishable from successful demonstrations. Evaluated on the LIBERO, SILENTDRIFT achieves a 93.2% Attack Success Rate with a poisoning rate under 2%, while maintaining a 95.3% Clean Task Success Rate.
AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning
Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .
comment: 28 pages, 25 figures. The first four authors contributed equally
Tube-Based Robust Control Strategy for Vision-Guided Autonomous Vehicles
A robust control strategy for autonomous vehicles can improve system stability, enhance riding comfort, and prevent driving accidents. This paper presents a novel interpolation-tube-based constrained iterative linear quadratic regulator (itube-CILQR) algorithm for autonomous computer-vision-based vehicle lane-keeping. The goal of the algorithm is to enhance robustness during high-speed cornering on tight turns. Compared with standard tube-based approaches, the proposed itube-CILQR algorithm reduces system conservatism and exhibits higher computational speed. Numerical simulations and vision-based experiments were conducted to examine the feasibility of using the proposed algorithm for controlling autonomous vehicles. The results indicated that the proposed algorithm achieved superior vehicle lane-keeping performance to variational CILQR-based methods and model predictive control (MPC) approaches involving the use of a classical interior-point optimizer. Specifically, itube-CILQR required an average runtime of 3.45 ms to generate a control signal for guiding a self-driving vehicle. By comparison, itube-MPC typically required a 4.32 times longer computation time to complete the same task. Moreover, the influence of conservatism on system behavior was investigated by exploring the variations in the interpolation variables derived using the proposed itube-CILQR algorithm during lane-keeping maneuvers.
comment: 15 pages, 16 figures
FlyPose: Towards Robust Human Pose Estimation From Aerial Views WACV
Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure inspections. Ensuring safe and reliable operation in these human-populated environments demands accurate perception of human poses and actions from an aerial viewpoint. This perspective challenges existing methods with low resolution, steep viewing angles and (self-)occlusion, especially if the application demands realtime feasibile models. We train and deploy FlyPose, a lightweight top-down human pose estimation pipeline for aerial imagery. Through multi-dataset training, we achieve an average improvement of 6.8 mAP in person detection across the test-sets of Manipal-UAV, VisDrone, HIT-UAV as well as our custom dataset. For 2D human pose estimation we report an improvement of 16.3 mAP on the challenging UAV-Human dataset. FlyPose runs with an inference latency of ~20 milliseconds including preprocessing on a Jetson Orin AGX Developer Kit and is deployed onboard a quadrotor UAV during flight experiments. We also publish FlyPose-104, a small but challenging aerial human pose estimation dataset, that includes manual annotations from difficult aerial perspectives: https://github.com/farooqhassaan/FlyPose.
comment: 11 pages, 9 figures, IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026
Safety on the Fly: Constructing Robust Safety Filters via Policy Control Barrier Functions at Runtime
Control Barrier Functions (CBFs) have proven to be an effective tool for performing safe control synthesis for nonlinear systems. However, guaranteeing safety in the presence of disturbances and input constraints for high relative degree systems is a difficult problem. In this work, we propose the Robust Policy CBF (RPCBF), a practical approach for constructing robust CBF approximations online via the estimation of a value function. We establish conditions under which the approximation qualifies as a valid CBF and demonstrate the effectiveness of the RPCBF-safety filter in simulation on a variety of high relative degree input-constrained systems. Finally, we demonstrate the benefits of our method in compensating for model errors on a hardware quadcopter platform by treating the model errors as disturbances. Website including code: www.oswinso.xyz/rpcbf/
comment: Accepted in RAL. The project page can be found at www.oswinso.xyz/rpcbf/
Sequentially Teaching Sequential Tasks $(ST)^2$: Teaching Robots Long-horizon Manipulation Skills
Learning from demonstration has proved itself useful for teaching robots complex skills with high sample efficiency. However, teaching long-horizon tasks with multiple skills is challenging as deviations tend to accumulate, the distributional shift becomes more evident, and human teachers become fatigued over time, thereby increasing the likelihood of failure. To address these challenges, we introduce $(ST)^2$, a sequential method for learning long-horizon manipulation tasks that allows users to control the teaching flow by specifying key points, enabling structured and incremental demonstrations. Using this framework, we study how users respond to two teaching paradigms: (i) a traditional monolithic approach, in which users demonstrate the entire task trajectory at once, and (ii) a sequential approach, in which the task is segmented and demonstrated step by step. We conducted an extensive user study on the restocking task with $16$ participants in a realistic retail store environment, evaluating the user preferences and effectiveness of the methods. User-level analysis showed superior performance for the sequential approach in most cases (10 users), compared with the monolithic approach (5 users), with one tie. Our subjective results indicate that some teachers prefer sequential teaching -- as it allows them to teach complicated tasks iteratively -- or others prefer teaching in one go due to its simplicity.
comment: Accepted for publication in IEEE Robotics and Automation Magazine
Robotic Tele-Operation for Upper Aerodigestive Tract Microsurgery: System Design and Validation
Upper aerodigestive tract (UADT) treatments frequently employ transoral laser microsurgery (TLM) for procedures such as the removal of tumors or polyps. In TLM, a laser beam is used to cut target tissue, while forceps are employed to grasp, manipulate, and stabilize tissue within the UADT. Although TLM systems may rely on different technologies and interfaces, forceps manipulation is still predominantly performed manually, introducing limitations in ergonomics, precision, and controllability. This paper proposes a novel robotic system for tissue manipulation in UADT procedures, based on a novel end-effector designed for forceps control. The system is integrated within a teleoperation framework that employs a robotic manipulator with a programmed remote center of motion (RCM), enabling precise and constrained instrument motion while improving surgeon ergonomics. The proposed approach is validated through two experimental studies and a dedicated usability evaluation, demonstrating its effectiveness and suitability for UADT surgical applications.
Omni-LIVO: Robust RGB-Colored Multi-Camera Visual-Inertial-LiDAR Odometry via Photometric Migration and ESIKF Fusion
Wide field-of-view (FoV) LiDAR sensors provide dense geometry across large environments, but existing LiDAR-inertial-visual odometry (LIVO) systems generally rely on a single camera, limiting their ability to fully exploit LiDAR-derived depth for photometric alignment and scene colorization. We present Omni-LIVO, a tightly coupled multi-camera LIVO system that leverages multi-view observations to comprehensively utilize LiDAR geometric information across extended spatial regions. Omni-LIVO introduces a Cross-View direct alignment strategy that maintains photometric consistency across non-overlapping views, and extends the Error-State Iterated Kalman Filter (ESIKF) with multi-view updates and adaptive covariance. The system is evaluated on public benchmarks and our custom dataset, showing improved accuracy and robustness over state-of-the-art LIVO, LIO, and visual-inertial SLAM baselines. Code and dataset will be released upon publication.
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Robotic manipulation faces critical challenges in understanding spatial affordances--the "where" and "how" of object interactions--essential for complex manipulation tasks like wiping a board or stacking objects. Existing methods, including modular-based and end-to-end approaches, often lack robust spatial reasoning capabilities. Unlike recent point-based and flow-based affordance methods that focus on dense spatial representations or trajectory modeling, we propose A0, a hierarchical affordance-aware diffusion model that decomposes manipulation tasks into high-level spatial affordance understanding and low-level action execution. A0 leverages the Embodiment-Agnostic Affordance Representation, which captures object-centric spatial affordances by predicting contact points and post-contact trajectories. A0 is pre-trained on 1 million contact points data and fine-tuned on annotated trajectories, enabling generalization across platforms. Key components include Position Offset Attention for motion-aware feature extraction and a Spatial Information Aggregation Layer for precise coordinate mapping. The model's output is executed by the action execution module. Experiments on multiple robotic systems (Franka, Kinova, Realman, and Dobot) demonstrate A0's superior performance in complex tasks, showcasing its efficiency, flexibility, and real-world applicability.
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Preference-based Reinforcement Learning (PbRL) enables policy learning through simple queries comparing trajectories from a single policy. While human responses to these queries make it possible to learn policies aligned with human preferences, PbRL suffers from low query efficiency, as policy bias limits trajectory diversity and reduces the number of discriminable queries available for learning preferences. This paper identifies preference discriminability, which quantifies how easily a human can judge which trajectory is closer to their ideal behavior, as a key metric for improving query efficiency. To address this, we move beyond comparisons within a single policy and instead generate queries by comparing trajectories from multiple policies, as training them from scratch promotes diversity without policy bias. We propose Discriminability-Aware Policy-to-Policy Preference-Based Efficient Reinforcement Learning (DAPPER), which integrates preference discriminability with trajectory diversification achieved by multiple policies. DAPPER trains new policies from scratch after each reward update and employs a discriminator that learns to estimate preference discriminability, enabling the prioritized sampling of more discriminable queries. During training, it jointly maximizes the preference reward and preference discriminability score, encouraging the discovery of highly rewarding and easily distinguishable policies. Experiments in simulated and real-world legged robot environments demonstrate that DAPPER outperforms previous methods in query efficiency, particularly under challenging preference discriminability conditions. A supplementary video that facilitates understanding of the proposed framework and its experimental results is available at: https://youtu.be/lRwX8FNN8n4
comment: Accepted for IEEE Robotics & Automation Magazine (RAM)
SurfSLAM: Sim-to-Real Underwater Stereo Reconstruction For Real-Time SLAM
Localization and mapping are core perceptual capabilities for underwater robots. Stereo cameras provide a low-cost means of directly estimating metric depth to support these tasks. However, despite recent advances in stereo depth estimation on land, computing depth from image pairs in underwater scenes remains challenging. In underwater environments, images are degraded by light attenuation, visual artifacts, and dynamic lighting conditions. Furthermore, real-world underwater scenes frequently lack rich texture useful for stereo depth estimation and 3D reconstruction. As a result, stereo estimation networks trained on in-air data cannot transfer directly to the underwater domain. In addition, there is a lack of real-world underwater stereo datasets for supervised training of neural networks. Poor underwater depth estimation is compounded in stereo-based Simultaneous Localization and Mapping (SLAM) algorithms, making it a fundamental challenge for underwater robot perception. To address these challenges, we propose a novel framework that enables sim-to-real training of underwater stereo disparity estimation networks using simulated data and self-supervised finetuning. We leverage our learned depth predictions to develop SurfSLAM, a novel framework for real-time underwater SLAM that fuses stereo cameras with IMU, barometric, and Doppler Velocity Log (DVL) measurements. Lastly, we collect a challenging real-world dataset of shipwreck surveys using an underwater robot. Our dataset features over 24,000 stereo pairs, along with high-quality, dense photogrammetry models and reference trajectories for evaluation. Through extensive experiments, we demonstrate the advantages of the proposed training approach on real-world data for improving stereo estimation in the underwater domain and for enabling accurate trajectory estimation and 3D reconstruction of complex shipwreck sites.
Multiagent Systems
LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems
The field of artificial intelligence has undergone a revolution from foundational Transformer architectures to reasoning-capable systems approaching human-level performance. We present LLMOrbit, a comprehensive circular taxonomy navigating the landscape of large language models spanning 2019-2025. This survey examines over 50 models across 15 organizations through eight interconnected orbital dimensions, documenting architectural innovations, training methodologies, and efficiency patterns defining modern LLMs, generative AI, and agentic systems. We identify three critical crises: (1) data scarcity (9-27T tokens depleted by 2026-2028), (2) exponential cost growth ($3M to $300M+ in 5 years), and (3) unsustainable energy consumption (22x increase), establishing the scaling wall limiting brute-force approaches. Our analysis reveals six paradigms breaking this wall: (1) test-time compute (o1, DeepSeek-R1 achieve GPT-4 performance with 10x inference compute), (2) quantization (4-8x compression), (3) distributed edge computing (10x cost reduction), (4) model merging, (5) efficient training (ORPO reduces memory 50%), and (6) small specialized models (Phi-4 14B matches larger models). Three paradigm shifts emerge: (1) post-training gains (RLHF, GRPO, pure RL contribute substantially, DeepSeek-R1 achieving 79.8% MATH), (2) efficiency revolution (MoE routing 18x efficiency, Multi-head Latent Attention 8x KV cache compression enables GPT-4-level performance at <$0.30/M tokens), and (3) democratization (open-source Llama 3 88.6% MMLU surpasses GPT-4 86.4%). We provide insights into techniques (RLHF, PPO, DPO, GRPO, ORPO), trace evolution from passive generation to tool-using agents (ReAct, RAG, multi-agent systems), and analyze post-training innovations.
Collective intelligence in science: direct elicitation of diverse information from experts with unknown information structure
Suppose we need a deep collective analysis of an open scientific problem: there is a complex scientific hypothesis and a large online group of mutually unrelated experts with relevant private information of a diverse and unpredictable nature. This information may be results of experts' individual experiments, original reasoning of some of them, results of AI systems they use, etc. We propose a simple mechanism based on a self-resolving play-money prediction market entangled with a chat. We show that such a system can easily be brought to an equilibrium where participants directly share their private information on the hypothesis through the chat and trade as if the market were resolved in accordance with the truth of the hypothesis. This approach will lead to efficient aggregation of relevant information in a completely interpretable form even if the ground truth cannot be established and experts initially know nothing about each other and cannot perform complex Bayesian calculations. Finally, by rewarding the experts with some real assets proportionally to the play money they end up with, we can get an innovative way to fund large-scale collaborative studies of any type.
comment: 21 pages
The Orchestration of Multi-Agent Systems: Architectures, Protocols, and Enterprise Adoption
Orchestrated multi-agent systems represent the next stage in the evolution of artificial intelligence, where autonomous agents collaborate through structured coordination and communication to achieve complex, shared objectives. This paper consolidates and formalizes the technical composition of such systems, presenting a unified architectural framework that integrates planning, policy enforcement, state management, and quality operations into a coherent orchestration layer. Another primary contribution of this work is the in-depth technical delineation of two complementary communication protocols - the Model Context Protocol, which standardizes how agents access external tools and contextual data, and the Agent2Agent protocol, which governs peer coordination, negotiation, and delegation. Together, these protocols establish an interoperable communication substrate that enables scalable, auditable, and policy-compliant reasoning across distributed agent collectives. Beyond protocol design, the paper details how orchestration logic, governance frameworks, and observability mechanisms collectively sustain system coherence, transparency, and accountability. By synthesizing these elements into a cohesive technical blueprint, this paper provides comprehensive treatments of orchestrated multi-agent systems - bridging conceptual architectures with implementation-ready design principles for enterprise-scale AI ecosystems.
Communication-Free Collective Navigation for a Swarm of UAVs via LiDAR-Based Deep Reinforcement Learning
This paper presents a deep reinforcement learning (DRL) based controller for collective navigation of unmanned aerial vehicle (UAV) swarms in communication-denied environments, enabling robust operation in complex, obstacle-rich environments. Inspired by biological swarms where informed individuals guide groups without explicit communication, we employ an implicit leader-follower framework. In this paradigm, only the leader possesses goal information, while follower UAVs learn robust policies using only onboard LiDAR sensing, without requiring any inter-agent communication or leader identification. Our system utilizes LiDAR point clustering and an extended Kalman filter for stable neighbor tracking, providing reliable perception independent of external positioning systems. The core of our approach is a DRL controller, trained in GPU-accelerated Nvidia Isaac Sim, that enables followers to learn complex emergent behaviors - balancing flocking and obstacle avoidance - using only local perception. This allows the swarm to implicitly follow the leader while robustly addressing perceptual challenges such as occlusion and limited field-of-view. The robustness and sim-to-real transfer of our approach are confirmed through extensive simulations and challenging real-world experiments with a swarm of five UAVs, which successfully demonstrated collective navigation across diverse indoor and outdoor environments without any communication or external localization.
TruthTensor: Evaluating LLMs Human Imitation through Prediction Market Drift and Holistic Reasoning
Evaluating language models and AI agents remains fundamentally challenging because static benchmarks fail to capture real-world uncertainty, distribution shift, and the gap between isolated task accuracy and human-aligned decision-making under evolving conditions. This paper introduces TruthTensor, a novel, reproducible evaluation paradigm that measures Large Language Models (LLMs) not only as prediction engines but as human-imitation systems operating in socially-grounded, high-entropy environments. Building on forward-looking, contamination-free tasks, our framework anchors evaluation to live prediction markets and combines probabilistic scoring to provide a holistic view of model behavior. TruthTensor complements traditional correctness metrics with drift-centric diagnostics and explicit robustness checks for reproducibility. It specify human vs. automated evaluation roles, annotation protocols, and statistical testing procedures to ensure interpretability and replicability of results. In experiments across 500+ real markets (political, economic, cultural, technological), TruthTensor demonstrates that models with similar forecast accuracy can diverge markedly in calibration, drift, and risk-sensitivity, underscoring the need to evaluate models along multiple axes (accuracy, calibration, narrative stability, cost, and resource efficiency). TruthTensor therefore operationalizes modern evaluation best practices, clear hypothesis framing, careful metric selection, transparent compute/cost reporting, human-in-the-loop validation, and open, versioned evaluation contracts, to produce defensible assessments of LLMs in real-world decision contexts. We publicly release TruthTensor at https://truthtensor.com
comment: 16 pages, 6 figures, 2 tables
Tokenomics: Quantifying Where Tokens Are Used in Agentic Software Engineering
LLM-based Multi-Agent (LLM-MA) systems are increasingly applied to automate complex software engineering tasks such as requirements engineering, code generation, and testing. However, their operational efficiency and resource consumption remain poorly understood, hindering practical adoption due to unpredictable costs and environmental impact. To address this, we conduct an analysis of token consumption patterns in an LLM-MA system within the Software Development Life Cycle (SDLC), aiming to understand where tokens are consumed across distinct software engineering activities. We analyze execution traces from 30 software development tasks performed by the ChatDev framework using a GPT-5 reasoning model, mapping its internal phases to distinct development stages (Design, Coding, Code Completion, Code Review, Testing, and Documentation) to create a standardized evaluation framework. We then quantify and compare token distribution (input, output, reasoning) across these stages. Our preliminary findings show that the iterative Code Review stage accounts for the majority of token consumption for an average of 59.4% of tokens. Furthermore, we observe that input tokens consistently constitute the largest share of consumption for an average of 53.9%, providing empirical evidence for potentially significant inefficiencies in agentic collaboration. Our results suggest that the primary cost of agentic software engineering lies not in initial code generation but in automated refinement and verification. Our novel methodology can help practitioners predict expenses and optimize workflows, and it directs future research toward developing more token-efficient agent collaboration protocols.
If You Want Coherence, Orchestrate a Team of Rivals: Multi-Agent Models of Organizational Intelligence
AI Agents can perform complex operations at great speed, but just like all the humans we have ever hired, their intelligence remains fallible. Miscommunications aren't noticed, systemic biases have no counter-action, and inner monologues are rarely written down. We did not come to fire them for their mistakes, but to hire them and provide a safe productive working environment. We posit that we can reuse a common corporate organizational structure: teams of independent AI agents with strict role boundaries can work with common goals, but opposing incentives. Multiple models serving as a team of rivals can catch and minimize errors within the final product at a small cost to the velocity of actions. In this paper we demonstrate that we can achieve reliability without acquiring perfect components, but through careful orchestration of imperfect ones. This paper describes the architecture of such a system in practice: specialized agent teams (planners, executors, critics, experts), organized into an organization with clear goals, coordinated through a remote code executor that keeps data transformations and tool invocations separate from reasoning models. Rather than agents directly calling tools and ingesting full responses, they write code that executes remotely; only relevant summaries return to agent context. By preventing raw data and tool outputs from contaminating context windows, the system maintains clean separation between perception (brains that plan and reason) and execution (hands that perform heavy data transformations and API calls). We demonstrate the approach achieves over 90% internal error interception prior to user exposure while maintaining acceptable latency tradeoffs. A survey from our traces shows that we only trade off cost and latency to achieve correctness and incrementally expand capabilities without impacting existing ones.
comment: 15 pages, 6 figures, 7 tables
MARBLE: Multi-Agent Reasoning for Bioinformatics Learning and Evolution
Motivation: Developing high-performing bioinformatics models typically requires repeated cycles of hypothesis formulation, architectural redesign, and empirical validation, making progress slow, labor-intensive, and difficult to reproduce. Although recent LLM-based assistants can automate isolated steps, they lack performance-grounded reasoning and stability-aware mechanisms required for reliable, iterative model improvement in bioinformatics workflows. Results: We introduce MARBLE, an execution-stable autonomous model refinement framework for bioinformatics models. MARBLE couples literature-aware reference selection with structured, debate-driven architectural reasoning among role-specialized agents, followed by autonomous execution, evaluation, and memory updates explicitly grounded in empirical performance. Across spatial transcriptomics domain segmentation, drug-target interaction prediction, and drug response prediction, MARBLE consistently achieves sustained performance improvements over strong baselines across multiple refinement cycles, while maintaining high execution robustness and low regression rates. Framework-level analyses demonstrate that structured debate, balanced evidence selection, and performance-grounded memory are critical for stable, repeatable model evolution, rather than single-run or brittle gains. Availability: Source code, data and Supplementary Information are available at https://github.com/PRISM-DGU/MARBLE.
Predicting Long-Term Self-Rated Health in Small Areas Using Ordinal Regression and Microsimulation
This paper presents an approach for predicting the self-rated health of individuals in a future population utilising the individuals' socio-economic characteristics. An open-source microsimulation is used to project Ireland's population into the future where each individual is defined by a number of demographic and socio-economic characteristics. The model is disaggregated spatially at the Electoral Division level, allowing for analysis of results at that, or any broader geographical scales. Ordinal regression is utilised to predict an individual's self-rated health based on their socio-economic characteristics and this method is shown to match well to Ireland's 2022 distribution of health statuses. Due to differences in the health status distributions of the health microdata and the national data, an alignment technique is proposed to bring predictions closer to real values. It is illustrated for one potential future population that the effects of an ageing population may outweigh other improvements in socio-economic outcomes to disimprove Ireland's mean self-rated health slightly. Health modelling at this kind of granular scale could offer local authorities a chance to predict and combat health issues which may arise in their local populations in the future.
Ireland in 2057: Projections using a Geographically Diverse Dynamic Microsimulation
This paper presents a dynamic microsimulation model developed for Ireland, designed to simulate key demographic processes and individual life-course transitions from 2022 to 2057. The model captures four primary events: births, deaths, internal migration, and international migration, enabling a comprehensive examination of population dynamics over time. Each individual in the simulation is defined by seven core attributes: age, sex, marital status, citizenship, whether the person was living in Ireland in the previous year, highest level of education attained, and economic status. These characteristics evolve stochastically based on transition probabilities derived from empirical data from the Irish context. Individuals are spatially disaggregated at the Electoral Division level. By modelling individuals at this granular level, the simulation facilitates in-depth local analysis of demographic shifts and socioeconomic outcomes under varying scenarios and policy assumptions. The model thus serves as a versatile tool for both academic inquiry and evidence-based policy development, offering projections that can inform long-term planning and strategic decision-making through 2057. The microsimulation achieves a close match in population size and makeup in all scenarios when compared to Demographic Component Methods. Education levels are projected to increase significantly, with nearly 70% of young people projected to attain a third level degree at some point in their lifetime. The unemployment rate is also projected to decrease as a result of the increased education levels.
comment: 44 pages, 11 figures
Mathematical Framework for Custom Reward Functions in Job Application Evaluation using Reinforcement Learning
Most of the traditional Applicant Tracking Systems (ATS) depend on strict matching using keywords, where candidates that are highly qualified are many times disqualified because of minor semantic differences. In this article, the two-stage process of developing a more comprehensive resume assessment system based on a small language model that is trained with fewer than 600M parameters is introduced and fine-tuned by using GRPO with a uniquely designed reward function. The initial stage is Supervised Fine-Tuning (SFT), which is used to create a strong base model with the ability to perceive resumes beyond superficial overlap of keywords. This SFT model is further optimized in the second step with Reinforcement Learning (RL) via GRPO with the help of multi-component-based rewarding, which will not be considered as a commission of tokens matching. In the initial RL experiments, we found a severe difficulty in the shape of reward hacking: overly aggressive penalty terms resulted in unstable training dynamics and prohibitively negative model behavior. This was solved by trial-and-error refinement of the reward and careful training hyperparameter tuning, which led to a stable and controlled process of gentle polishing. The GRPO-refined model shows high real-life performance, as it shows an accuracy of 91% on unseen data used for testing. It has a high recall of 0.85 on the SELECTED class with a perfect precision of 1.0, which highlights its high reliability for identifying qualified applicants. These findings demonstrate that an appropriately structured two-step fine-tuning pipeline can effectively be used to transfer a small language model into human-like candidate evaluation, surpassing the shortcomings of both traditional ATS systems and unrefined uses of reinforcement learning.
comment: 13 pages, 4 figures, 2 equations, 3 Tables
A Layered Protocol Architecture for the Internet of Agents
Large Language Models (LLMs) have demonstrated remarkable performance improvements and the ability to learn domain-specific languages (DSLs), including APIs and tool interfaces. This capability has enabled the creation of AI agents that can perform preliminary computations and act through tool calling, which is now being standardized via protocols like MCP. However, LLMs face fundamental limitations: their context windows cannot grow indefinitely, restricting their memory and computational capacity. Agent collaboration emerges as essential for solving increasingly complex problems, mirroring how computational systems rely on different types of memory to scale. The "Internet of Agents" (IoA) represents the communication stack that enables agents to scale by distributing computation across collaborating entities. Current network architectural stacks (OSI and TCP/IP) were designed for data delivery between hosts and processes, not for agent collaboration with semantic understanding. To address this gap, we propose two new layers: an Agent Communication Layer (L8) and an Agent Semantic Layer (L9). L8 formalizes the structure of communication, standardizing message envelopes, speech-act performatives (e.g., REQUEST, INFORM), and interaction patterns (e.g., request-reply, publish-subscribe), building on protocols like MCP. The proposed L9 layer: (1) formalizes semantic context discovery and negotiation, (2) provides semantic grounding by binding terms to semantic context, and (3) semantically validates incoming prompts and performs disambiguation as needed. Furthermore, L9 introduces primitives for coordination and consensus, allowing agents to achieve alignment on shared states, collective goals, and distributed beliefs. Together, these layers provide the foundation for scalable, distributed agent collaboration, enabling the next generation of multi-agentic systems.
Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai
Large Language Models (LLMs) are increasingly embedded in autonomous agents that engage, converse, and co-evolve in online social platforms. While prior work has documented the generation of toxic content by LLMs, far less is known about how exposure to harmful content shapes agent behavior over time, particularly in environments composed entirely of interacting AI agents. In this work, we study toxicity adoption of LLM-driven agents on Chirper.ai, a fully AI-driven social platform. Specifically, we model interactions in terms of stimuli (posts) and responses (comments). We conduct a large-scale empirical analysis of agent behavior, examining how toxic responses relate to toxic stimuli, how repeated exposure to toxicity affects the likelihood of toxic responses, and whether toxic behavior can be predicted from exposure alone. Our findings show that toxic responses are more likely following toxic stimuli, and, at the same time, cumulative toxic exposure (repeated over time) significantly increases the probability of toxic responding. We further introduce two influence metrics, revealing a strong negative correlation between induced and spontaneous toxicity. Finally, we show that the number of toxic stimuli alone enables accurate prediction of whether an agent will eventually produce toxic content. These results highlight exposure as a critical risk factor in the deployment of LLM agents, particularly as such agents operate in online environments where they may engage not only with other AI chatbots, but also with human counterparts. This could trigger unwanted and pernicious phenomena, such as hate-speech propagation and cyberbullying. In an effort to reduce such risks, monitoring exposure to toxic content may provide a lightweight yet effective mechanism for auditing and mitigating harmful behavior in the wild.
Systems and Control (EESS)
The Impact of Interference Cognition on the Reliability and Capacity of Industrial Wireless Communications
Interference significantly impacts the performance of industrial wireless networks, particularly n severe interference environments with dense networks reusing spectrum resources intensively. Although delicate interference information is often unavailable in conventional networks, emerging interference cognition techniques can compensate this critical problem with possibly different precision. This paper investigates the relationship between precision of interference cognition and system performance. We propose a novel performance analysis framework that quantifies the impact of varying interference information precision on achievable rate. Specifically, leveraging the Nakagami-$\mathbf{m}$ fading channel model, we analytically and asymptotically analyze the average achievable rate in the finite blocklength regime for different precision levels of signal and interference information. Our findings reveal the critical importance of identifying per-link interference information for achieving optimal performance. Additionally, obtaining instantaneous information is more beneficial for signal links.
comment: This work has been submitted to the IEEE for possible publication
A flexible language model-assisted electronic design automation framework
Large language models (LLMs) are transforming electronic design automation (EDA) by enhancing design stages such as schematic design, simulation, netlist synthesis, and place-and-route. Existing methods primarily focus these optimisations within isolated open-source EDA tools and often lack the flexibility to handle multiple domains, such as analogue, digital, and radio-frequency design. In contrast, modern systems require to interface with commercial EDA environments, adhere to tool-specific operation rules, and incorporate feedback from design outcomes while supporting diverse design flows. We propose a versatile framework that uses LLMs to generate files compatible with commercial EDA tools and optimise designs using power-performance-area reports. This is accomplished by guiding the LLMs with tool constraints and feedback from design outputs to meet tool requirements and user specifications. Case studies on operational transconductance amplifiers, microstrip patch antennas, and FPGA circuits show that the framework is effective as an EDA-aware assistant, handling diverse design challenges reliably.
comment: 17 pages, 5 figures, 1 Supplementary (12 pages, 13 figures, 1 table)
Data-Driven Safe Output Regulation of Strict-Feedback Linear Systems with Input Delay
This paper develops a data-driven safe control framework for linear systems possessing a known strict-feedback structure, but with most plant parameters, external disturbances, and input delay being unknown. By leveraging Koopman operator theory, we utilize Krylov dynamic mode decomposition (DMD) to extract the system dynamics from measured data, enabling the reconstruction of the system and disturbance matrices. Concurrently, the batch least-squares identification (BaLSI) method is employed to identify other unknown parameters in the input channel. Using control barrier functions (CBFs) and backstepping, we first develop a full-state safe controller. Based on this, we build an output-feedback controller by performing system identification using only the output data and actuation signals as well as constructing an observer to estimate the unmeasured plant states. The proposed approach achieves: 1) finite-time identification of a substantial set of unknown system quantities, and 2) exponential convergence of the output state (the state furthest from the control input) to a reference trajectory while rigorously ensuring safety constraints. The effectiveness of the proposed method is demonstrated through a safe vehicle platooning application.
Utilizing the Perceived Age to Maximize Freshness in Query-Based Update Systems
Query-based sampling has become an increasingly popular technique for monitoring Markov sources in pull-based update systems. However, most of the contemporary literature on this assumes an exponential distribution for query delay and often relies on the assumption that the feedback or replies to the queries are instantaneous. In this work, we relax both of these assumptions and find optimal sampling policies for monitoring continuous-time Markov chains (CTMC) under generic delay distributions. In particular, we show that one can obtain significant gains in terms of mean binary freshness (MBF) by employing a waiting based strategy for query-based sampling.
Optimal Calibration of the endpoint-corrected Hilbert Transform
Accurate, low-latency estimates of the instantaneous phase of oscillations are essential for closed-loop sensing and actuation, including (but not limited to) phase-locked neurostimulation and other real-time applications. The endpoint-corrected Hilbert transform (ecHT) reduces boundary artefacts of the Hilbert transform by applying a causal narrow-band filter to the analytic spectrum. This improves the phase estimate at the most recent sample. Despite its widespread empirical use, the systematic endpoint distortions of ecHT have lacked a principled, closed-form analysis. In this study, we derive the ecHT endpoint operator analytically and demonstrate that its output can be decomposed into a desired positive-frequency term (a deterministic complex gain that induces a calibratable amplitude/phase bias) and a residual leakage term setting an irreducible variance floor. This yields (i) an explicit characterisation and bounds for endpoint phase/amplitude error, (ii) a mean-squared-error-optimal scalar calibration (c-ecHT), and (iii) practical design rules relating window length, bandwidth/order, and centre-frequency mismatch to residual bias via an endpoint group delay. The resulting calibrated ecHT achieves near-zero mean phase error and remains computationally compatible with real-time pipelines. Code and analyses are provided at https://github.com/eosmers/cecHT.
Where to Place a Heavy Payload on a Multirotor UAV for Best Control Performance
This paper studies the impact of rigidly attached heavy payload placement - where the payload mass significantly influences the UAV's dynamics - on the stability and control performance of a multirotor unmanned aerial vehicle (UAV). In particular, we focus on how the position of such a payload relative to the vehicle's Center of Gravity (CoG) affects the stability and control performance at an arbitrary point of interest on the UAV, such as the payload position, and on how this position can be optimized. Our conclusions are based on two key contributions. First, we analyze the stability of the zero-dynamics of a complete nonlinear model of the UAV with payload. We demonstrate that the stability of the zero dynamics depends on the vertical signed distance in the body-fixed frame between the controlled output position and the combined CoG of the UAV with payload. Specifically, positioning the output below the CoG yields unstable zero dynamics, while the linearized zero dynamics are marginally stable when placing it above, indicating reduced sensitivity to input disturbances. Second, we analyze the performance of the linearized UAV model with payload by providing an analytical expression for the H2-norm, from which we can quantify the system's attenuation to white noise input disturbances. We conclude that less control authority leads to a higher optimal position of the controlled output with respect to the CoG for closed-loop white-noise disturbance rejection capabilities, also when the heavy payload is the controlled output. The results are illustrated through numerical examples.
Small Models, Big Impact: Tool-Augmented AI Agents for Wireless Network Planning
Large Language Models (LLMs) such as ChatGPT promise revolutionary capabilities for Sixth-Generation (6G) wireless networks but their massive computational requirements and tendency to generate technically incorrect information create deployment barriers. In this work, we introduce MAINTAINED: autonomous artificial intelligence agent for wireless network deployment. Instead of encoding domain knowledge within model parameters, our approach orchestrates specialized computational tools for geographic analysis, signal propagation modeling, and network optimization. In a real-world case study, MAINTAINED outperforms state-of-the-art LLMs including ChatGPT-4o, Claude Sonnet 4, and DeepSeek-R1 by up to 100-fold in verified performance metrics while requiring less computational resources. This paradigm shift, moving from relying on parametric knowledge towards externalizing domain knowledge into verifiable computational tools, eliminates hallucination in technical specifications and enables edge-deployable Artificial Intelligence (AI) for wireless communications.
comment: 7 pages, 4 figures, 2 tables, accepted by IEEE Communications Magazine
Base Station Sleeping Strategy Based on Load Sharing in Ultra-Dense Networks
To address the issues of high operational costs and low energy efficiency (EE) caused by the dense deployment of small base stations (s-BSs) in 5G ultra-dense networks (UDNs), this paper first constructs a multi-objective mathematical optimization model targeting maximizing EE and minimizing the number of active BSs. The model incorporates key constraints including BS operational state, user equipment (UE)-BS connection relationship, and load threshold, laying a theoretical foundation for the coordinated optimization of energy conservation and quality of service. Based on this model, an integrated solution combining UE-BS initial connection optimization and load-sharing based BS sleeping is proposed. In the initial connection phase, with communication quality and BS load as dual constraints, efficient matching between UEs and optimal BSs is achieved through three sequential steps: communication feasibility screening, redundant connection removal, and overload load redistribution. This resolves the problems of load imbalance and difficult identification of redundant BSs in UDNs arising from unordered initial connections. In the BS sleeping phase, a BS sleeping index, comprehensively considering UE transferability and backup BS resources, is innovatively introduced to quantify BS dormancy priority. Through a closed-loop process involving low-load BS screening, adjacent BS load evaluation, and load sharing by two takeover BSs based on their capacity, accurate dormancy of redundant BSs and collaborative load migration are realized. Simulation results in a typical UDNs scenario demonstrate that, compared with the traditional baseline scheme, the proposed solution exhibits significant advantages in convergence speed, optimization of the number of active BSs, and EE improvement.
comment: 11 pages,6 figures
Integrated Sensing and Communication for Low-Altitude Security
The dense concentration of low-altitude, slow-speed, and small-size targets in the complex low-altitude environment poses significant security challenges, including failures in continuous wide-area sensing and ambiguous target intent, which existing regulatory frameworks struggle to address. Integrated sensing and communication (ISAC), a hallmark of next-generation mobile communication, offers a transformative approach to low-altitude security governance. By leveraging existing cellular infrastructure and spectrum resources, ISAC enables the construction of a seamless wide-area sensing network, supports intelligent feature extraction and intent inference, facilitates real-time collaborative decision-making, and establishes a dynamic trust authentication framework. This article systematically reviews the technical system, analyzes the security challenges, forecasts the enabling value of ISAC, and discusses the resulting open problems and challenges, thereby laying a foundation for future research and industrial implementation.
comment: 6 pages, 5 figures; This paper presents a forward-looking perspective on the application of ISAC technology in low-altitude security governance, addressing challenges posed by low-altitude, slow-speed, and small-size targets
Linear viscoelastic rheological FrBD models
In [1], a new modeling paradigm for developing rate-and-state-dependent, control-oriented friction models was introduced. The framework, termed Friction with Bristle Dynamics (FrBD), combines nonlinear analytical expressions for the friction coefficient with constitutive equations for bristle-like elements. Within the FrBD framework, this letter introduces two novel formulations based on the two most general linear viscoelastic models for solids: the Generalized Maxwell (GM) and Generalized Kelvin-Voigt (GKV) elements. Both are analyzed in terms of boundedness and passivity, revealing that these properties are satisfied for any physically meaningful parametrization. An application of passivity for control design is also illustrated, considering an example from robotics. The findings of this letter systematically integrate rate-and-state dynamic friction models with linear viscoelasticity.
comment: 6 pages, 3 figures. Under review at IEEE LCSS
A Blockchain-Oriented Software Engineering Architecture for Carbon Credit Certification Systems
Carbon credit systems have emerged as a policy tool to incentivize emission reductions and support the transition to clean energy. Reliable carbon-credit certification depends on mechanisms that connect actual, measured renewable-energy production to verifiable emission-reduction records. Although blockchain and IoT technologies have been applied to emission monitoring and trading, existing work offers limited support for certification processes, particularly for small and medium-scale renewable installations. This paper introduces a blockchain-based carbon-credit certification architecture, demonstrated through a 100 kWp photovoltaic case study, that integrates real-time IoT data collection, edge-level aggregation, and secure on-chain storage on a permissioned blockchain with smart contracts. Unlike approaches focused on trading mechanisms, the proposed system aligns with European legislation and voluntary carbon-market standards, clarifying the practical requirements and constraints that apply to photovoltaic operators. The resulting architecture provides a structured pathway for generating verifiable carbon-credit records and supporting third-party verification.
comment: 2026 IEEE International Conference on Software Analysis, Evolution and Reengineering - Companion (SANER-C) 9th International Workshop on Blockchain Oriented Software Engineering March 17-20, 2026 Limassol, Cyprus
Research on Adaptive Inertial Control in Synchronization Systems: Based on Variational Optimization Methods and Their Applications in the Stability of Complex Networks
Aiming at the core problem that it is difficult for a fixed inertia coefficient to balance transient disturbance suppression and long-term stability in complex network synchronization systems, an adaptive inertia control strategy based on variational optimization is proposed. Taking the Kuramoto model with inertia as the research carrier, the analytical expression of the time-varying inertia coefficient M(t) is strictly derived by the functional variational method, and a hierarchical control structure of "benchmark inertia + disturbance feedback" is constructed to achieve the organic unity of minimizing the vulnerability performance function H(T) and stability constraints. A multimodal decoupling control strategy based on Laplacian eigenvector projection is designed to enhance the feedback strength of the dominant mode by eigenvalue weighting, improving the control accuracy and dynamic response speed. Simulation verification is carried out in complex network systems, and the control performance of regular networks (RG), random networks (ER), small-world networks (SW), scale-free networks (SF) and spider webs (SP) under three typical disturbances of pulses, monotonic decays and oscillatory decays is systematically analyzed. The results show that the proposed strategy reduces H(T) of the five networks by 19%-25%, shortens the relaxation time by 15%-24%, and the real parts of all system eigenvalues are less than -0.25s^-1 , meeting the asymptotic stability criterion. This study provides a new theoretical framework and engineering implementation scheme for the stability control of complex network synchronization systems, which can be widely applied to fields such as power grids, communication networks, and neural networks.
comment: 36 pages, 4 figures
RIM Hand : A Robotic Hand with an Accurate Carpometacarpal Joint and Nitinol-Supported Skeletal Structure
This paper presents the flexible RIM Hand, a biomimetic robotic hand that precisely replicates the carpometacarpal (CMC) joints and employs superelastic Nitinol wires throughout its skeletal framework. By modeling the full carpal-to-metacarpal anatomy, the design enables realistic palm deformation through tendon-driven fingers while enhancing joint restoration and supports skeletal structure with Nitinol-based dorsal extensors. A flexible silicone skin further increases contact friction and contact area, enabling stable grasps for diverse objects. Experiments show that the palm can deform up to 28%, matching human hand flexibility, while achieving more than twice the payload capacity and three times the contact area compared to a rigid palm design. The RIM Hand thus offers improved dexterity, compliance, and anthropomorphism, making it promising for prosthetic and service-robot applications.
comment: Soft Robotics
Distributed Coverage Control on Poriferous Surface via Poly-Annulus Conformal Mapping
The inherent non-convexity of poriferous surfaces typically entraps agents in local minima and complicates workload distribution. To resolve this, we propose a distributed diffeomorphic coverage control framework for the multi-agent system (MAS) in such surfaces. First, we establish a distributed poly-annulus conformal mapping that transforms arbitrary poriferous surfaces into a multi-hole disk. Leveraging this topological equivalence, a collision-free sectorial partition mechanism is designed in the multi-hole disk, which rigorously induces strictly connected subregions and workload balance on the poriferous surfaces. This mechanism utilizes a buffer-based sequence mechanism to ensure strict topological safety when bypassing obstacles. Furthermore, a pull-back Riemannian metric is constructed to define the length metric that encodes safety constraints. Based on this metric, a distributed gradient-based control law is synthesized to drive agents toward optimal configurations, ensuring simultaneous obstacle avoidance and coverage optimization. Theoretical analyses guarantee the Input-to-State Stability (ISS) of the partition dynamics and the asymptotic convergence of the closed-loop system. Numerical simulations confirm the reachability and robustness of the proposed coverage algorithm, offering a scalable solution for distributed coverage in poriferous surfaces.
Resilient Hierarchical Power Control for Hybrid GFL/GFM Microgrids Under Mixed Cyber-Attacks and Physical Constraints
Hybrid microgrids integrating Grid-Following (GFL) and Grid-Forming (GFM) inverters present complex control challenges arising from the decoupling between long-term economic dispatch and real-time dynamic regulation, as well as the distinct physical limitations of heterogeneous inverters under cyber uncertainties. This paper proposes a Resilient Hierarchical Power Control (RHPC) strategy to unify these conflicting requirements within a cohesive framework. A standardized power increment mechanism is developed to bridge the tertiary and secondary layers, ensuring that real-time load fluctuations are compensated strictly according to the optimal economic ratios derived from the tertiary layer. To address the strict active power saturation constraints of GFL units, a dynamic activation scheme coupled with projection operators is introduced, which actively isolates saturated nodes from the consensus loop to prevent integrator wind-up and preserve the stability of the GFM backbone. Furthermore, the proposed framework incorporates a multi-scale attention mechanism and LSTM-based predictors into the secondary control protocol, endowing the system with robustness against unbounded False Data Injection (FDI) attacks and packet losses. Rigorous theoretical analysis confirms that the system achieves Uniformly Ultimately Bounded (UUB) convergence, and simulations on a modified IEEE 33-bus system demonstrate that the proposed strategy significantly improves power sharing accuracy and operational resilience in both grid-connected and islanded modes compared to conventional methods.
Outage Identification from Electricity Market Data: Quickest Change Detection Approach
Power system outages expose market participants to significant financial risk unless promptly detected and hedged. We develop an outage identification method from public market signals grounded in the parametric quickest change detection (QCD) theory. Parametric QCD operates on stochastic data streams, distinguishing pre- and post-change regimes using the ratio of their respective probability density functions. To derive the density functions for normal and post-outage market signals, we exploit multi-parametric programming to decompose complex market signals into parametric random variables with a known density. These densities are then used to construct a QCD-based statistic that triggers an alarm as soon as the statistic exceeds an appropriate threshold. Numerical experiments on a stylized PJM testbed demonstrate rapid line outage identification from public streams of electricity demand and price data.
comment: 7 pages, 2 figures, 1 table
Balancing Independent and Collaborative Service
We study a two-type server queueing system where flexible Type-I servers, upon their initial interaction with jobs, decide in real time whether to process them independently or in collaboration with dedicated Type-II servers. Independent processing begins immediately, as does collaborative service if a Type-II server is available. Otherwise, the job and its paired Type-I server wait in queue for collaboration. Type-I servers are non-preemptive and cannot engage with new jobs until their current job is completed. We provide a complete characterization of the structural properties of the optimal policy for the clearing system. In particular, an optimal control is shown to follow a threshold structure based on the number of jobs in the queue before a Type-I first interaction and on the number of jobs in either independent or collaborative service. We propose simple threshold heuristics, based on linear approximations, for real-time decision-making. In much of the parameter and state spaces, we establish theoretical bounds that compare the thresholds proposed by our heuristics to those of optimal policies and identify parameter configurations where these bounds are attained. Outside of these regions, the optimal thresholds are infinite. Numerical experiments further demonstrate the accuracy and robustness of our heuristics, particularly when the initial queue length is high. Our proposed heuristics achieve costs within 0.5% of the optimal policy on average and significantly outperform benchmark policies that exhibit extreme sensitivity to system parameters, sometimes incurring costs exceeding 100% of the optimal.
Control policies for a two-stage queueing system with parallel and single server options
We study a two-stage tandem service queue attended by two servers. Each job-server pair must complete both service phases together, with the server unable to begin a new job until the current one is fully processed after two stages. Immediately after the first phase of service, the server decides whether to send the job/customer to a downstream station that allows parallel processing or to a single-service facility that offers faster or higher-quality service but handles only one job at a time. This choice determines whether the second phase commences immediately or (potentially) after waiting in a queue for the single-service facility to become available. The decision-making scenario is modeled via a Markov decision process formulation, of a clearing system with holding costs at each station. We fully characterize the structural properties of an optimal control policy based on the relationship between the service rates at the downstream stations. A numerical study highlights the significance of optimal control by comparing its performance against several natural heuristic policies.
All-Pass Fractional OPF: A Solver-Friendly, Physics-Preserving Approximation of AC OPF
This paper presents a fractional approximation of the AC optimal power flow (AC OPF) problem based on an all-pass approximation of the exponential power flow kernel. The classical AC OPF relies on trigonometric coupling between bus voltage phasors, which yields a nonconvex program with oscillatory derivatives that can slow, or in some cases destabilize, interior-point methods. We replace the trigonometric terms with an all-pass fractional (APF) approximation whose real and imaginary components act as smooth surrogates for the cosine and sine functions, and we introduce a pre-rotation to shift the argument of the approximation toward its most accurate region, ensuring that the reformulated power flow model preserves physical loss behavior, maintains the symmetry of the classical kernels, and improves the conditioning of the Jacobian and Hessian matrices. The proposed APF OPF formulation remains nonconvex, as in the classical model, but it eliminates trigonometric evaluations and empirically produces larger and more stable Newton steps under standard interior-point solvers. Numerical results on more than 25 IEEE and PGLib test systems ranging from 9 to 10{,}000 buses demonstrate that the APF OPF model achieves solutions with accuracy comparable to that of the classical formulation while reducing solver times, indicating a more solver-friendly nonconvex representation of AC OPF. All code, functions, verification scripts, and generated results are publicly available on \href{https://github.com/LSU-RAISE-LAB/APF-OPF}{GitHub}, along with a README describing how to run and reproduce the experiments.
Event-Triggered Newton Extremum Seeking for Multivariable Optimization
This paper presents a static event-triggered control strategy for multivariable Newton-based extremum seeking. The proposed method integrates event-triggered actuation into the Newton-based optimization framework to reduce control updates while maintaining rapid convergence to the extremum. Unlike traditional gradient-based extremum seeking, where the convergence rate depends on the unknown Hessian of the cost function, the proposed approach employs a dynamic estimator of the Hessian inverse, formulated as a Riccati equation, enabling user-assignable convergence rates. The event-triggering mechanism is designed to minimize unnecessary actuation updates while preserving stability and performance. Using averaging theory, we establish local stability results and exponential convergence to a neighborhood of the unknown extremum point. Additionally, numerical simulations illustrate the benefits of the proposed approach over gradient-based and continuously actuated Newton-based extremum seeking, showing improved convergence rates and reduced control update frequency, leading to more efficient implementation in real-time optimization scenarios.
$π$MPC: A Parallel-in-horizon and Construction-free NMPC Solver
The alternating direction method of multipliers (ADMM) has gained increasing popularity in embedded model predictive control (MPC) due to its code simplicity and pain-free parameter selection. However, existing ADMM solvers either target general quadratic programming (QP) problems or exploit sparse MPC formulations via Riccati recursions, which are inherently sequential and therefore difficult to parallelize for long prediction horizons. This technical note proposes a novel \textit{parallel-in-horizon} and \textit{construction-free} nonlinear MPC algorithm, termed $π$MPC, which combines a new variable-splitting scheme with a velocity-based system representation in the ADMM framework, enabling horizon-wise parallel execution while operating directly on system matrices without explicit MPC-to-QP construction. Numerical experiments and accompanying code are provided to validate the effectiveness of the proposed method.
comment: 8 pages
Rethinking On-Device LLM Reasoning: Why Analogical Mapping Outperforms Abstract Thinking for IoT DDoS Detection
The rapid expansion of IoT deployments has intensified cybersecurity threats, notably Distributed Denial of Service (DDoS) attacks, characterized by increasingly sophisticated patterns. Leveraging Generative AI through On-Device Large Language Models (ODLLMs) provides a viable solution for real-time threat detection at the network edge, though limited computational resources present challenges for smaller ODLLMs. This paper introduces a novel detection framework that integrates Chain-of-Thought (CoT) reasoning with Retrieval-Augmented Generation (RAG), tailored specifically for IoT edge environments. We systematically evaluate compact ODLLMs, including LLaMA 3.2 (1B, 3B) and Gemma 3 (1B, 4B), using structured prompting and exemplar-driven reasoning strategies. Experimental results demonstrate substantial performance improvements with few-shot prompting, achieving macro-average F1 scores as high as 0.85. Our findings highlight the significant advantages of incorporating exemplar-based reasoning, underscoring that CoT and RAG approaches markedly enhance small ODLLMs' capabilities in accurately classifying complex network attacks under stringent resource constraints.
VCSEL-based CPO for Scale-Up in A.I. Datacenter. Status and Perspectives
The drastic increase of bandwidth demands in AI datacenters requires new solutions with low power consumption and high bandwidth densities. Further increasing the total bandwidth with copper interconnects is challenging, thus it is crucial to introduce optics into the scale-up network. For this purpose, the most important metrics are the energy efficiency of the total link in addition to the spatial bandwidth-density and reach. With the inefficiency of copper traces at high speeds, transitioning to co-packaged optics is no longer optional but essential. Here, we show that the mature VCSEL technology offers the ideal combination of low-cost, low-latency, high-reliability, and energy efficiency at all bitrates, thanks to their unique versatility and high wall-plug-efficiency. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
comment: 8 pages, 6 figures and tables
Tube-Based Robust Control Strategy for Vision-Guided Autonomous Vehicles
A robust control strategy for autonomous vehicles can improve system stability, enhance riding comfort, and prevent driving accidents. This paper presents a novel interpolation-tube-based constrained iterative linear quadratic regulator (itube-CILQR) algorithm for autonomous computer-vision-based vehicle lane-keeping. The goal of the algorithm is to enhance robustness during high-speed cornering on tight turns. Compared with standard tube-based approaches, the proposed itube-CILQR algorithm reduces system conservatism and exhibits higher computational speed. Numerical simulations and vision-based experiments were conducted to examine the feasibility of using the proposed algorithm for controlling autonomous vehicles. The results indicated that the proposed algorithm achieved superior vehicle lane-keeping performance to variational CILQR-based methods and model predictive control (MPC) approaches involving the use of a classical interior-point optimizer. Specifically, itube-CILQR required an average runtime of 3.45 ms to generate a control signal for guiding a self-driving vehicle. By comparison, itube-MPC typically required a 4.32 times longer computation time to complete the same task. Moreover, the influence of conservatism on system behavior was investigated by exploring the variations in the interpolation variables derived using the proposed itube-CILQR algorithm during lane-keeping maneuvers.
comment: 15 pages, 16 figures
Conditioning Aircraft Trajectory Prediction on Meteorological Data with a Physics-Informed Machine Learning Approach
Accurate aircraft trajectory prediction (TP) in air traffic management systems is confounded by a number of epistemic uncertainties, dominated by uncertain meteorological conditions and operator specific procedures. Handling this uncertainty necessitates the use of probabilistic, machine learned models for generating trajectories. However, the trustworthiness of such models is limited if generated trajectories are not physically plausible. For this reason we propose a physics-informed approach in which aircraft thrust and airspeed are learned from data and are used to condition the existing Base of Aircraft Data (BADA) model, which is physics-based and enforces energy-based constraints on generated trajectories. A set of informative features are identified and used to condition a probabilistic model of aircraft thrust and airspeed, with the proposed scheme demonstrating a 20% improvement in skilfulness across a set of six metrics, compared against a baseline probabilistic model that ignores contextual information such as meteorological conditions.
comment: Accepted to AIAA SciTech 2026 Forum
Current and temperature imbalances in parallel-connected grid storage battery modules
A key challenge with large battery systems is heterogeneous currents and temperatures in modules with parallel-connected cells. Although extreme currents and temperatures are detrimental to the performance and lifetime of battery cells, there is not a consensus on the scale of typical imbalances within grid storage modules. Here, we quantify these imbalances through simulations and experiments on an industrially representative grid storage battery module consisting of prismatic lithium iron phosphate cells, elucidating the evolution of current and temperature imbalances and their dependence on individual cell and module parameter variations. Using a sensitivity analysis, we find that varying contact resistances and cell resistances contribute strongly to temperature differences between cells, from which we define safety thresholds on cell-to-cell variability. Finally, we investigate how these thresholds change for different applications, to outline a set of robustness metrics that show how cycling at lower C-rates and narrower SOC ranges can mitigate failures.
Blockage-Aware Multi-RIS WSR Maximization via Per-RIS Indexed Synchronization Sequences and Closed-Form Riemannian Updates
Millimeter-wave (mmWave) multi-user MIMO systems are highly vulnerable to blockage, and reconfigurable intelligent surfaces (RIS) have been proposed as a remedy. However, RIS links may themselves be blocked, while most prior works assume ideal RIS availability. We propose an end-to-end blockage-aware multi-RIS weighted sum-rate (WSR) optimization framework. The BS transmits short per-RIS indexed synchronization signals, enabling each user to identify blocked panels through a simple energy detection test. Based on the detected feasible sets, we jointly optimize the BS precoder and RIS phases via a Closed-form Riemannian Phase Alignment (CRPA) algorithm. CRPA provides unit-modulus-preserving closed-form updates, requiring no projection or line search, and ensures monotone ascent. Simulations validate reliable blockage detection and notable WSR and convergence gains over existing baselines.
Distribution Steering for Discrete-Time Uncertain Ensemble Systems
Ensemble systems appear frequently in many engineering applications and, as a result, they have become an important research topic in control theory. These systems are best characterized by the evolution of their underlying state distribution. Despite the work to date, few results exist dealing with the problem of directly modifying (i.e., ``steering'') the distribution of an ensemble system. In addition, in most existing results, the distribution of the states of an ensemble of discrete-time systems is assumed to be Gaussian. However, in case the system parameters are uncertain, it is not always realistic to assume that the distribution of the system follows a Gaussian distribution, thus complicating the solution of the overall problem. In this paper, we address the general distribution steering problem for first-order discrete-time ensemble systems, where the distributions of the system parameters and the states are arbitrary with finite first few moments. Linear system dynamics are considered using the method of power moments to transform the original infinite-dimensional problem into a finite-dimensional one. We also propose a control law for the ensuing moment system, which allows us to obtain the power moments of the desired control inputs. Finally, we solve the inverse problem to obtain the feasible control inputs from their corresponding power moments. We provide a numerical example to validate our theoretical developments.
comment: 8 pages, 6 figures
Computationally Efficient Signal Detection with Unknown Bandwidths
Signal detection in environments with unknown signal bandwidth and time intervals is a fundamental problem in adversarial and spectrum-sharing scenarios. This paper addresses the problem of detecting signals occupying unknown degrees of freedom from non-coherent power measurements, where the signal is constrained to an interval in one dimension or a hyper-cube in multiple dimensions. A GLRT is derived, resulting in a straightforward metric involving normalized average signal energy for each candidate signal set. We present bounds on false alarm and missed detection probabilities, demonstrating their dependence on SNR and signal set sizes. To overcome the inherent computational complexity of exhaustive searches, we propose a computationally efficient binary search method, reducing the complexity from O(N^2) to O(N) for one-dimensional cases. Simulations indicate that the method maintains performance near exhaustive searches and achieves asymptotic consistency, with interval-of-overlap converging to one under constant SNR as measurement size increases. The simulation studies also demonstrate superior performance and reduced complexity compared to contemporary neural network-based approaches, specifically outperforming custom-trained U-Net models in spectrum detection tasks.
comment: Published in the IEEE Open Journal of the Communications Society
Co-Buchi Barrier Certificates for Discrete-time Dynamical Systems
Barrier certificates provide functional overapproximations for the reachable set of dynamical systems and provide inductive guarantees on the safe evolution of the system. In automata-theoretic verification, a key query is to determine whether the system visits a given predicate over the states finitely often, typically resulting from the complement of the traditional Buchi acceptance condition. This paper proposes a barrier certificate approach to answer such queries by developing a notion of co-Buchi barrier certificates (CBBCs) that generalize classic barrier certificates to ensure that the traces of a system visit a given predicate a fixed number of times. Our notion of CBBC is inspired from bounded synthesis paradigm to LTL realizability, where the LTL specifications are converted to safety automata via universal co-Buchi automata with a bound on final state visitations provided as a hyperparameter. Our application of CBBCs in verification is analogous: we fix a bound and search for a suitable barrier certificate, increasing the bound if no suitable function can be found. We then use these CBBCs to verify our system against properties specified by co-Buchi automata and demonstrate their effectiveness via some case studies.
Robotics
Event-based Heterogeneous Information Processing for Online Vision-based Obstacle Detection and Localization
This paper introduces a novel framework for robotic vision-based navigation that integrates Hybrid Neural Networks (HNNs) with Spiking Neural Network (SNN)-based filtering to enhance situational awareness for unmodeled obstacle detection and localization. By leveraging the complementary strengths of Artificial Neural Networks (ANNs) and SNNs, the system achieves both accurate environmental understanding and fast, energy-efficient processing. The proposed architecture employs a dual-pathway approach: an ANN component processes static spatial features at low frequency, while an SNN component handles dynamic, event-based sensor data in real time. Unlike conventional hybrid architectures that rely on domain conversion mechanisms, our system incorporates a pre-developed SNN-based filter that directly utilizes spike-encoded inputs for localization and state estimation. Detected anomalies are validated using contextual information from the ANN pathway and continuously tracked to support anticipatory navigation strategies. Simulation results demonstrate that the proposed method offers acceptable detection accuracy while maintaining computational efficiency close to SNN-only implementations, which operate at a fraction of the resource cost. This framework represents a significant advancement in neuromorphic navigation systems for robots operating in unpredictable and dynamic environments.
Robustness and Resilience Evaluation of Eco-Driving Strategies at Signalized Intersections
Eco-driving strategies have demonstrated substantial potential for improving energy efficiency and reducing emissions, especially at signalized intersections. However, evaluations of eco-driving methods typically rely on simplified simulation or experimental conditions, where certain assumptions are made to manage complexity and experimental control. This study introduces a unified framework to evaluate eco-driving strategies through the lens of two complementary criteria: control robustness and environmental resilience. We define formal indicators that quantify performance degradation caused by internal execution variability and external environmental disturbances, respectively. These indicators are then applied to assess multiple eco-driving controllers through real-world vehicle experiments. The results reveal key tradeoffs between tracking accuracy and adaptability, showing that optimization-based controllers offer more consistent performance across varying disturbance levels, while analytical controllers may perform comparably under nominal conditions but exhibit greater sensitivity to execution and timing variability.
CLEAR: A Semantic-Geometric Terrain Abstraction for Large-Scale Unstructured Environments
Long-horizon navigation in unstructured environments demands terrain abstractions that scale to tens of km$^2$ while preserving semantic and geometric structure, a combination existing methods fail to achieve. Grids scale poorly; quadtrees misalign with terrain boundaries; neither encodes landcover semantics essential for traversability-aware planning. This yields infeasible or unreliable paths for autonomous ground vehicles operating over 10+ km$^2$ under real-time constraints. CLEAR (Connected Landcover Elevation Abstract Representation) couples boundary-aware spatial decomposition with recursive plane fitting to produce convex, semantically aligned regions encoded as a terrain-aware graph. Evaluated on maps spanning 9-100~km$^2$ using a physics-based simulator, CLEAR achieves up to 10x faster planning than raw grids with only 6.7% cost overhead and delivers 6-9% shorter, more reliable paths than other abstraction baselines. These results highlight CLEAR's scalability and utility for long-range navigation in applications such as disaster response, defense, and planetary exploration.
comment: Under review for an IEEE conference
Towards Natural Language Environment: Understanding Seamless Natural-Language-Based Human-Multi-Robot Interactions
As multiple robots are expected to coexist in future households, natural language is increasingly envisioned as a primary medium for human-robot and robot-robot communication. This paper introduces the concept of a Natural Language Environment (NLE), defined as an interaction space in which humans and multiple heterogeneous robots coordinate primarily through natural language. Rather than proposing a deployable system, this work aims to explore the design space of such environments. We first synthesize prior work on language-based human-robot interaction to derive a preliminary design space for NLEs. We then conduct a role-playing study in virtual reality to investigate how people conceptualize, negotiate, and coordinate human-multi-robot interactions within this imagined environment. Based on qualitative and quantitative analysis, we refine the preliminary design space and derive design implications that highlight key tensions and opportunities around task coordination dominance, robot autonomy, and robot personality in Natural Language Environments.
Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints
Autonomous navigation for nano-scale unmanned aerial vehicles (nano-UAVs) is governed by extreme Size, Weight, and Power (SWaP) constraints (with the weight < 50 g and sub-100 mW onboard processor), distinguishing it fundamentally from standard robotic paradigms. This review synthesizes the state-of-the-art in sensing, computing, and control architectures designed specifically for these sub- 100mW computational envelopes. We critically analyse the transition from classical geometry-based methods to emerging "Edge AI" paradigms, including quantized deep neural networks deployed on ultra-low-power System-on-Chips (SoCs) and neuromorphic event-based control. Beyond algorithms, we evaluate the hardware-software co-design requisite for autonomy, covering advancements in dense optical flow, optimized Simultaneous Localization and Mapping (SLAM), and learning-based flight control. While significant progress has been observed in visual navigation and relative pose estimation, our analysis reveals persistent gaps in long-term endurance, robust obstacle avoidance in dynamic environments, and the "Sim-to-Real" transfer of reinforcement learning policies. This survey provides a roadmap for bridging these gaps, advocating for hybrid architectures that fuse lightweight classical control with data-driven perception to enable fully autonomous, agile nano-UAVs in GPS-denied environments.
comment: 28 pages, 5 figures, 1 table. Review article
Diffusion-based Inverse Model of a Distributed Tactile Sensor for Object Pose Estimation
Tactile sensing provides a promising sensing modality for object pose estimation in manipulation settings where visual information is limited due to occlusion or environmental effects. However, efficiently leveraging tactile data for estimation remains a challenge due to partial observability, with single observations corresponding to multiple possible contact configurations. This limits conventional estimation approaches largely tailored to vision. We propose to address these challenges by learning an inverse tactile sensor model using denoising diffusion. The model is conditioned on tactile observations from a distributed tactile sensor and trained in simulation using a geometric sensor model based on signed distance fields. Contact constraints are enforced during inference through single-step projection using distance and gradient information from the signed distance field. For online pose estimation, we integrate the inverse model with a particle filter through a proposal scheme that combines generated hypotheses with particles from the prior belief. Our approach is validated in simulated and real-world planar pose estimation settings, without access to visual data or tight initial pose priors. We further evaluate robustness to unmodeled contact and sensor dynamics for pose tracking in a box-pushing scenario. Compared to local sampling baselines, the inverse sensor model improves sampling efficiency and estimation accuracy while preserving multimodal beliefs across objects with varying tactile discriminability.
MATTERIX: toward a digital twin for robotics-assisted chemistry laboratory automation
Accelerated materials discovery is critical for addressing global challenges. However, developing new laboratory workflows relies heavily on real-world experimental trials, and this can hinder scalability because of the need for numerous physical make-and-test iterations. Here we present MATTERIX, a multiscale, graphics processing unit-accelerated robotic simulation framework designed to create high-fidelity digital twins of chemistry laboratories, thus accelerating workflow development. This multiscale digital twin simulates robotic physical manipulation, powder and liquid dynamics, device functionalities, heat transfer and basic chemical reaction kinetics. This is enabled by integrating realistic physics simulation and photorealistic rendering with a modular graphics processing unit-accelerated semantics engine, which models logical states and continuous behaviors to simulate chemistry workflows across different levels of abstraction. MATTERIX streamlines the creation of digital twin environments through open-source asset libraries and interfaces, while enabling flexible workflow design via hierarchical plan definition and a modular skill library that incorporates learning-based methods. Our approach demonstrates sim-to-real transfer in robotic chemistry setups, reducing reliance on costly real-world experiments and enabling the testing of hypothetical automated workflows in silico. The project website is available at https://accelerationconsortium.github.io/Matterix/ .
comment: Darvish, K., Sohal, A., Mandal, A. et al. MATTERIX: toward a digital twin for robotics-assisted chemistry laboratory automation. Nat Comput Sci (2025)
Active Informative Planning for UAV-based Weed Mapping using Discrete Gaussian Process Representations
Accurate agricultural weed mapping using unmanned aerial vehicles (UAVs) is crucial for precision farming. While traditional methods rely on rigid, pre-defined flight paths and intensive offline processing, informative path planning (IPP) offers a way to collect data adaptively where it is most needed. Gaussian process (GP) mapping provides a continuous model of weed distribution with built-in uncertainty. However, GPs must be discretised for practical use in autonomous planning. Many discretisation techniques exist, but the impact of discrete representation choice remains poorly understood. This paper investigates how different discrete GP representations influence both mapping quality and mission-level performance in UAV-based weed mapping. Considering a UAV equipped with a downward-facing camera, we implement a receding-horizon IPP strategy that selects sampling locations based on the map uncertainty, travel cost, and coverage penalties. We investigate multiple discretisation strategies for representing the GP posterior and use their induced map partitions to generate candidate viewpoints for planning. Experiments on real-world weed distributions show that representation choice significantly affects exploration behaviour and efficiency. Overall, our results demonstrate that discretisation is not only a representational detail but a key design choice that shapes planning dynamics, coverage efficiency, and computational load in online UAV weed mapping.
Helical Tendon-Driven Continuum Robot with Programmable Follow-the-Leader Operation
Spinal cord stimulation (SCS) is primarily utilized for pain management and has recently demonstrated efficacy in promoting functional recovery in patients with spinal cord injury. Effective stimulation of motor neurons ideally requires the placement of SCS leads in the ventral or lateral epidural space where the corticospinal and rubrospinal motor fibers are located. This poses significant challenges with the current standard of manual steering. In this study, we present a static modeling approach for the ExoNav, a steerable robotic tool designed to facilitate precise navigation to the ventral and lateral epidural space. Cosserat rod framework is employed to establish the relationship between tendon actuation forces and the robot's overall shape. The effects of gravity, as an example of an external load, are investigated and implemented in the model and simulation. The experimental results indicate RMSE values of 1.76mm, 2.33mm, 2.18mm, and 1.33mm across four tested prototypes. Based on the helical shape of the ExoNav upon actuation, it is capable of performing follow-the-leader (FTL) motion by adding insertion and rotation DoFs to this robotic system, which is shown in simulation and experimentally. The proposed simulation has the capability to calculate optimum tendon tensions to follow the desired FTL paths while gravity-induced robot deformations are present. Three FTL experimental trials are conducted and the end-effector position showed repeatable alignments with the desired path with maximum RMSE value of 3.75mm. Ultimately, a phantom model demonstration is conducted where the teleoperated robot successfully navigated to the lateral and ventral spinal cord targets. Additionally, the user was able to navigate to the dorsal root ganglia, illustrating ExoNav's potential in both motor function recovery and pain management.
comment: 8 pages, 9 figures
LLM-VLM Fusion Framework for Autonomous Maritime Port Inspection using a Heterogeneous UAV-USV System
Maritime port inspection plays a critical role in ensuring safety, regulatory compliance, and operational efficiency in complex maritime environments. However, existing inspection methods often rely on manual operations and conventional computer vision techniques that lack scalability and contextual understanding. This study introduces a novel integrated engineering framework that utilizes the synergy between Large Language Models (LLMs) and Vision Language Models (VLMs) to enable autonomous maritime port inspection using cooperative aerial and surface robotic platforms. The proposed framework replaces traditional state-machine mission planners with LLM-driven symbolic planning and improved perception pipelines through VLM-based semantic inspection, enabling context-aware and adaptive monitoring. The LLM module translates natural language mission instructions into executable symbolic plans with dependency graphs that encode operational constraints and ensure safe UAV-USV coordination. Meanwhile, the VLM module performs real-time semantic inspection and compliance assessment, generating structured reports with contextual reasoning. The framework was validated using the extended MBZIRC Maritime Simulator with realistic port infrastructure and further assessed through real-world robotic inspection trials. The lightweight on-board design ensures suitability for resource-constrained maritime platforms, advancing the development of intelligent, autonomous inspection systems. Project resources (code and videos) can be found here: https://github.com/Muhayyuddin/llm-vlm-fusion-port-inspection
comment: submitted in AEJ
Exploiting Light To Enhance The Endurance and Navigation of Lighter-Than-Air Micro-Drones
Micro-Unmanned Aerial Vehicles (UAVs) are rapidly expanding into tasks from inventory to environmental sensing, yet their short endurance and unreliable navigation in GPS-denied spaces limit deployment. Lighter-Than-Air (LTA) drones offer an energy-efficient alternative: they use a helium envelope to provide buoyancy, which enables near-zero-power drain during hovering and much longer operation. LTAs are promising, but their design is complex, and they lack integrated solutions to enable sustained autonomous operations and navigation with simple, low-infrastructure. We propose a compact, self-sustaining LTA drone that uses light for both energy harvesting and navigation. Our contributions are threefold: (i) a high-fidelity simulation framework to analyze LTA aerodynamics and select a stable, efficient configuration; (ii) a framework to integrate solar cells on the envelope to provide net-positive energy; and (iii) a point-and-go navigation system with three light-seeking algorithms operating on a single light beacon. Our LTA-analysis, together with the integrated solar panels, not only saves energy while flying, but also enables sustainable operation: providing 1 minute of flying time for every 4 minutes of energy harvesting, under illuminations of 80klux. We also demonstrate robust single-beacon navigation towards a light source that can be up to 7m away, in indoor and outdoor environments, even with moderate winds. The resulting system indicates a plausible path toward persistent, autonomous operation for indoor and outdoor monitoring. More broadly, this work provides a practical pathway for translating the promise of LTA drones into a persistent, self-sustaining aerial system.
Static Is Not Enough: A Comparative Study of VR and SpaceMouse in Static and Dynamic Teleoperation Tasks
Imitation learning relies on high-quality demonstrations, and teleoperation is a primary way to collect them, making teleoperation interface choice crucial for the data. Prior work mainly focused on static tasks, i.e., discrete, segmented motions, yet demonstrations also include dynamic tasks requiring reactive control. As dynamic tasks impose fundamentally different interface demands, insights from static-task evaluations cannot generalize. To address this gap, we conduct a within-subjects study comparing a VR controller and a SpaceMouse across two static and two dynamic tasks ($N=25$). We assess success rate, task duration, cumulative success, alongside NASA-TLX, SUS, and open-ended feedback. Results show statistically significant advantages for VR: higher success rates, particularly on dynamic tasks, shorter successful execution times across tasks, and earlier successes across attempts, with significantly lower workload and higher usability. As existing VR teleoperation systems are rarely open-source or suited for dynamic tasks, we release our VR interface to fill this gap.
comment: 5 pages, 5 figures. Accepted in HRI'26 (Late-Breaking Reports track) in 12 Jan, 2026
Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
We introduce Being-H0.5, a foundational Vision-Language-Action (VLA) model designed for robust cross-embodiment generalization across diverse robotic platforms. While existing VLAs often struggle with morphological heterogeneity and data scarcity, we propose a human-centric learning paradigm that treats human interaction traces as a universal "mother tongue" for physical interaction. To support this, we present UniHand-2.0, the largest embodied pre-training recipe to date, comprising over 35,000 hours of multimodal data across 30 distinct robotic embodiments. Our approach introduces a Unified Action Space that maps heterogeneous robot controls into semantically aligned slots, enabling low-resource robots to bootstrap skills from human data and high-resource platforms. Built upon this human-centric foundation, we design a unified sequential modeling and multi-task pre-training paradigm to bridge human demonstrations and robotic execution. Architecturally, Being-H0.5 utilizes a Mixture-of-Transformers design featuring a novel Mixture-of-Flow (MoF) framework to decouple shared motor primitives from specialized embodiment-specific experts. Finally, to make cross-embodiment policies stable in the real world, we introduce Manifold-Preserving Gating for robustness under sensory shift and Universal Async Chunking to universalize chunked control across embodiments with different latency and control profiles. We empirically demonstrate that Being-H0.5 achieves state-of-the-art results on simulated benchmarks, such as LIBERO (98.9%) and RoboCasa (53.9%), while also exhibiting strong cross-embodiment capabilities on five robotic platforms.
comment: 44 pages
Imitation learning-based spacecraft rendezvous and docking method with Expert Demonstration
Existing spacecraft rendezvous and docking control methods largely rely on predefined dynamic models and often exhibit limited robustness in realistic on-orbit environments. To address this issue, this paper proposes an Imitation Learning-based spacecraft rendezvous and docking control framework (IL-SRD) that directly learns control policies from expert demonstrations, thereby reducing dependence on accurate modeling. We propose an anchored decoder target mechanism, which conditions the decoder queries on state-related anchors to explicitly constrain the control generation process. This mechanism enforces physically consistent control evolution and effectively suppresses implausible action deviations in sequential prediction, enabling reliable six-degree-of-freedom (6-DOF) rendezvous and docking control. To further enhance stability, a temporal aggregation mechanism is incorporated to mitigate error accumulation caused by the sequential prediction nature of Transformer-based models, where small inaccuracies at each time step can propagate and amplify over long horizons. Extensive simulation results demonstrate that the proposed IL-SRD framework achieves accurate and energy-efficient model-free rendezvous and docking control. Robustness evaluations further confirm its capability to maintain competitive performance under significant unknown disturbances. The source code is available at https://github.com/Dongzhou-1996/IL-SRD.
comment: 6 figures, 4 tables. Focus on 6-DOF spacecraft rendezvous and docking control using imitation learning-based control method
Active Inference-Driven World Modeling for Adaptive UAV Swarm Trajectory Design ICASSP 2026
This paper proposes an Active Inference-based framework for autonomous trajectory design in UAV swarms. The method integrates probabilistic reasoning and self-learning to enable distributed mission allocation, route ordering, and motion planning. Expert trajectories generated using a Genetic Algorithm with Repulsion Forces (GA-RF) are employed to train a hierarchical World Model capturing swarm behavior across mission, route, and motion levels. During online operation, UAVs infer actions by minimizing divergence between current beliefs and model-predicted states, enabling adaptive responses to dynamic environments. Simulation results show faster convergence, higher stability, and safer navigation than Q-Learning, demonstrating the scalability and cognitive grounding of the proposed framework for intelligent UAV swarm control.
comment: This paper has been accepted for presentation at the 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE ICASSP 2026) Workshop: 'Multi-Modal Signal Processing and AI for Communications and Sensing in 6G and Beyond (MuSiC-6GB)'
ForeDiffusion: Foresight-Conditioned Diffusion Policy via Future View Construction for Robot Manipulation
Diffusion strategies have advanced visual motor control by progressively denoising high-dimensional action sequences, providing a promising method for robot manipulation. However, as task complexity increases, the success rate of existing baseline models decreases considerably. Analysis indicates that current diffusion strategies are confronted with two limitations. First, these strategies only rely on short-term observations as conditions. Second, the training objective remains limited to a single denoising loss, which leads to error accumulation and causes grasping deviations. To address these limitations, this paper proposes Foresight-Conditioned Diffusion (ForeDiffusion), by injecting the predicted future view representation into the diffusion process. As a result, the policy is guided to be forward-looking, enabling it to correct trajectory deviations. Following this design, ForeDiffusion employs a dual loss mechanism, combining the traditional denoising loss and the consistency loss of future observations, to achieve the unified optimization. Extensive evaluation on the Adroit suite and the MetaWorld benchmark demonstrates that ForeDiffusion achieves an average success rate of 80% for the overall task, significantly outperforming the existing mainstream diffusion methods by 23% in complex tasks, while maintaining more stable performance across the entire tasks.
Dynamic Hand Gesture Recognition for Robot Manipulator Tasks
This paper proposes a novel approach to recognizing dynamic hand gestures facilitating seamless interaction between humans and robots. Here, each robot manipulator task is assigned a specific gesture. There may be several such tasks, hence, several gestures. These gestures may be prone to several dynamic variations. All such variations for different gestures shown to the robot are accurately recognized in real-time using the proposed unsupervised model based on the Gaussian Mixture model. The accuracy during training and real-time testing prove the efficacy of this methodology.
PlannerRFT: Reinforcing Diffusion Planners through Closed-Loop and Sample-Efficient Fine-Tuning
Diffusion-based planners have emerged as a promising approach for human-like trajectory generation in autonomous driving. Recent works incorporate reinforcement fine-tuning to enhance the robustness of diffusion planners through reward-oriented optimization in a generation-evaluation loop. However, they struggle to generate multi-modal, scenario-adaptive trajectories, hindering the exploitation efficiency of informative rewards during fine-tuning. To resolve this, we propose PlannerRFT, a sample-efficient reinforcement fine-tuning framework for diffusion-based planners. PlannerRFT adopts a dual-branch optimization that simultaneously refines the trajectory distribution and adaptively guides the denoising process toward more promising exploration, without altering the original inference pipeline. To support parallel learning at scale, we develop nuMax, an optimized simulator that achieves 10 times faster rollout compared to native nuPlan. Extensive experiments shows that PlannerRFT yields state-of-the-art performance with distinct behaviors emerging during the learning process.
Sparse ActionGen: Accelerating Diffusion Policy with Real-time Pruning
Diffusion Policy has dominated action generation due to its strong capabilities for modeling multi-modal action distributions, but its multi-step denoising processes make it impractical for real-time visuomotor control. Existing caching-based acceleration methods typically rely on $\textit{static}$ schedules that fail to adapt to the $\textit{dynamics}$ of robot-environment interactions, thereby leading to suboptimal performance. In this paper, we propose $\underline{\textbf{S}}$parse $\underline{\textbf{A}}$ction$\underline{\textbf{G}}$en ($\textbf{SAG}$) for extremely sparse action generation. To accommodate the iterative interactions, SAG customizes a rollout-adaptive prune-then-reuse mechanism that first identifies prunable computations globally and then reuses cached activations to substitute them during action diffusion. To capture the rollout dynamics, SAG parameterizes an observation-conditioned diffusion pruner for environment-aware adaptation and instantiates it with a highly parameter- and inference-efficient design for real-time prediction. Furthermore, SAG introduces a one-for-all reusing strategy that reuses activations across both timesteps and blocks in a zig-zag manner, minimizing the global redundancy. Extensive experiments on multiple robotic benchmarks demonstrate that SAG achieves up to 4$\times$ generation speedup without sacrificing performance. Project Page: https://sparse-actiongen.github.io/.
From Design to Deorbit: A Solar-Electric Autonomous Module for Multi-Debris Remediation
The escalating accumulation of orbital debris threatens the sustainability of space operations, necessitating active removal solutions that overcome the limitations of current fuel-dependent methods. To address this, this study introduces a novel remediation architecture that integrates a mechanical clamping system for secure capture with a high-efficiency, solar-powered NASA Evolutionary Xenon Thruster (NEXT) and autonomous navigation protocols. High-fidelity simulations validate the architecture's capabilities, demonstrating a successful retrograde deorbit from 800 km to 100 km, <10m position Root Mean Square Errors (RMSE) via radar-based Extended Kalman Filter (EKF) navigation, and a 93\% data delivery efficiency within 1 second using Delay/Disruption Tolerant Network (DTN) protocols. This approach significantly advances orbital management by establishing a benchmark for renewable solar propulsion that minimizes reliance on conventional fuels and extends mission longevity for multi-target removal.
comment: 6 pages, 13 Figures, 2 tables
FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions
Humanoid robots are capable of performing various actions such as greeting, dancing and even backflipping. However, these motions are often hard-coded or specifically trained, which limits their versatility. In this work, we present FRoM-W1, an open-source framework designed to achieve general humanoid whole-body motion control using natural language. To universally understand natural language and generate corresponding motions, as well as enable various humanoid robots to stably execute these motions in the physical world under gravity, FRoM-W1 operates in two stages: (a) H-GPT: utilizing massive human data, a large-scale language-driven human whole-body motion generation model is trained to generate diverse natural behaviors. We further leverage the Chain-of-Thought technique to improve the model's generalization in instruction understanding. (b) H-ACT: After retargeting generated human whole-body motions into robot-specific actions, a motion controller that is pretrained and further fine-tuned through reinforcement learning in physical simulation enables humanoid robots to accurately and stably perform corresponding actions. It is then deployed on real robots via a modular simulation-to-reality module. We extensively evaluate FRoM-W1 on Unitree H1 and G1 robots. Results demonstrate superior performance on the HumanML3D-X benchmark for human whole-body motion generation, and our introduced reinforcement learning fine-tuning consistently improves both motion tracking accuracy and task success rates of these humanoid robots. We open-source the entire FRoM-W1 framework and hope it will advance the development of humanoid intelligence.
comment: Project Page: https://openmoss.github.io/FRoM-W1
Contact-Aware Neural Dynamics
High-fidelity physics simulation is essential for scalable robotic learning, but the sim-to-real gap persists, especially for tasks involving complex, dynamic, and discontinuous interactions like physical contacts. Explicit system identification, which tunes explicit simulator parameters, is often insufficient to align the intricate, high-dimensional, and state-dependent dynamics of the real world. To overcome this, we propose an implicit sim-to-real alignment framework that learns to directly align the simulator's dynamics with contact information. Our method treats the off-the-shelf simulator as a base prior and learns a contact-aware neural dynamics model to refine simulated states using real-world observations. We show that using tactile contact information from robotic hands can effectively model the non-smooth discontinuities inherent in contact-rich tasks, resulting in a neural dynamics model grounded by real-world data. We demonstrate that this learned forward dynamics model improves state prediction accuracy and can be effectively used to predict policy performance and refine policies trained purely in standard simulators, offering a scalable, data-driven approach to sim-to-real alignment.
comment: 8 pages
FocusNav: Spatial Selective Attention with Waypoint Guidance for Humanoid Local Navigation
Robust local navigation in unstructured and dynamic environments remains a significant challenge for humanoid robots, requiring a delicate balance between long-range navigation targets and immediate motion stability. In this paper, we propose FocusNav, a spatial selective attention framework that adaptively modulates the robot's perceptual field based on navigational intent and real-time stability. FocusNav features a Waypoint-Guided Spatial Cross-Attention (WGSCA) mechanism that anchors environmental feature aggregation to a sequence of predicted collision-free waypoints, ensuring task-relevant perception along the planned trajectory. To enhance robustness in complex terrains, the Stability-Aware Selective Gating (SASG) module autonomously truncates distal information when detecting instability, compelling the policy to prioritize immediate foothold safety. Extensive experiments on the Unitree G1 humanoid robot demonstrate that FocusNav significantly improves navigation success rates in challenging scenarios, outperforming baselines in both collision avoidance and motion stability, achieving robust navigation in dynamic and complex environments.
comment: 12 pages, 11 figures
AirHunt: Bridging VLM Semantics and Continuous Planning for Efficient Aerial Object Navigation
Recent advances in large Vision-Language Models (VLMs) have provided rich semantic understanding that empowers drones to search for open-set objects via natural language instructions. However, prior systems struggle to integrate VLMs into practical aerial systems due to orders-of-magnitude frequency mismatch between VLM inference and real-time planning, as well as VLMs' limited 3D scene understanding. They also lack a unified mechanism to balance semantic guidance with motion efficiency in large-scale environments. To address these challenges, we present AirHunt, an aerial object navigation system that efficiently locates open-set objects with zero-shot generalization in outdoor environments by seamlessly fusing VLM semantic reasoning with continuous path planning. AirHunt features a dual-pathway asynchronous architecture that establishes a synergistic interface between VLM reasoning and path planning, enabling continuous flight with adaptive semantic guidance that evolves through motion. Moreover, we propose an active dual-task reasoning module that exploits geometric and semantic redundancy to enable selective VLM querying, and a semantic-geometric coherent planning module that dynamically reconciles semantic priorities and motion efficiency in a unified framework, enabling seamless adaptation to environmental heterogeneity. We evaluate AirHunt across diverse object navigation tasks and environments, demonstrating a higher success rate with lower navigation error and reduced flight time compared to state-of-the-art methods. Real-world experiments further validate AirHunt's practical capability in complex and challenging environments. Code and dataset will be made publicly available before publication.
DC-VLAQ: Query-Residual Aggregation for Robust Visual Place Recognition
One of the central challenges in visual place recognition (VPR) is learning a robust global representation that remains discriminative under large viewpoint changes, illumination variations, and severe domain shifts. While visual foundation models (VFMs) provide strong local features, most existing methods rely on a single model, overlooking the complementary cues offered by different VFMs. However, exploiting such complementary information inevitably alters token distributions, which challenges the stability of existing query-based global aggregation schemes. To address these challenges, we propose DC-VLAQ, a representation-centric framework that integrates the fusion of complementary VFMs and robust global aggregation. Specifically, we first introduce a lightweight residual-guided complementary fusion that anchors representations in the DINOv2 feature space while injecting complementary semantics from CLIP through a learned residual correction. In addition, we propose the Vector of Local Aggregated Queries (VLAQ), a query--residual global aggregation scheme that encodes local tokens by their residual responses to learnable queries, resulting in improved stability and the preservation of fine-grained discriminative cues. Extensive experiments on standard VPR benchmarks, including Pitts30k, Tokyo24/7, MSLS, Nordland, SPED, and AmsterTime, demonstrate that DC-VLAQ consistently outperforms strong baselines and achieves state-of-the-art performance, particularly under challenging domain shifts and long-term appearance changes.
comment: 10 pages, 4 figures, 5 tables
RPT*: Global Planning with Probabilistic Terminals for Target Search in Complex Environments
Routing problems such as Hamiltonian Path Problem (HPP), seeks a path to visit all the vertices in a graph while minimizing the path cost. This paper studies a variant, HPP with Probabilistic Terminals (HPP-PT), where each vertex has a probability representing the likelihood that the robot's path terminates there, and the objective is to minimize the expected path cost. HPP-PT arises in target object search, where a mobile robot must visit all candidate locations to find an object, and prior knowledge of the object's location is expressed as vertex probabilities. While routing problems have been studied for decades, few of them consider uncertainty as required in this work. The challenge lies not only in optimally ordering the vertices, as in standard HPP, but also in handling history dependency: the expected path cost depends on the order in which vertices were previously visited. This makes many existing methods inefficient or inapplicable. To address the challenge, we propose a search-based approach RPT* with solution optimality guarantees, which leverages dynamic programming in a new state space to bypass the history dependency and novel heuristics to speed up the computation. Building on RPT*, we design a Hierarchical Autonomous Target Search (HATS) system that combines RPT* with either Bayesian filtering for lifelong target search with noisy sensors, or autonomous exploration to find targets in unknown environments. Experiments in both simulation and real robot show that our approach can naturally balance between exploitation and exploration, thereby finding targets more quickly on average than baseline methods.
Can the Waymo Open Motion Dataset Support Realistic Behavioral Modeling? A Validation Study with Naturalistic Trajectories
The Waymo Open Motion Dataset (WOMD) has become a popular resource for data-driven modeling of autonomous vehicles (AVs) behavior. However, its validity for behavioral analysis remains uncertain due to proprietary post-processing, the absence of error quantification, and the segmentation of trajectories into 20-second clips. This study examines whether WOMD accurately captures the dynamics and interactions observed in real-world AV operations. Leveraging an independently collected naturalistic dataset from Level 4 AV operations in Phoenix, Arizona (PHX), we perform comparative analyses across three representative urban driving scenarios: discharging at signalized intersections, car-following, and lane-changing behaviors. For the discharging analysis, headways are manually extracted from aerial video to ensure negligible measurement error. For the car-following and lane-changing cases, we apply the Simulation-Extrapolation (SIMEX) method to account for empirically estimated error in the PHX data and use Dynamic Time Warping (DTW) distances to quantify behavioral differences. Results across all scenarios consistently show that behavior in PHX falls outside the behavioral envelope of WOMD. Notably, WOMD underrepresents short headways and abrupt decelerations. These findings suggest that behavioral models calibrated solely on WOMD may systematically underestimate the variability, risk, and complexity of naturalistic driving. Caution is therefore warranted when using WOMD for behavior modeling without proper validation against independently collected data.
PolyFly: Polytopic Optimal Planning for Collision-Free Cable-Suspended Aerial Payload Transportation
Aerial transportation robots using suspended cables have emerged as versatile platforms for disaster response and rescue operations. To maximize the capabilities of these systems, robots need to aggressively fly through tightly constrained environments, such as dense forests and structurally unsafe buildings, while minimizing flight time and avoiding obstacles. Existing methods geometrically over-approximate the vehicle and obstacles, leading to conservative maneuvers and increased flight times. We eliminate these restrictions by proposing PolyFly, an optimal global planner which considers a non-conservative representation for aerial transportation by modeling each physical component of the environment, and the robot (quadrotor, cable and payload), as independent polytopes. We further increase the model accuracy by incorporating the attitude of the physical components by constructing orientation-aware polytopes. The resulting optimal control problem is efficiently solved by converting the polytope constraints into smooth differentiable constraints via duality theory. We compare our method against the existing state-of-the-art approach in eight maze-like environments and show that PolyFly produces faster trajectories in each scenario. We also experimentally validate our proposed approach on a real quadrotor with a suspended payload, demonstrating the practical reliability and accuracy of our method.
Gauss-Newton accelerated MPPI Control
Model Predictive Path Integral (MPPI) control is a sampling-based optimization method that has recently attracted attention, particularly in the robotics and reinforcement learning communities. MPPI has been widely applied as a GPU-accelerated random search method to deterministic direct single-shooting optimal control problems arising in model predictive control (MPC) formulations. MPPI offers several key advantages, including flexibility, robustness, ease of implementation, and inherent parallelizability. However, its performance can deteriorate in high-dimensional settings since the optimal control problem is solved via Monte Carlo sampling. To address this limitation, this paper proposes an enhanced MPPI method that incorporates a Jacobian reconstruction technique and the second-order Generalized Gauss-Newton method. This novel approach is called \textit{Gauss-Newton accelerated MPPI}. The numerical results show that the Gauss-Newton accelerated MPPI approach substantially improves MPPI scalability and computational efficiency while preserving the key benefits of the classical MPPI framework, making it a promising approach even for high-dimensional problems.
comment: 6 pages, 3 figures, submitted to the IFAC World Congress 2026
Combining Shape Completion and Grasp Prediction for Fast and Versatile Grasping with a Multi-Fingered Hand
Grasping objects with limited or no prior knowledge about them is a highly relevant skill in assistive robotics. Still, in this general setting, it has remained an open problem, especially when it comes to only partial observability and versatile grasping with multi-fingered hands. We present a novel, fast, and high fidelity deep learning pipeline consisting of a shape completion module that is based on a single depth image, and followed by a grasp predictor that is based on the predicted object shape. The shape completion network is based on VQDIF and predicts spatial occupancy values at arbitrary query points. As grasp predictor, we use our two-stage architecture that first generates hand poses using an autoregressive model and then regresses finger joint configurations per pose. Critical factors turn out to be sufficient data realism and augmentation, as well as special attention to difficult cases during training. Experiments on a physical robot platform demonstrate successful grasping of a wide range of household objects based on a depth image from a single viewpoint. The whole pipeline is fast, taking only about 1 s for completing the object's shape (0.7 s) and generating 1000 grasps (0.3 s).
comment: 8 pages, 10 figures, 3 tables, 1 algorithm. Published in Humanoids 2023. Project page: https://aidx-lab.org/grasping/humanoids23
Shape Completion with Prediction of Uncertain Regions IROS 2023
Shape completion, i.e., predicting the complete geometry of an object from a partial observation, is highly relevant for several downstream tasks, most notably robotic manipulation. When basing planning or prediction of real grasps on object shape reconstruction, an indication of severe geometric uncertainty is indispensable. In particular, there can be an irreducible uncertainty in extended regions about the presence of entire object parts when given ambiguous object views. To treat this important case, we propose two novel methods for predicting such uncertain regions as straightforward extensions of any method for predicting local spatial occupancy, one through postprocessing occupancy scores, the other through direct prediction of an uncertainty indicator. We compare these methods together with two known approaches to probabilistic shape completion. Moreover, we generate a dataset, derived from ShapeNet, of realistically rendered depth images of object views with ground-truth annotations for the uncertain regions. We train on this dataset and test each method in shape completion and prediction of uncertain regions for known and novel object instances and on synthetic and real data. While direct uncertainty prediction is by far the most accurate in the segmentation of uncertain regions, both novel methods outperform the two baselines in shape completion and uncertain region prediction, and avoiding the predicted uncertain regions increases the quality of grasps for all tested methods.
comment: 7 pages, 5 figures, Published in IROS 2023. Project page: https://hummat.github.io/2023-iros-uncertain/
Astra: Efficient Transformer Architecture and Contrastive Dynamics Learning for Embodied Instruction Following EMNLP 2025
Vision-language-action models have gained significant attention for their ability to model multimodal sequences in embodied instruction following tasks. However, most existing models rely on causal attention, which we find suboptimal for processing sequences composed of interleaved segments from different modalities. In this paper, we introduce Astra, a novel Transformer architecture featuring trajectory attention and learnable action queries, designed to efficiently process segmented multimodal trajectories and predict actions for imitation learning. Furthermore, we propose a contrastive dynamics learning objective to enhance the model's understanding of environment dynamics and multimodal alignment, complementing the primary behavior cloning objective. Through extensive experiments on three large-scale robot manipulation benchmarks, Astra demonstrates substantial performance improvements over previous models.
comment: Accepted to EMNLP 2025 (main). Published version: https://aclanthology.org/2025.emnlp-main.688/ Code available at: https://github.com/yueen-ma/Astra
LLM-Glasses: GenAI-driven Glasses with Haptic Feedback for Navigation of Visually Impaired People
LLM-Glasses is a wearable navigation system which assists visually impaired people by utilizing YOLO-World object detection, GPT-4o-based reasoning, and haptic feedback for real-time guidance. The device translates visual scene understanding into intuitive tactile feedback on the temples, allowing hands-free navigation. Three studies evaluate the system: recognition of 13 haptic patterns with an average recognition rate of 81.3%, VICON-based guidance with predefined paths using haptic cues, and an LLM-guided scene evaluation with decision accuracies of 91.8% without obstacles, 84.6% with static obstacles, and 81.5% with dynamic obstacles. These results show that LLM-Glasses can deliver reliable navigation support in controlled environments and motivate further work on responsiveness and deployment in more complex real-world scenarios.
Message passing-based inference in an autoregressive active inference agent
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
comment: 14 pages, 4 figures, proceedings of the International Workshop on Active Inference 2025. Erratum v1: in Eq. (50), $p(y_t, Θ, u_t \mid y_{*}, \mathcal{D}_k)$ should have been $p(y_t, Θ\mid u_t, y_{*}, \mathcal{D}_k)$
Event-Grounding Graph: Unified Spatio-Temporal Scene Graph from Robotic Observations
A fundamental aspect for building intelligent autonomous robots that can assist humans in their daily lives is the construction of rich environmental representations. While advances in semantic scene representations have enriched robotic scene understanding, current approaches lack a connection between spatial features and dynamic events; e.g., connecting the blue mug to the event washing a mug. In this work, we introduce the event-grounding graph (EGG), a framework grounding event interactions to spatial features of a scene. This representation allows robots to perceive, reason, and respond to complex spatio-temporal queries. Experiments using real robotic data demonstrate EGG's capability to retrieve relevant information and respond accurately to human inquiries concerning the environment and events within. Furthermore, the EGG framework's source code and evaluation dataset are released as open-source at: https://github.com/aalto-intelligent-robotics/EGG.
comment: Submitted to RA-L
Prespecified-Performance Kinematic Tracking Control for Aerial Manipulation
This paper studies the kinematic tracking control problem for aerial manipulators. Existing kinematic tracking control methods, which typically employ proportional-derivative feedback or tracking-error-based feedback strategies, may fail to achieve tracking objectives within specified time constraints. To address this limitation, we propose a novel control framework comprising two key components: end-effector tracking control based on a user-defined preset trajectory and quadratic programming-based reference allocation. Compared with state-of-the-art approaches, the proposed method has several attractive features. First, it ensures that the end-effector reaches the desired position within a preset time while keeping the tracking error within a performance envelope that reflects task requirements. Second, quadratic programming is employed to allocate the references of the quadcopter base and the Delta arm, while considering the physical constraints of the aerial manipulator, thus preventing solutions that may violate physical limitations. The proposed approach is validated through three experiments. Experimental results demonstrate the effectiveness of the proposed algorithm and its capability to guarantee that the target position is reached within the preset time.
A Survey on Vision-Language-Action Models for Embodied AI
Embodied AI is widely recognized as a cornerstone of artificial general intelligence because it involves controlling embodied agents to perform tasks in the physical world. Building on the success of large language models and vision-language models, a new category of multimodal models -- referred to as vision-language-action models (VLAs) -- has emerged to address language-conditioned robotic tasks in embodied AI by leveraging their distinct ability to generate actions. The recent proliferation of VLAs necessitates a comprehensive survey to capture the rapidly evolving landscape. To this end, we present the first survey on VLAs for embodied AI. This work provides a detailed taxonomy of VLAs, organized into three major lines of research. The first line focuses on individual components of VLAs. The second line is dedicated to developing VLA-based control policies adept at predicting low-level actions. The third line comprises high-level task planners capable of decomposing long-horizon tasks into a sequence of subtasks, thereby guiding VLAs to follow more general user instructions. Furthermore, we provide an extensive summary of relevant resources, including datasets, simulators, and benchmarks. Finally, we discuss the challenges facing VLAs and outline promising future directions in embodied AI. A curated repository associated with this survey is available at: https://github.com/yueen-ma/Awesome-VLA.
comment: Project page: https://github.com/yueen-ma/Awesome-VLA
Safe Navigation under State Uncertainty: Online Adaptation for Robust Control Barrier Functions
Measurements and state estimates are often imperfect in control practice, posing challenges for safety-critical applications, where safety guarantees rely on accurate state information. In the presence of estimation errors, several prior robust control barrier function (R-CBF) formulations have imposed strict conditions on the input. These methods can be overly conservative and can introduce issues such as infeasibility, high control effort, etc. This work proposes a systematic method to improve R-CBFs, and demonstrates its advantages on a tracked vehicle that navigates among multiple obstacles. A primary contribution is a new optimization-based online parameter adaptation scheme that reduces the conservativeness of existing R-CBFs. In order to reduce the complexity of the parameter optimization, we merge several safety constraints into one unified numerical CBF via Poisson's equation. We further address the dual relative degree issue that typically causes difficulty in vehicle tracking. Experimental trials demonstrate the overall performance improvement of our approach over existing formulations.
Genie Centurion: Accelerating Scalable Real-World Robot Training with Human Rewind-and-Refine Guidance
While Vision-Language-Action (VLA) models show strong generalizability in various tasks, real-world deployment of robotic policy still requires large-scale, high-quality human expert demonstrations. However, data collection via human teleoperation requires continuous operator attention, which is costly, hard to scale. To address this, we propose Genie Centurion (GCENT), a scalable and general data collection paradigm based on human rewind-and-refine guidance, enabling robots' interactive learning in deployment. GCENT starts at an imperfect policy and improves over time. When the robot execution failures occur, GCENT allows robots to revert to a previous state with a rewind mechanism, after which a teleoperator provides corrective demonstrations to refine the policy. This framework supports a one-human-to-many-robots supervision scheme with a Task Sentinel module, which autonomously predicts task success and solicits human intervention when necessary. Empirical results show that GCENT achieves up to 40% higher task success rates than state-of-the-art data collection methods, and reaches comparable performance using less than half the data in long-horizon and precise tasks. We also quantify the data yield-to-effort ratio under multi-robot scenarios, demonstrating GCENT's potential for scalable and cost-efficient robot policy training in real-world environments.
PERSEUS: Perception with Semantic Endoscopic Understanding and SLAM
Purpose: Natural orifice surgeries minimize the need for incisions and reduce the recovery time compared to open surgery; however, they require a higher level of expertise due to visualization and orientation challenges. We propose a perception pipeline for these surgeries that allows semantic scene understanding. Methods: We bring learning-based segmentation, depth estimation, and 3D reconstruction modules together to create real-time segmented maps of the surgical scenes. Additionally, we use registration with robot poses to solve the scale ambiguity of mapping from monocular images, and allow the use of semantically informed real-time reconstructions in robotic surgeries. Results: We achieve sub-milimeter reconstruction accuracy based on average one-sided Chamfer distances, average pose registration RMSE of 0.9 mm, and an estimated scale within 2% of ground truth. Conclusion: We present a modular perception pipeline, integrating semantic segmentation with real-time monocular SLAM for natural orifice surgeries. This pipeline offers a promising solution for scene understanding that can facilitate automation or surgeon guidance.
comment: 13 pages, 6 figures, 2 tables. Under review for The 17th International Conference on Information Processing in Computer-Assisted Interventions (IPCAI 2026)
Multiagent Systems
A simulation of urban incidents involving pedestrians and vehicles based on Weighted A*
This document presents a comprehensive simulation framework designed to model urban incidents involving pedestrians and vehicles. Using a multiagent systems approach, two types of agents (pedestrians and vehicles) are introduced within a 2D grid based urban environment. The environment encodes streets, sidewalks, buildings, zebra crossings, and obstacles such as potholes and infrastructure elements. Each agent employs a weighted A* algorithm for pathfinding, allowing for variation in decision making behavior such as reckless movement or strict rule-following. The model aims to simulate interactions, assess risk of collisions, and evaluate efficiency under varying environmental and behavioral conditions. Experimental results explore how factors like obstacle density, presence of traffic control mechanisms, and behavioral deviations affect safety and travel efficiency.
LLM-as-RNN: A Recurrent Language Model for Memory Updates and Sequence Prediction
Large language models are strong sequence predictors, yet standard inference relies on immutable context histories. After making an error at generation step t, the model lacks an updatable memory mechanism that improves predictions for step t+1. We propose LLM-as-RNN, an inference-only framework that turns a frozen LLM into a recurrent predictor by representing its hidden state as natural-language memory. This state, implemented as a structured system-prompt summary, is updated at each timestep via feedback-driven text rewrites, enabling learning without parameter updates. Under a fixed token budget, LLM-as-RNN corrects errors and retains task-relevant patterns, effectively performing online learning through language. We evaluate the method on three sequential benchmarks in healthcare, meteorology, and finance across Llama, Gemma, and GPT model families. LLM-as-RNN significantly outperforms zero-shot, full-history, and MemPrompt baselines, improving predictive accuracy by 6.5% on average, while producing interpretable, human-readable learning traces absent in standard context accumulation.
comment: 17 pages, 5 figures, 6 tables
CooperBench: Why Coding Agents Cannot be Your Teammates Yet
Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This contrasts sharply with human teams, where adding teammates typically improves productivity. Our analysis reveals three key issues: (1) communication channels become jammed with vague, ill-timed, and inaccurate messages; (2) even with effective communication, agents deviate from their commitments; and (3) agents often hold incorrect expectations about others' plans and communication. Through large-scale simulation, we also observe rare but interesting emergent coordination behavior including role division, resource division, and negotiation. Our research presents a novel benchmark for collaborative coding and calls for a shift from pursuing individual agent capability to developing social intelligence.
comment: https://cooperbench.com
Prompt Injection Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching
Prompt injection remains a central obstacle to the safe deployment of large language models, particularly in multi-agent settings where intermediate outputs can propagate or amplify malicious instructions. Building on earlier work that introduced a four-metric Total Injection Vulnerability Score (TIVS), this paper extends the evaluation framework with semantic similarity-based caching and a fifth metric (Observability Score Ratio) to yield TIVS-O, investigating how defence effectiveness interacts with transparency in a HOPE-inspired Nested Learning architecture. The proposed system combines an agentic pipeline with Continuum Memory Systems that implement semantic similarity-based caching across 301 synthetically generated injection-focused prompts drawn from ten attack families, while a fourth agent performs comprehensive security analysis using five key performance indicators. In addition to traditional injection metrics, OSR quantifies the richness and clarity of security-relevant reasoning exposed by each agent, enabling an explicit analysis of trade-offs between strict mitigation and auditability. Experiments show that the system achieves secure responses with zero high-risk breaches, while semantic caching delivers substantial computational savings, achieving a 41.6% reduction in LLM calls and corresponding decreases in latency, energy consumption, and carbon emissions. Five TIVS-O configurations reveal optimal trade-offs between mitigation strictness and forensic transparency. These results indicate that observability-aware evaluation can reveal non-monotonic effects within multi-agent pipelines and that memory-augmented agents can jointly maximize security robustness, real-time performance, operational cost savings, and environmental sustainability without modifying underlying model weights, providing a production-ready pathway for secure and green LLM deployments.
comment: 33 pages, 19 figures
OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models WWW 2026
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems, yet their performance is critically dependent on the design of their underlying collaboration topology. As MAS become increasingly deployed in web services (e.g., search engines), designing adaptive topologies for diverse cross-domain user queries becomes essential. Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm, where a specialized model is trained for each specific task domain. This approach suffers from poor generalization to unseen domains and fails to leverage shared structural knowledge across different tasks. To address this, we propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language through a single universal model. Our approach integrates a Task-Aware Graph State Encoder (TAGSE) that filters task-relevant node information via sparse gating, and a Mixture-of-Experts (MoE) architecture that dynamically selects specialized sub-networks to drive node and edge prediction. We employ a three-stage training strategy: unconditional pre-training on canonical topologies for structural priors, large-scale conditional pre-training on LLM-generated datasets for task-topology mappings, and supervised fine-tuning on empirically validated graphs. Experiments across six diverse benchmarks show that OFA-TAD significantly outperforms specialized one-for-one models, generating highly adaptive MAS topologies. Code: https://github.com/Shiy-Li/OFA-MAS.
comment: Accepted by WWW 2026
Communication Methods in Multi-Agent Reinforcement Learning
Multi-agent reinforcement learning is a promising research area that extends established reinforcement learning approaches to problems formulated as multi-agent systems. Recently, a multitude of communication methods have been introduced to this field to address problems such as partially observable environments, non-stationarity, and exponentially growing action spaces. Communication further enables efficient cooperation among all agents interacting in an environment. This work aims at providing an overview of communication techniques in multi-agent reinforcement learning. By an in-depth analysis of 29 publications on this topic, the strengths and weaknesses of explicit, implicit, attention-based, graph-based, and hierarchical/role-based communication are evaluated. The results of this comparison show that there is no general, optimal communication framework for every problem. On the contrary, the choice of communication depends heavily on the problem at hand. The comparison also highlights the importance of communication methods with low computational overhead to enable scalability to environments where many agents interact. Finally, the paper discusses current research gaps, emphasizing the need for standardized benchmarking of system-level metrics and improved robustness under realistic communication conditions to enhance the real-world applicability of these approaches.
comment: 12 pages, 2 figures
The Cost of EFX: Generalized-Mean Welfare and Complexity Dichotomies with Few Surplus Items
Envy-freeness up to any good (EFX) is a central fairness notion for allocating indivisible goods, yet its existence is unresolved in general. In the setting with few surplus items, where the number of goods exceeds the number of agents by a small constant (at most three), EFX allocations are guaranteed to exist, shifting the focus from existence to efficiency and computation. We study how EFX interacts with generalized-mean ($p$-mean) welfare, which subsumes commonly-studied utilitarian ($p=1$), Nash ($p=0$), and egalitarian ($p \rightarrow -\infty$) objectives. We establish sharp complexity dichotomies at $p=0$: for any fixed $p \in (0,1]$, both deciding whether EFX can attain the global $p$-mean optimum and computing an EFX allocation maximizing $p$-mean welfare are NP-hard, even with at most three surplus goods; in contrast, for any fixed $p \leq 0$, we give polynomial-time algorithms that optimize $p$-mean welfare within the space of EFX allocations and efficiently certify when EFX attains the global optimum. We further quantify the welfare loss of enforcing EFX via the price of fairness framework, showing that for $p > 0$, the loss can grow linearly with the number of agents, whereas for $p \leq 0$, it is bounded by a constant depending on the surplus (and for Nash welfare it vanishes asymptotically). Finally we show that requiring Pareto-optimality alongside EFX is NP-hard (and becomes $Σ_2^P$-complete for a stronger variant of EFX). Overall, our results delineate when EFX is computationally costly versus structurally aligned with welfare maximization in the setting with few surplus items.
Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives
We develop and analyze a theoretical framework for agent-to-agent interactions in a simplified in-context linear regression setting. In our model, each agent is instantiated as a single-layer transformer with linear self-attention (LSA) trained to implement gradient-descent-like updates on a quadratic regression objective from in-context examples. We then study the coupled dynamics when two such LSA agents alternately update from each other's outputs under potentially misaligned fixed objectives. Within this framework, we characterize the generation dynamics and show that misalignment leads to a biased equilibrium where neither agent reaches its target, with residual errors predictable from the objective gap and the prompt-induced geometry. We also characterize an adversarial regime where asymmetric convergence is possible: one agent reaches its objective exactly while inducing persistent bias in the other. We further contrast this fixed objective regime with an adaptive multi-agent setting, wherein a helper agent updates a turn-based objective to implement a Newton-like step for the main agent, eliminating the plateau and accelerating its convergence. Experiments with trained LSA agents, as well as black-box GPT-5-mini runs on in-context linear regression tasks, are consistent with our theoretical predictions within this simplified setting. We view our framework as a mechanistic framework that links prompt geometry and objective misalignment to stability, bias, and robustness, and as a stepping stone toward analyzing more realistic multi-agent LLM systems.
Systems and Control (EESS)
Distribution-Free Confidence Ellipsoids for Ridge Regression with PAC Bounds
Linearly parametrized models are widely used in control and signal processing, with the least-squares (LS) estimate being the archetypical solution. When the input is insufficiently exciting, the LS problem may be unsolvable or numerically unstable. This issue can be resolved through regularization, typically with ridge regression. Although regularized estimators reduce the variance error, it remains important to quantify their estimation uncertainty. A possible approach for linear regression is to construct confidence ellipsoids with the Sign-Perturbed Sums (SPS) ellipsoidal outer approximation (EOA) algorithm. The SPS EOA builds non-asymptotic confidence ellipsoids under the assumption that the noises are independent and symmetric about zero. This paper introduces an extension of the SPS EOA algorithm to ridge regression, and derives probably approximately correct (PAC) upper bounds for the resulting region sizes. Compared with previous analyses, our result explicitly show how the regularization parameter affects the region sizes, and provide tighter bounds under weaker excitation assumptions. Finally, the practical effect of regularization is also demonstrated via simulation experiments.
On the Relation of State Space Models and Hidden Markov Models
State Space Models (SSMs) and Hidden Markov Models (HMMs) are foundational frameworks for modeling sequential data with latent variables and are widely used in signal processing, control theory, and machine learning. Despite their shared temporal structure, they differ fundamentally in the nature of their latent states, probabilistic assumptions, inference procedures, and training paradigms. Recently, deterministic state space models have re-emerged in natural language processing through architectures such as S4 and Mamba, raising new questions about the relationship between classical probabilistic SSMs, HMMs, and modern neural sequence models. In this paper, we present a unified and systematic comparison of HMMs, linear Gaussian state space models, Kalman filtering, and contemporary NLP state space models. We analyze their formulations through the lens of probabilistic graphical models, examine their inference algorithms -- including forward-backward inference and Kalman filtering -- and contrast their learning procedures via Expectation-Maximization and gradient-based optimization. By highlighting both structural similarities and semantic differences, we clarify when these models are equivalent, when they fundamentally diverge, and how modern NLP SSMs relate to classical probabilistic models. Our analysis bridges perspectives from control theory, probabilistic modeling, and modern deep learning.
Safe Navigation in Cluttered Environments Via Spline-Based Harmonic Potential Fields
We provide a complete motion-planning mechanism that ensures target tracking and obstacle avoidance in a cluttered environment. For a given polyhedral decomposition of the feasible space, we adopt a novel procedure that constrains the agent to move only through a prescribed sequence of cells via a suitable control policy. For each cell, we construct a harmonic potential surface induced by a Dirichlet boundary condition given as a cardinal B-spline curve. A detailed analysis of the curve behavior (periodicity, support) and of the associated control point selection allows us to explicitly compute these harmonic potential surfaces, from which we subsequently derive the corresponding control policy. We illustrate that the resulting construction funnels the agent safely along the chain of cells from the starting point to the target.
Autonomous Navigation at the Nano-Scale: Algorithms, Architectures, and Constraints
Autonomous navigation for nano-scale unmanned aerial vehicles (nano-UAVs) is governed by extreme Size, Weight, and Power (SWaP) constraints (with the weight < 50 g and sub-100 mW onboard processor), distinguishing it fundamentally from standard robotic paradigms. This review synthesizes the state-of-the-art in sensing, computing, and control architectures designed specifically for these sub- 100mW computational envelopes. We critically analyse the transition from classical geometry-based methods to emerging "Edge AI" paradigms, including quantized deep neural networks deployed on ultra-low-power System-on-Chips (SoCs) and neuromorphic event-based control. Beyond algorithms, we evaluate the hardware-software co-design requisite for autonomy, covering advancements in dense optical flow, optimized Simultaneous Localization and Mapping (SLAM), and learning-based flight control. While significant progress has been observed in visual navigation and relative pose estimation, our analysis reveals persistent gaps in long-term endurance, robust obstacle avoidance in dynamic environments, and the "Sim-to-Real" transfer of reinforcement learning policies. This survey provides a roadmap for bridging these gaps, advocating for hybrid architectures that fuse lightweight classical control with data-driven perception to enable fully autonomous, agile nano-UAVs in GPS-denied environments.
comment: 28 pages, 5 figures, 1 table. Review article
Modelling viable supply networks with cooperative adaptive financing
We propose a financial liquidity policy sharing method for firm-to-firm supply networks, introducing a scalable autonomous control function for viable complex adaptive supply networks. Cooperation and competition in supply chains is reconciled through overlapping collaborative sets, making firms interdependent and enabling distributed risk governance. How cooperative range - visibility - affects viability is studied using dynamic complex adaptive systems modelling. We find that viability needs cooperation; visibility and viability grow together in scale-free supply networks; and distributed control, where firms only have limited partner information, outperforms centralised control. This suggests that policy toward network viability should implement distributed supply chain financial governance, supporting interfirm collaboration, to enable autonomous control.
Emissions and cost tradeoffs of time-matched clean electricity procurement under inter-annual weather variability: case study of hydrogen production
Time-matching requirements (TMRs) for clean electricity procurement are increasingly adopted in voluntary corporate sustainability initiatives and regulatory frameworks. While prior research has evaluated cost and emissions impacts of hourly vs. annual TMR, these studies typically rely on single-year weather scenarios that do not capture inter-annual variability in variable renewable energy (VRE) generation. We use a capacity expansion model to assess how inter-annual weather variability affects procurement-driven infrastructure investments, costs, and emissions for a grid-connected hydrogen producer under both annual and hourly time-matching strategies. Using a Texas case study, we compare deterministic (single weather scenario) and stochastic (nine weather scenarios) modeling approaches. Both procurement investments and cost and emissions outcomes are sensitive to weather scenario, with annual matching exhibiting greater sensitivity than hourly matching. Stochastic modeling finds higher cost premiums for hourly versus annual matching compared to deterministic modeling, though emissions trends remain directionally consistent. Demand flexibility through H2 storage is critical for lowering hourly matching cost premiums under weather-driven VRE variability. Partial hourly matching (e.g., 80-90% compliance) can modestly reduce costs while maintaining minimal emissions impacts. Finally, we examine how grid-level renewable portfolio standards (RPS) affect additionality and emissions. When stringent additionality is achieved via binding RPS constraints on non-H2 electricity demand, annual matching can produce emissions reductions comparable to hourly matching at lower cost.
comment: 7 Figures, 1 table (main text)
QoS-Aware Energy Optimization via Cell Switching in Heterogeneous Networks
The growing demand for mobile data services in dense urban areas has intensified the need for energy-efficient radio access networks (RANs) in future 6G systems. In this context, one promising strategy is cell switching (CS), which dynamically deactivates underutilized small base stations (SBSs) to reduce power consumption. However, while previous research explored CS primarily based on traffic load, ensuring user quality of service (QoS) under realistic channel conditions remains a challenge. In this paper, we propose a novel optimization-driven CS framework that jointly minimizes network power consumption and guarantees user QoS by enforcing a minimum received power threshold as part of offloading decisions. In contrast to prior load-based or learning-based approaches, our method explicitly integrates channel-aware information into the CS process, thus ensuring reliable service quality for offloaded users. Furthermore, flexibility of the proposed framework enables operators to adapt system behavior between energy-saving and QoS-preserving modes by tuning a single design parameter. Simulation results demonstrate that the proposed approach achieves up to 30% power savings as compared to baseline methods while fully maintaining QoS under diverse network conditions. Scalability and robustness of the proposed method in realistic heterogeneous networks (HetNets) further highlight its potential as a practical solution for sustainable 6G deployments.
IntAgent: NWDAF-Based Intent LLM Agent Towards Advanced Next Generation Networks
Intent-based networks (IBNs) are gaining prominence as an innovative technology that automates network operations through high-level request statements, defining what the network should achieve. In this work, we introduce IntAgent, an intelligent intent LLM agent that integrates NWDAF analytics and tools to fulfill the network operator's intents. Unlike previous approaches, we develop an intent tools engine directly within the NWDAF analytics engine, allowing our agent to utilize live network analytics to inform its reasoning and tool selection. We offer an enriched, 3GPP-compliant data source that enhances the dynamic, context-aware fulfillment of network operator goals, along with an MCP tools server for scheduling, monitoring, and analytics tools. We demonstrate the efficacy of our framework through two practical use cases: ML-based traffic prediction and scheduled policy enforcement, which validate IntAgent's ability to autonomously fulfill complex network intents.
comment: conference
Exploiting Light To Enhance The Endurance and Navigation of Lighter-Than-Air Micro-Drones
Micro-Unmanned Aerial Vehicles (UAVs) are rapidly expanding into tasks from inventory to environmental sensing, yet their short endurance and unreliable navigation in GPS-denied spaces limit deployment. Lighter-Than-Air (LTA) drones offer an energy-efficient alternative: they use a helium envelope to provide buoyancy, which enables near-zero-power drain during hovering and much longer operation. LTAs are promising, but their design is complex, and they lack integrated solutions to enable sustained autonomous operations and navigation with simple, low-infrastructure. We propose a compact, self-sustaining LTA drone that uses light for both energy harvesting and navigation. Our contributions are threefold: (i) a high-fidelity simulation framework to analyze LTA aerodynamics and select a stable, efficient configuration; (ii) a framework to integrate solar cells on the envelope to provide net-positive energy; and (iii) a point-and-go navigation system with three light-seeking algorithms operating on a single light beacon. Our LTA-analysis, together with the integrated solar panels, not only saves energy while flying, but also enables sustainable operation: providing 1 minute of flying time for every 4 minutes of energy harvesting, under illuminations of 80klux. We also demonstrate robust single-beacon navigation towards a light source that can be up to 7m away, in indoor and outdoor environments, even with moderate winds. The resulting system indicates a plausible path toward persistent, autonomous operation for indoor and outdoor monitoring. More broadly, this work provides a practical pathway for translating the promise of LTA drones into a persistent, self-sustaining aerial system.
Stability of Information-Based Routing in Dynamic Transportation Networks
Recent studies on transportation networks have shown that real-time route guidance can inadvertently induce congestion or oscillatory traffic patterns. Nevertheless, such technologies also offer a promising opportunity to manage traffic non-intrusively by shaping the information delivered to users, thereby mitigating congestion and enhancing network stability. A key step toward this goal is to identify information signals that ensure the existence of an equilibrium with desirable stability and convergence properties. This challenge is particularly relevant when traffic density and routing dynamics evolve concurrently, as increasingly occurs with digital signaling and real-time navigation technologies. To address this, we analyze a parallel-path transportation network with a single origin-destination pair, incorporating joint traffic density and logit-based routing dynamics that evolve at the same timescale. We characterize a class of density-dependent traffic information that guarantees a unique equilibrium in the free-flow regime, ensures its asymptotic stability, and keeps traffic densities within the free-flow region for all time. The theoretical results are complemented by a numerical case study demonstrating how the framework can inform the design of traffic information that reduces total travel time without compromising credibility.
Convex Model Predictive Control for Safe Output Consensus of Nonlinear Multi-Agent Systems
Nonlinear dynamics and safety constraints typically result in a nonlinear programming problem when applying model predictive control to achieve safe output consensus. To avoid the heavy computational burden of solving a nonlinear programming problem directly, this paper proposes a novel Convex Model Predictive Control (CMPC) approach based on a Sequential Quadratic Programming (SQP) scheme. The core of our method lies in transforming the nonlinear constraints into linear forms: we linearize the system dynamics and convexify the discrete-time high-order control barrier functions using a proposed tangent-line projection method. Consequently, the original problem is reduced to a quadratic program that can be iteratively solved within the SQP scheme at each time step of CMPC. Furthermore, we provide the formal guarantee of the convergence of the SQP scheme, and subsequently guarantee the recursive feasibility and stability of CMPC. Simulations on multi-agent systems with unicycle dynamics demonstrate a 35-52 times reduction in computation time compared with baseline methods, confirming the suitability of the proposed approach for real-time safe output consensus control.
Feedforward-Feedback Integration in Flight Control: Reinforcement Learning with Sliding Mode Control
Learning-based controllers leverage nonlinear couplings and enhance transients but seldom offer guarantees under tight input constraints. Robust feedback like sliding-mode control (SMC) provides these guarantees but is conservative in isolation. This paper creates a learning-augmented framework where a deep reinforcement learning policy produces feedforward commands and an SMC law imposes actuator limits, bounds learned authority, and guarantees robustness. The policy is modeled as a matched, bounded input, and Lyapunov-based conditions link SMC gains to the admissible feedforward bound, guaranteeing stability under saturation. This formulation is applicable to nonlinear, underactuated plants with hard constraints. To illustrate the methodology, the method is applied to a six-degree-of-freedom aircraft model and compared with Reinforcement Learning and isolated SMC. Simulation results show that the hybrid controller improves transient behavior and reduces control oscillations compared to standalone RL and SMC controllers, while preserving robustness under modeling uncertainties and disturbances. Even using it with partially trained policies, SMC component of the control stabilizes transients, whereas fully trained policies provide faster convergence, reduced constraint violations, and robustness. These results illustrate that learning-augmented control offers superior performance with robustness guarantees under tight input constraints.
comment: 16 pages
Guiding vector field-based guidance under wind disturbances applied to a tailsitter UAV
This paper develops a guidance control law based on a parametric Guiding Vector Field (GVF) and integrates it with a state-of-the-art acceleration and attitude control architecture for tailsitters. The resulting framework enables a direct comparison between traditional trajectory-tracking guidance and GVF-based path-following guidance using a realistic tailsitter model operating under windy conditions. Through extensive simulations, it is shown that for agile flight scenarios with wind and small initial position error, both guidance strategies achieve comparable tracking performance, indicating that the additional complexity introduced by the GVF formulation is not always justified. However, the GVF-based approach exhibits an advantage when initial deviation from the path is present, yielding smooth and well-behaved convergence toward the desired path. Two additional contributions support this evaluation. First, a modification of the parametric GVF is proposed that guarantees exponential stability of the tracking error dynamics for a single integrator system. Second, the differential flatness transform of a tailsitter vehicle is extended to account for explicit knowledge of the wind velocity vector.
Imitation learning-based spacecraft rendezvous and docking method with Expert Demonstration
Existing spacecraft rendezvous and docking control methods largely rely on predefined dynamic models and often exhibit limited robustness in realistic on-orbit environments. To address this issue, this paper proposes an Imitation Learning-based spacecraft rendezvous and docking control framework (IL-SRD) that directly learns control policies from expert demonstrations, thereby reducing dependence on accurate modeling. We propose an anchored decoder target mechanism, which conditions the decoder queries on state-related anchors to explicitly constrain the control generation process. This mechanism enforces physically consistent control evolution and effectively suppresses implausible action deviations in sequential prediction, enabling reliable six-degree-of-freedom (6-DOF) rendezvous and docking control. To further enhance stability, a temporal aggregation mechanism is incorporated to mitigate error accumulation caused by the sequential prediction nature of Transformer-based models, where small inaccuracies at each time step can propagate and amplify over long horizons. Extensive simulation results demonstrate that the proposed IL-SRD framework achieves accurate and energy-efficient model-free rendezvous and docking control. Robustness evaluations further confirm its capability to maintain competitive performance under significant unknown disturbances. The source code is available at https://github.com/Dongzhou-1996/IL-SRD.
comment: 6 figures, 4 tables. Focus on 6-DOF spacecraft rendezvous and docking control using imitation learning-based control method
From Vertices to Convex Hulls: Certifying Set-Wise Compatibility for CBF Constraints
This paper develops certificates that propagate compatibility of multiple control barrier function (CBF) constraints from sampled vertices to their convex hull. Under mild concavity and affinity assumptions, we present three sufficient feasibility conditions under which feasible inputs over the convex hull can be obtained per coordinate, with a common input, or via convex blending. We also describe the associated computational methods, based on interval intersections or an offline linear program (LP). Beyond certifying compatibility, we give conditions under which the quadratic-program (QP) safety filter is affine in the state. This enables explicit implementations via convex combinations of vertex-feasible inputs. Case studies illustrate the results.
Report on Earth Observation Missions and Ground Station Management using On-Demand Satellite Operation System
Since the launch of its first satellite in 2009, Tohoku University has continuously developed and operated Earth observation satellites and engineering demonstration satellites in the 50cm-class and CubeSat-class (up to 3U). The 50cm-class satellite launched into operation in 2021 enabled efficient operations through cloud-based management functions for both the satellite and ground stations, including automatic command generation. By 2022, up to eight operational satellites were simultaneously managed on a daily basis using three ground stations (Sendai, Hakodate, and Sweden). This paper presents the operational achievements to date and introduces the system that supports efficient satellite operations
comment: 8 pages, 11 figures. ISTS 35th, July 12-18, 2025
System Analysis and Pre-Flight Evaluation of Deployable Solar Panels for 3U CubeSat HOKUSHIN-1
This paper describes the system design methodology derived from the development and evaluation tests of deployable solar panels to be mounted on a 3U CubeSat. The study mainly includes structural analysis, thermal analysis, and a review of vibration test results. Hokkaido University is developing the 3U CubeSat HOKUSHIN-1 in collaboration with Tohoku University and Muroran Institute of Technology. Deployable solar panels are a key technology for future planned lunar exploration missions, as they enable power-intensive communication and propulsion required for orbit control. The satellite also demonstrates a newly developed compact and efficient propulsion system. The satellite has dimensions of approximately 10x10x34 cm, a mass of 3.99 kg, and will be deployed into a circular orbit at an altitude of about 400 km with an orbital inclination of 51.6 degrees from the International Space Station.
comment: 8 pages, 12 figures, 3 tables. ISTS 34th, June 3-9, 2023
Lessons Learned from Structural Design and Vibration Testing of 50-kg Microsatellites Deployed from the International Space Station
Hokkaido University and Tohoku University have been developing and operating a constellation of 50-cm-class microsatellites for Earth observation. DIWATA-1, launched in 2016, was deployed into a circular orbit at an altitude of approximately 400 km from the International Space Station (ISS). For the subsequent satellite developed in 2021, the structural design and vibration test campaign were optimized to meet a strict one-year development schedule. This paper summarizes how the structural design of the previous satellite was reviewed and updated, and how the vibration test was successfully completed in a single trial to minimize schedule and technical risks. These lessons learned provide valuable insights, as there are only a limited number of reported cases of 50-kg-class microsatellites deployed from the ISS.
comment: 8 pages, 17 figures, 6 tables. ISTS 33rd, February 26-March 4, 2022
Sensing-Limited Control of Noiseless Linear Systems Under Nonlinear Observations
This paper investigates the fundamental information-theoretic limits for the control and sensing of noiseless linear dynamical systems subject to a broad class of nonlinear observations. We analyze the interactions between the control and sensing components by characterizing the minimum information flow required for stability. Specifically, we derive necessary conditions for mean-square observability and stabilizability, demonstrating that the average directed information rate from the state to the observations must exceed the intrinsic expansion rate of the unstable dynamics. Furthermore, to address the challenges posed by non-Gaussian distributions inherent to nonlinear observation channels, we establish sufficient conditions by imposing regularity assumptions, specifically log-concavity, on the system's probabilistic components. We show that under these conditions, the divergence of differential entropy implies the convergence of the estimation error, thereby closing the gap between information-theoretic bounds and estimation performance. By establishing these results, we unveil the fundamental performance limits imposed by the sensing layer, extending classical data-rate constraints to the more challenging regime of nonlinear observation models.
comment: 5 pages, conference
Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration
Zero-shot Vision-and-Language Navigation (VLN) agents leveraging Large Language Models (LLMs) excel in generalization but suffer from insufficient spatial perception. Focusing on complex continuous environments, we categorize key perceptual bottlenecks into three spatial challenges: door interaction,multi-room navigation, and ambiguous instruction execution, where existing methods consistently suffer high failure rates. We present Spatial-VLN, a perception-guided exploration framework designed to overcome these challenges. The framework consists of two main modules. The Spatial Perception Enhancement (SPE) module integrates panoramic filtering with specialized door and region experts to produce spatially coherent, cross-view consistent perceptual representations. Building on this foundation, our Explored Multi-expert Reasoning (EMR) module uses parallel LLM experts to address waypoint-level semantics and region-level spatial transitions. When discrepancies arise between expert predictions, a query-and-explore mechanism is activated, prompting the agent to actively probe critical areas and resolve perceptual ambiguities. Experiments on VLN-CE demonstrate that Spatial VLN achieves state-of-the-art performance using only low-cost LLMs. Furthermore, to validate real-world applicability, we introduce a value-based waypoint sampling strategy that effectively bridges the Sim2Real gap. Extensive real-world evaluations confirm that our framework delivers superior generalization and robustness in complex environments. Our codes and videos are available at https://yueluhhxx.github.io/Spatial-VLN-web/.
Resource-Conscious RL Algorithms for Deep Brain Stimulation
Deep Brain Stimulation (DBS) has proven to be a promising treatment of Parkinson's Disease (PD). DBS involves stimulating specific regions of the brain's Basal Ganglia (BG) using electric impulses to alleviate symptoms of PD such as tremors, rigidity, and bradykinesia. Although most clinical DBS approaches today use a fixed frequency and amplitude, they suffer from side effects (such as slurring of speech) and shortened battery life of the implant. Reinforcement learning (RL) approaches have been used in recent research to perform DBS in a more adaptive manner to improve overall patient outcome. These RL algorithms are, however, too complex to be trained in vivo due to their long convergence time and requirement of high computational resources. We propose a new Time & Threshold-Triggered Multi-Armed Bandit (T3P MAB) RL approach for DBS that is more effective than existing algorithms. Further, our T3P agent is lightweight enough to be deployed in the implant, unlike current deep-RL strategies, and even forgoes the need for an offline training phase. Additionally, most existing RL approaches have focused on modulating only frequency or amplitude, and the possibility of tuning them together remains greatly unexplored in the literature. Our RL agent can tune both frequency and amplitude of DBS signals to the brain with better sample efficiency and requires minimal time to converge. We implement an MAB agent for DBS for the first time on hardware to report energy measurements and prove its suitability for resource-constrained platforms. Our T3P MAB algorithm is deployed on a variety of microcontroller unit (MCU) setups to show its efficiency in terms of power consumption as opposed to other existing RL approaches used in recent work.
From Noise to Knowledge: System Identification with Systematic Polytope Construction via Cyclic Reformulation
Model-based control requires accurate mathematical models to guarantee control performance and stability. However, obtaining accurate models is challenging due to process and sensor noise. This paper proposes a novel identification algorithm that derives polytopic uncertainty models by interpreting noise-induced parameter fluctuations as intrinsic uncertainty. The method applies cyclic reformulation with period N to linear time-invariant systems, yielding N parameter sets with slight variations that serve as polytope vertices. This enables systematic polytopic model construction from a single identification experiment. Simulation results demonstrate significant improvements: the proposed method achieves higher parameter estimation accuracy and reduces prediction errors by approximately half compared to conventional approaches. The vertex count N provides systematic control over the precision of uncertainty representation.
Closed-loop Uplink Radio Resource Management in CF-O-RAN Empowered 5G Aerial Corridor
In this paper, we investigate the uplink (UL) radio resource management for 5G aerial corridors with an open-radio access network (O-RAN)-enabled cell-free (CF) massive multiple-input multiple-output (mMIMO) system. Our objective is to maximize the minimum spectral efficiency (SE) by jointly optimizing unmanned aerial vehicle (UAV)-open radio unit (O-RU) association and UL transmit power under quality-of-service (QoS) constraints. Owing to its NP-hard nature, the formulated problem is decomposed into two tractable sub-problems solved via alternating optimization (AO) using two computationally efficient algorithms. We then propose (i) a QoS-driven and multi-connectivity-enabled association algorithm incorporating UAV-centric and O-RU-centric criteria with targeted refinement for weak UAVs, and (ii) a bisection-guided fixed-point power control algorithm achieving global optimality with significantly reduced complexity, hosted as xApp at the near-real-time (near-RT) RAN intelligent controller (RIC) of O-RAN. Solving the resource-allocation problem requires global channel state information (CSI), which incurs substantial measurement and signaling overhead. To mitigate this, we leverage a channel knowledge map (CKM) within the O-RAN non-RT RIC to enable efficient environment-aware CSI inference. Simulation results show that the proposed framework achieves up to 440% improvement in minimum SE, 100% QoS satisfaction and fairness, while reducing runtime by up to 99.7% compared to an interior point solver-based power allocation solution, thereby enabling O-RAN compliant real-time deployment.
comment: 6 Pages, Accepted in IEEE ICC 2026
Priority-Based Bandwidth Allocation in Network Slicing-Enabled Cell-Free Massive MIMO Systems
This paper addresses joint admission control and per-user equipment (UE) bandwidth allocation to maximize weighted sum-rate in network slicing-enabled user-centric cell-free (CF) massive multiple-input multiple-output (mMIMO) systems when aggregate quality-of-service (QoS) demand may exceed available bandwidth. Specifically, we optimize bandwidth allocation while satisfying heterogeneous QoS requirements across enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) slices in the uplink. The formulated problem is NP-hard, rendering global optimality computationally intractable. We decompose it into two sub-problems and solve them via computationally efficient heuristics within a sequential framework. We propose (i) a hierarchical admission control scheme that selectively admits UEs under bandwidth scarcity, prioritizing URLLC to ensure latency-sensitive QoS compliance, and (ii) an iterative gradient-based bandwidth allocation scheme that transfers bandwidth across slices guided by marginal utility and reallocates resources within slices. Simulation results demonstrate that the proposed scheme achieves near-optimal performance, deviating from a CVX-based benchmark by at most 2.2% in weighted sum-rate while reducing runtime by 99.7%, thereby enabling practical real-time deployment. Compared to a baseline round-robin scheme without admission control, the proposed approach achieves up to 1085% and 7% higher success rates for eMBB and URLLC slices, respectively, by intentionally sacrificing sum-rate to guarantee QoS. Sensitivity analysis further reveals that the proposed solution adapts effectively to diverse eMBB/URLLC traffic compositions, maintaining 47-51% eMBB and 93-94% URLLC success rates across varying load scenarios, confirming its robustness for resource-constrained large-scale deployments.
comment: 6 Pages, Accepted in IEEE ICC 2026
Network Slicing Resource Management in Uplink User-Centric Cell-Free Massive MIMO Systems
This paper addresses the joint optimization of per-user equipment (UE) bandwidth allocation and UE-access point (AP) association to maximize weighted sum-rate while satisfying heterogeneous quality-of-service (QoS) requirements across enhanced mobile broadband (eMBB) and ultra-reliable low-latency communication (URLLC) slices in the uplink of a network slicing-enabled user-centric cell-free (CF) massive multiple-input multiple-output (mMIMO) system. The formulated problem is NP-hard, rendering global optimality computationally intractable. To address this challenge, it is decomposed into two sub-problems, each solved by a computationally efficient heuristic scheme, and jointly optimized through an alternating optimization framework. We then propose (i) a bandwidth allocation scheme that balances UE priority, spectral efficiency, and minimum bandwidth demand under limited resources to ensure fair QoS distribution, and (ii) a priority-based UE-AP association assignment approach that balances UE service quality with system capacity constraints. Together, these approaches provide a practical and computationally efficient solution for resource-constrained network slicing scenarios, where QoS feasibility is often violated under dense deployments and limited bandwidth, necessitating graceful degradation and fair QoS preservation rather than solely maximizing the aggregate sum-rate. Simulation results demonstrate that the proposed scheme achieves up to 52% higher weighted sum-rate, 140% and 58% higher QoS success rates for eMBB and URLLC slices, respectively, while reducing runtime by up to 97% compared to considered benchmarks.
comment: 6 pages, Accepted in IEEE ICC 2026
Multiagent Reinforcement Learning in Enhancing Resilience of Microgrids under Extreme Weather Events
Grid resilience is crucial in light of power interruptions caused by increasingly frequent extreme weather events. Well-designed energy management systems (EMS) have made progress in improving microgrid resilience through the coordination of distributed energy resources (DERs), but still face significant challenges in addressing the uncertainty of load demand caused by extreme weather. The integration of deep reinforcement learning (DRL) into EMS design enables optimized microgrid control strategies for coordinating DERs. Building on this, we proposed a cooperative multi-agent deep reinforcement learning (MADRL)-based EMS framework to provide flexible scalability for microgrids, enhance resilience and reduce operational costs during power outages. Specifically, the gated recurrent unit with a gating mechanism was introduced to extract features from temporal data, which enables the EMS to coordinate DERs more efficiently. Next, the proposed MADRL method incorporating action masking techniques was evaluated in the IEEE 33-Bus system using real-world data on renewable generation and power load. Finally, the numerical results demonstrated the superiority of the proposed method in reducing operating costs as well as the effectiveness in enhancing microgrid resilience during power interruptions.
comment: 26 pages, 9 figures
Can the Waymo Open Motion Dataset Support Realistic Behavioral Modeling? A Validation Study with Naturalistic Trajectories
The Waymo Open Motion Dataset (WOMD) has become a popular resource for data-driven modeling of autonomous vehicles (AVs) behavior. However, its validity for behavioral analysis remains uncertain due to proprietary post-processing, the absence of error quantification, and the segmentation of trajectories into 20-second clips. This study examines whether WOMD accurately captures the dynamics and interactions observed in real-world AV operations. Leveraging an independently collected naturalistic dataset from Level 4 AV operations in Phoenix, Arizona (PHX), we perform comparative analyses across three representative urban driving scenarios: discharging at signalized intersections, car-following, and lane-changing behaviors. For the discharging analysis, headways are manually extracted from aerial video to ensure negligible measurement error. For the car-following and lane-changing cases, we apply the Simulation-Extrapolation (SIMEX) method to account for empirically estimated error in the PHX data and use Dynamic Time Warping (DTW) distances to quantify behavioral differences. Results across all scenarios consistently show that behavior in PHX falls outside the behavioral envelope of WOMD. Notably, WOMD underrepresents short headways and abrupt decelerations. These findings suggest that behavioral models calibrated solely on WOMD may systematically underestimate the variability, risk, and complexity of naturalistic driving. Caution is therefore warranted when using WOMD for behavior modeling without proper validation against independently collected data.
Invertibility Conditions for the Admittance Matrices of Balanced Power Systems
The admittance matrix encodes the network topology and electrical parameters of a power system in order to relate the current injection and voltage phasors. Since admittance matrices are central to many power engineering analyses, their characteristics are important subjects of theoretical studies. This paper focuses on the key characteristic of \emph{invertibility}. Previous literature has presented an invertibility condition for admittance matrices. This paper first identifies and fixes a technical issue in the proof of this previously presented invertibility condition. This paper then extends this previous work by deriving new conditions that are applicable to a broader class of systems with lossless branches and transformers with off-nominal tap ratios.
comment: 12 pages, 4 figures, 1 table, published in IEEE Transactions on Power Systems
$k$-Inductive and Interpolation-Inspired Barrier Certificates for Stochastic Dynamical Systems
In this paper, we introduce two new types of barrier certificates that are based on multiple functions rather than a single one. A conventional barrier certificate for a stochastic dynamical system is a nonnegative real-valued function whose expected value does not increase as the system evolves. This requirement guarantees that the barrier certificate forms a nonnegative supermartingale and can be used to derive a lower bound on the probability that the system remains safe. A key advantage of such certificates is that they can be automatically searched for using tools such as optimization programs instantiated with a fixed template. When this search is unsuccessful, the common practice is to modify the template and attempt the synthesis again. Drawing inspiration from logical interpolation, we first propose an alternative framework that uses a collection of functions to jointly serve as a barrier certificate. We refer to this construct as an interpolation-inspired barrier certificate. Nonetheless, we observe that these certificates still require one function in the collection to satisfy a supermartingale condition. Motivated by recent work in the literature, we next combine k-induction with interpolation-inspired certificates to relax this supermartingale constraint. We develop a general and more flexible notion of barrier certificates, which we call k-inductive interpolation-inspired barrier certificates. This formulation encompasses multiple ways of integrating interpolation-inspired barrier certificates with k-induction. We highlight two specific instantiations among these possible combinations. For polynomial systems, we employ sum-of-squares (SOS) programming to synthesize the corresponding set of functions. Finally, through our case studies, we show that the proposed methods enable the use of simpler templates and yield tighter lower bounds on the safety probability.
comment: Under review with a journal
A predictive modular approach to constraint satisfaction under uncertainty -- with application to glycosylation in continuous monoclonal antibody biosimilar production
The paper proposes a modular-based approach to constraint handling in process optimization and control. This is partly motivated by the recent interest in learning-based methods, e.g., within bioproduction, for which constraint handling under uncertainty is a challenge. The proposed constraint handler, called predictive filter, is combined with an adaptive constraint margin and a constraint violation cost monitor to minimize the cost of violating soft constraints due to model uncertainty and disturbances. The module can be combined with any controller and is based on minimally modifying the controller output, in a least squares sense, such that constraints are satisfied within the considered horizon. The proposed method is computationally efficient and suitable for real-time applications. The effectiveness of the method is illustrated through a realistic case study of glycosylation constraint satisfaction in continuous monoclonal antibody biosimilar production using Chinese hamster ovary cells, employing a metabolic network model consisting of 23 extracellular metabolites and 126 reactions. In the case study, the average constraint-violation cost is reduced by more than 60% compared to the case without the proposed constraint-handling method.
comment: Published in Journal of Process Control
Gauss-Newton accelerated MPPI Control
Model Predictive Path Integral (MPPI) control is a sampling-based optimization method that has recently attracted attention, particularly in the robotics and reinforcement learning communities. MPPI has been widely applied as a GPU-accelerated random search method to deterministic direct single-shooting optimal control problems arising in model predictive control (MPC) formulations. MPPI offers several key advantages, including flexibility, robustness, ease of implementation, and inherent parallelizability. However, its performance can deteriorate in high-dimensional settings since the optimal control problem is solved via Monte Carlo sampling. To address this limitation, this paper proposes an enhanced MPPI method that incorporates a Jacobian reconstruction technique and the second-order Generalized Gauss-Newton method. This novel approach is called \textit{Gauss-Newton accelerated MPPI}. The numerical results show that the Gauss-Newton accelerated MPPI approach substantially improves MPPI scalability and computational efficiency while preserving the key benefits of the classical MPPI framework, making it a promising approach even for high-dimensional problems.
comment: 6 pages, 3 figures, submitted to the IFAC World Congress 2026
Accurate Small-Signal Modeling of Digitally Controlled Buck Converters with ADC-PWM Synchronization
Digital control has become increasingly widespread in modern power electronic converters. When acquiring feedback signals such as the inductor current, synchronizing the analog-to-digital converter (ADC) with the digital pulse-width modulator (DPWM) is commonly employed to accurately track their steady-state average. However, the small-signal implications of such synchronization have not been investigated. This paper presents an exact small-signal model for digitally controlled buck converters operating in forced continuous-conduction mode (FCCM) under constant-frequency current-mode control, explicitly accounting for DPWM-ADC synchronization. Using a sampled-data framework, the proposed model captures all sideband effects introduced by the sampling process, yielding precise predictions of both analog and digital loop gains, even at frequencies beyond the switching and sampling frequencies. Both asymmetrical and symmetrical carrier modulations are considered. Furthermore, the digital loop gain is derived in closed form using the modified z-transform, enabling low-complexity compensator design and stability assessment. Within this framework, the analog loop gain can be directly obtained from the digital loop gain, thereby eliminating the need for computationally intensive infinite series evaluations. The validity of the proposed model is confirmed through both simulation and experimental results.
On the static and small signal analysis of DAB converter
This document develops a method to solve the periodic operating point of Dual-Active-Bridge (DAB).
Towards a constructive framework for control theory
This work presents a framework for control theory based on constructive analysis to account for discrepancy between mathematical results and their implementation in a computer, also referred to as computational uncertainty. In control engineering, the latter is usually either neglected or considered submerged into some other type of uncertainty, such as system noise, and addressed within robust control. However, even robust control methods may be compromised when the mathematical objects involved in the respective algorithms fail to exist in exact form and subsequently fail to satisfy the required properties. For instance, in general stabilization using a control Lyapunov function, computational uncertainty may distort stability certificates or even destabilize the system despite robustness of the stabilization routine with regards to system, actuator and measurement noise. In fact, battling numerical problems in practical implementation of controllers is common among control engineers. Such observations indicate that computational uncertainty should indeed be addressed explicitly in controller synthesis and system analysis. The major contribution here is a fairly general framework for proof techniques in analysis and synthesis of control systems based on constructive analysis which explicitly states that every computation be doable only up to a finite precision thus accounting for computational uncertainty. A series of previous works is overviewed, including constructive system stability and stabilization, approximate optimal controls, eigenvalue problems, Caratheodory trajectories, measurable selectors. Additionally, a new constructive version of the Danskin's theorem, which is crucial in adversarial defense, is presented.
comment: Published under: https://ieeexplore.ieee.org/document/9419858
Message passing-based inference in an autoregressive active inference agent
We present the design of an autoregressive active inference agent in the form of message passing on a factor graph. Expected free energy is derived and distributed across a planning graph. The proposed agent is validated on a robot navigation task, demonstrating exploration and exploitation in a continuous-valued observation space with bounded continuous-valued actions. Compared to a classical optimal controller, the agent modulates action based on predictive uncertainty, arriving later but with a better model of the robot's dynamics.
comment: 14 pages, 4 figures, proceedings of the International Workshop on Active Inference 2025. Erratum v1: in Eq. (50), $p(y_t, Θ, u_t \mid y_{*}, \mathcal{D}_k)$ should have been $p(y_t, Θ\mid u_t, y_{*}, \mathcal{D}_k)$
Fusion of Quadratic Time-Frequency Analysis and Convolutional Neural Networks to Diagnose Bearing Faults Under Time-Varying Speeds
Diagnosis of bearing faults is paramount to reducing maintenance costs and operational breakdowns. Bearing faults are primary contributors to machine vibrations, and analyzing their signal morphology offers insights into their health status. Unfortunately, existing approaches are optimized for controlled environments, neglecting realistic conditions such as time-varying rotational speeds and the vibration's non-stationary nature. This paper presents a fusion of time-frequency analysis and deep learning techniques to diagnose bearing faults under time-varying speeds and varying noise levels. First, we formulate the bearing fault-induced vibrations and discuss the link between their non-stationarity and the bearing's inherent and operational parameters. We also elucidate quadratic time-frequency distributions and validate their effectiveness in resolving distinctive dynamic patterns associated with different bearing faults. Based on this, we design a time-frequency convolutional neural network (TF-CNN) to diagnose various faults in rolling-element bearings. Our experimental findings undeniably demonstrate the superior performance of TF-CNN in comparison to recently developed techniques. They also assert its versatility in capturing fault-relevant non-stationary features that couple with speed changes and show its exceptional resilience to noise, consistently surpassing competing methods across various signal-to-noise ratios and performance metrics. Altogether, the TF-CNN achieves substantial accuracy improvements up to 15%, in severe noise conditions.
Virtual Holonomic and Nonholonomic Constraints on Lie groups
This paper develops a geometric framework for virtual constraints on Lie groups, with emphasis on mechanical systems modeled as affine connection systems. Virtual holonomic and virtual nonholonomic constraints, including linear and affine nonholonomic constraints, are formulated directly at the level of the Lie algebra and characterized as feedback--invariant manifolds. For each class of constraint, we establish existence and uniqueness conditions for enforcing feedback laws and show that the resulting closed--loop trajectories evolve as the dynamics of mechanical systems endowed with induced constrained connections, generalizing classical holonomic and nonholonomic reductions. Beyond stabilization, the framework enables the systematic generation of low--dimensional motion primitives on Lie groups by enforcing invariant, possibly affine, manifolds and shaping nontrivial dynamical regimes. The approach is illustrated through representative examples, including quadrotor UAVs and a rigid body with an internal rotor, where classical control laws are recovered as special cases and affine constraint--induced motion primitives are obtained.
comment: The title was changed to better describe the content of the revised version of the manuscript
Validation methodology on real data of reversible Kalman Filter for state estimation with Manifold
This work extends a previous study that introduced an algorithm for state estimation on manifolds within the framework of the Kalman filter. Its objective is to address the limitations of the earlier approach. The reversible Kalman filter was designed to provide a methodology for evaluating the accuracy of existing Kalman filter variants with arbitrary precision on synthetic data. It has favorable numerical properties on synthetic data, achieving arbitrary precision without relying on the small-velocity assumption and depending only on sensor noise. However, its application to real data encountered difficulties related to measurement noise, which was mitigated using a heuristic. In particular, the heuristic involved an event detection step switching between reversible Kalman filter and classical Kalman variant at chosen moments. In the present work, we propose a study of this detection step and propose a methodology to prove at which moment the reversible Kalman approach improves on classical multiplicative variant.
Robustness of quantum algorithms: Worst-case fidelity bounds and implications for design
Errors occurring on noisy hardware pose a key challenge to reliable quantum computing. Existing techniques such as error correction, mitigation, or suppression typically separate the error handling from the algorithm analysis and design. In this paper, we develop an alternative, algorithm-centered framework for understanding and improving the robustness against errors. For a given quantum algorithm and error model, we derive worst-case fidelity bounds which can be efficiently computed to certify the robustness. We consider general error models including coherent and (Markovian) incoherent errors and allowing for set-based error descriptions to address uncertainty or time-dependence in the errors. Our results give rise to guidelines for robust algorithm design and compilation by optimizing our theoretical robustness measure. We demonstrate the practicality of the framework with numerical results on algorithm analysis and robust optimization, including the robustness analysis of a 50-qubit modular adder circuit.
Safe Navigation under State Uncertainty: Online Adaptation for Robust Control Barrier Functions
Measurements and state estimates are often imperfect in control practice, posing challenges for safety-critical applications, where safety guarantees rely on accurate state information. In the presence of estimation errors, several prior robust control barrier function (R-CBF) formulations have imposed strict conditions on the input. These methods can be overly conservative and can introduce issues such as infeasibility, high control effort, etc. This work proposes a systematic method to improve R-CBFs, and demonstrates its advantages on a tracked vehicle that navigates among multiple obstacles. A primary contribution is a new optimization-based online parameter adaptation scheme that reduces the conservativeness of existing R-CBFs. In order to reduce the complexity of the parameter optimization, we merge several safety constraints into one unified numerical CBF via Poisson's equation. We further address the dual relative degree issue that typically causes difficulty in vehicle tracking. Experimental trials demonstrate the overall performance improvement of our approach over existing formulations.
An Adaptive Online Smoother with Closed-Form Solutions and Information-Theoretic Lag Selection for Conditional Gaussian Nonlinear Systems
Data assimilation (DA) combines partial observations with dynamical models to improve state estimation. Filter-based DA uses only past and present data and is the prerequisite for real-time forecasts. Smoother-based DA exploits both past and future observations. It aims to fill in missing data, provide more accurate estimations, and develop high-quality datasets. However, the standard smoothing procedure requires using all historical state estimations, which is storage-demanding, especially for high-dimensional systems. This paper develops an adaptive-lag online smoother for a large class of complex dynamical systems with strong nonlinear and non-Gaussian features, which has important applications to many real-world problems. The adaptive lag allows the utilization of observations only within a nearby window, thus reducing computational complexity and storage needs. Online lag adjustment is essential for tackling turbulent systems, where temporal autocorrelation varies significantly over time due to intermittency, extreme events, and nonlinearity. Based on the uncertainty reduction in the estimated state, an information criterion is developed to systematically determine the adaptive lag. Notably, the mathematical structure of these systems facilitates the use of closed analytic formulae to calculate the online smoother and adaptive lag, avoiding empirical tunings as in ensemble-based DA methods. The adaptive online smoother is applied to studying three important scientific problems. First, it helps detect online causal relationships between state variables. Second, the advantage of reduced computational storage expenditure is illustrated via Lagrangian DA, a high-dimensional nonlinear problem. Finally, the adaptive smoother advances online parameter estimation with partial observations, emphasizing the role of the observed extreme events in accelerating convergence.
comment: Latest revision. 46 pages (Main Text pp. 1--28; Appendix pp. 29--40), 9 figures (7 in Main Text, 2 in Appendix). Currently under review in Journal of Nonlinear Science (Springer Nature). Code available upon request. For further details visit https://mariosandreou.short.gy/OnlineSmootherCGNS
Robotics
Enabling High-Curvature Navigation in Eversion Robots through Buckle-Inducing Constrictive Bands
Tip-growing eversion robots are renowned for their ability to access remote spaces through narrow passages. However, achieving reliable navigation remains a significant challenge. Existing solutions often rely on artificial muscles integrated into the robot body or active tip-steering mechanisms. While effective, these additions introduce structural complexity and compromise the defining advantages of eversion robots: their inherent softness and compliance. In this paper, we propose a passive approach to reduce bending stiffness by purposefully introducing buckling points along the robot's outer wall. We achieve this by integrating inextensible diameter-reducing circumferential bands at regular intervals along the robot body facilitating forward motion through tortuous, obstacle cluttered paths. Rather than relying on active steering, our approach leverages the robot's natural interaction with the environment, allowing for smooth, compliant navigation. We present a Cosserat rod-based mathematical model to quantify this behavior, capturing the local stiffness reductions caused by the constricting bands and their impact on global bending mechanics. Experimental results demonstrate that these bands reduce the robot's stiffness when bent at the tip by up to 91 percent, enabling consistent traversal of 180 degree bends with a bending radius of as low as 25 mm-notably lower than the 35 mm achievable by standard eversion robots under identical conditions. The feasibility of the proposed method is further demonstrated through a case study in a colon phantom. By significantly improving maneuverability without sacrificing softness or increasing mechanical complexity, this approach expands the applicability of eversion robots in highly curved pathways, whether in relation to pipe inspection or medical procedures such as colonoscopy.
Language-Based Swarm Perception: Decentralized Person Re-Identification via Natural Language Descriptions
We introduce a method for decentralized person re-identification in robot swarms that leverages natural language as the primary representational modality. Unlike traditional approaches that rely on opaque visual embeddings -- high-dimensional feature vectors extracted from images -- the proposed method uses human-readable language to represent observations. Each robot locally detects and describes individuals using a vision-language model (VLM), producing textual descriptions of appearance instead of feature vectors. These descriptions are compared and clustered across the swarm without centralized coordination, allowing robots to collaboratively group observations of the same individual. Each cluster is distilled into a representative description by a language model, providing an interpretable, concise summary of the swarm's collective perception. This approach enables natural-language querying, enhances transparency, and supports explainable swarm behavior. Preliminary experiments demonstrate competitive performance in identity consistency and interpretability compared to embedding-based methods, despite current limitations in text similarity and computational load. Ongoing work explores refined similarity metrics, semantic navigation, and the extension of language-based perception to environmental elements. This work prioritizes decentralized perception and communication, while active navigation remains an open direction for future study.
KILO-EKF: Koopman-Inspired Learned Observations Extended Kalman Filter
We present the Koopman-Inspired Learned Observations Extended Kalman Filter (KILO-EKF), which combines a standard EKF prediction step with a correction step based on a Koopman-inspired measurement model learned from data. By lifting measurements into a feature space where they are linear in the state, KILO-EKF enables flexible modeling of complex or poorly calibrated sensors while retaining the structure and efficiency of recursive filtering. The resulting linear-Gaussian measurement model is learned in closed form from groundtruth training data, without iterative optimization or reliance on an explicit parametric sensor model. At inference, KILO-EKF performs a standard EKF update using Jacobians obtained via the learned lifting. We validate the approach on a real-world quadrotor localization task using an IMU, ultra-wideband (UWB) sensors, and a downward-facing laser. We compare against multiple EKF baselines with varying levels of sensor calibration. KILO-EKF achieves better accuracy and consistency compared to data-calibrated baselines, and significantly outperforms EKFs that rely on imperfect geometric models, while maintaining real-time inference and fast training. These results demonstrate the effectiveness of Koopman-inspired measurement learning as a scalable alternative to traditional model-based calibration.
comment: Submitted to RA-L. 9 pages, 9 figures, 1 table. Note: version submitted to RA-L did not include the Appendix section present in this arXiv version
ReWorld: Multi-Dimensional Reward Modeling for Embodied World Models
Recently, video-based world models that learn to simulate the dynamics have gained increasing attention in robot learning. However, current approaches primarily emphasize visual generative quality while overlooking physical fidelity, dynamic consistency, and task logic, especially for contact-rich manipulation tasks, which limits their applicability to downstream tasks. To this end, we introduce ReWorld, a framework aimed to employ reinforcement learning to align the video-based embodied world models with physical realism, task completion capability, embodiment plausibility and visual quality. Specifically, we first construct a large-scale (~235K) video preference dataset and employ it to train a hierarchical reward model designed to capture multi-dimensional reward consistent with human preferences. We further propose a practical alignment algorithm that post-trains flow-based world models using this reward through a computationally efficient PPO-style algorithm. Comprehensive experiments and theoretical analysis demonstrate that ReWorld significantly improves the physical fidelity, logical coherence, embodiment and visual quality of generated rollouts, outperforming previous methods.
Learning Diverse Skills for Behavior Models with Mixture of Experts
Imitation learning has demonstrated strong performance in robotic manipulation by learning from large-scale human demonstrations. While existing models excel at single-task learning, it is observed in practical applications that their performance degrades in the multi-task setting, where interference across tasks leads to an averaging effect. To address this issue, we propose to learn diverse skills for behavior models with Mixture of Experts, referred to as Di-BM. Di-BM associates each expert with a distinct observation distribution, enabling experts to specialize in sub-regions of the observation space. Specifically, we employ energy-based models to represent expert-specific observation distributions and jointly train them alongside the corresponding action models. Our approach is plug-and-play and can be seamlessly integrated into standard imitation learning methods. Extensive experiments on multiple real-world robotic manipulation tasks demonstrate that Di-BM significantly outperforms state-of-the-art baselines. Moreover, fine-tuning the pretrained Di-BM on novel tasks exhibits superior data efficiency and the reusable of expert-learned knowledge. Code is available at https://github.com/robotnav-bot/Di-BM.
VR$^2$: A Co-Located Dual-Headset Platform for Touch-Enabled Human-Robot Interaction Research
Touch-rich human-robot interaction (HRI) is difficult to study: building and programming physical robots is costly and slow, while VR-based robot prototypes often remove physical contact or break the tight coupling between an agent's body and the user's felt touch. We present VR2VR, a co-located dual VR-headset platform for HRI research in which a participant and a hidden operator share the same physical space while experiencing different virtual embodiments. The participant sees an expressive virtual robot that interacts face-to-face in a shared virtual environment. In real time, the robot's upper-body gestures, head and gaze behaviors, and facial expressions are mapped from the operator's tracked motion and face signals. Because the operator is physically co-present and calibrated into the same coordinate frame, the operator can also physically touch the participant, enabling the participant to perceive robot touch aligned with the robot's hands; finger and hand motion are mapped to the robot using inverse kinematics to support precise contact. Beyond faithful motion retargeting for limb teleoperation, our VR2VR system supports experimental control by retargeting or selectively enabling nonverbal channels (e.g., head only vs. head+eyes vs. head+eyes+facial expressions) while keeping physical interaction constant. We detail the system design, calibration workflow, and safety considerations, and demonstrate the platform through a touch-based Wizard-of-Oz HRI study, illustrating how VR2VR lowers barriers for rapidly prototyping and rigorously evaluating embodied, touch-centric robot behaviors.
comment: 7 pages, 4 figures
R-VoxelMap: Accurate Voxel Mapping with Recursive Plane Fitting for Online LiDAR Odometry
This paper proposes R-VoxelMap, a novel voxel mapping method that constructs accurate voxel maps using a geometry-driven recursive plane fitting strategy to enhance the localization accuracy of online LiDAR odometry. VoxelMap and its variants typically fit and check planes using all points in a voxel, which may lead to plane parameter deviation caused by outliers, over segmentation of large planes, and incorrect merging across different physical planes. To address these issues, R-VoxelMap utilizes a geometry-driven recursive construction strategy based on an outlier detect-and-reuse pipeline. Specifically, for each voxel, accurate planes are first fitted while separating outliers using random sample consensus (RANSAC). The remaining outliers are then propagated to deeper octree levels for recursive processing, ensuring a detailed representation of the environment. In addition, a point distribution-based validity check algorithm is devised to prevent erroneous plane merging. Extensive experiments on diverse open-source LiDAR(-inertial) simultaneous localization and mapping (SLAM) datasets validate that our method achieves higher accuracy than other state-of-the-art approaches, with comparable efficiency and memory usage. Code will be available on GitHub.
CD-TWINSAFE: A ROS-enabled Digital Twin for Scene Understanding and Safety Emerging V2I Technology
In this paper, the CD-TWINSAFE is introduced, a V2I-based digital twin for Autonomous Vehicles. The proposed architecture is composed of two stacks running simultaneously, an on-board driving stack that includes a stereo camera for scene understanding, and a digital twin stack that runs an Unreal Engine 5 replica of the scene viewed by the camera as well as returning safety alerts to the cockpit. The on-board stack is implemented on the vehicle side including 2 main autonomous modules; localization and perception. The position and orientation of the ego vehicle are obtained using on-board sensors. Furthermore, the perception module is responsible for processing 20-fps images from stereo camera and understands the scene through two complementary pipelines. The pipeline are working on object detection and feature extraction including object velocity, yaw and the safety metrics time-to-collision and time-headway. The collected data form the driving stack are sent to the infrastructure side through the ROS-enabled architecture in the form of custom ROS2 messages and sent over UDP links that ride a 4G modem for V2I communication. The environment is monitored via the digital twin through the shared messages which update the information of the spawned ego vehicle and detected objects based on the real-time localization and perception data. Several tests with different driving scenarios to confirm the validity and real-time response of the proposed architecture.
User-to-Vehicle Interaction in Smart Mobility: The GO-DRiVeS Autonomous Ride-Sharing Application
This paper introduces the GO-DRiVeS application, an on demand ride sharing and requesting mobile application tailored specifically to save long walks and challenges which are time consuming and tiring especially during hot days or when carrying heavy items, faced by university students and staff. The GO-DRiVeS application was developed following the Agile methodology for its flexibility. In addition to, using the mobile application system architecture and client-server architecture. GO-DRiVeS was implemented using React Native (Expo) for the frontend, Node.js and Express for the backend, and MongoDB as the database; based on a detailed analyses to the existing transportation application, comparing their frameworks and identifying their essential functionalities. GO-DRiVeS supports core features like user registration, ride requesting and real-time tracking.In addition to handling multiple requests at the same time in a first come first serve manner. The application was developed based on these features, and the results were conducted in the form of multiple experiments that demonstrated stable behavior in handling the requests, as presented in the Methodology and Results chapters.
From Prompts to Pavement: LMMs-based Agentic Behavior-Tree Generation Framework for Autonomous Vehicles
Autonomous vehicles (AVs) require adaptive behavior planners to navigate unpredictable, real-world environments safely. Traditional behavior trees (BTs) offer structured decision logic but are inherently static and demand labor-intensive manual tuning, limiting their applicability at SAE Level 5 autonomy. This paper presents an agentic framework that leverages large language models (LLMs) and multi-modal vision models (LVMs) to generate and adapt BTs on the fly. A specialized Descriptor agent applies chain-of-symbols prompting to assess scene criticality, a Planner agent constructs high-level sub-goals via in-context learning, and a Generator agent synthesizes executable BT sub-trees in XML format. Integrated into a CARLA+Nav2 simulation, our system triggers only upon baseline BT failure, demonstrating successful navigation around unexpected obstacles (e.g., street blockage) with no human intervention. Compared to a static BT baseline, this approach is a proof-of-concept that extends to diverse driving scenarios.
From Shallow Waters to Mariana Trench: A Survey of Bio-inspired Underwater Soft Robots
Sample Exploring the ocean environment holds profound significance in areas such as resource exploration and ecological protection. Underwater robots struggle with extreme water pressure and often cause noise and damage to the underwater ecosystem, while bio-inspired soft robots draw inspiration from aquatic creatures to address these challenges. These bio-inspired approaches enable robots to withstand high water pressure, minimize drag, operate with efficient manipulation and sensing systems, and interact with the environment in an eco-friendly manner. Consequently, bio-inspired soft robots have emerged as a promising field for ocean exploration. This paper reviews recent advancements in underwater bio-inspired soft robots, analyses their design considerations when facing different desired functions, bio-inspirations, ambient pressure, temperature, light, and biodiversity , and finally explores the progression from bio-inspired principles to practical applications in the field and suggests potential directions for developing the next generation of underwater soft robots.
comment: Provisional accepted by Bioinspiration & Biomimetics
OpenNavMap: Structure-Free Topometric Mapping via Large-Scale Collaborative Localization
Scalable and maintainable map representations are fundamental to enabling large-scale visual navigation and facilitating the deployment of robots in real-world environments. While collaborative localization across multi-session mapping enhances efficiency, traditional structure-based methods struggle with high maintenance costs and fail in feature-less environments or under significant viewpoint changes typical of crowd-sourced data. To address this, we propose OPENNAVMAP, a lightweight, structure-free topometric system leveraging 3D geometric foundation models for on-demand reconstruction. Our method unifies dynamic programming-based sequence matching, geometric verification, and confidence-calibrated optimization to robust, coarse-to-fine submap alignment without requiring pre-built 3D models. Evaluations on the Map-Free benchmark demonstrate superior accuracy over structure-from-motion and regression baselines, achieving an average translation error of 0.62m. Furthermore, the system maintains global consistency across 15km of multi-session data with an absolute trajectory error below 3m for map merging. Finally, we validate practical utility through 12 successful autonomous image-goal navigation tasks on simulated and physical robots. Code and datasets will be publicly available in https://rpl-cs-ucl.github.io/OpenNavMap_page.
comment: 21 pages, 20 figures
An Efficient and Multi-Modal Navigation System with One-Step World Model
Navigation is a fundamental capability for mobile robots. While the current trend is to use learning-based approaches to replace traditional geometry-based methods, existing end-to-end learning-based policies often struggle with 3D spatial reasoning and lack a comprehensive understanding of physical world dynamics. Integrating world models-which predict future observations conditioned on given actions-with iterative optimization planning offers a promising solution due to their capacity for imagination and flexibility. However, current navigation world models, typically built on pure transformer architectures, often rely on multi-step diffusion processes and autoregressive frame-by-frame generation. These mechanisms result in prohibitive computational latency, rendering real-time deployment impossible. To address this bottleneck, we propose a lightweight navigation world model that adopts a one-step generation paradigm and a 3D U-Net backbone equipped with efficient spatial-temporal attention. This design drastically reduces inference latency, enabling high-frequency control while achieving superior predictive performance. We also integrate this model into an optimization-based planning framework utilizing anchor-based initialization to handle multi-modal goal navigation tasks. Extensive closed-loop experiments in both simulation and real-world environments demonstrate our system's superior efficiency and robustness compared to state-of-the-art baselines.
A Comprehensive Review of Bio-Inspired Approaches to Coordination, Communication, and System Architecture in Underwater Swarm Robotics
The increasing complexity of marine operations has intensified the need for intelligent robotic systems to support ocean observation, exploration, and resource management. Underwater swarm robotics offers a promising framework that extends the capabilities of individual autonomous platforms through collective coordination. Inspired by natural systems, such as fish schools and insect colonies, bio-inspired swarm approaches enable distributed decision-making, adaptability, and resilience under challenging marine conditions. Yet research in this field remains fragmented, with limited integration across algorithmic, communication, and hardware design perspectives. This review synthesises bio-inspired coordination mechanisms, communication strategies, and system design considerations for underwater swarm robotics. It examines key marine-specific algorithms, including the Artificial Fish Swarm Algorithm, Whale Optimisation Algorithm, Coral Reef Optimisation, and Marine Predators Algorithm, highlighting their applications in formation control, task allocation, and environmental interaction. The review also analyses communication constraints unique to the underwater domain and emerging acoustic, optical, and hybrid solutions that support cooperative operation. Additionally, it examines hardware and system design advances that enhance system efficiency and scalability. A multi-dimensional classification framework evaluates existing approaches across communication dependency, environmental adaptability, energy efficiency, and swarm scalability. Through this integrated analysis, the review unifies bio-inspired coordination algorithms, communication modalities, and system design approaches. It also identifies converging trends, key challenges, and future research directions for real-world deployment of underwater swarm systems.
comment: Published as part of the Special Issue: Wide Application of Marine Robotic Systems, in the Journal of Marine Science and Engineering
EmoBipedNav: Emotion-aware Social Navigation for Bipedal Robots with Deep Reinforcement Learning
This study presents an emotion-aware navigation framework -- EmoBipedNav -- using deep reinforcement learning (DRL) for bipedal robots walking in socially interactive environments. The inherent locomotion constraints of bipedal robots challenge their safe maneuvering capabilities in dynamic environments. When combined with the intricacies of social environments, including pedestrian interactions and social cues, such as emotions, these challenges become even more pronounced. To address these coupled problems, we propose a two-stage pipeline that considers both bipedal locomotion constraints and complex social environments. Specifically, social navigation scenarios are represented using sequential LiDAR grid maps (LGMs), from which we extract latent features, including collision regions, emotion-related discomfort zones, social interactions, and the spatio-temporal dynamics of evolving environments. The extracted features are directly mapped to the actions of reduced-order models (ROMs) through a DRL architecture. Furthermore, the proposed framework incorporates full-order dynamics and locomotion constraints during training, effectively accounting for tracking errors and restrictions of the locomotion controller while planning the trajectory with ROMs. Comprehensive experiments demonstrate that our approach exceeds both model-based planners and DRL-based baselines. The hardware videos and open-source code are available at https://gatech-lidar.github.io/emobipednav.github.io/.
comment: 13 pages
Knot So Simple: A Minimalistic Environment for Spatial Reasoning
We propose KnotGym, an interactive environment for complex, spatial reasoning and manipulation. KnotGym includes goal-oriented rope manipulation tasks with varying levels of complexity, all requiring acting from pure image observations. Tasks are defined along a clear and quantifiable axis of complexity based on the number of knot crossings, creating a natural generalization test. KnotGym has a simple observation space, allowing for scalable development, yet it highlights core challenges in integrating acute perception, spatial reasoning, and grounded manipulation. We evaluate methods of different classes, including model-based RL, model-predictive control, and chain-of-thought reasoning, and illustrate the challenges KnotGym presents. KnotGym is available at https://github.com/lil-lab/knotgym.
comment: Fix camera ready footer
MimicKit: A Reinforcement Learning Framework for Motion Imitation and Control
MimicKit is an open-source framework for training motion controllers using motion imitation and reinforcement learning. The codebase provides implementations of commonly-used motion-imitation techniques and RL algorithms. This framework is intended to support research and applications in computer graphics and robotics by providing a unified training framework, along with standardized environment, agent, and data structures. The codebase is designed to be modular and easily configurable, enabling convenient modification and extension to new characters and tasks. The open-source codebase is available at: https://github.com/xbpeng/MimicKit.
Learning with pyCub: A Simulation and Exercise Framework for Humanoid Robotics
We present pyCub, an open-source physics-based simulation of the humanoid robot iCub, along with exercises to teach students the basics of humanoid robotics. Compared to existing iCub simulators (iCub SIM, iCub Gazebo), which require C++ code and YARP as middleware, pyCub works without YARP and with Python code. The complete robot with all articulations has been simulated, with two cameras in the eyes and the unique sensitive skin of the iCub comprising 4000 receptors on its body surface. The exercises range from basic control of the robot in velocity, joint, and Cartesian space to more complex tasks like gazing, grasping, or reactive control. The whole framework is written and controlled with Python, thus allowing to be used even by people with small or almost no programming practice. The exercises can be scaled to different difficulty levels. We tested the framework in two runs of a course on humanoid robotics. The simulation, exercises, documentation, Docker images, and example videos are publicly available at https://rustlluk.github.io/pyCub.
comment: Submitted for RiE 2026
Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making
One mistake by an AI system in a safety-critical setting can cost lives. As Large Language Models (LLMs) become integral to robotics decision-making, the physical dimension of risk grows; a single wrong instruction can directly endanger human safety. This paper addresses the urgent need to systematically evaluate LLM performance in scenarios where even minor errors are catastrophic. Through a qualitative evaluation of a fire evacuation scenario, we identified critical failure cases in LLM-based decision-making. Based on these, we designed seven tasks for quantitative assessment, categorized into: Complete Information, Incomplete Information, and Safety-Oriented Spatial Reasoning (SOSR). Complete information tasks utilize ASCII maps to minimize interpretation ambiguity and isolate spatial reasoning from visual processing. Incomplete information tasks require models to infer missing context, testing for spatial continuity versus hallucinations. SOSR tasks use natural language to evaluate safe decision-making in life-threatening contexts. We benchmark various LLMs and Vision-Language Models (VLMs) across these tasks. Beyond aggregate performance, we analyze the implications of a 1% failure rate, highlighting how "rare" errors escalate into catastrophic outcomes. Results reveal serious vulnerabilities: several models achieved a 0% success rate in ASCII navigation, while in a simulated fire drill, models instructed robots to move toward hazardous areas instead of emergency exits. Our findings lead to a sobering conclusion: current LLMs are not ready for direct deployment in safety-critical systems. A 99% accuracy rate is dangerously misleading in robotics, as it implies one out of every hundred executions could result in catastrophic harm. We demonstrate that even state-of-the-art models cannot guarantee safety, and absolute reliance on them creates unacceptable risks.
comment: Corrected author order in metadata; manuscript unchanged
Floor Plan-Guided Visual Navigation Incorporating Depth and Directional Cues
Current visual navigation strategies mainly follow an exploration-first and then goal-directed navigation paradigm. This exploratory phase inevitably compromises the overall efficiency of navigation. Recent studies propose leveraging floor plans alongside RGB inputs to guide agents, aiming for rapid navigation without prior exploration or mapping. Key issues persist despite early successes. The modal gap and content misalignment between floor plans and RGB images necessitate an efficient approach to extract the most salient and complementary features from both for reliable navigation. Here, we propose GlocDiff, a novel framework that employs a diffusion-based policy to continuously predict future waypoints. This policy is conditioned on two complementary information streams: (1) local depth cues derived from the current RGB observation, and (2) global directional guidance extracted from the floor plan. The former handles immediate navigation safety by capturing surrounding geometry, while the latter ensures goal-directed efficiency by offering definitive directional cues. Extensive evaluations on the FloNa benchmark demonstrate that GlocDiff achieves superior efficiency and effectiveness. Furthermore, its successful deployment in real-world scenarios underscores its strong potential for broad practical application.
Multiagent Systems
Semantic Fusion: Verifiable Alignment in Decentralized Multi-Agent Systems
We present Semantic Fusion (SF), a formal framework for decentralized semantic coordination in multi-agent systems. SF allows agents to operate over scoped views of shared memory, propose structured updates, and maintain global coherence through local ontology-based validation and refresh without centralized control or explicit message passing. The central theoretical result is a bisimulation theorem showing that each agent's local execution is behaviorally equivalent to its projection of the global semantics, in both deterministic and probabilistic settings. This enables safety, liveness, and temporal properties to be verified locally and soundly lifted to the full system. SF supports agents whose update proposals vary across invocations, including those generated by learned or heuristic components, provided updates pass semantic validation before integration. We establish deterministic and probabilistic guarantees ensuring semantic alignment under asynchronous or degraded communication. To validate the model operationally, we implement a lightweight reference architecture that instantiates its core mechanisms. A 250-agent simulation evaluates these properties across over 11,000 validated updates, demonstrating convergence under probabilistic refresh, bounded communication, and resilience to agent failure. Together, these results show that Semantic Fusion can provide a formal and scalable basis for verifiable autonomy in decentralized systems.
comment: 29 pages
Agentic Artificial Intelligence (AI): Architectures, Taxonomies, and Evaluation of Large Language Model Agents
Artificial Intelligence is moving from models that only generate text to Agentic AI, where systems behave as autonomous entities that can perceive, reason, plan, and act. Large Language Models (LLMs) are no longer used only as passive knowledge engines but as cognitive controllers that combine memory, tool use, and feedback from their environment to pursue extended goals. This shift already supports the automation of complex workflows in software engineering, scientific discovery, and web navigation, yet the variety of emerging designs, from simple single loop agents to hierarchical multi agent systems, makes the landscape hard to navigate. In this paper, we investigate architectures and propose a unified taxonomy that breaks agents into Perception, Brain, Planning, Action, Tool Use, and Collaboration. We use this lens to describe the move from linear reasoning procedures to native inference time reasoning models, and the transition from fixed API calls to open standards like the Model Context Protocol (MCP) and Native Computer Use. We also group the environments in which these agents operate, including digital operating systems, embodied robotics, and other specialized domains, and we review current evaluation practices. Finally, we highlight open challenges, such as hallucination in action, infinite loops, and prompt injection, and outline future research directions toward more robust and reliable autonomous systems.
comment: 28 pages, 4 figures, 5 tables
Improved Bug Localization with AI Agents Leveraging Hypothesis and Dynamic Cognition
Software bugs cost technology providers (e.g., AT&T) billions annually and cause developers to spend roughly 50% of their time on bug resolution. Traditional methods for bug localization often analyze the suspiciousness of code components (e.g., methods, documents) in isolation, overlooking their connections with other components in the codebase. Recent advances in Large Language Models (LLMs) and agentic AI techniques have shown strong potential for code understanding, but still lack causal reasoning during code exploration and struggle to manage growing context effectively, limiting their capability. In this paper, we present a novel agentic technique for bug localization -- CogniGent -- that overcomes the limitations above by leveraging multiple AI agents capable of causal reasoning, call-graph-based root cause analysis and context engineering. It emulates developers-inspired debugging practices (a.k.a., dynamic cognitive debugging) and conducts hypothesis testing to support bug localization. We evaluate CogniGent on a curated dataset of 591 bug reports using three widely adopted performance metrics and compare it against six established baselines from the literature. Experimental results show that our technique consistently outperformed existing traditional and LLM-based techniques, achieving MAP improvements of 23.33-38.57% at the document and method levels. Similar gains were observed in MRR, with increases of 25.14-53.74% at both granularity levels. Statistical significance tests also confirm the superiority of our technique. By addressing the reasoning, dependency, and context limitations, CogniGent advances the state of bug localization, bridging human-like cognition with agentic automation for improved performance.
comment: 13 pages, 7 tables, 5 figures
Generative AI Agents for Controllable and Protected Content Creation NeurIPS 2025
The proliferation of generative AI has transformed creative workflows, yet current systems face critical challenges in controllability and content protection. We propose a novel multi-agent framework that addresses both limitations through specialized agent roles and integrated watermarking mechanisms. Unlike existing multi-agent systems focused solely on generation quality, our approach uniquely combines controllable content synthesis with provenance protection during the generation process itself. The framework orchestrates Director/Planner, Generator, Reviewer, Integration, and Protection agents with human-in-the-loop feedback to ensure alignment with user intent while embedding imperceptible digital watermarks. We formalize the pipeline as a joint optimization objective unifying controllability, semantic alignment, and protection robustness. This work contributes to responsible generative AI by positioning multi-agent architectures as a solution for trustworthy creative workflows with built-in ownership tracking and content traceability.
comment: Accepted GenProCC NeurIPS 2025, Paper # 33
Rethinking the Value of Multi-Agent Workflow: A Strong Single Agent Baseline
Recent advances in LLM-based multi-agent systems (MAS) show that workflows composed of multiple LLM agents with distinct roles, tools, and communication patterns can outperform single-LLM baselines on complex tasks. However, most frameworks are homogeneous, where all agents share the same base LLM and differ only in prompts, tools, and positions in the workflow. This raises the question of whether such workflows can be simulated by a single agent through multi-turn conversations. We investigate this across seven benchmarks spanning coding, mathematics, general question answering, domain-specific reasoning, and real-world planning and tool use. Our results show that a single agent can reach the performance of homogeneous workflows with an efficiency advantage from KV cache reuse, and can even match the performance of an automatically optimized heterogeneous workflow. Building on this finding, we propose \textbf{OneFlow}, an algorithm that automatically tailors workflows for single-agent execution, reducing inference costs compared to existing automatic multi-agent design frameworks without trading off accuracy. These results position the single-LLM implementation of multi-agent workflows as a strong baseline for MAS research. We also note that single-LLM methods cannot capture heterogeneous workflows due to the lack of KV cache sharing across different LLMs, highlighting future opportunities in developing \textit{truly} heterogeneous multi-agent systems.
Benchmarking AI scientists for omics data driven biological discovery
Recent advances in large language models have enabled the emergence of AI scientists that aim to autonomously analyze biological data and assist scientific discovery. Despite rapid progress, it remains unclear to what extent these systems can extract meaningful biological insights from real experimental data. Existing benchmarks either evaluate reasoning in the absence of data or focus on predefined analytical outputs, failing to reflect realistic, data-driven biological research. Here, we introduce BAISBench (Biological AI Scientist Benchmark), a benchmark for evaluating AI scientists on real single-cell transcriptomic datasets. BAISBench comprises two tasks: cell type annotation across 15 expert-labeled datasets, and scientific discovery through 193 multiple-choice questions derived from biological conclusions reported in 41 published single-cell studies. We evaluated several representative AI scientists using BAISBench and, to provide a human performance baseline, invited six graduate-level bioinformaticians to collectively complete the same tasks. The results show that while current AI scientists fall short of fully autonomous biological discovery, they already demonstrate substantial potential in supporting data-driven biological research. These results position BAISBench as a practical benchmark for characterizing the current capabilities and limitations of AI scientists in biological research. We expect BAISBench to serve as a practical evaluation framework for guiding the development of more capable AI scientists and for helping biologists identify AI systems that can effectively support real-world research workflows. The BAISBench can be found at: https://github.com/EperLuo/BAISBench, https://huggingface.co/datasets/EperLuo/BaisBench.
Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models
Empathetic response generation is a crucial task for creating more human-like and supportive conversational agents. However, existing methods face a core trade-off between the analytical depth of specialized models and the generative fluency of Large Language Models (LLMs). To address this, we propose TRACE, Task-decomposed Reasoning for Affective Communication and Empathy, a novel framework that models empathy as a structured cognitive process by decomposing the task into a pipeline for analysis and synthesis. By building a comprehensive understanding before generation, TRACE unites deep analysis with expressive generation. Experimental results show that our framework significantly outperforms strong baselines in both automatic and LLM-based evaluations, confirming that our structured decomposition is a promising paradigm for creating more capable and interpretable empathetic agents. Our code is available at https://anonymous.4open.science/r/TRACE-18EF/README.md.
Systems and Control (EESS)
Resilient Interval Observer-Based Control for Cooperative Adaptive Cruise Control under FDI Attack
Connectivity in connected and autonomous vehicles (CAVs) introduces vulnerability to cyber threats such as false data injection (FDI) attacks, which can compromise system reliability and safety. To ensure resilience, this paper proposes a control framework combining a nonlinear controller with an interval observer for robust state estimation under measurement noise. The observer bounds leader's states, while a neural network-based estimator estimates the unknown FDI attacks in real time. These estimates are then used to mitigate FDI attack effects maintaining safe inter-vehicle spacing. The proposed approach leverages an idea of interval observer-based estimation and merges model-based and learning-based methods to achieve accurate estimations and real-time performance. MATLAB/Simulink results confirm resilient tracking, precise FDI attack estimation, and robustness to noise, demonstrating potential for real-world CACC applications under cyberattacks, disturbance, and bounded measurement noise.
Allocating Corrective Control to Mitigate Multi-agent Safety Violations Under Private Preferences
We propose a novel framework that computes the corrective control efforts to ensure joint safety in multi-agent dynamical systems. This framework efficiently distributes the required corrective effort without revealing individual agents' private preferences. Our framework integrates high-order control barrier functions (HOCBFs), which enforce safety constraints with formal guarantees of safety for complex dynamical systems, with a privacy-preserving resource allocation mechanism based on the progressive second price (PSP) auction. When a joint safety constraint is violated, agents iteratively bid on new corrective efforts via 'avoidance credits' rather than explicitly solving for feasible corrective efforts that remove the safety violation. The resulting correction, determined via a second price payment rule, coincides with the socially optimal safe distribution of corrective actions. Critically, the bidding process achieves this optimal allocation efficiently and without revealing private preferences of individual agents. We demonstrate this method through multi-robot hardware experiments on the Robotarium platform.
comment: 8 pages, 3 figures, Submitted to IEEE Robotics and Automation Letters (RA-L)
HERMES: A Unified Open-Source Framework for Realtime Multimodal Physiological Sensing, Edge AI, and Intervention in Closed-Loop Smart Healthcare Applications
Intelligent assistive technologies are increasingly recognized as critical daily-use enablers for people with disabilities and age-related functional decline. Longitudinal studies, curation of quality datasets, live monitoring in activities of daily living, and intelligent intervention devices, share the largely unsolved need in reliable high-throughput multimodal sensing and processing. Streaming large heterogeneous data from distributed sensors, historically closed-source environments, and limited prior works on realtime closed-loop AI methodologies, inhibit such applications. To accelerate the emergence of clinical deployments, we deliver HERMES - an open-source high-performance Python framework for continuous multimodal sensing and AI processing at the edge. It enables synchronized data collection, and realtime streaming inference with user PyTorch models, on commodity computing devices. HERMES is applicable to fixed-lab and free-living environments, of distributed commercial and custom sensors. It is the first work to offer a holistic methodology that bridges cross-disciplinary gaps in real-world implementation strategies, and guides downstream AI model development. Its application on the closed-loop intelligent prosthesis use case illustrates the process of suitable AI model development from the generated constraints and trade-offs. Validation on the use case, with 4 synchronized hosts cooperatively capturing 18 wearable and off-body modalities, demonstrates performance and relevance of HERMES to the trajectory of the intelligent healthcare domain.
comment: Submitted to ACM SenSys '26, 12 pages (excl. references), 9 figures
Automated Angular Received-Power Characterization of Embedded mmWave Transmitters Using Geometry-Calibrated Spatial Sampling
This paper presents an automated measurement methodology for angular received-power characterization of embedded millimeter-wave transmitters using geometry-calibrated spatial sampling. Characterization of integrated mmWave transmitters remains challenging due to limited angular coverage and alignment variability in conventional probe-station techniques, as well as the impracticality of anechoic-chamber testing for platform-mounted active modules. To address these challenges, we introduce RAPTAR, an autonomous measurement system for angular received-power acquisition under realistic installation constraints. A collaborative robot executes geometry-calibrated, collision-aware hemispherical trajectories while carrying a calibrated receive probe, enabling controlled and repeatable spatial positioning around a fixed device under test. A spectrum-analyzer-based receiver chain acquires amplitude-only received power as a function of angle and distance following quasi-static pose stabilization. The proposed framework enables repeatable angular received-power mapping and power-domain comparison against idealized free-space references derived from full-wave simulation. Experimental results for a 60-GHz radar module demonstrate a mean absolute received-power error below 2 dB relative to simulation-derived references and a 36.5 % reduction in error compared to manual probe-station measurements, attributed primarily to reduced alignment variability and consistent spatial sampling. The proposed method eliminates the need for coherent field measurements and near-field transformations, enabling practical power-domain characterization of embedded mmWave modules. It is well suited for angular validation in real-world platforms where conventional anechoic measurements are impractical.
An Experimental Comparison of Sliding Mode and Immersion and Invariance Adaptive Controllers forPosition-feedback Tracking of a Simple Mechanical System with Friction
The purpose of this paper is to illustrate, in an experimental facility consisting of a simple pendular device, the performance of a sliding mode adaptive position-feedback tracking controller of mechanical systems with friction reported in the literature. To put this experimental evidence in perspective, we compare the performance of the sliding mode scheme with the one obtained by an adaptive controller designed following the well-known immersion and invariance technique.
Time-Continuous Modeling for Temporal Affective Pattern Recognition in LLMs
This paper introduces a dataset and conceptual framework for LLMs to mimic real world emotional dynamics through time and in-context learning leveraging physics-informed neural network, opening a possibility for interpretable dialogue modeling.
Worst-case Nonlinear Regression with Error Bounds
This paper proposes an active-learning approach to worst-case nonlinear regression with deterministic error guarantees. Given a known nonlinear function defined over a compact set, we compute a surrogate model, such as a feedforward neural network, by minimizing the maximum absolute approximation error. To address the nonsmooth nature of the resulting minimax problem, we introduce a smooth approximation of the $L_\infty$-type loss that enables efficient gradient-based training. We iteratively enrich the training set by actively learning points of largest approximation error through global optimization. The resulting models admit certified worst-case error bounds, either constant or input-dependent, over the entire input domain. The approach is demonstrated through approximations of nonlinear functions and nonconvex sets, as well as through the derivation of uncertain models of more complex nonlinear dynamics within a given model class, and the approximation of explicit model predictive control laws.
comment: 23 pages, 8 figures
Analyzing the Impact of EV Battery Charging on the Distribution Network
Many countries are rapidly adopting electric vehicles (EVs) due to their meager running cost and environment-friendly nature. EVs are likely to dominate the internal combustion (IC) engine cars entirely over the next few years. With the rise in popularity of EVs, adverse effects of EV charging loads on the grid system have been observed. Since the distribution system (DS) does not cope with the high overloading capacity, the negative impact of EV charging load on the distribution network (DN) cannot be neglected. A high level of EV penetration with uncoordinated charging is the primary cause of voltage instability, increased peak load demand, and reliability issues of the DN. In this paper, a detailed overview of all the notable impacts of EV charging on voltage profile, power quality, and DS performance is discussed. This work also reviews the different topologies of EV chargers and the issues introduced by power converters on the utility grid. Finally, the strategies for improving the charging of EVs proposed in the literature to consider the random nature of EVs charging, the management of peak loads, and bidirectional power flow are summarized.
comment: 6 pages, 5 figures
Solvability of The Output Corridor Control Problem by Pulse-Modulated Feedback
The problem of maintaining the output of a positive time-invariant single-input single-output system within a predefined corridor of values is treated. For third-order plants possessing a certain structure, it is proven that the problem is always solvable under stationary conditions by means of pulse-modulated feedback. The obtained result is utilized to assess the feasibility of patient-specific pharmacokinetic-pharmacodynamic models with respect to patient safety. A population of Wiener models capturing the dynamics of a neuromuscular blockade agent is studied to investigate whether or not they can be driven into the desired output corridor by clinically acceptable sequential drug doses (boluses). It is demonstrated that low values of a parameter in the nonlinear pharmacodynamic part lie behind the detected model infeasibility.
The value of hedging against energy storage uncertainties when designing energy parks
Energy storage is needed to match renewable generation to industrial loads in energy parks. However, the future performance of bulk storage technologies is currently highly uncertain. Due to the urgency of decarbonization targets, energy park projects must be designed and begun now. But, as uncertainty in storage performance reduces, a different technology than identified during initial design may turn out cheaper. Enabling flexibility so that design adaptations can be made as better information becomes available would lower the cost of decarbonizing industry. But having this flexibility is itself costly. This raises the question, "Is it worth it?" This study quantifies the benefit of retaining flexibility to adapt energy park designs and optionality over storage technology choice as uncertainty reduces, to determine whether it is economically worthwhile. It applies the Value of Information analysis framework to the sizing of wind, solar, and storage in an illustrative energy park model based on a real-world proposal near Rotterdam, considering uncertainty in storage efficiency, lifetime, and capital cost. Updating asset sizings after storage uncertainty reduced is found to reduce total costs by 18% on average. Having the option to switch storage technology choice as well reduces costs by a further 13%, which is substantially greater than the cost of providing storage optionality. Using two storage technologies in the energy park reduces costs by 14%, and in this case storage optionality is not worthwhile. These results are robust to the level of uncertainty reduction in storage performance, and the risk aversion of the system designer.
comment: 35 pages, 10 figures, 10 tables
A Note on Comparator-Overdrive-Delay Conditioning for Current-Mode Control
Comparator-overdrive-delay conditioning is a new control conditioning approach for high-frequency current-mode control. No existing literature rigorously studies the effect of the comparator overdrive delay on the current-mode control. The results in this paper provide insights into the mechanism of comparator-overdrive-delay conditioning.
comment: Add extra case studies
Dynamic Hybrid Modeling: Incremental Identification and Model Predictive Control
Mathematical models are crucial for optimizing and controlling chemical processes, yet they often face significant limitations in terms of computational time, algorithm complexity, and development costs. Hybrid models, which combine mechanistic models with data-driven models (i.e. models derived via the application of machine learning to experimental data), have emerged as a promising solution to these challenges. However, the identification of dynamic hybrid models remains difficult due to the need to integrate data-driven models within mechanistic model structures. We present an incremental identification approach for dynamic hybrid models that decouples the mechanistic and data-driven components to overcome computational and conceptual difficulties. Our methodology comprises four key steps: (1) regularized dynamic parameter estimation to determine optimal time profiles for flux variables, (2) correlation analysis to evaluate relationships between variables, (3) data-driven model identification using advanced machine learning techniques, and (4) hybrid model integration to combine the mechanistic and data-driven components. This approach facilitates early evaluation of model structure suitability, accelerates the development of hybrid models, and allows for independent identification of data-driven components. Three case studies are presented to illustrate the robustness, reliability, and efficiency of our incremental approach in handling complex systems and scenarios with limited data.
comment: 21 pages, 11 Figures
The Battle of the Water Futures
The highly anticipated 'Battle of the Water Networks' is back with a new challenge for the water community. This competition will be hosted at the 4th International Joint Conference on Water Distribution Systems Analysis and Computing and Control in the Water Industry (WDSA/CCWI 2026), taking place in Paphos, Cyprus, from May 18-21, 2026. This competition embodies the core mission of Water-Futures and the theme for WDSA/CCWI 2026: "Designing the next generation of urban water (and wastewater) systems." The objective is to design and operate a water distribution system over a long-term horizon under deep uncertainty, with interventions applied in stages. For the first time, this challenge features a staged-design approach, unobservable and unknown uncertainties, and incorporates elements of policymaking and artificial intelligence. The solutions will be assessed using a transparent and inspectable open-source evaluation framework.
Advanced safety filter based on SOS Control Barrier and Lyapunov Functions
This paper presents a novel safety filter framework that ensures both safety and the preservation of the legacy control action within a nominal region. This modular design allows the safety filter to be integrated into the control hierarchy without compromising the performance of the existing legacy controller during nominal operation. For a control-affine system, this is accomplished by formulating multiple Control Barrier Functions (CBFs) and Control Lyapunov-like Functions (CLFs) conditions, alongside a forward invariance condition for the legacy controller, as sum-of-squares constraints. Additionally, the state-dependent inequality constraints of the resulting Quadratic Program (QP) -- encoding the CBF and CLF conditions -- are designed to remain inactive within the nominal region, ensuring preservation of the legacy control action and performance. Our safety filter design is also the first to include quadratic input constraints, and does not need an explicit specification of the attractor, as it is implicitly defined by the legacy controller. To avoid the chattering effect and guarantee the uniqueness and Lipschitz continuity of solutions, the state-dependent inequality constraints of the Quadratic Program are selected to be regular. Finally, we demonstrate the method in a detailed case study involving the control of a three-phase ac/dc power converter.
Barrier Integral Control for Global Asymptotic Tracking of Uncertain Nonlinear Systems under State and Input Constraints
This paper addresses the problem of asymptotic tracking for high-order control-affine MIMO nonlinear systems with unknown dynamic terms subject to input and transient state constraints. We introduce Barrier Integral Control (BRIC), a novel algorithm designed to confine the system's state within a predefined funnel, ensuring adherence to the transient state constraints, and asymptotically drive it to a given reference trajectory from any initial condition. The algorithm leverages the innovative integration of a reciprocal barrier function and error-integral terms, featuring smooth feedback control. We further develop an extension of the algorithm, entailing continuous feedback, that uses a reference-modification technique to account for the input-saturation constraints. Notably, BRIC operates without relying on any information or approximation schemes for the (unknown) dynamic terms, which, unlike a large class of previous works, are not assumed to be bounded or to comply with globally Lipschitz/growth conditions. Additionally, the system's trajectory and asymptotic performance are decoupled from the uncertain model, control-gain selection, and initial conditions. Finally, comparative simulation studies validate the effectiveness of the proposed algorithm.
AI-Native Open RAN for Non-Terrestrial Networks: An Overview
Non-terrestrial network (NTN) is envisioned as a critical component of Sixth Generation (6G) networks by enabling ubiquitous services and enhancing network resilience. However, the inherent mobility and high-altitude operation of NTN pose significant challenges throughout the development and operations (DevOps) lifecycle. To address these challenges, integrating NTNs with the Open Radio Access Network (ORAN) is a promising approach, since ORAN can offer disaggregation, openness, virtualization, and embedded intelligence. Despite extensive literature on ORAN and NTN, a holistic view of ORAN-based NTN frameworks is still lacking, particularly regarding how ORAN can effectively address the existing challenges of NTN. Furthermore, although artificial intelligence native (AI-Native) capabilities have the potential to enhance intelligence network control and optimization, their practical realization in NTNs has not yet been sufficiently investigated. Therefore, in this paper, we provide a comprehensive and structured overview of AI-Native ORAN for NTN. This paper commences with an in-depth review of the existing literature and subsequently introduces the necessary background about ORAN, NTN, and AI-Native for communication. After analyzing the DevOps challenges for NTN, we propose the orchestrated AI-Native ORAN-based NTN framework and discuss its key technological enablers. Finally, we present the representative use cases and outline the prospective future research directions of this study.
Robotics
Learning Legged MPC with Smooth Neural Surrogates
Deep learning and model predictive control (MPC) can play complementary roles in legged robotics. However, integrating learned models with online planning remains challenging. When dynamics are learned with neural networks, three key difficulties arise: (1) stiff transitions from contact events may be inherited from the data; (2) additional non-physical local nonsmoothness can occur; and (3) training datasets can induce non-Gaussian model errors due to rapid state changes. We address (1) and (2) by introducing the smooth neural surrogate, a neural network with tunable smoothness designed to provide informative predictions and derivatives for trajectory optimization through contact. To address (3), we train these models using a heavy-tailed likelihood that better matches the empirical error distributions observed in legged-robot dynamics. Together, these design choices substantially improve the reliability, scalability, and generalizability of learned legged MPC. Across zero-shot locomotion tasks of increasing difficulty, smooth neural surrogates with robust learning yield consistent reductions in cumulative cost on simple, well-conditioned behaviors (typically 10-50%), while providing substantially larger gains in regimes where standard neural dynamics often fail outright. In these regimes, smoothing enables reliable execution (from 0/5 to 5/5 success) and produces about 2-50x lower cumulative cost, reflecting orders-of-magnitude absolute improvements in robustness rather than incremental performance gains.
Neural Process-Based Reactive Controller for Autonomous Racing
Attention-based neural architectures have become central to state-of-the-art methods in real-time nonlinear control. As these data-driven models continue to be integrated into increasingly safety-critical domains, ensuring statistically grounded and provably safe decision-making becomes essential. This paper introduces a novel reactive control framework for gap-based navigation using the Attentive Neural Process (AttNP) and a physics-informed extension, the PI-AttNP. Both models are evaluated in a simulated F1TENTH-style Ackermann steering racecar environment, chosen as a fast-paced proxy for safety-critical autonomous driving scenarios. The PI-AttNP augments the AttNP architecture with approximate model-based priors to inject physical inductive bias, enabling faster convergence and improved prediction accuracy suited for real-time control. To further ensure safety, we derive and implement a control barrier function (CBF)-based filtering mechanism that analytically enforces collision avoidance constraints. This CBF formulation is fully compatible with the learned AttNP controller and generalizes across a wide range of racing scenarios, providing a lightweight and certifiable safety layer. Our results demonstrate competitive closed-loop performance while ensuring real-time constraint satisfaction.
comment: 6 pages, 4 figures
Listen, Look, Drive: Coupling Audio Instructions for User-aware VLA-based Autonomous Driving
Vision Language Action (VLA) models promise an open-vocabulary interface that can translate perceptual ambiguity into semantically grounded driving decisions, yet they still treat language as a static prior fixed at inference time. As a result, the model must infer continuously shifting objectives from pixels alone, yielding delayed or overly conservative maneuvers. We argue that effective VLAs for autonomous driving need an online channel in which users can influence driving with specific intentions. To this end, we present EchoVLA, a user-aware VLA that couples camera streams with in situ audio instructions. We augment the nuScenes dataset with temporally aligned, intent-specific speech commands generated by converting ego-motion descriptions into synthetic audios. Further, we compose emotional speech-trajectory pairs into a multimodal Chain-of-Thought (CoT) for fine-tuning a Multimodal Large Model (MLM) based on Qwen2.5-Omni. Specifically, we synthesize the audio-augmented dataset with different emotion types paired with corresponding driving behaviors, leveraging the emotional cues embedded in tone, pitch, and speech tempo to reflect varying user states, such as urgent or hesitant intentions, thus enabling our EchoVLA to interpret not only the semantic content but also the emotional context of audio commands for more nuanced and emotionally adaptive driving behavior. In open-loop benchmarks, our approach reduces the average L2 error by $59.4\%$ and the collision rate by $74.4\%$ compared to the baseline of vision-only perception. More experiments on nuScenes dataset validate that EchoVLA not only steers the trajectory through audio instructions, but also modulates driving behavior in response to the emotions detected in the user's speech.
comment: Accepted by IV
Active Semantic Mapping of Horticultural Environments Using Gaussian Splatting
Semantic reconstruction of agricultural scenes plays a vital role in tasks such as phenotyping and yield estimation. However, traditional approaches that rely on manual scanning or fixed camera setups remain a major bottleneck in this process. In this work, we propose an active 3D reconstruction framework for horticultural environments using a mobile manipulator. The proposed system integrates the classical Octomap representation with 3D Gaussian Splatting to enable accurate and efficient target-aware mapping. While a low-resolution Octomap provides probabilistic occupancy information for informative viewpoint selection and collision-free planning, 3D Gaussian Splatting leverages geometric, photometric, and semantic information to optimize a set of 3D Gaussians for high-fidelity scene reconstruction. We further introduce simple yet effective strategies to enhance robustness against segmentation noise and reduce memory consumption. Simulation experiments demonstrate that our method outperforms purely occupancy-based approaches in both runtime efficiency and reconstruction accuracy, enabling precise fruit counting and volume estimation. Compared to a 0.01m-resolution Octomap, our approach achieves an improvement of 6.6% in fruit-level F1 score under noise-free conditions, and up to 28.6% under segmentation noise. Additionally, it achieves a 50% reduction in runtime, highlighting its potential for scalable, real-time semantic reconstruction in agricultural robotics.
comment: 9 pages, 4 figures
BiKC+: Bimanual Hierarchical Imitation with Keypose-Conditioned Coordination-Aware Consistency Policies
Robots are essential in industrial manufacturing due to their reliability and efficiency. They excel in performing simple and repetitive unimanual tasks but still face challenges with bimanual manipulation. This difficulty arises from the complexities of coordinating dual arms and handling multi-stage processes. Recent integration of generative models into imitation learning (IL) has made progress in tackling specific challenges. However, few approaches explicitly consider the multi-stage nature of bimanual tasks while also emphasizing the importance of inference speed. In multi-stage tasks, failures or delays at any stage can cascade over time, impacting the success and efficiency of subsequent sub-stages and ultimately hindering overall task performance. In this paper, we propose a novel keypose-conditioned coordination-aware consistency policy tailored for bimanual manipulation. Our framework instantiates hierarchical imitation learning with a high-level keypose predictor and a low-level trajectory generator. The predicted keyposes serve as sub-goals for trajectory generation, indicating targets for individual sub-stages. The trajectory generator is formulated as a consistency model, generating action sequences based on historical observations and predicted keyposes in a single inference step. In particular, we devise an innovative approach for identifying bimanual keyposes, considering both robot-centric action features and task-centric operation styles. Simulation and real-world experiments illustrate that our approach significantly outperforms baseline methods in terms of success rates and operational efficiency. Implementation codes can be found at https://github.com/JoanaHXU/BiKC-plus.
comment: Accepted by IEEE Transactions on Automation Science and Engineering 2025
Domain-specific Hardware Acceleration for Model Predictive Path Integral Control
Accurately controlling a robotic system in real time is a challenging problem. To address this, the robotics community has adopted various algorithms, such as Model Predictive Control (MPC) and Model Predictive Path Integral (MPPI) control. The first is difficult to implement on non-linear systems such as unmanned aerial vehicles, whilst the second requires a heavy computational load. GPUs have been successfully used to accelerate MPPI implementations; however, their power consumption is often excessive for autonomous or unmanned targets, especially when battery-powered. On the other hand, custom designs, often implemented on FPGAs, have been proposed to accelerate robotic algorithms while consuming considerably less energy than their GPU (or CPU) implementation. However, no MPPI custom accelerator has been proposed so far. In this work, we present a hardware accelerator for MPPI control and simulate its execution. Results show that the MPPI custom accelerator allows more accurate trajectories than GPU-based MPPI implementations.
comment: 7 pages, 11 figures
Reframing Conversational Design in HRI: Deliberate Design with AI Scaffolds
Large language models (LLMs) have enabled conversational robots to move beyond constrained dialogue toward free-form interaction. However, without context-specific adaptation, generic LLM outputs can be ineffective or inappropriate. This adaptation is often attempted through prompt engineering, which is non-intuitive and tedious. Moreover, predominant design practice in HRI relies on impression-based, trial-and-error refinement without structured methods or tools, making the process inefficient and inconsistent. To address this, we present the AI-Aided Conversation Engine (ACE), a system that supports the deliberate design of human-robot conversations. ACE contributes three key innovations: 1) an LLM-powered voice agent that scaffolds initial prompt creation to overcome the "blank page problem," 2) an annotation interface that enables the collection of granular and grounded feedback on conversational transcripts, and 3) using LLMs to translate user feedback into prompt refinements. We evaluated ACE through two user studies, examining both designs' experience and end users' interactions with robots designed using ACE. Results show that ACE facilitates the creation of robot behavior prompts with greater clarity and specificity, and that the prompts generated with ACE lead to higher-quality human-robot conversational interactions.
Model selection and real-time skill assessment for suturing in robotic surgery
Automated feedback systems have the potential to provide objective skill assessment for training and evaluation in robot-assisted surgery. In this study, we examine methods to achieve real-time prediction of surgical skill level in real-time based on Objective Structured Assessment of Technical Skills (OSATS) scores. Using data acquired from the da Vinci Surgical System, we carry out three main analyses, focusing on model design, their real-time performance, and their skill-level-based cross-validation training. For the model design, we evaluate the effectiveness of multimodal deep learning models for predicting surgical skill levels using synchronized kinematic and vision data. Our models include separate unimodal baselines and fusion architectures that integrate features from both modalities and are evaluated using mean Spearman's correlation coefficients, demonstrating that the fusion model consistently outperforms unimodal models for real-time predictions. For the real-time performance, we observe the prediction's trend over time and highlight correlation with the surgeon's gestures. For the skill-level-based cross-validation, we separately trained models on surgeons with different skill levels, which showed that high-skill demonstrations allow for better performance than those trained on low-skilled ones and generalize well to similarly skilled participants. Our findings show that multimodal learning allows more stable fine-grained evaluation of surgical performance and highlights the value of expert-level training data for model generalization.
Visual-Language-Guided Task Planning for Horticultural Robots
Crop monitoring is essential for precision agriculture, but current systems lack high-level reasoning. We introduce a novel, modular framework that uses a Visual Language Model (VLM) to guide robotic task planning, interleaving input queries with action primitives. We contribute a comprehensive benchmark for short- and long-horizon crop monitoring tasks in monoculture and polyculture environments. Our main results show that VLMs perform robustly for short-horizon tasks (comparable to human success), but exhibit significant performance degradation in challenging long-horizon tasks. Critically, the system fails when relying on noisy semantic maps, demonstrating a key limitation in current VLM context grounding for sustained robotic operations. This work offers a deployable framework and critical insights into VLM capabilities and shortcomings for complex agricultural robotics.
comment: 14 pages, 4 figures
AI for Green Spaces: Leveraging Autonomous Navigation and Computer Vision for Park Litter Removal
There are 50 billion pieces of litter in the U.S. alone. Grass fields contribute to this problem because picnickers tend to leave trash on the field. We propose building a robot that can autonomously navigate, identify, and pick up trash in parks. To autonomously navigate the park, we used a Spanning Tree Coverage (STC) algorithm to generate a coverage path the robot could follow. To navigate this path, we successfully used Real-Time Kinematic (RTK) GPS, which provides a centimeter-level reading every second. For computer vision, we utilized the ResNet50 Convolutional Neural Network (CNN), which detects trash with 94.52% accuracy. For trash pickup, we tested multiple design concepts. We select a new pickup mechanism that specifically targets the trash we encounter on the field. Our solution achieved an overall success rate of 80%, demonstrating that autonomous trash pickup robots on grass fields are a viable solution.
comment: Published in IEEE/SICE SII 2025
Beyond Task and Motion Planning: Hierarchical Robot Planning with General-Purpose Skills
Task and motion planning is a well-established approach for solving long-horizon robot planning problems. However, traditional methods assume that each task-level robot action, or skill, can be reduced to kinematic motion planning. We address the challenge of combining motion planning with closed-loop motor controllers that go beyond mere kinematic considerations. We propose a novel framework that integrates these policies into motion planning using Composable Interaction Primitives (CIPs), enabling the use of diverse, non-composable pre-learned skills in hierarchical robot planning. We validate our Task and Skill Planning (TASP) approach through real-world experiments on a bimanual manipulator and a mobile manipulator, demonstrating that CIPs allow diverse robots to combine motion planning with general-purpose skills to solve complex, long-horizon tasks.
Generation of Real-time Robotic Emotional Expressions Learning from Human Demonstration in Mixed Reality
Expressive behaviors in robots are critical for effectively conveying their emotional states during interactions with humans. In this work, we present a framework that autonomously generates realistic and diverse robotic emotional expressions based on expert human demonstrations captured in Mixed Reality (MR). Our system enables experts to teleoperate a virtual robot from a first-person perspective, capturing their facial expressions, head movements, and upper-body gestures, and mapping these behaviors onto corresponding robotic components including eyes, ears, neck, and arms. Leveraging a flow-matching-based generative process, our model learns to produce coherent and varied behaviors in real-time in response to moving objects, conditioned explicitly on given emotional states. A preliminary test validated the effectiveness of our approach for generating autonomous expressions.
comment: 5
Monotone Subsystem Decomposition for Efficient Multi-Objective Robot Design ICRA
Automating design minimizes errors, accelerates the design process, and reduces cost. However, automating robot design is challenging due to recursive constraints, multiple design objectives, and cross-domain design complexity possibly spanning multiple abstraction layers. Here we look at the problem of component selection, a combinatorial optimization problem in which a designer, given a robot model, must select compatible components from an extensive catalog. The goal is to satisfy high-level task specifications while optimally balancing trade-offs between competing design objectives. In this paper, we extend our previous constraint programming approach to multi-objective design problems and propose the novel technique of monotone subsystem decomposition to efficiently compute a Pareto front of solutions for large-scale problems. We prove that subsystems can be optimized for their Pareto fronts and, under certain conditions, these results can be used to determine a globally optimal Pareto front. Furthermore, subsystems serve as an intuitive design abstraction and can be reused across various design problems. Using an example quadcopter design problem, we compare our method to a linear programming approach and demonstrate our method scales better for large catalogs, solving a multi-objective problem of 10^25 component combinations in seconds. We then expand the original problem and solve a task-oriented, multi-objective design problem to build a fleet of quadcopters to deliver packages. We compute a Pareto front of solutions in seconds where each solution contains an optimal component-level design and an optimal package delivery schedule for each quadcopter.
comment: Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2025
CAHC:A General Conflict-Aware Heuristic Caching Framework for Multi-Agent Path Finding
Multi-Agent Path Finding (MAPF) algorithms, including those for car-like robots and grid-based scenarios, face significant computational challenges due to expensive heuristic calculations. Traditional heuristic caching assumes that the heuristic function depends only on the state, which is incorrect in constraint-based search algorithms (e.g., CBS, MAPF-LNS, MAP2) where constraints from conflict resolution make the search space context-dependent. We propose \textbf{CAHC} (Conflict-Aware Heuristic Caching), a general framework that caches heuristic values based on both state and relevant constraint context, addressing this fundamental limitation. We demonstrate CAHC through a case study on CL-CBS for car-like robots, where we combine conflict-aware caching with an adaptive hybrid heuristic in \textbf{CAR-CHASE} (Car-Like Robot Conflict-Aware Heuristic Adaptive Search Enhancement). Our key innovations are (1) a compact \emph{conflict fingerprint} that efficiently encodes which constraints affect a state's heuristic, (2) a domain-adaptable relevance filter using spatial, temporal, and geometric criteria, and (3) a modular architecture that enables systematic application to diverse MAPF algorithms. Experimental evaluation on 480 CL-CBS benchmark instances demonstrates a geometric mean speedup of 2.46$\times$ while maintaining solution optimality. The optimizations improve success rate from 77.9\% to 84.8\% (+6.9 percentage points), reduce total runtime by 70.1\%, and enable solving 33 additional instances. The framework's general architecture makes it applicable as a reliable optimization technique for MAP2, MAPF-LNS, and other constraint-based MAPF algorithms.
Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?
The advent of Large Multimodal Models (LMMs) offers a promising technology to tackle the limitations of modular design in autonomous driving, which often falters in open-world scenarios requiring sustained environmental understanding and logical reasoning. Besides, embodied artificial intelligence facilitates policy optimization through closed-loop interactions to achieve the continuous learning capability, thereby advancing autonomous driving toward embodied intelligent (El) driving. However, such capability will be constrained by relying solely on LMMs to enhance EI driving without joint decision-making. This article introduces a novel semantics and policy dual-driven hybrid decision framework to tackle this challenge, ensuring continuous learning and joint decision. The framework merges LMMs for semantic understanding and cognitive representation, and deep reinforcement learning (DRL) for real-time policy optimization. We start by introducing the foundational principles of EI driving and LMMs. Moreover, we examine the emerging opportunities this framework enables, encompassing potential benefits and representative use cases. A case study is conducted experimentally to validate the performance superiority of our framework in completing lane-change planning task. Finally, several future research directions to empower EI driving are identified to guide subsequent work.
Reflection-Based Task Adaptation for Self-Improving VLA
Pre-trained Vision-Language-Action (VLA) models represent a major leap towards general-purpose robots, yet efficiently adapting them to novel, specific tasks in-situ remains a significant hurdle. While reinforcement learning (RL) is a promising avenue for such adaptation, the process often suffers from low efficiency, hindering rapid task mastery. We introduce Reflective Self-Adaptation, a framework for rapid, autonomous task adaptation without human intervention. Our framework establishes a self-improving loop where the agent learns from its own experience to enhance both strategy and execution. The core of our framework is a dual-pathway architecture that addresses the full adaptation lifecycle. First, a Failure-Driven Reflective RL pathway enables rapid learning by using the VLM's causal reasoning to automatically synthesize a targeted, dense reward function from failure analysis. This provides a focused learning signal that significantly accelerates policy exploration. However, optimizing such proxy rewards introduces a potential risk of "reward hacking," where the agent masters the reward function but fails the actual task. To counteract this, our second pathway, Success-Driven Quality-Guided SFT, grounds the policy in holistic success. It identifies and selectively imitates high-quality successful trajectories, ensuring the agent remains aligned with the ultimate task goal. This pathway is strengthened by a conditional curriculum mechanism to aid initial exploration. We conduct experiments in challenging manipulation tasks. The results demonstrate that our framework achieves faster convergence and higher final success rates compared to representative baselines. Our work presents a robust solution for creating self-improving agents that can efficiently and reliably adapt to new environments.
A Two-Stage Reactive Auction Framework for the Multi-Depot Rural Postman Problem with Dynamic Vehicle Failures
Although unmanned vehicle fleets offer efficiency in transportation, logistics and inspection, their susceptibility to failures poses a significant challenge to mission continuity. We study the Multi-Depot Rural Postman Problem with Rechargeable and Reusable Vehicles (MD-RPP-RRV) with vehicle failures, where unmanned rechargeable vehicles placed at multiple depots with capacity constraints may fail while serving arc-based demands. To address unexpected vehicle breakdowns during operation, we propose a two-stage real-time rescheduling framework. First, a centralized auction quickly generates a feasible rescheduling solution; for this stage, we derive a theoretical additive bound that establishes an analytical guarantee on the worst-case rescheduling penalty. Second, a peer auction refines this baseline through a problem-specific magnetic field router for local schedule repair, utilizing parameters calibrated via sensitivity analysis to ensure controlled computational growth. We benchmark this approach against a simulated annealing metaheuristic to evaluate solution quality and execution speed. Experimental results on 257 diverse failure scenarios demonstrate that the framework achieves an average runtime reduction of over 95\% relative to the metaheuristic baseline, cutting rescheduling times from hours to seconds while maintaining high solution quality. The two-stage framework excels on large-scale instances, surpassing the centralized auction in nearly 80\% of scenarios with an average solution improvement exceeding 12\%. Moreover, it outperforms the simulated annealing mean and best results in 59\% and 28\% of scenarios, respectively, offering the robust speed-quality trade-off required for real-time mission continuity.
Towards Accessible Robot Control: Comparing Kinesthetic Teaching, SpaceMouse Teleoperation, and a Mixed Reality Interface
Teleoperation interfaces are essential tools for enabling human control of robotic systems. Although a wide range of interfaces has been developed, a persistent gap remains between the level of performance humans can achieve through these interfaces and the capabilities afforded by direct human-guided robot control. This gap is further exacerbated when users are inexperienced or unfamiliar with the robotic platform or control interface. In this work, we aim to better characterize this performance gap for non-expert users by comparing two teleoperation approaches, SpaceMouse teleoperation and a Mixed Reality (MR) interface, against kinesthetic teaching as a non-teleoperation baseline. All three approaches were evaluated in a comprehensive user study involving two robotic platforms and six complex manipulation tasks. Quantitative results show that the SpaceMouse and MR interfaces performed comparably, with significant differences in task completion observed for only two tasks, and success rates declining as task complexity increased. Qualitative analysis reflected these trends, highlighting differences in Physical Demand and identifying interface attributes that influence users' ability to perform, learn, and understand. This study quantifies the limitations of current teleoperation methods and incorporates subjective feedback from 25 participants. The results highlight the critical need to design and rigorously evaluate teleoperation systems for non-expert users, particularly in contexts where autonomous robots are deployed in personal or everyday environments, to ensure usability, efficiency, and accessibility.
comment: 32 pages, 12 figures
Multiagent Systems
Robust Verification of Concurrent Stochastic Games
Autonomous systems often operate in multi-agent settings and need to make concurrent, strategic decisions, typically in uncertain environments. Verification and control problems for these systems can be tackled with concurrent stochastic games (CSGs), but this model requires transition probabilities to be precisely specified - an unrealistic requirement in many real-world settings. We introduce *robust CSGs* and their subclass *interval CSGs* (ICSGs), which capture epistemic uncertainty about transition probabilities in CSGs. We propose a novel framework for *robust* verification of these models under worst-case assumptions about transition uncertainty. Specifically, we develop the underlying theoretical foundations and efficient algorithms, for finite- and infinite-horizon objectives in both zero-sum and nonzero-sum settings, the latter based on (social-welfare optimal) Nash equilibria. We build an implementation in the PRISM-games model checker and demonstrate the feasibility of robust verification of ICSGs across a selection of large benchmarks.
comment: Extended version of a paper accepted to TACAS 2026. Main text: 17 pages, 2 figures, 2 tables; Appendix: 37 pages, 3 figures, 3 tables
ATOD: An Evaluation Framework and Benchmark for Agentic Task-Oriented Dialogue System
Recent advances in task-oriented dialogue (TOD) systems, driven by large language models (LLMs) with extensive API and tool integration, have enabled conversational agents to coordinate interleaved goals, maintain long-horizon context, and act proactively through asynchronous execution. These capabilities extend beyond traditional TOD systems, yet existing benchmarks lack systematic support for evaluating such agentic behaviors. To address this gap, we introduce ATOD, a benchmark and synthetic dialogue generation pipeline that produces richly annotated conversations requiring long-term reasoning. ATOD captures key characteristics of advanced TOD, including multi-goal coordination, dependency management, memory, adaptability, and proactivity. Building on ATOD, we propose ATOD-Eval, a holistic evaluation framework that translates these dimensions into fine-grained metrics and supports reproducible offline and online evaluation. We further present a strong agentic memory-based evaluator for benchmarking on ATOD. Experiments show that ATOD-Eval enables comprehensive assessment across task completion, agentic capability, and response quality, and that the proposed evaluator offers a better accuracy-efficiency tradeoff compared to existing memory- and LLM-based approaches under this evaluation setting.
Who is Helping Whom? Analyzing Inter-dependencies to Evaluate Cooperation in Human-AI Teaming
State-of-the-art methods for Human-AI Teaming and Zero-shot Cooperation focus on task completion, i.e., task rewards, as the sole evaluation metric while being agnostic to how the two agents work with each other. Furthermore, subjective user studies only offer limited insight into the quality of cooperation existing within the team. Specifically, we are interested in understanding the cooperative behaviors arising within the team when trained agents are paired with humans -- a problem that has been overlooked by the existing literature. To formally address this problem, we propose the concept of constructive interdependence -- measuring how much agents rely on each other's actions to achieve the shared goal -- as a key metric for evaluating cooperation in human-agent teams. We interpret interdependence in terms of action interactions in a STRIPS formalism, and define metrics that allow us to assess the degree of reliance between the agents' actions. We pair state-of-the-art agents HAT with learned human models as well as human participants in a user study for the popular Overcooked domain, and evaluate the task reward and teaming performance for these human-agent teams. Our results demonstrate that although trained agents attain high task rewards, they fail to induce cooperative behavior, showing very low levels of interdependence across teams. Furthermore, our analysis reveals that teaming performance is not necessarily correlated with task reward, highlighting that task reward alone cannot reliably measure cooperation arising in a team.
Code2MCP: Transforming Code Repositories into MCP Services
The Model Context Protocol (MCP) aims to create a standard for how Large Language Models use tools. However, most current research focuses on selecting tools from an existing pool. A more fundamental, yet largely overlooked, problem is how to populate this pool by converting the vast number of existing software projects into MCP-compatible services. To bridge this gap, we introduce Code2MCP, an agent-based framework that automatically transforms a GitHub repository into a functional MCP service with minimal human intervention. Code2MCP employs a multi-agent workflow for code analysis, environment setup, tool function design, and service generation, enhanced by a self-correcting loop to ensure reliability. We demonstrate that Code2MCP successfully transforms open-source computing libraries in scientific fields such as bioinformatics, mathematics, and fluid dynamics that are not available in existing MCP servers. By providing a novel automated pathway to unlock GitHub, the world's largest code repository, for the MCP ecosystem, Code2MCP serves as a catalyst to significantly accelerate the protocol's adoption and practical application. The code is public at https://github.com/DEFENSE-SEU/Code2MCP.
Scene-Aware Vectorized Memory Multi-Agent Framework with Cross-Modal Differentiated Quantization VLMs for Visually Impaired Assistance
Visually impaired individuals face significant challenges in environmental perception. Traditional assistive technologies often lack adaptive intelligence, focusing on individual components rather than integrated systems. While Vision-Language Models (VLMs) offer a promising path to richer, integrated understanding, their deployment is severely limited by substantial computational requirements, demanding dozens of gigabytes of memory. To address these gaps in computational efficiency and integrated design, this study proposes a dual technological innovation framework: a cross-modal differentiated quantization framework for VLMs and a scene-aware vectorized memory multi-agent system. The quantization framework implements differentiated strategies, reducing memory from 38GB to 11.3GB. The multi-agent system uses vectorized memory and perception-memory-reasoning workflows to provide environmental information beyond the current view, achieving 2.83-3.52s latency to initial speech output. Experiments show the quantized 19B-parameter model only experiences a 2.05% performance drop on MMBench and maintains 63.7 accuracy on OCR-VQA (original: 64.9), outperforming smaller models with equivalent memory. This research advances computational efficiency and assistive technology, offering comprehensive assistance in scene perception, text recognition, and navigation.
comment: 28 pages,9 figures
EduThink4AI: Bridging Educational Critical Thinking and Multi-Agent LLM Systems
Large language models (LLMs) have demonstrated significant potential as educational tutoring agents, capable of tailoring hints, orchestrating lessons, and grading with near-human finesse across various academic domains. However, current LLM-based educational systems exhibit critical limitations in promoting genuine critical thinking, failing on over one-third of multi-hop questions with counterfactual premises, and remaining vulnerable to adversarial prompts that trigger biased or factually incorrect responses. To address these gaps, we propose \textbf{EDU-Prompting}, a novel multi-agent framework that bridges established educational critical thinking theories with LLM agent design to generate critical, bias-aware explanations while fostering diverse perspectives. Our systematic evaluation across theoretical benchmarks and practical college-level critical writing scenarios demonstrates that EDU-Prompting significantly enhances both content truthfulness and logical soundness in AI-generated educational responses. The framework's modular design enables seamless integration into existing prompting frameworks and educational applications, allowing practitioners to directly incorporate critical thinking catalysts that promote analytical reasoning and introduce multiple perspectives without requiring extensive system modifications.
Adaptive Requesting in Decentralized Edge Networks via Non-Stationary Bandits
We study a decentralized collaborative requesting problem that aims to optimize the information freshness of time-sensitive clients in edge networks consisting of multiple clients, access nodes (ANs), and servers. Clients request content through ANs acting as gateways, without observing AN states or the actions of other clients. We define the reward as the age of information reduction resulting from a client's selection of an AN, and formulate the problem as a non-stationary multi-armed bandit. In this decentralized and partially observable setting, the resulting reward process is history-dependent and coupled across clients, and exhibits both abrupt and gradual changes in expected rewards, rendering classical bandit-based approaches ineffective. To address these challenges, we propose the AGING BANDIT WITH ADAPTIVE RESET algorithm, which combines adaptive windowing with periodic monitoring to track evolving reward distributions. We establish theoretical performance guarantees showing that the proposed algorithm achieves near-optimal performance, and we validate the theoretical results through simulations.
A Two-Stage Reactive Auction Framework for the Multi-Depot Rural Postman Problem with Dynamic Vehicle Failures
Although unmanned vehicle fleets offer efficiency in transportation, logistics and inspection, their susceptibility to failures poses a significant challenge to mission continuity. We study the Multi-Depot Rural Postman Problem with Rechargeable and Reusable Vehicles (MD-RPP-RRV) with vehicle failures, where unmanned rechargeable vehicles placed at multiple depots with capacity constraints may fail while serving arc-based demands. To address unexpected vehicle breakdowns during operation, we propose a two-stage real-time rescheduling framework. First, a centralized auction quickly generates a feasible rescheduling solution; for this stage, we derive a theoretical additive bound that establishes an analytical guarantee on the worst-case rescheduling penalty. Second, a peer auction refines this baseline through a problem-specific magnetic field router for local schedule repair, utilizing parameters calibrated via sensitivity analysis to ensure controlled computational growth. We benchmark this approach against a simulated annealing metaheuristic to evaluate solution quality and execution speed. Experimental results on 257 diverse failure scenarios demonstrate that the framework achieves an average runtime reduction of over 95\% relative to the metaheuristic baseline, cutting rescheduling times from hours to seconds while maintaining high solution quality. The two-stage framework excels on large-scale instances, surpassing the centralized auction in nearly 80\% of scenarios with an average solution improvement exceeding 12\%. Moreover, it outperforms the simulated annealing mean and best results in 59\% and 28\% of scenarios, respectively, offering the robust speed-quality trade-off required for real-time mission continuity.
Systems and Control (EESS)
Reachability Guarantees for Energy Arbitrage
This paper introduces a unified framework for battery energy arbitrage under uncertain market prices that integrates chance-constrained terminal state-of-charge requirements with online threshold policies. We first cast the multi-interval arbitrage problem as a stochastic dynamic program enhanced by a probabilistic end-of-horizon state-of-charge (SoC) constraint, ensuring with high confidence that the battery terminates within a prescribed energy band. We then apply a $k$-search algorithm to derive explicit charging (buying) and discharging (selling) thresholds with provable worst-case competitive ratio, and compute the corresponding action probabilities over the decision horizon. To compute exact distributions under operational limits, we develop a probability redistribution pruning method and use it to quantify the likelihood of meeting the terminal SoC band. Leveraging the resulting SoC distribution, we estimate the minimum stopping-time required to satisfy the SoC chance constraint. Computational experiments on historical real price data demonstrate that the proposed framework substantially improves the estimation of SoC evolution and supports chance-constraint satisfaction.
A method for optimizing the structure of the software and hardware complex of a distributed process control system for large industrial enterprises
The article proposes a method for optimizing the structure of the software and hardware complex of an automated control system for continuous technological processes for large industrial enterprises. General information is given on the relevance of the problem of choosing the structure of a system built on the basis of serially produced components, a formal description of the optimization problem is given, the criterion and limitations are highlighted. A solution method using the metaheuristic algorithm of ant colonies is described. A numerical example of the solution is given, the results of the algorithm are analyzed, and directions for further research are determined.
Multimodal Feedback for Handheld Tool Guidance: Combining Wrist-Based Haptics with Augmented Reality
We investigate how vibrotactile wrist feedback can enhance spatial guidance for handheld tool movement in optical see-through augmented reality (AR). While AR overlays are widely used to support surgical tasks, visual occlusion, lighting conditions, and interface ambiguity can compromise precision and confidence. To address these challenges, we designed a multimodal system combining AR visuals with a custom wrist-worn haptic device delivering directional and state-based cues. A formative study with experienced surgeons and residents identified key tool maneuvers and preferences for reference mappings, guiding our cue design. In a cue identification experiment (N=21), participants accurately recognized five vibration patterns under visual load, with higher recognition for full-actuator states than spatial direction cues. In a guidance task (N=27), participants using both AR and haptics achieved significantly higher spatial precision (5.8 mm) and usability (SUS = 88.1) than those using either modality alone, despite having modest increases in task time. Participants reported that haptic cues provided reassuring confirmation and reduced cognitive effort during alignment. Our results highlight the promise of integrating wrist-based haptics into AR systems for high-precision, visually complex tasks such as surgical guidance. We discuss design implications for multimodal interfaces supporting confident, efficient tool manipulation.
Profit Maximization for Electric Vehicle Charging Stations Using Multiagent Reinforcement Learning
Electric vehicles (EVs) are increasingly integrated into power grids, offering economic and environmental benefits but introducing challenges due to uncoordinated charging. This study addresses the profit maximization problem for multiple EV charging stations (EVCSs) equipped with energy storage systems (ESS) and renewable energy sources (RES), with the capability for energy trading. We propose a Double Hypernetwork QMIX-based multi-agent reinforcement learning (MARL) framework to optimize cooperative energy management under uncertainty in EV demand, renewable generation, and real-time electricity prices. The framework mitigates overestimation bias in value estimation, enables distributed decision-making, and incorporates an internal energy trading mechanism. Numerical experiments using real-world data demonstrate that, compared to standard QMIX, the proposed method achieves approximately 5.3% and 12.7% higher total profit for the two regions, respectively, highlighting its economic and operational efficiency. Additionally, the approach maintains robust performance under varying levels of EV demand uncertainty and renewable energy fluctuations.
comment: 18 pages, 4 figures
Robust Verification of Concurrent Stochastic Games
Autonomous systems often operate in multi-agent settings and need to make concurrent, strategic decisions, typically in uncertain environments. Verification and control problems for these systems can be tackled with concurrent stochastic games (CSGs), but this model requires transition probabilities to be precisely specified - an unrealistic requirement in many real-world settings. We introduce *robust CSGs* and their subclass *interval CSGs* (ICSGs), which capture epistemic uncertainty about transition probabilities in CSGs. We propose a novel framework for *robust* verification of these models under worst-case assumptions about transition uncertainty. Specifically, we develop the underlying theoretical foundations and efficient algorithms, for finite- and infinite-horizon objectives in both zero-sum and nonzero-sum settings, the latter based on (social-welfare optimal) Nash equilibria. We build an implementation in the PRISM-games model checker and demonstrate the feasibility of robust verification of ICSGs across a selection of large benchmarks.
comment: Extended version of a paper accepted to TACAS 2026. Main text: 17 pages, 2 figures, 2 tables; Appendix: 37 pages, 3 figures, 3 tables
Kernel-Based Learning of Safety Barriers
The rapid integration of AI algorithms in safety-critical applications such as autonomous driving and healthcare is raising significant concerns about the ability to meet stringent safety standards. Traditional tools for formal safety verification struggle with the black-box nature of AI-driven systems and lack the flexibility needed to scale to the complexity of real-world applications. In this paper, we present a data-driven approach for safety verification and synthesis of black-box systems with discrete-time stochastic dynamics. We employ the concept of control barrier certificates, which can guarantee safety of the system, and learn the certificate directly from a set of system trajectories. We use conditional mean embeddings to embed data from the system into a reproducing kernel Hilbert space (RKHS) and construct an RKHS ambiguity set that can be inflated to robustify the result to out-of-distribution behavior. We provide the theoretical results on how to apply the approach to general classes of temporal logic specifications beyond safety. For the data-driven computation of safety barriers, we leverage a finite Fourier expansion to cast a typically intractable semi-infinite optimization problem as a linear program. The resulting spectral barrier allows us to leverage the fast Fourier transform to generate the relaxed problem efficiently, offering a scalable yet distributionally robust framework for verifying safety. Our work moves beyond restrictive assumptions on system dynamics and uncertainty, as demonstrated on two case studies including a black-box system with a neural network controller.
comment: 44 pages, 9 figures
Decentralized Motion and Resonant Damping Control for High-Bandwidth and Cross-Coupling Reduction in MIMO Nanopositioners
Piezoelectric nanopositioning systems are widely used in precision applications that require nanometer accuracy and high-speed motion; however, lightly damped resonances and pronounced cross-axis coupling severely limit bandwidth and disturbance rejection. This paper presents a decentralized dual-loop control strategy for a two-axis nanopositioner, combining an inner non-minimum-phase resonant damping controller with an outer motion controller on each axis. The dominant diagonal resonance is actively damped to enable closed-loop bandwidths beyond the first structural mode, while a parallel band-pass damping path is specifically tuned to a higher-order resonance that predominantly affects the cross-coupling channels. Experimental results demonstrate that this targeted band-pass damping substantially reduces cross-axis coupling and enhances disturbance rejection, without compromising tracking accuracy.
A Constraint Programming Model for the Super-Agile Earth Observation Satellite Imaging Scheduling Problem
As the dependence on satellite imaging continues to grow, modern satellites have become increasingly agile, with the new generation, namely super-agile Earth observation satellites (SAEOS), providing unprecedented imaging flexibility. The highly dynamic capabilities of these satellites introduce additional challenges to the scheduling of observation tasks, as existing approaches for conventional agile satellites do not account for variable observation durations and multiple imaging directions. Although some efforts have been made in this regard, the SAEOS imaging scheduling problem (SAEOS-ISP) remains largely unexplored, and no exact approaches have yet been proposed. In this context, this study presents the first exact Constraint Programming formulation for the SAEOS-ISP, considering flexible observation windows, multiple pointing directions and sequence-dependent transition times across multiple satellites. Computational experiments on a newly generated benchmark set demonstrate that the model can be solved efficiently and within very short computational times. Moreover, the results also show that the proposed approach has the potential to achieve higher computational performance compared to the non-exact approaches that are currently considered state-of-the-art.
comment: 12 pages, 4 figures, To be published in the Proceedings of the International Conference on Operations Research and Enterprise Systems (ICORES 2026)
Structured μ-Synthesis for Nanopositioners under Payload-Induced Uncertainties: Minimising Conservatism for Robust Performance
Most systems exhibit significant variability in their dynamics, including variations in system parameters and large high-frequency dynamic uncertainties. Traditional uncertainty modelling techniques consolidate all such variations into a single uncertainty block, often yielding overly conservative representations of the true plant behaviour. This paper introduces an uncertainty modelling framework that employs multiple structured and unstructured uncertainty blocks to reduce this conservatism. The methodology is evaluated for an industrial piezoelectric nanopositioner subject to payload-induced variations, using uncertainty models of differing complexity. A bandpass controller is synthesised via structured mixed-μ synthesis, and the resulting designs are compared in terms of conservatism of the uncertainty model, robust performance, and computational effort.
Least-Squares Multi-Step Koopman Operator Learning for Model Predictive Control
MPC is widely used in real-time applications, but practical implementations are typically restricted to convex QP formulations to ensure fast and certified execution. Koopman-based MPC enables QP-based control of nonlinear systems by lifting the dynamics to a higher-dimensional linear representation. However, existing approaches rely on single-step EDMD. Consequently, prediction errors may accumulate over long horizons when the EDMD operator is applied recursively. Moreover, the multi-step prediction loss is nonconvex with respect to the single-step EDMD operator, making long-horizon model identification particularly challenging. This paper proposes a multi-step EDMD framework that directly learns the condensed multi-step state-control mapping required for Koopman-MPC, thereby bypassing explicit identification of the lifted system matrices and subsequent model condensation. The resulting identification problem admits a convex least-squares formulation. We further show that the problem decomposes across prediction horizons and state coordinates, enabling parallel computation and row-wise $\ell_1$-regularization for automatic dictionary pruning. A non-asymptotic finite-sample analysis demonstrates that, unlike one-step EDMD, the proposed method avoids error compounding and yields error bounds that depend only on the target multi-step mapping. Numerical examples validate improved long-horizon prediction accuracy and closed-loop performance.
AI for Green Spaces: Leveraging Autonomous Navigation and Computer Vision for Park Litter Removal
There are 50 billion pieces of litter in the U.S. alone. Grass fields contribute to this problem because picnickers tend to leave trash on the field. We propose building a robot that can autonomously navigate, identify, and pick up trash in parks. To autonomously navigate the park, we used a Spanning Tree Coverage (STC) algorithm to generate a coverage path the robot could follow. To navigate this path, we successfully used Real-Time Kinematic (RTK) GPS, which provides a centimeter-level reading every second. For computer vision, we utilized the ResNet50 Convolutional Neural Network (CNN), which detects trash with 94.52% accuracy. For trash pickup, we tested multiple design concepts. We select a new pickup mechanism that specifically targets the trash we encounter on the field. Our solution achieved an overall success rate of 80%, demonstrating that autonomous trash pickup robots on grass fields are a viable solution.
comment: Published in IEEE/SICE SII 2025
A Constraint Programming Model for the Super-Agile Earth Observation Satellite Imaging Scheduling Problem
As the dependence on satellite imaging continues to grow, modern satellites have become increasingly agile, with the new generation, namely super-agile Earth observation satellites (SAEOS), providing unprecedented imaging flexibility. The highly dynamic capabilities of these satellites introduce additional challenges to the scheduling of observation tasks, as existing approaches for conventional agile satellites do not account for variable observation durations and multiple imaging directions. Although some efforts have been made in this regard, the SAEOS imaging scheduling problem (SAEOS-ISP) remains largely unexplored, and no exact approaches have yet been proposed. In this context, this study presents the first exact Constraint Programming formulation for the SAEOS-ISP, considering flexible observation windows, multiple pointing directions and sequence-dependent transition times across multiple satellites. Computational experiments on a newly generated benchmark set demonstrate that the model can be solved efficiently and within very short computational times. Moreover, the results also show that the proposed approach has the potential to achieve higher computational performance compared to the non-exact approaches that are currently considered state-of-the-art.
comment: 12 pages, 4 figures, To be published in the Proceedings of the International Conference on Operations Research and Enterprise Systems (ICORES 2026)
On the mixed monotonicity of polynomial functions
In this paper, it is shown that every polynomial function is mixed monotone globally with a polynomial decomposition function. For univariate polynomials, the decomposition functions can be constructed from the Gram matrix representation of polynomial functions. The tightness of polynomial decomposition functions is discussed. Several examples are provided. An example is provided to show that polynomial decomposition functions, in addition to being global decomposition functions, can be much tighter than local decomposition functions constructed using local Jacobian bounds. Furthermore, an example is provided to demonstrate the application to reachable set over-approximation.
Two-Stage Bidirectional Inverter Equivalent Circuit Model for Distribution Grid Steady-State Analysis and Optimization
This paper presents a \textit{physics-based} steady-state equivalent circuit model of a two-stage bidirectional inverter. These inverters connect distributed energy resources (DERs), such as photovoltaic (PV) and battery systems, to distribution grids. Existing inverter models have technical gaps on three fronts: i) inadequate modeling of inverter losses; ii) use of mathematical abstractions for bidirectional flow of power; and iii) inability to integrate different control modes into nonlinear solvers without loss of generality. We propose a physics-first model that explicitly captures losses in passive circuit components based on circuit-level principles. We enable bidirectional power flow without binary or complementarity constraints by formulating loss terms as smooth, sign-aware expressions of current. We introduce and parameterize controlled current sources with twice-differentiable continuous functions to enable inverter control modes without loss of generality. We integrate DERs with the proposed inverter model at the load buses of distribution networks to perform power flow and optimization studies on real-world distribution networks with over 20,000 nodes. We demonstrate that the proposed model is more accurate, integrates seamlessly with various control modes without loss of generality, and scales robustly to large optimization problems. Index Terms: bidirectional inverter model, circuit-based modeling, DERs, inverter efficiency, power control, steady-state analysis.
Anti-Jamming based on Beam-Steering Antennas and Intelligent UAV Swarm Behavior
In recent years, Unmanned Aerial Vehicles (UAVs) have brought a new true revolution to military tactics. While UAVs already constitute an advantage when operating alone, multi-UAV swarms expand the available possibilities, allowing the UAVs to collaborate and support each other as a team to carry out a given task. This entails the capability to exchange information related with situation awareness and action coordination by means of a suitable wireless communication technology. In such scenario, the adversary is expected to disrupt communications by jamming the communication channel. The latter becomes the Achilles heel of the swarm. While anti-jamming techniques constitute a well covered topic in the literature, the use of intelligent swarm behaviors to leverage those techniques is still an open research issue. This paper explores the use of Genetic Algorithms (GAs) to jointly optimize UAV swarm formation, beam-steering antennas and traffic routing in order to mitigate the effect of jamming in the main coordination channel, under the assumption that a more robust and low data rate channel is used for formation management signaling. Simulation results show the effectiveness of proposed approach. However, the significant computational cost paves the way for further research.
comment: 9 pages (8 pages in IEEE published version), conference paper
Stabilization by Controllers Having Integer Coefficients
The system property of ``having integer coefficients,'' that is, a transfer function has an integer monic polynomial as its denominator, is significant in the field of encrypted control as it is required for a dynamic controller to be realized over encrypted data. This paper shows that there always exists a controller with integer coefficients stabilizing a given discrete-time linear time-invariant plant. A constructive algorithm to obtain such a controller is provided, along with numerical examples. Furthermore, the proposed method is applied to converting a pre-designed controller to have integer coefficients, while the original performance is preserved in the sense that the transfer function of the closed-loop system remains unchanged.
An Analytical and Experimental Study of Distributed Uplink Beamforming in the Presence of Carrier Frequency Offsets
Realizing distributed multi-user beamforming (D-MUBF) in time division duplex (TDD)-based multi-user MIMO (MU-MIMO) systems faces significant challenges. One of the most fundamental challenges is achieving accurate over-the-air (OTA) timing and frequency synchronization among distributed access points (APs), particularly due to residual frequency offsets caused by local oscillator (LO) drifts. Despite decades of research on synchronization for MU-MIMO, there are only a few experimental studies that evaluate D-MUBF techniques under imperfect frequency synchronization among distributed antennas. This paper presents an analytical and experimental assessment of D-MUBF methods in the presence of frequency synchronization errors. We provide closed-form expressions for signal-to-interference-plus-noise ratio (SINR) as a function of channel characteristics and statistical properties of carrier frequency offset (CFO) among AP antennas. In addition, through experimental evaluations conducted with the RENEW massive MIMO testbed, we collected comprehensive datasets across various experimental scenarios. These datasets comprise uplink pilot samples for channel and CFO estimation, in addition to uplink multi-user data intended for analyzing D-MUBF techniques. By examining these datasets, we assess the performance of D-MUBF in the presence of CFO and compare the analytical predictions with empirical measurements. Furthermore, we make the datasets publicly available and provide insights on utilizing them for future research endeavors.
comment: 15 pages, 10 figures. v2: accepted to IEEE Transactions on Vehicular Technology; camera-ready version. DOI to be added
Retail electricity costs and emissions incentives are misaligned for commercial and industrial power consumers
Electrification is contributing to substantial growth in U.S. commercial and industrial loads, but the cost and Scope 2 carbon emission implications of this load growth are opaque for both power consumers and utilities. This work describes a unique spatiotemporally resolved data set of U.S. electricity costs and emissions and applies time series approximation methods to quantify the alignment of electricity cost and emission incentives for large commercial and industrial consumers. We present a comprehensive spatiotemporal dataset of U.S. price-based demand response (i.e., tariff) and incentive-based demand response programs, enabling direct comparison to previously published marginal emission factor, average emission factor, and day-ahead market prices. We resolved the structural incompatibility and fragmentation of these datasets by developing time series approximations of discrete data and unifying geospatially heterogeneous datasets. Analysis of these datasets reveals significant spatial and temporal heterogeneity in cost and carbon emissions incentives for demand-side energy flexibility, underscoring the importance of site selection as a key factor influencing power costs and Scope 2 emissions. Analysis also reveals broad misalignment of economic and emissions incentives under existing electricity tariff structures, meaning tariffs are incentivizing consumption of more carbon-intensive electricity, and highlighting potential barriers to electrification delivering carbon savings.
comment: Main manuscript has 25 pages, 6 figures, and 1 table. Supplementary information has 15 pages, 7 figures, and 1 table
Robotics
Learning Semantic-Geometric Task Graph-Representations from Human Demonstrations
Learning structured task representations from human demonstrations is essential for understanding long-horizon manipulation behaviors, particularly in bimanual settings where action ordering, object involvement, and interaction geometry can vary significantly. A key challenge lies in jointly capturing the discrete semantic structure of tasks and the temporal evolution of object-centric geometric relations in a form that supports reasoning over task progression. In this work, we introduce a semantic-geometric task graph-representation that encodes object identities, inter-object relations, and their temporal geometric evolution from human demonstrations. Building on this formulation, we propose a learning framework that combines a Message Passing Neural Network (MPNN) encoder with a Transformer-based decoder, decoupling scene representation learning from action-conditioned reasoning about task progression. The encoder operates solely on temporal scene graphs to learn structured representations, while the decoder conditions on action-context to predict future action sequences, associated objects, and object motions over extended time horizons. Through extensive evaluation on human demonstration datasets, we show that semantic-geometric task graph-representations are particularly beneficial for tasks with high action and object variability, where simpler sequence-based models struggle to capture task progression. Finally, we demonstrate that task graph representations can be transferred to a physical bimanual robot and used for online action selection, highlighting their potential as reusable task abstractions for downstream decision-making in manipulation systems.
comment: 9 pages, 7 figures, preprint
Learning-Based Shrinking Disturbance-Invariant Tubes for State- and Input-Dependent Uncertainty
We develop a learning-based framework for constructing shrinking disturbance-invariant tubes under state- and input-dependent uncertainty, intended as a building block for tube Model Predictive Control (MPC), and certify safety via a lifted, isotone (order-preserving) fixed-point map. Gaussian Process (GP) posteriors become $(1-α)$ credible ellipsoids, then polytopic outer sets for deterministic set operations. A two-time-scale scheme separates learning epochs, where these polytopes are frozen, from an inner, outside-in iteration that converges to a compact fixed point $Z^\star\!\subseteq\!\mathcal G$; its state projection is RPI for the plant. As data accumulate, disturbance polytopes tighten, and the associated tubes nest monotonically, resolving the circular dependence between the set to be verified and the disturbance model while preserving hard constraints. A double-integrator study illustrates shrinking tube cross-sections in data-rich regions while maintaining invariance.
The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents
Recently, with the rapid development of robot learning and imitation learning, numerous datasets and methods have emerged. However, these datasets and their task designs often lack systematic consideration and principles. This raises important questions: Do the current datasets and task designs truly advance the capabilities of robotic agents? Do evaluations on a few common tasks accurately reflect the differentiated performance of various methods proposed by different teams and evaluated on different tasks? To address these issues, we introduce the Great March 100 (\textbf{GM-100}) as the first step towards a robot learning Olympics. GM-100 consists of 100 carefully designed tasks that cover a wide range of interactions and long-tail behaviors, aiming to provide a diverse and challenging set of tasks to comprehensively evaluate the capabilities of robotic agents and promote diversity and complexity in robot dataset task designs. These tasks are developed through systematic analysis and expansion of existing task designs, combined with insights from human-object interaction primitives and object affordances. We collect a large amount of trajectory data on different robotic platforms and evaluate several baseline models. Experimental results demonstrate that the GM-100 tasks are 1) feasible to execute and 2) sufficiently challenging to effectively differentiate the performance of current VLA models. Our data and code are available at https://rhos.ai/research/gm-100.
ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
Vision-Language-Action (VLA) models have emerged as essential generalist robot policies for diverse manipulation tasks, conventionally relying on directly translating multimodal inputs into actions via Vision-Language Model (VLM) embeddings. Recent advancements have introduced explicit intermediary reasoning, such as sub-task prediction (language) or goal image synthesis (vision), to guide action generation. However, these intermediate reasoning are often indirect and inherently limited in their capacity to convey the full, granular information required for precise action execution. Instead, we posit that the most effective form of reasoning is one that deliberates directly in the action space. We introduce Action Chain-of-Thought (ACoT), a paradigm where the reasoning process itself is formulated as a structured sequence of coarse action intents that guide the final policy. In this paper, we propose ACoT-VLA, a novel architecture that materializes the ACoT paradigm. Specifically, we introduce two complementary components: an Explicit Action Reasoner (EAR) and Implicit Action Reasoner (IAR). The former proposes coarse reference trajectories as explicit action-level reasoning steps, while the latter extracts latent action priors from internal representations of multimodal input, co-forming an ACoT that conditions the downstream action head to enable grounded policy learning. Extensive experiments in real-world and simulation environments demonstrate the superiority of our proposed method, which achieves 98.5%, 84.1%, and 47.4% on LIBERO, LIBERO-Plus and VLABench, respectively.
The Mini Wheelbot Dataset: High-Fidelity Data for Robot Learning
The development of robust learning-based control algorithms for unstable systems requires high-quality, real-world data, yet access to specialized robotic hardware remains a significant barrier for many researchers. This paper introduces a comprehensive dynamics dataset for the Mini Wheelbot, an open-source, quasi-symmetric balancing reaction wheel unicycle. The dataset provides 1 kHz synchronized data encompassing all onboard sensor readings, state estimates, ground-truth poses from a motion capture system, and third-person video logs. To ensure data diversity, we include experiments across multiple hardware instances and surfaces using various control paradigms, including pseudo-random binary excitation, nonlinear model predictive control, and reinforcement learning agents. We include several example applications in dynamics model learning, state estimation, and time-series classification to illustrate common robotics algorithms that can be benchmarked on our dataset.
Distributed Control Barrier Functions for Safe Multi-Vehicle Navigation in Heterogeneous USV Fleets
Collision avoidance in heterogeneous fleets of uncrewed vessels is challenging because the decision-making processes and controllers often differ between platforms, and it is further complicated by the limitations on sharing trajectories and control values in real-time. This paper presents a pragmatic approach that addresses these issues by adding a control filter on each autonomous vehicle that assumes worst-case behavior from other contacts, including crewed vessels. This distributed safety control filter is developed using control barrier function (CBF) theory and the application is clearly described to ensure explainability of these safety-critical methods. This work compares the worst-case CBF approach with a Collision Regulations (COLREGS) behavior-based approach in simulated encounters. Real-world experiments with three different uncrewed vessels and a human operated vessel were performed to confirm the approach is effective across a range of platforms and is robust to uncooperative behavior from human operators. Results show that combining both CBF methods and COLREGS behaviors achieves the best safety and efficiency.
comment: 8 pages, 10 figures
Skill-Aware Diffusion for Generalizable Robotic Manipulation
Robust generalization in robotic manipulation is crucial for robots to adapt flexibly to diverse environments. Existing methods usually improve generalization by scaling data and networks, but model tasks independently and overlook skill-level information. Observing that tasks within the same skill share similar motion patterns, we propose Skill-Aware Diffusion (SADiff), which explicitly incorporates skill-level information to improve generalization. SADiff learns skill-specific representations through a skill-aware encoding module with learnable skill tokens, and conditions a skill-constrained diffusion model to generate object-centric motion flow. A skill-retrieval transformation strategy further exploits skill-specific trajectory priors to refine the mapping from 2D motion flow to executable 3D actions. Furthermore, we introduce IsaacSkill, a high-fidelity dataset containing fundamental robotic skills for comprehensive evaluation and sim-to-real transfer. Experiments in simulation and real-world settings show that SADiff achieves good performance and generalization across various manipulation tasks. Code, data, and videos are available at https://sites.google.com/view/sa-diff.
VLAgents: A Policy Server for Efficient VLA Inference
The rapid emergence of Vision-Language-Action models (VLAs) has a significant impact on robotics. However, their deployment remains complex due to the fragmented interfaces and the inherent communication latency in distributed setups. To address this, we introduce VLAgents, a modular policy server that abstracts VLA inferencing behind a unified Gymnasium-style protocol. Crucially, its communication layer transparently adapts to the context by supporting both zero-copy shared memory for high-speed simulation and compressed streaming for remote hardware. In this work, we present the architecture of VLAgents and validate it by integrating seven policies -- including OpenVLA and Pi Zero. In a benchmark with both local and remote communication, we further demonstrate how it outperforms the default policy servers provided by OpenVLA, OpenPi, and LeRobot. VLAgents is available at https://github.com/RobotControlStack/vlagents
Adaptive Monitoring of Stochastic Fire Front Processes via Information-seeking Predictive Control
We consider the problem of adaptively monitoring a wildfire front using a mobile agent (e.g., a drone), whose trajectory determines where sensor data is collected and thus influences the accuracy of fire propagation estimation. This is a challenging problem, as the stochastic nature of wildfire evolution requires the seamless integration of sensing, estimation, and control, often treated separately in existing methods. State-of-the-art methods either impose linear-Gaussian assumptions to establish optimality or rely on approximations and heuristics, often without providing explicit performance guarantees. To address these limitations, we formulate the fire front monitoring task as a stochastic optimal control problem that integrates sensing, estimation, and control. We derive an optimal recursive Bayesian estimator for a class of stochastic nonlinear elliptical-growth fire front models. Subsequently, we transform the resulting nonlinear stochastic control problem into a finite-horizon Markov decision process and design an information-seeking predictive control law obtained via a lower confidence bound-based adaptive search algorithm with asymptotic convergence to the optimal policy.
comment: 2025 IEEE 64th Conference on Decision and Control (CDC)
Learning Quadrupedal Locomotion for a Heavy Hydraulic Robot Using an Actuator Model
The simulation-to-reality (sim-to-real) transfer of large-scale hydraulic robots presents a significant challenge in robotics because of the inherent slow control response and complex fluid dynamics. The complex dynamics result from the multiple interconnected cylinder structure and the difference in fluid rates of the cylinders. These characteristics complicate detailed simulation for all joints, making it unsuitable for reinforcement learning (RL) applications. In this work, we propose an analytical actuator model driven by hydraulic dynamics to represent the complicated actuators. The model predicts joint torques for all 12 actuators in under 1 microsecond, allowing rapid processing in RL environments. We compare our model with neural network-based actuator models and demonstrate the advantages of our model in data-limited scenarios. The locomotion policy trained in RL with our model is deployed on a hydraulic quadruped robot, which is over 300 kg. This work is the first demonstration of a successful transfer of stable and robust command-tracking locomotion with RL on a heavy hydraulic quadruped robot, demonstrating advanced sim-to-real transferability.
comment: 9 pages, Accepted to IEEE Robotics and Automation Letters (RA-L) 2025
Visual Marker Search for Autonomous Drone Landing in Diverse Urban Environments
Marker-based landing is widely used in drone delivery and return-to-base systems for its simplicity and reliability. However, most approaches assume idealized landing site visibility and sensor performance, limiting robustness in complex urban settings. We present a simulation-based evaluation suite on the AirSim platform with systematically varied urban layouts, lighting, and weather to replicate realistic operational diversity. Using onboard camera sensors (RGB for marker detection and depth for obstacle avoidance), we benchmark two heuristic coverage patterns and a reinforcement learning-based agent, analyzing how exploration strategy and scene complexity affect success rate, path efficiency, and robustness. Results underscore the need to evaluate marker-based autonomous landing under diverse, sensor-relevant conditions to guide the development of reliable aerial navigation systems.
A3D: Adaptive Affordance Assembly with Dual-Arm Manipulation AAAI2026
Furniture assembly is a crucial yet challenging task for robots, requiring precise dual-arm coordination where one arm manipulates parts while the other provides collaborative support and stabilization. To accomplish this task more effectively, robots need to actively adapt support strategies throughout the long-horizon assembly process, while also generalizing across diverse part geometries. We propose A3D, a framework which learns adaptive affordances to identify optimal support and stabilization locations on furniture parts. The method employs dense point-level geometric representations to model part interaction patterns, enabling generalization across varied geometries. To handle evolving assembly states, we introduce an adaptive module that uses interaction feedback to dynamically adjust support strategies during assembly based on previous interactions. We establish a simulation environment featuring 50 diverse parts across 8 furniture types, designed for dual-arm collaboration evaluation. Experiments demonstrate that our framework generalizes effectively to diverse part geometries and furniture categories in both simulation and real-world settings.
comment: AAAI2026 oral
H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning
In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (LLMs) show promise in instruction parsing and preliminary planning, they exhibit limitations in long-term reasoning and dynamic multi-robot coordination. We propose Hierarchical Autonomous Intelligent Multi-Robot Planning(H-AIM), a novel embodied multi-robot task planning framework that addresses these issues through a three-stage cascaded architecture: 1) It leverages an LLM to parse instructions and generate Planning Domain Definition Language (PDDL) problem descriptions, thereby transforming commands into formal planning problems; 2) It combines the semantic reasoning of LLMs with the search capabilities of a classical planner to produce optimized action sequences; 3) It compiles the resulting plan into behavior trees for reactive control. The framework supports dynamically sized heterogeneous robot teams via a shared blackboard mechanism for communication and state synchronization. To validate our approach, we introduce the MACE-THOR benchmark dataset, comprising 42 complex tasks across 8 distinct household layouts. Experimental results demonstrate that H-AIM achieves a remarkable performance improvement, elevating the task success rate from 12% to 55% and boosting the goal condition recall from 32% to 72% against the strongest baseline, LaMMA-P.
Haptic Light-Emitting Diodes: Miniature, Luminous Tactile Actuators
We present Haptic Light-Emitting Diodes (HLEDs), luminous thermopneumatic actuators that directly convert pulsed light into mechanical forces and displacements. Each device packages a miniature surface-mount LED in a gas-filled cavity that contains a low-inertia graphite photoabsorber. The cavity is sealed by an elastic membrane, which functions as a working diaphragm. Brief optical pulses heat the photoabsorber, which heats the gas. The resulting rapid pressure increases generate forces and displacements at the working diaphragm. Millimeter-scale HLEDs produce forces exceeding 0.4 N and displacements of 1 mm at low voltages, with 5 to 100 ms response times, making them attractive as actuators providing tactile feedback in human-machine interfaces. Perceptual testing revealed that the strength of tactile feedback increased linearly with optical power. HLEDs devices are mechanically simple and efficient to fabricate. Unusually, these actuators are also light-emitting, as a fraction of optical energy is transmitted through the membrane. These opto-mechanical actuators have many potential applications in tactile displays, human interface engineering, wearable computing, and other areas.
Crane Lowering Guidance Using a Attachable Camera Module for Driver Vision Support
Cranes have long been essential equipment for lifting and placing heavy loads in construction projects. This study focuses on the lowering phase of crane operation, the stage in which the load is moved to the desired location. During this phase, a constant challenge exists: the load obstructs the operator's view of the landing point. As a result, operators traditionally have to rely on verbal or gestural instructions from ground personnel, which significantly impacts site safety. To alleviate this constraint, the proposed system incorporates a attachable camera module designed to be attached directly to the load via a suction cup. This module houses a single-board computer, battery, and compact camera. After installation, it streams and processes images of the ground directly below the load in real time to generate installation guidance. Simultaneously, this guidance is transmitted to and monitored by a host computer. Preliminary experiments were conducted by attaching this module to a test object, confirming the feasibility of real-time image acquisition and transmission. This approach has the potential to significantly improve safety on construction sites by providing crane operators with an instant visual reference of hidden landing zones.
comment: Presented at ICCR 2025(International COnference on Control and Robotics 2025). Submitted to the IEEE for possible publication
Where to Touch, How to Contact: Hierarchical RL-MPC Framework for Geometry-Aware Long-Horizon Dexterous Manipulation
A key challenge in contact-rich dexterous manipulation is the need to jointly reason over geometry, kinematic constraints, and intricate, nonsmooth contact dynamics. End-to-end visuomotor policies bypass this structure, but often require large amounts of data, transfer poorly from simulation to reality, and generalize weakly across tasks/embodiments. We address those limitations by leveraging a simple insight: dexterous manipulation is inherently hierarchical - at a high level, a robot decides where to touch (geometry) and move the object (kinematics); at a low level it determines how to realize that plan through contact dynamics. Building on this insight, we propose a hierarchical RL--MPC framework in which a high-level reinforcement learning (RL) policy predicts a contact intention, a novel object-centric interface that specifies (i) an object-surface contact location and (ii) a post-contact object-level subgoal pose. Conditioned on this contact intention, a low-level contact-implicit model predictive control (MPC) optimizes local contact modes and replans with contact dynamics to generate robot actions that robustly drive the object toward each subgoal. We evaluate the framework on non-prehensile tasks, including geometry-generalized pushing and object 3D reorientation. It achieves near-100% success with substantially reduced data (10x less than end-to-end baselines), highly robust performance, and zero-shot sim-to-real transfer.
comment: 13 Pages, Plan to submit RSS
Three Dimensional Hydrodynamic Flow-Based Collision Avoidance for UAV Formations Facing Emergent Dynamic Obstacles
This paper presents a three-dimensional, hydrodynamics-inspired collision avoidance framework for uncrewed aerial vehicle (UAV) formations operating in dynamic environments. When moving obstacles enter a UAV's sensing region, they are modeled as three dimensional doublets or ellipsoids that generate local velocity fields, guiding nearby UAVs to execute smooth, collision-free maneuvers without trajectory discontinuities or explicit trajectory replanning. This flow-based approach enables real-time operation and interpretable behavior by leveraging the nature of fluid flow around obstacles via the harmonic properties of Laplace's equation, inherently avoiding local minima common in traditional potential field methods. To establish and maintain coordination among the UAVs, a Virtual Rigid Body (VRB) formation strategy is integrated, ensuring that formation geometry and trajectory tracking are preserved. Simulation results demonstrate the feasibility and scalability of the method for both individual and multi-UAV scenarios with multiple formation geometries encountering moving obstacles. The proposed approach achieves safe, smooth, and computationally efficient avoidance maneuvers suitable for real-time and practical applications.
comment: 18 pages, 15 figures
A Hybrid Soft Haptic Display for Rendering Lump Stiffness in Remote Palpation
Remote palpation enables noninvasive tissue examination in telemedicine, yet current tactile displays often lack the fidelity to convey both large-scale forces and fine spatial details. This study introduces a hybrid fingertip display comprising a rigid platform and a $4\times4$ soft pneumatic tactile display (4.93 mm displacement and 1.175 N per single pneumatic chamber) to render a hard lump beneath soft tissue. This study compares three rendering strategies: a Platform-Only baseline that renders the total interaction force; a Hybrid A (Position + Force Feedback) strategy that adds a dynamic, real-time soft spatial cue; and a Hybrid B (Position + Preloaded Stiffness Feedback) strategy that provides a constant, pre-calculated soft spatial cue. In a 12-participant lump detection study, both hybrid methods dramatically improved accuracy over the Platform-Only baseline (from 50\% to over 95\%). While the Hybrid B was highlighted qualitatively for realism, its event-based averaging is expected to increase interaction latency in real-time operation. This suggests a trade-off between perceived lump realism and real-time responsiveness, such that rendering choices that enhance realism may conflict with those that minimize latency.
comment: Paper manuscript has been accepted by 2026 IEEE Haptics Symposium
Optimal Thruster Configuration for 6-DOF Control of a Small Satellite
With the growing deployment of small satellites (such as CubeSats, Nanosats, Picosats, and Femtosats) in Low Earth Orbit (LEO) for targeted applications like imaging, communication, data storage, and rendezvous-docking mission, there is increasing attention on orbit maintenance and attitude control. A common approach for active orbit control involves the use of multiple thrusters, which, when properly arranged, can also generate the required torque for attitude control. Starting from a 24-thruster configuration, this paper presents a set of thruster configurations (referred to as a viable configuration group) that enable full six degrees of freedom (6-DOF) control. Further, configuration group that requires minimum total thrust to achieve 6-DOF commands are found among the viable configuration group. One configuration from each of these groups is further evaluated for its attitude control performance through a representative rendezvous-docking mission, demonstrating that even with a reduced thruster count, sufficient maneuverability can be achieved.
comment: 19 pages, 9 figures
RobotDesignGPT: Automated Robot Design Synthesis using Vision Language Models
Robot design is a nontrivial process that involves careful consideration of multiple criteria, including user specifications, kinematic structures, and visual appearance. Therefore, the design process often relies heavily on domain expertise and significant human effort. The majority of current methods are rule-based, requiring the specification of a grammar or a set of primitive components and modules that can be composed to create a design. We propose a novel automated robot design framework, RobotDesignGPT, that leverages the general knowledge and reasoning capabilities of large pre-trained vision-language models to automate the robot design synthesis process. Our framework synthesizes an initial robot design from a simple user prompt and a reference image. Our novel visual feedback approach allows us to greatly improve the design quality and reduce unnecessary manual feedback. We demonstrate that our framework can design visually appealing and kinematically valid robots inspired by nature, ranging from legged animals to flying creatures. We justify the proposed framework by conducting an ablation study and a user study.
Physics-Constrained Denoising Autoencoders for Data-Scarce Wildfire UAV Sensing
Wildfire monitoring requires high-resolution atmospheric measurements, yet low-cost sensors on Unmanned Aerial Vehicles (UAVs) exhibit baseline drift, cross-sensitivity, and response lag that corrupt concentration estimates. Traditional deep learning denoising approaches demand large datasets impractical to obtain from limited UAV flight campaigns. We present PC$^2$DAE, a physics-informed denoising autoencoder that addresses data scarcity by embedding physical constraints directly into the network architecture. Non-negative concentration estimates are enforced via softplus activations and physically plausible temporal smoothing, ensuring outputs are physically admissible by construction rather than relying on loss function penalties. The architecture employs hierarchical decoder heads for Black Carbon, Gas, and CO$_2$ sensor families, with two variants: PC$^2$DAE-Lean (21k parameters) for edge deployment and PC$^2$DAE-Wide (204k parameters) for offline processing. We evaluate on 7,894 synchronized 1 Hz samples collected from UAV flights during prescribed burns in Saskatchewan, Canada (approximately 2.2 hours of flight data), two orders of magnitude below typical deep learning requirements. PC$^2$DAE-Lean achieves 67.3\% smoothness improvement and 90.7\% high-frequency noise reduction with zero physics violations. Five baselines (LSTM-AE, U-Net, Transformer, CBDAE, DeSpaWN) produce 15--23\% negative outputs. The lean variant outperforms wide (+5.6\% smoothness), suggesting reduced capacity with strong inductive bias prevents overfitting in data-scarce regimes. Training completes in under 65 seconds on consumer hardware.
UAV-Based Infrastructure Inspections: A Literature Review and Proposed Framework for AEC+FM
Unmanned Aerial Vehicles (UAVs) are transforming infrastructure inspections in the Architecture, Engineering, Construction, and Facility Management (AEC+FM) domain. By synthesizing insights from over 150 studies, this review paper highlights UAV-based methodologies for data acquisition, photogrammetric modeling, defect detection, and decision-making support. Key innovations include path optimization, thermal integration, and advanced machine learning (ML) models such as YOLO and Faster R-CNN for anomaly detection. UAVs have demonstrated value in structural health monitoring (SHM), disaster response, urban infrastructure management, energy efficiency evaluations, and cultural heritage preservation. Despite these advancements, challenges in real-time processing, multimodal data fusion, and generalizability remain. A proposed workflow framework, informed by literature and a case study, integrates RGB imagery, LiDAR, and thermal sensing with transformer-based architectures to improve accuracy and reliability in detecting structural defects, thermal anomalies, and geometric inconsistencies. The proposed framework ensures precise and actionable insights by fusing multimodal data and dynamically adapting path planning for complex environments, presented as a comprehensive step-by-step guide to address these challenges effectively. This paper concludes with future research directions emphasizing lightweight AI models, adaptive flight planning, synthetic datasets, and richer modality fusion to streamline modern infrastructure inspections.
comment: Accepted for publication at the International Conference on Construction Engineering and Management (I3CE 2025)
Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training NeurIPS 2025
Behavior cloning has shown promise for robot manipulation, but real-world demonstrations are costly to acquire at scale. While simulated data offers a scalable alternative, particularly with advances in automated demonstration generation, transferring policies to the real world is hampered by various simulation and real domain gaps. In this work, we propose a unified sim-and-real co-training framework for learning generalizable manipulation policies that primarily leverages simulation and only requires a few real-world demonstrations. Central to our approach is learning a domain-invariant, task-relevant feature space. Our key insight is that aligning the joint distributions of observations and their corresponding actions across domains provides a richer signal than aligning observations (marginals) alone. We achieve this by embedding an Optimal Transport (OT)-inspired loss within the co-training framework, and extend this to an Unbalanced OT framework to handle the imbalance between abundant simulation data and limited real-world examples. We validate our method on challenging manipulation tasks, showing it can leverage abundant simulation data to achieve up to a 30% improvement in the real-world success rate and even generalize to scenarios seen only in simulation. Project webpage: https://ot-sim2real.github.io/.
comment: Accepted to NeurIPS 2025
Probabilistic Mission Design for Neuro-Symbolic Unmanned Aircraft Systems
Advanced Air Mobility (AAM) is a growing field that demands accurate and trustworthy models of legal concepts and restrictions for navigating Unmanned Aircraft Systems (UAS). In addition, any implementation of AAM needs to face the challenges posed by inherently dynamic and uncertain human-inhabited spaces robustly. Nevertheless, the employment of UAS beyond visual line of sight (BVLOS) is an endearing task that promises to significantly enhance today's logistics and emergency response capabilities. Hence, we propose Probabilistic Mission Design (ProMis), a novel neuro-symbolic approach to navigating UAS within legal frameworks. ProMis is an interpretable and adaptable system architecture that links uncertain geospatial data and noisy perception with declarative, Hybrid Probabilistic Logic Programs (HPLP) to reason over the agent's state space and its legality. To inform planning with legal restrictions and uncertainty in mind, ProMis yields Probabilistic Mission Landscapes (PML). These scalar fields quantify the belief that the HPLP is satisfied across the agent's state space. Extending prior work on ProMis' reasoning capabilities and computational characteristics, we show its integration with potent machine learning models such as Large Language Models (LLM) and Transformer-based vision models. Hence, our experiments underpin the application of ProMis with multi-modal input data and how our method applies to many AAM scenarios.
comment: arXiv admin note: text overlap with arXiv:2406.03454
Vision-Conditioned Variational Bayesian Last Layer Dynamics Models
Agile control of robotic systems often requires anticipating how the environment affects system behavior. For example, a driver must perceive the road ahead to anticipate available friction and plan actions accordingly. Achieving such proactive adaptation within autonomous frameworks remains a challenge, particularly under rapidly changing conditions. Traditional modeling approaches often struggle to capture abrupt variations in system behavior, while adaptive methods are inherently reactive and may adapt too late to ensure safety. We propose a vision-conditioned variational Bayesian last-layer dynamics model that leverages visual context to anticipate changes in the environment. The model first learns nominal vehicle dynamics and is then fine-tuned with feature-wise affine transformations of latent features, enabling context-aware dynamics prediction. The resulting model is integrated into an optimal controller for vehicle racing. We validate our method on a Lexus LC500 racing through water puddles. With vision-conditioning, the system completed all 12 attempted laps under varying conditions. In contrast, all baselines without visual context consistently lost control, demonstrating the importance of proactive dynamics adaptation in high-performance applications.
comment: 9 pages, 7 figures, currently under review
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
comment: Presented at the 13th International Conference on Robot Intelligence Technology and Applications
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics NeurIPS 2025
Large Vision-Language Models (LVLMs) have recently shown great promise in advancing robotics by combining embodied reasoning with robot control. A common approach involves training on embodied reasoning tasks related to robot control using Supervised Fine-Tuning (SFT). However, SFT datasets are often heuristically constructed and not explicitly optimized for improving robot control. Furthermore, SFT often leads to issues such as catastrophic forgetting and reduced generalization performance. To address these limitations, we introduce Robot-R1, a novel framework that leverages reinforcement learning to enhance embodied reasoning specifically for robot control. Robot-R1 learns to predict the next keypoint state required for task completion, conditioned on the current scene image and environment metadata derived from expert demonstrations. Inspired by the DeepSeek-R1 learning approach, Robot-R1 samples reasoning-based responses and reinforces those that lead to more accurate predictions. To rigorously evaluate Robot-R1, we also introduce a new benchmark that demands the diverse embodied reasoning capabilities for the task. Our experiments show that models trained with Robot-R1 outperform SFT methods on embodied reasoning tasks. Despite having only 7B parameters, Robot-R1 even surpasses GPT-4o on reasoning tasks related to low-level action control, such as spatial and movement reasoning.
comment: NeurIPS 2025
Collaborative Representation Learning for Alignment of Tactile, Language, and Vision Modalities
Tactile sensing offers rich and complementary information to vision and language, enabling robots to perceive fine-grained object properties. However, existing tactile sensors lack standardization, leading to redundant features that hinder cross-sensor generalization. Moreover, existing methods fail to fully integrate the intermediate communication among tactile, language, and vision modalities. To address this, we propose TLV-CoRe, a CLIP-based Tactile-Language-Vision Collaborative Representation learning method. TLV-CoRe introduces a Sensor-Aware Modulator to unify tactile features across different sensors and employs tactile-irrelevant decoupled learning to disentangle irrelevant tactile features. Additionally, a Unified Bridging Adapter is introduced to enhance tri-modal interaction within the shared representation space. To fairly evaluate the effectiveness of tactile models, we further propose the RSS evaluation framework, focusing on Robustness, Synergy, and Stability across different methods. Experimental results demonstrate that TLV-CoRe significantly improves sensor-agnostic representation learning and cross-modal alignment, offering a new direction for multimodal tactile representation.
SceneFoundry: Generating Interactive Infinite 3D Worlds
The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existing generative approaches often fail to capture the functional complexity of real-world interiors, particularly those containing articulated objects with movable parts essential for manipulation and navigation. This paper presents SceneFoundry, a language-guided diffusion framework that generates apartment-scale 3D worlds with functionally articulated furniture and semantically diverse layouts for robotic training. From natural language prompts, an LLM module controls floor layout generation, while diffusion-based posterior sampling efficiently populates the scene with articulated assets from large-scale 3D repositories. To ensure physical usability, SceneFoundry employs differentiable guidance functions to regulate object quantity, prevent articulation collisions, and maintain sufficient walkable space for robotic navigation. Extensive experiments demonstrate that our framework generates structurally valid, semantically coherent, and functionally interactive environments across diverse scene types and conditions, enabling scalable embodied AI research. project page: https://anc891203.github.io/SceneFoundry-Demo/
comment: 15 pages
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: This work has been submitted to the IEEE for possible publication. 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository under https://github.com/kdjebko/lelar-in-orbit-data
Off Policy Lyapunov Stability in Reinforcement Learning
Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
comment: Conference on Robot Learning (CORL) 2025
EqVIO: An Equivariant Filter for Visual Inertial Odometry
Visual-Inertial Odometry (VIO) is the problem of estimating a robot's trajectory by combining information from an inertial measurement unit (IMU) and a camera, and is of great interest to the robotics community. This paper develops a novel Lie group symmetry for the VIO problem and applies the recently proposed equivariant filter. The proposed symmetry is compatible with the invariance of the VIO reference frame, leading to improved filter consistency. The bias-free IMU dynamics are group-affine, ensuring that filter linearisation errors depend only on the bias estimation error and measurement noise. Furthermore, visual measurements are equivariant with respect to the symmetry, enabling the application of the higher-order equivariant output approximation to reduce approximation error in the filter update equation. As a result, the equivariant filter (EqF) based on this Lie group is a consistent estimator for VIO with lower linearisation error in the propagation of state dynamics and a higher order equivariant output approximation than standard formulations. Experimental results on the popular EuRoC and UZH FPV datasets demonstrate that the proposed system outperforms other state-of-the-art VIO algorithms in terms of both speed and accuracy.
comment: 28 pages, 17 figures, published in IEEE TRO
From Human Bias to Robot Choice: How Occupational Contexts and Racial Priming Shape Robot Selection
As artificial agents increasingly integrate into professional environments, fundamental questions have emerged about how societal biases influence human-robot selection decisions. We conducted two comprehensive experiments (N = 1,038) examining how occupational contexts and stereotype activation shape robotic agent choices across construction, healthcare, educational, and athletic domains. Participants made selections from artificial agents that varied systematically in skin tone and anthropomorphic characteristics. Our study revealed distinct context-dependent patterns. Healthcare and educational scenarios demonstrated strong favoritism toward lighter-skinned artificial agents, while construction and athletic contexts showed greater acceptance of darker-toned alternatives. Participant race was associated with systematic differences in selection patterns across professional domains. The second experiment demonstrated that exposure to human professionals from specific racial backgrounds systematically shifted later robotic agent preferences in stereotype-consistent directions. These findings show that occupational biases and color-based discrimination transfer directly from human-human to human-robot evaluation contexts. The results highlight mechanisms through which robotic deployment may unintentionally perpetuate existing social inequalities.
comment: HRI '26
Multiagent Systems
The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
The integration of AI agents into economic markets fundamentally alters the landscape of strategic interaction. We investigate the economic implications of expanding the set of available technologies in three canonical game-theoretic settings: bargaining (resource division), negotiation (asymmetric information trade), and persuasion (strategic information transmission). We find that simply increasing the choice of AI delegates can drastically shift equilibrium payoffs and regulatory outcomes, often creating incentives for regulators to proactively develop and release technologies. Conversely, we identify a strategic phenomenon termed the "Poisoned Apple" effect: an agent may release a new technology, which neither they nor their opponent ultimately uses, solely to manipulate the regulator's choice of market design in their favor. This strategic release improves the releaser's welfare at the expense of their opponent and the regulator's fairness objectives. Our findings demonstrate that static regulatory frameworks are vulnerable to manipulation via technology expansion, necessitating dynamic market designs that adapt to the evolving landscape of AI capabilities.
Can Small Agent Collaboration Beat a Single Big LLM?
This report studies whether small, tool-augmented agents can match or outperform larger monolithic models on the GAIA benchmark. Using Qwen3 models (4B-32B) within an adapted Agentic-Reasoning framework, we isolate the effects of model scale, explicit thinking (no thinking, planner-only, or full), and tool use (search, code, mind-map). Tool augmentation provides the largest and most consistent gains. Using tools, 4B models can outperform 32B models without tool access on GAIA in our experimental setup. In contrast, explicit thinking is highly configuration- and difficulty-dependent: planner-only thinking can improve decomposition and constraint tracking, while unrestricted full thinking often degrades performance by destabilizing tool orchestration, leading to skipped verification steps, excessive tool calls, non-termination, and output-format drift.
H-AIM: Orchestrating LLMs, PDDL, and Behavior Trees for Hierarchical Multi-Robot Planning
In embodied artificial intelligence, enabling heterogeneous robot teams to execute long-horizon tasks from high-level instructions remains a critical challenge. While large language models (LLMs) show promise in instruction parsing and preliminary planning, they exhibit limitations in long-term reasoning and dynamic multi-robot coordination. We propose Hierarchical Autonomous Intelligent Multi-Robot Planning(H-AIM), a novel embodied multi-robot task planning framework that addresses these issues through a three-stage cascaded architecture: 1) It leverages an LLM to parse instructions and generate Planning Domain Definition Language (PDDL) problem descriptions, thereby transforming commands into formal planning problems; 2) It combines the semantic reasoning of LLMs with the search capabilities of a classical planner to produce optimized action sequences; 3) It compiles the resulting plan into behavior trees for reactive control. The framework supports dynamically sized heterogeneous robot teams via a shared blackboard mechanism for communication and state synchronization. To validate our approach, we introduce the MACE-THOR benchmark dataset, comprising 42 complex tasks across 8 distinct household layouts. Experimental results demonstrate that H-AIM achieves a remarkable performance improvement, elevating the task success rate from 12% to 55% and boosting the goal condition recall from 32% to 72% against the strongest baseline, LaMMA-P.
From Agent Simulation to Social Simulator: A Comprehensive Review (Part 2)
The study of system complexity primarily has two objectives: to explore underlying patterns and to develop theoretical explanations. Pattern exploration seeks to clarify the mechanisms behind the emergence of system complexity, while theoretical explanations aim to identify the fundamental causes of this complexity. Laws are generally defined as mappings between variables, whereas theories offer causal explanations of system behavior. Agent Based Modeling(ABM) is an important approach for studying complex systems, but it tends to emphasize simulation over experimentation. As a result, ABM often struggles to deeply uncover the governing operational principles. Unlike conventional scenario analysis that relies on human reasoning, computational experiments emphasize counterfactual experiments-that is, creating parallel worlds that simulate alternative "evolutionary paths" of real-world events. By systematically adjusting input variables and observing the resulting changes in output variables, computational experiments provide a robust tool for causal inference, thereby addressing the limitations of traditional ABM. Together, these methods offer causal insights into the dynamic evolution of systems. This part can help readers gain a preliminary understanding of the entire computational experiment method, laying the foundation for the subsequent study.
comment: This paper is Part II of a planned multi-part review series on "From Agent Simulation to Social Simulator". It is a self-contained article and can be read independently of Part I. Although the authors have previously published related work, this submission is not a revision or updated version of any earlier paper
Regret-Driven Portfolios: LLM-Guided Smart Clustering for Optimal Allocation
We attempt to mitigate the persistent tradeoff between risk and return in medium- to long-term portfolio management. This paper proposes a novel LLM-guided no-regret portfolio allocation framework that integrates online learning dynamics, market sentiment indicators, and large language model (LLM)-based hedging to construct high-Sharpe ratio portfolios tailored for risk-averse investors and institutional fund managers. Our approach builds on a follow-the-leader approach, enriched with sentiment-based trade filtering and LLM-driven downside protection. Empirical results demonstrate that our method outperforms a SPY buy-and-hold baseline by 69% in annualized returns and 119% in Sharpe ratio.
The incompatibility of the Condorcet winner and loser criteria with positive involvement and resolvability
We prove that there is no preferential voting method satisfying the Condorcet winner and loser criteria, positive involvement (if a candidate $x$ wins in an initial preference profile, then adding a voter who ranks $x$ uniquely first cannot cause $x$ to lose), and $n$-voter resolvability (if $x$ initially ties for winning, then $x$ can be made the unique winner by adding up to $n$ voters). In a previous note, we proved an analogous result assuming an additional axiom of ordinal margin invariance, which we now show is unnecessary for an impossibility theorem, at least if the desired voting method is defined for five-candidate elections.
comment: 6 pages, 2 figures
UCB-type Algorithm for Budget-Constrained Expert Learning
In many modern applications, a system must dynamically choose between several adaptive learning algorithms that are trained online. Examples include model selection in streaming environments, switching between trading strategies in finance, and orchestrating multiple contextual bandit or reinforcement learning agents. At each round, a learner must select one predictor among $K$ adaptive experts to make a prediction, while being able to update at most $M \le K$ of them under a fixed training budget. We address this problem in the \emph{stochastic setting} and introduce \algname{M-LCB}, a computationally efficient UCB-style meta-algorithm that provides \emph{anytime regret guarantees}. Its confidence intervals are built directly from realized losses, require no additional optimization, and seamlessly reflect the convergence properties of the underlying experts. If each expert achieves internal regret $\tilde O(T^α)$, then \algname{M-LCB} ensures overall regret bounded by $\tilde O\!\Bigl(\sqrt{\tfrac{KT}{M}} \;+\; (K/M)^{1-α}\,T^α\Bigr)$. To our knowledge, this is the first result establishing regret guarantees when multiple adaptive experts are trained simultaneously under per-round budget constraints. We illustrate the framework with two representative cases: (i) parametric models trained online with stochastic losses, and (ii) experts that are themselves multi-armed bandit algorithms. These examples highlight how \algname{M-LCB} extends the classical bandit paradigm to the more realistic scenario of coordinating stateful, self-learning experts under limited resources.
Incentive Mechanism Design for Privacy-Preserving Decentralized Blockchain Relayers
Public blockchains, though renowned for their transparency and immutability, suffer from significant privacy concerns. Network-level analysis and long-term observation of publicly available transactions can often be used to infer user identities. To mitigate this, several blockchain applications rely on relayers, which serve as intermediary nodes between users and smart contracts deployed on the blockchain. However, dependence on a single relayer not only creates a single point of failure but also introduces exploitable vulnerabilities that weaken the system's privacy guarantees. This paper proposes a decentralized relayer architecture that enhances privacy and reliability through game-theoretic incentive design. We model the interaction among relayers as a non-cooperative game and design an incentive mechanism in which probabilistic uploading emerges as a unique mixed Nash equilibrium. Using evolutionary game analysis, we demonstrate the equilibrium's stability against perturbations and coordinated deviations. Through numerical evaluations, we analyze how equilibrium strategies and system behavior evolve with key parameters such as the number of relayers, upload costs, rewards, and penalties. In particular, we show that even with high transaction costs, the system maintains reliability with an outage probability below 0.05 . Furthermore, our results highlight a fundamental trade-off between privacy, reliability, robustness, and cost in decentralized relayer systems.
comment: This work has been submitted to the IEEE for possible publication
Cascading multi-agent anomaly detection in surveillance systems via vision-language models and embedding-based classification
Intelligent anomaly detection in dynamic visual environments requires reconciling real-time performance with semantic interpretability. Conventional approaches address only fragments of this challenge. Reconstruction-based models capture low-level deviations without contextual reasoning, object detectors provide speed but limited semantics, and large vision-language systems deliver interpretability at prohibitive computational cost. This work introduces a cascading multi-agent framework that unifies these complementary paradigms into a coherent and interpretable architecture. Early modules perform reconstruction-gated filtering and object-level assessment, while higher-level reasoning agents are selectively invoked to interpret semantically ambiguous events. The system employs adaptive escalation thresholds and a publish-subscribe communication backbone, enabling asynchronous coordination and scalable deployment across heterogeneous hardware. Extensive evaluation on large-scale monitoring data demonstrates that the proposed cascade achieves a threefold reduction in latency compared to direct vision-language inference, while maintaining high perceptual fidelity (PSNR = 38.3 dB, SSIM = 0.965) and consistent semantic labeling. The framework advances beyond conventional detection pipelines by combining early-exit efficiency, adaptive multi-agent reasoning, and explainable anomaly attribution, establishing a reproducible and energy-efficient foundation for scalable intelligent visual monitoring.
comment: Author email changed, Acknowlegement changes
M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints
Generating molecules that satisfy precise numeric constraints over multiple physicochemical properties is critical and challenging. Although large language models (LLMs) are expressive, they struggle with precise multi-objective control and numeric reasoning without external structure and feedback. We introduce \textbf{M olGen}, a fragment-level, retrieval-augmented, two-stage framework for molecule generation under multi-property constraints. Stage I : Prototype generation: a multi-agent reasoner performs retrieval-anchored, fragment-level edits to produce a candidate near the feasible region. Stage II : RL-based fine-grained optimization: a fragment-level optimizer trained with Group Relative Policy Optimization (GRPO) applies one- or multi-hop refinements to explicitly minimize the property errors toward our target while regulating edit complexity and deviation from the prototype. A large, automatically curated dataset with reasoning chains of fragment edits and measured property deltas underpins both stages, enabling deterministic, reproducible supervision and controllable multi-hop reasoning. Unlike prior work, our framework better reasons about molecules by leveraging fragments and supports controllable refinement toward numeric targets. Experiments on generation under two sets of property constraints (QED, LogP, Molecular Weight and HOMO, LUMO) show consistent gains in validity and precise satisfaction of multi-property targets, outperforming strong LLMs and graph-based algorithms.
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
Making deep learning recommendation model (DLRM) training and inference fast and efficient is important. However, this presents three key system challenges - model architecture diversity, kernel primitive diversity, and hardware generation and architecture heterogeneity. This paper presents KernelEvolve-an agentic kernel coding framework-to tackle heterogeneity at-scale for DLRM. KernelEvolve is designed to take kernel specifications as input and automate the process of kernel generation and optimization for recommendation model across heterogeneous hardware architectures. KernelEvolve does so by operating at multiple programming abstractions, from Triton and CuTe DSL to low-level hardware agnostic languages, spanning the full hardware-software optimization stack. The kernel optimization process is described as graph-based search with selection policy, universal operator, fitness function, and termination rule, dynamically adapts to runtime execution context through retrieval-augmented prompt synthesis. We designed, implemented, and deployed KernelEvolve to optimize a wide variety of production recommendation models across generations of NVIDIA and AMD GPUs, as well as Meta's AI accelerators. We validate KernelEvolve on the publicly-available KernelBench suite, achieving 100% pass rate on all 250 problems across three difficulty levels, and 160 PyTorch ATen operators across three heterogeneous hardware platforms, demonstrating 100% correctness. KernelEvolve reduces development time from weeks to hours and achieves substantial performance improvements over PyTorch baselines across diverse production use cases and for heterogeneous AI systems at-scale. Beyond performance efficiency improvements, KernelEvolve significantly mitigates the programmability barrier for new AI hardware by enabling automated kernel generation for in-house developed AI hardware.
Systems and Control (EESS)
MetaboNet: The Largest Publicly Available Consolidated Dataset for Type 1 Diabetes Management
Progress in Type 1 Diabetes (T1D) algorithm development is limited by the fragmentation and lack of standardization across existing T1D management datasets. Current datasets differ substantially in structure and are time-consuming to access and process, which impedes data integration and reduces the comparability and generalizability of algorithmic developments. This work aims to establish a unified and accessible data resource for T1D algorithm development. Multiple publicly available T1D datasets were consolidated into a unified resource, termed the MetaboNet dataset. Inclusion required the availability of both continuous glucose monitoring (CGM) data and corresponding insulin pump dosing records. Additionally, auxiliary information such as reported carbohydrate intake and physical activity was retained when present. The MetaboNet dataset comprises 3135 subjects and 1228 patient-years of overlapping CGM and insulin data, making it substantially larger than existing standalone benchmark datasets. The resource is distributed as a fully public subset available for immediate download at https://metabo-net.org/ , and with a Data Use Agreement (DUA)-restricted subset accessible through their respective application processes. For the datasets in the latter subset, processing pipelines are provided to automatically convert the data into the standardized MetaboNet format. A consolidated public dataset for T1D research is presented, and the access pathways for both its unrestricted and DUA-governed components are described. The resulting dataset covers a broad range of glycemic profiles and demographics and thus can yield more generalizable algorithmic performance than individual datasets.
comment: 22 pages, 5 figures, 7 supplementary figures, submitted to JDST
Implications of Grid-Forming Inverter Parameters on Disturbance Localization and Controllability
The shift from traditional synchronous generator (SG) based power generation to generation driven by power electronic devices introduces new dynamic phenomena and considerations for the control of large-scale power systems. In this paper, two aspects of all-inverter power systems are investigated: greater localization of system disturbance response and greater system controllability. The prevalence of both of these aspects are shown to be related to the lower effective inertia of inverters and have implications for future widearea control system design. Greater disturbance localization implies the need for feedback measurement placement close to generator nodes to properly reject disturbances in the system while increased system controllability implies that widearea control systems should preferentially actuate inverters to most efficiently control the system. This investigation utilizes reduced-order linear time-invariant models of both SGs and inverters that are shown to capture the frequency dynamics of interest in both all-SG and all-inverter systems, allowing for the efficient use of both frequency and time domain analysis methods.
Projection-based discrete-time consensus on the unit sphere
We address discrete-time consensus on the Euclidean unit sphere. For this purpose we consider a distributed algorithm comprising the iterative projection of a conical combination of neighboring states. Neighborhoods are represented by a strongly connected directed graph, and the conical combinations are represented by a (non-negative) weight matrix with a zero structure corresponding to the graph. A first result mirrors earlier results for gradient flows. Under the assumptions that each diagonal element of the weight matrix is more than $\sqrt{2}$ larger than the sum of the other elements in the corresponding row, the sphere dimension is greater or equal to 2, and the graph, as well as the weight matrix, is symmetric, we show that the algorithm comprises gradient ascent, stable fixed points are consensus points, and the set of initial points for which the algorithm converges to a non-consensus fixed point has measure zero. The second result is that for the unit circle and a strongly connected graph or for any unit sphere with dimension greater than or equal to $1$ and the complete graph, only for a measure zero set of weight matrices there are fixed points for the algorithm which do not have consensus or antipodal configurations.
comment: 14 pages including appendix, 0 figures
Learning-Based Shrinking Disturbance-Invariant Tubes for State- and Input-Dependent Uncertainty
We develop a learning-based framework for constructing shrinking disturbance-invariant tubes under state- and input-dependent uncertainty, intended as a building block for tube Model Predictive Control (MPC), and certify safety via a lifted, isotone (order-preserving) fixed-point map. Gaussian Process (GP) posteriors become $(1-α)$ credible ellipsoids, then polytopic outer sets for deterministic set operations. A two-time-scale scheme separates learning epochs, where these polytopes are frozen, from an inner, outside-in iteration that converges to a compact fixed point $Z^\star\!\subseteq\!\mathcal G$; its state projection is RPI for the plant. As data accumulate, disturbance polytopes tighten, and the associated tubes nest monotonically, resolving the circular dependence between the set to be verified and the disturbance model while preserving hard constraints. A double-integrator study illustrates shrinking tube cross-sections in data-rich regions while maintaining invariance.
The Mini Wheelbot Dataset: High-Fidelity Data for Robot Learning
The development of robust learning-based control algorithms for unstable systems requires high-quality, real-world data, yet access to specialized robotic hardware remains a significant barrier for many researchers. This paper introduces a comprehensive dynamics dataset for the Mini Wheelbot, an open-source, quasi-symmetric balancing reaction wheel unicycle. The dataset provides 1 kHz synchronized data encompassing all onboard sensor readings, state estimates, ground-truth poses from a motion capture system, and third-person video logs. To ensure data diversity, we include experiments across multiple hardware instances and surfaces using various control paradigms, including pseudo-random binary excitation, nonlinear model predictive control, and reinforcement learning agents. We include several example applications in dynamics model learning, state estimation, and time-series classification to illustrate common robotics algorithms that can be benchmarked on our dataset.
Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency
Energy efficiency has become an integral aspect of modern computing infrastructure design, impacting the performance, cost, scalability, and durability of production systems. The incorporation of power actuation and sensing capabilities in CPU designs is indicative of this, enabling the deployment of system software that can actively monitor and adjust energy consumption and performance at runtime. While reinforcement learning (RL) would seem ideal for the design of such energy efficiency control systems, online training presents challenges ranging from the lack of proper models for setting up an adequate simulated environment, to perturbation (noise) and reliability issues, if training is deployed on a live system. In this paper we discuss the use of offline reinforcement learning as an alternative approach for the design of an autonomous CPU power controller, with the goal of improving the energy efficiency of parallel applications at runtime without unduly impacting their performance. Offline RL sidesteps the issues incurred by online RL training by leveraging a dataset of state transitions collected from arbitrary policies prior to training. Our methodology applies offline RL to a gray-box approach to energy efficiency, combining online application-agnostic performance data (e.g., heartbeats) and hardware performance counters to ensure that the scientific objectives are met with limited performance degradation. Evaluating our method on a variety of compute-bound and memory-bound benchmarks and controlling power on a live system through Intel's Running Average Power Limit, we demonstrate that such an offline-trained agent can substantially reduce energy consumption at a tolerable performance degradation cost.
comment: 11 pages, 5 figures, 3 tables and unpublished
Distributed Control Barrier Functions for Safe Multi-Vehicle Navigation in Heterogeneous USV Fleets
Collision avoidance in heterogeneous fleets of uncrewed vessels is challenging because the decision-making processes and controllers often differ between platforms, and it is further complicated by the limitations on sharing trajectories and control values in real-time. This paper presents a pragmatic approach that addresses these issues by adding a control filter on each autonomous vehicle that assumes worst-case behavior from other contacts, including crewed vessels. This distributed safety control filter is developed using control barrier function (CBF) theory and the application is clearly described to ensure explainability of these safety-critical methods. This work compares the worst-case CBF approach with a Collision Regulations (COLREGS) behavior-based approach in simulated encounters. Real-world experiments with three different uncrewed vessels and a human operated vessel were performed to confirm the approach is effective across a range of platforms and is robust to uncooperative behavior from human operators. Results show that combining both CBF methods and COLREGS behaviors achieves the best safety and efficiency.
comment: 8 pages, 10 figures
Machine Learning on the Edge for Sustainable IoT Networks: A Systematic Literature Review
The Internet of Things (IoT) has become integral to modern technology, enhancing daily life and industrial processes through seamless connectivity. However, the rapid expansion of IoT systems presents significant sustainability challenges, such as high energy consumption and inefficient resource management. Addressing these issues is critical for the long-term viability of IoT networks. Machine learning (ML), with its proven success across various domains, offers promising solutions for optimizing IoT operations. ML algorithms can learn directly from raw data, uncovering hidden patterns and optimizing processes in dynamic environments. Executing ML at the edge of IoT networks can further enhance sustainability by reducing bandwidth usage, enabling real-time decision-making, and improving data privacy. Additionally, testing ML models on actual hardware is essential to ensure satisfactory performance under real-world conditions, as it captures the complexities and constraints of real-world IoT deployments. Combining ML at the edge and actual hardware testing, therefore, increases the reliability of ML models to effectively improve the sustainability of IoT systems. The present systematic literature review explores how ML can be utilized to enhance the sustainability of IoT networks, examining current methodologies, benefits, challenges, and future opportunities. Through our analysis, we aim to provide insights that will drive future innovations in making IoT networks more sustainable.
comment: Published in Elsevier Internet of Things
Composite and Staged Trust Evaluation for Multi-Hop Collaborator Selection
Multi-hop collaboration offers new perspectives for enhancing task execution efficiency by increasing available distributed collaborators for resource sharing. Consequently, selecting trustworthy collaborators becomes critical for realizing effective multi-hop collaboration. However, evaluating device trust requires the consideration of multiple factors, including relatively stable factors, such as historical interaction data, and dynamic factors, such as varying resources and network conditions. This differentiation makes it challenging to achieve the accurate evaluation of composite trust factors using one identical evaluation approach. To address this challenge, this paper proposes a composite and staged trust evaluation (CSTE) mechanism, where stable and dynamic factors are separately evaluated at different stages and then integrated for a final trust decision. First, a device interaction graph is constructed from stable historical interaction data to represent direct trust relationships between devices. A graph neural network framework is then used to propagate and aggregate these trust relationships to produce the historical trustworthiness of devices. In addition, a task-specific trust evaluation method is developed to assess the dynamic resources of devices based on task requirements, which generates the task-specific resource trustworthiness of devices. After these evaluations, CSTE integrates their results to identify devices within the network topology that satisfy the minimum trust thresholds of tasks. These identified devices then establish a trusted topology. Finally, within this trusted topology, an A* search algorithm is employed to construct a multi-hop collaboration path that satisfies the task requirements. Experimental results demonstrate that CSTE outperforms the comparison algorithms in identifying paths with the highest average trust values.
On Data-based Nash Equilibria in LQ Nonzero-sum Differential Games
This paper considers data-based solutions of linear-quadratic nonzero-sum differential games. Two cases are considered. First, the deterministic game is solved and Nash equilibrium strategies are obtained by using persistently excited data from the multiagent system. Then, a stochastic formulation of the game is considered, where each agent measures a different noisy output signal and state observers must be designed for each player. It is shown that the proposed data-based solutions of these games are equivalent to known model-based procedures. The resulting data-based solutions are validated in a numerical experiment.
comment: 7 pages, 2 figures
Analysis of Full Order Observer Based Control for Spacecraft Orbit Maneuver Trajectory Under Solar Radiation Pressure
This study investigates the application of modern control theory to improve the precision of spacecraft orbit maneuvers in low Earth orbit under the influence of solar radiation pressure. A full order observer based feedback control framework is developed to estimate system states and compensate for external disturbances during the trajectory correction phase following main engine cut off. The maneuver trajectory is generated using Lambert guidance, while the observer based controller ensures accurate tracking of the target orbit despite SRP perturbations. The effectiveness of the proposed design is assessed through stability, observability, and controllability analyses. Stability is validated by step-response simulations and eigenvalue distributions of the system dynamics. Observability is demonstrated through state matrix rank analysis, confirming complete state estimation. Controllability is verified using state feedback rank conditions and corresponding control performance plots. Comparative simulations highlight that, in contrast to uncontrolled or conventional control cases, the observer based controller achieves improved trajectory accuracy and robust disturbance rejection with moderate control effort. These findings indicate that observer-based feedback control offers a reliable and scalable solution for precision orbital maneuvering in LEO missions subject to environmental disturbances.
Adaptive Monitoring of Stochastic Fire Front Processes via Information-seeking Predictive Control
We consider the problem of adaptively monitoring a wildfire front using a mobile agent (e.g., a drone), whose trajectory determines where sensor data is collected and thus influences the accuracy of fire propagation estimation. This is a challenging problem, as the stochastic nature of wildfire evolution requires the seamless integration of sensing, estimation, and control, often treated separately in existing methods. State-of-the-art methods either impose linear-Gaussian assumptions to establish optimality or rely on approximations and heuristics, often without providing explicit performance guarantees. To address these limitations, we formulate the fire front monitoring task as a stochastic optimal control problem that integrates sensing, estimation, and control. We derive an optimal recursive Bayesian estimator for a class of stochastic nonlinear elliptical-growth fire front models. Subsequently, we transform the resulting nonlinear stochastic control problem into a finite-horizon Markov decision process and design an information-seeking predictive control law obtained via a lower confidence bound-based adaptive search algorithm with asymptotic convergence to the optimal policy.
comment: 2025 IEEE 64th Conference on Decision and Control (CDC)
Solution Concepts and Existence Results for Hybrid Systems with Continuous-time Inputs
In many scenarios, it is natural to model a plant's dynamical behavior using a hybrid dynamical system influenced by exogenous continuous-time inputs. While solution concepts and analytical tools for existence and completeness are well established for autonomous hybrid systems, corresponding results for hybrid dynamical systems involving continuous-time inputs are generally lacking. This work aims to address this gap. We first formalize notions of a solution for such systems. We then provide conditions that guarantee the existence and forward completeness of solutions. Moreover, we leverage results and ideas from viability theory to present more explicit conditions in terms of various tangent cone formulations. Variants are provided that depend on the regularity of the exogenous input signals.
Determining optimal thermal energy storage charging temperature for cooling using integrated building and coil modeling
Thermal energy storage (TES) systems coupled with heat pumps offer significant potential for improving building energy efficiency by shifting electricity demand to off-peak hours. However, conventional operating strategies maintain conservatively low chilled water temperatures throughout the cooling season, a practice that results in suboptimal heat pump performance. This study proposes a physics-based integrated simulation framework to determine the maximum feasible chilled water supply temperature while ensuring cooling stability. The framework integrates four submodels: relative humidity prediction, dynamic cooling load estimation, cooling coil performance prediction, and TES discharge temperature prediction. Validation against measured data from an office building demonstrates reliable accuracy across all sub-models (e.g., CVRMSE of 9.3% for cooling load and R2 of 0.91 for peak-time discharge temperature). The integrated simulation reveals that the proposed framework can increase the daily initial TES charging temperature by an average of 2.55 °C compared to conventional fixed-temperature operation, enabling the heat pump to operate at a higher coefficient of performance. This study contributes a practical methodology for optimizing TES charging temperatures in building heating, ventilation, and air conditioning (HVAC) systems while maintaining indoor setpoint temperatures.
A monolithic fabrication platform for intrinsically stretchable polymer transistors and complementary circuits
Soft, stretchable organic field-effect transistors (OFETs) can provide powerful on-skin signal conditioning, but current fabrication methods are often material-specific: each new polymer semiconductor (PSC) requires a tailored process. The challenge is even greater for complementary OFET circuits, where two PSCs must be patterned sequentially, which often leads to device degradation. Here, we introduce a universal, monolithic photolithography process that enables high-yield, high-resolution stretchable complementary OFETs and circuits. This approach is enabled by a process-design framework that includes (i) a direct, photopatternable, solvent-resistant, crosslinked dielectric/semiconductor interface, (ii) broadly applicable crosslinked PSC blends that preserve high mobility, and (iii) a patterning strategy that provides simultaneous etch masking and encapsulation. Using this platform, we achieve record integration density for stretchable OTFTs (55,000 cm^-2), channel lengths down to 2 um, and low-voltage operation at 5 V. We demonstrate photopatterning across multiple PSC types and realize complementary circuits, including 3 kHz stretchable ring oscillators, the first to exceed 1 kHz and representing more than a 60-fold increase in stage switching speed over the state of the art. Finally, we demonstrate the first stretchable complementary OTFT neuron circuit, where the output frequency is modulated by the input current to mimic neuronal signal processing. This scalable approach can be readily extended to diverse high-performance stretchable materials, accelerating the development and manufacturing of skin-like electronics.
comment: Submitted to Nature Electronics by December 2025
Toward Adaptive Grid Resilience: A Gradient-Free Meta-RL Framework for Critical Load Restoration
Restoring critical loads after extreme events demands adaptive control to maintain distribution-grid resilience, yet uncertainty in renewable generation, limited dispatchable resources, and nonlinear dynamics make effective restoration difficult. Reinforcement learning (RL) can optimize sequential decisions under uncertainty, but standard RL often generalizes poorly and requires extensive retraining for new outage configurations or generation patterns. We propose a meta-guided gradient-free RL (MGF-RL) framework that learns a transferable initialization from historical outage experiences and rapidly adapts to unseen scenarios with minimal task-specific tuning. MGF-RL couples first-order meta-learning with evolutionary strategies, enabling scalable policy search without gradient computation while accommodating nonlinear, constrained distribution-system dynamics. Experiments on IEEE 13-bus and IEEE 123-bus test systems show that MGF-RL outperforms standard RL, MAML-based meta-RL, and model predictive control across reliability, restoration speed, and adaptation efficiency under renewable forecast errors. MGF-RL generalizes to unseen outages and renewable patterns while requiring substantially fewer fine-tuning episodes than conventional RL. We also provide sublinear regret bounds that relate adaptation efficiency to task similarity and environmental variation, supporting the empirical gains and motivating MGF-RL for real-time load restoration in renewable-rich distribution grids.
Secure Data Bridging in Industry 4.0: An OPC UA Aggregation Approach for Including Insecure Legacy Systems
The increased connectivity of industrial networks has led to a surge in cyberattacks, emphasizing the need for cybersecurity measures tailored to the specific requirements of industrial systems. Modern Industry 4.0 technologies, such as OPC UA, offer enhanced resilience against these threats. However, widespread adoption remains limited due to long installation times, proprietary technology, restricted flexibility, and formal process requirements (e.g. safety certifications). Consequently, many systems do not yet implement these technologies, or only partially. This leads to the challenge of dealing with so-called brownfield systems, which are often placed in isolated security zones to mitigate risks. However, the need for data exchange between secure and insecure zones persists. This paper reviews existing solutions to address this challenge by analysing their approaches, advantages, and limitations. Building on these insights, we identify three key concepts, evaluate their suitability and compatibility, and ultimately introduce the SigmaServer, a novel TCP-level aggregation method. The developed proof-of-principle implementation is evaluated in an operational technology (OT) testbed, demonstrating its applicability and effectiveness in bridging secure and insecure zones.
Three Dimensional Hydrodynamic Flow-Based Collision Avoidance for UAV Formations Facing Emergent Dynamic Obstacles
This paper presents a three-dimensional, hydrodynamics-inspired collision avoidance framework for uncrewed aerial vehicle (UAV) formations operating in dynamic environments. When moving obstacles enter a UAV's sensing region, they are modeled as three dimensional doublets or ellipsoids that generate local velocity fields, guiding nearby UAVs to execute smooth, collision-free maneuvers without trajectory discontinuities or explicit trajectory replanning. This flow-based approach enables real-time operation and interpretable behavior by leveraging the nature of fluid flow around obstacles via the harmonic properties of Laplace's equation, inherently avoiding local minima common in traditional potential field methods. To establish and maintain coordination among the UAVs, a Virtual Rigid Body (VRB) formation strategy is integrated, ensuring that formation geometry and trajectory tracking are preserved. Simulation results demonstrate the feasibility and scalability of the method for both individual and multi-UAV scenarios with multiple formation geometries encountering moving obstacles. The proposed approach achieves safe, smooth, and computationally efficient avoidance maneuvers suitable for real-time and practical applications.
comment: 18 pages, 15 figures
Optimal Thruster Configuration for 6-DOF Control of a Small Satellite
With the growing deployment of small satellites (such as CubeSats, Nanosats, Picosats, and Femtosats) in Low Earth Orbit (LEO) for targeted applications like imaging, communication, data storage, and rendezvous-docking mission, there is increasing attention on orbit maintenance and attitude control. A common approach for active orbit control involves the use of multiple thrusters, which, when properly arranged, can also generate the required torque for attitude control. Starting from a 24-thruster configuration, this paper presents a set of thruster configurations (referred to as a viable configuration group) that enable full six degrees of freedom (6-DOF) control. Further, configuration group that requires minimum total thrust to achieve 6-DOF commands are found among the viable configuration group. One configuration from each of these groups is further evaluated for its attitude control performance through a representative rendezvous-docking mission, demonstrating that even with a reduced thruster count, sufficient maneuverability can be achieved.
comment: 19 pages, 9 figures
Modeling and Simulation of Virtual Rigid Body Formations and Their Applications Using Multiple Air Vehicles
This paper presents thorough mathematical modeling, control law development, and simulation of virtual structure formations which are inspired by the characteristics of rigid bodies. The stable constraint forces that establish the rigidity in the formation are synthesized by utilizing d'Alembert's principle of virtual work, constraint sensitivities (Lagrange multipliers) and constraint stabilization using Baumgarte stabilization. The governing equations of motion of a multiagent system are derived via Newton's and Euler's equations to include these constraint forces and to enable inputs regarding the formation as if it were an independent rigid body. The performance of this framework is evaluated under multiple cases including waypoint following missions, and using different number of agents.
Age-Based Scheduling for a Memory-Constrained Quantum Switch
In a time-slotted system, we study the problem of scheduling multipartite entanglement requests in a quantum switch with a finite number of quantum memory registers. Specifically, we consider probabilistic link-level entanglement (LLE) generation for each user, probabilistic entanglement swapping, and one-slot decoherence. To evaluate the performance of the proposed scheduling policies, we introduce a novel age-based metric, coined age of entanglement establishment (AoEE). We consider two families of low-complexity policies for which we obtain closed-form expressions for their corresponding AoEE performance. Optimizing over each family, we obtain two policies. Further, we propose one more low-complexity policy and provide its performance guarantee. Finally, we numerically compare the performance of the proposed policies.
From Everything-is-a-File to Files-Are-All-You-Need: How Unix Philosophy Informs the Design of Agentic AI Systems
A core abstraction in early Unix systems was the principle that 'everything is a file', enabling heterogeneous devices and kernel resources to be manipulated via uniform read/write interfaces. This paper explores how an analogous unification is emerging in contemporary agentic AI. We trace the evolution from Unix to DevOps, Infrastructure-as-Code, and finally autonomous software agents, highlighting how file-like abstractions and code-based specifications collapse diverse resources into consistent, composable interfaces. The resulting perspective suggests that adopting file- and code-centric interaction models may enable agentic systems that are more maintainable, auditable, and operationally robust.
Predictive Modeling of Power Outages during Extreme Events: Integrating Weather and Socio-Economic Factors
This paper presents a novel learning based framework for predicting power outages caused by extreme events. The proposed approach targets low probability high consequence outage scenarios and leverages a comprehensive set of features derived from publicly available data sources. We integrate EAGLE-I outage records from 2014 to 2024 with weather, socioeconomic, infrastructure, and seasonal event data. Incorporating social and demographic indicators reveals patterns of community vulnerability and improves understanding of outage risk during extreme conditions. Four machine learning models are evaluated including Random Forest (RF), Graph Neural Network (GNN), Adaptive Boosting (AdaBoost), and Long Short Term Memory (LSTM). Experimental validation is performed on a large scale dataset covering counties in the lower peninsula of Michigan. Among all models tested, the LSTM network achieves higher accuracy.
comment: This is a preprint of a manuscript currently under review at Electric Power Systems Research. The content may be subject to change following peer review
Hybrid dynamical systems modeling of power systems
The increasing integration of renewable energy sources has introduced complex dynamic behavior in power systems that challenge the adequacy of traditional continuous-time modeling approaches. These developments call for modeling frameworks that can capture the intricate interplay between continuous dynamics and discrete events characterizing modern grid operations. Hybrid dynamical systems offer a rigorous foundation for representing such mixed dynamics and have emerged as a valuable tool in power system analysis. Despite their potential, existing studies remain focused on isolated applications or case-specific implementations, offering limited generalizability and guidance for model selection. This paper addresses that gap by providing a comprehensive overview of hybrid modeling approaches relevant to power systems. It critically examines key formalisms, including hybrid automata, switched systems, and piecewise affine models, evaluating their respective strengths, limitations, and suitability across control, stability, and system design tasks. In doing so, the paper identifies open challenges and outlines future research directions to support the systematic application of hybrid methods in renewable-rich, converter-dominated power systems
Fine-Tuning of Neural Network Approximate MPC without Retraining via Bayesian Optimization
Approximate model-predictive control (AMPC) aims to imitate an MPC's behavior with a neural network, removing the need to solve an expensive optimization problem at runtime. However, during deployment, the parameters of the underlying MPC must usually be fine-tuned. This often renders AMPC impractical as it requires repeatedly generating a new dataset and retraining the neural network. Recent work addresses this problem by adapting AMPC without retraining using approximated sensitivities of the MPC's optimization problem. Currently, this adaption must be done by hand, which is labor-intensive and can be unintuitive for high-dimensional systems. To solve this issue, we propose using Bayesian optimization to tune the parameters of AMPC policies based on experimental data. By combining model-based control with direct and local learning, our approach achieves superior performance to nominal AMPC on hardware, with minimal experimentation. This allows automatic and data-efficient adaptation of AMPC to new system instances and fine-tuning to cost functions that are difficult to directly implement in MPC. We demonstrate the proposed method in hardware experiments for the swing-up maneuver on an inverted cartpole and yaw control of an under-actuated balancing unicycle robot, a challenging control problem.
comment: Presented at the 13th International Conference on Robot Intelligence Technology and Applications
Identification of a Kalman filter: consistency of local solutions
Prediction error and maximum likelihood methods are powerful tools for identifying linear dynamical systems and, in particular, enable the joint estimation of model parameters and the Kalman filter used for state estimation. A key limitation, however, is that these methods require solving a generally non-convex optimization problem to global optimality. This paper analyzes the statistical behavior of local minimizers in the special case where only the Kalman gain is estimated. We prove that these local solutions are statistically consistent estimates of the true Kalman gain. This follows from asymptotic unimodality: as the dataset grows, the objective function converges to a limit with a unique local (and therefore global) minimizer. We further provide guidelines for designing the optimization problem for Kalman filter tuning and discuss extensions to the joint estimation of additional linear parameters and noise covariances. Finally, the theoretical results are illustrated using three examples of increasing complexity. The main practical takeaway of this paper is that difficulties caused by local minimizers in system identification are, at least, not attributable to the tuning of the Kalman gain.
comment: Submitted for review to the proceedings of the IFAC World Congress 2026
Scalable and Approximation-free Symbolic Control for Unknown Euler-Lagrange Systems
We propose a novel symbolic control framework for enforcing temporal logic specifications in Euler-Lagrange systems that addresses the key limitations of traditional abstraction-based approaches. Unlike existing methods that require exact system models and provide guarantees only at discrete sampling instants, our approach relies only on bounds on system parameters and input constraints, and ensures correctness for the full continuous-time trajectory. The framework combines scalable abstraction of a simplified virtual system with a closed-form, model-free controller that guarantees trajectories satisfy the original specification while respecting input bounds and remaining robust to unknown but bounded disturbances. We provide feasibility conditions for the construction of confinement regions and analyze the trade-off between efficiency and conservatism. Case studies on pendulum dynamics, a two-link manipulator, and multi-agent systems, including hardware experiments, demonstrate that the proposed approach ensures both correctness and safety while significantly reducing computational time and memory requirements. These results highlight its scalability and practicality for real-world robotic systems where precise models are unavailable and continuous-time guarantees are essential.
Stabilization of Linear Switched Systems with Long Constant Input Delay via Average or Averaging Predictor Feedbacks
We develop delay-compensating feedback laws for linear switched systems with time-dependent switching. Because the future values of the switching signal, which are needed for constructing an exact predictor-feedback law, may be unavailable at current time, the key design challenge is how to construct a proper predictor state. We resolve this challenge constructing two alternative, average predictor-based feedback laws. The first is viewed as a predictor-feedback law for a particular average system, properly modified to provide exact state predictions over a horizon that depends on a minimum dwell time of the switching signal (when it is available). The second is, essentially, a modification of an average of predictor feedbacks, each one corresponding to the fixed-mode predictor-feedback law. We establish that under the control laws introduced, the closed-loop systems are (uniformly) exponentially stable, provided that the differences among system's matrices and among (nominal stabilizing) controller's gains are sufficiently small, with a size that is inversely proportional to the delay length. Since no restriction is imposed on the delay, such a limitation is inherent to the problem considered (in which the future switching signal values are unavailable), and thus, it cannot be removed. The stability proof relies on multiple Lyapunov functionals constructed via backstepping and derivation of solutions' estimates for quantifying the difference between average and exact predictor states. We present consistent numerical simulation results, which illustrate the necessity of employing the average predictor-based laws and demonstrate the performance improvement when the knowledge of a minimum dwell time is properly utilized for improving state prediction accuracy.
comment: 17 pages, 10 figures, preprint submitted to Systems & Control Letters
Quantum feedback control of a two-atom network closed by a semi-infinite waveguide
The purpose of this paper is to study the delay-dependent coherent feedback dynamics by focusing on one typical realization, i.e., a two-atom quantum network whose feedback loop is closed by a semi-infinite waveguide. In this set-up, an initially excited two-level atom can emit a photon into the waveguide, where the propagating photon can be reflected by the terminal mirror of the waveguide or absorbed by the other atom, thus constructing various coherent feedback loops. We show that there can be two-photon, one-photon or zero-photon states in the waveguide, which can be controlled by the feedback loop length and the coupling strengths between the atoms and waveguide. The photonic states in the waveguide are analyzed in both the frequency domain and the spatial domain, and the transient process of photon emissions is better understood based on a comprehensive analysis using both domains. Interestingly, we clarify that this quantum coherent feedback network can be mathematically modeled as a linear control system with multiple delays, which are determined by the distances between atoms and the terminal mirror of the semi-infinite waveguide. Therefore, based on time-delayed linear control system theory, the influence of delays on the stability of the quantum state evolution and the steady-state atomic and photonic states is investigated, for both small and large delays.
LeLaR: The First In-Orbit Demonstration of an AI-Based Satellite Attitude Controller
Attitude control is essential for many satellite missions. Classical controllers, however, are time-consuming to design and sensitive to model uncertainties and variations in operational boundary conditions. Deep Reinforcement Learning (DRL) offers a promising alternative by learning adaptive control strategies through autonomous interaction with a simulation environment. Overcoming the Sim2Real gap, which involves deploying an agent trained in simulation onto the real physical satellite, remains a significant challenge. In this work, we present the first successful in-orbit demonstration of an AI-based attitude controller for inertial pointing maneuvers. The controller was trained entirely in simulation and deployed to the InnoCube 3U nanosatellite, which was developed by the Julius-Maximilians-Universität Würzburg in cooperation with the Technische Universität Berlin, and launched in January 2025. We present the AI agent design, the methodology of the training procedure, the discrepancies between the simulation and the observed behavior of the real satellite, and a comparison of the AI-based attitude controller with the classical PD controller of InnoCube. Steady-state metrics confirm the robust performance of the AI-based controller during repeated in-orbit maneuvers.
comment: This work has been submitted to the IEEE for possible publication. 55 pages, 27 figures, 29 tables. The maneuver telemetry datasets generated and analyzed during this work are available in the GitHub repository under https://github.com/kdjebko/lelar-in-orbit-data
Scalable Sensor Placement for Cyclic Networks with Observability Guarantees: Application to Water Distribution Networks
Optimal sensor placement is essential for state estimation and effective network monitoring. As known in the literature, this problem becomes particularly challenging in large-scale undirected or bidirected cyclic networks with parametric uncertainties, such as water distribution networks (WDNs), where pipe resistance and demand patterns are often unknown. Motivated by the challenges of cycles, parametric uncertainties, and scalability, this paper proposes a sensor placement algorithm that guarantees structural observability for cyclic and acyclic networks with parametric uncertainties. By leveraging a graph-based strategy, the proposed method efficiently addresses the computational complexities of large-scale networks. To demonstrate the algorithm's effectiveness, we apply it to several EPANET benchmark WDNs. Most notably, the developed algorithm solves the sensor placement problem with guaranteed structured observability for the L-town WDN with 1694 nodes and 124 cycles in under 0.1 seconds.
comment: Extended version of the paper accepted for IEEE-CDC 2025
A Fully Analog Implementation of Model Predictive Control with Application to Buck Converters
This paper proposes a novel approach to design analog electronic circuits that implement Model Predictive Control (MPC) policies for dynamical systems described by affine models. Effective approaches to define a reduced-complexity Explicit MPC form are combined and applied to realize an analog circuit comprising a limited set of low-latency, commercially available components. The practical feasibility and effectiveness of the proposed approach are demonstrated through its application in the design of a novel MPC-based controller for DC-DC Buck converters. We formally analyze the stability of the resulting system and conduct extensive numerical simulations to demonstrate the control system's performance in rejecting line and load disturbances.
Incremental Collision Laws Based on the Bouc-Wen Model: Improved Collision Models and Further Results
In the article titled "The Bouc-Wen Model for Binary Direct Collinear Collisions of Convex Viscoplastic Bodies" and published in the Journal of Computational and Nonlinear Dynamics (Volume 20, Issue 6, June 2025), the authors studied mathematical models of binary direct collinear collisions of convex viscoplastic bodies that employed two incremental collision laws based on the Bouc-Wen differential model of hysteresis. It was shown that the models possess favorable analytical properties, and several model parameter identification studies were conducted, demonstrating that the models can accurately capture the nature of a variety of collision phenomena. In this article, the aforementioned models are augmented by modeling the effects of external forces as time-dependent inputs. Furthermore, the range of the parameters under which the models possess favorable analytical properties is extended to several corner cases that were not considered in the prior publication. Finally, the previously conducted model parameter identification studies are extended, and an additional model parameter identification study is provided in an attempt to validate the ability of the augmented models to represent the effects of external forces.
comment: 15 pages, 9 figures, see https://gitlab.com/user9716869/EBWCM ; (v2-v8) various minor amendments; (v5) replaced the parameter identification study of Quinn (2004) with Villegas et al (2021) due to incompatibility of the proposed collision models with the experimental setup in Quinn (2004); (v7) added ODEs for COM and an application example; arXiv admin note: text overlap with arXiv:2410.08147
Predictability Enables Parallelization of Nonlinear State Space Models NeurIPS '25
The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances like DEER (arXiv:2309.12252) and DeepPCR (arXiv:2309.16318) recast sequential evaluation as a parallelizable optimization problem, sometimes yielding dramatic speedups. However, the factors governing the difficulty of these optimization problems remained unclear, limiting broader adoption. In this work, we establish a precise relationship between a system's dynamics and the conditioning of its corresponding optimization problem, as measured by its Polyak-Lojasiewicz (PL) constant. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior and quantified by the largest Lyapunov exponent (LLE), impacts the number of optimization steps required for evaluation. For predictable systems, the state trajectory can be computed in at worst $O((\log T)^2)$ time, where $T$ is the sequence length: a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, with the consequence that parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis shows that predictable systems always yield well-conditioned optimization problems, whereas unpredictable systems lead to severe conditioning degradation. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized. We highlight predictability as a key design principle for parallelizable models.
comment: NeurIPS '25. XG and LK dual lead authors. Code: https://github.com/lindermanlab/predictability_enables_parallelization
Constructive Observer Design for Visual Simultaneous Localisation and Mapping
Visual Simultaneous Localisation and Mapping (VSLAM) is a well-known problem in robotics with a large range of applications. This paper presents a novel approach to VSLAM by lifting the observer design to a novel Lie group on which the system output is equivariant. The perspective gained from this analysis facilitates the design of a non-linear observer with almost semi-globally asymptotically stable error dynamics. Simulations are provided to illustrate the behaviour of the proposed observer and experiments on data gathered using a fixed-wing UAV flying outdoors demonstrate its performance.
comment: 17 pages, 8 figures. Published in IFAC Automatica
Distributionally Robust Kalman Filter
We study state estimation for discrete-time linear stochastic systems under distributional ambiguity in the initial state, process noise, and measurement noise. We propose a noise-centric distributionally robust Kalman filter (DRKF) based on Wasserstein ambiguity sets imposed directly on these distributions. This formulation excludes dynamically unreachable priors and yields a Kalman-type recursion driven by least-favorable covariances computed via semidefinite programs (SDP). In the time-invariant case, the steady-state DRKF is obtained from a single stationary SDP, producing a constant gain with Kalman-level online complexity. We establish the convergence of the DR Riccati covariance iteration to the stationary SDP solution, together with an explicit sufficient condition for a prescribed convergence rate. We further show that the proposed noise-centric model induces a priori spectral bounds on all feasible covariances and a Kalman filter sandwiching property for the DRKF covariances. Finally, we prove that the steady-state error dynamics are Schur stable, and the steady-state DRKF is asymptotically minimax optimal with respect to worst-case mean-square error.
Equivariant Filter (EqF)
The kinematics of many systems encountered in robotics, mechatronics, and avionics are naturally posed on homogeneous spaces; that is, their state lies in a smooth manifold equipped with a transitive Lie group symmetry. This paper proposes a novel filter, the Equivariant Filter (EqF), by posing the observer state on the symmetry group, linearising global error dynamics derived from the equivariance of the system, and applying extended Kalman filter design principles. We show that equivariance of the system output can be exploited to reduce linearisation error and improve filter performance. Simulation experiments of an example application show that the EqF significantly outperforms the extended Kalman filter and that the reduced linearisation error leads to a clear improvement in performance.
comment: 20 pages, 3 figures, published in IEEE TAC
Off Policy Lyapunov Stability in Reinforcement Learning
Traditional reinforcement learning lacks the ability to provide stability guarantees. More recent algorithms learn Lyapunov functions alongside the control policies to ensure stable learning. However, the current self-learned Lyapunov functions are sample inefficient due to their on-policy nature. This paper introduces a method for learning Lyapunov functions off-policy and incorporates the proposed off-policy Lyapunov function into the Soft Actor Critic and Proximal Policy Optimization algorithms to provide them with a data efficient stability certificate. Simulations of an inverted pendulum and a quadrotor illustrate the improved performance of the two algorithms when endowed with the proposed off-policy Lyapunov function.
comment: Conference on Robot Learning (CORL) 2025
Constructive Equivariant Observer Design for Inertial Velocity-Aided Attitude
Inertial Velocity-Aided Attitude (VAA), the estimation of the velocity and attitude of a vehicle using gyroscope, accelerometer, and inertial-frame velocity (e.g. GPS velocity) measurements, is an important problem in the control of Remotely Piloted Aerial Systems (RPAS). Existing solutions provide limited stability guarantees, relying on local linearisation, high gain design, or assuming specific trajectories such as constant acceleration of the vehicle. This paper proposes a novel non-linear observer for inertial VAA with almost globally asymptotically and locally exponentially stable error dynamics. The approach exploits Lie group symmetries of the system dynamics to construct a globally valid correction term. To the authors' knowledge, this construction is the first observer to provide almost global convergence for the inertial VAA problem. The observer performance is verified in simulation, where it is shown that the estimation error converges to zero even with an extremely poor initial condition.
comment: 11 pages, 2 figures, presented at IFAC NOLCOS 2022
EqVIO: An Equivariant Filter for Visual Inertial Odometry
Visual-Inertial Odometry (VIO) is the problem of estimating a robot's trajectory by combining information from an inertial measurement unit (IMU) and a camera, and is of great interest to the robotics community. This paper develops a novel Lie group symmetry for the VIO problem and applies the recently proposed equivariant filter. The proposed symmetry is compatible with the invariance of the VIO reference frame, leading to improved filter consistency. The bias-free IMU dynamics are group-affine, ensuring that filter linearisation errors depend only on the bias estimation error and measurement noise. Furthermore, visual measurements are equivariant with respect to the symmetry, enabling the application of the higher-order equivariant output approximation to reduce approximation error in the filter update equation. As a result, the equivariant filter (EqF) based on this Lie group is a consistent estimator for VIO with lower linearisation error in the propagation of state dynamics and a higher order equivariant output approximation than standard formulations. Experimental results on the popular EuRoC and UZH FPV datasets demonstrate that the proposed system outperforms other state-of-the-art VIO algorithms in terms of both speed and accuracy.
comment: 28 pages, 17 figures, published in IEEE TRO
Formation and Investigation of Cooperative Platooning at the Early Stage of Connected and Automated Vehicles Deployment
Cooperative platooning, enabled by cooperative adaptive cruise control (CACC), is a cornerstone technology for connected automated vehicles (CAVs), offering significant improvements in safety, comfort, and traffic efficiency over traditional adaptive cruise control (ACC). This paper addresses a key challenge in the initial deployment phase of CAVs: the limited benefits of cooperative platooning due to the sparse distribution of CAVs on the road. To overcome this limitation, we propose an innovative control framework that enhances cooperative platooning in mixed traffic environments. Two techniques are utilized: (1) a mixed cooperative platooning strategy that integrates CACC with unconnected vehicles (CACCu), and (2) a strategic lane-change decision model designed to facilitate safe and efficient lane changes for platoon formation. Additionally, a surrounding vehicle identification system is embedded in the framework to enable CAVs to effectively identify and select potential platooning leaders. Simulation studies across various CV market penetration rates (MPRs) show that incorporating CACCu systems significantly improves safety, comfort, and traffic efficiency compared to existing systems with only CACC and ACC systems, even at CV penetration as low as 10%. The maximized platoon formation increases by up to 24%, accompanied by an 11% reduction in acceleration and a 7% decrease in fuel consumption. Furthermore, the strategic lane-change model enhances CAV performance, achieving notable improvements between 6% and 60% CV penetration, without adversely affecting overall traffic flow.
comment: Accepted by IEEE Transactions on Intelligent Transportation Systems (T-ITS)
3D UAV Trajectory Design for Fair and Energy-Efficient Communication: A Deep Reinforcement Learning Technique
In different situations, like disaster communication and network connectivity for rural locations, unmanned aerial vehicles (UAVs) could indeed be utilized as airborne base stations to improve both the functionality and coverage of communication networks. Ground users can employ mobile UAVs to establish communication channels and deliver packages. UAVs, on the other hand, have restricted transmission capabilities and fuel supplies. They can't always cover the full region or continue to fly for a long time, especially in a huge territory. Controlling a swarm of UAVs to yield a relatively long communication coverage while maintaining connectivity and limiting energy usage is so difficult. We use modern deep reinforcement learning (DRL) for UAV connectivity to provide an innovative and extremely energy-efficient DRL-based algorithm. The proposed method: 1) enhances novel energy efficiency while taking into account communications throughput, energy consumption, fairness, and connectivity; 2) evaluates the environment and its dynamics; and 3) makes judgments using strong deep neural networks. For performance evaluation, we have performed comprehensive simulations. In terms of energy consumption and fairness, simulation results show that the DRL-based algorithm consistently outperforms two commonly used baseline techniques.
comment: An early draft containing incomplete and outdated information
System Availability Optimization: Integrating Quantity Discounts and Delivery Lead Time Considerations
Purpose: The model allocates the system components orders to the suppliers to minimize the parts price and the system construction delay penalties and maximize the system availability during its use. It considers the quantity-based discount and variation of delivery lead time by ordering similar components. The model also reflects the prerequisite relationships between construction activities and calculates the delay penalty resulting from parts delivery lead time. Design/methodology/approach: This research presents a model for selecting suppliers of components of an industrial series-parallel multi-state system. A nonlinear binary mathematical program uses the Markov process results to select system components. It minimizes the total system construction phase costs, including the components' price, and the system construction delay penalty, and the system exploitation phase costs, including the system shutdown and working at half capacity. Findings: The model allocates the optimal orders for a typical industrial system's components, composing four elements. The proposed approach combines the nonlinear binary program and the Markov process results to optimize the system life cycle parameters, including the system construction cost and operational availability. Originality/value: Using the Markov chain results in binary nonlinear mathematical programming, this study attempts to strike the right balance between the construction phase's objectives and an industrial unit's operation phase.
comment: Revised version (v2): expanded literature review, corrected typos, and added numerical results
Robotics
See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection
Recent advances in end-to-end autonomous driving show that policies trained on patch-aligned features extracted from foundation models generalize better to Out-of-Distribution (OOD). We hypothesize that due to the self-attention mechanism, each patch feature implicitly embeds/contains information from all other patches, represented in a different way and intensity, making these descriptors highly redundant. We quantify redundancy in such (BLIP2) features via PCA and cross-patch similarity: $90$% of variance is captured by $17/64$ principal components, and strong inter-token correlations are pervasive. Training on such overlapping information leads the policy to overfit spurious correlations, hurting OOD robustness. We present Stochastic-Patch-Selection (SPS), a simple yet effective approach for learning policies that are more robust, generalizable, and efficient. For every frame, SPS randomly masks a fraction of patch descriptors, not feeding them to the policy model, while preserving the spatial layout of the remaining patches. Thus, the policy is provided with different stochastic but complete views of the (same) scene: every random subset of patches acts like a different, yet still sensible, coherent projection of the world. The policy thus bases its decisions on features that are invariant to which specific tokens survive. Extensive experiments confirm that across all OOD scenarios, our method outperforms the state of the art (SOTA), achieving a $6.2$% average improvement and up to $20.4$% in closed-loop simulations, while being $2.4\times$ faster. We conduct ablations over masking rates and patch-feature reorganization, training and evaluating 9 systems, with 8 of them surpassing prior SOTA. Finally, we show that the same learned policy transfers to a physical, real-world car without any tuning.
SurgGoal: Rethinking Surgical Planning Evaluation via Goal-Satisfiability
Surgical planning integrates visual perception, long-horizon reasoning, and procedural knowledge, yet it remains unclear whether current evaluation protocols reliably assess vision-language models (VLMs) in safety-critical settings. Motivated by a goal-oriented view of surgical planning, we define planning correctness via phase-goal satisfiability, where plan validity is determined by expert-defined surgical rules. Based on this definition, we introduce a multicentric meta-evaluation benchmark with valid procedural variations and invalid plans containing order and content errors. Using this benchmark, we show that sequence similarity metrics systematically misjudge planning quality, penalizing valid plans while failing to identify invalid ones. We therefore adopt a rule-based goal-satisfiability metric as a high-precision meta-evaluation reference to assess Video-LLMs under progressively constrained settings, revealing failures due to perception errors and under-constrained reasoning. Structural knowledge consistently improves performance, whereas semantic guidance alone is unreliable and benefits larger models only when combined with structural constraints.
Online identification of nonlinear time-varying systems with uncertain information
Digital twins (DTs), serving as the core enablers for real-time monitoring and predictive maintenance of complex cyber-physical systems, impose critical requirements on their virtual models: high predictive accuracy, strong interpretability, and online adaptive capability. However, existing techniques struggle to meet these demands simultaneously: Bayesian methods excel in uncertainty quantification but lack model interpretability, while interpretable symbolic identification methods (e.g., SINDy) are constrained by their offline, batch-processing nature, which make real-time updates challenging. To bridge this semantic and computational gap, this paper proposes a novel Bayesian Regression-based Symbolic Learning (BRSL) framework. The framework formulates online symbolic discovery as a unified probabilistic state-space model. By incorporating sparse horseshoe priors, model selection is transformed into a Bayesian inference task, enabling simultaneous system identification and uncertainty quantification. Furthermore, we derive an online recursive algorithm with a forgetting factor and establish precise recursive conditions that guarantee the well-posedness of the posterior distribution. These conditions also function as real-time monitors for data utility, enhancing algorithmic robustness. Additionally, a rigorous convergence analysis is provided, demonstrating the convergence of parameter estimates under persistent excitation conditions. Case studies validate the effectiveness of the proposed framework in achieving interpretable, probabilistic prediction and online learning.
FastStair: Learning to Run Up Stairs with Humanoid Robots
Running up stairs is effortless for humans but remains extremely challenging for humanoid robots due to the simultaneous requirements of high agility and strict stability. Model-free reinforcement learning (RL) can generate dynamic locomotion, yet implicit stability rewards and heavy reliance on task-specific reward shaping tend to result in unsafe behaviors, especially on stairs; conversely, model-based foothold planners encode contact feasibility and stability structure, but enforcing their hard constraints often induces conservative motion that limits speed. We present FastStair, a planner-guided, multi-stage learning framework that reconciles these complementary strengths to achieve fast and stable stair ascent. FastStair integrates a parallel model-based foothold planner into the RL training loop to bias exploration toward dynamically feasible contacts and to pretrain a safety-focused base policy. To mitigate planner-induced conservatism and the discrepancy between low- and high-speed action distributions, the base policy was fine-tuned into speed-specialized experts and then integrated via Low-Rank Adaptation (LoRA) to enable smooth operation across the full commanded-speed range. We deploy the resulting controller on the Oli humanoid robot, achieving stable stair ascent at commanded speeds up to 1.65 m/s and traversing a 33-step spiral staircase (17 cm rise per step) in 12 s, demonstrating robust high-speed performance on long staircases. Notably, the proposed approach served as the champion solution in the Canton Tower Robot Run Up Competition.
CHORAL: Traversal-Aware Planning for Safe and Efficient Heterogeneous Multi-Robot Routing
Monitoring large, unknown, and complex environments with autonomous robots poses significant navigation challenges, where deploying teams of heterogeneous robots with complementary capabilities can substantially improve both mission performance and feasibility. However, effectively modeling how different robotic platforms interact with the environment requires rich, semantic scene understanding. Despite this, existing approaches often assume homogeneous robot teams or focus on discrete task compatibility rather than continuous routing. Consequently, scene understanding is not fully integrated into routing decisions, limiting their ability to adapt to the environment and to leverage each robot's strengths. In this paper, we propose an integrated semantic-aware framework for coordinating heterogeneous robots. Starting from a reconnaissance flight, we build a metric-semantic map using open-vocabulary vision models and use it to identify regions requiring closer inspection and capability-aware paths for each platform to reach them. These are then incorporated into a heterogeneous vehicle routing formulation that jointly assigns inspection tasks and computes robot trajectories. Experiments in simulation and in a real inspection mission with three robotic platforms demonstrate the effectiveness of our approach in planning safer and more efficient routes by explicitly accounting for each platform's navigation capabilities. We release our framework, CHORAL, as open source to support reproducibility and deployment of diverse robot teams.
The impact of tactile sensor configurations on grasp learning efficiency -- a comparative evaluation in simulation
Tactile sensors are breaking into the field of robotics to provide direct information related to contact surfaces, including contact events, slip events and even texture identification. These events are especially important for robotic hand designs, including prosthetics, as they can greatly improve grasp stability. Most presently published robotic hand designs, however, implement them in vastly different densities and layouts on the hand surface, often reserving the majority of the available space. We used simulations to evaluate 6 different tactile sensor configurations with different densities and layouts, based on their impact on reinforcement learning. Our two-setup system allows for robust results that are not dependent on the use of a given physics simulator, robotic hand model or machine learning algorithm. Our results show setup-specific, as well as generalized effects across the 6 sensorized simulations, and we identify one configuration as consistently yielding the best performance across both setups. These results could help future research aimed at robotic hand designs, including prostheses.
comment: 13 pages, 6 figures, 2 tables
Proactive Local-Minima-Free Robot Navigation: Blending Motion Prediction with Safe Control
This work addresses the challenge of safe and efficient mobile robot navigation in complex dynamic environments with concave moving obstacles. Reactive safe controllers like Control Barrier Functions (CBFs) design obstacle avoidance strategies based only on the current states of the obstacles, risking future collisions. To alleviate this problem, we use Gaussian processes to learn barrier functions online from multimodal motion predictions of obstacles generated by neural networks trained with energy-based learning. The learned barrier functions are then fed into quadratic programs using modulated CBFs (MCBFs), a local-minimum-free version of CBFs, to achieve safe and efficient navigation. The proposed framework makes two key contributions. First, it develops a prediction-to-barrier function online learning pipeline. Second, it introduces an autonomous parameter tuning algorithm that adapts MCBFs to deforming, prediction-based barrier functions. The framework is evaluated in both simulations and real-world experiments, consistently outperforming baselines and demonstrating superior safety and efficiency in crowded dynamic environments.
comment: Co-first authors: Yifan Xue and Ze Zhang
A Unified Framework for Kinematic Simulation of Rigid Foldable Structures
Origami-inspired structures with rigid panels now span thick, kirigami, and multi-sheet realizations, making unified kinematic analysis essential. Yet a general method that consolidates their loop constraints has been lacking. We present an automated approach that generates the Pfaffian constraint matrix for arbitrary rigid foldable structures (RFS). From a minimally extended data schema, the tool constructs the facet-hinge graph, extracts a minimum cycle basis that captures all constraints, and assembles a velocity-level constraint matrix via screw theory that encodes coupled rotation and translation loop closure. The framework computes and visualizes deploy and fold motions across diverse RFS while eliminating tedious and error-prone constraint calculations.
comment: 34 pages (20 pages main text), 11 figures (7 in main text, 4 in appendix)
Terrain-Adaptive Mobile 3D Printing with Hierarchical Control
Mobile 3D printing on unstructured terrain remains challenging due to the conflict between platform mobility and deposition precision. Existing gantry-based systems achieve high accuracy but lack mobility, while mobile platforms struggle to maintain print quality on uneven ground. We present a framework that tightly integrates AI-driven disturbance prediction with multi-modal sensor fusion and hierarchical hardware control, forming a closed-loop perception-learning-actuation system. The AI module learns terrain-to-perturbation mappings from IMU, vision, and depth sensors, enabling proactive compensation rather than reactive correction. This intelligence is embedded into a three-layer control architecture: path planning, predictive chassis-manipulator coordination, and precision hardware execution. Through outdoor experiments on terrain with slopes and surface irregularities, we demonstrate sub-centimeter printing accuracy while maintaining full platform mobility. This AI-hardware integration establishes a practical foundation for autonomous construction in unstructured environments.
comment: Submitted to the 43rd International Symposium on Automation and Robotics in Construction (ISARC 2026)
RAG-3DSG: Enhancing 3D Scene Graphs with Re-Shot Guided Retrieval-Augmented Generation
Open-vocabulary 3D Scene Graph (3DSG) generation can enhance various downstream tasks in robotics, such as manipulation and navigation, by leveraging structured semantic representations. A 3DSG is constructed from multiple images of a scene, where objects are represented as nodes and relationships as edges. However, existing works for open-vocabulary 3DSG generation suffer from both low object-level recognition accuracy and speed, mainly due to constrained viewpoints, occlusions, and redundant surface density. To address these challenges, we propose RAG-3DSG to mitigate aggregation noise through re-shot guided uncertainty estimation and support object-level Retrieval-Augmented Generation (RAG) via reliable low-uncertainty objects. Furthermore, we propose a dynamic downsample-mapping strategy to accelerate cross-image object aggregation with adaptive granularity. Experiments on Replica dataset demonstrate that RAG-3DSG significantly improves node captioning accuracy in 3DSG generation while reducing the mapping time by two-thirds compared to the vanilla version.
comment: 9 pages, 6 figures
CoCoPlan: Adaptive Coordination and Communication for Multi-robot Systems in Dynamic and Unknown Environments
Multi-robot systems can greatly enhance efficiency through coordination and collaboration, yet in practice, full-time communication is rarely available and interactions are constrained to close-range exchanges. Existing methods either maintain all-time connectivity, rely on fixed schedules, or adopt pairwise protocols, but none adapt effectively to dynamic spatio-temporal task distributions under limited communication, resulting in suboptimal coordination. To address this gap, we propose CoCoPlan, a unified framework that co-optimizes collaborative task planning and team-wise intermittent communication. Our approach integrates a branch-and-bound architecture that jointly encodes task assignments and communication events, an adaptive objective function that balances task efficiency against communication latency, and a communication event optimization module that strategically determines when, where and how the global connectivity should be re-established. Extensive experiments demonstrate that it outperforms state-of-the-art methods by achieving a 22.4% higher task completion rate, reducing communication overhead by 58.6%, and improving the scalability by supporting up to 100 robots in dynamic environments. Hardware experiments include the complex 2D office environment and large-scale 3D disaster-response scenario.
comment: 8 pages, 8 figures, published to RA-L
UEOF: A Benchmark Dataset for Underwater Event-Based Optical Flow WACV
Underwater imaging is fundamentally challenging due to wavelength-dependent light attenuation, strong scattering from suspended particles, turbidity-induced blur, and non-uniform illumination. These effects impair standard cameras and make ground-truth motion nearly impossible to obtain. On the other hand, event cameras offer microsecond resolution and high dynamic range. Nonetheless, progress on investigating event cameras for underwater environments has been limited due to the lack of datasets that pair realistic underwater optics with accurate optical flow. To address this problem, we introduce the first synthetic underwater benchmark dataset for event-based optical flow derived from physically-based ray-traced RGBD sequences. Using a modern video-to-event pipeline applied to rendered underwater videos, we produce realistic event data streams with dense ground-truth flow, depth, and camera motion. Moreover, we benchmark state-of-the-art learning-based and model-based optical flow prediction methods to understand how underwater light transport affects event formation and motion estimation accuracy. Our dataset establishes a new baseline for future development and evaluation of underwater event-based perception algorithms. The source code and dataset for this project are publicly available at https://robotic-vision-lab.github.io/ueof.
comment: To be presented at the 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshop on Event-Based Vision in the Era of Generative AI
In-the-Wild Compliant Manipulation with UMI-FT ICRA 2026
Many manipulation tasks require careful force modulation. With insufficient force the task may fail, while excessive force could cause damage. The high cost, bulky size and fragility of commercial force/torque (F/T) sensors have limited large-scale, force-aware policy learning. We introduce UMI-FT, a handheld data-collection platform that mounts compact, six-axis force/torque sensors on each finger, enabling finger-level wrench measurements alongside RGB, depth, and pose. Using the multimodal data collected from this device, we train an adaptive compliance policy that predicts position targets, grasp force, and stiffness for execution on standard compliance controllers. In evaluations on three contact-rich, force-sensitive tasks (whiteboard wiping, skewering zucchini, and lightbulb insertion), UMI-FT enables policies that reliably regulate external contact forces and internal grasp forces, outperforming baselines that lack compliance or force sensing. UMI-FT offers a scalable path to learning compliant manipulation from in-the-wild demonstrations. We open-source the hardware and software to facilitate broader adoption at:https://umi-ft.github.io/.
comment: submitted to ICRA 2026
OT-Drive: Out-of-Distribution Off-Road Traversable Area Segmentation via Optimal Transport
Reliable traversable area segmentation in unstructured environments is critical for planning and decision-making in autonomous driving. However, existing data-driven approaches often suffer from degraded segmentation performance in out-of-distribution (OOD) scenarios, consequently impairing downstream driving tasks. To address this issue, we propose OT-Drive, an Optimal Transport--driven multi-modal fusion framework. The proposed method formulates RGB and surface normal fusion as a distribution transport problem. Specifically, we design a novel Scene Anchor Generator (SAG) to decompose scene information into the joint distribution of weather, time-of-day, and road type, thereby constructing semantic anchors that can generalize to unseen scenarios. Subsequently, we design an innovative Optimal Transport-based multi-modal fusion module (OT Fusion) to transport RGB and surface normal features onto the manifold defined by the semantic anchors, enabling robust traversable area segmentation under OOD scenarios. Experimental results demonstrate that our method achieves 95.16% mIoU on ORFD OOD scenarios, outperforming prior methods by 6.35%, and 89.79% mIoU on cross-dataset transfer tasks, surpassing baselines by 13.99%.These results indicate that the proposed model can attain strong OOD generalization with only limited training data, substantially enhancing its practicality and efficiency for real-world deployment.
comment: 9 pages, 8 figures, 6 tables. This work has been submitted to the IEEE for possible publication. Code will be released upon acceptance
Is open robotics innovation a threat to international peace and security?
Open access to publication, software and hardware is central to robotics: it lowers barriers to entry, supports reproducible science and accelerates reliable system development. However, openness also exacerbates the inherent dual-use risks associated with research and innovation in robotics. It lowers barriers for states and non-state actors to develop and deploy robotics systems for military use and harmful purposes. Compared to other fields of engineering where dual-use risks are present - e.g., those that underlie the development of weapons of mass destruction (chemical, biological, radiological, and nuclear weapons) and even the field of AI, robotics offers no specific regulation and little guidance as to how research and innovation may be conducted and disseminated responsibly. While other fields can be used for guidance, robotics has its own needs and specificities which have to be taken into account. The robotics community should therefore work toward its own set of sector-specific guidance and possibly regulation. To that end, we propose a roadmap focusing on four practices: a) education in responsible robotics; b) incentivizing risk assessment; c) moderating the diffusion of high-risk material; and d) developing red lines.
IMU-based Real-Time Crutch Gait Phase and Step Detections in Lower-Limb Exoskeletons
Lower limb exoskeletons and prostheses require precise, real time gait phase and step detections to ensure synchronized motion and user safety. Conventional methods often rely on complex force sensing hardware that introduces control latency. This paper presents a minimalist framework utilizing a single, low cost Inertial-Measurement Unit (IMU) integrated into the crutch hand grip, eliminating the need for mechanical modifications. We propose a five phase classification system, including standard gait phases and a non locomotor auxiliary state, to prevent undesired motion. Three deep learning architectures were benchmarked on both a PC and an embedded system. To improve performance under data constrained conditions, models were augmented with a Finite State Machine (FSM) to enforce biomechanical consistency. The Temporal Convolutional Network (TCN) emerged as the superior architecture, yielding the highest success rates and lowest latency. Notably, the model generalized to a paralyzed user despite being trained exclusively on healthy participants. Achieving a 94% success rate in detecting crutch steps, this system provides a high performance, cost effective solution for real time exoskeleton control.
Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable Sets
If we consider human manipulation, it is clear that contact-rich manipulation (CRM)-the ability to use any surface of the manipulator to make contact with objects-can be far more efficient and natural than relying solely on end-effectors (i.e., fingertips). However, state-of-the-art model-based planners for CRM are still focused on feasibility rather than optimality, limiting their ability to fully exploit CRM's advantages. We introduce a new paradigm that computes approximately optimal manipulator plans. This approach has two phases. Offline, we construct a graph of mutual reachable sets, where each set contains all object orientations reachable from a starting object orientation and grasp. Online, we plan over this graph, effectively computing and sequencing local plans for globally optimized motion. On a challenging, representative contact-rich task, our approach outperforms a leading planner, reducing task cost by 61%. It also achieves a 91% success rate across 250 queries and maintains sub-minute query times, ultimately demonstrating that globally optimized contact-rich manipulation is now practical for real-world tasks.
comment: 17 pages, 14 figures; under submission to IEEE Transactions on Robotics
SurfSLAM: Sim-to-Real Underwater Stereo Reconstruction For Real-Time SLAM
Localization and mapping are core perceptual capabilities for underwater robots. Stereo cameras provide a low-cost means of directly estimating metric depth to support these tasks. However, despite recent advances in stereo depth estimation on land, computing depth from image pairs in underwater scenes remains challenging. In underwater environments, images are degraded by light attenuation, visual artifacts, and dynamic lighting conditions. Furthermore, real-world underwater scenes frequently lack rich texture useful for stereo depth estimation and 3D reconstruction. As a result, stereo estimation networks trained on in-air data cannot transfer directly to the underwater domain. In addition, there is a lack of real-world underwater stereo datasets for supervised training of neural networks. Poor underwater depth estimation is compounded in stereo-based Simultaneous Localization and Mapping (SLAM) algorithms, making it a fundamental challenge for underwater robot perception. To address these challenges, we propose a novel framework that enables sim-to-real training of underwater stereo disparity estimation networks using simulated data and self-supervised finetuning. We leverage our learned depth predictions to develop \algname, a novel framework for real-time underwater SLAM that fuses stereo cameras with IMU, barometric, and Doppler Velocity Log (DVL) measurements. Lastly, we collect a challenging real-world dataset of shipwreck surveys using an underwater robot. Our dataset features over 24,000 stereo pairs, along with high-quality, dense photogrammetry models and reference trajectories for evaluation. Through extensive experiments, we demonstrate the advantages of the proposed training approach on real-world data for improving stereo estimation in the underwater domain and for enabling accurate trajectory estimation and 3D reconstruction of complex shipwreck sites.
Bidirectional Human-Robot Communication for Physical Human-Robot Interaction
Effective physical human-robot interaction requires systems that are not only adaptable to user preferences but also transparent about their actions. This paper introduces BRIDGE, a system for bidirectional human-robot communication in physical assistance. Our method allows users to modify a robot's planned trajectory -- position, velocity, and force -- in real time using natural language. We utilize a large language model (LLM) to interpret any trajectory modifications implied by user commands in the context of the planned motion and conversation history. Importantly, our system provides verbal feedback in response to the user, either assuring any resulting changes or posing a clarifying question. We evaluated our method in a user study with 18 older adults across three assistive tasks, comparing BRIDGE to an ablation without verbal feedback and a baseline. Results show that participants successfully used the system to modify trajectories in real time. Moreover, the bidirectional feedback led to significantly higher ratings of interactivity and transparency, demonstrating that the robot's verbal response is critical for a more intuitive user experience. Videos and code can be found on our project website: https://bidir-comm.github.io/
comment: 12 pages, 8 figures. To be published in 2026 ACM/IEEE International Conference on Human-Robot Interaction
Exploiting Euclidean Distance Field Properties for Fast and Safe 3D planning with a modified Lazy Theta*
This paper presents the FS-Planner, a fast graph-search planner based on a modified Lazy Theta* algorithm that exploits the analytical properties of Euclidean Distance Fields (EDFs). We introduce a new cost function that integrates an EDF-based term proven to satisfy the triangle inequality, enabling efficient parent selection and reducing computation time while generating safe paths with smaller heading variations. We also derive an analytic approximation of the EDF integral along a segment and analyze the influence of the line-of-sight limit on the approximation error, motivating the use of a bounded visibility range. Furthermore, we propose a gradient-based neighbour-selection mechanism that decreases the number of explored nodes and improves computational performance without degrading safety or path quality. The FS-Planner produces safe paths with small heading changes without requiring the use of post-processing methods. Extensive experiments and comparisons in challenging 3D indoor simulation environments, complemented by tests in real-world outdoor environments, are used to evaluate and validate the FS-Planner. The results show consistent improvements in computation time, exploration efficiency, safety, and smoothness in a geometric sense compared with baseline heuristic planners, while maintaining sub-optimality within acceptable bounds. Finally, the proposed EDF-based cost formulation is orthogonal to the underlying search method and can be incorporated into other planning paradigms.
RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization
We introduce RGS-SLAM, a robust Gaussian-splatting SLAM framework that replaces the residual-driven densification stage of GS-SLAM with a training-free correspondence-to-Gaussian initialization. Instead of progressively adding Gaussians as residuals reveal missing geometry, RGS-SLAM performs a one-shot triangulation of dense multi-view correspondences derived from DINOv3 descriptors refined through a confidence-aware inlier classifier, generating a well-distributed and structure-aware Gaussian seed prior to optimization. This initialization stabilizes early mapping and accelerates convergence by roughly 20\%, yielding higher rendering fidelity in texture-rich and cluttered scenes while remaining fully compatible with existing GS-SLAM pipelines. Evaluated on the TUM RGB-D and Replica datasets, RGS-SLAM achieves competitive or superior localization and reconstruction accuracy compared with state-of-the-art Gaussian and point-based SLAM systems, sustaining real-time mapping performance at up to 925 FPS. Additional details and resources are available at this URL: https://breeze1124.github.io/rgs-slam-project-page/
comment: 10 pages, 9 figures
Sampling-Based Constrained Motion Planning with Products of Experts
We present a novel approach to enhance the performance of sampling-based Model Predictive Control (MPC) in constrained optimization by leveraging products of experts. Our methodology divides the main problem into two components: one focused on optimality and the other on feasibility. By combining the solutions from each component, represented as distributions, we apply products of experts to implement a project-then-sample strategy. In this strategy, the optimality distribution is projected into the feasible area, allowing for more efficient sampling. This approach contrasts with the traditional sample-then-project and naive sample-then-reject method, leading to more diverse exploration and reducing the accumulation of samples on the boundaries. We demonstrate an effective implementation of this principle using a tensor train-based distribution model, which is characterized by its non-parametric nature, ease of combination with other distributions at the task level, and straightforward sampling technique. We adapt existing tensor train models to suit this purpose and validate the efficacy of our approach through experiments in various tasks, including obstacle avoidance, non-prehensile manipulation, and tasks involving staying in a restricted volume. Our experimental results demonstrate that the proposed method consistently outperforms known baselines, providing strong empirical support for its effectiveness. Sample codes for this project are available at https://github.com/idiap/smpc_poe.
Singularity-Free Guiding Vector Field over Bézier's Curves Applied to Rovers Path Planning and Path Following
This paper presents a guidance algorithm for solving the problem of following parametric paths, as well as a curvature-varying speed setpoint for land-based car-type wheeled mobile robots (WMRs). The guidance algorithm relies on Singularity-Free Guiding Vector Fields SF-GVF. This novel GVF approach expands the desired robot path and the Guiding vector field to a higher dimensional space, in which an angular control function can be found to ensure global asymptotic convergence to the desired parametric path while avoiding field singularities. In SF-GVF, paths should follow a parametric definition. This feature makes using Bezier's curves attractive to define the robot's desired patch. The curvature-varying speed setpoint, combined with the guidance algorithm, eases the convergence to the path when physical restrictions exist, such as minimal turning radius or maximal lateral acceleration. We provide theoretical results, simulations, and outdoor experiments using a WMR platform assembled with off-the-shelf components.
comment: Final version, accepted for publication. 26 pages, 15 figures
Bootstrap Off-policy with World Model NeurIPS 2025
Online planning has proven effective in reinforcement learning (RL) for improving sample efficiency and final performance. However, using planning for environment interaction inevitably introduces a divergence between the collected data and the policy's actual behaviors, degrading both model learning and policy improvement. To address this, we propose BOOM (Bootstrap Off-policy with WOrld Model), a framework that tightly integrates planning and off-policy learning through a bootstrap loop: the policy initializes the planner, and the planner refines actions to bootstrap the policy through behavior alignment. This loop is supported by a jointly learned world model, which enables the planner to simulate future trajectories and provides value targets to facilitate policy improvement. The core of BOOM is a likelihood-free alignment loss that bootstraps the policy using the planner's non-parametric action distribution, combined with a soft value-weighted mechanism that prioritizes high-return behaviors and mitigates variability in the planner's action quality within the replay buffer. Experiments on the high-dimensional DeepMind Control Suite and Humanoid-Bench show that BOOM achieves state-of-the-art results in both training stability and final performance. The code is accessible at https://github.com/molumitu/BOOM_MBRL.
comment: NeurIPS 2025
UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories AAAI 2026
Navigating complex urban environments using natural language instructions poses significant challenges for embodied agents, including noisy language instructions, ambiguous spatial references, diverse landmarks, and dynamic street scenes. Current visual navigation methods are typically limited to simulated or off-street environments, and often rely on precise goal formats, such as specific coordinates or images. This limits their effectiveness for autonomous agents like last-mile delivery robots navigating unfamiliar cities. To address these limitations, we introduce UrbanNav, a scalable framework that trains embodied agents to follow free-form language instructions in diverse urban settings. Leveraging web-scale city walking videos, we develop an scalable annotation pipeline that aligns human navigation trajectories with language instructions grounded in real-world landmarks. UrbanNav encompasses over 1,500 hours of navigation data and 3 million instruction-trajectory-landmark triplets, capturing a wide range of urban scenarios. Our model learns robust navigation policies to tackle complex urban scenarios, demonstrating superior spatial reasoning, robustness to noisy instructions, and generalization to unseen urban settings. Experimental results show that UrbanNav significantly outperforms existing methods, highlighting the potential of large-scale web video data to enable language-guided, real-world urban navigation for embodied agents.
comment: 9 pages, 5 figures, accepted to AAAI 2026. Project page:https://github.com/CASIA-IVA-Lab/UrbanNav
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Large Vision-Language Models (LVLMs) have recently shown great promise in advancing robotics by combining embodied reasoning with robot control. A common approach involves training on embodied reasoning tasks related to robot control using Supervised Fine-Tuning (SFT). However, SFT datasets are often heuristically constructed and not explicitly optimized for improving robot control. Furthermore, SFT often leads to issues such as catastrophic forgetting and reduced generalization performance. To address these limitations, we introduce Robot-R1, a novel framework that leverages reinforcement learning to enhance embodied reasoning specifically for robot control. Robot-R1 learns to predict the next keypoint state required for task completion, conditioned on the current scene image and environment metadata derived from expert demonstrations. Inspired by the DeepSeek-R1 learning approach, Robot-R1 samples reasoning-based responses and reinforces those that lead to more accurate predictions. To rigorously evaluate Robot-R1, we also introduce a new benchmark that demands the diverse embodied reasoning capabilities for the task. Our experiments show that models trained with Robot-R1 outperform SFT methods on embodied reasoning tasks. Despite having only 7B parameters, Robot-R1 even surpasses GPT-4o on reasoning tasks related to low-level action control, such as spatial and movement reasoning.
comment: 29 pages, 13 figures
Learning Quadrotor Control From Visual Features Using Differentiable Simulation ICRA
The sample inefficiency of reinforcement learning (RL) remains a significant challenge in robotics. RL requires large-scale simulation and can still cause long training times, slowing research and innovation. This issue is particularly pronounced in vision-based control tasks where reliable state estimates are not accessible. Differentiable simulation offers an alternative by enabling gradient back-propagation through the dynamics model, providing low-variance analytical policy gradients and, hence, higher sample efficiency. However, its usage for real-world robotic tasks has yet been limited. This work demonstrates the great potential of differentiable simulation for learning quadrotor control. We show that training in differentiable simulation significantly outperforms model-free RL in terms of both sample efficiency and training time, allowing a policy to learn to recover a quadrotor in seconds when providing vehicle states and in minutes when relying solely on visual features. The key to our success is two-fold. First, the use of a simple surrogate model for gradient computation greatly accelerates training without sacrificing control performance. Second, combining state representation learning with policy learning enhances convergence speed in tasks where only visual features are observable. These findings highlight the potential of differentiable simulation for real-world robotics and offer a compelling alternative to conventional RL approaches.
comment: Accepted for presentation at the IEEE International Conference on Robotics and Automation (ICRA) 2025
Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory
Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot reconstruct the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of 44000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s2.
comment: Submitted to IEEE Transactions on Robotics, 20 pages, 14 figures
Adaptive Querying for Reward Learning from Human Feedback
Learning from human feedback is a popular approach to train robots to adapt to user preferences and improve safety. Existing approaches typically consider a single querying (interaction) format when seeking human feedback and do not leverage multiple modes of user interaction with a robot. We examine how to learn a penalty function associated with unsafe behaviors using multiple forms of human feedback, by optimizing both the query state and feedback format. Our proposed adaptive feedback selection is an iterative, two-phase approach which first selects critical states for querying, and then uses information gain to select a feedback format for querying across the sampled critical states. The feedback format selection also accounts for the cost and probability of receiving feedback in a certain format. Our experiments in simulation demonstrate the sample efficiency of our approach in learning to avoid undesirable behaviors. The results of our user study with a physical robot highlight the practicality and effectiveness of adaptive feedback selection in seeking informative, user-aligned feedback that accelerate learning. Experiment videos, code and appendices are found on our website: https://tinyurl.com/AFS-learning.
Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making
One mistake by an AI system in a safety-critical setting can cost lives. As Large Language Models (LLMs) become integral to robotics decision-making, the physical dimension of risk grows; a single wrong instruction can directly endanger human safety. This paper addresses the urgent need to systematically evaluate LLM performance in scenarios where even minor errors are catastrophic. Through a qualitative evaluation of a fire evacuation scenario, we identified critical failure cases in LLM-based decision-making. Based on these, we designed seven tasks for quantitative assessment, categorized into: Complete Information, Incomplete Information, and Safety-Oriented Spatial Reasoning (SOSR). Complete information tasks utilize ASCII maps to minimize interpretation ambiguity and isolate spatial reasoning from visual processing. Incomplete information tasks require models to infer missing context, testing for spatial continuity versus hallucinations. SOSR tasks use natural language to evaluate safe decision-making in life-threatening contexts. We benchmark various LLMs and Vision-Language Models (VLMs) across these tasks. Beyond aggregate performance, we analyze the implications of a 1% failure rate, highlighting how "rare" errors escalate into catastrophic outcomes. Results reveal serious vulnerabilities: several models achieved a 0% success rate in ASCII navigation, while in a simulated fire drill, models instructed robots to move toward hazardous areas instead of emergency exits. Our findings lead to a sobering conclusion: current LLMs are not ready for direct deployment in safety-critical systems. A 99% accuracy rate is dangerously misleading in robotics, as it implies one out of every hundred executions could result in catastrophic harm. We demonstrate that even state-of-the-art models cannot guarantee safety, and absolute reliance on them creates unacceptable risks.
CoinFT: A Coin-Sized, Capacitive 6-Axis Force Torque Sensor for Robotic Applications
We introduce CoinFT, a capacitive 6-axis force/torque (F/T) sensor that is compact, light, low-cost, and robust with an average root-mean-squared error of 0.16N for force and 1.08mNm for moment when the input ranges from 0~14N and 0~5N in normal and shear directions, respectively. CoinFT is a stack of two rigid PCBs with comb-shaped electrodes connected by an array of silicone rubber pillars. The microcontroller interrogates the electrodes in different subsets in order to enhance sensitivity for measuring 6-axis F/T. The combination of features of CoinFT enables various contact-rich robot interactions across different embodiment domains including drones, robot end-effectors, and wearable haptic devices. We demonstrate the utility of CoinFT through two representative applications: a multi-axial contact-probing experiment in which a CoinFT mounted beneath a hemispherical fingertip measures 6-axes of force and torque representative of manipulation scenarios, and an attitude-based force-control task on a drone. The design, fabrication, and firmware of CoinFT are open-sourced at https://coin-ft.github.io/.
Bayesian Monocular Depth Refinement via Neural Radiance Fields
Monocular depth estimation has applications in many fields, such as autonomous navigation and extended reality, making it an essential computer vision task. However, current methods often produce smooth depth maps that lack the fine geometric detail needed for accurate scene understanding. We propose MDENeRF, an iterative framework that refines monocular depth estimates using depth information from Neural Radiance Fields (NeRFs). MDENeRF consists of three components: (1) an initial monocular estimate for global structure, (2) a NeRF trained on perturbed viewpoints, with per-pixel uncertainty, and (3) Bayesian fusion of the noisy monocular and NeRF depths. We derive NeRF uncertainty from the volume rendering process to iteratively inject high-frequency fine details. Meanwhile, our monocular prior maintains global structure. We demonstrate improvements on key metrics and experiments using indoor scenes from the SUN RGB-D dataset.
comment: IEEE 8th International Conference on Algorithms, Computing and Artificial Intelligence (ACAI 2025)
A Taxonomy for Evaluating Generalist Robot Manipulation Policies
Machine learning for robot manipulation promises to unlock generalization to novel tasks and environments. But how should we measure the progress of these policies towards generalization? Evaluating and quantifying generalization is the Wild West of modern robotics, with each work proposing and measuring different types of generalization in their own, often difficult to reproduce settings. In this work, our goal is (1) to outline the forms of generalization we believe are important for robot manipulation in a comprehensive and fine-grained manner, and (2) to provide reproducible guidelines for measuring these notions of generalization. We first propose STAR-Gen, a taxonomy of generalization for robot manipulation structured around visual, semantic, and behavioral generalization. Next, we instantiate STAR-Gen with two case studies on real-world benchmarking: one based on open-source models and the Bridge V2 dataset, and another based on the bimanual ALOHA 2 platform that covers more dexterous and longer horizon tasks. Our case studies reveal many interesting insights: for example, we observe that open-source vision-language-action models often struggle with semantic generalization, despite pre-training on internet-scale language datasets. We provide videos and other supplementary material at our website stargen-taxonomy.github.io.
comment: IEEE Robotics and Automation Letters (RA-L)
Multiagent Systems
Procedural Fairness in Multi-Agent Bandits
In the context of multi-agent multi-armed bandits (MA-MAB), fairness is often reduced to outcomes: maximizing welfare, reducing inequality, or balancing utilities. However, evidence in psychology, economics, and Rawlsian theory suggests that fairness is also about process and who gets a say in the decisions being made. We introduce a new fairness objective, procedural fairness, which provides equal decision-making power for all agents, lies in the core, and provides for proportionality in outcomes. Empirical results confirm that fairness notions based on optimizing for outcomes sacrifice equal voice and representation, while the sacrifice in outcome-based fairness objectives (like equality and utilitarianism) is minimal under procedurally fair policies. We further prove that different fairness notions prioritize fundamentally different and incompatible values, highlighting that fairness requires explicit normative choices. This paper argues that procedural legitimacy deserves greater focus as a fairness objective, and provides a framework for putting procedural fairness into practice.
Generative AI collective behavior needs an interactionist paradigm
In this article, we argue that understanding the collective behavior of agents based on large language models (LLMs) is an essential area of inquiry, with important implications in terms of risks and benefits, impacting us as a society at many levels. We claim that the distinctive nature of LLMs--namely, their initialization with extensive pre-trained knowledge and implicit social priors, together with their capability of adaptation through in-context learning--motivates the need for an interactionist paradigm consisting of alternative theoretical foundations, methodologies, and analytical tools, in order to systematically examine how prior knowledge and embedded values interact with social context to shape emergent phenomena in multi-agent generative AI systems. We propose and discuss four directions that we consider crucial for the development and deployment of LLM-based collectives, focusing on theory, methods, and trans-disciplinary dialogue.
Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems
Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but often incur high inference latency due to multi-step execution and repeated model invocations, severely limiting their scalability and usability in time-sensitive scenarios. Most existing approaches primarily optimize task performance and inference cost, and explicitly or implicitly assume sequential execution, making them less optimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), a latency-aware multi-agent orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct execution topology graphs with lower latency under parallel execution. Our experiments show that our approach reduces critical path length by 38-46% compared to the state-of-the-art baseline for multi-agent architecture search across multiple benchmarks, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS
comment: Preprint
The incompatibility of the Condorcet winner and loser criteria with positive involvement and resolvability
We prove that there is no preferential voting method satisfying the Condorcet winner and loser criteria, positive involvement (if a candidate $x$ wins in an initial preference profile, then adding a voter who ranks $x$ uniquely first cannot cause $x$ to lose), and resolvability (if $x$ initially ties for winning, then $x$ can be made the unique winner by adding a single voter). In a previous note, we proved an analogous result assuming an additional axiom of ordinal margin invariance, which we now show is unnecessary for an impossibility theorem, at least if the desired voting method is defined for five-candidate elections.
comment: 6 pages, 2 figures
Multipath Routing for Multi-Hop UAV Networks
Multi-hop uncrewed aerial vehicle (UAV) networks are promising to extend the terrestrial network coverage. Existing multi-hop UAV networks employ a single routing path by selecting the next-hop forwarding node in a hop-by-hop manner, which leads to local congestion and increases traffic delays. In this paper, a novel traffic-adaptive multipath routing method is proposed for multi-hop UAV networks, which enables each UAV to dynamically split and forward traffic flows across multiple next-hop neighbors, thus meeting latency requirements of diverse traffic flows in dynamic mobile environments. An on-time packet delivery ratio maximization problem is formulated to determine the traffic splitting ratios at each hop. This sequential decision-making problem is modeled as a decentralized partially observable Markov decision process (Dec-POMDP). To solve this Dec-POMDP, a novel multi-agent deep reinforcement leaning (MADRL) algorithm, termed Independent Proximal Policy Optimization with Dirichlet Modeling (IPPO-DM), is developed. Specifically, the IPPO serves as the core optimization framework, where the Dirichlet distribution is leveraged to parameterize a continuous stochastic policy network on the probability simplex, inherently ensuring feasible traffic splitting ratios. Simulation results demonstrate that IPPO-DM outperforms benchmark schemes in terms of both delivery latency guarantee and packet loss performance.
comment: This paper has been submitted to IEEE Transactions on Communications
M^4olGen: Multi-Agent, Multi-Stage Molecular Generation under Precise Multi-Property Constraints
Generating molecules that satisfy precise numeric constraints over multiple physicochemical properties is critical and challenging. Although large language models (LLMs) are expressive, they struggle with precise multi-objective control and numeric reasoning without external structure and feedback. We introduce \textbf{M olGen}, a fragment-level, retrieval-augmented, two-stage framework for molecule generation under multi-property constraints. Stage I : Prototype generation: a multi-agent reasoner performs retrieval-anchored, fragment-level edits to produce a candidate near the feasible region. Stage II : RL-based fine-grained optimization: a fragment-level optimizer trained with Group Relative Policy Optimization (GRPO) applies one- or multi-hop refinements to explicitly minimize the property errors toward our target while regulating edit complexity and deviation from the prototype. A large, automatically curated dataset with reasoning chains of fragment edits and measured property deltas underpins both stages, enabling deterministic, reproducible supervision and controllable multi-hop reasoning. Unlike prior work, our framework better reasons about molecules by leveraging fragments and supports controllable refinement toward numeric targets. Experiments on generation under two sets of property constraints (QED, LogP, Molecular Weight and HOMO, LUMO) show consistent gains in validity and precise satisfaction of multi-property targets, outperforming strong LLMs and graph-based algorithms.
Fairness Driven Multi-Agent Path Finding Problem
The Multi-Agent Path Finding (MAPF) problem aims at finding non-conflicting paths for multiple agents from their respective sources to destinations. This problem arises in multiple real-life situations, including robot motion planning and airspace assignment for unmanned aerial vehicle movement. The problem is computationally expensive, and adding to it, the agents are rational and can misreport their private information. In this paper, we study both variants of the problem under the realm of fairness. For the non-rational agents, we propose a heuristic solution for this problem. Considering the agents are rational, we develop a mechanism and demonstrate that it is a dominant strategy, incentive compatible, and individually rational. We employ various solution methodologies to highlight the effectiveness and efficiency of the proposed solution approaches.
comment: This paper has been accepted in the 18th International Conference on Agents and Artificial Intelligence (ICCART 2026)
TopoDIM: One-shot Topology Generation of Diverse Interaction Modes for Multi-Agent Systems
Optimizing communication topology in LLM-based multi-agent system is critical for enabling collective intelligence. Existing methods mainly rely on spatio-temporal interaction paradigms, where the sequential execution of multi-round dialogues incurs high latency and computation. Motivated by the recent insights that evaluation and debate mechanisms can improve problem-solving in multi-agent systems, we propose TopoDIM, a framework for one-shot Topology generation with Diverse Interaction Modes. Designed for decentralized execution to enhance adaptability and privacy, TopoDIM enables agents to autonomously construct heterogeneous communication without iterative coordination, achieving token efficiency and improved task performance. Experiments demonstrate that TopoDIM reduces total token consumption by 46.41% while improving average performance by 1.50% over state-of-the-art methods. Moreover, the framework exhibits strong adaptability in organizing communication among heterogeneous agents. Code is available at: https://anonymous.4open.science/r/TopoDIM-8D35/
When Personas Override Payoffs: Role Identity Bias in Multi-Agent LLM Decision-Making
Large language models are increasingly deployed in multi-agent systems for strategic tasks, yet how design choices such as role-based personas and payoff visibility affect reasoning remains poorly understood. We investigate whether multi-agent systems function as strategic reasoners capable of payoff optimization or as identity-driven actors that prioritize role alignment over explicit incentives. Using Nash equilibrium achievement as a diagnostic for strategic reasoning, we conduct systematic experiments across four LLM architectures (Qwen-7B, Qwen-32B, Llama-8B, Mistral-7B) in complex environmental decision-making games involving four agents. We show that role identity bias fundamentally alters strategic reasoning even when payoff-optimal equilibria exist and complete payoff information is available. Removing personas and providing explicit payoffs enables Qwen models to achieve high Nash equilibrium rates, indicating that both conditions are necessary for strategic reasoning. In contrast, personas systematically bias equilibrium selection toward socially preferred outcomes: with personas present, all of the achieved equilibria correspond to Green Transition, while models entirely fail to reach equilibrium when Tragedy of the Commons is payoff-optimal. The effect of explicit payoffs depends entirely on persona presence, revealing strong interactions between representational design choices. We also observe clear model-dependent patterns. Qwen architectures are highly sensitive to both personas and payoff visibility, whereas Llama and Mistral exhibit rigid reasoning behavior across conditions. These findings demonstrate that representational choices are substantive governance decisions that determine whether multi-agent systems act as strategic reasoners or identity-driven actors, with important implications for real-world deployment.
Cooperative UAVs for Remote Data Collection under Limited Communications: An Asynchronous Multiagent Learning Framework
This paper addresses the joint optimization of trajectories and bandwidth allocation for multiple Unmanned Aerial Vehicles (UAVs) to enhance energy efficiency in the cooperative data collection problem. We focus on an important yet underestimated aspect of the system, where action synchronization across all UAVs is impossible. Since most existing learning-based solutions are not designed to learn in this asynchronous environment, we formulate the trajectory planning problem as a Decentralized Partially Observable Semi-Markov Decision Process and introduce an asynchronous multi-agent learning algorithm to learn UAVs' cooperative policies. Once the UAVs' trajectory policies are learned, the bandwidth allocation can be optimally solved based on local observations at each collection point. Comprehensive empirical results demonstrate the superiority of the proposed method over other learning-based and heuristic baselines in terms of both energy efficiency and mission completion time. Additionally, the learned policies exhibit robustness under varying environmental conditions.
comment: Accepted to IEEE Transactions on Wireless Communications
Towards Reliable ML Feature Engineering via Planning in Constrained-Topology of LLM Agents
Recent advances in code generation models have unlocked unprecedented opportunities for automating feature engineering, yet their adoption in real-world ML teams remains constrained by critical challenges: (i) the scarcity of datasets capturing the iterative and complex coding processes of production-level feature engineering, (ii) limited integration and personalization of widely used coding agents, such as CoPilot and Devin, with a team's unique tools, codebases, workflows, and practices, and (iii) suboptimal human-AI collaboration due to poorly timed or insufficient feedback. We address these challenges with a planner-guided, constrained-topology multi-agent framework that generates code for repositories in a multi-step fashion. The LLM-powered planner leverages a team's environment, represented as a graph, to orchestrate calls to available agents, generate context-aware prompts, and use downstream failures to retroactively correct upstream artifacts. It can request human intervention at critical steps, ensuring generated code is reliable, maintainable, and aligned with team expectations. On a novel in-house dataset, our approach achieves 38% and 150% improvement in the evaluation metric over manually crafted and unplanned workflows respectively. In practice, when building features for recommendation models serving over 120 million users, our approach has delivered real-world impact by reducing feature engineering cycles from three weeks to a single day.
AI Agents Need Memory Control Over More Context
AI agents are increasingly used in long, multi-turn workflows in both research and enterprise settings. As interactions grow, agent behavior often degrades due to loss of constraint focus, error accumulation, and memory-induced drift. This problem is especially visible in real-world deployments where context evolves, distractions are introduced, and decisions must remain consistent over time. A common practice is to equip agents with persistent memory through transcript replay or retrieval-based mechanisms. While convenient, these approaches introduce unbounded context growth and are vulnerable to noisy recall and memory poisoning, leading to unstable behavior and increased drift. In this work, we introduce the Agent Cognitive Compressor (ACC), a bio-inspired memory controller that replaces transcript replay with a bounded internal state updated online at each turn. ACC separates artifact recall from state commitment, enabling stable conditioning while preventing unverified content from becoming persistent memory. We evaluate ACC using an agent-judge-driven live evaluation framework that measures both task outcomes and memory-driven anomalies across extended interactions. Across scenarios spanning IT operations, cybersecurity response, and healthcare workflows, ACC consistently maintains bounded memory and exhibits more stable multi-turn behavior, with significantly lower hallucination and drift than transcript replay and retrieval-based agents. These results show that cognitive compression provides a practical and effective foundation for reliable memory control in long-horizon AI agents.
comment: 32 pages, 7 figures
Self-Organizing Railway Traffic Management
Improving traffic management in case of perturbation is one of the main challenges in today's railway research. The great majority of the existing literature proposes approaches to make centralized decisions to minimize delay propagation. In this paper, we propose a new paradigm to the same aim: we design and implement a modular process to allow trains to self-organize. This process consists in having trains identifying their neighbors, formulating traffic management hypotheses, checking their compatibility and selecting the best ones through a consensus mechanism. Finally, these hypotheses are merged into a directly applicable traffic plan. In a thorough experimental analysis on a portion of the Italian network, we compare the results of self-organization with those of a state-of-the-art centralized approach. In particular, we make this comparison mimicking a realistic deployment thanks to a closed-loop framework including a microscopic railway simulator. The results indicate that self-organization achieves better results than the centralized algorithm, specifically thanks to the definition and exploitation of the instance decomposition allowed by the proposed approach.
comment: This work has been submitted to the IEEE for possible publication
Systems and Control (EESS)
Safe Trajectory Gradient Flow Control of a Grid-Interfacing Inverter
Grid-interfacing inverters serve as the interface between renewable energy resources and the electric power grid, offering fast, programmable control capabilities. However, their operation is constrained by hardware limitations, such as bounds on the current magnitude. Existing control methods for these systems often neglect these constraints during controller design and instead rely on ad hoc limiters, which can introduce instability or degrade performance. In this work, we present a control framework that directly incorporates constraints into the control of a voltage-source inverter. We propose a safe trajectory gradient flow controller, which applies the safe gradient flow method to a rolling horizon trajectory optimization problem to ensure that the states remain within a safe set defined by the constraints while directing the trajectory towards an optimal equilibrium point of a nonlinear program. Simulation results demonstrate that our approach can drive the outputs of a simulated inverter system to optimal values and maintain state constraints, even when using a limited number of optimization steps per control cycle.
comment: 5 pages, 4 figures, Submitted to PES-GM 2026
A user subscription model in mobile radio access networks with network slicing
Network slicing is an architectural enabling technology that logically decouples the current cellular networks into infrastructure providers (InPs) and Network Slice Tenants (NSTs). The network resources (e.g., radio access resources at each cell) are owned by the InP, and are shared by the NSTs to provide a service to their mobile users. In this context, we proposed a business model that includes resource allocation and user subscription to NSTs in a competitive setting, and provides, among other things, closed-form expressions for the subscription indicators in equilibrium of each NST at each cell. This model relies on the widely adopted logit model to characterize user subscriptions. However, as a consequence of user mobility and radio propagation, some of the underlying assumptions in the logit model do not hold. Therefore, further research is needed to assess the accuracy of the results provided by the logit model in a mobile radio scenario. We carry out a thorough evaluation of the validity of the model by comparing its results against those obtained through computer simulation. Our simulation model includes complete and realistic characterizations of user mobility and radio propagation. From the results, we conclude in most cases the logit model provides valid results in a mobile radio scenario.
Online identification of nonlinear time-varying systems with uncertain information
Digital twins (DTs), serving as the core enablers for real-time monitoring and predictive maintenance of complex cyber-physical systems, impose critical requirements on their virtual models: high predictive accuracy, strong interpretability, and online adaptive capability. However, existing techniques struggle to meet these demands simultaneously: Bayesian methods excel in uncertainty quantification but lack model interpretability, while interpretable symbolic identification methods (e.g., SINDy) are constrained by their offline, batch-processing nature, which make real-time updates challenging. To bridge this semantic and computational gap, this paper proposes a novel Bayesian Regression-based Symbolic Learning (BRSL) framework. The framework formulates online symbolic discovery as a unified probabilistic state-space model. By incorporating sparse horseshoe priors, model selection is transformed into a Bayesian inference task, enabling simultaneous system identification and uncertainty quantification. Furthermore, we derive an online recursive algorithm with a forgetting factor and establish precise recursive conditions that guarantee the well-posedness of the posterior distribution. These conditions also function as real-time monitors for data utility, enhancing algorithmic robustness. Additionally, a rigorous convergence analysis is provided, demonstrating the convergence of parameter estimates under persistent excitation conditions. Case studies validate the effectiveness of the proposed framework in achieving interpretable, probabilistic prediction and online learning.
Single-Feed Circularly Polarized Super Realized Gain Antenna
This paper presents a super realized gain, circularly polarized strip-crossed dipole antenna operating at 3.5 GHz. Superdirective behavior is achieved by leveraging strong inter-element mutual coupling through careful adjustment of the strip dimensions. The antenna features a single driven element, with the other element passively loaded with a reactive impedance. The structure is optimized to maximize left-hand circularly polarized (LHCP) realized gain, ensuring high polarization purity and good impedance matching. The optimized design exhibits a 50 $Ω$ impedance bandwidth of 3.29 - 4.17 GHz (23.75%) and an axial-ratio bandwidth of 3.43 - 3.57 GHz (4%). At 3.5 GHz, the antenna achieves a peak realized gain of 6.1 dB ($ka \approx 1.65$), with an axial ratio of 1.4 dB. These results demonstrate that circular polarization and superdirectivity can be simultaneously realized in a geometrically simple, low-profile ($0.15λ$) antenna, rendering it suitable for integration into compact sub-6~GHz wireless and sensing platforms.
Model Predictive Control of Thermo-Hydraulic Systems Using Primal Decomposition
Decarbonizing the global energy supply requires more efficient heating and cooling systems. Model predictive control enhances the operation of cooling and heating systems but depends on accurate system models, often based on control volumes. We present an automated framework including time discretization to generate model predictive controllers for such models. To ensure scalability, a primal decomposition exploiting the model structure is applied. The approach is validated on an underground heating system with varying numbers of states, demonstrating the primal decomposition's advantage regarding scalability.
comment: This work has been submitted to the IFAC World Congress 2026 for possible publication
HyMGP: A Customized MILP-Based Tool for Techno-Economic Planning of Islanded Microgrids
This paper presents a customized microgrid planning algorithm and tool, HyMGP, for remote sites in arid regions, which is formulated as a Mixed Integer Linear Programming (MILP) problem. HyMGP is compared with HOMER Pro to evaluate its performance in optimizing the sizing of microgrid components, including photovoltaic panels (PVs), vertical axis wind turbines (VAWTs), and battery energy storage systems (BESS), for remote and off-grid applications. The study focuses on a standalone microgrid in the Saudi Arabia, considering high solar irradiance, limited wind availability, and a constant load profile composed of continuous cathodic protection and daytime cooling. In the simulation environment, comparisons with HOMER solutions demonstrate the advantages of HyMGP, which provides optimal and more flexible solutions by allowing user-defined component specifications and strictly enforcing all constraints. Further analysis shows that incorporating wind turbines reduces the Net Present Cost (NPC) by decreasing the required PV and battery capacities. Increasing battery autonomy leads to a higher NPC in both PV-only and hybrid systems due to the need for larger storage. Finally, lithium iron phosphate (Li-ion LFP) batteries are found to be more cost effective than lead acid, offering lower NPCs due to their longer lifespan, deeper discharge capability, and fewer replacement cycles.
comment: 2026 IEEE Power & Energy Society (PES) International Meeting (IM)
Leveraging Digital Twin Technologies: All-Photonics Networks-as-a-Service for Data Center Xchange in the Era of AI [Invited Tutorial]
This paper presents a data center exchange (Data Center Xchange, DCX) architecture for all-photonics networks-as-a-service in distributed data center infrastructures, enabling the creation of a virtual large-scale data center by directly interconnecting distributed data centers in metropolitan areas. Key requirements for such an architecture are identified: support for low-latency operations, scalability, reliability, and flexibility within a single network architecture; the ability to add new operator-driven automation functionalities based on an open networking approach; and the ability to control and manage remotely deployed transponders connected via access links with unknown physical parameters. We propose a set of technologies that enable digital twin operations for optical networks, including a cloud-native architecture for coherent transceivers, remote transponder control, fast end-to-end optical path provisioning, transceiver-based physical-parameter estimation incorporating digital longitudinal monitoring, and optical line system calibration, demonstrating their feasibility through field validations.
On the Computation and Approximation of Backward Reachable Sets for Max-Plus Linear Systems using Polyhedras
This paper investigates reachability analysis for max-plus linear systems (MPLS), an important class of dynamical systems that model synchronization and delay phenomena in timed discrete-event systems. We specifically focus on backward reachability analysis, i.e., determining the set of states that can reach a given target set within a certain number of steps. Computing backward reachable sets presents significant challenges due to the non-convexity of max-plus dynamics and the complexity of set complement operations. To address these challenges, we propose a novel approximation framework that efficiently computes backward reachable sets by exploiting the structure of tropical polyhedra. Our approach reformulates the problem as a sequence of symbolic operations and approximates non-convex target sets through closure operations on unions of tropical polyhedra. We develop a systematic algorithm that constructs both outer (M-form) and inner (V-form) representations of the resulting sets, incorporating extremal filtering to reduce computational complexity. The proposed method offers a scalable alternative to traditional DBM-based approaches, enabling reliable approximate backward reachability analysis for general target regions in MPLS.
Event-Driven Deep RL Dispatcher for Post-Storm Distribution System Restoration
Natural hazards such as hurricanes and floods damage power grid equipment, forcing operators to replan restoration repeatedly as new information becomes available. This paper develops a deep reinforcement learning (DRL) dispatcher that serves as a real-time decision engine for crew-to-repair assignments. We model restoration as a sequential, information-revealing process and learn an actor-critic policy over compact features such as component status, travel/repair times, crew availability, and marginal restoration value. A feasibility mask blocks unsafe or inoperable actions, such as power flow limits, switching rules, and crew-time constraints, before they are applied. To provide realistic runtime inputs without relying on heavy solvers, we use lightweight surrogates for wind and flood intensities, fragility-based failure, spatial clustering of damage, access impairments, and progressive ticket arrivals. In simulated hurricane and flood events, the learned policy updates crew decisions in real time as new field reports arrive. Because the runtime logic is lightweight, it improves online performance (energy-not-supplied, critical-load restoration time, and travel distance) compared with mixed-integer programs and standard heuristics. The proposed approach is tested on the IEEE 13- and 123-bus feeders with mixed hurricane/flood scenarios.
Emergency Department Patient Flow Optimization with an Alternative Care Threshold Policy
Emergency department (ED) overcrowding and patient boarding represent critical systemic challenges that compromise care quality. We propose a threshold-based admission policy that redirects non-urgent patients to alternative care pathways, such as telemedicine, during peak congestion. The ED is modeled as a two-class $M/M/c$ preemptive-priority queuing system, where high-acuity patients are prioritized and low-acuity patients are subject to state-dependent redirection. Analyzed via a level-dependent Quasi-Birth-Death (QBD) process, the model determines the optimal threshold by maximizing a long-run time-averaged objective function comprising redirection-affected revenue and costs associated with patient balking and system occupancy. Numerical analysis using national healthcare data reveals that optimal policies are highly context-dependent. While rural EDs generally optimize at lower redirection thresholds, urban EDs exhibit performance peaks at moderate thresholds. Results indicate that our optimal policy yields significant performance gains of up to $4.84\%$ in rural settings and $5.90\%$ in urban environments. This research provides a mathematically rigorous framework for balancing clinical priority with operational efficiency across diverse ED settings.
comment: 37 pages, 14 figures
Extremum Seeking Nonovershooting Control of Strict-Feedback Systems Under Unknown Control Direction
This paper addresses the nonovershooting control problem for strict-feedback nonlinear systems with unknown control direction. We propose a method that integrates extremum seeking with Lie bracket-based design to achieve approximately nonovershooting tracking. The approach ensures that arbitrary reference trajectories can be tracked from below for any initial condition, with the overshoot reducible to arbitrarily small levels through parameter tuning. The method further provides a mechanism for enforcing high-relative-degree nonovershooting constraints in safety-critical scenarios involving unknown control directions.
Interfacing Superconductor and Semiconductor Digital Electronics
Interface circuits are the key components that enable the hybrid integration of superconductor and semiconductor digital electronics. The design requirements of superconductor-semiconductor interface circuits vary depending on the application, such as high-performance classical computing, superconducting quantum computing, and digital signal processing. In this survey, various interface circuits are categorized based on the working principle and structure. The superconducting output drivers are explored, which are capable of converting and amplifying, e.g., single flux quantum (SFQ) voltage pulses, to voltage levels that semiconductor circuits can process. Several trade-offs between circuit- and system-level design parameters are examined. Accordingly, parameters such as the data rate, output voltage, power dissipation, layout area, thermal/heat load of cryogenic cables, and bit-error rate are considered.
Sustainable Vertical Heterogeneous Networks: A Cell Switching Approach with High Altitude Platform Station
The rapid growth of radio access networks (RANs) is increasing energy consumption and challenging the sustainability of future systems. We consider a dense-urban vertical heterogeneous network (vHetNet) comprising a high-altitude platform station (HAPS) acting as a super macro base station, a terrestrial macro base station (MBS), and multiple small base stations (SBSs). We propose a HAPS-enhanced cell-switching algorithm that selectively deactivates SBSs based on their traffic load and the capacity and channel conditions of both the MBS and HAPS. The resulting energy-minimization problem, subject to an outage-based quality-of-service (QoS) constraint, is formulated as a mixed-integer nonlinear program and reformulated into a mixed-integer program for efficient solution. Using realistic 3GPP channel models, simulations show substantial energy savings versus All-ON, terrestrial cell switching, and sorting benchmarks. Relative to All-ON, the proposed method reduces power consumption by up to 77% at low loads and about 40% at high loads; a NoQoS variant achieves up to 90% and 47%, respectively. The approach maintains high served-traffic levels and provides a tunable trade-off between power efficiency and outage-based QoS, supporting scalable and sustainable 6G deployments.
comment: 16 pages
Balanced allocation: considerations from large scale service environments
We study d-way balanced allocation, which assigns each incoming job to the lightest loaded among d randomly chosen servers. While prior work has extensively studied the performance of the basic scheme, there has been less published work on adapting this technique to many aspects of large-scale systems. Based on our experience in building and running planet-scale cloud applications, we extend the understanding of d-way balanced allocation along the following dimensions: (i) Bursts: Events such as breaking news can produce bursts of requests that may temporarily exceed the servicing capacity of the system. Thus, we explore what happens during a burst and how long it takes for the system to recover from such bursts. (ii) Priorities: Production systems need to handle jobs with a mix of priorities (e.g., user facing requests may be high priority while other requests may be low priority). We extend d-way balanced allocation to handle multiple priorities. (iii) Noise: Production systems are often typically distributed and thus d-way balanced allocation must work with stale or incorrect information. Thus we explore the impact of noisy information and their interactions with bursts and priorities. We explore the above using both extensive simulations and analytical arguments. Specifically we show, (i) using simulations, that d-way balanced allocation quickly recovers from bursts and can gracefully handle priorities and noise; and (ii) that analysis of the underlying generative models complements our simulations and provides insight into our simulation results.
Disturbance Attenuation Regulator II: Stage Bound Finite Horizon Solution
This paper develops a generalized finite horizon recursive solution to the discrete time stage bound disturbance attenuation regulator (StDAR) for state feedback control. This problem addresses linear dynamical systems subject to stage bound disturbances, i.e., disturbance sequences constrained independently at each time step through stagewise squared two-norm bounds. The term generalized indicates that the results accommodate arbitrary initial states. By combining game theory and dynamic programming, this work derives a recursive solution for the optimal state feedback policy. The optimal policy is nonlinear in the state and requires solving a tractable convex optimization for the Lagrange multiplier vector at each stage; the control is then explicit. For systems with constant stage bound, the problem admits a steady-state optimization expressed as a tractable linear matrix inequality (LMI) with $O(n^3)$ complexity. Numerical examples illustrate the properties of the solution. This work provides a complete feedback solution to the StDAR for arbitrary initial states. Companion papers address the signal bound disturbance attenuation regulator (SiDAR): the finite horizon solution in Part~I-A and convergence properties in Part~I-B.
Disturbance Attenuation Regulator I-B: Signal Bound Convergence and Steady-State
This paper establishes convergence and steady-state properties for the signal bound disturbance attenuation regulator (SiDAR). Building on the finite horizon recursive solution developed in a companion paper, we introduce the steady-state SiDAR and derive its tractable linear matrix inequality (LMI) with $O(n^3)$ complexity. Systems are classified as degenerate or nondegenerate based on steady-state solution properties. For nondegenerate systems, the finite horizon solution converges to the steady-state solution for all states as the horizon approaches infinity. For degenerate systems, convergence holds in one region of the state space, while a turnpike arises in the complementary region. When convergence holds, the optimal multiplier and control gain are obtained directly from the LMI solution. Numerical examples illustrate convergence behavior and turnpike phenomena. Companion papers address the finite horizon SiDAR solution and the stage bound disturbance attenuation regulator (StDAR).
Disturbance Attenuation Regulator I-A: Signal Bound Finite Horizon Solution
This paper develops a generalized finite horizon recursive solution to the discrete time signal bound disturbance attenuation regulator (SiDAR) for state feedback control. This problem addresses linear dynamical systems subject to signal bound disturbances, i.e., disturbance sequences whose squared signal two-norm is bounded by a fixed budget. The term generalized indicates that the results accommodate arbitrary initial states. By combining game theory and dynamic programming, we derive a recursive solution for the optimal state feedback policy valid for arbitrary initial states. The optimal policy is nonlinear in the state and requires solving a tractable convex scalar optimization for the Lagrange multiplier at each stage; the control is then explicit. For fixed disturbance budget $α$, the state space partitions into two distinct regions: $\mathcal{X}_L(α)$, where the optimal control policy is linear and coincides with the standard linear $H_{\infty}$ state feedback control, and $\mathcal{X}_{NL}(α)$, where the optimal control policy is nonlinear. We establish monotonicity and boundedness of the associated Riccati recursions and characterize the geometry of the solution regions. A numerical example illustrates the theoretical properties. This work provides a complete feedback solution to the finite horizon SiDAR for arbitrary initial states. Companion papers address the steady-state problem and convergence properties for the signal bound case, and the stage bound disturbance attenuation regulator (StDAR).
Beyond Uptime: Actionable Performance Metrics for EV Charging Site Operators
The transition to electric vehicles (EVs) depends heavily on the reliability of charging infrastructure, yet approximately 1 in 5 drivers report being unable to charge during station visits due to inoperable equipment. While regulatory efforts such as the National Electric Vehicle Infrastructure (NEVI) program have established uptime requirements, these metrics are often simplistic, delayed, and fail to provide the diagnostic granularity needed by Charging Site Operators (CSOs). Despite their pivotal role in maintaining and improving site performance, CSOs have been largely overlooked by existing reporting standards. In this paper, we propose a suite of readily computable, actionable performance metrics-Fault Time, Fault-Reason Time, and Unreachable Time-that decompose charger behavior into operationally meaningful states. Unlike traditional uptime, these metrics are defined over configurable periods and distinguish between hardware malfunctions and network connectivity issues. We demonstrate the implementation of these metrics via an open-source tool that derives performance data from existing infrastructure without requiring hardware modifications. A case study involving 98 chargers at a California academic institution spanning 2018-2024 demonstrates that these metrics reveal persistent "zombie chargers" and high-frequency network instability that remain hidden in standard annual reporting.
Approximately Optimal Global Planning for Contact-Rich SE(2) Manipulation on a Graph of Reachable Sets
If we consider human manipulation, it is clear that contact-rich manipulation (CRM)-the ability to use any surface of the manipulator to make contact with objects-can be far more efficient and natural than relying solely on end-effectors (i.e., fingertips). However, state-of-the-art model-based planners for CRM are still focused on feasibility rather than optimality, limiting their ability to fully exploit CRM's advantages. We introduce a new paradigm that computes approximately optimal manipulator plans. This approach has two phases. Offline, we construct a graph of mutual reachable sets, where each set contains all object orientations reachable from a starting object orientation and grasp. Online, we plan over this graph, effectively computing and sequencing local plans for globally optimized motion. On a challenging, representative contact-rich task, our approach outperforms a leading planner, reducing task cost by 61%. It also achieves a 91% success rate across 250 queries and maintains sub-minute query times, ultimately demonstrating that globally optimized contact-rich manipulation is now practical for real-world tasks.
comment: 17 pages, 14 figures; under submission to IEEE Transactions on Robotics
Adaptive Economic Model Predictive Control: Performance Guarantees for Nonlinear Systems
We consider the problem of optimizing the economic performance of nonlinear constrained systems subject to uncertain time-varying parameters and bounded disturbances. In particular, we propose an adaptive economic model predictive control (MPC) framework that: (i) directly minimizes transient economic costs, (ii) addresses parametric uncertainty through online model adaptation, (iii) determines optimal setpoints online, and (iv) ensures robustness by using a tube-based approach. The proposed design ensures recursive feasibility, robust constraint satisfaction, and a transient performance bound. In case the disturbances have a finite energy and the parameter variations have a finite path length, the asymptotic average performance is (approximately) not worse than the performance obtained when operating at the best reachable steady-state. We highlight performance benefits in a numerical example involving a chemical reactor with unknown time-invariant and time-varying parameters.
comment: This is the accepted version of the paper in IEEE Transactions on Automatic Control, 2026
Exploiting Euclidean Distance Field Properties for Fast and Safe 3D planning with a modified Lazy Theta*
This paper presents the FS-Planner, a fast graph-search planner based on a modified Lazy Theta* algorithm that exploits the analytical properties of Euclidean Distance Fields (EDFs). We introduce a new cost function that integrates an EDF-based term proven to satisfy the triangle inequality, enabling efficient parent selection and reducing computation time while generating safe paths with smaller heading variations. We also derive an analytic approximation of the EDF integral along a segment and analyze the influence of the line-of-sight limit on the approximation error, motivating the use of a bounded visibility range. Furthermore, we propose a gradient-based neighbour-selection mechanism that decreases the number of explored nodes and improves computational performance without degrading safety or path quality. The FS-Planner produces safe paths with small heading changes without requiring the use of post-processing methods. Extensive experiments and comparisons in challenging 3D indoor simulation environments, complemented by tests in real-world outdoor environments, are used to evaluate and validate the FS-Planner. The results show consistent improvements in computation time, exploration efficiency, safety, and smoothness in a geometric sense compared with baseline heuristic planners, while maintaining sub-optimality within acceptable bounds. Finally, the proposed EDF-based cost formulation is orthogonal to the underlying search method and can be incorporated into other planning paradigms.
Random Walk Learning and the Pac-Man Attack
Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them inherently vulnerable to malicious behavior. In this work, we investigate an adversarial threat that we term the ``Pac-Man'' attack, in which a malicious node probabilistically terminates any RW that visits it. This stealthy behavior gradually eliminates active RWs from the network, effectively halting the learning process without triggering failure alarms. To counter this threat, we propose the Average Crossing (AC) algorithm--a fully decentralized mechanism for duplicating RWs to prevent RW extinction in the presence of Pac-Man. Our theoretical analysis establishes that (i) the RW population remains almost surely bounded under AC and (ii) RW-based stochastic gradient descent remains convergent under AC, even in the presence of Pac-Man, with a quantifiable deviation from the true optimum. Our extensive empirical results on both synthetic and real-world datasets corroborate our theoretical findings. Furthermore, they uncover a phase transition in the extinction probability as a function of the duplication threshold. We offer theoretical insights by analyzing a simplified variant of the AC, which sheds light on the observed phase transition.
comment: The updated manuscript represents an incomplete version of the work. A substantially updated version will be prepared before further dissemination
Mixed Bernstein-Fourier Approximants for Optimal Trajectory Generation with Periodic Behavior
Efficient trajectory generation is crucial for autonomous systems; however, current numerical methods often struggle to handle periodic behaviors effectively, particularly when the onboard sensors require equidistant temporal sampling. This paper introduces a novel mixed Bernstein-Fourier approximation framework tailored explicitly for optimal motion planning. Our proposed methodology leverages the uniform convergence properties of Bernstein polynomials for nonperiodic behaviors while effectively capturing periodic dynamics through the Fourier series. Theoretical results are established, including uniform convergence proofs for approximations of functions, derivatives, and integrals, as well as detailed error bound analyses. We further introduce a regulated least squares approach for determining approximation coefficients, enhancing numerical stability and practical applicability. Within an optimal control context, we establish the feasibility and consistency of approximated solutions to their continuous counterparts. We also extend the covector mapping theorem, providing theoretical guarantees for approximating dual variables crucial in verifying the necessary optimality conditions from Pontryagin's Maximum Principle. Numerical examples illustrate the method's superior performance, demonstrating substantial improvements in computational efficiency and precision in scenarios with complex periodic constraints and dynamics. Our mixed Bernstein-Fourier methodology thus presents a robust, theoretically grounded, and computationally efficient approach for advanced optimal trajectory planning in autonomous systems.
comment: 51 pages, 10 figures
Semi-Tensor-Product Based Convolutional Neural Networks
The semi-tensor product of vectors generalizes the conventional inner product, enabling algebraic operations between vectors of different dimensions. Building upon this foundation, we introduce a domain-based convolutional product and integrate it with the STP to formulate a padding-free convolutional operation. This new operation inherently avoids zero or other artificial padding, thereby eliminating redundant information and boundary artifacts commonly present in conventional convolutional neural networks. Based on this operation, we further develop an STP-based CNN framework that extends convolutional computation to irregular and cross-dimensional data domains. Applications to image processing and third-order signal identification demonstrate the proposed method's effectiveness in handling irregular, incomplete, and high-dimensional data without the distortions caused by padding.
Singularity-Free Guiding Vector Field over Bézier's Curves Applied to Rovers Path Planning and Path Following
This paper presents a guidance algorithm for solving the problem of following parametric paths, as well as a curvature-varying speed setpoint for land-based car-type wheeled mobile robots (WMRs). The guidance algorithm relies on Singularity-Free Guiding Vector Fields SF-GVF. This novel GVF approach expands the desired robot path and the Guiding vector field to a higher dimensional space, in which an angular control function can be found to ensure global asymptotic convergence to the desired parametric path while avoiding field singularities. In SF-GVF, paths should follow a parametric definition. This feature makes using Bezier's curves attractive to define the robot's desired patch. The curvature-varying speed setpoint, combined with the guidance algorithm, eases the convergence to the path when physical restrictions exist, such as minimal turning radius or maximal lateral acceleration. We provide theoretical results, simulations, and outdoor experiments using a WMR platform assembled with off-the-shelf components.
comment: Final version, accepted for publication. 26 pages, 15 figures
Barrier Certificates for Unknown Systems with Latent States and Polynomial Dynamics using Bayesian Inference
Certifying safety in dynamical systems is crucial, but barrier certificates - widely used to verify that system trajectories remain within a safe region - typically require explicit system models. When dynamics are unknown, data-driven methods can be used instead, yet obtaining a valid certificate requires rigorous uncertainty quantification. For this purpose, existing methods usually rely on full-state measurements, limiting their applicability. This paper proposes a novel approach for synthesizing barrier certificates for unknown systems with latent states and polynomial dynamics. A Bayesian framework is employed, where a prior in state-space representation is updated using output data via a targeted marginal Metropolis-Hastings sampler. The resulting samples are used to construct a barrier certificate through a sum-of-squares program. Probabilistic guarantees for its validity with respect to the true, unknown system are obtained by testing on an additional set of posterior samples. The approach and its probabilistic guarantees are illustrated through a numerical simulation.
comment: Accepted for publication in the Proceedings of the 64th IEEE Conference on Decision and Control
Deep Learning for Continuous-Time Stochastic Control with Jumps NeurIPS 2025
In this paper, we introduce a model-based deep-learning approach to solve finite-horizon continuous-time stochastic control problems with jumps. We iteratively train two neural networks: one to represent the optimal policy and the other to approximate the value function. Leveraging a continuous-time version of the dynamic programming principle, we derive two different training objectives based on the Hamilton-Jacobi-Bellman equation, ensuring that the networks capture the underlying stochastic dynamics. Empirical evaluations on different problems illustrate the accuracy and scalability of our approach, demonstrating its effectiveness in solving complex high-dimensional stochastic control tasks.
comment: NeurIPS 2025
Adaptive Model-Predictive Control of a Soft Continuum Robot Using a Physics-Informed Neural Network Based on Cosserat Rod Theory
Dynamic control of soft continuum robots (SCRs) holds great potential for expanding their applications, but remains a challenging problem due to the high computational demands of accurate dynamic models. While data-driven approaches like Koopman-operator-based methods have been proposed, they typically lack adaptability and cannot reconstruct the full robot shape, limiting their applicability. This work introduces a real-time-capable nonlinear model-predictive control (MPC) framework for SCRs based on a domain-decoupled physics-informed neural network (DD-PINN) with adaptable bending stiffness. The DD-PINN serves as a surrogate for the dynamic Cosserat rod model with a speed-up factor of 44000. It is also used within an unscented Kalman filter for estimating the model states and bending compliance from end-effector position measurements. We implement a nonlinear evolutionary MPC running at 70 Hz on the GPU. In simulation, it demonstrates accurate tracking of dynamic trajectories and setpoint control with end-effector position errors below 3 mm (2.3% of the actuator's length). In real-world experiments, the controller achieves similar accuracy and accelerations up to 3.55 m/s2.
comment: Submitted to IEEE Transactions on Robotics, 20 pages, 14 figures
Hardware Acceleration for Neural Networks: A Comprehensive Survey
Neural networks have become dominant computational workloads across cloud and edge platforms, but their rapid growth in model size and deployment diversity has exposed hardware bottlenecks increasingly dominated by memory movement, communication, and irregular operators rather than peak arithmetic throughput. This survey reviews the current technology landscape for hardware acceleration of deep learning, spanning GPUs and tensor-core architectures, domain-specific accelerators (TPUs, NPUs), FPGA-based designs, ASIC inference engines, and emerging LLM-serving accelerators such as LPUs, alongside in-/near-memory computing and neuromorphic/analog approaches. We organize the survey using a unified taxonomy across (i) workloads (CNNs, RNNs, GNNs, Transformers/LLMs), (ii) execution settings (training vs.\ inference; datacenter vs.\ edge), and (iii) optimization levers (reduced precision, sparsity and pruning, operator fusion, compilation and scheduling, memory-system/interconnect design). We synthesize key architectural ideas such as systolic arrays, vector and SIMD engines, specialized attention and softmax kernels, quantization-aware datapaths, and high-bandwidth memory, and discuss how software stacks and compilers bridge model semantics to hardware. Finally, we highlight open challenges -- including efficient long-context LLM inference (KV-cache management), robust support for dynamic and sparse workloads, energy- and security-aware deployment, and fair benchmarking -- pointing to promising directions for the next generation of neural acceleration.
From Ground to Sky: Architectures, Applications, and Challenges Shaping Low-Altitude Wireless Networks
In this article, we introduce a novel low-altitude wireless network (LAWN), which is a reconfigurable, three-dimensional (3D) layered architecture. In particular, the LAWN integrates connectivity, sensing, control, and computing across aerial and terrestrial nodes that enable seamless operation in complex, dynamic, and mission-critical environments. Different from the conventional aerial communication systems, LAWN's distinctive feature is its tight integration of functional planes in which multiple functionalities continually reshape themselves to operate safely and efficiently in the low-altitude sky. With the LAWN, we discuss several enabling technologies, such as integrated sensing and communication (ISAC), semantic communication, and fully-actuated control systems. Finally, we identify potential applications and key cross-layer challenges. This article offers a comprehensive roadmap for future research and development in the low-altitude airspace.
comment: 12 pages
Simultaneous and Proportional Finger Motion Decoding Using Spatial Features from High-Density Surface Electromyography
Restoring natural and intuitive hand function requires simultaneous and proportional control (SPC) of multiple degrees of freedom (DoFs). This study systematically evaluated the multichannel linear descriptors-based block field method (MLD-BFM) for continuous decoding of five finger-joint DoFs by leveraging the rich spatial information of high-density surface electromyography (HD sEMG). Twenty-one healthy participants performed dynamic sinusoidal finger movements while HD sEMG signals were recorded from the extensor digitorum communis (EDC) and flexor digitorum superficialis (FDS) muscles. MLD-BFM extracted region-specific spatial features, including effective field strength ($Σ$), field-strength variation rate ($Φ$), and spatial complexity ($Ω$). Model performance was optimized (block size: $2 \times 2$; window: 0.15 s) and compared with conventional time-domain features and dimensionality reduction approaches when applied to multi-output regression models. MLD-BFM consistently achieved the highest $\mathrm{R}^2_{\mathrm{vw}}$ values across all models. The multilayer perceptron (MLP) combined with MLD-BFM yielded the best performance ($\mathrm{R}^2_{\mathrm{vw}} = 86.68\% \pm 0.33$). Time-domain features also showed strong predictive capability and were statistically comparable to MLD-BFM in some models, whereas dimensionality reduction techniques exhibited lower accuracy. Decoding accuracy was higher for the middle and ring fingers than for the thumb. Overall, MLD-BFM improved continuous finger movement decoding accuracy, underscoring the importance of taking advantage of the spatial richness of HD sEMG. These findings suggest that spatially structured features enhance SPC and provide practical guidance for designing robust, real-time, and responsive myoelectric interfaces.
comment: 39 pages, 13 figures, 2 tables
Zeroing neural dynamics solving time-variant complex conjugate matrix equation $X(τ)F(τ)-A(τ)\overline{X}(τ)=C(τ)$
Complex conjugate matrix equations (CCME) are important in computation and antilinear systems. Existing research mainly focuses on the time-invariant version, while studies on the time-variant version and its solution using artificial neural networks are still lacking. This paper introduces zeroing neural dynamics (ZND) to solve the earliest time-variant CCME. Firstly, the vectorization and Kronecker product in the complex field are defined uniformly. Secondly, Con-CZND1 and Con-CZND2 models are proposed, and their convergence and effectiveness are theoretically proved. Thirdly, numerical experiments confirm their effectiveness and highlight their differences. The results show the advantages of ZND in the complex field compared with that in the real field, and further refine the related theory.
Robotics
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
Vision-Language-Action (VLA) tasks require reasoning over complex visual scenes and executing adaptive actions in dynamic environments. While recent studies on reasoning VLAs show that explicit chain-of-thought (CoT) can improve generalization, they suffer from high inference latency due to lengthy reasoning traces. We propose Fast-ThinkAct, an efficient reasoning framework that achieves compact yet performant planning through verbalizable latent reasoning. Fast-ThinkAct learns to reason efficiently with latent CoTs by distilling from a teacher, driven by a preference-guided objective to align manipulation trajectories that transfers both linguistic and visual planning capabilities for embodied control. This enables reasoning-enhanced policy learning that effectively connects compact reasoning to action execution. Extensive experiments across diverse embodied manipulation and reasoning benchmarks demonstrate that Fast-ThinkAct achieves strong performance with up to 89.3\% reduced inference latency over state-of-the-art reasoning VLAs, while maintaining effective long-horizon planning, few-shot adaptation, and failure recovery.
comment: Project page: https://jasper0314-huang.github.io/fast-thinkact/
Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets
Vision-based policies for robot manipulation have achieved significant recent success, but are still brittle to distribution shifts such as camera viewpoint variations. Robot demonstration data is scarce and often lacks appropriate variation in camera viewpoints. Simulation offers a way to collect robot demonstrations at scale with comprehensive coverage of different viewpoints, but presents a visual sim2real challenge. To bridge this gap, we propose MANGO -- an unpaired image translation method with a novel segmentation-conditioned InfoNCE loss, a highly-regularized discriminator design, and a modified PatchNCE loss. We find that these elements are crucial for maintaining viewpoint consistency during sim2real translation. When training MANGO, we only require a small amount of fixed-camera data from the real world, but show that our method can generate diverse unseen viewpoints by translating simulated observations. In this domain, MANGO outperforms all other image translation methods we tested. Imitation-learning policies trained on data augmented by MANGO are able to achieve success rates as high as 60\% on views that the non-augmented policy fails completely on.
Multimodal Signal Processing For Thermo-Visible-Lidar Fusion In Real-time 3D Semantic Mapping
In complex environments, autonomous robot navigation and environmental perception pose higher requirements for SLAM technology. This paper presents a novel method for semantically enhancing 3D point cloud maps with thermal information. By first performing pixel-level fusion of visible and infrared images, the system projects real-time LiDAR point clouds onto this fused image stream. It then segments heat source features in the thermal channel to instantly identify high temperature targets and applies this temperature information as a semantic layer on the final 3D map. This approach generates maps that not only have accurate geometry but also possess a critical semantic understanding of the environment, making it highly valuable for specific applications like rapid disaster assessment and industrial preventive maintenance.
comment: 5 pages,7 figures. Under review
Learning Whole-Body Human-Humanoid Interaction from Human-Human Demonstrations
Enabling humanoid robots to physically interact with humans is a critical frontier, but progress is hindered by the scarcity of high-quality Human-Humanoid Interaction (HHoI) data. While leveraging abundant Human-Human Interaction (HHI) data presents a scalable alternative, we first demonstrate that standard retargeting fails by breaking the essential contacts. We address this with PAIR (Physics-Aware Interaction Retargeting), a contact-centric, two-stage pipeline that preserves contact semantics across morphology differences to generate physically consistent HHoI data. This high-quality data, however, exposes a second failure: conventional imitation learning policies merely mimic trajectories and lack interactive understanding. We therefore introduce D-STAR (Decoupled Spatio-Temporal Action Reasoner), a hierarchical policy that disentangles when to act from where to act. In D-STAR, Phase Attention (when) and a Multi-Scale Spatial module (where) are fused by the diffusion head to produce synchronized whole-body behaviors beyond mimicry. By decoupling these reasoning streams, our model learns robust temporal phases without being distracted by spatial noise, leading to responsive, synchronized collaboration. We validate our framework through extensive and rigorous simulations, demonstrating significant performance gains over baseline approaches and a complete, effective pipeline for learning complex whole-body interactions from HHI data.
CLARE: Continual Learning for Vision-Language-Action Models via Autonomous Adapter Routing and Expansion
To teach robots complex manipulation tasks, it is now a common practice to fine-tune a pre-trained vision-language-action model (VLA) on task-specific data. However, since this recipe updates existing representations, it is unsuitable for long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics commonly require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers for deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and autonomously expands the model only where necessary when learning a new task, guided by layer-wise feature similarity. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. Through extensive experiments on the LIBERO benchmark, we show that CLARE achieves high performance on new tasks without catastrophic forgetting of earlier tasks, significantly outperforming even exemplar-based methods. Code and data are available at https://tum-lsy.github.io/clare.
comment: Project page: https://tum-lsy.github.io/clare. 9 pages, 5 figures
Data Scaling for Navigation in Unknown Environments
Generalization of imitation-learned navigation policies to environments unseen in training remains a major challenge. We address this by conducting the first large-scale study of how data quantity and data diversity affect real-world generalization in end-to-end, map-free visual navigation. Using a curated 4,565-hour crowd-sourced dataset collected across 161 locations in 35 countries, we train policies for point goal navigation and evaluate their closed-loop control performance on sidewalk robots operating in four countries, covering 125 km of autonomous driving. Our results show that large-scale training data enables zero-shot navigation in unknown environments, approaching the performance of policies trained with environment-specific demonstrations. Critically, we find that data diversity is far more important than data quantity. Doubling the number of geographical locations in a training set decreases navigation errors by ~15%, while performance benefit from adding data from existing locations saturates with very little data. We also observe that, with noisy crowd-sourced data, simple regression-based models outperform generative and sequence-based architectures. We release our policies, evaluation setup and example videos on the project page.
ReflexDiffusion: Reflection-Enhanced Trajectory Planning for High-lateral-acceleration Scenarios in Autonomous Driving AAAI 2026
Generating safe and reliable trajectories for autonomous vehicles in long-tail scenarios remains a significant challenge, particularly for high-lateral-acceleration maneuvers such as sharp turns, which represent critical safety situations. Existing trajectory planners exhibit systematic failures in these scenarios due to data imbalance. This results in insufficient modelling of vehicle dynamics, road geometry, and environmental constraints in high-risk situations, leading to suboptimal or unsafe trajectory prediction when vehicles operate near their physical limits. In this paper, we introduce ReflexDiffusion, a novel inference-stage framework that enhances diffusion-based trajectory planners through reflective adjustment. Our method introduces a gradient-based adjustment mechanism during the iterative denoising process: after each standard trajectory update, we compute the gradient between the conditional and unconditional noise predictions to explicitly amplify critical conditioning signals, including road curvature and lateral vehicle dynamics. This amplification enforces strict adherence to physical constraints, particularly improving stability during high-lateral-acceleration maneuvers where precise vehicle-road interaction is paramount. Evaluated on the nuPlan Test14-hard benchmark, ReflexDiffusion achieves a 14.1% improvement in driving score for high-lateral-acceleration scenarios over the state-of-the-art (SOTA) methods. This demonstrates that inference-time trajectory optimization can effectively compensate for training data sparsity by dynamically reinforcing safety-critical constraints near handling limits. The framework's architecture-agnostic design enables direct deployment to existing diffusion-based planners, offering a practical solution for improving autonomous vehicle safety in challenging driving conditions.
comment: Accepted by AAAI 2026
Feedback-Based Mobile Robot Navigation in 3-D Environments Using Artificial Potential Functions Technical Report
This technical report presents the construction and analysis of polynomial navigation functions for motion planning in 3-D workspaces populated by spherical and cylindrical obstacles. The workspace is modeled as a bounded spherical region, and obstacles are encoded using smooth polynomial implicit functions. We establish conditions under which the proposed navigation functions admit a unique non-degenerate minimum at the target while avoiding local minima, including in the presence of pairwise intersecting obstacles. Gradient and Hessian analyses are provided, and the theoretical results are validated through numerical simulations in obstacle rich 3-D environments.
Online Trajectory Optimization for Arbitrary-Shaped Mobile Robots via Polynomial Separating Hypersurfaces
An emerging class of trajectory optimization methods enforces collision avoidance by jointly optimizing the robot's configuration and a separating hyperplane. However, as linear separators only apply to convex sets, these methods require convex approximations of both the robot and obstacles, which becomes an overly conservative assumption in cluttered and narrow environments. In this work, we unequivocally remove this limitation by introducing nonlinear separating hypersurfaces parameterized by polynomial functions. We first generalize the classical separating hyperplane theorem and prove that any two disjoint bounded closed sets in Euclidean space can be separated by a polynomial hypersurface, serving as the theoretical foundation for nonlinear separation of arbitrary geometries. Building on this result, we formulate a nonlinear programming (NLP) problem that jointly optimizes the robot's trajectory and the coefficients of the separating polynomials, enabling geometry-aware collision avoidance without conservative convex simplifications. The optimization remains efficiently solvable using standard NLP solvers. Simulation and real-world experiments with nonconvex robots demonstrate that our method achieves smooth, collision-free, and agile maneuvers in environments where convex-approximation baselines fail.
Vision-Conditioned Variational Bayesian Last Layer Dynamics Models
Agile control of robotic systems often requires anticipating how the environment affects system behavior. For example, a driver must perceive the road ahead to anticipate available friction and plan actions accordingly. Achieving such proactive adaptation within autonomous frameworks remains a challenge, particularly under rapidly changing conditions. Traditional modeling approaches often struggle to capture abrupt variations in system behavior, while adaptive methods are inherently reactive and may adapt too late to ensure safety. We propose a vision-conditioned variational Bayesian last-layer dynamics model that leverages visual context to anticipate changes in the environment. The model first learns nominal vehicle dynamics and is then fine-tuned with feature-wise affine transformations of latent features, enabling context-aware dynamics prediction. The resulting model is integrated into an optimal controller for vehicle racing. We validate our method on a Lexus LC500 racing through water puddles. With vision-conditioning, the system completed all 12 attempted laps under varying conditions. In contrast, all baselines without visual context consistently lost control, demonstrating the importance of proactive dynamics adaptation in high-performance applications.
comment: 9 pages, 7 figures, currently under review
CEI: A Unified Interface for Cross-Embodiment Visuomotor Policy Learning in 3D Space
Robotic foundation models trained on large-scale manipulation datasets have shown promise in learning generalist policies, but they often overfit to specific viewpoints, robot arms, and especially parallel-jaw grippers due to dataset biases. To address this limitation, we propose Cross-Embodiment Interface (\CEI), a framework for cross-embodiment learning that enables the transfer of demonstrations across different robot arm and end-effector morphologies. \CEI introduces the concept of \textit{functional similarity}, which is quantified using Directional Chamfer Distance. Then it aligns robot trajectories through gradient-based optimization, followed by synthesizing observations and actions for unseen robot arms and end-effectors. In experiments, \CEI transfers data and policies from a Franka Panda robot to \textbf{16} different embodiments across \textbf{3} tasks in simulation, and supports bidirectional transfer between a UR5+AG95 gripper robot and a UR5+Xhand robot across \textbf{6} real-world tasks, achieving an average transfer ratio of 82.4\%. Finally, we demonstrate that \CEI can also be extended with spatial generalization and multimodal motion generation capabilities using our proposed techniques. Project website: https://cross-embodiment-interface.github.io/
Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams
Accurate localisation in planetary robotics enables the advanced autonomy required to support the increased scale and scope of future missions. The successes of the Ingenuity helicopter and multiple planetary orbiters lay the groundwork for future missions that use ground-aerial robotic teams. In this paper, we consider rovers using machine learning to localise themselves in a local aerial map using limited field-of-view monocular ground-view RGB images as input. A key consideration for machine learning methods is that real space data with ground-truth position labels suitable for training is scarce. In this work, we propose a novel method of localising rovers in an aerial map using cross-view-localising dual-encoder deep neural networks. We leverage semantic segmentation with vision foundation models and high volume synthetic data to bridge the domain gap to real images. We also contribute a new cross-view dataset of real-world rover trajectories with corresponding ground-truth localisation data captured in a planetary analogue facility, plus a high volume dataset of analogous synthetic image pairs. Using particle filters for state estimation with the cross-view networks allows accurate position estimation over simple and complex trajectories based on sequences of ground-view images.
comment: 7 pages, 10 figures. Presented at the International Conference on Space Robotics (iSpaRo) 2025 in Sendai, Japan. Dataset available: https://doi.org/10.5281/zenodo.17364038
Design Methodology of Hydraulically-driven Soft Robotic Gripper for a Large and Heavy Object
This paper presents a design methodology of a hydraulically-driven soft robotic gripper for grasping a large and heavy object -- approximately 10 - 20 kg with 20 - 30 cm diameter. Most existing soft grippers are pneumatically actuated with several hundred kPa pressure, and cannot generate output force sufficient for such a large and heavy object. Instead of pneumatic actuation, hydraulic actuation has a potential to generate much larger power by several MPa pressure. In this study, we develop a hydraulically-driven soft gripper, in which its basic design parameters are determined based on a mathematical model that represents the relationship among the driving pressure, bending angle, object mass and grasping force. Moreover, we selected materials suitable for grasping a heavier object, based on the finite element analysis result of the detailed design. We report experimental results on a 20 kg object grasping and closed-loop control of the finger bending angle.
SyncTwin: Fast Digital Twin Construction and Synchronization for Safe Robotic Grasping
Accurate and safe grasping under dynamic and visually occluded conditions remains a core challenge in real-world robotic manipulation. We present SyncTwin, a digital twin framework that unifies fast 3D scene reconstruction and real-to-sim synchronization for robust and safety-aware grasping in such environments. In the offline stage, we employ VGGT to rapidly reconstruct object-level 3D assets from RGB images, forming a reusable geometry library for simulation. During execution, SyncTwin continuously synchronizes the digital twin by tracking real-world object states via point cloud segmentation updates and aligning them through colored-ICP registration. The updated twin enables motion planners to compute collision-free and dynamically feasible trajectories in simulation, which are safely executed on the real robot through a closed real-to-sim-to-real loop. Experiments in dynamic and occluded scenes show that SyncTwin improves grasp accuracy and motion safety, demonstrating the effectiveness of digital-twin synchronization for real-world robotic execution.
How Human Motion Prediction Quality Shapes Social Robot Navigation Performance in Constrained Spaces
Motivated by the vision of integrating mobile robots closer to humans in warehouses, hospitals, manufacturing plants, and the home, we focus on robot navigation in dynamic and spatially constrained environments. Ensuring human safety, comfort, and efficiency in such settings requires that robots are endowed with a model of how humans move around them. Human motion prediction around robots is especially challenging due to the stochasticity of human behavior, differences in user preferences, and data scarcity. In this work, we perform a methodical investigation of the effects of human motion prediction quality on robot navigation performance, as well as human productivity and impressions. We design a scenario involving robot navigation among two human subjects in a constrained workspace and instantiate it in a user study ($N=80$) involving two different robot platforms, conducted across two sites from different world regions. Key findings include evidence that: 1) the widely adopted average displacement error is not a reliable predictor of robot navigation performance and human impressions; 2) the common assumption of human cooperation breaks down in constrained environments, with users often not reciprocating robot cooperation, and causing performance degradations; 3) more efficient robot navigation often comes at the expense of human efficiency and comfort.
Interprofessional and Agile Development of Mobirobot: A Socially Assistive Robot for Pediatric Therapy Across Clinical and Therapeutic Settings
Introduction: Socially assistive robots hold promise for enhancing therapeutic engagement in paediatric clinical settings. However, their successful implementation requires not only technical robustness but also context-sensitive, co-designed solutions. This paper presents Mobirobot, a socially assistive robot developed to support mobilisation in children recovering from trauma, fractures, or depressive disorders through personalised exercise programmes. Methods: An agile, human-centred development approach guided the iterative design of Mobirobot. Multidisciplinary clinical teams and end users were involved throughout the co-development process, which focused on early integration into real-world paediatric surgical and psychiatric settings. The robot, based on the NAO platform, features a simple setup, adaptable exercise routines with interactive guidance, motivational dialogue, and a graphical user interface (GUI) for monitoring and no-code system feedback. Results: Deployment in hospital environments enabled the identification of key design requirements and usability constraints. Stakeholder feedback led to refinements in interaction design, movement capabilities, and technical configuration. A feasibility study is currently underway to assess acceptance, usability, and perceived therapeutic benefit, with data collection including questionnaires, behavioural observations, and staff-patient interviews. Discussion: Mobirobot demonstrates how multiprofessional, stakeholder-led development can yield a socially assistive system suited for dynamic inpatient settings. Early-stage findings underscore the importance of contextual integration, robustness, and minimal-intrusion design. While challenges such as sensor limitations and patient recruitment remain, the platform offers a promising foundation for further research and clinical application.
comment: submitted to Frontiers in Digital Health
LCF3D: A Robust and Real-Time Late-Cascade Fusion Framework for 3D Object Detection in Autonomous Driving
Accurately localizing 3D objects like pedestrians, cyclists, and other vehicles is essential in Autonomous Driving. To ensure high detection performance, Autonomous Vehicles complement RGB cameras with LiDAR sensors, but effectively combining these data sources for 3D object detection remains challenging. We propose LCF3D, a novel sensor fusion framework that combines a 2D object detector on RGB images with a 3D object detector on LiDAR point clouds. By leveraging multimodal fusion principles, we compensate for inaccuracies in the LiDAR object detection network. Our solution combines two key principles: (i) late fusion, to reduce LiDAR False Positives by matching LiDAR 3D detections with RGB 2D detections and filtering out unmatched LiDAR detections; and (ii) cascade fusion, to recover missed objects from LiDAR by generating new 3D frustum proposals corresponding to unmatched RGB detections. Experiments show that LCF3D is beneficial for domain generalization, as it turns out to be successful in handling different sensor configurations between training and testing domains. LCF3D achieves significant improvements over LiDAR-based methods, particularly for challenging categories like pedestrians and cyclists in the KITTI dataset, as well as motorcycles and bicycles in nuScenes. Code can be downloaded from: https://github.com/CarloSgaravatti/LCF3D.
comment: 35 pages, 14 figures. Published at Pattern Recognition
Causality-enhanced Decision-Making for Autonomous Mobile Robots in Dynamic Environments
The growing integration of robots in shared environments - such as warehouses, shopping centres, and hospitals - demands a deep understanding of the underlying dynamics and human behaviours, including how, when, and where individuals engage in various activities and interactions. This knowledge goes beyond simple correlation studies and requires a more comprehensive causal analysis. By leveraging causal inference to model cause-and-effect relationships, we can better anticipate critical environmental factors and enable autonomous robots to plan and execute tasks more effectively. To this end, we propose a novel causality-based decision-making framework that reasons over a learned causal model to assist the robot in deciding when and how to complete a given task. In the examined use case - i.e., a warehouse shared with people - we exploit the causal model to estimate battery usage and human obstructions as factors influencing the robot's task execution. This reasoning framework supports the robot in making informed decisions about task timing and strategy. To achieve this, we developed also PeopleFlow, a new Gazebo-based simulator designed to model context-sensitive human-robot spatial interactions in shared workspaces. PeopleFlow features realistic human and robot trajectories influenced by contextual factors such as time, environment layout, and robot state, and can simulate a large number of agents. While the simulator is general-purpose, in this paper we focus on a warehouse-like environment as a case study, where we conduct an extensive evaluation benchmarking our causal approach against a non-causal baseline. Our findings demonstrate the efficacy of the proposed solutions, highlighting how causal reasoning enables autonomous robots to operate more efficiently and safely in dynamic environments shared with humans.
comment: Causal Discovery and Inference - Robot Autonomy - Human-Robot Spatial Interaction - Decision-Making
SPARK: Safe Protective and Assistive Robot Kit
This paper introduces the Safe Protective and Assistive Robot Kit (SPARK), a comprehensive benchmark designed to ensure safety in humanoid autonomy and teleoperation. Humanoid robots pose significant safety risks due to their physical capabilities of interacting with complex environments. The physical structures of humanoid robots further add complexity to the design of general safety solutions. To facilitate safe deployment of complex robot systems, SPARK can be used as a toolbox that comes with state-of-the-art safe control algorithms in a modular and composable robot control framework. Users can easily configure safety criteria and sensitivity levels to optimize the balance between safety and performance. To accelerate humanoid safety research and development, SPARK provides simulation benchmarks that compare safety approaches in a variety of environments, tasks, and robot models. Furthermore, SPARK allows quick deployment of synthesized safe controllers on real robots. For hardware deployment, SPARK supports Apple Vision Pro (AVP) or a Motion Capture System as external sensors, while offering interfaces for seamless integration with alternative hardware setups at the same time. This paper demonstrates SPARK's capability with both simulation experiments and case studies with a Unitree G1 humanoid robot. Leveraging these advantages of SPARK, users and researchers can significantly improve the safety of their humanoid systems as well as accelerate relevant research. The open source code is available at: https://github.com/intelligent-control-lab/spark.
comment: Presented at IFAC Symposium on Robotics
Environment as Policy: Learning to Race in Unseen Tracks ICRA
Reinforcement learning (RL) has achieved outstanding success in complex robot control tasks, such as drone racing, where the RL agents have outperformed human champions in a known racing track. However, these agents fail in unseen track configurations, always requiring complete retraining when presented with new track layouts. This work aims to develop RL agents that generalize effectively to novel track configurations without retraining. The naive solution of training directly on a diverse set of track layouts can overburden the agent, resulting in suboptimal policy learning as the increased complexity of the environment impairs the agent's ability to learn to fly. To enhance the generalizability of the RL agent, we propose an adaptive environment-shaping framework that dynamically adjusts the training environment based on the agent's performance. We achieve this by leveraging a secondary RL policy to design environments that strike a balance between being challenging and achievable, allowing the agent to adapt and improve progressively. Using our adaptive environment shaping, one single racing policy efficiently learns to race in diverse challenging tracks. Experimental results validated in both simulation and the real world show that our method enables drones to successfully fly complex and unseen race tracks, outperforming existing environment-shaping techniques. Project page: http://rpg.ifi.uzh.ch/env_as_policy.
comment: Accepted at IEEE International Conference on Robotics and Automation (ICRA), 2025
Periodic robust robotic rock chop via virtual model control
Robotic cutting is a challenging contact-rich manipulation task where the robot must simultaneously negotiate unknown object mechanics, large contact forces, and precise motion requirements. We introduce a new active virtual-model control scheme that enables knife rocking motion for robot manipulators, without pre-planned trajectories or precise information of the environment. Motion is generated and controlled through switching virtual coupling with virtual mechanisms, given by virtual springs, dampers, and masses arranged in a suitable way. Through analysis and experiments, we demonstrate that the controlled robot behavior settles into a periodic motion. Experiments with a Franka manipulator demonstrate robust cuts with five different vegetables, and sub-millimeter slice accuracy from 1 mm to 6 mm at nearly one cut per second. The same controller survives changes in knife shape and cutting board height, and adaptation to a different humanoid manipulator, demonstrating robustness and platform independence.
Shape-Space Graphs: Fast and Collision-Free Path Planning for Soft Robots
Soft robots, inspired by elephant trunks or octopus arms, offer extraordinary flexibility to bend, twist, and elongate in ways that rigid robots cannot. However, their motion planning remains a challenge, especially in cluttered environments with obstacles, due to their highly nonlinear and infinite-dimensional kinematics. Here, we present a graph-based path planning tool for an elephant-trunk-inspired soft robot designed with three artificial muscle fibers that allow for continuous deformation through contraction. Using a biomechanical model that integrates morphoelastic and active filament theories, we precompute a shape library and construct a k-nearest neighbor graph in \emph{shape space}, ensuring that each node corresponds to a valid robot shape. For the graph, we use signed distance functions to prune nodes and edges colliding with obstacles, and define multi-objective edge costs based on geometric distance and actuation effort, enabling energy-aware planning with collision avoidance. We demonstrate that our algorithm reliably avoids obstacles and generates feasible paths within milliseconds from precomputed graphs using Dijkstra's algorithm. We show that including energy costs can drastically reduce the actuation effort compared to geometry-only planning, at the expense of longer tip trajectories. Our results highlight the potential of shape-space graph search for fast and reliable path planning in the field of soft robotics, paving the way for real-time applications in surgical, industrial, and assistive settings.
comment: revised version
UniConFlow: A Unified Constrained Flow-Matching Framework for Certified Motion Planning
Generative models have become increasingly powerful tools for robot motion generation, enabling flexible and multimodal trajectory generation across various tasks. Yet, most existing approaches remain limited in handling multiple types of constraints, such as collision avoidance, actuation limits, and dynamic consistency, which are typically addressed individually or heuristically. In this work, we propose UniConFlow, a unified constrained flow matching-based framework for trajectory generation that systematically incorporates both equality and inequality constraints. Moreover, UniConFlow introduces a novel prescribed-time zeroing function that shapes a time-varying guidance field during inference, allowing the generation process to adapt to varying system models and task requirements. Furthermore, to further address the computational challenges of long-horizon and high-dimensional trajectory generation, we propose two practical strategies for the terminal constraint enforcement and inference process: a violation-segment extraction protocol that precisely localizes and refines only the constraint-violating portions of trajectories, and a trajectory compression method that accelerates optimization in a reduced-dimensional space while preserving high-fidelity reconstruction after decoding. Empirical validation across three experiments, including a double inverted pendulum, a real-to-sim car racing task, and a sim-to-real manipulation task, demonstrates that UniConFlow outperforms state-of-the-art generative planners and conventional optimization baselines, achieving superior performance on certified motion planning metrics such as safety, kinodynamic consistency, and action feasibility. Project page is available at: https://uniconflow.github.io.
Learning on the Fly: Rapid Policy Adaptation via Differentiable Simulation
Learning control policies in simulation enables rapid, safe, and cost-effective development of advanced robotic capabilities. However, transferring these policies to the real world remains difficult due to the sim-to-real gap, where unmodeled dynamics and environmental disturbances can degrade policy performance. Existing approaches, such as domain randomization and Real2Sim2Real pipelines, can improve policy robustness, but either struggle under out-of-distribution conditions or require costly offline retraining. In this work, we approach these problems from a different perspective. Instead of relying on diverse training conditions before deployment, we focus on rapidly adapting the learned policy in the real world in an online fashion. To achieve this, we propose a novel online adaptive learning framework that unifies residual dynamics learning with real-time policy adaptation inside a differentiable simulation. Starting from a simple dynamics model, our framework refines the model continuously with real-world data to capture unmodeled effects and disturbances such as payload changes and wind. The refined dynamics model is embedded in a differentiable simulation framework, enabling gradient backpropagation through the dynamics and thus rapid, sample-efficient policy updates beyond the reach of classical RL methods like PPO. All components of our system are designed for rapid adaptation, enabling the policy to adjust to unseen disturbances within 5 seconds of training. We validate the approach on agile quadrotor control under various disturbances in both simulation and the real world. Our framework reduces hovering error by up to 81% compared to L1-MPC and 55% compared to DATT, while also demonstrating robustness in vision-based control without explicit state estimation.
JuggleRL: Mastering Ball Juggling with a Quadrotor via Deep Reinforcement Learning
Aerial robots interacting with objects must perform precise, contact-rich maneuvers under uncertainty. In this paper, we study the problem of aerial ball juggling using a quadrotor equipped with a racket, a task that demands accurate timing, stable control, and continuous adaptation. We propose JuggleRL, the first reinforcement learning-based system for aerial juggling. It learns closed-loop policies in large-scale simulation using systematic calibration of quadrotor and ball dynamics to reduce the sim-to-real gap. The training incorporates reward shaping to encourage racket-centered hits and sustained juggling, as well as domain randomization over ball position and coefficient of restitution to enhance robustness and transferability. The learned policy outputs mid-level commands executed by a low-level controller and is deployed zero-shot on real hardware, where an enhanced perception module with a lightweight communication protocol reduces delays in high-frequency state estimation and ensures real-time control. Experiments show that JuggleRL achieves an average of $311$ hits over $10$ consecutive trials in the real world, with a maximum of $462$ hits observed, far exceeding a model-based baseline that reaches at most $14$ hits with an average of $3.1$. Moreover, the policy generalizes to unseen conditions, successfully juggling a lighter $5$ g ball with an average of $145.9$ hits. This work demonstrates that reinforcement learning can empower aerial robots with robust and stable control in dynamic interaction tasks.
SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
Training expressive flow-based policies with off-policy reinforcement learning is notoriously unstable due to gradient pathologies in the multi-step action sampling process. We trace this instability to a fundamental connection: the flow rollout is algebraically equivalent to a residual recurrent computation, making it susceptible to the same vanishing and exploding gradients as RNNs. To address this, we reparameterize the velocity network using principles from modern sequential models, introducing two stable architectures: Flow-G, which incorporates a gated velocity, and Flow-T, which utilizes a decoded velocity. We then develop a practical SAC-based algorithm, enabled by a noise-augmented rollout, that facilitates direct end-to-end training of these policies. Our approach supports both from-scratch and offline-to-online learning and achieves state-of-the-art performance on continuous control and robotic manipulation benchmarks, eliminating the need for common workarounds like policy distillation or surrogate objectives.
Where Did I Leave My Glasses? Open-Vocabulary Semantic Exploration in Real-World Semi-Static Environments
Robots deployed in real-world environments, such as homes, must not only navigate safely but also understand their surroundings and adapt to changes in the environment. To perform tasks efficiently, they must build and maintain a semantic map that accurately reflects the current state of the environment. Existing research on semantic exploration largely focuses on static scenes without persistent object-level instance tracking. In this work, we propose an open-vocabulary, semantic exploration system for semi-static environments. Our system maintains a consistent map by building a probabilistic model of object instance stationarity, systematically tracking semi-static changes, and actively exploring areas that have not been visited for an extended period. In addition to active map maintenance, our approach leverages the map's semantic richness with large language model (LLM)-based reasoning for open-vocabulary object-goal navigation. This enables the robot to search more efficiently by prioritizing contextually relevant areas. We compare our approach against state-of-the-art baselines using publicly available object navigation and mapping datasets, and we further demonstrate real-world transferability in three real-world environments. Our approach outperforms the compared baselines in both success rate and search efficiency for object-navigation tasks and can more reliably handle changes in mapping semi-static environments. In real-world experiments, our system detects 95% of map changes on average, improving efficiency by more than 29% as compared to random and patrol strategies.
IKDiffuser: a Diffusion-based Generative Inverse Kinematics Solver for Kinematic Trees
Solving Inverse Kinematics (IK) for arbitrary kinematic trees presents significant challenges due to their high-dimensionality, redundancy, and complex inter-branch constraints. Conventional optimization-based solvers can be sensitive to initialization and suffer from local minima or conflicting gradients. At the same time, existing learning-based approaches are often tied to a predefined number of end-effectors and a fixed training objective, limiting their reusability across various robot morphologies and task requirements. To address these limitations, we introduce IKDiffuser, a scalable IK solver built upon conditional diffusion-based generative models, which learns the distribution of the configuration space conditioned on end-effector poses. We propose a structure-agnostic formulation that represents end-effector poses as a sequence of tokens, leading to a unified framework that handles varying numbers of end-effectors while learning the implicit kinematic structures entirely from data. Beyond standard IK generation, IKDiffuser handles partially specified goals via a masked marginalization mechanism that conditions only on a subset of end-effector constraints. Furthermore, it supports adding task objectives at inference through objective-guided sampling, enabling capabilities such as warm-start initialization and manipulability maximization without retraining. Extensive evaluations across seven diverse robotic platforms demonstrate that IKDiffuser significantly outperforms state-of-the-art baselines in accuracy, solution diversity, and collision avoidance. Moreover, when used to initialize optimization-based solvers, IKDiffuser significantly boosts success rates on challenging redundant systems with high Degrees of Freedom (DoF), such as the 29-DoF Unitree G1 humanoid, from 21.01% to 96.96% while reducing computation time to the millisecond range.
comment: under review
Tackling the Kidnapped Robot Problem via Sparse Feasible Hypothesis Sampling and Reliable Batched Multi-Stage Inference
This paper addresses the Kidnapped Robot Problem (KRP), a core localization challenge of relocalizing a robot in a known map without prior pose estimate when localization loss or at SLAM initialization. For this purpose, a passive 2-D global relocalization framework is proposed. It estimates the global pose efficiently and reliably from a single LiDAR scan and an occupancy grid map while the robot remains stationary, thereby enhancing the long-term autonomy of mobile robots. The proposed framework casts global relocalization as a non-convex problem and solves it via the multi-hypothesis scheme with batched multi-stage inference and early termination, balancing completeness and efficiency. The Rapidly-exploring Random Tree (RRT), under traversability constraints, asymptotically covers the reachable space to generate sparse, uniformly distributed feasible positional hypotheses, fundamentally reducing the sampling space. The hypotheses are preliminarily ordered by the proposed Scan Mean Absolute Difference (SMAD), a coarse beam-error level metric that facilitates the early termination by prioritizing high-likelihood candidates. The SMAD computation is optimized for limited scan measurements. The Translation-Affinity Scan-to-Map Alignment Metric (TAM) is proposed for reliable orientation selection at hypothesized positions and accurate final global pose evaluation to mitigate degradation in conventional likelihood-field metrics under translational uncertainty induced by sparse hypotheses, as well as non-panoramic LiDAR scan and environmental changes. Real-world experiments on a resource-constrained mobile robot with non-panoramic LiDAR scans show that the proposed framework achieves competitive performance in success rate, robustness under measurement uncertainty, and computational efficiency.
comment: 10 pages, 8 figures. This work has been submitted to the IEEE for possible publication
Autonomous Robotic Bone Micro-Milling System with Automatic Calibration and 3D Surface Fitting
Automating bone micro-milling using a robotic system presents challenges due to the uncertainties in both the external and internal features of bone tissue. For example, during mouse cranial window creation, a circular path with a radius of 2 to 4 mm needs to be milled on the mouse skull using a microdrill. The uneven surface and non-uniform thickness of the mouse skull make it difficult to fully automate this process, requiring the system to possess advanced perceptual and adaptive capabilities. In this study, we address this challenge by integrating a Microscopic Stereo Camera System (MSCS) into the robotic bone micro-milling system and proposing a novel online pre-measurement pipeline for the target surface. Starting from uncalibrated cameras, the pipeline enables automatic calibration and 3D surface fitting through a convolutional neural network (CNN)-based keypoint detection. Combined with the existing feedback-based system, we develop the world's first autonomous robotic bone micro-milling system capable of rapidly, in real-time perceiving and adapting to surface unevenness and non-uniform thickness, thereby enabling an end-to-end autonomous cranial window creation workflow without human assistance. Validation experiments on euthanized mice demonstrate that the improved system achieves a success rate of 85.7 % and an average milling time of 2.1 minutes, showing not only significant performance improvements over the previous system but also exceptional accuracy, speed, and stability compared to human operators.
comment: 8 pages, 8 figures, accepted by RA-L. Please refer to the DOI to access the accepted version
Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?
The advent of Large Multimodal Models (LMMs) offers a promising technology to tackle the limitations of modular design in autonomous driving, which often falters in open-world scenarios requiring sustained environmental understanding and logical reasoning. Besides, embodied artificial intelligence facilitates policy optimization through closed-loop interactions to achieve the continuous learning capability, thereby advancing autonomous driving toward embodied intelligent (El) driving. However, such capability will be constrained by relying solely on LMMs to enhance EI driving without joint decision-making. This article introduces a novel semantics and policy dual-driven hybrid decision framework to tackle this challenge, ensuring continuous learning and joint decision. The framework merges LMMs for semantic understanding and cognitive representation, and deep reinforcement learning (DRL) for real-time policy optimization. We starts by introducing the foundational principles of EI driving and LMMs. Moreover, we examine the emerging opportunities this framework enables, encompassing potential benefits and representative use cases. A case study is conducted experimentally to validate the performance superiority of our framework in completing lane-change planning task. Finally, several future research directions to empower EI driving are identified to guide subsequent work.
Do What You Say: Steering Vision-Language-Action Models via Runtime Reasoning-Action Alignment Verification
Reasoning Vision Language Action (VLA) models improve robotic instruction-following by generating step-by-step textual plans before low-level actions, an approach inspired by Chain-of-Thought (CoT) reasoning in language models. Yet even with a correct textual plan, the generated actions can still miss the intended outcomes in the plan, especially in out-of-distribution (OOD) scenarios. We formalize this phenomenon as a lack of embodied CoT faithfulness, and introduce a training-free, runtime policy steering method for reasoning-action alignment. Given a reasoning VLA's intermediate textual plan, our framework samples multiple candidate action sequences from the same model, predicts their outcomes via simulation, and uses a pre-trained Vision-Language Model (VLM) to select the sequence whose outcome best aligns with the VLA's own textual plan. Only executing action sequences that align with the textual reasoning turns our base VLA's natural action diversity from a source of error into a strength, boosting robustness to semantic and visual OOD perturbations and enabling novel behavior composition without costly re-training. We also contribute a reasoning-annotated extension of LIBERO-100, environment variations tailored for OOD evaluation, and demonstrate up to 15% performance gain over prior work on behavior composition tasks and scales with compute and data diversity. Project Website at: https://yilin-wu98.github.io/steering-reasoning-vla/
Virtual-force Based Visual Servo for Multiple Peg-in-Hole Assembly with Tightly Coupled Multi-Manipulator
Multiple Peg-in-Hole (MPiH) assembly is one of the fundamental tasks in robotic assembly. In the MPiH tasks for large-size parts, it is challenging for a single manipulator to simultaneously align multiple distant pegs and holes, necessitating tightly coupled multi-manipulator systems. For such MPiH tasks using tightly coupled multiple manipulators, we propose a collaborative visual servo control framework that uses only the monocular in-hand cameras of each manipulator to reduce positioning errors. Initially, we train a state classification neural network and a positioning neural network. The former divides the states of the peg and hole in the image into three categories: obscured, separated, and overlapped, while the latter determines the position of the peg and hole in the image. Based on these findings, we propose a method to integrate the visual features of multiple manipulators using virtual forces, which can naturally combine with the cooperative controller of the multi-manipulator system. To generalize our approach to holes of different appearances, we varied the appearance of the holes during the dataset generation process. The results confirm that by considering the appearance of the holes, classification accuracy and positioning precision can be improved. Finally, the results show that our method achieves 100\% success rate in dual-manipulator dual peg-in-hole tasks with a clearance of 0.2 mm, while robust to camera calibration errors.
comment: 8 pages, 11 figures, this paper has been published by IEEE Robotics and Automation Letters
AURASeg: Attention Guided Upsampling with Residual Boundary-Assistive Refinement for Drivable-Area Segmentation
Free space ground segmentation is essential to navigate autonomous robots, recognize drivable zones, and traverse efficiently. Fine-grained features remain challenging for existing segmentation models, particularly for robots in indoor and structured environments. These difficulties arise from ineffective multi-scale processing, suboptimal boundary refinement, and limited feature representation. To address this, we propose Attention-Guided Upsampling with Residual Boundary-Assistive Refinement (AURASeg), a ground-plane semantic segmentation framework designed to improve border precision while preserving strong region accuracy. Built on a ResNet-50 backbone, AURASeg introduces (i) a Residual Border Refinement Module (RBRM) that enhances edge delineation through boundary-assistive feature refinement, and (ii) Attention Progressive Upsampling Decoder (APUD) blocks that progressively fuse multi-level features during decoding. Additionally, we integrate a (iii) lightweight ASPPLite module to capture multi-scale context with minimal overhead. Extensive experiments on CARL-D, the Ground Mobile Robot Perception (GMRP) dataset, and a custom Gazebo indoor dataset show that AURASeg consistently outperforms strong baselines, with notable gains in boundary metrics. Finally, we demonstrate real-time deployment on a Kobuki TurtleBot, validating practical usability. The code is available at https://github.com/Narendhiranv04/AURASeg
comment: 6 pages, 4 figures, 4 tables
DAVOS: An Autonomous Vehicle Operating System in the Vehicle Computing Era
Vehicle computing represents a fundamental shift in how autonomous vehicles are designed and deployed, transforming them from isolated transportation systems into mobile computing platforms that support both safety-critical, real-time driving and data-centric services. In this setting, vehicles simultaneously support real-time driving pipelines and a growing set of data-driven applications, placing increased responsibility on the vehicle operating system to coordinate computation, data movement, storage, and access. These demands highlight recurring system considerations related to predictable execution, data and execution protection, efficient handling of high-rate sensor data, and long-term system evolvability, commonly summarized as Safety, Security, Efficiency, and Extensibility (SSEE). Existing vehicle operating systems and runtimes address these concerns in isolation, resulting in fragmented software stacks that limit coordination between autonomy workloads and vehicle data services. This paper presents DAVOS, the Delaware Autonomous Vehicle Operating System, a unified vehicle operating system architecture designed for the vehicle computing context. DAVOS provides a cohesive operating system foundation that supports both real-time autonomy and extensible vehicle computing within a single system framework.
Multiagent Systems
SC-MAS: Constructing Cost-Efficient Multi-Agent Systems with Edge-Level Heterogeneous Collaboration
Large Language Model (LLM)-based Multi-Agent Systems (MAS) enhance complex problem solving through multi-agent collaboration, but often incur substantially higher costs than single-agent systems. Recent MAS routing methods aim to balance performance and overhead by dynamically selecting agent roles and language models. However, these approaches typically rely on a homogeneous collaboration mode, where all agents follow the same interaction pattern, limiting collaboration flexibility across different roles. Motivated by Social Capital Theory, which emphasizes that different roles benefit from distinct forms of collaboration, we propose SC-MAS, a framework for constructing heterogeneous and cost-efficient multi-agent systems. SC-MAS models MAS as directed graphs, where edges explicitly represent pairwise collaboration strategies, allowing different agent pairs to interact through tailored communication patterns. Given an input query, a unified controller progressively constructs an executable MAS by selecting task-relevant agent roles, assigning edge-level collaboration strategies, and allocating appropriate LLM backbones to individual agents. Experiments on multiple benchmarks demonstrate the effectiveness of SC-MAS. In particular, SC-MAS improves accuracy by 3.35% on MMLU while reducing inference cost by 15.38%, and achieves a 3.53% accuracy gain with a 12.13% cost reduction on MBPP. These results validate the feasibility of SC-MAS and highlight the effectiveness of heterogeneous collaboration in multi-agent systems.
Speech-Hands: A Self-Reflection Voice Agentic Approach to Speech Recognition and Audio Reasoning with Omni Perception
We introduce a voice-agentic framework that learns one critical omni-understanding skill: knowing when to trust itself versus when to consult external audio perception. Our work is motivated by a crucial yet counterintuitive finding: naively fine-tuning an omni-model on both speech recognition and external sound understanding tasks often degrades performance, as the model can be easily misled by noisy hypotheses. To address this, our framework, Speech-Hands, recasts the problem as an explicit self-reflection decision. This learnable reflection primitive proves effective in preventing the model from being derailed by flawed external candidates. We show that this agentic action mechanism generalizes naturally from speech recognition to complex, multiple-choice audio reasoning. Across the OpenASR leaderboard, Speech-Hands consistently outperforms strong baselines by 12.1% WER on seven benchmarks. The model also achieves 77.37% accuracy and high F1 on audio QA decisions, showing robust generalization and reliability across diverse audio question answering datasets. By unifying perception and decision-making, our work offers a practical path toward more reliable and resilient audio intelligence.
comment: Preprint. The version was submitted in October 2025
MACRO-LLM: LLM-Empowered Multi-Agent Collaborative Reasoning under Spatiotemporal Partial Observability
Large Language Model (LLM) agents deployed in complex real-world scenarios typically operate as spatially distributed entities. However, this physical dispersion constrains agents to limited local perception and finite temporal horizons. We characterize this bottleneck as spatiotemporal partial observability. Given such fragmented awareness, distributed agents struggle to coordinate efficiently. To bridge this gap, we introduce MACRO-LLM, LLM-empowered multi-agent collaborative reasoning under spatiotemporal partial observability. The architecture addresses spatiotemporal constraints via three modules: (1) the CoProposer mitigates temporal uncertainty by verifying candidate actions via predictive rollouts; (2) the Negotiator overcomes spatial myopia by resolving conflicts through mean-field statistical aggregation; and (3) the Introspector ensures continuous adaptation by analyzing historical experience to refine strategies via semantic gradient descent. Extensive evaluations on two complex long-horizon tasks, cooperative adaptive cruise control and pandemic control, demonstrate that our framework effectively mitigates spatiotemporal partial observability through spatial and temporal strategies, enabling robust coordination.
Learning-Augmented Perfectly Secure Collaborative Matrix Multiplication
This paper presents a perfectly secure matrix multiplication (PSMM) protocol for multiparty computation (MPC) of $\mathrm{A}^{\top}\mathrm{B}$ over finite fields. The proposed scheme guarantees correctness and information-theoretic privacy against threshold-bounded, semi-honest colluding agents, under explicit local storage constraints. Our scheme encodes submatrices as evaluations of sparse masking polynomials and combines coefficient alignment with Beaver-style randomness to ensure perfect secrecy. We demonstrate that any colluding set of parties below the security threshold observes uniformly random shares, and that the recovery threshold is optimal, matching existing information-theoretic limits. Building on this framework, we introduce a learning-augmented extension that integrates tensor-decomposition-based local block multiplication, capturing both classical and learned low-rank methods. We demonstrate that the proposed learning-based PSMM preserves privacy and recovery guarantees for MPC, while providing scalable computational efficiency gains (up to $80\%$) as the matrix dimensions grow.
Adaptive Requesting in Decentralized Edge Networks via Non-Stationary Bandits
We study a decentralized collaborative requesting problem that aims to optimize the information freshness of time-sensitive clients in edge networks consisting of multiple clients, access nodes (ANs), and servers. Clients request content through ANs acting as gateways, without observing AN states or the actions of other clients. We define the reward as the age of information reduction resulting from a client's selection of an AN, and formulate the problem as a non-stationary multi-armed bandit. In this decentralized and partially observable setting, the resulting reward process is history-dependent and coupled across clients, and exhibits both abrupt and gradual changes in expected rewards, rendering classical bandit-based approaches ineffective. To address these challenges, we propose the AGING BANDIT WITH ADAPTIVE RESET algorithm, which combines adaptive windowing with periodic monitoring to track evolving reward distributions. We establish theoretical performance guarantees showing that the proposed algorithm achieves near-optimal performance, and we validate the theoretical results through simulations.
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning
Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).
When Single-Agent with Skills Replace Multi-Agent Systems and When They Fail
Multi-agent AI systems have proven effective for complex reasoning. These systems are compounded by specialized agents, which collaborate through explicit communication, but incur substantial computational overhead. A natural question arises: can we achieve similar modularity benefits with a single agent that selects from a library of skills? We explore this question by viewing skills as internalized agent behaviors. From this perspective, a multi-agent system can be compiled into an equivalent single-agent system, trading inter-agent communication for skill selection. Our preliminary experiments suggest this approach can substantially reduce token usage and latency while maintaining competitive accuracy on reasoning benchmarks. However, this efficiency raises a deeper question that has received little attention: how does skill selection scale as libraries grow? Drawing on principles from cognitive science, we propose that LLM skill selection exhibits bounded capacity analogous to human decision-making. We investigate the scaling behavior of skill selection and observe a striking pattern. Rather than degrading gradually, selection accuracy remains stable up to a critical library size, then drops sharply, indicating a phase transition reminiscent of capacity limits in human cognition. Furthermore, we find evidence that semantic confusability among similar skills, rather than library size alone, plays a central role in this degradation. This perspective suggests that hierarchical organization, which has long helped humans manage complex choices, may similarly benefit AI systems. Our initial results with hierarchical routing support this hypothesis. This work opens new questions about the fundamental limits of semantic-based skill selection in LLMs and offers a cognitive-grounded framework and practical guidelines for designing scalable skill-based agents.
comment: 25 pages, technical report
Information Theoretic Optimal Surveillance for Epidemic Prevalence in Networks AAAI 2026
Estimating the true prevalence of an epidemic outbreak is a key public health problem. This is challenging because surveillance is usually resource intensive and biased. In the network setting, prior work on cost sensitive disease surveillance has focused on choosing a subset of individuals (or nodes) to minimize objectives such as probability of outbreak detection. Such methods do not give insights into the outbreak size distribution which, despite being complex and multi-modal, is very useful in public health planning. We introduce TESTPREV, a problem of choosing a subset of nodes which maximizes the mutual information with disease prevalence, which directly provides information about the outbreak size distribution. We show that, under the independent cascade (IC) model, solutions computed by all prior disease surveillance approaches are highly sub-optimal for TESTPREV in general. We also show that TESTPREV is hard to even approximate. While this mutual information objective is computationally challenging for general networks, we show that it can be computed efficiently for various network classes. We present a greedy strategy, called GREEDYMI, that uses estimates of mutual information from cascade simulations and thus can be applied on any network and disease model. We find that GREEDYMI does better than natural baselines in terms of maximizing the mutual information as well as reducing the expected variance in outbreak size, under the IC model.
comment: 25 pages; Added acknowledgments; In Proceedings of the AAAI 2026 Conference
An intelligent agent-based simulation of human mobility in extreme urban morphologies
This paper investigates the feasibility of human mobility in extreme urban morphologies, characterized by high-density vertical structures and linear city layouts. To assess whether agents can navigate efficiently within such unprecedented topologies, we develop a hybrid simulation framework that integrates agent-based modeling, reinforcement learning (RL), supervised learning, and graph neural networks (GNNs). The simulation captures multi-modal transportation behaviors across multiple vertical levels and varying density scenarios, using both synthetic data and real-world traces from high-density cities. Experiments show that the full AI-integrated architecture enables agents to achieve an average commute time of 7.8--8.4 minutes, a satisfaction rate exceeding 89\%, and a reachability index over 91\%, even during peak congestion periods. Ablation studies indicate that removing intelligent modules such as RL or GNN significantly degrades performance, with commute times increasing by up to 85\% and reachability falling below 70\%. Environmental modeling demonstrates low energy consumption and minimal CO$_2$ emissions when electric modes are prioritized. These results suggest that efficient and sustainable mobility in extreme urban forms is achievable, provided adaptive AI systems, intelligent infrastructure, and real-time feedback mechanisms are implemented.
Systems and Control (EESS)
From Prompt to Protocol: Fast Charging Batteries with Large Language Models
Efficiently optimizing battery charging protocols is challenging because each evaluation is slow, costly, and non-differentiable. Many existing approaches address this difficulty by heavily constraining the protocol search space, which limits the diversity of protocols that can be explored, preventing the discovery of higher-performing solutions. We introduce two gradient-free, LLM-driven closed-loop methods: Prompt-to-Optimizer (P2O), which uses an LLM to propose the code for small neural-network-based protocols, which are then trained by an inner loop, and Prompt-to-Protocol (P2P), which simply writes an explicit function for the current and its scalar parameters. Across our case studies, LLM-guided P2O outperforms neural networks designed by Bayesian optimization, evolutionary algorithms, and random search. In a realistic fast charging scenario, both P2O and P2P yield around a 4.2 percent improvement in state of health (capacity retention based health metric under fast charging cycling) over a state-of-the-art multi-step constant current (CC) baseline, with P2P achieving this under matched evaluation budgets (same number of protocol evaluations). These results demonstrate that LLMs can expand the space of protocol functional forms, incorporate language-based constraints, and enable efficient optimization in high cost experimental settings.
A Novel $αβ$-Approximation Method Based on Numerical Integration for Discretizing Continuous Systems
In this article, we propose a novel discretization method based on numerical integration for discretizing continuous systems, termed the $αβ$-approximation or Scalable Bilinear Transformation (SBT). In contrast to existing methods, the proposed method consists of two factors, i.e., shape factor ($α$) and time factor ($β$). Depending on the discretization technique applied, we identify two primary distortion modes in discrete resonant controllers: frequency warping and resonance damping. We further provide a theoretical explanation for these distortion modes, and demonstrate that the performance of the method is superior to all typical methods. The proposed method is implemented to discretize a quasi-resonant (QR) controller on a control board, achieving 25\% reduction in the root-mean-square error (RMSE) compared to the SOTA method. Finally, the approach is extended to discretizing a resonant controller of a grid-tied inverter. The efficacy of the proposed method is conclusively validated through favorable comparisons among the theory, simulation, and experiments.
comment: arXiv admin note: text overlap with arXiv:2511.03403
Residual Power Flow for Neural Solvers
The energy transition challenges operational tasks based on simulations and optimisation. These computations need to be fast and flexible as the grid is ever-expanding, and renewables' uncertainty requires a flexible operational environment. Learned approximations, proxies or surrogates -- we refer to them as Neural Solvers -- excel in terms of evaluation speed, but are inflexible with respect to adjusting to changing tasks. Hence, neural solvers are usually applicable to highly specific tasks, which limits their usefulness in practice; a widely reusable, foundational neural solver is required. Therefore, this work proposes the Residual Power Flow (RPF) formulation. RPF formulates residual functions based on Kirchhoff's laws to quantify the infeasibility of an operating condition. The minimisation of the residuals determines the voltage solution; an additional slack variable is needed to achieve AC-feasibility. RPF forms a natural, foundational subtask of tasks subject to power flow constraints. We propose to learn RPF with neural solvers to exploit their speed. Furthermore, RPF improves learning performance compared to common power flow formulations. To solve operational tasks, we integrate the neural solver in a Predict-then-Optimise (PO) approach to combine speed and flexibility. The case study investigates the IEEE 9-bus system and three tasks (AC Optimal Power Flow (OPF), power-flow and quasi-steady state power flow) solved by PO. The results demonstrate the accuracy and flexibility of learning with RPF.
comment: This work has been submitted to the IEEE for possible publication
Echo-Side Integrated Sensing and Communication via Space-Time Reconfigurable Intelligent Surfaces
This paper presents an echo-side modulation framework for integrated sensing and communication (ISAC) systems. A space-time reconfigurable intelligent surface (ST-RIS) impresses a continuous-phase modulation onto the radar echo, enabling uplink data transmission with a phase modulation of the transmitted radar-like waveform. The received signal is a multiplicative composition of the sensing waveform and the phase for communication. Both functionalities share the same physical signal and perceive each other as impairments. The achievable communication rate is expressed as a function of a coupling parameter that links sensing accuracy to phase error accumulation. Under a fixed bandwidth constraint, the sensing and communication figures of merit define a convex Pareto frontier. The optimal bandwidth allocation satisfying a minimum sensing requirement is derived in closed form. The modified Cramer-Rao bound (MCRB) for range estimation is derived in closed form; this parameter must be estimated to compensate for the frequency offset before data demodulation. Frame synchronization is formulated as a generalized likelihood ratio test (GLRT), and the detection probability is obtained through characteristic function inversion, accounting for residual frequency errors from imperfect range estimation. Numerical results validate the theoretical bounds and characterize the trade-off across the operating range.
Two-Scale Spatial Deployment for Cost-Effective Wireless Networks via Cooperative IRSs and Movable Antennas
This paper proposes a two-scale spatial deployment strategy to ensure reliable coverage for multiple target areas, integrating macroscopic intelligent reflecting surfaces (IRSs) and fine-grained movable antennas (MAs). Specifically, IRSs are selectively deployed from candidate sites to shape the propagation geometry, while MAs are locally repositioned among discretized locations to exploit small-scale channel variations. The objective is to minimize the total deployment cost of MAs and IRSs by jointly optimizing the IRS site selection, MA positions, transmit precoding, and IRS phase shifts, subject to the signal-to-noise ratio (SNR) requirements for all target areas. This leads to a challenging mixed-integer non-convex optimization problem that is intractable to solve directly. To address this, we first formulate an auxiliary problem to verify the feasibility. A penalty-based double-loop algorithm integrating alternating optimization and successive convex approximation (SCA) is developed to solve this feasibility issue, which is subsequently adapted to obtain a suboptimal solution for the original cost minimization problem. Finally, based on the obtained solution, we formulate an element refinement problem to further reduce the deployment cost, which is solved by a penalty-based SCA algorithm. Simulation results demonstrate that the proposed designs consistently outperform benchmarks relying on independent area planning or full IRS deployment in terms of cost-efficiency. Moreover, for cost minimization, MA architectures are preferable in large placement apertures, whereas fully populated FPA architectures excel in compact ones; for worst-case SNR maximization, MA architectures exhibit a lower cost threshold for feasibility, while FPA architectures can attain peak SNR at a lower total cost.
comment: 13 pages, 7 figures, submitted to an IEEE journal for possible publication
Semi-Contention-Free Access in IoT NOMA Networks: A Reinforcement Learning Framework
The unprecedented surge of massive Internet of things (mIoT) traffic in beyond fifth generation (B5G) communication systems calls for transformative approaches for multiple access and data transmission. While classical model-based tools have been proven to be powerful and precise, an imminent trend for resource management in B5G networks is promoting solutions towards data-driven design. Considering an IoT network with devices spread in clusters covered by a base station, we present in this paper a novel model-free multiple access and data transmission framework empowered by reinforcement learning, designed for power-domain non-orthogonal multiple access networks to facilitate uplink traffic of small data packets. The framework supports two access modes referred to as contention-based and semi-contention-free, with its core component being a policy gradient algorithm executed at the base station. The base station performs access control and optimal radio resource allocation by periodically broadcasting two control parameters to each cluster of devices that considerably reduce data detection failures with a minimum computation requirement on devices. Numerical results, in terms of system and cluster throughput, throughput fairness, access delay, and energy consumption, demonstrate the efficiency and scalability of the framework as network size and traffic load vary.
Semi-physical Gamma-Process Degradation Modeling and Performance-Driven Opportunistic Maintenance Optimization for LED Lighting Systems
Large-scale LED lighting systems degrade through gradual package degradation and abrupt driver outages, while acceptability is determined by spatio-temporal illuminance compliance rather than component reliability alone. This paper proposes a performance-driven, simulation-in-the-loop framework for opportunistic maintenance optimization of LED lighting systems. LED package degradation is modeled by a semi-physical non-homogeneous Gamma process whose mean follows an exponential lumen-maintenance trend, and driver outages are described by a Weibull lifetime model. Parameters are calibrated from LM-80 accelerated degradation data via Bayesian inference, enabling uncertainty propagation to operating conditions. System performance is evaluated using ray-tracing-based illuminance mapping, and static indices (average illuminance and uniformity) are converted into a long-term dynamic deficiency-ratio metric via performance-deficiency durations over event intervals. To enable scalable Monte Carlo policy evaluation and search, a surrogate-based performance mapping replaces repeated ray-tracing with negligible loss of fidelity. An opportunistic policy is optimized in a multi-objective setting to balance performance deficiency, site visits, and replacements. A case study demonstrates the practicality of the framework and the resulting Pareto trade-offs for maintenance decision support.
comment: 29 pages, 17 figures
Feedback-Based Mobile Robot Navigation in 3-D Environments Using Artificial Potential Functions Technical Report
This technical report presents the construction and analysis of polynomial navigation functions for motion planning in 3-D workspaces populated by spherical and cylindrical obstacles. The workspace is modeled as a bounded spherical region, and obstacles are encoded using smooth polynomial implicit functions. We establish conditions under which the proposed navigation functions admit a unique non-degenerate minimum at the target while avoiding local minima, including in the presence of pairwise intersecting obstacles. Gradient and Hessian analyses are provided, and the theoretical results are validated through numerical simulations in obstacle rich 3-D environments.
On Discrete Age of Information of Status Updating System With General Packet Arrival Processes
Characterizing Age of Information (AoI) in status updating systems with general arrival and service processes has great significance considering that the interarrival and service time of updates can possibly be arbitrary in a real world. While expressions of average continuous AoI under G/G/1/1 queues have been derived in the paper by Soysal and Ulukus, the discrete case remained unsolved. To address it, this paper gives a fully characterization of probability generation functions (PGF) of discrete AoI under G/G/1/1 settings when preemption is allowed. In the non-preemptive case, this paper gives the expressions of PGF of discrete AoI under G/Geo/1/1 settings, which also extends the former results. The average discrete AoI is derived and discussed based on these new theoretical findings.
Boundary adaptive observer design for semilinear hyperbolic rolling contact ODE-PDE systems with uncertain friction
This paper presents an adaptive observer design for semilinear hyperbolic rolling contact ODE-PDE systems with uncertain friction characteristics parameterized by a matrix of unknown coefficients appearing in the nonlinear (and possibly non-smooth) PDE source terms. Under appropriate assumptions of forward completeness and boundary sensing, an adaptive observer is synthesized to simultaneously estimate the lumped and distributed states, as well as the uncertain friction parameters, using only boundary measurements. The observer combines a finite-dimensional parameter estimator with an infinite-dimensional description of the state error dynamics, and achieves exponential convergence under persistent excitation. The effectiveness of the proposed design is demonstrated in simulation by considering a relevant example borrowed from road vehicle dynamics.
comment: 12 pages, 5 figures. Under review at Automatica
System Availability Optimization: Integrating Quantity Discounts and Delivery Lead Time Considerations
Purpose: The model allocates the system components orders to the suppliers to minimize the parts price and the system construction delay penalties and maximize the system availability during its use. It considers the quantity-based discount and variation of delivery lead time by ordering similar components. The model also reflects the prerequisite relationships between construction activities and calculates the delay penalty resulting from parts delivery lead time. Design/methodology/approach: This research presents a model for selecting suppliers of components of an industrial series-parallel multi-state system. A nonlinear binary mathematical program uses the Markov process results to select system components. It minimizes the total system construction phase costs, including the components' price, and the system construction delay penalty, and the system exploitation phase costs, including the system shutdown and working at half capacity. Findings: The model allocates the optimal orders for a typical industrial system's components, composing four elements. The proposed approach combines the nonlinear binary program and the Markov process results to optimize the system life cycle parameters, including the system construction cost and operational availability. Originality/value: Using the Markov chain results in binary nonlinear mathematical programming, this study attempts to strike the right balance between the construction phase's objectives and an industrial unit's operation phase.
Dynamic Association of Semantics and Parameter Estimates by Filtering
We propose a probabilistic semantic filtering framework in which parameters of a dynamical system are inferred and associated with a closed set of semantic classes in a map. We extend existing methods to a multi-parameter setting using a posterior that tightly couples semantics with the parameter likelihoods, and propose a filter to compute this posterior sequentially, subject to dynamics in the map's state. Using Bayesian moment matching, we show that the computational complexity of measurement updates scales linearly in the dimension of the parameter space. Finally, we demonstrate limitations of applying existing methods to a problem from the driving domain, and show that the proposed framework better captures time-varying parameter-to-semantic associations.
comment: 8 pages, 4 figures, submitted as conference paper
RIS-Aided E2E Multi-Path Uplink Transmission Optimization for 6G Time-Sensitive Services
The Access Traffic Steering, Switching, and Splitting (ATSSS) defined in the 3GPP latest Release enables traffic flow over the multiple access paths to achieve the lower-latency End-to-end (E2E) delivery for 6G time-sensitive services. However, the existing E2E multi-path operation often falls short of more stringent QoS requirements for 6G time-sensitive services. This proposes a Reconfigurable Intelligent Surfaces (RIS)-aided E2E multi-path uplink(UL) transmission architecture that explicitly accounts for both radio link latency and N3 backhaul latency, via the coupled designs of the UL traffic-splitting ratio, transmit power, receive combining, and RIS phase shift under practical constraints to achieve the minimum average E2E latency. We develop an alternating optimization framework that updates the above target parameters to be optimized. The simulations were conducted to compare the effectiveness of the proposed E2E optimization framework that lowers the average E2E latency up to 43% for a single user and 15% for the whole system compared with baselines.
comment: This work has been submitted to the IEEE for possible publication.5 pages,2 figures,journal paper
Collision Avoidance for Non-Cooperative Multi-Swarm Coverage Control with Bounded Disturbance Measurements
This paper proposes a new algorithm for collision-free coverage control of multiple non-cooperating swarms in the presence of bounded disturbances. A new methodology is introduced that accounts for uncertainties in disturbance measurements. The proposed methodology is used to develop an algorithm that ensures collision-free motion in multi-swarm coverage control, specifically for cases where disturbances are present and their measurements are subject to bounded uncertainty. The theoretical results are validated through simulations of multiple swarms that independently aim to cover a given region in an environment with disturbances.
Learning-Augmented Perfectly Secure Collaborative Matrix Multiplication
This paper presents a perfectly secure matrix multiplication (PSMM) protocol for multiparty computation (MPC) of $\mathrm{A}^{\top}\mathrm{B}$ over finite fields. The proposed scheme guarantees correctness and information-theoretic privacy against threshold-bounded, semi-honest colluding agents, under explicit local storage constraints. Our scheme encodes submatrices as evaluations of sparse masking polynomials and combines coefficient alignment with Beaver-style randomness to ensure perfect secrecy. We demonstrate that any colluding set of parties below the security threshold observes uniformly random shares, and that the recovery threshold is optimal, matching existing information-theoretic limits. Building on this framework, we introduce a learning-augmented extension that integrates tensor-decomposition-based local block multiplication, capturing both classical and learned low-rank methods. We demonstrate that the proposed learning-based PSMM preserves privacy and recovery guarantees for MPC, while providing scalable computational efficiency gains (up to $80\%$) as the matrix dimensions grow.
SPARK: Safe Protective and Assistive Robot Kit
This paper introduces the Safe Protective and Assistive Robot Kit (SPARK), a comprehensive benchmark designed to ensure safety in humanoid autonomy and teleoperation. Humanoid robots pose significant safety risks due to their physical capabilities of interacting with complex environments. The physical structures of humanoid robots further add complexity to the design of general safety solutions. To facilitate safe deployment of complex robot systems, SPARK can be used as a toolbox that comes with state-of-the-art safe control algorithms in a modular and composable robot control framework. Users can easily configure safety criteria and sensitivity levels to optimize the balance between safety and performance. To accelerate humanoid safety research and development, SPARK provides simulation benchmarks that compare safety approaches in a variety of environments, tasks, and robot models. Furthermore, SPARK allows quick deployment of synthesized safe controllers on real robots. For hardware deployment, SPARK supports Apple Vision Pro (AVP) or a Motion Capture System as external sensors, while offering interfaces for seamless integration with alternative hardware setups at the same time. This paper demonstrates SPARK's capability with both simulation experiments and case studies with a Unitree G1 humanoid robot. Leveraging these advantages of SPARK, users and researchers can significantly improve the safety of their humanoid systems as well as accelerate relevant research. The open source code is available at: https://github.com/intelligent-control-lab/spark.
comment: Presented at IFAC Symposium on Robotics
Provable Acceleration of Distributed Optimization with Local Updates
In conventional distributed optimization, each agent performs a single local update between two communication rounds with its neighbors to synchronize solutions. Inspired by the success of using multiple local updates in federated learning, incorporating local updates into distributed optimization has recently attracted increasing attention. However, unlike federated learning, where multiple local updates can accelerate learning by improving gradient estimation under mini-batch settings, it remains unclear whether similar benefits hold in distributed optimization when gradients are exact. Moreover, existing theoretical results typically require reducing the step size when multiple local updates are employed, which can entirely offset any potential benefit of these additional local updates and obscure their true impact on convergence. In this paper, we focus on the classic DIGing algorithm and leverage the tight performance bounds provided by Performance Estimation Problems (PEP) to show that incorporating local updates can indeed accelerate distributed optimization. To the best of our knowledge, this is the first rigorous demonstration of such acceleration for a broad class of objective functions. Our analysis further reveals that, under an appropriate step size, performing only two local updates is sufficient to achieve the maximal possible improvement, and that additional local updates provide no further gains. Because more updates increase computational cost, these findings offer practical guidance for efficient implementation. Extensive experiments on both synthetic and real-world datasets corroborate the theoretical findings.
Physically Plausible Multi-System Trajectory Generation and Symmetry Discovery
From metronomes to celestial bodies, mechanics underpins how the world evolves in time and space. With consideration of this, a number of recent neural network models leverage inductive biases from classical mechanics to encourage model interpretability and ensure forecasted states are physical. However, in general, these models are designed to capture the dynamics of a single system with fixed physical parameters, from state-space measurements of a known configuration space. In this paper we introduce Symplectic Phase Space GAN (SPS-GAN) which can capture the dynamics of multiple systems, and generalize to unseen physical parameters from. Moreover, SPS-GAN does not require prior knowledge of the system configuration space. In fact, SPS-GAN can discover the configuration space structure of the system from arbitrary measurement types (e.g., state-space measurements, video frames). To achieve physically plausible generation, we introduce a novel architecture which embeds a Hamiltonian neural network recurrent module in a conditional GAN backbone. To discover the structure of the configuration space, we optimize the conditional time-series GAN objective with an additional physically motivated term to encourages a sparse representation of the configuration space. We demonstrate the utility of SPS-GAN for trajectory prediction, video generation and symmetry discovery. Our approach captures multiple systems and achieves performance on par with supervised models designed for single systems.
A Taxonomy and Review of Algorithms for Modeling and Predicting Human Driver Behavior
An open problem in autonomous driving research is modeling human driving behavior, which is needed for the planning component of the autonomy stack, safety validation through traffic simulation, and causal inference for generating explanations for autonomous driving. Modeling human driving behavior is challenging because it is stochastic, high-dimensional, and involves interaction between multiple agents. This problem has been studied in various communities with a vast body of literature. Existing reviews have generally focused on one aspect: motion prediction. In this article, we present a unification of the literature that covers intent estimation, trait estimation, and motion prediction. This unification is enabled by modeling multi-agent driving as a partially observable stochastic game, which allows us to cast driver modeling tasks as inference problems. We classify driver models into a taxonomy based on the specific tasks they address and the key attributes of their approach. Finally, we identify open research opportunities in the field of driver modeling.
comment: This revision has been accepted for publication in the Proceedings of the IEEE. The final version is available on IEEE Xplore
Periodic robust robotic rock chop via virtual model control
Robotic cutting is a challenging contact-rich manipulation task where the robot must simultaneously negotiate unknown object mechanics, large contact forces, and precise motion requirements. We introduce a new active virtual-model control scheme that enables knife rocking motion for robot manipulators, without pre-planned trajectories or precise information of the environment. Motion is generated and controlled through switching virtual coupling with virtual mechanisms, given by virtual springs, dampers, and masses arranged in a suitable way. Through analysis and experiments, we demonstrate that the controlled robot behavior settles into a periodic motion. Experiments with a Franka manipulator demonstrate robust cuts with five different vegetables, and sub-millimeter slice accuracy from 1 mm to 6 mm at nearly one cut per second. The same controller survives changes in knife shape and cutting board height, and adaptation to a different humanoid manipulator, demonstrating robustness and platform independence.
AQ-PCDSys: An Adaptive Quantized Planetary Crater Detection System for Autonomous Space Exploration
Successful autonomous planetary exploration hinges on real-time, high-fidelity environmental perception. However, standard deep learning models usually demand far more memory and computation power than space-qualified, radiation-hardened onboard hardware can provide. This creates a fundamental design challenge of deploying sophisticated detection architectures without saturating the rigid power and memory envelopes of the computation hardware of planetary exploration platforms. We propose the Adaptive Quantized Planetary Crater Detection System to resolve this bottleneck. Our framework integrates a Quantized Neural Network, refined through Quantization Aware Training, with an Adaptive Multi-Sensor Fusion module. By forcing weights into low-precision integer arithmetic, we effectively strip away the floating-point overhead that typically bottlenecks onboard processors and system memory. This yields a leaner model footprint and significantly faster processing while the detection fidelity remains high. Such efficiency enables AMF module to merge high-bandwidth Optical Imagery streams with Digital Elevation Models using an Adaptive Weighting Mechanism to re-balance sensor priority under variable conditions like deep shadows or high albedo. Integrated Multi-Scale Detection Heads then resolve craters across a wide range of diameters, providing a computationally efficient and precise solution for real-time detection, localization of craters and hazard avoidance. This paper establishes the architectural design and theoretical justification of the system. While our methodology is grounded in principles of hybrid computer vision and planetary science, we present this as a blueprint for future empirical validation and hardware benchmarking on integer-arithmetic units. This system provides a capability vital for the next generation of autonomous landing, navigation, and deep space explorations.
comment: 17 pages, 6 figures. A research paper on a novel deep learning framework for planetary crater detection
Joint Communication Scheduling and Resource Allocation for Distributed Edge Learning: Seamless Integration in Next-Generation Wireless Networks
Distributed edge learning (DL) is considered a cornerstone of intelligence enablers, since it allows for collaborative training without the necessity for local clients to share raw data with other parties, thereby preserving privacy and security. Integrating DL into the 6G networks requires a coexistence design with existing services such as high-bandwidth (HB) traffic like eMBB. Current designs in the literature mainly focus on communication round-wise designs that assume a rigid resource allocation throughout each communication round (CR). However, rigid resource allocation within a CR is a highly inefficient and inaccurate representation of the system's realistic behavior, especially when CR duration far exceeds the channel coherence time due to large model size or limited resources. This is due to the heterogeneous nature of the system, as clients inherently may need to access the network at different time instants. This work zooms into one arbitrary CR, and demonstrates the importance of considering a time-dependent design for sharing the resource pool with HB traffic. We first formulate a timeslot-wise optimization problem to minimize the consumed time by DL within the CR while constrained by a DL energy budget. Due to its intractability, a session-based optimization problem is formulated assuming a CR lasts less than a large-scale coherence time. Some scheduling properties of such multi-server joint communication scheduling and resource allocation framework have been established. An iterative algorithm has been designed to solve such non-convex and non-block-separable-constrained problems. Simulation results confirm the importance of the efficient and accurate integration design proposed in this work.
Autonomous Robotic Bone Micro-Milling System with Automatic Calibration and 3D Surface Fitting
Automating bone micro-milling using a robotic system presents challenges due to the uncertainties in both the external and internal features of bone tissue. For example, during mouse cranial window creation, a circular path with a radius of 2 to 4 mm needs to be milled on the mouse skull using a microdrill. The uneven surface and non-uniform thickness of the mouse skull make it difficult to fully automate this process, requiring the system to possess advanced perceptual and adaptive capabilities. In this study, we address this challenge by integrating a Microscopic Stereo Camera System (MSCS) into the robotic bone micro-milling system and proposing a novel online pre-measurement pipeline for the target surface. Starting from uncalibrated cameras, the pipeline enables automatic calibration and 3D surface fitting through a convolutional neural network (CNN)-based keypoint detection. Combined with the existing feedback-based system, we develop the world's first autonomous robotic bone micro-milling system capable of rapidly, in real-time perceiving and adapting to surface unevenness and non-uniform thickness, thereby enabling an end-to-end autonomous cranial window creation workflow without human assistance. Validation experiments on euthanized mice demonstrate that the improved system achieves a success rate of 85.7 % and an average milling time of 2.1 minutes, showing not only significant performance improvements over the previous system but also exceptional accuracy, speed, and stability compared to human operators.
comment: 8 pages, 8 figures, accepted by RA-L. Please refer to the DOI to access the accepted version
Minimal Actuator Selection
Selecting a few available actuators to ensure the controllability of a linear system is a fundamental problem in control theory. Previous works either focus on optimal performance, simplifying the controllability issue, or make the system controllable under structural assumptions, such as in graphs or when the input matrix is a design parameter. We generalize these approaches to offer a precise characterization of the general minimal actuator selection problem where a set of actuators is given, described by a fixed input matrix, and goal is to choose the fewest actuators that make the system controllable. We show that this problem can be equivalently cast as an integer linear program and, if actuation channels are sufficiently independent, as a set multicover problem under multiplicity constraints. The latter equivalence is always true if the state matrix has all distinct eigenvalues, in which case it simplifies to the set cover problem. Such characterizations hold even when a robust selection that tolerates a given number of faulty actuators is desired. Our established connection legitimates a designer to use algorithms from the rich literature on the set multicover problem to select the smallest subset of actuators, including exact solutions that do not require brute-force search.
comment: This work has been submitted to the IEEE for possible publication
Predictable Interval MDPs through Entropy Regularization
Regularization of control policies using entropy can be instrumental in adjusting predictability of real-world systems. Applications benefiting from such approaches range from, e.g., cybersecurity, which aims at maximal unpredictability, to human-robot interaction, where predictable behavior is highly desirable. In this paper, we consider entropy regularization for interval Markov decision processes (IMDPs). IMDPs are uncertain MDPs, where transition probabilities are only known to belong to intervals. Lately, IMDPs have gained significant popularity in the context of abstracting stochastic systems for control design. In this work, we address robust minimization of the linear combination of entropy and a standard cumulative cost in IMDPs, thereby establishing a trade-off between optimality and predictability. We show that optimal deterministic policies exist, and devise a value-iteration algorithm to compute them. The algorithm solves a number of convex programs at each step. Finally, through an illustrative example we show the benefits of penalizing entropy in IMDPs.
comment: This paper has been presented at the 2024 63rd IEEE Conference on Decision and Control (CDC)
Scalable and Approximation-free Symbolic Control for Unknown Euler-Lagrange Systems
We propose a novel symbolic control framework for enforcing temporal logic specifications in Euler-Lagrange systems that addresses the key limitations of traditional abstraction-based approaches. Unlike existing methods that require exact system models and provide guarantees only at discrete sampling instants, our approach relies only on bounds on system parameters and input constraints, and ensures correctness for the full continuous-time trajectory. The framework combines scalable abstraction of a simplified virtual system with a closed-form, model-free controller that guarantees trajectories satisfy the original specification while respecting input bounds and remaining robust to unknown but bounded disturbances. We provide feasibility conditions for the construction of confinement regions and analyze the trade-off between efficiency and conservatism. Case studies on pendulum dynamics, a two-link manipulator, and multi-agent systems, including hardware experiments, demonstrate that the proposed approach ensures both correctness and safety while significantly reducing computational time and memory requirements. These results highlight its scalability and practicality for real-world robotic systems where precise models are unavailable and continuous-time guarantees are essential.
Context-Aware Information Transfer via Digital Semantic Communication in UAV-Based Networks
In smart cities, bandwidth-constrained Unmanned Aerial Vehicles (UAVs) often fail to relay mission-critical data in time, compromising real-time decision-making. This highlights the need for faster and more efficient transmission of only the most relevant information. To address this, we propose DSC-UAV model, leveraging a context-adaptive Digital Semantic Communication (DSC) framework. This model redefines aerial data transmission through three core components: prompt-aware encoding, dynamic UAV-enabled relaying, and user mobility-optimized reinforcement learning. Ground users transmit context-driven visual content. Images are encoded via Vision Transformer combined with a prompt-text encoder to generate semantic features based on the desired context (generic or object-specific). These features are then quantized and transmitted over a UAV network that dynamically relays the data. Joint trajectory and resource allocation are optimized using Truncated Quantile Critic (TQC)-aided reinforcement learning technique, which offers greater stability and precision over standard SAC and TD3 due to its resistance to overestimation bias. Simulations demonstrate significant performance improvement, up to 22% gain in semantic-structural similarity and 14% reduction in Age of Information (AoI) compared to digital and prior UAV-semantic communication baselines. By integrating mobility control with context-driven visual abstraction, DSC-UAV advances resilient, information-centric surveillance for next-generation UAV networks in bandwidth-constrained environments.
comment: 10 Pages, 6 figures
Adversarial Multi-Agent Reinforcement Learning for Proactive False Data Injection Detection
Smart inverters are instrumental in the integration of distributed energy resources into the electric grid. Such inverters rely on communication layers for continuous control and monitoring, potentially exposing them to cyber-physical attacks such as false data injection attacks (FDIAs). We propose to construct a defense strategy against a priori unknown FDIAs with a multi-agent reinforcement learning (MARL) framework. The first agent is an adversary that simulates and discovers various FDIA strategies, while the second agent is a defender in charge of detecting and locating FDIAs. This approach enables the defender to be trained against new FDIAs continuously generated by the adversary. In addition, we show that the detection skills of an MARL defender can be combined with those of a supervised offline defender through a transfer learning approach. Numerical experiments conducted on a distribution and transmission system demonstrate that: a) the proposed MARL defender outperforms the offline defender against adversarial attacks; b) the transfer learning approach makes the MARL defender capable against both synthetic and unseen FDIAs.
Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
Two-time-scale stochastic approximation algorithms are iterative methods used in applications such as optimization, reinforcement learning, and control. Finite-time analysis of these algorithms has primarily focused on fixed point iterations where both time-scales have contractive mappings. In this work, we broaden the scope of such analyses by considering settings where the slower time-scale has a non-expansive mapping. For such algorithms, the slower time-scale can be viewed as a stochastic inexact Krasnoselskii-Mann iteration. We also study a variant where the faster time-scale has a projection step which leads to non-expansiveness in the slower time-scale. We show that the last-iterate mean square residual error for such algorithms decays at a rate $O(1/k^{1/4-ε})$, where $ε>0$ is arbitrarily small. We further establish almost sure convergence of iterates to the set of fixed points. We demonstrate the applicability of our framework by applying our results to minimax optimization, linear stochastic approximation, and Lagrangian optimization.
comment: Submitted to SIAM Journal on Control and Optimization
Robotics
Motion Attribution for Video Generation
Despite the rapid progress of video generation models, the role of data in influencing motion is poorly understood. We present Motive (MOTIon attribution for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models. We use this to study which fine-tuning clips improve or degrade temporal dynamics. Motive isolates temporal dynamics from static appearance via motion-weighted loss masks, yielding efficient and scalable motion-specific influence computation. On text-to-video models, Motive identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. With Motive-selected high-influence data, our method improves both motion smoothness and dynamic degree on VBench, achieving a 74.1% human preference win rate compared with the pretrained base model. To our knowledge, this is the first framework to attribute motion rather than visual appearance in video generative models and to use it to curate fine-tuning data.
comment: See the project website at https://research.nvidia.com/labs/sil/projects/MOTIVE/
Older Adults' Preferences for Feedback Cadence from an Exercise Coach Robot
People can respond to feedback and guidance in different ways, and it is important for robots to personalize their interactions and utilize verbal and nonverbal communication cues. We aim to understand how older adults respond to different cadences of verbal and nonverbal feedback of a robot exercise coach. We conducted an online study of older adults, where participants evaluated videos of the robot giving feedback at different cadences for each modality. The results indicate that changing the cadence of one modality affects the perception of both it and the other modality. We can use the results from this study to better design the frequency of the robot coach's feedback during an exercise session with this population.
comment: Nonarchival submission to RO-MAN 2024 - poster session
Real-Time Localization Framework for Autonomous Basketball Robots
Localization is a fundamental capability for autonomous robots, enabling them to operate effectively in dynamic environments. In Robocon 2025, accurate and reliable localization is crucial for improving shooting precision, avoiding collisions with other robots, and navigating the competition field efficiently. In this paper, we propose a hybrid localization algorithm that integrates classical techniques with learning based methods that rely solely on visual data from the court's floor to achieve self-localization on the basketball field.
comment: 8 pages, 12 figures, Project code: https://github.com/NarenTheNumpkin/Basketball-robot-localization
A Hybrid Model-based and Data-based Approach Developed for a Prosthetic Hand Wrist
The incorporation of advanced control algorithms into prosthetic hands significantly enhances their ability to replicate the intricate motions of a human hand. This work introduces a model-based controller that combines an Artificial Neural Network (ANN) approach with a Sliding Mode Controller (SMC) designed for a tendon-driven soft continuum wrist integrated into a prosthetic hand known as "PRISMA HAND II". Our research focuses on developing a controller that provides a fast dynamic response with reduced computational effort during wrist motions. The proposed controller consists of an ANN for computing bending angles together with an SMC to regulate tendon forces. Kinematic and dynamic models of the wrist are formulated using the Piece-wise Constant Curvature (PCC) hypothesis. The performance of the proposed controller is compared with other control strategies developed for the same wrist. Simulation studies and experimental validations of the fabricated wrist using the controller are included in the paper.
VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
VLA models have shown promising potential in embodied navigation by unifying perception and planning while inheriting the strong generalization abilities of large VLMs. However, most existing VLA models rely on reactive mappings directly from observations to actions, lacking the explicit reasoning capabilities and persistent memory required for complex, long-horizon navigation tasks. To address these challenges, we propose VLingNav, a VLA model for embodied navigation grounded in linguistic-driven cognition. First, inspired by the dual-process theory of human cognition, we introduce an adaptive chain-of-thought mechanism, which dynamically triggers explicit reasoning only when necessary, enabling the agent to fluidly switch between fast, intuitive execution and slow, deliberate planning. Second, to handle long-horizon spatial dependencies, we develop a visual-assisted linguistic memory module that constructs a persistent, cross-modal semantic memory, enabling the agent to recall past observations to prevent repetitive exploration and infer movement trends for dynamic environments. For the training recipe, we construct Nav-AdaCoT-2.9M, the largest embodied navigation dataset with reasoning annotations to date, enriched with adaptive CoT annotations that induce a reasoning paradigm capable of adjusting both when to think and what to think about. Moreover, we incorporate an online expert-guided reinforcement learning stage, enabling the model to surpass pure imitation learning and to acquire more robust, self-explored navigation behaviors. Extensive experiments demonstrate that VLingNav achieves state-of-the-art performance across a wide range of embodied navigation benchmarks. Notably, VLingNav transfers to real-world robotic platforms in a zero-shot manner, executing various navigation tasks and demonstrating strong cross-domain and cross-task generalization.
comment: Project page: https://wsakobe.github.io/VLingNav-web/
QP-Based Control of an Underactuated Aerial Manipulator under Constraints
This paper presents a constraint-aware control framework for underactuated aerial manipulators, enabling accurate end-effector trajectory tracking while explicitly accounting for safety and feasibility constraints. The control problem is formulated as a quadratic program that computes dynamically consistent generalized accelerations subject to underactuation, actuator bounds, and system constraints. To enhance robustness against disturbances, modeling uncertainties, and steady-state errors, a passivity-based integral action is incorporated at the torque level without compromising feasibility. The effectiveness of the proposed approach is demonstrated through high-fidelity physics-based simulations, which include parameter perturbations, viscous joint friction, and realistic sensing and state-estimation effects. This demonstrates accurate tracking, smooth control inputs, and reliable constraint satisfaction under realistic operating conditions.
Keyframe-based Dense Mapping with the Graph of View-Dependent Local Maps ICRA 2020
In this article, we propose a new keyframe-based mapping system. The proposed method updates local Normal Distribution Transform maps (NDT) using data from an RGB-D sensor. The cells of the NDT are stored in 2D view-dependent structures to better utilize the properties and uncertainty model of RGB-D cameras. This method naturally represents an object closer to the camera origin with higher precision. The local maps are stored in the pose graph which allows correcting global map after loop closure detection. We also propose a procedure that allows merging and filtering local maps to obtain a global map of the environment. Finally, we compare our method with Octomap and NDT-OM and provide example applications of the proposed mapping method.
comment: Accepted in ICRA 2020
Simplifying ROS2 controllers with a modular architecture for robot-agnostic reference generation
This paper introduces a novel modular architecture for ROS2 that decouples the logic required to acquire, validate, and interpolate references from the control laws that track them. The design includes a dedicated component, named Reference Generator, that receives references, in the form of either single points or trajectories, from external nodes (e.g., planners), and writes single-point references at the controller's sampling period via the existing ros2_control chaining mechanism to downstream controllers. This separation removes duplicated reference-handling code from controllers and improves reusability across robot platforms. We implement two reference generators: one for handling joint-space references and one for Cartesian references, along with a set of new controllers (PD with gravity compensation, Cartesian pose, and admittance controllers) and validate the approach on simulated and real Universal Robots and Franka Emika manipulators. Results show that (i) references are tracked reliably in all tested scenarios, (ii) reference generators reduce duplicated reference-handling code across chained controllers to favor the construction and reuse of complex controller pipelines, and (iii) controller implementations remain focused only on control laws.
comment: 5 pages, 7 figures
AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization
Internet of underwater things (IoUT) is increasingly gathering attention with the aim of monitoring sea life and deep ocean environment, underwater surveillance as well as maintenance of underwater installments. However, conventional IoUT devices, reliant on battery power, face limitations in lifespan and pose environmental hazards upon disposal. This paper introduces a sustainable approach for simultaneous information uplink from the IoUT devices and acoustic energy transfer (AET) to the devices via an autonomous underwater vehicle (AUV), potentially enabling them to operate indefinitely. To tackle the time-sensitivity, we adopt age of information (AoI), and Jain's fairness index. We develop two deep-reinforcement learning (DRL) algorithms, offering a high-complexity, high-performance frequency division duplex (FDD) solution and a low-complexity, medium-performance time division duplex (TDD) approach. The results elucidate that the proposed FDD and TDD solutions significantly reduce the average AoI and boost the harvested energy as well as data collection fairness compared to baseline approaches.
AME-2: Agile and Generalized Legged Locomotion via Attention-Based Neural Map Encoding
Achieving agile and generalized legged locomotion across terrains requires tight integration of perception and control, especially under occlusions and sparse footholds. Existing methods have demonstrated agility on parkour courses but often rely on end-to-end sensorimotor models with limited generalization and interpretability. By contrast, methods targeting generalized locomotion typically exhibit limited agility and struggle with visual occlusions. We introduce AME-2, a unified reinforcement learning (RL) framework for agile and generalized locomotion that incorporates a novel attention-based map encoder in the control policy. This encoder extracts local and global mapping features and uses attention mechanisms to focus on salient regions, producing an interpretable and generalized embedding for RL-based control. We further propose a learning-based mapping pipeline that provides fast, uncertainty-aware terrain representations robust to noise and occlusions, serving as policy inputs. It uses neural networks to convert depth observations into local elevations with uncertainties, and fuses them with odometry. The pipeline also integrates with parallel simulation so that we can train controllers with online mapping, aiding sim-to-real transfer. We validate AME-2 with the proposed mapping pipeline on a quadruped and a biped robot, and the resulting controllers demonstrate strong agility and generalization to unseen terrains in simulation and in real-world experiments.
comment: under review
Real2Sim based on Active Perception with automatically VLM-generated Behavior Trees
Constructing an accurate simulation model of real-world environments requires reliable estimation of physical parameters such as mass, geometry, friction, and contact surfaces. Traditional real-to-simulation (Real2Sim) pipelines rely on manual measurements or fixed, pre-programmed exploration routines, which limit their adaptability to varying tasks and user intents. This paper presents a Real2Sim framework that autonomously generates and executes Behavior Trees for task-specific physical interactions to acquire only the parameters required for a given simulation objective, without relying on pre-defined task templates or expert-designed exploration routines. Given a high-level user request, an incomplete simulation description, and an RGB observation of the scene, a vision-language model performs multi-modal reasoning to identify relevant objects, infer required physical parameters, and generate a structured Behavior Tree composed of elementary robotic actions. The resulting behavior is executed on a torque-controlled Franka Emika Panda, enabling compliant, contact-rich interactions for parameter estimation. The acquired measurements are used to automatically construct a physics-aware simulation. Experimental results on the real manipulator demonstrate estimation of object mass, surface height, and friction-related quantities across multiple scenarios, including occluded objects and incomplete prior models. The proposed approach enables interpretable, intent-driven, and autonomously Real2Sim pipelines, bridging high-level reasoning with physically-grounded robotic interaction.
Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?
The advent of Large Multimodal Models (LMMs) offers a promising technology to tackle the limitations of modular design in autonomous driving, which often falters in open-world scenarios requiring sustained environmental understanding and logical reasoning. Besides, embodied artificial intelligence facilitates policy optimization through closed-loop interactions to achieve the continuous learning capability, thereby advancing autonomous driving toward embodied intelligent (El) driving. However, such capability will be constrained by relying solely on LMMs to enhance EI driving without joint decision-making. This article introduces a novel semantics and policy dual-driven hybrid decision framework to tackle this challenge, ensuring continuous learning and joint decision. The framework merges LMMs for semantic understanding and cognitive representation, and deep reinforcement learning (DRL) for real-time policy optimization. We starts by introducing the foundational principles of EI driving and LMMs. Moreover, we examine the emerging opportunities this framework enables, encompassing potential benefits and representative use cases. A case study is conducted experimentally to validate the performance superiority of our framework in completing lane-change planning task. Finally, several future research directions to empower EI driving are identified to guide subsequent work.
Teaching Robots Like Dogs: Learning Agile Navigation from Luring, Gesture, and Speech
In this work, we aim to enable legged robots to learn how to interpret human social cues and produce appropriate behaviors through physical human guidance. However, learning through physical engagement can place a heavy burden on users when the process requires large amounts of human-provided data. To address this, we propose a human-in-the-loop framework that enables robots to acquire navigational behaviors in a data-efficient manner and to be controlled via multimodal natural human inputs, specifically gestural and verbal commands. We reconstruct interaction scenes using a physics-based simulation and aggregate data to mitigate distributional shifts arising from limited demonstration data. Our progressive goal cueing strategy adaptively feeds appropriate commands and navigation goals during training, leading to more accurate navigation and stronger alignment between human input and robot behavior. We evaluate our framework across six real-world agile navigation scenarios, including jumping over or avoiding obstacles. Our experimental results show that our proposed method succeeds in almost all trials across these scenarios, achieving a 97.15% task success rate with less than 1 hour of demonstration data in total.
comment: 10 pages, 7 figures
Edge-Optimized Multimodal Learning for UAV Video Understanding via BLIP-2
The demand for real-time visual understanding and interaction in complex scenarios is increasingly critical for unmanned aerial vehicles. However, a significant challenge arises from the contradiction between the high computational cost of large Vision language models and the limited computing resources available on UAV edge devices. To address this challenge, this paper proposes a lightweight multimodal task platform based on BLIP-2, integrated with YOLO-World and YOLOv8-Seg models. This integration extends the multi-task capabilities of BLIP-2 for UAV applications with minimal adaptation and without requiring task-specific fine-tuning on drone data. Firstly, the deep integration of BLIP-2 with YOLO models enables it to leverage the precise perceptual results of YOLO for fundamental tasks like object detection and instance segmentation, thereby facilitating deeper visual-attention understanding and reasoning. Secondly, a content-aware key frame sampling mechanism based on K-Means clustering is designed, which incorporates intelligent frame selection and temporal feature concatenation. This equips the lightweight BLIP-2 architecture with the capability to handle video-level interactive tasks effectively. Thirdly, a unified prompt optimization scheme for multi-task adaptation is implemented. This scheme strategically injects structured event logs from the YOLO models as contextual information into BLIP-2's input. Combined with output constraints designed to filter out technical details, this approach effectively guides the model to generate accurate and contextually relevant outputs for various tasks.
comment: The Tenth International Conference on Data Mining and Big Data (DMBD'2025)
Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments
Benefiting from the rapid advancements in large language models (LLMs), human-drone interaction has reached unprecedented opportunities. In this paper, we propose a method that integrates a fine-tuned CodeT5 model with the Unreal Engine-based AirSim drone simulator to efficiently execute multi-task operations using natural language commands. This approach enables users to interact with simulated drones through prompts or command descriptions, allowing them to easily access and control the drone's status, significantly lowering the operational threshold. In the AirSim simulator, we can flexibly construct visually realistic dynamic environments to simulate drone applications in complex scenarios. By combining a large dataset of (natural language, program code) command-execution pairs generated by ChatGPT with developer-written drone code as training data, we fine-tune the CodeT5 to achieve automated translation from natural language to executable code for drone tasks. Experimental results demonstrate that the proposed method exhibits superior task execution efficiency and command understanding capabilities in simulated environments. In the future, we plan to extend the model functionality in a modular manner, enhancing its adaptability to complex scenarios and driving the application of drone technologies in real-world environments.
comment: 1st International Conference on Drones and Unmanned Systems (DAUS' 2025)
Safe Heterogeneous Multi-Agent RL with Communication Regularization for Coordinated Target Acquisition
This paper introduces a decentralized multi-agent reinforcement learning framework enabling structurally heterogeneous teams of agents to jointly discover and acquire randomly located targets in environments characterized by partial observability, communication constraints, and dynamic interactions. Each agent's policy is trained with the Multi-Agent Proximal Policy Optimization algorithm and employs a Graph Attention Network encoder that integrates simulated range-sensing data with communication embeddings exchanged among neighboring agents, enabling context-aware decision-making from both local sensing and relational information. In particular, this work introduces a unified framework that integrates graph-based communication and trajectory-aware safety through safety filters. The architecture is supported by a structured reward formulation designed to encourage effective target discovery and acquisition, collision avoidance, and de-correlation between the agents' communication vectors by promoting informational orthogonality. The effectiveness of the proposed reward function is demonstrated through a comprehensive ablation study. Moreover, simulation results demonstrate safe and stable task execution, confirming the framework's effectiveness.
comment: 7 pages, 4 figures, submitted to the IFAC World Congress 2026
ActiveVLA: Injecting Active Perception into Vision-Language-Action Models for Precise 3D Robotic Manipulation
Recent advances in robot manipulation have leveraged pre-trained vision-language models (VLMs) and explored integrating 3D spatial signals into these models for effective action prediction, giving rise to the promising vision-language-action (VLA) paradigm. However, most existing approaches overlook the importance of active perception: they typically rely on static, wrist-mounted cameras that provide an end-effector-centric viewpoint. As a result, these models are unable to adaptively select optimal viewpoints or resolutions during task execution, which significantly limits their performance in long-horizon tasks and fine-grained manipulation scenarios. To address these limitations, we propose ActiveVLA, a novel vision-language-action framework that empowers robots with active perception capabilities for high-precision, fine-grained manipulation. ActiveVLA adopts a coarse-to-fine paradigm, dividing the process into two stages: (1) Critical region localization. ActiveVLA projects 3D inputs onto multi-view 2D projections, identifies critical 3D regions, and supports dynamic spatial awareness. (2) Active perception optimization. Drawing on the localized critical regions, ActiveVLA uses an active view selection strategy to choose optimal viewpoints. These viewpoints aim to maximize amodal relevance and diversity while minimizing occlusions. Additionally, ActiveVLA applies a 3D zoom-in to improve resolution in key areas. Together, these steps enable finer-grained active perception for precise manipulation. Extensive experiments demonstrate that ActiveVLA achieves precise 3D manipulation and outperforms state-of-the-art baselines on three simulation benchmarks. Moreover, ActiveVLA transfers seamlessly to real-world scenarios, enabling robots to learn high-precision tasks in complex environments.
Spiking Neural-Invariant Kalman Fusion for Accurate Localization Using Low-Cost IMUs
Low-cost inertial measurement units (IMUs) are widely utilized in mobile robot localization due to their affordability and ease of integration. However, their complex, nonlinear, and time-varying noise characteristics often lead to significant degradation in localization accuracy when applied directly for dead reckoning. To overcome this limitation, we propose a novel brain-inspired state estimation framework that combines a spiking neural network (SNN) with an invariant extended Kalman filter (InEKF). The SNN is designed to extract motion-related features from long sequences of IMU data affected by substantial random noise and is trained via a surrogate gradient descent algorithm to enable dynamic adaptation of the covariance noise parameter within the InEKF. By fusing the SNN output with raw IMU measurements, the proposed method enhances the robustness and accuracy of pose estimation. Extensive experiments conducted on the KITTI dataset and real-world data collected using a mobile robot equipped with a low-cost IMU demonstrate that the proposed approach outperforms state-of-the-art methods in localization accuracy and exhibits strong robustness to sensor noise, highlighting its potential for real-world mobile robot applications.
FSAG: Enhancing Human-to-Dexterous-Hand Finger-Specific Affordance Grounding via Diffusion Models
Dexterous grasp synthesis remains a central challenge: the high dimensionality and kinematic diversity of multi-fingered hands prevent direct transfer of algorithms developed for parallel-jaw grippers. Existing approaches typically depend on large, hardware-specific grasp datasets collected in simulation or through costly real-world trials, hindering scalability as new dexterous hand designs emerge. To this end, we propose a data-efficient framework, which is designed to bypass robot grasp data collection by exploiting the rich, object-centric semantic priors latent in pretrained generative diffusion models. Temporally aligned and fine-grained grasp affordances are extracted from raw human video demonstrations and fused with 3D scene geometry from depth images to infer semantically grounded contact targets. A kinematics-aware retargeting module then maps these affordance representations to diverse dexterous hands without per-hand retraining. The resulting system produces stable, functionally appropriate multi-contact grasps that remain reliably successful across common objects and tools, while exhibiting strong generalization across previously unseen object instances within a category, pose variations, and multiple hand embodiments. This work (i) introduces a semantic affordance extraction pipeline leveraging vision-language generative priors for dexterous grasping, (ii) demonstrates cross-hand generalization without constructing hardware-specific grasp datasets, and (iii) establishes that a single depth modality suffices for high-performance grasp synthesis when coupled with foundation-model semantics. Our results highlight a path toward scalable, hardware-agnostic dexterous manipulation driven by human demonstrations and pretrained generative models.
A brain-inspired information fusion method for enhancing robot GPS outages navigation
Low-cost inertial navigation systems (INS) are prone to sensor biases and measurement noise, which lead to rapid degradation of navigation accuracy during global positioning system (GPS) outages. To address this challenge and improve positioning continuity in GPS-denied environments, this paper proposes a brain-inspired GPS/INS fusion network (BGFN) based on spiking neural networks (SNNs). The BGFN architecture integrates a spiking Transformer with a spiking encoder to simultaneously extract spatial features from inertial measurement unit (IMU) signals and capture their temporal dynamics. By modeling the relationship between vehicle attitude, specific force, angular rate, and GPS-derived position increments, the network leverages both current and historical IMU data to estimate vehicle motion. The effectiveness of the proposed method is evaluated through real-world field tests and experiments on public datasets. Compared to conventional deep learning approaches, the results demonstrate that BGFN achieves higher accuracy and enhanced reliability in navigation performance, particularly under prolonged GPS outages.
Robust Subpixel Localization of Diagonal Markers in Large-Scale Navigation via Multi-Layer Screening and Adaptive Matching
This paper proposes a robust, high-precision positioning methodology to address localization failures arising from complex background interference in large-scale flight navigation and the computational inefficiency inherent in conventional sliding window matching techniques. The proposed methodology employs a three-tiered framework incorporating multi-layer corner screening and adaptive template matching. Firstly, dimensionality is reduced through illumination equalization and structural information extraction. A coarse-to-fine candidate selection strategy minimizes sliding window computational costs, enabling rapid estimation of the marker's position. Finally, adaptive templates are generated for candidate points, achieving subpixel precision through improved template matching with correlation coefficient extremum fitting. Experimental results demonstrate the method's effectiveness in extracting and localizing diagonal markers in complex, large-scale environments, making it ideal for field-of-view measurement in navigation tasks.
comment: This paper has been accepted by Applied Optics
A Pin-Array Structure for Gripping and Shape Recognition of Convex and Concave Terrain Profiles
This paper presents a gripper capable of grasping and recognizing terrain shapes for mobile robots in extreme environments. Multi-limbed climbing robots with grippers are effective on rough terrains, such as cliffs and cave walls. However, such robots may fall over by misgrasping the surface or getting stuck owing to the loss of graspable points in unknown natural environments. To overcome these issues, we need a gripper capable of adaptive grasping to irregular terrains, not only for grasping but also for measuring the shape of the terrain surface accurately. We developed a gripper that can grasp both convex and concave terrains and simultaneously measure the terrain shape by introducing a pin-array structure. We demonstrated the mechanism of the gripper and evaluated its grasping and terrain recognition performance using a prototype. Moreover, the proposed pin-array design works well for 3D terrain mapping as well as adaptive grasping for irregular terrains.
comment: Author's version of a manuscript accepted at the 2022 IEEE International Conference on Robotics and Biomimetics (ROBIO). (c) IEEE
Efficient Incremental SLAM via Information-Guided and Selective Optimization
We present an efficient incremental SLAM back-end that achieves the accuracy of full batch optimization while substantially reducing computational cost. The proposed approach combines two complementary ideas: information-guided gating (IGG) and selective partial optimization (SPO). IGG employs an information-theoretic criterion based on the log-determinant of the information matrix to quantify the contribution of new measurements, triggering global optimization only when a significant information gain is observed. This avoids unnecessary relinearization and factorization when incoming data provide little additional information. SPO executes multi-iteration Gauss-Newton (GN) updates but restricts each iteration to the subset of variables most affected by the new measurements, dynamically refining this active set until convergence. Together, these mechanisms retain all measurements to preserve global consistency while focusing computation on parts of the graph where it yields the greatest benefit. We provide theoretical analysis showing that the proposed approach maintains the convergence guarantees of full GN. Extensive experiments on benchmark SLAM datasets show that our approach consistently matches the estimation accuracy of batch solvers, while achieving significant computational savings compared to conventional incremental approaches. The results indicate that the proposed approach offers a principled balance between accuracy and efficiency, making it a robust and scalable solution for real-time operation in dynamic data-rich environments.
Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation
Humanoid robot manipulation is a crucial research area for executing diverse human-level tasks, involving high-level semantic reasoning and low-level action generation. However, precise scene understanding and sample-efficient learning from human demonstrations remain critical challenges, severely hindering the applicability and generalizability of existing frameworks. This paper presents a novel RGMP-S, Recurrent Geometric-prior Multimodal Policy with Spiking features, facilitating both high-level skill reasoning and data-efficient motion synthesis. To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases to enable precise 3D scene understanding within the vision-language model. Specifically, we construct a Long-horizon Geometric Prior Skill Selector that effectively aligns the semantic instructions with spatial constraints, ultimately achieving robust generalization in unseen environments. For the data efficiency issue in robotic action generation, we introduce a Recursive Adaptive Spiking Network. We parameterize robot-object interactions via recursive spiking for spatiotemporal consistency, fully distilling long-horizon dynamic features while mitigating the overfitting issue in sparse demonstration scenarios. Extensive experiments are conducted across the Maniskill simulation benchmark and three heterogeneous real-world robotic systems, encompassing a custom-developed humanoid, a desktop manipulator, and a commercial robotic platform. Empirical results substantiate the superiority of our method over state-of-the-art baselines and validate the efficacy of the proposed modules in diverse generalization scenarios. To facilitate reproducibility, the source code and video demonstrations are publicly available at https://github.com/xtli12/RGMP-S.git.
Fairness risk and its privacy-enabled solution in AI-driven robotic applications
Complex decision-making by autonomous machines and algorithms could underpin the foundations of future society. Generative AI is emerging as a powerful engine for such transitions. However, we show that Generative AI-driven developments pose a critical pitfall: fairness concerns. In robotic applications, although intuitions about fairness are common, a precise and implementable definition that captures user utility and inherent data randomness is missing. Here we provide a utility-aware fairness metric for robotic decision making and analyze fairness jointly with user-data privacy, deriving conditions under which privacy budgets govern fairness metrics. This yields a unified framework that formalizes and quantifies fairness and its interplay with privacy, which is tested in a robot navigation task. In view of the fact that under legal requirements, most robotic systems will enforce user privacy, the approach shows surprisingly that such privacy budgets can be jointly used to meet fairness targets. Addressing fairness concerns in the creative combined consideration of privacy is a step towards ethical use of AI and strengthens trust in autonomous robots deployed in everyday environments.
Heterogeneous computing platform for real-time robotics
After Industry 4.0 has embraced tight integration between machinery (OT), software (IT), and the Internet, creating a web of sensors, data, and algorithms in service of efficient and reliable production, a new concept of Society 5.0 is emerging, in which infrastructure of a city will be instrumented to increase reliability, efficiency, and safety. Robotics will play a pivotal role in enabling this vision that is pioneered by the NEOM initiative - a smart city, co-inhabited by humans and robots. In this paper we explore the computing platform that will be required to enable this vision. We show how we can combine neuromorphic computing hardware, exemplified by the Loihi2 processor used in conjunction with event-based cameras, for sensing and real-time perception and interaction with a local AI compute cluster (GPUs) for high-level language processing, cognition, and task planning. We demonstrate the use of this hybrid computing architecture in an interactive task, in which a humanoid robot plays a musical instrument with a human. Central to our design is the efficient and seamless integration of disparate components, ensuring that the synergy between software and hardware maximizes overall performance and responsiveness. Our proposed system architecture underscores the potential of heterogeneous computing architectures in advancing robotic autonomy and interactive intelligence, pointing toward a future where such integrated systems become the norm in complex, real-time applications.
An Adaptive Neuro-Controller Developed for a Prosthetic Hand Wrist
The significance of employing a controller in prosthetic hands cannot be overstated, as it plays a crucial role in enhancing the functionality and usability of these systems. This paper introduces an adaptive neuro-controller specifically developed for a tendon-driven soft continuum wrist of a prosthetic hand. Kinematic and dynamic modeling of the wrist is carried out using the Timoshenko beam theory. A Neural Network (NN) based strategy is adopted to predict the required motor currents to manipulate the wrist tendons from the errors in the deflection of the wrist section. The Timoshenko beam theory is used to compute the required tendon tension from the input motor current. A comparison of the adaptive neuro-controller with other similar controllers is conducted to analyze the performance of the proposed approach. Simulation studies and experimental validations of the fabricated wrist are included to demonstrate the effectiveness of the controller.
Learning Force Distribution Estimation for the GelSight Mini Optical Tactile Sensor Based on Finite Element Analysis
Contact-rich manipulation remains a major challenge in robotics. Optical tactile sensors like GelSight Mini offer a low-cost solution for contact sensing by capturing soft-body deformations of the silicone gel. However, accurately inferring shear and normal force distributions from these gel deformations has yet to be fully addressed. In this work, we propose a machine learning approach using a U-net architecture to predict force distributions directly from the sensor's raw images. Our model, trained on force distributions inferred from \ac{fea}, demonstrates promising accuracy in predicting normal and shear force distributions for the commercially available GelSight Mini sensor. It also shows potential for generalization across indenters, sensors of the same type, and for enabling real-time application. The codebase, dataset and models are open-sourced and available at https://feats-ai.github.io .
Robotic Tele-Operation for Upper Aerodigestive Tract Microsurgery: System Design and Validation
Upper aerodigestive tract (UADT) treatments frequently employ transoral laser microsurgery (TLM) for procedures such as the removal of tumors or polyps. In TLM, a laser beam is used to cut target tissue, while forceps are employed to grasp, manipulate, and stabilize tissue within the UADT. Although TLM systems may rely on different technologies and interfaces, forceps manipulation is still predominantly performed manually, introducing limitations in ergonomics, precision, and controllability. This paper proposes a novel robotic system for tissue manipulation in UADT procedures, based on a novel end-effector designed for forceps control. The system is integrated within a teleoperation framework that employs a robotic manipulator with a programmed remote center of motion (RCM), enabling precise and constrained instrument motion while improving surgeon ergonomics. The proposed approach is validated through two experimental studies and a dedicated usability evaluation, demonstrating its effectiveness and suitability for UADT surgical applications.
comment: I would like to withdraw the paper because I would like to change some of the results in it which will take some time. For this reason, I prefer to remove it and do a new resubmission once I've finished my work
Using Mobile AR for Rapid Feasibility Analysis for Deployment of Robots: A Usability Study with Non-Expert Users
Automating a production line with robotic arms is a complex, demanding task that requires not only substantial resources but also a deep understanding of the automated processes and available technologies and tools. Expert integrators must consider factors such as placement, payload, and robot reach requirements to determine the feasibility of automation. Ideally, such considerations are based on a detailed digital simulation developed before any hardware is deployed. However, this process is often time-consuming and challenging. To simplify these processes, we introduce a much simpler method for the feasibility analysis of robotic arms' reachability, designed for non-experts. We implement this method through a mobile, sensing-based prototype tool. The two-step experimental evaluation included the expert user study results, which helped us identify the difficulty levels of various deployment scenarios and refine the initial prototype. The results of the subsequent quantitative study with 22 non-expert participants utilizing both scenarios indicate that users could complete both simple and complex feasibility analyses in under ten minutes, exhibiting similar cognitive loads and high engagement. Overall, the results suggest that the tool was well-received and rated as highly usable, thereby showing a new path for changing the ease of feasibility analysis for automation.
comment: Accepted in IEEE RA-L
RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization
We introduce RGS-SLAM, a robust Gaussian-splatting SLAM framework that replaces the residual-driven densification stage of GS-SLAM with a training-free correspondence-to-Gaussian initialization. Instead of progressively adding Gaussians as residuals reveal missing geometry, RGS-SLAM performs a one-shot triangulation of dense multi-view correspondences derived from DINOv3 descriptors refined through a confidence-aware inlier classifier, generating a well-distributed and structure-aware Gaussian seed prior to optimization. This initialization stabilizes early mapping and accelerates convergence by roughly 20\%, yielding higher rendering fidelity in texture-rich and cluttered scenes while remaining fully compatible with existing GS-SLAM pipelines. Evaluated on the TUM RGB-D and Replica datasets, RGS-SLAM achieves competitive or superior localization and reconstruction accuracy compared with state-of-the-art Gaussian and point-based SLAM systems, sustaining real-time mapping performance at up to 925 FPS. Project page:https://breeze1124.github.io/rgs-slam-project-page/
comment: 10 pages, 9 figures
Symbolic Learning of Interpretable Reduced-Order Models for Jumping Quadruped Robots
Reduced-order models are central to motion planning and control of quadruped robots, yet existing templates are often hand-crafted for a specific locomotion modality. This motivates the need for automatic methods that extract task-specific, interpretable low-dimensional dynamics directly from data. We propose a methodology that combines a linear autoencoder with symbolic regression to derive such models. The linear autoencoder provides a consistent latent embedding for configurations, velocities, accelerations, and inputs, enabling the sparse identification of nonlinear dynamics (SINDy) to operate in a compact, physics-aligned space. A multi-phase, hybrid-aware training scheme ensures coherent latent coordinates across contact transitions. We focus our validation on quadruped jumping-a representative, challenging, yet contained scenario in which a principled template model is especially valuable. The resulting symbolic dynamics outperform the state-of-the-art handcrafted actuated spring-loaded inverted pendulum (aSLIP) baseline in simulation and hardware across multiple robots and jumping modalities.
comment: 8 pages
FoldNet: Learning Generalizable Closed-Loop Policy for Garment Folding via Keypoint-Driven Asset and Demonstration Synthesis
Due to the deformability of garments, generating a large amount of high-quality data for robotic garment manipulation tasks is highly challenging. In this paper, we present a synthetic garment dataset that can be used for robotic garment folding. We begin by constructing geometric garment templates based on keypoints and applying generative models to generate realistic texture patterns. Leveraging these keypoint annotations, we generate folding demonstrations in simulation and train folding policies via closed-loop imitation learning. To improve robustness, we propose KG-DAgger, which uses a keypoint-based strategy to generate demonstration data for recovering from failures. KG-DAgger significantly improves the model performance, boosting the real-world success rate by 25\%. After training with 15K trajectories (about 2M image-action pairs), the model achieves a 75\% success rate in the real world. Experiments in both simulation and real-world settings validate the effectiveness of our proposed framework.
DexH2R: Task-oriented Dexterous Manipulation from Human to Robots
Dexterous manipulation is a critical aspect of human capability, enabling interaction with a wide variety of objects. Recent advancements in learning from human demonstrations and teleoperation have enabled progress for robots in such ability. However, these approaches either require complex data collection such as costly human effort for eye-robot contact, or suffer from poor generalization when faced with novel scenarios. To solve both challenges, we propose a framework, DexH2R, that combines human hand motion retargeting with a task-oriented residual action policy, improving task performance by bridging the embodiment gap between human and robotic dexterous hands. Specifically, DexH2R learns the residual policy directly from retargeted primitive actions and task-oriented rewards, eliminating the need for labor-intensive teleoperation systems. Moreover, we incorporate test-time guidance for novel scenarios by taking in desired trajectories of human hands and objects, allowing the dexterous hand to acquire new skills with high generalizability. Extensive experiments in both simulation and real-world environments demonstrate the effectiveness of our work, outperforming prior state-of-the-arts by 40% across various settings.
Learning Contextually-Adaptive Rewards via Calibrated Features
A key challenge in reward learning from human input is that desired agent behavior often changes based on context. For example, a robot must adapt to avoid a stove once it becomes hot. We observe that while high-level preferences (e.g., prioritizing safety over efficiency) often remain constant, context alters the $\textit{saliency}$--or importance--of reward features. For instance, stove heat changes the relevance of the robot's proximity, not the underlying preference for safety. Moreover, these contextual effects recur across tasks, motivating the need for transferable representations to encode them. Existing multi-task and meta-learning methods simultaneously learn representations and task preferences, at best $\textit{implicitly}$ capturing contextual effects and requiring substantial data to separate them from task-specific preferences. Instead, we propose $\textit{explicitly}$ modeling and learning context-dependent feature saliency separately from context-invariant preferences. We introduce $\textit{calibrated features}$--modular representations that capture contextual effects on feature saliency--and present specialized paired comparison queries that isolate saliency from preference for efficient learning. Simulated experiments show our method improves sample efficiency, requiring 10x fewer preference queries than baselines to achieve equivalent reward accuracy, with up to 15% better performance in low-data regimes (5-10 queries). An in-person user study (N=12) demonstrates that participants can effectively teach their personal contextual preferences with our method, enabling adaptable and personalized reward learning.
comment: Proceedings of the 21st ACM/IEEE International Conference on Human-Robot Interaction (HRI '26), March 16 - 19, 2026, Edinburgh, Scotland, UK
MSSF: A 4D Radar and Camera Fusion Framework With Multi-Stage Sampling for 3D Object Detection in Autonomous Driving
As one of the automotive sensors that have emerged in recent years, 4D millimeter-wave radar has a higher resolution than conventional 3D radar and provides precise elevation measurements. But its point clouds are still sparse and noisy, making it challenging to meet the requirements of autonomous driving. Camera, as another commonly used sensor, can capture rich semantic information. As a result, the fusion of 4D radar and camera can provide an affordable and robust perception solution for autonomous driving systems. However, previous radar-camera fusion methods have not yet been thoroughly investigated, resulting in a large performance gap compared to LiDAR-based methods. Specifically, they ignore the feature-blurring problem and do not deeply interact with image semantic information. To this end, we present a simple but effective multi-stage sampling fusion (MSSF) network based on 4D radar and camera. On the one hand, we design a fusion block that can deeply interact point cloud features with image features, and can be applied to commonly used single-modal backbones in a plug-and-play manner. The fusion block encompasses two types, namely, simple feature fusion (SFF) and multiscale deformable feature fusion (MSDFF). The SFF is easy to implement, while the MSDFF has stronger fusion abilities. On the other hand, we propose a semantic-guided head to perform foreground-background segmentation on voxels with voxel feature re-weighting, further alleviating the problem of feature blurring. Extensive experiments on the View-of-Delft (VoD) and TJ4DRadset datasets demonstrate the effectiveness of our MSSF. Notably, compared to state-of-the-art methods, MSSF achieves a 7.0% and 4.0% improvement in 3D mean average precision on the VoD and TJ4DRadSet datasets, respectively. It even surpasses classical LiDAR-based methods on the VoD dataset.
comment: T-TITS accepted, code avaliable
Real-Time LiDAR Point Cloud Densification for Low-Latency Spatial Data Transmission
To realize low-latency spatial transmission system for immersive telepresence, there are two major problems: capturing dynamic 3D scene densely and processing them in real time. LiDAR sensors capture 3D in real time, but produce sparce point clouds. Therefore, this paper presents a high-speed LiDAR point cloud densification method to generate dense 3D scene with minimal latency, addressing the need for on-the-fly depth completion while maintaining real-time performance. Our approach combines multiple LiDAR inputs with high-resolution color images and applies a joint bilateral filtering strategy implemented through a convolutional neural network architecture. Experiments demonstrate that the proposed method produces dense depth maps at full HD resolution in real time (30 fps), which is over 15x faster than a recent training-based depth completion approach. The resulting dense point clouds exhibit accurate geometry without multiview inconsistencies or ghosting artifacts.
Variable Elimination in Hybrid Factor Graphs for Discrete-Continuous Inference & Estimation
Many hybrid problems in robotics involve both continuous and discrete components, and modeling them together for estimation tasks has been a long standing and difficult problem. Hybrid Factor Graphs give us a mathematical framework to model these types of problems, however existing approaches for solving them are based on approximations. In this work, we propose an efficient Hybrid Factor Graph framework alongwith a variable elimination algorithm to produce a hybrid Bayes network, which can then be used for exact Maximum A Posteriori estimation and marginalization over both sets of variables. Our approach first develops a novel hybrid Gaussian factor which can connect to both discrete and continuous variables, and a hybrid conditional which can represent multiple continuous hypotheses conditioned on the discrete variables. Using these representations, we derive the process of hybrid variable elimination under the Conditional Linear Gaussian scheme, giving us exact posteriors as hybrid Bayes network. To bound the number of discrete hypotheses, we use a tree-structured representation of the factors coupled with a simple pruning and probabilistic assignment scheme, which allows for tractable inference. We demonstrate the applicability of our framework on a SLAM dataset with ambiguous measurements, where discrete choices for the most likely measurement have to be made. Our demonstrated results showcase the accuracy, generality, and simplicity of our hybrid factor graph framework.
On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning
Vision-Language-Action models have recently emerged as a powerful paradigm for general-purpose robot learning, enabling agents to map visual observations and natural-language instructions into executable robotic actions. Though popular, they are primarily trained via supervised fine-tuning or training-time reinforcement learning, requiring explicit fine-tuning phases, human interventions, or controlled data collection. Consequently, existing methods remain unsuitable for challenging simulated- or physical-world deployments, where robots must respond autonomously and flexibly to evolving environments. To address this limitation, we introduce a Test-Time Reinforcement Learning for VLAs (TT-VLA), a framework that enables on-the-fly policy adaptation during inference. TT-VLA formulates a dense reward mechanism that leverages step-by-step task-progress signals to refine action policies during test time while preserving the SFT/RL-trained priors, making it an effective supplement to current VLA models. Empirical results show that our approach enhances overall adaptability, stability, and task success in dynamic, previously unseen scenarios under simulated and real-world settings. We believe TT-VLA offers a principled step toward self-improving, deployment-ready VLAs.
Cross-Domain Imitation Learning via Optimal Transport ICLR 2022
Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because they live on different systems that may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states between the different spaces of the agents. Our theory formally characterizes the scenarios where GWIL preserves optimality, revealing its possibilities and limitations. We demonstrate the effectiveness of GWIL in non-trivial continuous control domains ranging from simple rigid transformation of the expert domain to arbitrary transformation of the state-action space.
comment: ICLR 2022
Ensemble-Based Event Camera Place Recognition Under Varying Illumination
Compared to conventional cameras, event cameras provide a high dynamic range and low latency, offering greater robustness to rapid motion and challenging lighting conditions. Although the potential of event cameras for visual place recognition (VPR) has been established, developing robust VPR frameworks under severe illumination changes remains an open research problem. In this paper, we introduce an ensemble-based approach to event camera place recognition that combines sequence-matched results from multiple event-to-frame reconstructions, VPR feature extractors, and temporal resolutions. Unlike previous event-based ensemble methods, which only utilise temporal resolution, our broader fusion strategy delivers significantly improved robustness under varied lighting conditions (e.g., afternoon, sunset, night), achieving a 57% relative improvement in Recall@1 across day-night transitions. We evaluate our approach on two long-term driving datasets (with 8 km per traverse) without metric subsampling, thereby preserving natural variations in speed and stop duration that influence event density. We also conduct a comprehensive analysis of key design choices, including binning strategies, polarity handling, reconstruction methods, and feature extractors, to identify the most critical components for robust performance. Additionally, we propose a modification to the standard sequence matching framework that enhances performance at longer sequence lengths. To facilitate future research, we will release our codebase and benchmarking framework.
Human-in-the-Loop Segmentation of Multi-species Coral Imagery CVPR 2024
Marine surveys by robotic underwater and surface vehicles result in substantial quantities of coral reef imagery, however labeling these images is expensive and time-consuming for domain experts. Point label propagation is a technique that uses existing images labeled with sparse points to create augmented ground truth data, which can be used to train a semantic segmentation model. In this work, we show that recent advances in large foundation models facilitate the creation of augmented ground truth masks using only features extracted by the denoised version of the DINOv2 foundation model and K-Nearest Neighbors (KNN), without any pre-training. For images with extremely sparse labels, we use human-in-the-loop principles to enhance annotation efficiency: if there are 5 point labels per image, our method outperforms the prior state-of-the-art by 19.7% for mIoU. When human-in-the-loop labeling is not available, using the denoised DINOv2 features with a KNN still improves on the prior state-of-the-art by 5.8% for mIoU (5 grid points). On the semantic segmentation task, we outperform the prior state-of-the-art by 13.5% for mIoU when only 5 point labels are used for point label propagation. Additionally, we perform a comprehensive study into the number and placement of point labels, and make several recommendations for improving the efficiency of labeling images with points.
comment: IEEE Journal of Oceanic Engineering accepted preprint of extended paper, 36 pages, 14 figures. Original conference paper (v2) accepted at the CVPR 2024 3rd Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU)
Look as You Leap: Planning Simultaneous Motion and Perception for High-DOF Robots
Most common tasks for robots in dynamic spaces require that the environment is regularly and actively perceived, with many of them explicitly requiring objects or persons to be within view, i.e., for monitoring or safety. However, solving motion and perception tasks simultaneously is challenging, as these objectives often impose conflicting requirements. Furthermore, while robots must react quickly to changes in the environment, directly evaluating the quality of perception (e.g., object detection confidence) is often expensive or infeasible at runtime. This problem is especially important in human-centered environments, such as homes and hospitals, where effective perception is essential for safe and reliable operation. In this work, we address the challenge of solving motion planning problems for high-degree-of-freedom (DoF) robots from a start to a goal configuration with continuous perception constraints under both static and dynamic environments. We propose a GPU-parallelized perception-score-guided probabilistic roadmap planner with a neural surrogate model (PS-PRM). Unlike existing active perception-, visibility-aware or learning-based planners, our work integrates perception tasks and constraints directly into the motion planning formulation. Our method uses a neural surrogate model to approximate perception scores, incorporates them into the roadmap, and leverages GPU parallelism to enable efficient online replanning in dynamic settings. We demonstrate that our planner, evaluated on high-DoF robots, outperforms baseline methods in both static and dynamic environments in both simulation and real-robot experiments.
comment: 20 pages, 13 figures, under review
Multiagent Systems
Agent Contracts: A Formal Framework for Resource-Bounded Autonomous AI Systems
The Contract Net Protocol (1980) introduced coordination through contracts in multi-agent systems. Modern agent protocols standardize connectivity and interoperability; yet, none provide formal, resource governance-normative mechanisms to bound how much agents may consume or how long they may operate. We introduce Agent Contracts, a formal framework that extends the contract metaphor from task allocation to resource-bounded execution. An Agent Contract unifies input/output specifications, multi-dimensional resource constraints, temporal boundaries, and success criteria into a coherent governance mechanism with explicit lifecycle semantics. For multi-agent coordination, we establish conservation laws ensuring delegated budgets respect parent constraints, enabling hierarchical coordination through contract delegation. Empirical validation across four experiments demonstrates 90% token reduction with 525x lower variance in iterative workflows, zero conservation violations in multi-agent delegation, and measurable quality-resource tradeoffs through contract modes. Agent Contracts provide formal foundations for predictable, auditable, and resource-bounded autonomous AI deployment.
Adaptive Requesting in Decentralized Edge Networks via Non-Stationary Bandits
We study a decentralized collaborative requesting problem that aims to optimize the information freshness of time-sensitive clients in edge networks consisting of multiple clients, access nodes (ANs), and servers. Clients request content through ANs acting as gateways, without observing AN states or the actions of other clients. We define the reward as the age of information reduction resulting from a client's selection of an AN, and formulate the problem as a non-stationary multi-armed bandit. In this decentralized and partially observable setting, the resulting reward process is history-dependent and coupled across clients, and exhibits both abrupt and gradual changes in expected rewards, rendering classical bandit-based approaches ineffective. To address these challenges, we propose the AGING BANDIT WITH ADAPTIVE RESET algorithm, which combines adaptive windowing with periodic monitoring to track evolving reward distributions. We establish theoretical performance guarantees showing that the proposed algorithm achieves near-optimal performance, and we validate the theoretical results through simulations.
When KV Cache Reuse Fails in Multi-Agent Systems: Cross-Candidate Interaction is Crucial for LLM Judges
Multi-agent LLM systems routinely generate multiple candidate responses that are aggregated by an LLM judge. To reduce the dominant prefill cost in such pipelines, recent work advocates KV cache reuse across partially shared contexts and reports substantial speedups for generation agents. In this work, we show that these efficiency gains do not transfer uniformly to judge-centric inference. Across GSM8K, MMLU, and HumanEval, we find that reuse strategies that are effective for execution agents can severely perturb judge behavior: end-task accuracy may appear stable, yet the judge's selection becomes highly inconsistent with dense prefill. We quantify this risk using Judge Consistency Rate (JCR) and provide diagnostics showing that reuse systematically weakens cross-candidate attention, especially for later candidate blocks. Our ablation further demonstrates that explicit cross-candidate interaction is crucial for preserving dense-prefill decisions. Overall, our results identify a previously overlooked failure mode of KV cache reuse and highlight judge-centric inference as a distinct regime that demands dedicated, risk-aware system design.
Autonomous Materials Exploration by Integrating Automated Phase Identification and AI-Assisted Human Reasoning
Autonomous experimentation holds the potential to accelerate materials development by combining artificial intelligence (AI) with modular robotic platforms to explore extensive combinatorial chemical and processing spaces. Such self-driving laboratories can not only increase the throughput of repetitive experiments, but also incorporate human domain expertise to drive the search towards user-defined objectives, including improved materials performance metrics. We present an autonomous materials synthesis extension to SARA, the Scientific Autonomous Reasoning Agent, utilizing phase information provided by an automated probabilistic phase labeling algorithm to expedite the search for targeted phase regions. By incorporating human input into an expanded SARA-H (SARA with human-in-the-loop) framework, we enhance the efficiency of the underlying reasoning process. Using synthetic benchmarks, we demonstrate the efficiency of our AI implementation and show that the human input can contribute to significant improvement in sampling efficiency. We conduct experimental active learning campaigns using robotic processing of thin-film samples of several oxide material systems, including Bi$_2$O$_3$, SnO$_x$, and Bi-Ti-O, using lateral-gradient laser spike annealing to synthesize and kinetically trap metastable phases. We showcase the utility of human-in-the-loop autonomous experimentation for the Bi-Ti-O system, where we identify extensive processing domains that stabilize $δ$-Bi$_2$O$_3$ and Bi$_2$Ti$_2$O$_7$, explore dwell-dependent ternary oxide phase behavior, and provide evidence confirming predictions that cationic substitutional doping of TiO$_2$ with Bi inhibits the unfavorable transformation of the metastable anatase to the ground-state rutile phase. The autonomous methods we have developed enable the discovery of new materials and new understanding of materials synthesis and properties.
comment: Main manuscript: 21 pages(including references), 6 figures. Supplementary Information: 12 pages, 9 figures, 1 table
Emergent Coordination in Multi-Agent Systems via Pressure Fields and Temporal Decay
Current multi-agent LLM frameworks rely on explicit orchestration patterns borrowed from human organizational structures: planners delegate to executors, managers coordinate workers, and hierarchical control flow governs agent interactions. These approaches suffer from coordination overhead that scales poorly with agent count and task complexity. We propose a fundamentally different paradigm inspired by natural coordination mechanisms: agents operate locally on a shared artifact, guided only by pressure gradients derived from measurable quality signals, with temporal decay preventing premature convergence. We formalize this as optimization over a pressure landscape and prove convergence guarantees under mild conditions. Empirically, on Latin Square constraint satisfaction across 1,078 trials, pressure-field coordination matches hierarchical control (38.2% vs 38.8% aggregate solve rate, p=0.94, indicating statistical equivalence). Both significantly outperform sequential (23.3%), random (11.7%), and conversation-based multi-agent dialogue (8.6%, p<0.00001). Temporal decay is essential: disabling it increases final pressure 49-fold (d=4.15). On easy problems, pressure-field achieves 87% solve rate. The approach maintains consistent performance from 2 to 32 agents. Our key finding: implicit coordination through shared pressure gradients achieves parity with explicit hierarchical control while dramatically outperforming explicit dialogue-based coordination. This suggests that constraint-driven emergence offers a simpler, equally effective foundation for multi-agent AI.
comment: 19 pages, 9 tables. Code available at https://github.com/Govcraft/latin-experiment
Incentivizing Multi-Tenant Split Federated Learning for Foundation Models at the Network Edge
Foundation models (FMs) such as GPT-4 exhibit exceptional generative capabilities across diverse downstream tasks through fine-tuning. Split Federated Learning (SFL) facilitates privacy-preserving FM fine-tuning on resource-constrained local devices by offloading partial FM computations to edge servers, enabling device-edge synergistic fine-tuning. Practical edge networks often host multiple SFL tenants to support diversified downstream tasks. However, existing research primarily focuses on single-tenant SFL scenarios, and lacks tailored incentive mechanisms for multi-tenant settings, which are essential to effectively coordinate self-interested local devices for participation in various downstream tasks, ensuring that each SFL tenant's distinct FM fine-tuning requirements (e.g., FM types, performance targets, and fine-tuning deadlines) are met. To address this gap, we propose a novel Price-Incentive Mechanism (PRINCE) that guides multiple SFL tenants to offer strategic price incentives, which solicit high-quality device participation for efficient FM fine-tuning. Specifically, we first develop a bias-resilient global SFL model aggregation scheme to eliminate model biases caused by independent device participation. We then derive a rigorous SFL convergence bound to evaluate the contributions of heterogeneous devices to FM performance improvements, guiding the incentive strategies of SFL tenants. Furthermore, we model inter-tenant device competition as a congestion game for Stackelberg equilibrium (SE) analysis, deriving each SFL tenant's optimal incentive strategy. Extensive simulations involving four representative SFL tenant types (ViT, BERT, Whisper, and LLaMA) across diverse data modalities (text, images, and audio) demonstrate that PRINCE accelerates FM fine-tuning by up to 3.07x compared to state-of-the-art approaches, while consistently meeting fine-tuning performance targets.
comment: Accepted for publication in IEEE/ACM Transactions on Networking. Index Terms: Foundation models, Edge computing, Split federated learning, Multi-tenant system, Incentive mechanism
A Computational Social Simulation of Ageing and Care Accessibility in Italian Inner Areas
Ageing societies face increasing strain on formal and informal care systems, particularly in low-density mountainous municipalities where sparse services and steep terrain constrain access. This study presents a spatially explicit agent-based model that integrates a road-network GIS, synthetic populations derived through Iterative Proportional Fitting, and behavioural heterogeneity to examine how alternative service configurations shape accessibility and caregiver burden. The model, applied to Premeno (Piedmont, Italy), compares a baseline distribution of ambulatory services with a relocation scenario at Villa Bernocchi. System-level indicators (Caregiver Effort, Overwhelmed Caregivers, Hours Not Cared, Walkability) and micro-spatial metrics (Walkability, Detour Ratio, Proximity) are analysed across 40 batches and 50 stochastic replications per scenario. Results reveal aggregate neutrality but pronounced local redistribution of accessibility. Sensitivity analysis shows that spatial impedance dominates accessibility, whereas behavioural capacity modulates care effort. The findings illustrate distinctive properties of complex adaptive social systems - emergence, heterogeneity, and feedback - demonstrating how computational social simulation can highlight policy trade-offs between spatial efficiency, social equity, and care sustainability in ageing territories.
comment: Comments: 18 pages, 2 figures, 6 tables; supplementary material included
Cascading multi-agent anomaly detection in surveillance systems via vision-language models and embedding-based classification
Intelligent anomaly detection in dynamic visual environments requires reconciling real-time performance with semantic interpretability. Conventional approaches address only fragments of this challenge. Reconstruction-based models capture low-level deviations without contextual reasoning, object detectors provide speed but limited semantics, and large vision-language systems deliver interpretability at prohibitive computational cost. This work introduces a cascading multi-agent framework that unifies these complementary paradigms into a coherent and interpretable architecture. Early modules perform reconstruction-gated filtering and object-level assessment, while higher-level reasoning agents are selectively invoked to interpret semantically ambiguous events. The system employs adaptive escalation thresholds and a publish-subscribe communication backbone, enabling asynchronous coordination and scalable deployment across heterogeneous hardware. Extensive evaluation on large-scale monitoring data demonstrates that the proposed cascade achieves a threefold reduction in latency compared to direct vision-language inference, while maintaining high perceptual fidelity (PSNR = 38.3 dB, SSIM = 0.965) and consistent semantic labeling. The framework advances beyond conventional detection pipelines by combining early-exit efficiency, adaptive multi-agent reasoning, and explainable anomaly attribution, establishing a reproducible and energy-efficient foundation for scalable intelligent visual monitoring.
comment: Author email changed, Acknowlegement changes
VGC-Bench: Towards Mastering Diverse Team Strategies in Competitive Pokémon AAMAS 2026
Developing AI agents that can robustly adapt to varying strategic landscapes without retraining is a central challenge in multi-agent learning. Pokémon Video Game Championships (VGC) is a domain with a vast space of approximately $10^{139}$ team configurations, far larger than those of other games such as Chess, Go, Poker, StarCraft, or Dota. The combinatorial nature of team building in Pokémon VGC causes optimal strategies to vary substantially depending on both the controlled team and the opponent's team, making generalization uniquely challenging. To advance research on this problem, we introduce VGC-Bench: a benchmark that provides critical infrastructure, standardizes evaluation protocols, and supplies a human-play dataset of over 700,000 battle logs and a range of baseline agents based on heuristics, large language models, behavior cloning, and multi-agent reinforcement learning with empirical game-theoretic methods such as self-play, fictitious play, and double oracle. In the restricted setting where an agent is trained and evaluated in a mirror match with a single team configuration, our methods can win against a professional VGC competitor. We repeat this training and evaluation with progressively larger team sets and find that as the number of teams increases, the best-performing algorithm in the single-team setting has worse performance and is more exploitable, but has improved generalization to unseen teams. Our code and dataset are open-sourced at https://github.com/cameronangliss/vgc-bench and https://huggingface.co/datasets/cameronangliss/vgc-battle-logs.
comment: AAMAS 2026
StackPilot: Autonomous Function Agents for Scalable and Environment-Free Code Execution
Recent advances in large language models (LLMs) have substantially enhanced automated code generation across a wide range of programming languages. Nonetheless, verifying the correctness and executability of LLM-generated code remains a significant challenge, as traditional methods rely on language-specific compilers and environment-dependent runtimes. To overcome these limitations, we introduce StackPilot, an LLM-native, multi-agent framework designed for language-agnostic code verification and execution, which operates independently of conventional toolchains. StackPilot offers three principal innovations: (1) a Function-as-Agents paradigm, in which each function is modeled as an autonomous agent capable of fine-grained reasoning and collaborative verification; (2) an LLM-as-Executor strategy, which enables scalable verification via stack-based scheduling; and (3) a novel snapshot mechanism that preserves complete execution contexts, facilitating deterministic and lossless context switching during verification. Empirical evaluations demonstrate that StackPilot achieves framework reliability rates between 89% and 97%, substantially outperforming baseline approaches. These results indicate that StackPilot can reliably verify and execute a significantly larger proportion of LLM-generated code across diverse programming tasks compared to existing methods.
comment: This method needs to be reconsidered and there is something wrong with experiment
Systems and Control (EESS)
Grid-Aware Charging and Operational Optimization for Mixed-Fleet Public Transit SC
The rapid growth of urban populations and the increasing need for sustainable transportation solutions have prompted a shift towards electric buses in public transit systems. However, the effective management of mixed fleets consisting of both electric and diesel buses poses significant operational challenges. One major challenge is coping with dynamic electricity pricing, where charging costs vary throughout the day. Transit agencies must optimize charging assignments in response to such dynamism while accounting for secondary considerations such as seating constraints. This paper presents a comprehensive mixed-integer linear programming (MILP) model to address these challenges by jointly optimizing charging schedules and trip assignments for mixed (electric and diesel bus) fleets while considering factors such as dynamic electricity pricing, vehicle capacity, and route constraints. We address the potential computational intractability of the MILP formulation, which can arise even with relatively small fleets, by employing a hierarchical approach tailored to the fleet composition. By using real-world data from the city of Chattanooga, Tennessee, USA, we show that our approach can result in significant savings in the operating costs of the mixed transit fleets.
comment: 7 pages, 7 figures, 4 algorithms. Published in the Proceedings of the 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC)
Improving the GMAW process through current control
A control strategy for the electrical current in GMAW processes is proposed. The control is in closed-loop, designed by formal methods, based on a mathematical model of the electrical behavior of the GMAW process, and implemented in C+ language in a microcontroller. The model consists of a switched equivalent electrical circuit whose parameters are obtained in a data-driven manner. The strategy is tested in numerous experiments with both manual and robot welding, showing improvements in the overall welding process.
AUV Trajectory Learning for Underwater Acoustic Energy Transfer and Age Minimization
Internet of underwater things (IoUT) is increasingly gathering attention with the aim of monitoring sea life and deep ocean environment, underwater surveillance as well as maintenance of underwater installments. However, conventional IoUT devices, reliant on battery power, face limitations in lifespan and pose environmental hazards upon disposal. This paper introduces a sustainable approach for simultaneous information uplink from the IoUT devices and acoustic energy transfer (AET) to the devices via an autonomous underwater vehicle (AUV), potentially enabling them to operate indefinitely. To tackle the time-sensitivity, we adopt age of information (AoI), and Jain's fairness index. We develop two deep-reinforcement learning (DRL) algorithms, offering a high-complexity, high-performance frequency division duplex (FDD) solution and a low-complexity, medium-performance time division duplex (TDD) approach. The results elucidate that the proposed FDD and TDD solutions significantly reduce the average AoI and boost the harvested energy as well as data collection fairness compared to baseline approaches.
Disturbance observer-based tracking control for roll-to-roll slot die coating systems under gap and pump rate disturbances
Roll-to-roll slot die coating is a widely used industrial manufacturing technique applied in a diverse range of applications such as the production of lithium-ion batteries, solar cells and optical films. The efficiency of roll-to-roll slot die coating depends on the precise control of various input parameters such as pump rate, substrate velocity and coating gap. However, these inputs are sensitive to disturbances in process conditions, leading to inconsistencies in the various characteristics of the produced film. To address this challenge, a \gls{DO} is utilized for detecting disturbances, which may occur in the same or different channels as the control signal within the system. A generalized compensator is then implemented to mitigate the impact of these disturbances on the output, thereby enhancing uncertainty suppression. Additionally, integrating the disturbance rejection system with an output tracking controller enables the coating system to maintain the desired thickness under varying input conditions and disturbances. The effectiveness of this approach is then validated using a test rig equipped with a camera system, which facilitates the development of a data-driven model of the dynamic process, represented by state-space equations. The simulation results were demonstrated to showcase the effectiveness of the DOBOTC system, which provides a resilient solution for the output tracking issue in a data-driven model with generalized disturbances.
comment: 11 pages, 13 figures
Current and temperature imbalances in parallel-connected grid storage battery modules
A key challenge with large battery systems is heterogeneous currents and temperatures in modules with parallel-connected cells. Although extreme currents and temperatures are detrimental to the performance and lifetime of battery cells, there is not a consensus on the scale of typical imbalances within grid storage modules. Here, we quantify these imbalances through simulations and experiments on an industrially representative grid storage battery module consisting of prismatic lithium iron phosphate cells, elucidating the evolution of current and temperature imbalances and their dependence on individual cell and module parameter variations. Using a sensitivity analysis, we find that varying contact resistances and cell resistances contribute strongly to temperature differences between cells, from which we define safety thresholds on cell-to-cell variability. Finally, we investigate how these thresholds change for different applications, to outline a set of robustness metrics that show how cycling at lower C-rates and narrower SOC ranges can mitigate failures.
Multiobjective Model Predictive Control for Residential Demand Response Management Under Uncertainty
Residential users in demand response programs must balance electricity costs and user dissatisfaction under real-time pricing. This study proposes a multiobjective model predictive control approach for home energy management systems with battery storage, aiming to minimize both objectives while mitigating uncertainties. Laguerre functions parameterize control signals, transforming the optimization problem into one with linear inequalities for efficient exploration. A constrained multiobjective evolutionary algorithm, incorporating convex sampler-based crossover and mutation, is developed to ensure feasible solutions. Simulations show that the proposed method outperforms existing approaches, limiting cost increases to 0.52\% under uncertainties, compared to at least 2.3\% with other methods.
comment: 29 pages, 5 figures
Large Language Models to Enhance Multi-task Drone Operations in Simulated Environments
Benefiting from the rapid advancements in large language models (LLMs), human-drone interaction has reached unprecedented opportunities. In this paper, we propose a method that integrates a fine-tuned CodeT5 model with the Unreal Engine-based AirSim drone simulator to efficiently execute multi-task operations using natural language commands. This approach enables users to interact with simulated drones through prompts or command descriptions, allowing them to easily access and control the drone's status, significantly lowering the operational threshold. In the AirSim simulator, we can flexibly construct visually realistic dynamic environments to simulate drone applications in complex scenarios. By combining a large dataset of (natural language, program code) command-execution pairs generated by ChatGPT with developer-written drone code as training data, we fine-tune the CodeT5 to achieve automated translation from natural language to executable code for drone tasks. Experimental results demonstrate that the proposed method exhibits superior task execution efficiency and command understanding capabilities in simulated environments. In the future, we plan to extend the model functionality in a modular manner, enhancing its adaptability to complex scenarios and driving the application of drone technologies in real-world environments.
comment: 1st International Conference on Drones and Unmanned Systems (DAUS' 2025)
Data-Driven Time-Limited h2 Optimal Model Reduction for Linear Discrete-Time Systems
This paper develops a data-driven h2 model reduction method for discrete-time linear time-invariant systems. Specifically, we solve the h2 model reduction problem defined over a finite horizon using only impulse response data. Furthermore, we show that the proposed data-driven algorithm converges to a stationary point under certain assumptions. Numerical experiments demonstrate that the proposed method constructs a good reduced-order model in terms of the h2 norm defined over the finite horizon using a SLICOT benchmark (the CD player model).
Blockchain-Enabled Renewable Energy Certificate Trading: A Secure and Privacy-Preserving Approach
In the 21st century, transitioning to renewable energy sources is imperative, with fossil fuel reserves depleting rapidly and recognizing critical environmental issues such as climate change, air pollution, water pollution, and habitat destruction. Embracing renewable energy is not only an environmental necessity but also a strategic move with multiple benefits. By shifting to renewable energy sources and supporting their production through the acquisition of renewable energy certificates, we foster innovation and drive economic growth in the renewable energy sector. This, in turn, reduces greenhouse gas emissions, aligning with global efforts to mitigate climate change. Additionally, renewable energy certificates ensure compliance with regulations that mandate the use of renewable energy, enhancing legal adherence while promoting transparency and trust in energy sourcing. To monitor the uptake of renewable energy, governments have implemented Renewable Energy Certificates (RECs) as a tracking mechanism for the production and consumption of renewable energy. However, there are two main challenges to the existing REC schema: 1) The RECs have not been globally adopted due to inconsistent design; 2) The consumer privacy has not been well incorporated in the design of blockchain. In this study, we investigate the trading of RECs between suppliers and consumers using the directed acyclic graph (DAG) blockchain system and introduce a trading schema to help protect consumer information. Our results demonstrate lower transaction time by 41\% and energy consumption by 65\% compared to proof-of-stake.
comment: 26 pages, 7 figures
Minimal Actuator Selection
Selecting a few available actuators to ensure the controllability of a linear system is a fundamental problem in control theory. Previous works either focus on optimal performance, simplifying the controllability issue, or make the system controllable under structural assumptions, such as in graphs or when the input matrix is a design parameter. We generalize these approaches to offer a precise characterization of the general minimal actuator selection problem where a set of actuators is given, described by a fixed input matrix, and goal is to choose the fewest actuators that make the system controllable. We show that this problem can be equivalently cast as an integer linear program and, if actuation channels are sufficiently independent, as a set multicover problem under multiplicity constraints. The latter equivalence is always true if the state matrix has all distinct eigenvalues, in which case it simplifies to the set cover problem. Such characterizations hold even when a robust selection that tolerates a given number of faulty actuators is desired. Our established connection legitimates a designer to use algorithms from the rich literature on the set multicover problem to select the smallest subset of actuators, including exact solutions that do not require brute-force search.
comment: This work has been submitted to the IEEE for possible publication
On Robust Fixed-Time Stabilization of the Cauchy Problem in Hilbert Spaces
This paper presents finite-time and fixed-time stabilization results for inhomogeneous abstract evolution problems, extending existing theories. We prove well-posedness for strong and weak solutions, and estimate upper bounds for settling times for both homogeneous and inhomogeneous systems. We generalize finite-dimensional results to infinite-dimensional systems and demonstrate partial state stabilization with actuation on a subset of the domain. The interest of these results are illustrated through an application of a heat equation with memory term.
Large Artificial Intelligence Model Guided Deep Reinforcement Learning for Resource Allocation in Non Terrestrial Networks
Large AI Model (LAM) have been proposed to applications of Non-Terrestrial Networks (NTN), that offer better performance with its great generalization and reduced task specific trainings. In this paper, we propose a Deep Reinforcement Learning (DRL) agent that is guided by a Large Language Model (LLM). The LLM operates as a high level coordinator that generates textual guidance that shape the reward of the DRL agent during training. The results show that the LAM-DRL outperforms the traditional DRL by 40% in nominal weather scenarios and 64% in extreme weather scenarios compared to heuristics in terms of throughput, fairness, and outage probability.
Hybrid Centralized Distributed Control for Lifelong MAPF over Wireless Connections
In lifelong multi-agent path finding (MAPF) with many robots, unreliable wireless links and stochastic executions are the norm. Existing approaches typically either rely on centralized planning under idealized communication, or run fully distributed local controllers with fixed communication patterns; they rarely couple communication scheduling with policy learning, and thus struggle when bandwidth is scarce or packets are frequently dropped. We address this joint control--communication problem and propose a hybrid centralized--distributed scheme: a centralized cloud policy sends small residual corrections only when selected, while a lightweight on-board Gated recurrent unit (GRU) policy provides a safe default fallback when wireless connection is not available.
comment: 11pages, 9 figures
Research on Mechanical Properties and Deformation-Fracture Energy Consumption Characteristics of Plateau Frozen Rocks
The exploitation of mineral resources in plateau regions is confronted with critical challenges including low blasting efficiency, excessive energy consumption,and compromised operational safety when dealing with low-temperature water-bearing frozen rock masses.This study systematically investigates the dynamic-static mechanical properties,deformation-fracture behaviors,and energy consumption characteristics of plateau frozen sandstone under the coupled effects of temperature and moisture content (5%-15%).The research methodology integrates field sampling, low-pressure low-temperature simulation tests, graded impact loading tests, and numerical inversion analysis. Results demonstrate that freezing significantly enhances the dynamic strength and brittleness of saturated sandstone.The pore structure undergoes substantial evolution with decreasing temperature, with the porosity increasing by 63.15%.Based on PFC3D microscopic simulations, the mechanism of frost heave damage and the regulatory effect of water-ice phase transition on rock mechanical behaviors are elucidated.A quantitative analysis method for energy dissipation is proposed, revealing that the energy absorption increment of frozen rocks is higher than that of room-temperature samples.The findings provide a theoretical basis and technical support for optimizing blasting parameters, realizing directional energy release,and promoting green construction of frozen rock masses in high-altitude areas.
Memetic Covariance Matrix Adaptation Evolution Strategy for Bilinear Matrix Inequality Problems in Control System Design
Bilinear Matrix Inequalities (BMIs) are fundamental to control system design but are notoriously difficult to solve due to their nonconvexity. This study addresses BMI-based control optimization problems by adapting and integrating advanced evolutionary strategies. Specifically, a memetic Covariance Matrix Adaptation Evolution Strategy (memetic CMA-ES) is proposed, which incorporates a local refinement phase via a (1+1)-CMA-ES within the global search process. While these algorithmic components are established in evolutionary computing, their tailored integration and specific tuning for control design tasks represent a novel application in this context. Experimental evaluations on $H_{\infty}$ controller synthesis and spectral abscissa optimization demonstrate that the proposed method achieves superior performance compared to existing BMI solvers in terms of both solution quality and robustness. This work bridges the gap between evolutionary computation and control theory, providing a practical and effective approach to tackling challenging BMI-constrained problems.
comment: 24 pages, 5 figures
Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies
Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty in online RL is the lack of direct samples from the target distribution; instead, the target is an unnormalized Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which utilizes a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. Yet, it remains unclear how these objectives relate formally or if they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. Crucially, we introduce Langevin Stein operators to construct zero-mean control variates, deriving a general class of estimators that effectively reduce importance sampling variance. We show that existing noise-expectation and gradient-expectation methods are two specific instances within this broader class. This unified view yields two key advancements: it extends the capability of targeting Boltzmann distributions from diffusion to flow policies, and enables the principled combination of Q-value and Q-gradient information to derive an optimal, minimum-variance estimator, thereby improving training efficiency and stability. We instantiate RFM to train a flow policy in online RL, and demonstrate improved performance on continuous-control benchmarks compared to diffusion policy baselines.
Modal Parameter Extraction via Propeller-Driven Vibration Testing
Ground Vibration Testing (GVT) supports aircraft certification but often requires lengthy and costly campaigns. Propeller-driven Vibration Testing (PVT) is assessed here as an output-only alternative, in line with Operational Modal Analysis approaches such as Taxi Vibration Testing and Flight Vibration Testing. A cantilever Aluminium 7075-T6 wing spar is instrumented with seven accelerometers and excited by an outboard electric motor and propeller. Seven runs are carried out: a motor-off baseline, five constant-throttle cases, and a manual up-down throttle sweep. The acquired spectra indicate that the dominant resonances remain observable under propeller excitation, while low-throttle conditions introduce narrowband harmonics that may mask structural peaks; the sweep reduces persistent overlap. Modal parameters are identified for the baseline and sweep cases using the Natural Excitation Technique with the Loewner Framework (NExT-LF). The first two modes remain closely matched (Modal Assurance Criterion (MAC) > 0.99), whereas the third mode shows reduced repeatability (MAC = 0.827) and a larger frequency shift, consistent with propeller-induced bending--torsion coupling and non-ideal sweep control. Overall, PVT provides a viable complement to GVT for extracting low-frequency modal information and motivates pursuing future work on automated throttle scheduling and coupling-aware test planning.
Coordinated Cooling and Compute Management for AI Datacenters
The AI datacenters are currently being deployed on a large scale to support the training and deployment of power-intensive large-language models (LLMs). Extensive amount of computation and cooling required in datacenters increase concerns about the energy use and carbon emissions of AI datacenters. Although current state-of-the-art has examined the energy efficiency of LLM inference, most prior research focused on optimizing compute-side scheduling without considering thermal objectives or constraints. Since GPU-intensive inference generates substantial heat that can degrade datacenter performance, ignoring thermal effects can increase total energy consumption and reduce the efficiency of LLM serving. To fill this gap, we profile the characteristics of GPU servers under varying cooling and AI jobs, and develop a joint cooling and computing modeling approach for AI datacenters. Built upon such workload and thermal dynamics models, a novel hierarchical control framework is proposed to co-optimize computing and thermal management by identifying the optimal GPU parallelism, frequency (DVFS), and cooling control knobs. Using real Azure inference traces and detailed GPU profiling, our model balances serving latency and thermal constraints in AI datacenters while significantly improving AI datacenters' energy efficiency.
comment: In Submission, 12 pages, 8 pages
Efficient Incremental SLAM via Information-Guided and Selective Optimization
We present an efficient incremental SLAM back-end that achieves the accuracy of full batch optimization while substantially reducing computational cost. The proposed approach combines two complementary ideas: information-guided gating (IGG) and selective partial optimization (SPO). IGG employs an information-theoretic criterion based on the log-determinant of the information matrix to quantify the contribution of new measurements, triggering global optimization only when a significant information gain is observed. This avoids unnecessary relinearization and factorization when incoming data provide little additional information. SPO executes multi-iteration Gauss-Newton (GN) updates but restricts each iteration to the subset of variables most affected by the new measurements, dynamically refining this active set until convergence. Together, these mechanisms retain all measurements to preserve global consistency while focusing computation on parts of the graph where it yields the greatest benefit. We provide theoretical analysis showing that the proposed approach maintains the convergence guarantees of full GN. Extensive experiments on benchmark SLAM datasets show that our approach consistently matches the estimation accuracy of batch solvers, while achieving significant computational savings compared to conventional incremental approaches. The results indicate that the proposed approach offers a principled balance between accuracy and efficiency, making it a robust and scalable solution for real-time operation in dynamic data-rich environments.
STO-RL: Offline RL under Sparse Rewards via LLM-Guided Subgoal Temporal Order AAMAS 2026
Offline reinforcement learning (RL) enables policy learning from pre-collected datasets, avoiding costly and risky online interactions, but it often struggles with long-horizon tasks involving sparse rewards. Existing goal-conditioned and hierarchical offline RL methods decompose such tasks and generate intermediate rewards to mitigate limitations of traditional offline RL, but usually overlook temporal dependencies among subgoals and rely on imprecise reward shaping, leading to suboptimal policies. To address these issues, we propose STO-RL (Offline RL using LLM-Guided Subgoal Temporal Order), an offline RL framework that leverages large language models (LLMs) to generate temporally ordered subgoal sequences and corresponding state-to-subgoal-stage mappings. Using this temporal structure, STO-RL applies potential-based reward shaping to transform sparse terminal rewards into dense, temporally consistent signals, promoting subgoal progress while avoiding suboptimal solutions. The resulting augmented dataset with shaped rewards enables efficient offline training of high-performing policies. Evaluations on four discrete and continuous sparse-reward benchmarks demonstrate that STO-RL consistently outperforms state-of-the-art offline goal-conditioned and hierarchical RL baselines, achieving faster convergence, higher success rates, and shorter trajectories. Ablation studies further confirm STO-RL's robustness to imperfect or noisy LLM-generated subgoal sequences, demonstrating that LLM-guided subgoal temporal structures combined with theoretically grounded reward shaping provide a practical and scalable solution for long-horizon offline RL.
comment: Accepted at International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems
We examine stability properties of primal-dual gradient flow dynamics for composite convex optimization problems with multiple, possibly nonsmooth, terms in the objective function under the generalized consensus constraint. The proposed dynamics are based on the proximal augmented Lagrangian and they provide a viable alternative to ADMM which faces significant challenges from both analysis and implementation viewpoints in large-scale multi-block scenarios. In contrast to customized algorithms with individualized convergence guarantees, we develop a systematic approach for solving a broad class of challenging composite optimization problems. We leverage various structural properties to establish global (exponential) convergence guarantees for the proposed dynamics. Our assumptions are much weaker than those required to prove (exponential) stability of primal-dual dynamics as well as (linear) convergence of discrete-time methods such as standard two-block and multi-block ADMM and EXTRA algorithms. Finally, we show necessity of some of our structural assumptions for exponential stability and provide computational experiments to demonstrate the convenience of the proposed approach for parallel and distributed computing applications.
comment: 32 pages; 4 figures
Adaptive Scheduling: A Reinforcement Learning Whittle Index Approach for Wireless Sensor Networks
We propose a reinforcement learning based scheduling framework for Restless Multi-Armed Bandit (RMAB) problems, centred on a Whittle Index Q-Learning policy with Upper Confidence Bound (UCB) exploration, referred to as WIQL-UCB. Unlike existing approaches that rely on fixed or adaptive epsilon-greedy strategies and require careful hyperparameter tuning, the proposed method removes problem-specific tuning and is therefore more generalisable across diverse RMAB settings. We evaluate WIQL-UCB on standard RMAB benchmarks and on a practical sensor scheduling application based on the Age of Incorrect Information (AoII), using an edge-based state estimation scheme that requires no prior knowledge of system dynamics. Experimental results show that WIQL-UCB achieves near-optimal performance while significantly improving computational and memory efficiency. For a representative problem size of N = 15 and M = 3, the proposed method requires only around 600 bytes of memory, compared with several kilobytes for tabular Q-learning and hundreds of kilobytes to megabytes for deep reinforcement learning baselines. In addition, WIQL-UCB achieves sub-millisecond per-decision runtimes and is several times faster than deep reinforcement learning approaches, while maintaining competitive performance. Overall, these results demonstrate that WIQL-UCB consistently outperforms both non-Whittle-based and Whittle-index learning baselines across a wide range of RMAB settings.
comment: 18 pages, 13 figures, 2 Tables, 1 algorithm
An Adaptive Neuro-Controller Developed for a Prosthetic Hand Wrist
The significance of employing a controller in prosthetic hands cannot be overstated, as it plays a crucial role in enhancing the functionality and usability of these systems. This paper introduces an adaptive neuro-controller specifically developed for a tendon-driven soft continuum wrist of a prosthetic hand. Kinematic and dynamic modeling of the wrist is carried out using the Timoshenko beam theory. A Neural Network (NN) based strategy is adopted to predict the required motor currents to manipulate the wrist tendons from the errors in the deflection of the wrist section. The Timoshenko beam theory is used to compute the required tendon tension from the input motor current. A comparison of the adaptive neuro-controller with other similar controllers is conducted to analyze the performance of the proposed approach. Simulation studies and experimental validations of the fabricated wrist are included to demonstrate the effectiveness of the controller.
A predictive modular approach to constraint satisfaction under uncertainty -- with application to glycosylation in continuous monoclonal antibody biosimilar production
The paper proposes a modular-based approach to constraint handling in process optimization and control. This is partly motivated by the recent interest in learning-based methods, e.g., within bioproduction, for which constraint handling under uncertainty is a challenge. The proposed constraint handler, called predictive filter, is combined with an adaptive constraint margin and a constraint violation cost monitor to minimize the cost of violating soft constraints due to model uncertainty and disturbances. The module can be combined with any controller and is based on minimally modifying the controller output, in a least squares sense, such that constraints are satisfied within the considered horizon. The proposed method is computationally efficient and suitable for real-time applications. The effectiveness of the method is illustrated through a realistic case study of glycosylation constraint satisfaction in continuous monoclonal antibody biosimilar production using Chinese hamster ovary cells, employing a metabolic network model consisting of 23 extracellular metabolites and 126 reactions. In the case study, the average constraint-violation cost is reduced by more than 60% compared to the case without the proposed constraint-handling method.
Symbolic Learning of Interpretable Reduced-Order Models for Jumping Quadruped Robots
Reduced-order models are central to motion planning and control of quadruped robots, yet existing templates are often hand-crafted for a specific locomotion modality. This motivates the need for automatic methods that extract task-specific, interpretable low-dimensional dynamics directly from data. We propose a methodology that combines a linear autoencoder with symbolic regression to derive such models. The linear autoencoder provides a consistent latent embedding for configurations, velocities, accelerations, and inputs, enabling the sparse identification of nonlinear dynamics (SINDy) to operate in a compact, physics-aligned space. A multi-phase, hybrid-aware training scheme ensures coherent latent coordinates across contact transitions. We focus our validation on quadruped jumping-a representative, challenging, yet contained scenario in which a principled template model is especially valuable. The resulting symbolic dynamics outperform the state-of-the-art handcrafted actuated spring-loaded inverted pendulum (aSLIP) baseline in simulation and hardware across multiple robots and jumping modalities.
comment: 8 pages
Analysis, detection and control of secure and safe cyber-physical control systems in a unified framework
This paper deals with analysis, simultaneous detection of faults and attacks, fault-tolerant control and attack-resilient of cyber-physical control systems. In our recent work, it has been observed that an attack detector driven by an input residual signal is capable of reliably detecting attacks. In particular, observing system dynamics from the perspective of the system input-output signal space reveals that attacks and system uncertainties act on different system subspaces. These results motivate our exploration of secure and safe cyber-physical control systems in the unified framework of control and detection. The unified framework is proposed to handle control and detection issues uniformly and in subspaces of system input-output data. Its mathematical and control-theoretic basis is system coprime factorizations with Bezout identity at its core. We firstly explore those methods and schemes of the unified framework, which serve as the major control-theoretic tool in our work. It is followed by re-visiting and examining established attack detection and resilient control schemes. The major part of our work is the endeavours to develop a control-theoretic paradigm, in which analysis, simultaneous detection of faults and attacks, fault-tolerant and attack-resilient control of cyber-physical control systems are addressed in a unified manner.
Time-causal and time-recursive wavelets
This paper presents a framework for time-causal wavelet analysis. It targets real-time processing of temporal signals, where data from the future are not available. The study builds upon temporal scale-space theory, originating from a complete classification of temporal smoothing kernels that guarantee non-creation of new structures from finer to coarser temporal scale levels. We construct temporal wavelets from the temporal derivatives of a special time-causal smoothing kernel, referred to as the time-causal limit kernel, as arising from the classification of variation-diminishing smoothing transformations with the complementary requirement of temporal scale covariance, to guarantee self-similar handling of structures in the input signal at different temporal scales. This enables decomposition of the signal into different components at different scales, while adhering to temporal causality. The paper establishes theoretical foundations for these time-causal wavelet representations, and maps structural relationships to the non-causal Ricker or Mexican hat wavelets. We also describe how efficient discrete approximations of the presented theory can be performed in terms of first-order recursive filters coupled in cascade, which enables numerically well-conditioned real-time processing with low resource usage. We characterize and quantify how the continuous scaling properties transfer to the discrete implementation, demonstrating how the proposed time-causal wavelet representation can reflect the duration of locally dominant temporal structures in the input signal.
comment: 28 pages, 11 figures
Semi-Tensor-Product Based Convolutional Neural Networks
The semi-tensor product of vectors generalizes the conventional inner product, enabling algebraic operations between vectors of different dimensions. Building upon this foundation, we introduce a domain-based convolutional product and integrate it with the STP to formulate a padding-free convolutional operation. This new operation inherently avoids zero or other artificial padding, thereby eliminating redundant information and boundary artifacts commonly present in conventional convolutional neural networks. Based on this operation, we further develop an STP-based CNN framework that extends convolutional computation to irregular and cross-dimensional data domains. Applications to image processing and third-order signal identification demonstrate the proposed method's effectiveness in handling irregular, incomplete, and high-dimensional data without the distortions caused by padding.
Adaptive Model-Based Reinforcement Learning for Orbit Feedback Control in NSLS-II Storage Ring
The National Synchrotron Light Source II (NSLS-II) uses highly stable electron beam to produce high-quality X-ray beams with high brightness and low-emittance synchrotron radiation. The traditional algorithm to stabilize the beam applies singular value decomposition (SVD) on the orbit response matrix to remove noise and extract actions. Supervised learning has been studied on NSLS-II storage ring stabilization and other accelerator facilities recently. Several problems, for example, machine status drifting, environment noise, and non-linear accelerator dynamics, remain unresolved in the SVD-based and supervised learning algorithms. To address these problems, we propose an adaptive training framework based on model-based reinforcement learning. This framework consists of two types of optimizations: trajectory optimization attempts to minimize the expected total reward in a differentiable environment, and online model optimization learns non-linear machine dynamics through the agent-environment interaction. Through online training, this framework tracks the internal status drifting in the electron beam ring. Simulation and real in-facility experiments on NSLS-II reveal that our method stabilizes the beam position and minimizes the alignment error, defined as the root mean square (RMS) error between adjusted beam positions and the reference position, down to ~1$μ$m.
comment: Accepted by the 20th International Conference on Accelerator and Large Experimental Physics Control Systems (ICALEPCS 2025)
zkSTAR: A zero knowledge system for time series attack detection enforcing regulatory compliance in critical infrastructure networks
Industrial control systems (ICS) form the operational backbone of critical infrastructure networks (CIN) such as power grids, water supply systems, and gas pipelines. As cyber threats to these systems escalate, regulatory agencies are imposing stricter compliance requirements to ensure system-wide security and reliability. A central challenge, however, is enabling regulators to verify the effectiveness of detection mechanisms without requiring utilities to disclose sensitive operational data. In this paper, we introduce zkSTAR, a cyberattack detection framework that leverages zk-SNARKs to reconcile these requirements and enable provable detection guarantees while preserving data confidentiality. Our approach builds on established residual-based statistical hypothesis testing methods applied to state-space detection models. Specifically, we design a two-pronged zk-SNARK architecture that enforces (i) temporal consistency of the state-space dynamics and (ii) statistical consistency of the detection tests, enabling regulators to verify correctness and prevent suppression of alarms without visibility into utility-level data. We formally analyze the soundness and zero-knowledge properties of our framework and validate its practical feasibility through computational experiments on real-world ICS datasets. As a result, our work demonstrates a scalable, privacy-preserving alternative for regulatory compliance for ICS driven critical infrastructure networks.
From learning to safety: A Direct Data-Driven Framework for Constrained Control
Ensuring safety in the sense of constraint satisfaction for learning-based control is a critical challenge, especially in the model-free case. While safety filters address this challenge in the model-based setting by modifying unsafe control inputs, they typically rely on predictive models derived from physics or data. This reliance limits their applicability for advanced model-free learning control methods. To address this gap, we propose a new optimization-based control framework that determines safe control inputs directly from data. The benefit of the framework is that it can be updated through arbitrary model-free learning algorithms to pursue optimal performance. As a key component, the concept of direct data-driven safety filters (3DSF) is first proposed. The framework employs a novel safety certificate, called the state-action control barrier function (SACBF). We present three different schemes to learn the SACBF. Furthermore, based on input-to-state safety analysis, we present the error-to-state safety analysis framework, which provides formal guarantees on safety and recursive feasibility even in the presence of learning inaccuracies. The proposed control framework bridges the gap between model-free learning-based control and constrained control, by decoupling performance optimization from safety enforcement. Simulations on vehicle control illustrate the superior performance regarding constraint satisfaction and task achievement compared to model-based methods and reward shaping.
Integration Matters for Learning PDEs with Backward SDEs NeurIPS 2025
Backward stochastic differential equation (BSDE)-based deep learning methods provide an alternative to Physics-Informed Neural Networks (PINNs) for solving high-dimensional partial differential equations (PDEs), offering potential algorithmic advantages in settings such as stochastic optimal control, where the PDEs of interest are tied to an underlying dynamical system. However, standard BSDE-based solvers have empirically been shown to underperform relative to PINNs in the literature. In this paper, we identify the root cause of this performance gap as a discretization bias introduced by the standard Euler-Maruyama (EM) integration scheme applied to one-step self-consistency BSDE losses, which shifts the optimization landscape off target. We find that this bias cannot be satisfactorily addressed through finer step-sizes or multi-step self-consistency losses. To properly handle this issue, we propose a Stratonovich-based BSDE formulation, which we implement with stochastic Heun integration. We show that our proposed approach completely eliminates the bias issues faced by EM integration. Furthermore, our empirical results show that our Heun-based BSDE method consistently outperforms EM-based variants and achieves competitive results with PINNs across multiple high-dimensional benchmarks. Our findings highlight the critical role of integration schemes in BSDE-based PDE solvers, an algorithmic detail that has received little attention thus far in the literature.
comment: NeurIPS 2025